2023-11-25 20:20:52,063 INFO [train_asr.py:1303] (3/4) Training started
2023-11-25 20:20:52,063 INFO [train_asr.py:1313] (3/4) Device: cuda:3
2023-11-25 20:20:52,078 INFO [train_asr.py:1325] (3/4) {'best_train_loss': inf, 'best_valid_loss': inf, 'best_train_epoch': -1, 'best_valid_epoch': -1, 'batch_idx_train': 0, 'log_interval': 50, 'reset_interval': 200, 'valid_interval': 3000, 'feature_dim': 80, 'subsampling_factor': 4, 'warm_step': 2000, 'env_info': {'k2-version': '1.24.3', 'k2-build-type': 'Release', 'k2-with-cuda': True, 'k2-git-sha1': '2b2ac14b326d61d79d04e53fbd69b1ff6d630411', 'k2-git-date': 'Thu Aug 24 05:58:26 2023', 'lhotse-version': '1.16.0', 'torch-version': '2.0.1+cu117', 'torch-cuda-available': True, 'torch-cuda-version': '11.7', 'python-version': '3.1', 'icefall-git-branch': 'multi_KD', 'icefall-git-sha1': 'a9ea720f-dirty', 'icefall-git-date': 'Wed Nov 22 17:48:49 2023', 'icefall-path': '/star-xy/softwares/icefall_development/icefall_multi_KD', 'k2-path': '/star-xy/softwares/k2_development/k2/k2/python/k2/__init__.py', 'lhotse-path': '/star-xy/softwares/anaconda3/envs/multi_KD/lib/python3.10/site-packages/lhotse/__init__.py', 'hostname': 'de-74279-k2-train-10-1125112954-6d844cbdd8-m6xmg', 'IP address': '10.177.94.19'}, 'world_size': 4, 'master_port': 13490, 'tensorboard': True, 'num_epochs': 50, 'start_epoch': 39, 'start_batch': 0, 'exp_dir': PosixPath('multi_KD/exp_train_asr_full_libri1_do_audio_tagging1_as_unbalanced_scale1.0'), 'bpe_model': 'data/lang_bpe_500/bpe.model', 'base_lr': 0.045, 'lr_batches': 7500, 'lr_epochs': 3.5, 'ref_duration': 600, 'context_size': 2, 'prune_range': 5, 'lm_scale': 0.25, 'am_scale': 0.0, 'simple_loss_scale': 0.5, 'ctc_loss_scale': 0.2, 'audio_tagging_loss_scale': 1.0, 'seed': 42, 'print_diagnostics': False, 'inf_check': False, 'save_every_n': 4000, 'keep_last_k': 30, 'average_period': 200, 'use_fp16': True, 'stop_early': False, 'do_finetune': False, 'init_modules': None, 'freeze_modules': None, 'finetune_ckpt': None, 'num_encoder_layers': '2,2,3,4,3,2', 'downsampling_factor': '1,2,4,8,4,2', 'feedforward_dim': '512,768,1024,1536,1024,768', 'num_heads': '4,4,4,8,4,4', 'encoder_dim': '192,256,384,512,384,256', 'query_head_dim': '32', 'value_head_dim': '12', 'pos_head_dim': '4', 'pos_dim': 48, 'encoder_unmasked_dim': '192,192,256,256,256,192', 'cnn_module_kernel': '31,31,15,15,15,31', 'decoder_dim': 512, 'joiner_dim': 512, 'causal': False, 'chunk_size': '16,32,64,-1', 'left_context_frames': '64,128,256,-1', 'use_transducer': True, 'use_ctc': False, 'do_audio_tagging': True, 'use_encoder_projection': False, 'encoder_projection_dim': -1, 'freeze_encoder': False, 'freezing_encoder_layer_index': '-1', 'freeze_encoder_steps': -1, 'encoder_lr_scale': 1.0, 'beats_label': True, 'full_libri': True, 'mini_libri': False, 'use_vox2': False, 'use_libriheavy': False, 'libriheavy_subset': 'small', 'use_audioset': True, 'audioset_subset': 'unbalanced', 'manifest_dir': PosixPath('data/fbank'), 'max_duration': 1000, 'bucketing_sampler': False, 'num_buckets': 30, 'concatenate_cuts': False, 'duration_factor': 1.0, 'gap': 1.0, 'on_the_fly_feats': False, 'shuffle': True, 'drop_last': True, 'return_cuts': True, 'num_workers': 2, 'enable_spec_aug': True, 'spec_aug_time_warp_factor': 80, 'enable_musan': True, 'enable_audioset': False, 'use_musan_separately': False, 'input_strategy': 'PrecomputedFeatures', 'drop_features': False, 'return_audio': False, 'use_beats': True, 'use_ecapa': True, 'use_whisper': True, 'whisper_mvq': False, 'beats_ckpt': 'data/models/BEATs/BEATs_iter3_plus_AS2M_finetuned_on_AS2M_cpt2.pt', 'whisper_version': 'small.en', 'blank_id': 0, 'vocab_size': 500}
2023-11-25 20:20:52,079 INFO [train_asr.py:1334] (3/4) About to create model
2023-11-25 20:20:52,787 INFO [train_asr.py:1338] (3/4) Number of model parameters: 65819362
2023-11-25 20:20:52,787 INFO [train_asr.py:1362] (3/4) Using CED labels!
2023-11-25 20:20:52,788 INFO [checkpoint.py:112] (3/4) Loading checkpoint from multi_KD/exp_train_asr_full_libri1_do_audio_tagging1_as_unbalanced_scale1.0/epoch-38.pt
2023-11-25 20:20:56,538 INFO [train_asr.py:1370] (3/4) Setting the lr scale of parameters in encoder and encoder_embed to 1.0
2023-11-25 20:20:59,868 INFO [train_asr.py:1379] (3/4) Using DDP
2023-11-25 20:21:00,337 INFO [train_asr.py:1402] (3/4) Loading optimizer state dict
2023-11-25 20:21:00,822 INFO [train_asr.py:1410] (3/4) Loading scheduler state dict
2023-11-25 20:21:00,877 INFO [train_asr.py:1432] (3/4) Getting audioset cuts
2023-11-25 20:21:00,877 INFO [kd_datamodule.py:784] (3/4) About to get the audioset cuts.
2023-11-25 20:21:00,964 INFO [train_asr.py:1438] (3/4) Using mux to combine Librispeech with audioset
2023-11-25 20:21:00,964 INFO [train_asr.py:1449] (3/4) CutSet(len=2748469) [underlying data type: ]
2023-11-25 20:21:09,894 INFO [kd_datamodule.py:396] (3/4) Enable MUSAN
2023-11-25 20:21:09,894 INFO [kd_datamodule.py:397] (3/4) About to get Musan cuts
2023-11-25 20:21:12,685 INFO [kd_datamodule.py:427] (3/4) Enable SpecAugment
2023-11-25 20:21:12,685 INFO [kd_datamodule.py:428] (3/4) Time warp factor: 80
2023-11-25 20:21:12,685 INFO [kd_datamodule.py:438] (3/4) Num frame mask: 10
2023-11-25 20:21:12,686 INFO [kd_datamodule.py:451] (3/4) About to create train dataset
2023-11-25 20:21:12,687 INFO [kd_datamodule.py:487] (3/4) Using SimpleCutSampler
2023-11-25 20:21:12,687 INFO [kd_datamodule.py:495] (3/4) About to create train dataloader
2023-11-25 20:21:12,690 INFO [kd_datamodule.py:802] (3/4) About to get the audioset eval cuts.
2023-11-25 20:21:12,691 INFO [train_asr.py:1513] (3/4) CutSet(len=20681) [underlying data type: ]
2023-11-25 20:21:12,744 INFO [kd_datamodule.py:529] (3/4) About to create dev dataset
2023-11-25 20:21:13,203 INFO [kd_datamodule.py:550] (3/4) About to create dev dataloader
2023-11-25 20:21:13,204 INFO [train_asr.py:1527] (3/4) Loading grad scaler state dict
2023-11-25 20:21:48,360 INFO [train_asr.py:1235] (3/4) Epoch 39, batch 0, loss[loss=0.1316, simple_loss=0.1078, pruned_loss=0.01193, audio_tagging_loss=0.06571, over 15471.00 frames. ], tot_loss[loss=0.1316, simple_loss=0.1078, pruned_loss=0.01193, audio_tagging_loss=0.06571, over 15471.00 frames. ], batch size: 55, lr: 1.75e-03, grad_scale: 32.0
2023-11-25 20:21:48,361 INFO [train_asr.py:1258] (3/4) Computing validation loss
2023-11-25 20:22:12,469 INFO [zipformer.py:1877] (3/4) name=encoder.encoders.1.encoder.layers.0.self_attn_weights, attn_weights_entropy = tensor([5.3500, 5.0102, 4.6358, 5.1773], device='cuda:3')
2023-11-25 20:22:20,736 INFO [train_asr.py:1267] (3/4) Epoch 39, validation: loss=0.127, simple_loss=0.05083, pruned_loss=0.005243, audio_tagging_loss=0.09629, over 4681554.00 frames.
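A minimal sketch of what the "Using mux to combine Librispeech with audioset" step above likely corresponds to, using lhotse's public CutSet.mux API. The manifest filenames below are illustrative assumptions, not paths taken from train_asr.py:

```python
from lhotse import CutSet

# Hypothetical manifest paths; the real ones live under manifest_dir=data/fbank.
libri_cuts = CutSet.from_file("data/fbank/librispeech_cuts_train-all-shuf.jsonl.gz")
audioset_cuts = CutSet.from_file("data/fbank/audioset_cuts_unbalanced.jsonl.gz")

# mux() lazily interleaves the underlying streams without materializing them;
# the combined CutSet's length (2748469 above) is the sum of its parts.
train_cuts = CutSet.mux(libri_cuts, audioset_cuts)
```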
2023-11-25 20:22:20,737 INFO [train_asr.py:1268] (3/4) Maximum memory allocated so far is 24894MB
2023-11-25 20:22:25,486 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=3046020.0, ans=0.125
2023-11-25 20:22:28,121 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.whiten.whitening_limit, batch_count=3046020.0, ans=12.0
2023-11-25 20:22:29,948 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=3046020.0, ans=0.0
2023-11-25 20:22:41,589 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=3046086.6666666665, ans=0.0
2023-11-25 20:22:59,786 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=7.98 vs. limit=15.0
2023-11-25 20:23:00,469 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=3046220.0, ans=0.0
2023-11-25 20:23:00,826 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=21.69 vs. limit=22.5
2023-11-25 20:23:09,131 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=5.71 vs. limit=15.0
2023-11-25 20:23:10,289 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=10.25 vs. limit=15.0
2023-11-25 20:23:10,926 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 456950
2023-11-25 20:23:16,258 INFO [train_asr.py:1235] (3/4) Epoch 39, batch 50, loss[loss=0.08668, simple_loss=0.0902, pruned_loss=0.01113, audio_tagging_loss=0.03045, over 14589.00 frames. ], tot_loss[loss=0.1001, simple_loss=0.09119, pruned_loss=0.01264, audio_tagging_loss=0.0419, over 690879.47 frames. ], batch size: 55, lr: 1.75e-03, grad_scale: 32.0
2023-11-25 20:23:28,847 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=3046420.0, ans=0.0
2023-11-25 20:23:36,331 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=20.43 vs. limit=22.5
2023-11-25 20:23:37,455 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.62 vs. limit=15.0
2023-11-25 20:23:39,123 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.865e+01 9.525e+01 1.035e+02 1.246e+02 6.272e+02, threshold=2.069e+02, percent-clipped=17.0
2023-11-25 20:23:43,643 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3046486.6666666665, ans=0.1
2023-11-25 20:23:59,848 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=7.75 vs. limit=12.0
2023-11-25 20:24:06,651 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 457000
2023-11-25 20:24:12,185 INFO [train_asr.py:1235] (3/4) Epoch 39, batch 100, loss[loss=0.08841, simple_loss=0.095, pruned_loss=0.01406, audio_tagging_loss=0.02685, over 15334.00 frames. ], tot_loss[loss=0.09395, simple_loss=0.09117, pruned_loss=0.01313, audio_tagging_loss=0.03523, over 1212908.73 frames. ], batch size: 57, lr: 1.75e-03, grad_scale: 32.0
2023-11-25 20:24:14,483 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=3046686.6666666665, ans=0.125
2023-11-25 20:24:48,092 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=3046886.6666666665, ans=0.0
2023-11-25 20:25:00,722 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 457050
2023-11-25 20:25:05,938 INFO [train_asr.py:1235] (3/4) Epoch 39, batch 150, loss[loss=0.07479, simple_loss=0.1028, pruned_loss=0.01157, audio_tagging_loss=0.01184, over 14886.00 frames. ], tot_loss[loss=0.0874, simple_loss=0.09114, pruned_loss=0.01287, audio_tagging_loss=0.02896, over 1625670.22 frames. ], batch size: 56, lr: 1.75e-03, grad_scale: 32.0
2023-11-25 20:25:11,481 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=3047020.0, ans=0.0
2023-11-25 20:25:22,128 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00
2023-11-25 20:25:28,190 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.181e+01 8.752e+01 9.435e+01 1.031e+02 1.991e+02, threshold=1.887e+02, percent-clipped=0.0
2023-11-25 20:25:43,814 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00
2023-11-25 20:25:44,880 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2023-11-25 20:25:55,176 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 457100
2023-11-25 20:26:00,405 INFO [train_asr.py:1235] (3/4) Epoch 39, batch 200, loss[loss=0.06908, simple_loss=0.08774, pruned_loss=0.01142, audio_tagging_loss=0.01379, over 14999.00 frames. ], tot_loss[loss=0.08189, simple_loss=0.09052, pruned_loss=0.01286, audio_tagging_loss=0.02377, over 1940682.07 frames. ], batch size: 55, lr: 1.75e-03, grad_scale: 32.0
2023-11-25 20:26:20,137 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3047420.0, ans=0.1
2023-11-25 20:26:37,748 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3047553.3333333335, ans=0.125
2023-11-25 20:26:42,024 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=3047553.3333333335, ans=0.0
2023-11-25 20:26:45,103 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=3047620.0, ans=0.125
2023-11-25 20:26:50,191 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 457150
2023-11-25 20:26:54,492 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3047686.6666666665, ans=0.125
2023-11-25 20:26:55,839 INFO [train_asr.py:1235] (3/4) Epoch 39, batch 250, loss[loss=0.07227, simple_loss=0.08499, pruned_loss=0.01574, audio_tagging_loss=0.01403, over 14915.00 frames. ], tot_loss[loss=0.07807, simple_loss=0.09027, pruned_loss=0.01295, audio_tagging_loss=0.02, over 2184942.80 frames. ], batch size: 54, lr: 1.75e-03, grad_scale: 32.0
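Reading the loss fields in these records: with simple_loss_scale=0.5 and audio_tagging_loss_scale=1.0 from the config at the top (and use_ctc=False, so no CTC term), the logged totals are consistent with a weighted sum of the components. This is an inference from the numbers, not code lifted from train_asr.py:

```python
# loss ~= 0.5 * simple_loss + pruned_loss + 1.0 * audio_tagging_loss
for simple, pruned, tagging, logged in [
    (0.1078, 0.01193, 0.06571, 0.1316),   # epoch 39, batch 0
    (0.05083, 0.005243, 0.09629, 0.127),  # epoch 39, validation
]:
    total = 0.5 * simple + pruned + 1.0 * tagging
    print(f"{total:.4f} vs. logged {logged}")  # 0.1315 vs. 0.1316, 0.1269 vs. 0.127
```

The loss[...] bracket is the current batch; tot_loss[...] is a running average over the frames seen so far in the epoch, which is why its audio_tagging_loss decays smoothly over the first few hundred batches.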
2023-11-25 20:27:02,310 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3047686.6666666665, ans=0.125
2023-11-25 20:27:16,744 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.631e+01 9.232e+01 9.778e+01 1.082e+02 1.251e+02, threshold=1.956e+02, percent-clipped=0.0
2023-11-25 20:27:19,019 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.max_positive, batch_count=3047820.0, ans=0.95
2023-11-25 20:27:26,324 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3047886.6666666665, ans=0.125
2023-11-25 20:27:36,872 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=3047886.6666666665, ans=0.2
2023-11-25 20:27:42,496 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=3047953.3333333335, ans=0.0
2023-11-25 20:27:44,444 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 457200
2023-11-25 20:27:50,101 INFO [train_asr.py:1235] (3/4) Epoch 39, batch 300, loss[loss=0.06708, simple_loss=0.08964, pruned_loss=0.009759, audio_tagging_loss=0.0125, over 15270.00 frames. ], tot_loss[loss=0.07681, simple_loss=0.09225, pruned_loss=0.01321, audio_tagging_loss=0.01747, over 2373512.99 frames. ], batch size: 56, lr: 1.75e-03, grad_scale: 32.0
2023-11-25 20:28:00,692 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3048086.6666666665, ans=0.125
2023-11-25 20:28:01,322 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.0.self_attn1.whiten, num_groups=1, num_channels=192, metric=14.17 vs. limit=22.5
2023-11-25 20:28:02,153 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=512, metric=19.80 vs. limit=22.5
2023-11-25 20:28:02,738 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=3048086.6666666665, ans=0.125
2023-11-25 20:28:08,957 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=3048086.6666666665, ans=0.125
2023-11-25 20:28:22,956 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3048220.0, ans=0.125
2023-11-25 20:28:23,014 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=3048220.0, ans=0.0
2023-11-25 20:28:29,192 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3048220.0, ans=0.1
2023-11-25 20:28:34,394 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3048286.6666666665, ans=0.1
2023-11-25 20:28:38,409 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 457250
2023-11-25 20:28:43,509 INFO [train_asr.py:1235] (3/4) Epoch 39, batch 350, loss[loss=0.05497, simple_loss=0.07237, pruned_loss=0.007559, audio_tagging_loss=0.01122, over 15678.00 frames. ], tot_loss[loss=0.07488, simple_loss=0.09197, pruned_loss=0.01328, audio_tagging_loss=0.01562, over 2523648.54 frames. ], batch size: 60, lr: 1.75e-03, grad_scale: 32.0
2023-11-25 20:29:05,851 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.598e+01 8.941e+01 9.403e+01 1.014e+02 1.528e+02, threshold=1.881e+02, percent-clipped=0.0
2023-11-25 20:29:09,271 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=3048486.6666666665, ans=0.2
2023-11-25 20:29:17,526 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=3048553.3333333335, ans=0.0
2023-11-25 20:29:26,924 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=3048620.0, ans=0.2
2023-11-25 20:29:28,950 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.min_abs, batch_count=3048620.0, ans=0.5
2023-11-25 20:29:31,905 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 457300
2023-11-25 20:29:38,103 INFO [train_asr.py:1235] (3/4) Epoch 39, batch 400, loss[loss=0.0776, simple_loss=0.1089, pruned_loss=0.01536, audio_tagging_loss=0.007804, over 14729.00 frames. ], tot_loss[loss=0.07337, simple_loss=0.09144, pruned_loss=0.01315, audio_tagging_loss=0.0145, over 2637982.32 frames. ], batch size: 53, lr: 1.75e-03, grad_scale: 32.0
2023-11-25 20:29:47,240 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=3048686.6666666665, ans=0.09899494936611666
2023-11-25 20:29:56,429 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=3048753.3333333335, ans=0.0
2023-11-25 20:30:04,727 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2023-11-25 20:30:11,106 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.max_abs, batch_count=3048886.6666666665, ans=10.0
2023-11-25 20:30:19,346 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=3048886.6666666665, ans=0.07
2023-11-25 20:30:26,437 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 457350
2023-11-25 20:30:26,675 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=3048953.3333333335, ans=0.2
2023-11-25 20:30:32,074 INFO [train_asr.py:1235] (3/4) Epoch 39, batch 450, loss[loss=0.05592, simple_loss=0.06516, pruned_loss=0.01066, audio_tagging_loss=0.01269, over 14427.00 frames. ], tot_loss[loss=0.0727, simple_loss=0.09198, pruned_loss=0.01318, audio_tagging_loss=0.01352, over 2726283.46 frames. ], batch size: 57, lr: 1.75e-03, grad_scale: 32.0
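How the recurring "Clipping_scale=2.0, grad-norm quartiles ... threshold ..." lines from optim.py appear to be derived (an inference from the logged values, not the ScaledAdam source): the five numbers read as [min, 25%, median, 75%, max] of recently observed gradient norms, the threshold tracks Clipping_scale times the median, and "percent-clipped" reports how often gradients exceeded it. A hedged sketch:

```python
import torch

def clipping_stats(recent_grad_norms: torch.Tensor, clipping_scale: float = 2.0):
    # Five quantiles of the recent gradient-norm history, as in the log lines.
    q = torch.quantile(recent_grad_norms, torch.tensor([0.0, 0.25, 0.5, 0.75, 1.0]))
    threshold = clipping_scale * q[2]  # e.g. 2.0 * 9.403e+01 ~= 1.881e+02 above
    return q, threshold
```

The one record with percent-clipped=17.0 (threshold 2.069e+02 against a max norm of 6.272e+02) is the first after the checkpoint reload; every later record in this section shows percent-clipped=0.0.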
2023-11-25 20:30:52,807 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.321e+01 8.642e+01 9.445e+01 1.011e+02 1.527e+02, threshold=1.889e+02, percent-clipped=0.0
2023-11-25 20:31:02,562 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=3049153.3333333335, ans=0.2
2023-11-25 20:31:05,644 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3049220.0, ans=0.125
2023-11-25 20:31:20,674 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 457400
2023-11-25 20:31:26,190 INFO [train_asr.py:1235] (3/4) Epoch 39, batch 500, loss[loss=0.07228, simple_loss=0.09755, pruned_loss=0.0162, audio_tagging_loss=0.007307, over 14547.00 frames. ], tot_loss[loss=0.0719, simple_loss=0.09169, pruned_loss=0.01319, audio_tagging_loss=0.01287, over 2797828.70 frames. ], batch size: 57, lr: 1.74e-03, grad_scale: 32.0
2023-11-25 20:31:37,218 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3049420.0, ans=0.125
2023-11-25 20:31:55,006 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3049486.6666666665, ans=0.125
2023-11-25 20:31:58,015 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=3049553.3333333335, ans=0.125
2023-11-25 20:31:59,089 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=3049553.3333333335, ans=0.1
2023-11-25 20:31:59,478 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.06 vs. limit=15.0
2023-11-25 20:32:06,346 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3049553.3333333335, ans=0.125
2023-11-25 20:32:07,320 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3049553.3333333335, ans=0.1
2023-11-25 20:32:10,389 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=3049620.0, ans=0.125
2023-11-25 20:32:14,260 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 457450
2023-11-25 20:32:14,467 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=3049620.0, ans=0.125
2023-11-25 20:32:18,487 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=2.89 vs. limit=15.0
2023-11-25 20:32:20,575 INFO [train_asr.py:1235] (3/4) Epoch 39, batch 550, loss[loss=0.06151, simple_loss=0.08323, pruned_loss=0.008663, audio_tagging_loss=0.01123, over 15462.00 frames. ], tot_loss[loss=0.07085, simple_loss=0.09091, pruned_loss=0.01299, audio_tagging_loss=0.01241, over 2847348.57 frames. ], batch size: 56, lr: 1.74e-03, grad_scale: 16.0
2023-11-25 20:32:23,939 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3049686.6666666665, ans=0.125
2023-11-25 20:32:36,776 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=3049753.3333333335, ans=0.0
2023-11-25 20:32:42,782 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.403e+01 8.908e+01 9.696e+01 1.036e+02 1.301e+02, threshold=1.939e+02, percent-clipped=0.0
2023-11-25 20:33:08,806 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 457500
2023-11-25 20:33:13,976 INFO [train_asr.py:1235] (3/4) Epoch 39, batch 600, loss[loss=0.07956, simple_loss=0.1007, pruned_loss=0.01564, audio_tagging_loss=0.01358, over 16080.00 frames. ], tot_loss[loss=0.07033, simple_loss=0.09087, pruned_loss=0.01283, audio_tagging_loss=0.01207, over 2896546.30 frames. ], batch size: 61, lr: 1.74e-03, grad_scale: 16.0
2023-11-25 20:33:16,732 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=3050020.0, ans=0.0
2023-11-25 20:33:21,924 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3050020.0, ans=0.0
2023-11-25 20:33:55,649 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=512, metric=2.90 vs. limit=15.0
2023-11-25 20:34:02,998 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 457550
2023-11-25 20:34:03,192 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3050286.6666666665, ans=0.125
2023-11-25 20:34:07,409 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3050353.3333333335, ans=0.0
2023-11-25 20:34:08,172 INFO [train_asr.py:1235] (3/4) Epoch 39, batch 650, loss[loss=0.06666, simple_loss=0.08541, pruned_loss=0.01075, audio_tagging_loss=0.01321, over 15840.00 frames. ], tot_loss[loss=0.06999, simple_loss=0.09082, pruned_loss=0.01289, audio_tagging_loss=0.01168, over 2924373.01 frames. ], batch size: 62, lr: 1.74e-03, grad_scale: 16.0
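The many "ScheduledFloat: name=..., batch_count=..., ans=..." lines from scaling.py report hyperparameters of the Zipformer's Balancer/Whiten/skip-rate modules that are scheduled against the global batch count; "ans" is the value in effect at that point. A minimal re-implementation sketch of the idea (an assumption based on icefall's zipformer scaling.py, not copied from it):

```python
def scheduled_float(batch_count: float, schedule: list[tuple[float, float]]) -> float:
    """Piecewise-linear schedule; `schedule` is [(batch_count, value), ...] sorted ascending."""
    x0, y0 = schedule[0]
    if batch_count <= x0:
        return y0
    for x1, y1 in schedule[1:]:
        if batch_count <= x1:
            return y0 + (y1 - y0) * (batch_count - x0) / (x1 - x0)
        x0, y0 = x1, y1
    return y0  # past the last breakpoint, hold the final value

# e.g. a skip/dropout prob that decays from 0.3 to 0.125 over the first 20k batches
# would long since have reached its floor at batch_count=3050020.0:
print(scheduled_float(3050020.0, [(0.0, 0.3), (20000.0, 0.125)]))  # 0.125
```

That matches what the log shows this deep into training: nearly every "ans" sits at a flat end-of-schedule value (0.125, 0.1, 0.2, 0.0, ...).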
2023-11-25 20:34:12,622 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3050353.3333333335, ans=0.1
2023-11-25 20:34:31,123 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.625e+01 8.926e+01 9.451e+01 1.004e+02 1.811e+02, threshold=1.890e+02, percent-clipped=0.0
2023-11-25 20:34:37,526 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=3050486.6666666665, ans=0.04949747468305833
2023-11-25 20:34:39,693 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=3050553.3333333335, ans=0.2
2023-11-25 20:34:45,405 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=3050553.3333333335, ans=0.125
2023-11-25 20:34:56,695 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 457600
2023-11-25 20:35:02,819 INFO [train_asr.py:1235] (3/4) Epoch 39, batch 700, loss[loss=0.07078, simple_loss=0.09493, pruned_loss=0.01209, audio_tagging_loss=0.01122, over 15320.00 frames. ], tot_loss[loss=0.07003, simple_loss=0.09167, pruned_loss=0.01286, audio_tagging_loss=0.01133, over 2952715.29 frames. ], batch size: 56, lr: 1.74e-03, grad_scale: 16.0
2023-11-25 20:35:05,712 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3050686.6666666665, ans=0.125
2023-11-25 20:35:13,135 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=7.67 vs. limit=15.0
2023-11-25 20:35:14,118 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=14.11 vs. limit=22.5
2023-11-25 20:35:15,212 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.10 vs. limit=6.0
2023-11-25 20:35:22,166 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3050753.3333333335, ans=0.125
2023-11-25 20:35:28,955 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=3050820.0, ans=0.0
2023-11-25 20:35:36,161 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3050886.6666666665, ans=0.125
2023-11-25 20:35:52,104 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 457650
2023-11-25 20:35:57,274 INFO [train_asr.py:1235] (3/4) Epoch 39, batch 750, loss[loss=0.08699, simple_loss=0.1222, pruned_loss=0.01878, audio_tagging_loss=0.007117, over 15210.00 frames. ], tot_loss[loss=0.07058, simple_loss=0.09284, pruned_loss=0.01317, audio_tagging_loss=0.01099, over 2979147.77 frames. ], batch size: 56, lr: 1.74e-03, grad_scale: 16.0
2023-11-25 20:36:02,634 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=3051020.0, ans=0.0
2023-11-25 20:36:19,683 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.528e+01 8.878e+01 9.438e+01 1.006e+02 1.228e+02, threshold=1.888e+02, percent-clipped=0.0
2023-11-25 20:36:41,709 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=3051286.6666666665, ans=0.125
2023-11-25 20:36:45,765 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 457700
2023-11-25 20:36:51,357 INFO [train_asr.py:1235] (3/4) Epoch 39, batch 800, loss[loss=0.05043, simple_loss=0.06681, pruned_loss=0.0073, audio_tagging_loss=0.009727, over 15939.00 frames. ], tot_loss[loss=0.07013, simple_loss=0.09248, pruned_loss=0.01304, audio_tagging_loss=0.01085, over 2996667.70 frames. ], batch size: 60, lr: 1.74e-03, grad_scale: 32.0
2023-11-25 20:37:08,041 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=8.25 vs. limit=15.0
2023-11-25 20:37:08,667 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3051420.0, ans=0.1
2023-11-25 20:37:09,589 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=7.15 vs. limit=15.0
2023-11-25 20:37:23,305 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=3051553.3333333335, ans=0.0
2023-11-25 20:37:33,092 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=3051553.3333333335, ans=0.1
2023-11-25 20:37:37,168 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3051620.0, ans=0.1
2023-11-25 20:37:40,150 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 457750
2023-11-25 20:37:45,297 INFO [train_asr.py:1235] (3/4) Epoch 39, batch 850, loss[loss=0.08063, simple_loss=0.1077, pruned_loss=0.01596, audio_tagging_loss=0.01081, over 15367.00 frames. ], tot_loss[loss=0.07008, simple_loss=0.09244, pruned_loss=0.01302, audio_tagging_loss=0.01083, over 3006142.72 frames. ], batch size: 55, lr: 1.74e-03, grad_scale: 32.0
2023-11-25 20:38:01,742 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3051753.3333333335, ans=0.0
2023-11-25 20:38:08,813 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 6.903e+01 8.812e+01 9.277e+01 9.996e+01 1.418e+02, threshold=1.855e+02, percent-clipped=0.0
2023-11-25 20:38:16,960 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=3051820.0, ans=0.2
2023-11-25 20:38:25,218 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=3051886.6666666665, ans=0.2
2023-11-25 20:38:35,660 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 457800
2023-11-25 20:38:41,241 INFO [train_asr.py:1235] (3/4) Epoch 39, batch 900, loss[loss=0.0706, simple_loss=0.09815, pruned_loss=0.01222, audio_tagging_loss=0.009307, over 14865.00 frames. ], tot_loss[loss=0.06975, simple_loss=0.09204, pruned_loss=0.01292, audio_tagging_loss=0.01081, over 3014303.66 frames. ], batch size: 56, lr: 1.74e-03, grad_scale: 32.0
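The "grad_scale" field drifting between 32.0, 16.0, and 8.0 across these records is, given use_fp16=True and the "Loading grad scaler state dict" line above, most plausibly the running scale of PyTorch's AMP gradient scaler: it is halved when an overflow is detected and grown back after a long enough run of overflow-free steps (growth_interval, 2000 by default). A generic AMP step for context; this is standard torch.cuda.amp usage, not the train_asr.py code itself:

```python
import torch

scaler = torch.cuda.amp.GradScaler()  # state is checkpointed/restored, as logged above
# Inside the training loop (sketch):
#   with torch.cuda.amp.autocast():
#       loss = compute_loss(model, batch)
#   scaler.scale(loss).backward()
#   scaler.step(optimizer)   # skipped if inf/nan gradients were found
#   scaler.update()          # halves or grows the scale accordingly
#   print(scaler.get_scale())  # the value logged as grad_scale
```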
2023-11-25 20:38:50,022 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=5.82 vs. limit=10.0
2023-11-25 20:38:51,855 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.max_abs, batch_count=3052086.6666666665, ans=10.0
2023-11-25 20:38:57,280 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=8.96 vs. limit=15.0
2023-11-25 20:39:00,738 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=3052086.6666666665, ans=0.125
2023-11-25 20:39:03,788 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3052153.3333333335, ans=0.125
2023-11-25 20:39:11,089 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=3052153.3333333335, ans=0.2
2023-11-25 20:39:29,903 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 457850
2023-11-25 20:39:32,105 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=3052286.6666666665, ans=0.125
2023-11-25 20:39:35,139 INFO [train_asr.py:1235] (3/4) Epoch 39, batch 950, loss[loss=0.07434, simple_loss=0.1058, pruned_loss=0.01347, audio_tagging_loss=0.00798, over 14847.00 frames. ], tot_loss[loss=0.06989, simple_loss=0.0928, pruned_loss=0.01297, audio_tagging_loss=0.01051, over 3023430.95 frames. ], batch size: 53, lr: 1.74e-03, grad_scale: 32.0
2023-11-25 20:39:35,356 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3052353.3333333335, ans=0.125
2023-11-25 20:39:40,599 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3052353.3333333335, ans=0.125
2023-11-25 20:39:58,561 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.755e+01 8.948e+01 9.425e+01 1.001e+02 1.201e+02, threshold=1.885e+02, percent-clipped=0.0
2023-11-25 20:39:58,805 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=3052486.6666666665, ans=0.2
2023-11-25 20:40:13,541 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.whiten, num_groups=1, num_channels=512, metric=10.19 vs. limit=12.0
2023-11-25 20:40:14,788 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.1.conv_module1.whiten, num_groups=1, num_channels=192, metric=13.42 vs. limit=15.0
2023-11-25 20:40:16,350 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=3052553.3333333335, ans=0.0
2023-11-25 20:40:19,509 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=3052620.0, ans=0.0
2023-11-25 20:40:23,495 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 457900
2023-11-25 20:40:27,602 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=16.48 vs. limit=22.5
2023-11-25 20:40:29,243 INFO [train_asr.py:1235] (3/4) Epoch 39, batch 1000, loss[loss=0.09649, simple_loss=0.1499, pruned_loss=0.016, audio_tagging_loss=0.005513, over 15767.00 frames. ], tot_loss[loss=0.0697, simple_loss=0.09291, pruned_loss=0.01302, audio_tagging_loss=0.01022, over 3024565.42 frames. ], batch size: 55, lr: 1.74e-03, grad_scale: 32.0
2023-11-25 20:40:42,604 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=3052753.3333333335, ans=0.2
2023-11-25 20:40:42,747 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=6.75 vs. limit=15.0
2023-11-25 20:40:49,935 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=3052753.3333333335, ans=0.125
2023-11-25 20:40:52,823 WARNING [train_asr.py:1481] (3/4) Exclude cut with ID unbalanced/5Y6u9AlD9S0_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24
2023-11-25 20:41:02,522 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=3052886.6666666665, ans=0.04949747468305833
2023-11-25 20:41:19,409 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 457950
2023-11-25 20:41:24,730 INFO [train_asr.py:1235] (3/4) Epoch 39, batch 1050, loss[loss=0.07163, simple_loss=0.08419, pruned_loss=0.01755, audio_tagging_loss=0.01198, over 15637.00 frames. ], tot_loss[loss=0.06988, simple_loss=0.09345, pruned_loss=0.01323, audio_tagging_loss=0.009931, over 3032646.86 frames. ], batch size: 60, lr: 1.74e-03, grad_scale: 16.0
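Why the AudioSet cuts in these WARNING lines get excluded (inferred from the warning's own numbers, following the usual icefall filtering convention): a transducer needs at least as many encoder frames as output tokens, and a 1.0 s clip gives 100 feature frames, which the Conv2d subsampling front-end reduces to 23, one fewer than the 24 BPE tokens of the dummy placeholder transcript. A sketch of the check:

```python
def frames_after_subsampling(num_frames: int) -> int:
    # Matches the logged 100 -> 23 reduction; assumed to mirror the encoder_embed
    # convolution arithmetic rather than being copied from the recipe.
    return ((num_frames - 7) // 2 + 1) // 2

def keep_cut(num_frames: int, num_tokens: int) -> bool:
    return frames_after_subsampling(num_frames) >= num_tokens

print(frames_after_subsampling(100))  # 23, as in the warning
print(keep_cut(100, 24))              # False -> the cut is excluded
```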
2023-11-25 20:41:36,541 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=3053086.6666666665, ans=0.0
2023-11-25 20:41:37,591 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=3053086.6666666665, ans=0.2
2023-11-25 20:41:43,805 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3053086.6666666665, ans=0.1
2023-11-25 20:41:43,841 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=3053086.6666666665, ans=0.125
2023-11-25 20:41:46,400 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.whiten_keys.whitening_limit, batch_count=3053153.3333333335, ans=6.0
2023-11-25 20:41:48,835 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.327e+01 8.789e+01 9.440e+01 1.020e+02 1.231e+02, threshold=1.888e+02, percent-clipped=0.0
2023-11-25 20:41:54,929 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=3053153.3333333335, ans=0.0
2023-11-25 20:41:56,016 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00
2023-11-25 20:42:05,393 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3053220.0, ans=0.125
2023-11-25 20:42:06,469 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=3053220.0, ans=0.125
2023-11-25 20:42:10,503 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=3053286.6666666665, ans=0.125
2023-11-25 20:42:13,595 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 458000
2023-11-25 20:42:19,168 INFO [train_asr.py:1235] (3/4) Epoch 39, batch 1100, loss[loss=0.04902, simple_loss=0.05553, pruned_loss=0.009195, audio_tagging_loss=0.01206, over 14281.00 frames. ], tot_loss[loss=0.06901, simple_loss=0.09226, pruned_loss=0.01304, audio_tagging_loss=0.009844, over 3030102.06 frames. ], batch size: 55, lr: 1.74e-03, grad_scale: 8.0
2023-11-25 20:42:21,250 WARNING [train_asr.py:1481] (3/4) Exclude cut with ID unbalanced/AWHnJAqurec_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24
2023-11-25 20:42:24,470 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=3053353.3333333335, ans=0.0
2023-11-25 20:42:24,509 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3053353.3333333335, ans=0.1
2023-11-25 20:42:31,231 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=512, metric=2.88 vs. limit=15.0
2023-11-25 20:42:36,469 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=3053420.0, ans=0.125
2023-11-25 20:42:53,350 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3053553.3333333335, ans=0.125
2023-11-25 20:43:07,623 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 458050
2023-11-25 20:43:12,715 INFO [train_asr.py:1235] (3/4) Epoch 39, batch 1150, loss[loss=0.08685, simple_loss=0.1223, pruned_loss=0.01589, audio_tagging_loss=0.009802, over 15830.00 frames. ], tot_loss[loss=0.06903, simple_loss=0.09244, pruned_loss=0.01309, audio_tagging_loss=0.009725, over 3035363.66 frames. ], batch size: 56, lr: 1.74e-03, grad_scale: 8.0
2023-11-25 20:43:38,705 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 6.982e+01 8.733e+01 9.355e+01 1.009e+02 1.328e+02, threshold=1.871e+02, percent-clipped=0.0
2023-11-25 20:43:54,597 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3053886.6666666665, ans=0.125
2023-11-25 20:44:02,215 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 458100
2023-11-25 20:44:08,912 INFO [train_asr.py:1235] (3/4) Epoch 39, batch 1200, loss[loss=0.06162, simple_loss=0.08285, pruned_loss=0.009932, audio_tagging_loss=0.01027, over 14372.00 frames. ], tot_loss[loss=0.06834, simple_loss=0.09132, pruned_loss=0.01298, audio_tagging_loss=0.009696, over 3028149.17 frames. ], batch size: 58, lr: 1.74e-03, grad_scale: 16.0
2023-11-25 20:44:24,958 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3054086.6666666665, ans=0.125
2023-11-25 20:44:27,021 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3054086.6666666665, ans=0.1
2023-11-25 20:44:57,736 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 458150
2023-11-25 20:44:57,998 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3054286.6666666665, ans=0.125
2023-11-25 20:45:02,905 INFO [train_asr.py:1235] (3/4) Epoch 39, batch 1250, loss[loss=0.07567, simple_loss=0.1026, pruned_loss=0.01506, audio_tagging_loss=0.009318, over 15562.00 frames. ], tot_loss[loss=0.06795, simple_loss=0.09073, pruned_loss=0.0129, audio_tagging_loss=0.009682, over 3024909.07 frames. ], batch size: 58, lr: 1.74e-03, grad_scale: 16.0
2023-11-25 20:45:12,546 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=3054420.0, ans=0.125
2023-11-25 20:45:25,382 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=3054486.6666666665, ans=0.125
2023-11-25 20:45:27,922 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.972e+01 9.172e+01 9.748e+01 1.035e+02 1.279e+02, threshold=1.950e+02, percent-clipped=0.0
2023-11-25 20:45:33,797 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=3054486.6666666665, ans=0.0
2023-11-25 20:45:38,595 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.1.self_attn1.whiten, num_groups=1, num_channels=192, metric=11.22 vs. limit=22.5
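On the "Whitening: name=..., metric=X vs. limit=Y" lines: these record how far a module's activations are from having a white (identity-like) covariance, against a limit that is itself scheduled (note the whitening_limit ScheduledFloat entries above); a corrective gradient is only applied when the metric exceeds the limit. The exact formula is an assumption here, but one common such measure is the ratio between the mean squared eigenvalue and the squared mean eigenvalue of the feature covariance, which equals 1.0 for perfectly white features:

```python
import torch

def whitening_metric(x: torch.Tensor) -> torch.Tensor:
    """x: (num_frames, num_channels). Larger values = less white covariance."""
    x = x - x.mean(dim=0)
    cov = (x.T @ x) / x.shape[0]
    eigs = torch.linalg.eigvalsh(cov)
    return (eigs * eigs).mean() / (eigs.mean() ** 2 + 1e-20)
```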
2023-11-25 20:45:51,546 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 458200
2023-11-25 20:45:51,638 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.min_positive, batch_count=3054620.0, ans=0.05
2023-11-25 20:45:56,301 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=3054686.6666666665, ans=0.125
2023-11-25 20:45:57,119 INFO [train_asr.py:1235] (3/4) Epoch 39, batch 1300, loss[loss=0.04949, simple_loss=0.05866, pruned_loss=0.007664, audio_tagging_loss=0.0125, over 14657.00 frames. ], tot_loss[loss=0.06783, simple_loss=0.09071, pruned_loss=0.01283, audio_tagging_loss=0.009646, over 3031885.92 frames. ], batch size: 55, lr: 1.74e-03, grad_scale: 16.0
2023-11-25 20:46:18,985 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=9.09 vs. limit=15.0
2023-11-25 20:46:20,111 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=13.19 vs. limit=15.0
2023-11-25 20:46:22,884 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=3054820.0, ans=0.125
2023-11-25 20:46:45,800 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 458250
2023-11-25 20:46:51,264 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=3055020.0, ans=0.125
2023-11-25 20:46:52,083 INFO [train_asr.py:1235] (3/4) Epoch 39, batch 1350, loss[loss=0.05934, simple_loss=0.07571, pruned_loss=0.01156, audio_tagging_loss=0.009923, over 14562.00 frames. ], tot_loss[loss=0.06726, simple_loss=0.08982, pruned_loss=0.01267, audio_tagging_loss=0.009674, over 3032376.78 frames. ], batch size: 56, lr: 1.74e-03, grad_scale: 16.0
2023-11-25 20:47:12,065 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=7.34 vs. limit=15.0
2023-11-25 20:47:12,715 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=3055153.3333333335, ans=0.2
2023-11-25 20:47:16,591 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.563e+01 8.731e+01 9.396e+01 1.007e+02 1.248e+02, threshold=1.879e+02, percent-clipped=0.0
2023-11-25 20:47:24,178 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=3055220.0, ans=0.2
2023-11-25 20:47:24,549 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=5.30 vs. limit=12.0
2023-11-25 20:47:31,804 WARNING [train_asr.py:1481] (3/4) Exclude cut with ID unbalanced/XdmbboqRBmQ_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24
2023-11-25 20:47:41,160 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 458300
2023-11-25 20:47:46,332 INFO [train_asr.py:1235] (3/4) Epoch 39, batch 1400, loss[loss=0.06072, simple_loss=0.08598, pruned_loss=0.01115, audio_tagging_loss=0.006578, over 16022.00 frames. ], tot_loss[loss=0.06748, simple_loss=0.08997, pruned_loss=0.01272, audio_tagging_loss=0.009775, over 3038442.53 frames. ], batch size: 59, lr: 1.74e-03, grad_scale: 16.0
2023-11-25 20:47:52,820 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3055353.3333333335, ans=0.1
2023-11-25 20:48:13,025 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3055486.6666666665, ans=0.125
2023-11-25 20:48:35,038 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 458350
2023-11-25 20:48:36,195 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=3055620.0, ans=0.125
2023-11-25 20:48:40,190 INFO [train_asr.py:1235] (3/4) Epoch 39, batch 1450, loss[loss=0.09768, simple_loss=0.1168, pruned_loss=0.02229, audio_tagging_loss=0.01701, over 16874.00 frames. ], tot_loss[loss=0.06774, simple_loss=0.09011, pruned_loss=0.01278, audio_tagging_loss=0.009904, over 3039003.54 frames. ], batch size: 63, lr: 1.74e-03, grad_scale: 16.0
2023-11-25 20:48:46,681 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3055686.6666666665, ans=0.125
2023-11-25 20:48:54,756 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=7.39 vs. limit=12.0
2023-11-25 20:48:57,463 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=3.17 vs. limit=15.0
2023-11-25 20:49:03,036 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.min_positive, batch_count=3055820.0, ans=0.025
2023-11-25 20:49:05,880 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.455e+01 8.671e+01 9.348e+01 1.019e+02 1.564e+02, threshold=1.870e+02, percent-clipped=0.0
2023-11-25 20:49:10,244 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=3055820.0, ans=0.125
2023-11-25 20:49:28,707 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 458400
2023-11-25 20:49:34,813 INFO [train_asr.py:1235] (3/4) Epoch 39, batch 1500, loss[loss=0.06027, simple_loss=0.07519, pruned_loss=0.008585, audio_tagging_loss=0.01409, over 15133.00 frames. ], tot_loss[loss=0.06725, simple_loss=0.08937, pruned_loss=0.01258, audio_tagging_loss=0.009992, over 3040797.96 frames. ], batch size: 57, lr: 1.74e-03, grad_scale: 16.0
2023-11-25 20:49:35,313 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module2.whiten, num_groups=1, num_channels=512, metric=12.03 vs. limit=15.0
2023-11-25 20:49:38,982 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=8.79 vs. limit=10.0
2023-11-25 20:49:43,728 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=3056020.0, ans=0.2
2023-11-25 20:49:47,754 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3056086.6666666665, ans=0.1
2023-11-25 20:49:53,116 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3056086.6666666665, ans=0.125
2023-11-25 20:50:12,970 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=3056220.0, ans=0.0
2023-11-25 20:50:17,619 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=3056286.6666666665, ans=0.125
2023-11-25 20:50:25,383 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 458450
2023-11-25 20:50:30,452 INFO [train_asr.py:1235] (3/4) Epoch 39, batch 1550, loss[loss=0.05901, simple_loss=0.06744, pruned_loss=0.01342, audio_tagging_loss=0.01187, over 15135.00 frames. ], tot_loss[loss=0.06776, simple_loss=0.0901, pruned_loss=0.01277, audio_tagging_loss=0.009946, over 3037509.20 frames. ], batch size: 59, lr: 1.74e-03, grad_scale: 16.0
2023-11-25 20:50:44,244 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=3056420.0, ans=0.2
2023-11-25 20:50:45,729 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.19 vs. limit=6.0
2023-11-25 20:50:48,240 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00
2023-11-25 20:50:53,883 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=7.21 vs. limit=15.0
2023-11-25 20:50:54,353 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.420e+01 8.789e+01 9.295e+01 1.002e+02 1.264e+02, threshold=1.859e+02, percent-clipped=0.0
2023-11-25 20:51:05,451 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3056553.3333333335, ans=0.0
2023-11-25 20:51:05,815 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=6.97 vs. limit=15.0
2023-11-25 20:51:09,659 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=6.85 vs. limit=15.0
2023-11-25 20:51:11,257 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3056553.3333333335, ans=0.1
2023-11-25 20:51:19,491 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 458500
2023-11-25 20:51:23,117 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=20.46 vs. limit=22.5
2023-11-25 20:51:24,687 INFO [train_asr.py:1235] (3/4) Epoch 39, batch 1600, loss[loss=0.0699, simple_loss=0.09195, pruned_loss=0.01335, audio_tagging_loss=0.01057, over 15130.00 frames. ], tot_loss[loss=0.06805, simple_loss=0.09026, pruned_loss=0.01282, audio_tagging_loss=0.01009, over 3047993.01 frames. ], batch size: 56, lr: 1.74e-03, grad_scale: 32.0
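Context for the drifting "batch size" field (53 to 65 across the records above): with bucketing_sampler=False and max_duration=1000 in the config, the datamodule's "Using SimpleCutSampler" step assembles batches by total audio duration rather than by a fixed number of cuts. A minimal lhotse sketch, illustrative rather than the kd_datamodule.py code:

```python
from lhotse.dataset import SimpleCutSampler

sampler = SimpleCutSampler(
    train_cuts,            # the muxed CutSet from the sketch near the top
    max_duration=1000.0,   # seconds of audio per batch, as in the config
    shuffle=True,
    drop_last=True,
)
```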
], batch size: 56, lr: 1.74e-03, grad_scale: 32.0 2023-11-25 20:51:30,053 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=3056686.6666666665, ans=0.125 2023-11-25 20:52:13,799 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 458550 2023-11-25 20:52:18,976 INFO [train_asr.py:1235] (3/4) Epoch 39, batch 1650, loss[loss=0.07444, simple_loss=0.1018, pruned_loss=0.01159, audio_tagging_loss=0.01194, over 15846.00 frames. ], tot_loss[loss=0.0681, simple_loss=0.09032, pruned_loss=0.01285, audio_tagging_loss=0.01008, over 3051442.69 frames. ], batch size: 59, lr: 1.74e-03, grad_scale: 32.0 2023-11-25 20:52:19,189 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=3057020.0, ans=0.0 2023-11-25 20:52:20,702 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3057020.0, ans=0.125 2023-11-25 20:52:20,982 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=4.16 vs. limit=15.0 2023-11-25 20:52:23,356 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.0.self_attn2.whiten, num_groups=1, num_channels=192, metric=12.42 vs. limit=22.5 2023-11-25 20:52:24,319 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3057020.0, ans=0.1 2023-11-25 20:52:34,908 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=3057086.6666666665, ans=0.125 2023-11-25 20:52:35,964 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3057086.6666666665, ans=0.125 2023-11-25 20:52:37,075 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=3.89 vs. limit=10.0 2023-11-25 20:52:44,999 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.639e+01 8.829e+01 9.450e+01 1.011e+02 1.260e+02, threshold=1.890e+02, percent-clipped=0.0 2023-11-25 20:53:04,192 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=6.13 vs. limit=10.0 2023-11-25 20:53:09,736 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 458600 2023-11-25 20:53:15,267 INFO [train_asr.py:1235] (3/4) Epoch 39, batch 1700, loss[loss=0.07492, simple_loss=0.1067, pruned_loss=0.0141, audio_tagging_loss=0.007466, over 16111.00 frames. ], tot_loss[loss=0.06785, simple_loss=0.09017, pruned_loss=0.01274, audio_tagging_loss=0.01002, over 3047316.48 frames. 
], batch size: 59, lr: 1.74e-03, grad_scale: 16.0 2023-11-25 20:53:34,842 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=3057420.0, ans=0.035 2023-11-25 20:54:02,040 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=3057620.0, ans=0.1 2023-11-25 20:54:05,006 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 458650 2023-11-25 20:54:07,227 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=3057620.0, ans=0.125 2023-11-25 20:54:08,162 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=3057620.0, ans=0.035 2023-11-25 20:54:10,163 INFO [train_asr.py:1235] (3/4) Epoch 39, batch 1750, loss[loss=0.06764, simple_loss=0.08827, pruned_loss=0.01295, audio_tagging_loss=0.01056, over 16069.00 frames. ], tot_loss[loss=0.06815, simple_loss=0.09082, pruned_loss=0.01281, audio_tagging_loss=0.009937, over 3043761.27 frames. ], batch size: 58, lr: 1.74e-03, grad_scale: 16.0 2023-11-25 20:54:14,504 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=3057686.6666666665, ans=0.0 2023-11-25 20:54:17,710 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3057686.6666666665, ans=0.125 2023-11-25 20:54:20,883 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=3057753.3333333335, ans=0.125 2023-11-25 20:54:27,643 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=3057753.3333333335, ans=0.2 2023-11-25 20:54:36,745 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 6.715e+01 8.599e+01 9.189e+01 9.882e+01 1.189e+02, threshold=1.838e+02, percent-clipped=0.0 2023-11-25 20:54:59,289 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 458700 2023-11-25 20:55:04,394 INFO [train_asr.py:1235] (3/4) Epoch 39, batch 1800, loss[loss=0.04475, simple_loss=0.05378, pruned_loss=0.007788, audio_tagging_loss=0.01008, over 16606.00 frames. ], tot_loss[loss=0.06768, simple_loss=0.09019, pruned_loss=0.01275, audio_tagging_loss=0.009846, over 3039859.92 frames. ], batch size: 65, lr: 1.74e-03, grad_scale: 16.0 2023-11-25 20:55:28,677 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3058153.3333333335, ans=0.1 2023-11-25 20:55:43,300 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=3058220.0, ans=0.1 2023-11-25 20:55:54,747 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 458750 2023-11-25 20:55:55,101 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.14 vs. limit=6.0 2023-11-25 20:56:00,478 INFO [train_asr.py:1235] (3/4) Epoch 39, batch 1850, loss[loss=0.076, simple_loss=0.0911, pruned_loss=0.01815, audio_tagging_loss=0.0123, over 14770.00 frames. ], tot_loss[loss=0.06743, simple_loss=0.09005, pruned_loss=0.01266, audio_tagging_loss=0.009748, over 3036623.28 frames. 
], batch size: 55, lr: 1.74e-03, grad_scale: 16.0 2023-11-25 20:56:26,236 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 6.929e+01 8.557e+01 9.640e+01 1.041e+02 1.665e+02, threshold=1.928e+02, percent-clipped=0.0 2023-11-25 20:56:32,375 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3058553.3333333335, ans=0.1 2023-11-25 20:56:33,277 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3058553.3333333335, ans=0.125 2023-11-25 20:56:34,382 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=3058553.3333333335, ans=0.0 2023-11-25 20:56:36,910 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3058553.3333333335, ans=0.1 2023-11-25 20:56:49,847 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 458800 2023-11-25 20:56:55,913 INFO [train_asr.py:1235] (3/4) Epoch 39, batch 1900, loss[loss=0.05013, simple_loss=0.06665, pruned_loss=0.007981, audio_tagging_loss=0.008825, over 14262.00 frames. ], tot_loss[loss=0.06727, simple_loss=0.09005, pruned_loss=0.01269, audio_tagging_loss=0.009555, over 3035469.10 frames. ], batch size: 54, lr: 1.74e-03, grad_scale: 16.0 2023-11-25 20:56:56,145 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=3058686.6666666665, ans=0.125 2023-11-25 20:57:44,947 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 458850 2023-11-25 20:57:50,114 INFO [train_asr.py:1235] (3/4) Epoch 39, batch 1950, loss[loss=0.06442, simple_loss=0.07949, pruned_loss=0.01445, audio_tagging_loss=0.01023, over 15805.00 frames. ], tot_loss[loss=0.06712, simple_loss=0.08994, pruned_loss=0.01266, audio_tagging_loss=0.009488, over 3031063.13 frames. ], batch size: 59, lr: 1.74e-03, grad_scale: 16.0 2023-11-25 20:58:16,761 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.741e+01 8.855e+01 9.248e+01 9.928e+01 1.852e+02, threshold=1.850e+02, percent-clipped=0.0 2023-11-25 20:58:23,227 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=6.96 vs. limit=15.0 2023-11-25 20:58:39,017 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=3059286.6666666665, ans=0.125 2023-11-25 20:58:39,131 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3059286.6666666665, ans=0.125 2023-11-25 20:58:39,951 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 458900 2023-11-25 20:58:46,027 INFO [train_asr.py:1235] (3/4) Epoch 39, batch 2000, loss[loss=0.07813, simple_loss=0.09513, pruned_loss=0.01984, audio_tagging_loss=0.01072, over 15286.00 frames. ], tot_loss[loss=0.06712, simple_loss=0.08967, pruned_loss=0.01271, audio_tagging_loss=0.009571, over 3034078.49 frames. 
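The optim.py:476 entries summarize the gradient norms seen since the previous report: the five numbers read naturally as min, 25th, 50th and 75th percentile, and max, and in every entry the threshold is exactly Clipping_scale times the logged median (just above, 2.0 * 9.640e+01 = 1.928e+02). A minimal sketch of median-based clipping under that reading; the class, buffer size, and reporting cadence are assumptions, not the optimizer's actual internals:

    from collections import deque
    import torch

    class MedianGradClipper:
        def __init__(self, clipping_scale: float = 2.0, history: int = 200):
            self.scale = clipping_scale
            self.norms = deque(maxlen=history)  # recent per-batch grad norms
            self.num_clipped = 0
            self.num_batches = 0

        def clip_(self, params) -> None:
            grads = [p.grad for p in params if p.grad is not None]
            norm = torch.norm(torch.stack([g.norm() for g in grads])).item()
            self.norms.append(norm)
            q = torch.quantile(torch.tensor(list(self.norms)),
                               torch.tensor([0.0, 0.25, 0.5, 0.75, 1.0]))
            threshold = self.scale * q[2].item()  # Clipping_scale * median
            self.num_batches += 1
            if norm > threshold:
                self.num_clipped += 1
                for g in grads:
                    g.mul_(threshold / norm)  # rescale in place
            # same statistics as the log lines above
            print(f"grad-norm quartiles {q.tolist()}, threshold={threshold:.3e}, "
                  f"percent-clipped={100 * self.num_clipped / self.num_batches:.1f}")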
], batch size: 58, lr: 1.74e-03, grad_scale: 32.0 2023-11-25 20:58:58,706 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=3059420.0, ans=0.125 2023-11-25 20:59:04,431 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=3059420.0, ans=0.125 2023-11-25 20:59:23,332 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=2.92 vs. limit=15.0 2023-11-25 20:59:31,357 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.33 vs. limit=6.0 2023-11-25 20:59:34,732 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=5.63 vs. limit=15.0 2023-11-25 20:59:35,130 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 458950 2023-11-25 20:59:38,435 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3059620.0, ans=0.125 2023-11-25 20:59:40,298 INFO [train_asr.py:1235] (3/4) Epoch 39, batch 2050, loss[loss=0.06847, simple_loss=0.09825, pruned_loss=0.01123, audio_tagging_loss=0.008106, over 16569.00 frames. ], tot_loss[loss=0.06668, simple_loss=0.08907, pruned_loss=0.01259, audio_tagging_loss=0.009551, over 3034082.88 frames. ], batch size: 62, lr: 1.74e-03, grad_scale: 16.0 2023-11-25 20:59:50,770 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=9.06 vs. limit=15.0 2023-11-25 20:59:56,853 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=3059753.3333333335, ans=0.125 2023-11-25 21:00:02,090 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=13.69 vs. limit=22.5 2023-11-25 21:00:02,700 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3059820.0, ans=0.125 2023-11-25 21:00:08,231 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.735e+01 8.675e+01 9.370e+01 9.808e+01 1.405e+02, threshold=1.874e+02, percent-clipped=0.0 2023-11-25 21:00:10,680 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3059820.0, ans=0.125 2023-11-25 21:00:16,261 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=1.84 vs. limit=6.0 2023-11-25 21:00:27,371 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.min_abs, batch_count=3059953.3333333335, ans=0.5 2023-11-25 21:00:29,783 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 459000 2023-11-25 21:00:35,277 INFO [train_asr.py:1235] (3/4) Epoch 39, batch 2100, loss[loss=0.05812, simple_loss=0.07049, pruned_loss=0.01225, audio_tagging_loss=0.01062, over 15710.00 frames. ], tot_loss[loss=0.06701, simple_loss=0.08962, pruned_loss=0.01272, audio_tagging_loss=0.009475, over 3038970.74 frames. 
], batch size: 62, lr: 1.74e-03, grad_scale: 16.0 2023-11-25 21:00:44,186 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=3060020.0, ans=0.0 2023-11-25 21:00:54,439 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=3060086.6666666665, ans=0.0 2023-11-25 21:01:14,778 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=13.54 vs. limit=15.0 2023-11-25 21:01:24,402 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 459050 2023-11-25 21:01:30,067 INFO [train_asr.py:1235] (3/4) Epoch 39, batch 2150, loss[loss=0.07163, simple_loss=0.09526, pruned_loss=0.01645, audio_tagging_loss=0.007557, over 15017.00 frames. ], tot_loss[loss=0.06698, simple_loss=0.08967, pruned_loss=0.0127, audio_tagging_loss=0.009452, over 3041986.68 frames. ], batch size: 57, lr: 1.74e-03, grad_scale: 8.0 2023-11-25 21:01:40,255 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=3060420.0, ans=0.125 2023-11-25 21:01:41,318 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3060420.0, ans=0.125 2023-11-25 21:01:47,439 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3060420.0, ans=0.0 2023-11-25 21:01:49,862 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=512, metric=22.62 vs. limit=22.5 2023-11-25 21:01:52,931 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=12.24 vs. limit=15.0 2023-11-25 21:01:58,072 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.190e+01 8.469e+01 9.111e+01 9.697e+01 1.371e+02, threshold=1.822e+02, percent-clipped=0.0 2023-11-25 21:02:03,363 WARNING [train_asr.py:1481] (3/4) Exclude cut with ID unbalanced/XkQ8YVd8u38_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-25 21:02:14,494 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3060620.0, ans=0.1 2023-11-25 21:02:19,676 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 459100 2023-11-25 21:02:24,874 INFO [train_asr.py:1235] (3/4) Epoch 39, batch 2200, loss[loss=0.06869, simple_loss=0.0924, pruned_loss=0.01422, audio_tagging_loss=0.008273, over 15796.00 frames. ], tot_loss[loss=0.06719, simple_loss=0.09019, pruned_loss=0.01277, audio_tagging_loss=0.009327, over 3045863.73 frames. 
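The WARNING above documents a length sanity check: the 1-second AudioSet placeholder clip has 100 feature frames, the encoder front end reduces that to 23, and 23 frames cannot be aligned against the 24 BPE tokens of the dummy transcript, so the cut is excluded from training. A minimal sketch of such a filter; the subsampling formula ((T - 7) // 2 + 1) // 2 reproduces the logged 100 -> 23 but is an inference about this front end, and the names are illustrative:

    def keep_cut(num_frames: int, num_tokens: int) -> bool:
        # Drop cuts whose subsampled length cannot cover the token sequence.
        frames_after = ((num_frames - 7) // 2 + 1) // 2  # approx. 4x subsampling
        return frames_after >= num_tokens

    assert keep_cut(100, 24) is False   # the excluded placeholder cuts above
    assert keep_cut(1500, 24) is True   # a typical 15 s utterance passes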
], batch size: 58, lr: 1.74e-03, grad_scale: 8.0 2023-11-25 21:02:32,415 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3060686.6666666665, ans=0.1 2023-11-25 21:02:36,634 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=3060753.3333333335, ans=0.125 2023-11-25 21:02:41,512 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=12.08 vs. limit=15.0 2023-11-25 21:02:55,016 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=7.44 vs. limit=15.0 2023-11-25 21:03:00,009 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3060886.6666666665, ans=0.1 2023-11-25 21:03:13,471 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 459150 2023-11-25 21:03:18,635 INFO [train_asr.py:1235] (3/4) Epoch 39, batch 2250, loss[loss=0.06019, simple_loss=0.08296, pruned_loss=0.01147, audio_tagging_loss=0.007236, over 14899.00 frames. ], tot_loss[loss=0.06762, simple_loss=0.09086, pruned_loss=0.01281, audio_tagging_loss=0.009379, over 3043633.47 frames. ], batch size: 57, lr: 1.74e-03, grad_scale: 8.0 2023-11-25 21:03:32,407 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=3061086.6666666665, ans=0.125 2023-11-25 21:03:33,424 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3061086.6666666665, ans=0.125 2023-11-25 21:03:36,105 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3061086.6666666665, ans=0.125 2023-11-25 21:03:41,921 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.0.self_attn2.whiten, num_groups=1, num_channels=192, metric=10.41 vs. limit=22.5 2023-11-25 21:03:47,474 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.267e+01 8.734e+01 9.500e+01 1.033e+02 1.214e+02, threshold=1.900e+02, percent-clipped=0.0 2023-11-25 21:03:55,894 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=3061220.0, ans=0.125 2023-11-25 21:04:07,691 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 459200 2023-11-25 21:04:13,888 INFO [train_asr.py:1235] (3/4) Epoch 39, batch 2300, loss[loss=0.09026, simple_loss=0.1124, pruned_loss=0.02282, audio_tagging_loss=0.01125, over 17110.00 frames. ], tot_loss[loss=0.06772, simple_loss=0.09101, pruned_loss=0.01285, audio_tagging_loss=0.009364, over 3051793.82 frames. ], batch size: 65, lr: 1.74e-03, grad_scale: 8.0 2023-11-25 21:04:26,332 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=512, metric=2.71 vs. 
limit=15.0 2023-11-25 21:04:40,527 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-25 21:04:41,550 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3061486.6666666665, ans=0.1 2023-11-25 21:04:52,943 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3061553.3333333335, ans=0.125 2023-11-25 21:05:01,714 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=7.22 vs. limit=15.0 2023-11-25 21:05:03,120 WARNING [train_asr.py:1481] (3/4) Exclude cut with ID unbalanced/mx9RcUz8sr0_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-25 21:05:03,170 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 459250 2023-11-25 21:05:08,342 INFO [train_asr.py:1235] (3/4) Epoch 39, batch 2350, loss[loss=0.05815, simple_loss=0.07136, pruned_loss=0.008398, audio_tagging_loss=0.01407, over 14802.00 frames. ], tot_loss[loss=0.06811, simple_loss=0.09115, pruned_loss=0.01302, audio_tagging_loss=0.009517, over 3043918.03 frames. ], batch size: 55, lr: 1.74e-03, grad_scale: 8.0 2023-11-25 21:05:36,647 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.495e+01 8.670e+01 9.358e+01 1.017e+02 1.318e+02, threshold=1.872e+02, percent-clipped=0.0 2023-11-25 21:05:40,555 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=3061886.6666666665, ans=0.125 2023-11-25 21:05:55,207 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=3061953.3333333335, ans=0.0 2023-11-25 21:05:57,201 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 459300 2023-11-25 21:06:01,828 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=17.74 vs. limit=22.5 2023-11-25 21:06:02,361 INFO [train_asr.py:1235] (3/4) Epoch 39, batch 2400, loss[loss=0.05884, simple_loss=0.0783, pruned_loss=0.009154, audio_tagging_loss=0.01053, over 14272.00 frames. ], tot_loss[loss=0.06839, simple_loss=0.09169, pruned_loss=0.01302, audio_tagging_loss=0.009525, over 3041197.13 frames. ], batch size: 54, lr: 1.74e-03, grad_scale: 16.0 2023-11-25 21:06:10,395 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.1.conv_module2.whiten, num_groups=1, num_channels=192, metric=14.50 vs. 
limit=15.0 2023-11-25 21:06:24,939 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3062153.3333333335, ans=0.1 2023-11-25 21:06:26,038 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=3062153.3333333335, ans=0.1 2023-11-25 21:06:28,154 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3062153.3333333335, ans=0.125 2023-11-25 21:06:37,395 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=3062220.0, ans=0.0 2023-11-25 21:06:50,893 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 459350 2023-11-25 21:06:56,592 INFO [train_asr.py:1235] (3/4) Epoch 39, batch 2450, loss[loss=0.07721, simple_loss=0.1046, pruned_loss=0.0166, audio_tagging_loss=0.008304, over 15359.00 frames. ], tot_loss[loss=0.06815, simple_loss=0.09101, pruned_loss=0.01295, audio_tagging_loss=0.009698, over 3041043.10 frames. ], batch size: 57, lr: 1.74e-03, grad_scale: 16.0 2023-11-25 21:07:24,407 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.498e+01 8.490e+01 9.348e+01 1.014e+02 1.568e+02, threshold=1.870e+02, percent-clipped=0.0 2023-11-25 21:07:29,876 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3062553.3333333335, ans=0.125 2023-11-25 21:07:41,138 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=3062620.0, ans=0.015 2023-11-25 21:07:45,842 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 459400 2023-11-25 21:07:48,445 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3062620.0, ans=0.125 2023-11-25 21:07:51,343 INFO [train_asr.py:1235] (3/4) Epoch 39, batch 2500, loss[loss=0.05314, simple_loss=0.06547, pruned_loss=0.01006, audio_tagging_loss=0.01034, over 14828.00 frames. ], tot_loss[loss=0.06841, simple_loss=0.0915, pruned_loss=0.013, audio_tagging_loss=0.009658, over 3045023.26 frames. ], batch size: 57, lr: 1.74e-03, grad_scale: 16.0 2023-11-25 21:08:01,275 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=10.02 vs. 
limit=15.0 2023-11-25 21:08:04,140 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3062753.3333333335, ans=0.125 2023-11-25 21:08:12,357 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3062820.0, ans=0.125 2023-11-25 21:08:21,741 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.min_abs, batch_count=3062820.0, ans=0.5 2023-11-25 21:08:30,710 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=3062886.6666666665, ans=0.0 2023-11-25 21:08:39,827 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 459450 2023-11-25 21:08:43,044 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3062953.3333333335, ans=0.125 2023-11-25 21:08:44,905 INFO [train_asr.py:1235] (3/4) Epoch 39, batch 2550, loss[loss=0.07382, simple_loss=0.09407, pruned_loss=0.01477, audio_tagging_loss=0.01202, over 15140.00 frames. ], tot_loss[loss=0.06819, simple_loss=0.09151, pruned_loss=0.01293, audio_tagging_loss=0.009503, over 3054010.25 frames. ], batch size: 56, lr: 1.74e-03, grad_scale: 16.0 2023-11-25 21:09:00,549 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=18.58 vs. limit=22.5 2023-11-25 21:09:06,368 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3063153.3333333335, ans=0.1 2023-11-25 21:09:13,332 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.057e+01 8.639e+01 9.480e+01 1.019e+02 1.523e+02, threshold=1.896e+02, percent-clipped=0.0 2023-11-25 21:09:14,534 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=3063153.3333333335, ans=0.125 2023-11-25 21:09:33,281 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 459500 2023-11-25 21:09:36,510 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=3063286.6666666665, ans=0.125 2023-11-25 21:09:38,361 INFO [train_asr.py:1235] (3/4) Epoch 39, batch 2600, loss[loss=0.04722, simple_loss=0.06125, pruned_loss=0.005544, audio_tagging_loss=0.01105, over 15256.00 frames. ], tot_loss[loss=0.06742, simple_loss=0.09058, pruned_loss=0.01278, audio_tagging_loss=0.009347, over 3043923.73 frames. ], batch size: 58, lr: 1.74e-03, grad_scale: 16.0 2023-11-25 21:09:49,093 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=3063353.3333333335, ans=0.125 2023-11-25 21:09:50,006 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=3063420.0, ans=0.0 2023-11-25 21:10:23,191 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3063620.0, ans=0.125 2023-11-25 21:10:26,711 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=9.94 vs. 
limit=15.0 2023-11-25 21:10:28,899 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 459550 2023-11-25 21:10:29,104 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=3063620.0, ans=0.2 2023-11-25 21:10:34,010 INFO [train_asr.py:1235] (3/4) Epoch 39, batch 2650, loss[loss=0.0679, simple_loss=0.09598, pruned_loss=0.00984, audio_tagging_loss=0.01007, over 15456.00 frames. ], tot_loss[loss=0.06766, simple_loss=0.09134, pruned_loss=0.01275, audio_tagging_loss=0.009236, over 3044708.47 frames. ], batch size: 58, lr: 1.74e-03, grad_scale: 16.0 2023-11-25 21:10:34,226 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3063686.6666666665, ans=0.1 2023-11-25 21:10:52,036 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.min_positive, batch_count=3063753.3333333335, ans=0.05 2023-11-25 21:10:56,461 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=4.13 vs. limit=15.0 2023-11-25 21:11:01,176 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 6.873e+01 8.401e+01 9.228e+01 9.795e+01 1.294e+02, threshold=1.846e+02, percent-clipped=0.0 2023-11-25 21:11:01,463 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3063820.0, ans=0.125 2023-11-25 21:11:05,363 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=17.21 vs. limit=22.5 2023-11-25 21:11:07,131 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=3063886.6666666665, ans=0.0 2023-11-25 21:11:21,919 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=14.39 vs. limit=15.0 2023-11-25 21:11:22,694 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 459600 2023-11-25 21:11:23,922 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3063953.3333333335, ans=0.125 2023-11-25 21:11:28,313 INFO [train_asr.py:1235] (3/4) Epoch 39, batch 2700, loss[loss=0.04976, simple_loss=0.06356, pruned_loss=0.007095, audio_tagging_loss=0.01089, over 13794.00 frames. ], tot_loss[loss=0.06757, simple_loss=0.0912, pruned_loss=0.0128, audio_tagging_loss=0.009164, over 3043549.29 frames. ], batch size: 52, lr: 1.74e-03, grad_scale: 16.0 2023-11-25 21:11:37,964 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=3064086.6666666665, ans=0.2 2023-11-25 21:11:40,000 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.max_abs, batch_count=3064086.6666666665, ans=10.0 2023-11-25 21:11:48,326 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=11.29 vs. 
limit=15.0 2023-11-25 21:12:05,328 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3064220.0, ans=0.1 2023-11-25 21:12:16,704 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 459650 2023-11-25 21:12:21,807 INFO [train_asr.py:1235] (3/4) Epoch 39, batch 2750, loss[loss=0.06862, simple_loss=0.09664, pruned_loss=0.01257, audio_tagging_loss=0.00773, over 16198.00 frames. ], tot_loss[loss=0.06717, simple_loss=0.09052, pruned_loss=0.01264, audio_tagging_loss=0.009267, over 3048370.15 frames. ], batch size: 63, lr: 1.74e-03, grad_scale: 16.0 2023-11-25 21:12:22,066 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=3064353.3333333335, ans=0.0 2023-11-25 21:12:30,148 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=3064353.3333333335, ans=0.0 2023-11-25 21:12:45,739 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=3064486.6666666665, ans=0.125 2023-11-25 21:12:47,780 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.max_abs, batch_count=3064486.6666666665, ans=10.0 2023-11-25 21:12:50,729 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 6.974e+01 8.675e+01 9.044e+01 9.844e+01 1.238e+02, threshold=1.809e+02, percent-clipped=0.0 2023-11-25 21:12:50,978 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=3064486.6666666665, ans=0.2 2023-11-25 21:12:54,174 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=3064553.3333333335, ans=0.0 2023-11-25 21:13:08,994 WARNING [train_asr.py:1481] (3/4) Exclude cut with ID unbalanced/IMdT8_tuNp0_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-25 21:13:11,620 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 459700 2023-11-25 21:13:17,206 INFO [train_asr.py:1235] (3/4) Epoch 39, batch 2800, loss[loss=0.07227, simple_loss=0.09694, pruned_loss=0.01516, audio_tagging_loss=0.008637, over 14805.00 frames. ], tot_loss[loss=0.06737, simple_loss=0.09063, pruned_loss=0.01279, audio_tagging_loss=0.00927, over 3050686.59 frames. 
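Each scaling.py:1022 entry is a whitening diagnostic for one group of activations: a scalar metric is compared against a limit, and a corrective penalty kicks in only when the metric exceeds the limit. Most entries in this stretch stay below their limits; a few cross them (e.g. self_attn1.whiten at 22.62 vs. limit 22.5 earlier in this log). One plausible whiteness measure with these properties, offered as an assumed sketch rather than the module's actual formula, is the eigenvalue-spread ratio d * tr(C^2) / tr(C)^2 of the feature covariance C, which equals 1 for perfectly white features and grows as the covariance becomes ill-conditioned:

    import torch

    def whitening_metric(x: torch.Tensor) -> torch.Tensor:
        # x: (N, d) activations; returns a scalar >= 1,
        # equal to 1 iff the covariance is isotropic (fully "white").
        x = x - x.mean(dim=0, keepdim=True)
        cov = (x.T @ x) / x.shape[0]                  # (d, d) covariance
        d = cov.shape[0]
        return d * (cov @ cov).diagonal().sum() / cov.diagonal().sum() ** 2

    x = torch.randn(1024, 256)                        # near-white input
    assert whitening_metric(x).item() < 2.0           # close to 1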
], batch size: 56, lr: 1.74e-03, grad_scale: 32.0 2023-11-25 21:13:36,733 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=3064753.3333333335, ans=0.125 2023-11-25 21:13:37,788 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3064820.0, ans=0.125 2023-11-25 21:13:38,004 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn1.whiten.whitening_limit, batch_count=3064820.0, ans=22.5 2023-11-25 21:13:42,992 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=3064820.0, ans=0.09899494936611666 2023-11-25 21:13:48,705 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=4.88 vs. limit=12.0 2023-11-25 21:13:50,861 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=3064886.6666666665, ans=0.125 2023-11-25 21:13:54,538 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3064886.6666666665, ans=0.125 2023-11-25 21:14:00,288 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3064953.3333333335, ans=0.125 2023-11-25 21:14:06,799 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 459750 2023-11-25 21:14:11,862 INFO [train_asr.py:1235] (3/4) Epoch 39, batch 2850, loss[loss=0.04485, simple_loss=0.05236, pruned_loss=0.006067, audio_tagging_loss=0.0126, over 16382.00 frames. ], tot_loss[loss=0.06678, simple_loss=0.08972, pruned_loss=0.01263, audio_tagging_loss=0.009285, over 3041233.44 frames. ], batch size: 65, lr: 1.74e-03, grad_scale: 32.0 2023-11-25 21:14:13,165 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=3065020.0, ans=0.1 2023-11-25 21:14:16,235 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=3065020.0, ans=0.125 2023-11-25 21:14:17,281 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=3065020.0, ans=0.125 2023-11-25 21:14:30,833 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=3065086.6666666665, ans=0.125 2023-11-25 21:14:39,517 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=8.78 vs. limit=10.0 2023-11-25 21:14:39,979 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.345e+01 8.595e+01 9.142e+01 9.903e+01 1.163e+02, threshold=1.828e+02, percent-clipped=0.0 2023-11-25 21:14:47,024 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=3065220.0, ans=0.2 2023-11-25 21:14:55,529 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=12.35 vs. 
limit=15.0 2023-11-25 21:15:00,256 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 459800 2023-11-25 21:15:03,864 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=3065286.6666666665, ans=0.04949747468305833 2023-11-25 21:15:05,816 INFO [train_asr.py:1235] (3/4) Epoch 39, batch 2900, loss[loss=0.07114, simple_loss=0.1002, pruned_loss=0.01414, audio_tagging_loss=0.006924, over 14334.00 frames. ], tot_loss[loss=0.06645, simple_loss=0.08936, pruned_loss=0.01257, audio_tagging_loss=0.009198, over 3031935.76 frames. ], batch size: 53, lr: 1.74e-03, grad_scale: 32.0 2023-11-25 21:15:05,963 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=3065353.3333333335, ans=0.2 2023-11-25 21:15:17,341 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3065420.0, ans=0.125 2023-11-25 21:15:37,883 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=8.53 vs. limit=15.0 2023-11-25 21:15:40,700 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=3065553.3333333335, ans=0.0 2023-11-25 21:15:45,964 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.max_abs, batch_count=3065553.3333333335, ans=10.0 2023-11-25 21:15:54,715 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 459850 2023-11-25 21:15:58,562 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-25 21:16:00,342 INFO [train_asr.py:1235] (3/4) Epoch 39, batch 2950, loss[loss=0.06779, simple_loss=0.09085, pruned_loss=0.01154, audio_tagging_loss=0.01082, over 14390.00 frames. ], tot_loss[loss=0.06697, simple_loss=0.09029, pruned_loss=0.01259, audio_tagging_loss=0.009232, over 3044175.92 frames. 
], batch size: 56, lr: 1.74e-03, grad_scale: 32.0 2023-11-25 21:16:12,648 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=3065753.3333333335, ans=0.0 2023-11-25 21:16:24,065 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=3065820.0, ans=0.125 2023-11-25 21:16:27,996 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.331e+01 8.766e+01 9.471e+01 1.038e+02 1.516e+02, threshold=1.894e+02, percent-clipped=0.0 2023-11-25 21:16:29,345 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-25 21:16:31,442 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=3065886.6666666665, ans=0.125 2023-11-25 21:16:38,264 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3065886.6666666665, ans=0.125 2023-11-25 21:16:42,897 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=3065953.3333333335, ans=0.2 2023-11-25 21:16:49,122 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 459900 2023-11-25 21:16:53,937 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3066020.0, ans=0.125 2023-11-25 21:16:54,790 INFO [train_asr.py:1235] (3/4) Epoch 39, batch 3000, loss[loss=0.06227, simple_loss=0.08878, pruned_loss=0.009001, audio_tagging_loss=0.008882, over 17499.00 frames. ], tot_loss[loss=0.06734, simple_loss=0.09086, pruned_loss=0.01263, audio_tagging_loss=0.009277, over 3048974.89 frames. ], batch size: 64, lr: 1.74e-03, grad_scale: 32.0 2023-11-25 21:16:54,791 INFO [train_asr.py:1258] (3/4) Computing validation loss 2023-11-25 21:17:19,770 INFO [zipformer.py:1877] (3/4) name=encoder.encoders.0.layers.0.self_attn_weights, attn_weights_entropy = tensor([6.0145, 5.8344, 5.6377, 5.5691], device='cuda:3') 2023-11-25 21:17:26,406 INFO [train_asr.py:1267] (3/4) Epoch 39, validation: loss=0.05939, simple_loss=0.05076, pruned_loss=0.005254, audio_tagging_loss=0.02875, over 4681554.00 frames. 2023-11-25 21:17:26,407 INFO [train_asr.py:1268] (3/4) Maximum memory allocated so far is 24894MB 2023-11-25 21:17:57,360 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=8.30 vs. limit=10.0 2023-11-25 21:18:16,073 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 459950 2023-11-25 21:18:21,895 INFO [train_asr.py:1235] (3/4) Epoch 39, batch 3050, loss[loss=0.05691, simple_loss=0.08056, pruned_loss=0.006729, audio_tagging_loss=0.009905, over 15982.00 frames. ], tot_loss[loss=0.06685, simple_loss=0.09025, pruned_loss=0.01244, audio_tagging_loss=0.009279, over 3054284.29 frames. 
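The validation block above interleaves three reports: a pass over the 4,681,554 held-out frames (train_asr.py:1258/1267), an entropy probe of selected self-attention weights (zipformer.py:1877), and the allocator's high-water mark. A minimal sketch of that last report, assuming the figure is read from torch.cuda.max_memory_allocated; the exact call site is not shown in the log:

    import logging
    import torch

    def log_peak_memory(device: torch.device) -> None:
        # High-water mark of CUDA memory allocated on this device, in MB,
        # formatted like the "Maximum memory allocated so far is 24894MB" line.
        peak_mb = torch.cuda.max_memory_allocated(device) // (1024 * 1024)
        logging.info(f"Maximum memory allocated so far is {peak_mb}MB")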
], batch size: 61, lr: 1.74e-03, grad_scale: 32.0 2023-11-25 21:18:39,525 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=3066420.0, ans=0.125 2023-11-25 21:18:47,897 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3066486.6666666665, ans=0.125 2023-11-25 21:18:49,816 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.661e+01 8.558e+01 9.280e+01 1.009e+02 1.298e+02, threshold=1.856e+02, percent-clipped=0.0 2023-11-25 21:18:53,058 WARNING [train_asr.py:1481] (3/4) Exclude cut with ID unbalanced/h0neUGB6j_g_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-25 21:19:01,760 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=3066553.3333333335, ans=0.0 2023-11-25 21:19:03,828 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3066553.3333333335, ans=0.1 2023-11-25 21:19:11,074 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 460000 2023-11-25 21:19:19,032 INFO [train_asr.py:1235] (3/4) Epoch 39, batch 3100, loss[loss=0.06653, simple_loss=0.08915, pruned_loss=0.01146, audio_tagging_loss=0.0105, over 15433.00 frames. ], tot_loss[loss=0.06728, simple_loss=0.0908, pruned_loss=0.01256, audio_tagging_loss=0.009321, over 3057259.66 frames. ], batch size: 59, lr: 1.74e-03, grad_scale: 32.0 2023-11-25 21:19:38,693 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=3066753.3333333335, ans=0.125 2023-11-25 21:19:44,752 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.whiten, num_groups=1, num_channels=512, metric=9.68 vs. limit=12.0 2023-11-25 21:19:49,003 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=9.71 vs. limit=10.0 2023-11-25 21:20:07,922 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 460050 2023-11-25 21:20:09,105 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3066953.3333333335, ans=0.1 2023-11-25 21:20:13,162 INFO [train_asr.py:1235] (3/4) Epoch 39, batch 3150, loss[loss=0.06824, simple_loss=0.09643, pruned_loss=0.01143, audio_tagging_loss=0.008595, over 15996.00 frames. ], tot_loss[loss=0.0683, simple_loss=0.09207, pruned_loss=0.01296, audio_tagging_loss=0.009304, over 3057514.49 frames. ], batch size: 57, lr: 1.74e-03, grad_scale: 32.0 2023-11-25 21:20:42,066 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.488e+01 8.732e+01 9.474e+01 1.004e+02 1.246e+02, threshold=1.895e+02, percent-clipped=0.0 2023-11-25 21:20:53,729 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=7.23 vs. 
limit=15.0 2023-11-25 21:20:56,708 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=19.91 vs. limit=22.5 2023-11-25 21:21:03,033 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 460100 2023-11-25 21:21:09,290 INFO [train_asr.py:1235] (3/4) Epoch 39, batch 3200, loss[loss=0.06453, simple_loss=0.0834, pruned_loss=0.01268, audio_tagging_loss=0.01014, over 15104.00 frames. ], tot_loss[loss=0.0684, simple_loss=0.09179, pruned_loss=0.01302, audio_tagging_loss=0.009483, over 3055910.06 frames. ], batch size: 57, lr: 1.74e-03, grad_scale: 32.0 2023-11-25 21:21:18,859 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=3067420.0, ans=0.2 2023-11-25 21:21:24,266 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=3067420.0, ans=0.125 2023-11-25 21:21:27,789 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=15.61 vs. limit=22.5 2023-11-25 21:21:31,444 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=19.55 vs. limit=22.5 2023-11-25 21:21:44,081 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=8.78 vs. limit=15.0 2023-11-25 21:21:58,636 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 460150 2023-11-25 21:22:00,835 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3067620.0, ans=0.1 2023-11-25 21:22:03,790 INFO [train_asr.py:1235] (3/4) Epoch 39, batch 3250, loss[loss=0.07944, simple_loss=0.1158, pruned_loss=0.01176, audio_tagging_loss=0.00976, over 15374.00 frames. ], tot_loss[loss=0.06837, simple_loss=0.09168, pruned_loss=0.01299, audio_tagging_loss=0.00955, over 3054397.72 frames. 
], batch size: 57, lr: 1.74e-03, grad_scale: 32.0 2023-11-25 21:22:05,047 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3067686.6666666665, ans=0.125 2023-11-25 21:22:14,021 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=3067753.3333333335, ans=0.125 2023-11-25 21:22:28,169 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3067820.0, ans=0.125 2023-11-25 21:22:31,181 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=3067820.0, ans=0.0 2023-11-25 21:22:32,985 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.538e+01 8.616e+01 9.104e+01 1.013e+02 1.269e+02, threshold=1.821e+02, percent-clipped=0.0 2023-11-25 21:22:33,243 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3067820.0, ans=0.125 2023-11-25 21:22:52,358 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-25 21:22:53,161 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 460200 2023-11-25 21:22:59,144 INFO [train_asr.py:1235] (3/4) Epoch 39, batch 3300, loss[loss=0.06171, simple_loss=0.08004, pruned_loss=0.01049, audio_tagging_loss=0.0112, over 14749.00 frames. ], tot_loss[loss=0.06851, simple_loss=0.09189, pruned_loss=0.01301, audio_tagging_loss=0.009558, over 3055377.41 frames. ], batch size: 56, lr: 1.74e-03, grad_scale: 32.0 2023-11-25 21:23:01,354 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=3068020.0, ans=0.125 2023-11-25 21:23:08,181 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=3068020.0, ans=0.125 2023-11-25 21:23:15,541 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.min_positive, batch_count=3068086.6666666665, ans=0.05 2023-11-25 21:23:24,285 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=19.10 vs. limit=22.5 2023-11-25 21:23:24,879 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3068153.3333333335, ans=0.1 2023-11-25 21:23:32,221 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3068220.0, ans=0.125 2023-11-25 21:23:36,455 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3068220.0, ans=0.1 2023-11-25 21:23:48,279 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 460250 2023-11-25 21:23:49,893 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.min_positive, batch_count=3068286.6666666665, ans=0.05 2023-11-25 21:23:54,389 INFO [train_asr.py:1235] (3/4) Epoch 39, batch 3350, loss[loss=0.0884, simple_loss=0.1165, pruned_loss=0.01747, audio_tagging_loss=0.01266, over 15124.00 frames. ], tot_loss[loss=0.06784, simple_loss=0.0912, pruned_loss=0.01282, audio_tagging_loss=0.009416, over 3051894.41 frames. 
], batch size: 56, lr: 1.74e-03, grad_scale: 32.0 2023-11-25 21:24:04,554 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=3068420.0, ans=0.125 2023-11-25 21:24:05,652 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3068420.0, ans=0.1 2023-11-25 21:24:09,505 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.77 vs. limit=6.0 2023-11-25 21:24:22,948 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.162e+01 8.673e+01 9.369e+01 1.012e+02 1.333e+02, threshold=1.874e+02, percent-clipped=0.0 2023-11-25 21:24:44,780 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 460300 2023-11-25 21:24:50,026 INFO [train_asr.py:1235] (3/4) Epoch 39, batch 3400, loss[loss=0.07252, simple_loss=0.1057, pruned_loss=0.01189, audio_tagging_loss=0.007801, over 15113.00 frames. ], tot_loss[loss=0.06771, simple_loss=0.09104, pruned_loss=0.01292, audio_tagging_loss=0.009273, over 3051823.75 frames. ], batch size: 55, lr: 1.74e-03, grad_scale: 32.0 2023-11-25 21:25:23,479 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=3068886.6666666665, ans=0.0 2023-11-25 21:25:39,053 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 460350 2023-11-25 21:25:42,692 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=11.51 vs. limit=15.0 2023-11-25 21:25:44,265 INFO [train_asr.py:1235] (3/4) Epoch 39, batch 3450, loss[loss=0.05918, simple_loss=0.08052, pruned_loss=0.01003, audio_tagging_loss=0.008895, over 14888.00 frames. ], tot_loss[loss=0.0672, simple_loss=0.09039, pruned_loss=0.01284, audio_tagging_loss=0.009162, over 3049342.13 frames. ], batch size: 58, lr: 1.74e-03, grad_scale: 32.0 2023-11-25 21:25:53,536 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=3069020.0, ans=0.1 2023-11-25 21:26:00,911 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3069086.6666666665, ans=0.125 2023-11-25 21:26:05,538 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=3069086.6666666665, ans=0.0 2023-11-25 21:26:05,601 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3069086.6666666665, ans=0.125 2023-11-25 21:26:13,688 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.466e+01 8.826e+01 9.469e+01 1.006e+02 1.325e+02, threshold=1.894e+02, percent-clipped=0.0 2023-11-25 21:26:24,464 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=3069220.0, ans=0.2 2023-11-25 21:26:34,497 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 460400 2023-11-25 21:26:37,241 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=3069286.6666666665, ans=0.0 2023-11-25 21:26:40,241 INFO [train_asr.py:1235] (3/4) Epoch 39, batch 3500, loss[loss=0.06141, simple_loss=0.08064, pruned_loss=0.0103, audio_tagging_loss=0.01078, over 15031.00 frames. 
], tot_loss[loss=0.06767, simple_loss=0.09106, pruned_loss=0.01298, audio_tagging_loss=0.009158, over 3051654.73 frames. ], batch size: 57, lr: 1.74e-03, grad_scale: 32.0
2023-11-25 21:27:08,738 WARNING [train_asr.py:1481] (3/4) Exclude cut with ID unbalanced/DdDpuDqOyrA_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24
2023-11-25 21:27:10,258 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.04 vs. limit=6.0
2023-11-25 21:27:19,561 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3069553.3333333335, ans=0.125
2023-11-25 21:27:30,900 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 460450
2023-11-25 21:27:36,118 INFO [train_asr.py:1235] (3/4) Epoch 39, batch 3550, loss[loss=0.06603, simple_loss=0.08668, pruned_loss=0.01336, audio_tagging_loss=0.009334, over 16329.00 frames. ], tot_loss[loss=0.0674, simple_loss=0.09083, pruned_loss=0.01277, audio_tagging_loss=0.009211, over 3049876.86 frames. ], batch size: 61, lr: 1.74e-03, grad_scale: 16.0
2023-11-25 21:28:04,238 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3069820.0, ans=0.125
2023-11-25 21:28:05,014 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.043e+01 8.535e+01 9.230e+01 9.860e+01 1.398e+02, threshold=1.846e+02, percent-clipped=0.0
2023-11-25 21:28:11,378 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3069886.6666666665, ans=0.125
2023-11-25 21:28:24,818 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 460500
2023-11-25 21:28:30,004 INFO [train_asr.py:1235] (3/4) Epoch 39, batch 3600, loss[loss=0.06316, simple_loss=0.0831, pruned_loss=0.00907, audio_tagging_loss=0.01254, over 15699.00 frames. ], tot_loss[loss=0.06723, simple_loss=0.09053, pruned_loss=0.01271, audio_tagging_loss=0.009254, over 3054924.61 frames. ], batch size: 58, lr: 1.74e-03, grad_scale: 32.0
2023-11-25 21:28:44,887 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=3070086.6666666665, ans=0.2
2023-11-25 21:28:50,416 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=8.62 vs. limit=15.0
2023-11-25 21:29:19,217 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 460550
2023-11-25 21:29:24,870 INFO [train_asr.py:1235] (3/4) Epoch 39, batch 3650, loss[loss=0.05729, simple_loss=0.07745, pruned_loss=0.01273, audio_tagging_loss=0.005834, over 15117.00 frames. ], tot_loss[loss=0.06802, simple_loss=0.09171, pruned_loss=0.01298, audio_tagging_loss=0.009187, over 3053039.45 frames. ], batch size: 57, lr: 1.74e-03, grad_scale: 32.0
2023-11-25 21:29:46,431 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3070486.6666666665, ans=0.1
2023-11-25 21:29:54,661 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.402e+01 8.628e+01 9.158e+01 1.002e+02 1.364e+02, threshold=1.832e+02, percent-clipped=0.0
2023-11-25 21:29:57,902 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=3070553.3333333335, ans=0.125
2023-11-25 21:30:05,241 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=3070553.3333333335, ans=0.0
2023-11-25 21:30:15,076 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 460600
2023-11-25 21:30:20,539 INFO [train_asr.py:1235] (3/4) Epoch 39, batch 3700, loss[loss=0.06384, simple_loss=0.08857, pruned_loss=0.009796, audio_tagging_loss=0.009758, over 16217.00 frames. ], tot_loss[loss=0.06784, simple_loss=0.0915, pruned_loss=0.01292, audio_tagging_loss=0.009169, over 3058513.04 frames. ], batch size: 61, lr: 1.74e-03, grad_scale: 32.0
2023-11-25 21:30:26,000 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3070686.6666666665, ans=0.125
2023-11-25 21:30:49,651 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=3070820.0, ans=0.125
2023-11-25 21:30:53,519 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=4.77 vs. limit=10.0
2023-11-25 21:31:09,781 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 460650
2023-11-25 21:31:14,933 INFO [train_asr.py:1235] (3/4) Epoch 39, batch 3750, loss[loss=0.05657, simple_loss=0.07507, pruned_loss=0.008628, audio_tagging_loss=0.0104, over 16801.00 frames. ], tot_loss[loss=0.06765, simple_loss=0.09112, pruned_loss=0.0129, audio_tagging_loss=0.009196, over 3052319.74 frames. ], batch size: 63, lr: 1.74e-03, grad_scale: 32.0
2023-11-25 21:31:19,402 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=3071020.0, ans=0.0
2023-11-25 21:31:33,609 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3071086.6666666665, ans=0.0
2023-11-25 21:31:45,430 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.367e+01 8.797e+01 9.429e+01 1.022e+02 1.345e+02, threshold=1.886e+02, percent-clipped=0.0
2023-11-25 21:31:52,244 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=13.26 vs. limit=15.0
2023-11-25 21:31:53,788 WARNING [train_asr.py:1481] (3/4) Exclude cut with ID unbalanced/ZY_Bsi-RNuk_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24
2023-11-25 21:31:55,011 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2023-11-25 21:32:02,262 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=3071286.6666666665, ans=0.0
2023-11-25 21:32:04,166 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 460700
2023-11-25 21:32:09,374 INFO [train_asr.py:1235] (3/4) Epoch 39, batch 3800, loss[loss=0.0455, simple_loss=0.05458, pruned_loss=0.006941, audio_tagging_loss=0.01126, over 15207.00 frames. ], tot_loss[loss=0.06722, simple_loss=0.09055, pruned_loss=0.01266, audio_tagging_loss=0.00929, over 3050387.69 frames. ], batch size: 57, lr: 1.74e-03, grad_scale: 32.0
2023-11-25 21:32:17,956 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.min_positive, batch_count=3071353.3333333335, ans=0.05
2023-11-25 21:32:36,070 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=11.29 vs. limit=15.0
2023-11-25 21:32:44,096 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3071553.3333333335, ans=0.1
2023-11-25 21:32:50,397 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=3071553.3333333335, ans=0.0
2023-11-25 21:32:59,695 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 460750
2023-11-25 21:33:02,427 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=3071620.0, ans=0.0
2023-11-25 21:33:02,573 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=3071620.0, ans=0.2
2023-11-25 21:33:05,951 INFO [train_asr.py:1235] (3/4) Epoch 39, batch 3850, loss[loss=0.07277, simple_loss=0.09946, pruned_loss=0.01475, audio_tagging_loss=0.008294, over 15913.00 frames. ], tot_loss[loss=0.06769, simple_loss=0.09077, pruned_loss=0.01295, audio_tagging_loss=0.009348, over 3048201.48 frames. ], batch size: 60, lr: 1.74e-03, grad_scale: 32.0
2023-11-25 21:33:13,918 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=10.79 vs. limit=15.0
2023-11-25 21:33:34,152 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.205e+01 8.515e+01 9.072e+01 9.640e+01 1.260e+02, threshold=1.814e+02, percent-clipped=0.0
2023-11-25 21:33:38,017 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3071886.6666666665, ans=0.1
2023-11-25 21:33:42,858 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=3071886.6666666665, ans=0.0
2023-11-25 21:33:43,808 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.min_positive, batch_count=3071886.6666666665, ans=0.025
2023-11-25 21:33:51,340 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.min_positive, batch_count=3071953.3333333335, ans=0.025
2023-11-25 21:33:55,451 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 460800
2023-11-25 21:33:57,977 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=3071953.3333333335, ans=0.0
2023-11-25 21:34:00,956 INFO [train_asr.py:1235] (3/4) Epoch 39, batch 3900, loss[loss=0.05841, simple_loss=0.07257, pruned_loss=0.009859, audio_tagging_loss=0.01226, over 15848.00 frames. ], tot_loss[loss=0.06835, simple_loss=0.09157, pruned_loss=0.01321, audio_tagging_loss=0.009355, over 3043588.69 frames. ], batch size: 59, lr: 1.74e-03, grad_scale: 32.0
2023-11-25 21:34:04,804 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.1.conv_module2.whiten, num_groups=1, num_channels=192, metric=11.02 vs. limit=15.0
2023-11-25 21:34:21,446 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=3072153.3333333335, ans=0.1
2023-11-25 21:34:49,262 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=3072286.6666666665, ans=0.0
2023-11-25 21:34:50,190 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 460850
2023-11-25 21:34:51,608 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=9.46 vs. limit=15.0
2023-11-25 21:34:53,415 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.min_positive, batch_count=3072286.6666666665, ans=0.025
2023-11-25 21:34:55,276 INFO [train_asr.py:1235] (3/4) Epoch 39, batch 3950, loss[loss=0.05582, simple_loss=0.07456, pruned_loss=0.009072, audio_tagging_loss=0.009465, over 14911.00 frames. ], tot_loss[loss=0.0683, simple_loss=0.09171, pruned_loss=0.0131, audio_tagging_loss=0.009352, over 3045780.84 frames. ], batch size: 56, lr: 1.74e-03, grad_scale: 16.0
2023-11-25 21:34:58,650 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3072353.3333333335, ans=0.125
2023-11-25 21:35:08,161 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2023-11-25 21:35:15,952 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3072420.0, ans=0.125
2023-11-25 21:35:21,885 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=11.13 vs. limit=15.0
2023-11-25 21:35:24,726 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=3072486.6666666665, ans=0.0
2023-11-25 21:35:26,689 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 6.900e+01 8.593e+01 9.164e+01 9.900e+01 1.243e+02, threshold=1.833e+02, percent-clipped=0.0
2023-11-25 21:35:35,851 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=9.41 vs. limit=15.0
2023-11-25 21:35:38,641 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=3072620.0, ans=0.0
2023-11-25 21:35:45,353 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 460900
2023-11-25 21:35:51,039 INFO [train_asr.py:1235] (3/4) Epoch 39, batch 4000, loss[loss=0.06713, simple_loss=0.08086, pruned_loss=0.01504, audio_tagging_loss=0.01166, over 15082.00 frames. ], tot_loss[loss=0.06834, simple_loss=0.09153, pruned_loss=0.01309, audio_tagging_loss=0.009482, over 3044274.05 frames. ], batch size: 55, lr: 1.74e-03, grad_scale: 32.0
2023-11-25 21:36:34,656 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2023-11-25 21:36:40,674 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 460950
2023-11-25 21:36:46,397 INFO [train_asr.py:1235] (3/4) Epoch 39, batch 4050, loss[loss=0.06208, simple_loss=0.09129, pruned_loss=0.0107, audio_tagging_loss=0.005733, over 15166.00 frames. ], tot_loss[loss=0.06868, simple_loss=0.09194, pruned_loss=0.01322, audio_tagging_loss=0.009481, over 3051768.17 frames. ], batch size: 57, lr: 1.74e-03, grad_scale: 16.0
2023-11-25 21:36:48,612 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3073020.0, ans=0.125
2023-11-25 21:36:50,623 WARNING [train_asr.py:1481] (3/4) Exclude cut with ID unbalanced/-7b0f9TyPFU_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24
2023-11-25 21:36:50,875 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=3073020.0, ans=0.125
2023-11-25 21:37:19,534 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.319e+01 8.880e+01 9.594e+01 1.042e+02 1.593e+02, threshold=1.919e+02, percent-clipped=0.0
2023-11-25 21:37:22,883 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=3073220.0, ans=0.0
2023-11-25 21:37:22,936 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.max_positive, batch_count=3073220.0, ans=0.95
2023-11-25 21:37:25,996 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2023-11-25 21:37:34,357 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.48 vs. limit=6.0
2023-11-25 21:37:35,835 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 461000
2023-11-25 21:37:41,378 INFO [train_asr.py:1235] (3/4) Epoch 39, batch 4100, loss[loss=0.07655, simple_loss=0.09855, pruned_loss=0.01737, audio_tagging_loss=0.00991, over 14214.00 frames. ], tot_loss[loss=0.06902, simple_loss=0.09255, pruned_loss=0.01333, audio_tagging_loss=0.009424, over 3054205.08 frames. ], batch size: 54, lr: 1.74e-03, grad_scale: 8.0
2023-11-25 21:37:59,831 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=3073420.0, ans=0.2
2023-11-25 21:38:02,013 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3073420.0, ans=0.125
2023-11-25 21:38:27,787 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=3073620.0, ans=0.125
2023-11-25 21:38:30,715 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 461050
2023-11-25 21:38:30,912 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=3073620.0, ans=0.0
2023-11-25 21:38:34,629 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.min_positive, batch_count=3073620.0, ans=0.05
2023-11-25 21:38:34,688 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3073620.0, ans=0.1
2023-11-25 21:38:36,935 INFO [train_asr.py:1235] (3/4) Epoch 39, batch 4150, loss[loss=0.05594, simple_loss=0.07344, pruned_loss=0.006004, audio_tagging_loss=0.01321, over 15860.00 frames. ], tot_loss[loss=0.06829, simple_loss=0.09184, pruned_loss=0.01309, audio_tagging_loss=0.009278, over 3048560.98 frames. ], batch size: 62, lr: 1.74e-03, grad_scale: 8.0
2023-11-25 21:38:44,974 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3073686.6666666665, ans=0.1
2023-11-25 21:38:56,937 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3073753.3333333335, ans=0.1
2023-11-25 21:38:59,060 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3073820.0, ans=0.125
2023-11-25 21:39:03,395 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=3073820.0, ans=0.125
2023-11-25 21:39:05,481 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3073820.0, ans=0.1
2023-11-25 21:39:05,518 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=3073820.0, ans=0.125
2023-11-25 21:39:06,660 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3073820.0, ans=0.0
2023-11-25 21:39:09,589 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.340e+01 8.760e+01 9.274e+01 9.766e+01 1.268e+02, threshold=1.855e+02, percent-clipped=0.0
2023-11-25 21:39:17,899 WARNING [train_asr.py:1481] (3/4) Exclude cut with ID unbalanced/5BkClLNthIQ_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24
2023-11-25 21:39:19,146 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=3073886.6666666665, ans=0.0
2023-11-25 21:39:20,202 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=3073953.3333333335, ans=0.07
2023-11-25 21:39:26,812 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 461100
2023-11-25 21:39:32,005 INFO [train_asr.py:1235] (3/4) Epoch 39, batch 4200, loss[loss=0.07496, simple_loss=0.106, pruned_loss=0.01406, audio_tagging_loss=0.007918, over 15187.00 frames. ], tot_loss[loss=0.06855, simple_loss=0.09226, pruned_loss=0.01317, audio_tagging_loss=0.00925, over 3049794.79 frames. ], batch size: 56, lr: 1.74e-03, grad_scale: 8.0
2023-11-25 21:39:41,238 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=3074020.0, ans=0.0
2023-11-25 21:39:49,621 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3074086.6666666665, ans=0.1
2023-11-25 21:40:10,167 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3074220.0, ans=0.125
2023-11-25 21:40:15,296 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3074286.6666666665, ans=0.1
2023-11-25 21:40:21,663 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 461150
2023-11-25 21:40:27,299 INFO [train_asr.py:1235] (3/4) Epoch 39, batch 4250, loss[loss=0.0743, simple_loss=0.09968, pruned_loss=0.01472, audio_tagging_loss=0.00974, over 14614.00 frames. ], tot_loss[loss=0.06912, simple_loss=0.0933, pruned_loss=0.01335, audio_tagging_loss=0.009121, over 3047397.04 frames. ], batch size: 53, lr: 1.74e-03, grad_scale: 8.0
2023-11-25 21:40:29,575 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=3074353.3333333335, ans=0.0
2023-11-25 21:40:30,532 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3074353.3333333335, ans=0.1
2023-11-25 21:40:44,436 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.whiten, num_groups=1, num_channels=512, metric=9.59 vs. limit=12.0
2023-11-25 21:40:58,867 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=6.07 vs. limit=15.0
2023-11-25 21:41:00,147 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.068e+01 8.649e+01 9.520e+01 1.007e+02 1.325e+02, threshold=1.904e+02, percent-clipped=0.0
2023-11-25 21:41:01,797 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=7.68 vs. limit=10.0
2023-11-25 21:41:07,214 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=3074553.3333333335, ans=0.09899494936611666
2023-11-25 21:41:16,504 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 461200
2023-11-25 21:41:19,450 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=18.22 vs. limit=22.5
2023-11-25 21:41:22,524 INFO [train_asr.py:1235] (3/4) Epoch 39, batch 4300, loss[loss=0.05245, simple_loss=0.06773, pruned_loss=0.008599, audio_tagging_loss=0.009985, over 14824.00 frames. ], tot_loss[loss=0.06874, simple_loss=0.09296, pruned_loss=0.01324, audio_tagging_loss=0.009024, over 3046612.79 frames. ], batch size: 55, lr: 1.74e-03, grad_scale: 8.0
2023-11-25 21:41:27,718 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=3.36 vs. limit=15.0
2023-11-25 21:41:31,549 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=9.88 vs. limit=15.0
2023-11-25 21:41:37,491 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=3074753.3333333335, ans=0.0
2023-11-25 21:41:43,053 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=512, metric=20.41 vs. limit=22.5
2023-11-25 21:42:02,554 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=512, metric=2.56 vs. limit=15.0
2023-11-25 21:42:04,804 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.min_positive, batch_count=3074886.6666666665, ans=0.05
2023-11-25 21:42:13,023 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 461250
2023-11-25 21:42:18,144 INFO [train_asr.py:1235] (3/4) Epoch 39, batch 4350, loss[loss=0.07092, simple_loss=0.1003, pruned_loss=0.01251, audio_tagging_loss=0.00826, over 15862.00 frames. ], tot_loss[loss=0.06833, simple_loss=0.09238, pruned_loss=0.01311, audio_tagging_loss=0.009025, over 3043379.73 frames. ], batch size: 57, lr: 1.74e-03, grad_scale: 8.0
2023-11-25 21:42:31,480 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=7.47 vs. limit=15.0
2023-11-25 21:42:42,561 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.0.self_attn1.whiten, num_groups=1, num_channels=192, metric=11.81 vs. limit=22.5
2023-11-25 21:42:51,207 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.343e+01 8.819e+01 9.341e+01 1.009e+02 3.956e+02, threshold=1.868e+02, percent-clipped=1.0
2023-11-25 21:42:55,214 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=3075220.0, ans=0.125
2023-11-25 21:43:00,414 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=3075220.0, ans=0.0
2023-11-25 21:43:07,489 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 461300
2023-11-25 21:43:09,720 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=3075286.6666666665, ans=0.0
2023-11-25 21:43:12,644 INFO [train_asr.py:1235] (3/4) Epoch 39, batch 4400, loss[loss=0.07925, simple_loss=0.1092, pruned_loss=0.01861, audio_tagging_loss=0.006037, over 15541.00 frames. ], tot_loss[loss=0.06889, simple_loss=0.09336, pruned_loss=0.01333, audio_tagging_loss=0.00888, over 3047289.24 frames. ], batch size: 58, lr: 1.74e-03, grad_scale: 16.0
2023-11-25 21:43:22,718 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3075420.0, ans=0.0
2023-11-25 21:43:25,217 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=7.03 vs. limit=15.0
2023-11-25 21:43:46,325 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3075553.3333333335, ans=0.125
2023-11-25 21:43:57,758 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3075620.0, ans=0.1
2023-11-25 21:44:02,309 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 461350
2023-11-25 21:44:07,980 INFO [train_asr.py:1235] (3/4) Epoch 39, batch 4450, loss[loss=0.06059, simple_loss=0.08839, pruned_loss=0.007718, audio_tagging_loss=0.008676, over 13226.00 frames. ], tot_loss[loss=0.06884, simple_loss=0.09312, pruned_loss=0.01337, audio_tagging_loss=0.008918, over 3049834.74 frames. ], batch size: 54, lr: 1.74e-03, grad_scale: 16.0
2023-11-25 21:44:11,298 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3075686.6666666665, ans=0.125
2023-11-25 21:44:23,716 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=3075753.3333333335, ans=0.2
2023-11-25 21:44:31,103 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3075820.0, ans=0.1
2023-11-25 21:44:41,816 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.256e+01 8.665e+01 9.390e+01 1.023e+02 1.193e+02, threshold=1.878e+02, percent-clipped=0.0
2023-11-25 21:44:46,164 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=3075886.6666666665, ans=0.0
2023-11-25 21:44:51,966 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3075953.3333333335, ans=0.125
2023-11-25 21:44:54,094 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=3075953.3333333335, ans=0.1
2023-11-25 21:44:57,523 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 461400
2023-11-25 21:45:03,432 INFO [train_asr.py:1235] (3/4) Epoch 39, batch 4500, loss[loss=0.08762, simple_loss=0.1239, pruned_loss=0.01885, audio_tagging_loss=0.006811, over 15627.00 frames. ], tot_loss[loss=0.06859, simple_loss=0.09283, pruned_loss=0.01321, audio_tagging_loss=0.008963, over 3055382.21 frames. ], batch size: 56, lr: 1.74e-03, grad_scale: 8.0
2023-11-25 21:45:04,597 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3076020.0, ans=0.0
2023-11-25 21:45:09,303 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=8.18 vs. limit=15.0
2023-11-25 21:45:26,924 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=3076153.3333333335, ans=0.0
2023-11-25 21:45:29,088 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=3076153.3333333335, ans=0.125
2023-11-25 21:45:34,751 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=3076220.0, ans=0.125
2023-11-25 21:45:52,301 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 461450
2023-11-25 21:45:57,459 INFO [train_asr.py:1235] (3/4) Epoch 39, batch 4550, loss[loss=0.07636, simple_loss=0.1024, pruned_loss=0.01793, audio_tagging_loss=0.007227, over 15036.00 frames. ], tot_loss[loss=0.06812, simple_loss=0.09196, pruned_loss=0.01309, audio_tagging_loss=0.009044, over 3055007.36 frames. ], batch size: 55, lr: 1.74e-03, grad_scale: 8.0
2023-11-25 21:46:01,844 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=3076353.3333333335, ans=0.125
2023-11-25 21:46:18,989 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3076486.6666666665, ans=0.1
2023-11-25 21:46:25,662 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3076486.6666666665, ans=0.125
2023-11-25 21:46:26,092 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=5.48 vs. limit=12.0
2023-11-25 21:46:27,107 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=10.21 vs. limit=15.0
2023-11-25 21:46:27,725 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=3076486.6666666665, ans=0.125
2023-11-25 21:46:31,610 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.239e+01 8.489e+01 8.972e+01 9.819e+01 1.195e+02, threshold=1.794e+02, percent-clipped=0.0
2023-11-25 21:46:33,874 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=3076553.3333333335, ans=0.125
2023-11-25 21:46:40,029 WARNING [train_asr.py:1481] (3/4) Exclude cut with ID unbalanced/_II2Klfnn4Y_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24
2023-11-25 21:46:40,308 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3076620.0, ans=0.125
2023-11-25 21:46:46,235 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 461500
2023-11-25 21:46:51,814 INFO [train_asr.py:1235] (3/4) Epoch 39, batch 4600, loss[loss=0.06727, simple_loss=0.09559, pruned_loss=0.008935, audio_tagging_loss=0.01054, over 15053.00 frames. ], tot_loss[loss=0.06783, simple_loss=0.09131, pruned_loss=0.01303, audio_tagging_loss=0.009147, over 3059171.32 frames. ], batch size: 58, lr: 1.74e-03, grad_scale: 8.0
2023-11-25 21:46:52,078 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3076686.6666666665, ans=0.125
2023-11-25 21:46:53,117 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3076686.6666666665, ans=0.125
2023-11-25 21:47:12,724 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=10.42 vs. limit=15.0
2023-11-25 21:47:21,527 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=3076820.0, ans=0.0
2023-11-25 21:47:37,139 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=3076953.3333333335, ans=0.0
2023-11-25 21:47:41,688 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 461550
2023-11-25 21:47:46,493 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=3077020.0, ans=0.0
2023-11-25 21:47:47,363 INFO [train_asr.py:1235] (3/4) Epoch 39, batch 4650, loss[loss=0.0719, simple_loss=0.1029, pruned_loss=0.01339, audio_tagging_loss=0.007043, over 15151.00 frames. ], tot_loss[loss=0.0679, simple_loss=0.0911, pruned_loss=0.01305, audio_tagging_loss=0.009306, over 3052799.15 frames. ], batch size: 55, lr: 1.74e-03, grad_scale: 8.0
2023-11-25 21:48:03,233 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3077086.6666666665, ans=0.125
2023-11-25 21:48:20,598 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.306e+01 8.563e+01 9.172e+01 1.006e+02 1.160e+02, threshold=1.834e+02, percent-clipped=0.0
2023-11-25 21:48:32,235 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3077286.6666666665, ans=0.125
2023-11-25 21:48:36,268 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 461600
2023-11-25 21:48:41,722 INFO [train_asr.py:1235] (3/4) Epoch 39, batch 4700, loss[loss=0.06016, simple_loss=0.08043, pruned_loss=0.01001, audio_tagging_loss=0.009937, over 14612.00 frames. ], tot_loss[loss=0.06808, simple_loss=0.09119, pruned_loss=0.01316, audio_tagging_loss=0.009325, over 3053053.18 frames. ], batch size: 57, lr: 1.74e-03, grad_scale: 8.0
2023-11-25 21:48:57,414 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=3077420.0, ans=0.0
2023-11-25 21:49:06,712 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=3077486.6666666665, ans=0.125
2023-11-25 21:49:07,841 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3077486.6666666665, ans=0.1
2023-11-25 21:49:09,330 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=3077486.6666666665, ans=0.125
2023-11-25 21:49:13,486 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=3077553.3333333335, ans=0.09899494936611666
2023-11-25 21:49:29,923 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 461650
2023-11-25 21:49:35,068 INFO [train_asr.py:1235] (3/4) Epoch 39, batch 4750, loss[loss=0.03144, simple_loss=0.0356, pruned_loss=0.004597, audio_tagging_loss=0.009041, over 13874.00 frames. ], tot_loss[loss=0.068, simple_loss=0.09089, pruned_loss=0.01315, audio_tagging_loss=0.009406, over 3054906.83 frames. ], batch size: 56, lr: 1.74e-03, grad_scale: 8.0
2023-11-25 21:49:54,974 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer_ff2.min_abs, batch_count=3077753.3333333335, ans=0.1
2023-11-25 21:49:56,003 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=3077753.3333333335, ans=0.2
2023-11-25 21:50:04,312 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3077820.0, ans=0.125
2023-11-25 21:50:06,330 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3077820.0, ans=0.125
2023-11-25 21:50:09,277 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.330e+01 8.924e+01 9.307e+01 1.025e+02 1.203e+02, threshold=1.861e+02, percent-clipped=0.0
2023-11-25 21:50:15,752 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=3077886.6666666665, ans=0.0
2023-11-25 21:50:24,995 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 461700
2023-11-25 21:50:28,027 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=512, metric=3.66 vs. limit=15.0
2023-11-25 21:50:30,548 INFO [train_asr.py:1235] (3/4) Epoch 39, batch 4800, loss[loss=0.08174, simple_loss=0.1141, pruned_loss=0.01644, audio_tagging_loss=0.008278, over 15934.00 frames. ], tot_loss[loss=0.06815, simple_loss=0.09137, pruned_loss=0.01308, audio_tagging_loss=0.009381, over 3056084.34 frames. ], batch size: 58, lr: 1.74e-03, grad_scale: 16.0
2023-11-25 21:50:34,346 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=3078020.0, ans=0.125
2023-11-25 21:50:41,581 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=3078086.6666666665, ans=0.125
2023-11-25 21:50:43,650 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2023-11-25 21:50:44,604 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3078086.6666666665, ans=0.1
2023-11-25 21:50:46,048 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.17 vs. limit=6.0
2023-11-25 21:50:46,791 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=3078086.6666666665, ans=0.0
2023-11-25 21:51:16,524 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=3078286.6666666665, ans=0.2
2023-11-25 21:51:19,474 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 461750
2023-11-25 21:51:20,826 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=13.24 vs. limit=15.0
2023-11-25 21:51:22,736 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3078286.6666666665, ans=0.1
2023-11-25 21:51:24,560 INFO [train_asr.py:1235] (3/4) Epoch 39, batch 4850, loss[loss=0.04819, simple_loss=0.05801, pruned_loss=0.009407, audio_tagging_loss=0.009778, over 15551.00 frames. ], tot_loss[loss=0.06715, simple_loss=0.08984, pruned_loss=0.01273, audio_tagging_loss=0.009497, over 3047561.09 frames. ], batch size: 60, lr: 1.74e-03, grad_scale: 16.0
2023-11-25 21:51:30,036 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=3078353.3333333335, ans=0.125
2023-11-25 21:51:36,436 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=8.05 vs. limit=15.0
2023-11-25 21:51:39,229 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3078420.0, ans=0.125
2023-11-25 21:51:58,151 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.385e+01 8.740e+01 9.474e+01 1.031e+02 1.193e+02, threshold=1.895e+02, percent-clipped=0.0
2023-11-25 21:52:12,715 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 461800
2023-11-25 21:52:15,264 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=3078620.0, ans=0.0
2023-11-25 21:52:18,110 INFO [train_asr.py:1235] (3/4) Epoch 39, batch 4900, loss[loss=0.05771, simple_loss=0.08407, pruned_loss=0.008435, audio_tagging_loss=0.007245, over 16953.00 frames. ], tot_loss[loss=0.06778, simple_loss=0.09103, pruned_loss=0.01287, audio_tagging_loss=0.00939, over 3050647.14 frames. ], batch size: 63, lr: 1.74e-03, grad_scale: 16.0
2023-11-25 21:53:07,346 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 461850
2023-11-25 21:53:12,990 INFO [train_asr.py:1235] (3/4) Epoch 39, batch 4950, loss[loss=0.07533, simple_loss=0.0973, pruned_loss=0.01505, audio_tagging_loss=0.01163, over 15307.00 frames. ], tot_loss[loss=0.06755, simple_loss=0.09081, pruned_loss=0.01288, audio_tagging_loss=0.00927, over 3041081.51 frames. ], batch size: 57, lr: 1.74e-03, grad_scale: 16.0
2023-11-25 21:53:38,074 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=3079153.3333333335, ans=0.0
2023-11-25 21:53:46,067 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.137e+01 8.693e+01 9.293e+01 9.943e+01 1.246e+02, threshold=1.859e+02, percent-clipped=0.0
2023-11-25 21:53:57,322 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=3079286.6666666665, ans=0.07
2023-11-25 21:53:57,588 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=512, metric=15.96 vs. limit=22.5
2023-11-25 21:54:02,733 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 461900
2023-11-25 21:54:04,059 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=3079286.6666666665, ans=0.2
2023-11-25 21:54:07,057 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3079353.3333333335, ans=0.125
2023-11-25 21:54:07,911 INFO [train_asr.py:1235] (3/4) Epoch 39, batch 5000, loss[loss=0.07786, simple_loss=0.1088, pruned_loss=0.01637, audio_tagging_loss=0.007096, over 17066.00 frames. ], tot_loss[loss=0.06653, simple_loss=0.08953, pruned_loss=0.01261, audio_tagging_loss=0.009157, over 3047633.84 frames. ], batch size: 64, lr: 1.74e-03, grad_scale: 16.0
2023-11-25 21:54:36,924 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=3079486.6666666665, ans=0.0
2023-11-25 21:54:40,614 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3079553.3333333335, ans=0.1
2023-11-25 21:54:46,605 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.whiten, num_groups=1, num_channels=512, metric=9.90 vs. limit=12.0
2023-11-25 21:54:56,515 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 461950
2023-11-25 21:55:01,284 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=6.49 vs. limit=10.0
2023-11-25 21:55:01,799 INFO [train_asr.py:1235] (3/4) Epoch 39, batch 5050, loss[loss=0.05531, simple_loss=0.07539, pruned_loss=0.00992, audio_tagging_loss=0.007698, over 14936.00 frames. ], tot_loss[loss=0.06678, simple_loss=0.09005, pruned_loss=0.01272, audio_tagging_loss=0.009034, over 3042550.91 frames. ], batch size: 59, lr: 1.74e-03, grad_scale: 16.0
2023-11-25 21:55:09,083 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3079686.6666666665, ans=0.0
2023-11-25 21:55:20,992 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=3079753.3333333335, ans=0.125
2023-11-25 21:55:21,257 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=512, metric=11.27 vs. limit=15.0
2023-11-25 21:55:30,917 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3079820.0, ans=0.0
2023-11-25 21:55:35,798 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.630e+01 8.539e+01 9.066e+01 9.676e+01 1.144e+02, threshold=1.813e+02, percent-clipped=0.0
2023-11-25 21:55:43,203 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3079886.6666666665, ans=0.125
2023-11-25 21:55:50,296 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 462000
2023-11-25 21:55:50,384 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3079953.3333333335, ans=0.125
2023-11-25 21:55:54,201 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2023-11-25 21:55:56,005 INFO [train_asr.py:1235] (3/4) Epoch 39, batch 5100, loss[loss=0.06442, simple_loss=0.07535, pruned_loss=0.0147, audio_tagging_loss=0.01205, over 14627.00 frames. ], tot_loss[loss=0.06642, simple_loss=0.08945, pruned_loss=0.01265, audio_tagging_loss=0.009048, over 3041559.78 frames. ], batch size: 59, lr: 1.74e-03, grad_scale: 16.0
2023-11-25 21:55:56,180 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=3080020.0, ans=0.0
2023-11-25 21:56:02,310 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3080020.0, ans=0.125
2023-11-25 21:56:03,440 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=3080020.0, ans=0.125
2023-11-25 21:56:36,562 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=3080220.0, ans=0.125
2023-11-25 21:56:45,801 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 462050
2023-11-25 21:56:51,455 INFO [train_asr.py:1235] (3/4) Epoch 39, batch 5150, loss[loss=0.06291, simple_loss=0.08899, pruned_loss=0.008238, audio_tagging_loss=0.01017, over 15647.00 frames. ], tot_loss[loss=0.06678, simple_loss=0.09006, pruned_loss=0.01281, audio_tagging_loss=0.008949, over 3041857.78 frames. ], batch size: 56, lr: 1.74e-03, grad_scale: 16.0
2023-11-25 21:56:55,815 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=3080353.3333333335, ans=0.0
2023-11-25 21:57:01,006 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=3080420.0, ans=0.0
2023-11-25 21:57:11,269 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=3080486.6666666665, ans=0.125
2023-11-25 21:57:12,391 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=3080486.6666666665, ans=0.125
2023-11-25 21:57:13,360 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3080486.6666666665, ans=0.0
2023-11-25 21:57:25,139 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 6.840e+01 8.763e+01 9.349e+01 9.902e+01 1.210e+02, threshold=1.870e+02, percent-clipped=0.0
2023-11-25 21:57:27,505 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=3080553.3333333335, ans=0.0
2023-11-25 21:57:28,792 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=512, metric=20.08 vs. limit=22.5
2023-11-25 21:57:32,552 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=3080553.3333333335, ans=0.0
2023-11-25 21:57:40,146 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 462100
2023-11-25 21:57:44,557 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3080686.6666666665, ans=0.0
2023-11-25 21:57:45,285 INFO [train_asr.py:1235] (3/4) Epoch 39, batch 5200, loss[loss=0.06493, simple_loss=0.08763, pruned_loss=0.01458, audio_tagging_loss=0.006539, over 15181.00 frames. ], tot_loss[loss=0.06731, simple_loss=0.0909, pruned_loss=0.013, audio_tagging_loss=0.008865, over 3046431.65 frames. ], batch size: 58, lr: 1.74e-03, grad_scale: 32.0
2023-11-25 21:57:51,646 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=3080686.6666666665, ans=0.04949747468305833
2023-11-25 21:58:11,351 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=6.65 vs. limit=10.0
2023-11-25 21:58:11,902 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3080820.0, ans=0.125
2023-11-25 21:58:12,202 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=9.91 vs. limit=15.0
2023-11-25 21:58:33,951 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 462150
2023-11-25 21:58:39,646 INFO [train_asr.py:1235] (3/4) Epoch 39, batch 5250, loss[loss=0.07822, simple_loss=0.1124, pruned_loss=0.01475, audio_tagging_loss=0.007261, over 13940.00 frames. ], tot_loss[loss=0.06724, simple_loss=0.09071, pruned_loss=0.01296, audio_tagging_loss=0.008916, over 3041115.74 frames. ], batch size: 55, lr: 1.74e-03, grad_scale: 32.0
2023-11-25 21:59:14,201 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.326e+01 8.546e+01 9.251e+01 9.912e+01 1.159e+02, threshold=1.850e+02, percent-clipped=0.0
2023-11-25 21:59:19,081 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=8.05 vs. limit=15.0
2023-11-25 21:59:29,350 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 462200
2023-11-25 21:59:34,794 INFO [train_asr.py:1235] (3/4) Epoch 39, batch 5300, loss[loss=0.06922, simple_loss=0.08878, pruned_loss=0.01795, audio_tagging_loss=0.006888, over 14545.00 frames. ], tot_loss[loss=0.0669, simple_loss=0.09032, pruned_loss=0.01276, audio_tagging_loss=0.008983, over 3046472.65 frames. ], batch size: 55, lr: 1.74e-03, grad_scale: 16.0
2023-11-25 21:59:43,874 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=3081353.3333333335, ans=0.09899494936611666
2023-11-25 21:59:52,581 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=4.71 vs. limit=12.0
2023-11-25 22:00:13,401 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=3081553.3333333335, ans=0.0
2023-11-25 22:00:23,664 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 462250
2023-11-25 22:00:23,861 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=3081620.0, ans=0.125
2023-11-25 22:00:24,149 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=13.01 vs. limit=15.0
2023-11-25 22:00:29,310 INFO [train_asr.py:1235] (3/4) Epoch 39, batch 5350, loss[loss=0.06329, simple_loss=0.08505, pruned_loss=0.0119, audio_tagging_loss=0.008859, over 15037.00 frames. ], tot_loss[loss=0.06618, simple_loss=0.08938, pruned_loss=0.01252, audio_tagging_loss=0.008974, over 3047326.01 frames. ], batch size: 58, lr: 1.74e-03, grad_scale: 16.0
2023-11-25 22:00:31,624 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=3081686.6666666665, ans=0.125
2023-11-25 22:00:33,665 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=3081686.6666666665, ans=0.0
2023-11-25 22:00:37,868 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3081686.6666666665, ans=0.1
2023-11-25 22:00:38,878 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=3081753.3333333335, ans=0.2
2023-11-25 22:00:40,210 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=13.32 vs. limit=22.5
2023-11-25 22:00:51,278 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3081820.0, ans=0.0
2023-11-25 22:01:04,032 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.414e+01 8.802e+01 9.245e+01 1.006e+02 1.324e+02, threshold=1.849e+02, percent-clipped=0.0
2023-11-25 22:01:17,164 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3081953.3333333335, ans=0.1
2023-11-25 22:01:18,104 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 462300
2023-11-25 22:01:20,648 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=12.35 vs. limit=15.0
2023-11-25 22:01:23,253 INFO [train_asr.py:1235] (3/4) Epoch 39, batch 5400, loss[loss=0.05649, simple_loss=0.07358, pruned_loss=0.008292, audio_tagging_loss=0.01141, over 15776.00 frames. ], tot_loss[loss=0.06716, simple_loss=0.09068, pruned_loss=0.01279, audio_tagging_loss=0.009033, over 3044843.27 frames. ], batch size: 60, lr: 1.74e-03, grad_scale: 16.0
2023-11-25 22:01:32,809 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=3082020.0, ans=0.2
2023-11-25 22:02:03,024 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=3082220.0, ans=0.2
2023-11-25 22:02:04,062 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3082220.0, ans=0.1
2023-11-25 22:02:13,180 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 462350
2023-11-25 22:02:18,829 INFO [train_asr.py:1235] (3/4) Epoch 39, batch 5450, loss[loss=0.0649, simple_loss=0.08612, pruned_loss=0.01406, audio_tagging_loss=0.007772, over 15334.00 frames. ], tot_loss[loss=0.06722, simple_loss=0.09095, pruned_loss=0.01275, audio_tagging_loss=0.009001, over 3046834.27 frames. ], batch size: 59, lr: 1.74e-03, grad_scale: 16.0
2023-11-25 22:02:41,299 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3082486.6666666665, ans=0.125
2023-11-25 22:02:53,084 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.422e+01 8.739e+01 9.442e+01 1.018e+02 1.459e+02, threshold=1.888e+02, percent-clipped=0.0
2023-11-25 22:03:07,615 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 462400
2023-11-25 22:03:12,954 INFO [train_asr.py:1235] (3/4) Epoch 39, batch 5500, loss[loss=0.06729, simple_loss=0.09064, pruned_loss=0.01349, audio_tagging_loss=0.008486, over 14652.00 frames. ], tot_loss[loss=0.0672, simple_loss=0.091, pruned_loss=0.01265, audio_tagging_loss=0.009055, over 3049980.02 frames. ], batch size: 53, lr: 1.74e-03, grad_scale: 16.0
2023-11-25 22:03:30,744 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=3082753.3333333335, ans=0.125
2023-11-25 22:04:02,252 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 462450
2023-11-25 22:04:03,852 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=20.20 vs. limit=22.5
2023-11-25 22:04:06,484 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3083020.0, ans=0.0
2023-11-25 22:04:07,412 INFO [train_asr.py:1235] (3/4) Epoch 39, batch 5550, loss[loss=0.06309, simple_loss=0.08919, pruned_loss=0.01055, audio_tagging_loss=0.007949, over 15487.00 frames. ], tot_loss[loss=0.06717, simple_loss=0.0908, pruned_loss=0.01268, audio_tagging_loss=0.009093, over 3047292.34 frames. ], batch size: 57, lr: 1.74e-03, grad_scale: 16.0
2023-11-25 22:04:27,760 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3083086.6666666665, ans=0.125
2023-11-25 22:04:42,729 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.227e+01 8.648e+01 9.293e+01 9.970e+01 1.288e+02, threshold=1.859e+02, percent-clipped=0.0
2023-11-25 22:04:51,285 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=3083286.6666666665, ans=0.2
2023-11-25 22:04:56,901 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 462500
2023-11-25 22:05:03,007 INFO [train_asr.py:1235] (3/4) Epoch 39, batch 5600, loss[loss=0.05822, simple_loss=0.07399, pruned_loss=0.009915, audio_tagging_loss=0.01131, over 14788.00 frames. ], tot_loss[loss=0.06737, simple_loss=0.09097, pruned_loss=0.0126, audio_tagging_loss=0.009278, over 3044606.76 frames. ], batch size: 60, lr: 1.74e-03, grad_scale: 32.0
2023-11-25 22:05:18,890 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3083420.0, ans=0.125
2023-11-25 22:05:19,156 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=512, metric=19.32 vs. limit=22.5
2023-11-25 22:05:22,937 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3083486.6666666665, ans=0.1
2023-11-25 22:05:23,163 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=12.16 vs. limit=15.0
2023-11-25 22:05:43,061 WARNING [train_asr.py:1481] (3/4) Exclude cut with ID unbalanced/ze0LsBtoDm0_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24
2023-11-25 22:05:51,846 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 462550
2023-11-25 22:05:56,941 INFO [train_asr.py:1235] (3/4) Epoch 39, batch 5650, loss[loss=0.08015, simple_loss=0.1048, pruned_loss=0.01573, audio_tagging_loss=0.01202, over 15742.00 frames. ], tot_loss[loss=0.06809, simple_loss=0.09169, pruned_loss=0.01286, audio_tagging_loss=0.009379, over 3045931.03 frames. ], batch size: 58, lr: 1.74e-03, grad_scale: 32.0
2023-11-25 22:06:30,384 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3083886.6666666665, ans=0.125
2023-11-25 22:06:32,134 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.579e+01 8.554e+01 9.201e+01 9.858e+01 1.570e+02, threshold=1.840e+02, percent-clipped=0.0
2023-11-25 22:06:37,950 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=7.01 vs. limit=15.0
2023-11-25 22:06:45,110 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=5.51 vs. limit=10.0
2023-11-25 22:06:45,782 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 462600
2023-11-25 22:06:49,298 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=3083953.3333333335, ans=0.2
2023-11-25 22:06:51,775 INFO [train_asr.py:1235] (3/4) Epoch 39, batch 5700, loss[loss=0.05001, simple_loss=0.06206, pruned_loss=0.0084, audio_tagging_loss=0.01058, over 15724.00 frames. ], tot_loss[loss=0.06768, simple_loss=0.09103, pruned_loss=0.01281, audio_tagging_loss=0.009364, over 3045003.71 frames. ], batch size: 60, lr: 1.74e-03, grad_scale: 32.0
2023-11-25 22:07:02,962 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=3084086.6666666665, ans=0.125
2023-11-25 22:07:15,358 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=3084153.3333333335, ans=0.0
2023-11-25 22:07:16,480 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=3084153.3333333335, ans=0.0
2023-11-25 22:07:16,942 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.39 vs. limit=6.0
2023-11-25 22:07:17,668 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=6.41 vs. limit=15.0
2023-11-25 22:07:22,786 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3084153.3333333335, ans=0.125
2023-11-25 22:07:41,361 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 462650
2023-11-25 22:07:46,930 INFO [train_asr.py:1235] (3/4) Epoch 39, batch 5750, loss[loss=0.06374, simple_loss=0.07939, pruned_loss=0.01395, audio_tagging_loss=0.0101, over 14329.00 frames. ], tot_loss[loss=0.06778, simple_loss=0.09115, pruned_loss=0.01289, audio_tagging_loss=0.009318, over 3050586.70 frames. ], batch size: 58, lr: 1.74e-03, grad_scale: 32.0
2023-11-25 22:08:21,774 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.113e+01 8.632e+01 9.114e+01 9.911e+01 1.968e+02, threshold=1.823e+02, percent-clipped=1.0
2023-11-25 22:08:36,344 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 462700
2023-11-25 22:08:36,437 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3084620.0, ans=0.125
2023-11-25 22:08:37,555 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3084620.0, ans=0.125
2023-11-25 22:08:41,433 INFO [train_asr.py:1235] (3/4) Epoch 39, batch 5800, loss[loss=0.06693, simple_loss=0.08416, pruned_loss=0.01804, audio_tagging_loss=0.006803, over 14772.00 frames. ], tot_loss[loss=0.0681, simple_loss=0.09186, pruned_loss=0.01308, audio_tagging_loss=0.009094, over 3048666.00 frames. ], batch size: 59, lr: 1.73e-03, grad_scale: 8.0
2023-11-25 22:08:50,021 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=3084686.6666666665, ans=0.125
2023-11-25 22:08:54,213 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=3084753.3333333335, ans=0.0
2023-11-25 22:09:11,567 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=512, metric=8.39 vs. limit=15.0
2023-11-25 22:09:22,071 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3084886.6666666665, ans=0.1
2023-11-25 22:09:28,281 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=3084953.3333333335, ans=0.125
2023-11-25 22:09:28,734 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.1.whiten, num_groups=1, num_channels=192, metric=3.59 vs. limit=12.0
2023-11-25 22:09:30,148 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 462750
2023-11-25 22:09:30,839 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.1.self_attn1.whiten, num_groups=1, num_channels=192, metric=12.40 vs. limit=22.5
2023-11-25 22:09:32,389 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2023-11-25 22:09:35,304 INFO [train_asr.py:1235] (3/4) Epoch 39, batch 5850, loss[loss=0.0719, simple_loss=0.08939, pruned_loss=0.01716, audio_tagging_loss=0.01005, over 14886.00 frames. ], tot_loss[loss=0.06724, simple_loss=0.09068, pruned_loss=0.0129, audio_tagging_loss=0.009003, over 3046554.16 frames. ], batch size: 58, lr: 1.73e-03, grad_scale: 8.0
2023-11-25 22:09:47,090 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=8.37 vs. limit=10.0
2023-11-25 22:09:58,932 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3085153.3333333335, ans=0.125
2023-11-25 22:10:02,037 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3085153.3333333335, ans=0.1
2023-11-25 22:10:12,171 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.279e+01 8.550e+01 9.214e+01 9.901e+01 1.645e+02, threshold=1.843e+02, percent-clipped=0.0
2023-11-25 22:10:12,450 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=3085220.0, ans=0.0
2023-11-25 22:10:24,290 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 462800
2023-11-25 22:10:26,489 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.1.conv_module2.whiten, num_groups=1, num_channels=192, metric=11.60 vs. limit=15.0
2023-11-25 22:10:27,210 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3085286.6666666665, ans=0.0
2023-11-25 22:10:30,185 INFO [train_asr.py:1235] (3/4) Epoch 39, batch 5900, loss[loss=0.06647, simple_loss=0.08604, pruned_loss=0.0143, audio_tagging_loss=0.009152, over 14908.00 frames. ], tot_loss[loss=0.06678, simple_loss=0.09019, pruned_loss=0.01268, audio_tagging_loss=0.009005, over 3037772.74 frames. ], batch size: 58, lr: 1.73e-03, grad_scale: 8.0
2023-11-25 22:10:37,546 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=3085353.3333333335, ans=0.07
2023-11-25 22:10:46,887 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=3085420.0, ans=0.2
2023-11-25 22:10:48,932 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3085420.0, ans=0.125
2023-11-25 22:11:19,902 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 462850
2023-11-25 22:11:24,096 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3085686.6666666665, ans=0.125
2023-11-25 22:11:24,928 INFO [train_asr.py:1235] (3/4) Epoch 39, batch 5950, loss[loss=0.06256, simple_loss=0.0862, pruned_loss=0.01115, audio_tagging_loss=0.008312, over 16343.00 frames. ], tot_loss[loss=0.06683, simple_loss=0.09052, pruned_loss=0.0126, audio_tagging_loss=0.008972, over 3045056.59 frames.
], batch size: 62, lr: 1.73e-03, grad_scale: 8.0 2023-11-25 22:11:37,841 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3085753.3333333335, ans=0.1 2023-11-25 22:12:02,512 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.382e+01 8.667e+01 9.136e+01 9.803e+01 1.331e+02, threshold=1.827e+02, percent-clipped=0.0 2023-11-25 22:12:07,817 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3085953.3333333335, ans=0.1 2023-11-25 22:12:14,021 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 462900 2023-11-25 22:12:14,169 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=3085953.3333333335, ans=0.0 2023-11-25 22:12:19,223 INFO [train_asr.py:1235] (3/4) Epoch 39, batch 6000, loss[loss=0.07359, simple_loss=0.1022, pruned_loss=0.01335, audio_tagging_loss=0.009146, over 15242.00 frames. ], tot_loss[loss=0.06674, simple_loss=0.09034, pruned_loss=0.01264, audio_tagging_loss=0.008933, over 3041510.58 frames. ], batch size: 57, lr: 1.73e-03, grad_scale: 16.0 2023-11-25 22:12:19,224 INFO [train_asr.py:1258] (3/4) Computing validation loss 2023-11-25 22:12:50,932 INFO [train_asr.py:1267] (3/4) Epoch 39, validation: loss=0.05816, simple_loss=0.05073, pruned_loss=0.00518, audio_tagging_loss=0.02762, over 4681554.00 frames. 2023-11-25 22:12:50,934 INFO [train_asr.py:1268] (3/4) Maximum memory allocated so far is 24894MB 2023-11-25 22:13:06,224 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=3086086.6666666665, ans=0.1 2023-11-25 22:13:16,716 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=10.12 vs. limit=15.0 2023-11-25 22:13:21,404 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=3086153.3333333335, ans=0.125 2023-11-25 22:13:31,820 WARNING [train_asr.py:1481] (3/4) Exclude cut with ID unbalanced/NoNxFjwXuuc_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-25 22:13:36,717 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3086286.6666666665, ans=0.1 2023-11-25 22:13:40,660 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 462950 2023-11-25 22:13:45,810 INFO [train_asr.py:1235] (3/4) Epoch 39, batch 6050, loss[loss=0.04569, simple_loss=0.06191, pruned_loss=0.004853, audio_tagging_loss=0.009881, over 15260.00 frames. ], tot_loss[loss=0.06642, simple_loss=0.08969, pruned_loss=0.01264, audio_tagging_loss=0.008927, over 3047171.24 frames. 
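The WARNING above drops a one-second AudioSet cut because its dummy transcript tokenizes to 24 tokens while the encoder will emit only 23 frames, and a transducer needs at least one output frame per token (T >= U). The logged 100 -> 23 frame count matches a two-stage stride-2 convolutional subsampling, T' = ((T - 7) // 2 + 1) // 2; that formula is inferred from the logged numbers rather than quoted from the code. A sketch of the filter:

    # Hypothetical reconstruction of the cut filter behind the WARNING above.
    # The subsampling arithmetic is inferred from the logged 100 -> 23 frames.
    def frames_after_subsampling(num_frames: int) -> int:
        return ((num_frames - 7) // 2 + 1) // 2

    def keep_cut(num_frames: int, num_tokens: int) -> bool:
        # Pruned-transducer training needs T >= U: one output frame per token.
        return frames_after_subsampling(num_frames) >= num_tokens

    assert frames_after_subsampling(100) == 23
    assert not keep_cut(100, 24)   # the excluded dummy-text cut above
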
], batch size: 59, lr: 1.73e-03, grad_scale: 16.0 2023-11-25 22:13:49,197 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-11-25 22:13:53,343 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=3086353.3333333335, ans=0.0 2023-11-25 22:14:23,447 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.667e+01 8.707e+01 9.356e+01 1.011e+02 1.518e+02, threshold=1.871e+02, percent-clipped=0.0 2023-11-25 22:14:25,920 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3086553.3333333335, ans=0.1 2023-11-25 22:14:32,117 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-25 22:14:33,234 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=3086620.0, ans=0.2 2023-11-25 22:14:35,137 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 463000 2023-11-25 22:14:40,564 INFO [train_asr.py:1235] (3/4) Epoch 39, batch 6100, loss[loss=0.066, simple_loss=0.07919, pruned_loss=0.016, audio_tagging_loss=0.0104, over 15116.00 frames. ], tot_loss[loss=0.06622, simple_loss=0.08926, pruned_loss=0.01262, audio_tagging_loss=0.008976, over 3044188.23 frames. ], batch size: 57, lr: 1.73e-03, grad_scale: 16.0 2023-11-25 22:14:41,909 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=3086686.6666666665, ans=0.2 2023-11-25 22:14:44,131 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=7.66 vs. limit=12.0 2023-11-25 22:14:44,985 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=3086686.6666666665, ans=0.0 2023-11-25 22:14:46,012 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=3086686.6666666665, ans=0.0 2023-11-25 22:14:49,292 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=3086686.6666666665, ans=0.125 2023-11-25 22:14:51,838 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=3086753.3333333335, ans=0.0 2023-11-25 22:14:56,657 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer_ff3.min_abs, batch_count=3086753.3333333335, ans=0.2 2023-11-25 22:15:25,838 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=3086953.3333333335, ans=0.2 2023-11-25 22:15:29,937 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 463050 2023-11-25 22:15:36,143 INFO [train_asr.py:1235] (3/4) Epoch 39, batch 6150, loss[loss=0.05773, simple_loss=0.0767, pruned_loss=0.01068, audio_tagging_loss=0.008699, over 14535.00 frames. ], tot_loss[loss=0.06606, simple_loss=0.089, pruned_loss=0.01253, audio_tagging_loss=0.009036, over 3039408.15 frames. ], batch size: 55, lr: 1.73e-03, grad_scale: 8.0 2023-11-25 22:15:36,542 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=9.37 vs. 
limit=15.0 2023-11-25 22:15:51,917 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.min_abs, batch_count=3087086.6666666665, ans=0.5 2023-11-25 22:16:00,356 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=3087153.3333333335, ans=0.0 2023-11-25 22:16:08,729 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=3087220.0, ans=0.2 2023-11-25 22:16:13,127 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3087220.0, ans=0.1 2023-11-25 22:16:14,309 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 6.737e+01 8.733e+01 9.242e+01 9.873e+01 1.239e+02, threshold=1.848e+02, percent-clipped=0.0 2023-11-25 22:16:23,024 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=3087286.6666666665, ans=0.125 2023-11-25 22:16:26,395 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 463100 2023-11-25 22:16:30,016 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.84 vs. limit=6.0 2023-11-25 22:16:31,579 INFO [train_asr.py:1235] (3/4) Epoch 39, batch 6200, loss[loss=0.05227, simple_loss=0.07082, pruned_loss=0.006034, audio_tagging_loss=0.01083, over 15390.00 frames. ], tot_loss[loss=0.06556, simple_loss=0.08808, pruned_loss=0.01232, audio_tagging_loss=0.009199, over 3033130.69 frames. ], batch size: 60, lr: 1.73e-03, grad_scale: 8.0 2023-11-25 22:16:32,943 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3087353.3333333335, ans=0.125 2023-11-25 22:16:37,702 INFO [scaling.py:1022] (3/4) Whitening: name=encoder_embed.convnext.out_whiten, num_groups=1, num_channels=128, metric=4.24 vs. limit=5.0 2023-11-25 22:16:50,642 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=3087420.0, ans=0.0 2023-11-25 22:17:01,119 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3087486.6666666665, ans=0.1 2023-11-25 22:17:04,578 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3087553.3333333335, ans=0.0 2023-11-25 22:17:04,872 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=5.66 vs. limit=10.0 2023-11-25 22:17:17,661 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=3087620.0, ans=0.125 2023-11-25 22:17:20,717 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 463150 2023-11-25 22:17:25,791 INFO [train_asr.py:1235] (3/4) Epoch 39, batch 6250, loss[loss=0.08618, simple_loss=0.119, pruned_loss=0.01695, audio_tagging_loss=0.009731, over 15170.00 frames. ], tot_loss[loss=0.06635, simple_loss=0.08895, pruned_loss=0.01256, audio_tagging_loss=0.00931, over 3043350.69 frames. ], batch size: 57, lr: 1.73e-03, grad_scale: 8.0 2023-11-25 22:17:30,391 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=5.83 vs. 
limit=12.0 2023-11-25 22:17:51,066 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=3087820.0, ans=0.125 2023-11-25 22:17:51,140 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=3087820.0, ans=0.0 2023-11-25 22:17:52,312 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=3087820.0, ans=0.125 2023-11-25 22:17:55,671 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=4.83 vs. limit=15.0 2023-11-25 22:18:04,819 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.384e+01 8.687e+01 9.120e+01 9.665e+01 2.497e+02, threshold=1.824e+02, percent-clipped=1.0 2023-11-25 22:18:04,977 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=3087886.6666666665, ans=0.125 2023-11-25 22:18:09,541 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=2.83 vs. limit=15.0 2023-11-25 22:18:14,409 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.min_positive, batch_count=3087953.3333333335, ans=0.05 2023-11-25 22:18:14,572 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=8.46 vs. limit=15.0 2023-11-25 22:18:15,270 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 463200 2023-11-25 22:18:21,354 INFO [train_asr.py:1235] (3/4) Epoch 39, batch 6300, loss[loss=0.05298, simple_loss=0.07345, pruned_loss=0.008602, audio_tagging_loss=0.007655, over 16659.00 frames. ], tot_loss[loss=0.06663, simple_loss=0.08917, pruned_loss=0.01264, audio_tagging_loss=0.009404, over 3051706.07 frames. ], batch size: 65, lr: 1.73e-03, grad_scale: 8.0 2023-11-25 22:18:27,282 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-25 22:18:31,994 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=3088086.6666666665, ans=0.125 2023-11-25 22:18:41,051 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=3088086.6666666665, ans=0.125 2023-11-25 22:18:55,639 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3088220.0, ans=0.125 2023-11-25 22:18:57,675 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3088220.0, ans=0.1 2023-11-25 22:19:03,562 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=3088220.0, ans=0.125 2023-11-25 22:19:11,729 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 463250 2023-11-25 22:19:17,017 INFO [train_asr.py:1235] (3/4) Epoch 39, batch 6350, loss[loss=0.07194, simple_loss=0.1076, pruned_loss=0.009355, audio_tagging_loss=0.008793, over 15938.00 frames. ], tot_loss[loss=0.066, simple_loss=0.08814, pruned_loss=0.01246, audio_tagging_loss=0.009472, over 3052649.79 frames. 
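The ScheduledFloat records above (dropout_p, skip rates, scale_min, balancer probs) are hyperparameters whose current value ("ans") is a deterministic function of batch_count, read and logged as training progresses. A minimal sketch of such a schedule as piecewise-linear interpolation over (batch_count, value) breakpoints; the breakpoints below are made up for illustration, only the mechanism is the point:

    import numpy as np

    # Hypothetical sketch of a batch-count-keyed scheduled hyperparameter.
    # Breakpoints are illustrative, not the ones used in scaling.py.
    class ScheduledValue:
        def __init__(self, *points):  # (batch_count, value) pairs, sorted by batch_count
            self.x = np.array([p[0] for p in points], dtype=float)
            self.y = np.array([p[1] for p in points], dtype=float)

        def __call__(self, batch_count: float) -> float:
            # np.interp clamps to the end values outside the breakpoint range.
            return float(np.interp(batch_count, self.x, self.y))

    dropout_p = ScheduledValue((0, 0.3), (20000, 0.1))
    print(dropout_p(3_086_086))  # far past the last breakpoint -> 0.1, as logged
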
], batch size: 57, lr: 1.73e-03, grad_scale: 8.0 2023-11-25 22:19:20,908 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=3088353.3333333335, ans=0.2 2023-11-25 22:19:30,333 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=3088420.0, ans=0.0 2023-11-25 22:19:30,490 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=3088420.0, ans=0.2 2023-11-25 22:19:49,607 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=3088553.3333333335, ans=0.2 2023-11-25 22:19:56,126 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.178e+01 8.465e+01 9.257e+01 9.775e+01 1.191e+02, threshold=1.851e+02, percent-clipped=0.0 2023-11-25 22:20:07,185 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 463300 2023-11-25 22:20:09,776 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=6.51 vs. limit=15.0 2023-11-25 22:20:12,521 INFO [train_asr.py:1235] (3/4) Epoch 39, batch 6400, loss[loss=0.06184, simple_loss=0.07683, pruned_loss=0.01562, audio_tagging_loss=0.007808, over 14795.00 frames. ], tot_loss[loss=0.06661, simple_loss=0.08888, pruned_loss=0.01268, audio_tagging_loss=0.009492, over 3052409.51 frames. ], batch size: 55, lr: 1.73e-03, grad_scale: 16.0 2023-11-25 22:20:18,050 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=3088686.6666666665, ans=0.125 2023-11-25 22:20:21,718 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.0.self_attn2.whiten, num_groups=1, num_channels=192, metric=11.73 vs. limit=22.5 2023-11-25 22:20:24,460 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=8.69 vs. limit=15.0 2023-11-25 22:20:56,444 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=3088953.3333333335, ans=0.0 2023-11-25 22:20:59,572 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3088953.3333333335, ans=0.0 2023-11-25 22:21:01,549 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 463350 2023-11-25 22:21:02,788 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3088953.3333333335, ans=0.1 2023-11-25 22:21:06,708 INFO [train_asr.py:1235] (3/4) Epoch 39, batch 6450, loss[loss=0.0559, simple_loss=0.07668, pruned_loss=0.009902, audio_tagging_loss=0.007658, over 14572.00 frames. ], tot_loss[loss=0.0671, simple_loss=0.09001, pruned_loss=0.01265, audio_tagging_loss=0.009446, over 3043933.43 frames. 
], batch size: 56, lr: 1.73e-03, grad_scale: 16.0 2023-11-25 22:21:06,856 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3089020.0, ans=0.125 2023-11-25 22:21:16,725 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3089020.0, ans=0.1 2023-11-25 22:21:23,628 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=6.99 vs. limit=10.0 2023-11-25 22:21:29,987 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3089153.3333333335, ans=0.125 2023-11-25 22:21:34,476 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=16.93 vs. limit=22.5 2023-11-25 22:21:45,295 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=10.28 vs. limit=15.0 2023-11-25 22:21:45,547 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.108e+01 8.513e+01 9.181e+01 1.004e+02 1.135e+02, threshold=1.836e+02, percent-clipped=0.0 2023-11-25 22:21:54,248 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.63 vs. limit=6.0 2023-11-25 22:21:56,997 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 463400 2023-11-25 22:22:03,108 INFO [train_asr.py:1235] (3/4) Epoch 39, batch 6500, loss[loss=0.08152, simple_loss=0.1056, pruned_loss=0.01948, audio_tagging_loss=0.00921, over 14592.00 frames. ], tot_loss[loss=0.06752, simple_loss=0.09097, pruned_loss=0.01271, audio_tagging_loss=0.009329, over 3045521.41 frames. ], batch size: 55, lr: 1.73e-03, grad_scale: 16.0 2023-11-25 22:22:03,279 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=3089353.3333333335, ans=0.2 2023-11-25 22:22:06,350 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3089353.3333333335, ans=0.125 2023-11-25 22:22:13,181 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3089420.0, ans=0.125 2023-11-25 22:22:16,381 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=3089420.0, ans=0.0 2023-11-25 22:22:28,784 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.min_positive, batch_count=3089486.6666666665, ans=0.025 2023-11-25 22:22:34,544 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=3089553.3333333335, ans=0.0 2023-11-25 22:22:47,399 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3089620.0, ans=0.1 2023-11-25 22:22:52,364 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 463450 2023-11-25 22:22:58,094 INFO [train_asr.py:1235] (3/4) Epoch 39, batch 6550, loss[loss=0.072, simple_loss=0.1026, pruned_loss=0.01091, audio_tagging_loss=0.009771, over 14453.00 frames. ], tot_loss[loss=0.06807, simple_loss=0.09197, pruned_loss=0.01295, audio_tagging_loss=0.009139, over 3046878.88 frames. 
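Each "Whitening: ... metric=X vs. limit=Y" record compares a measured whiteness statistic of some intermediate activation against a limit, and entries appear in the log only when the metric gets close to or above that limit. One plausible statistic with the right behavior, equal to 1.0 for perfectly white features and growing with covariance anisotropy, is the mean squared eigenvalue of the per-group covariance divided by the squared mean eigenvalue; this is an illustrative formulation, not the exact scaling.py computation:

    import torch

    # Hypothetical whiteness metric: E[lambda^2] / (E[lambda])^2 over the
    # eigenvalues of the per-group feature covariance. Equals 1.0 iff all
    # eigenvalues are equal (white); grows as the spectrum spreads out.
    def whiteness_metric(x: torch.Tensor, num_groups: int) -> float:
        n, c = x.shape                       # (num_frames, num_channels)
        x = x.reshape(n, num_groups, c // num_groups)
        worst = 0.0
        for g in range(num_groups):
            xg = x[:, g, :] - x[:, g, :].mean(dim=0)
            cov = xg.T @ xg / n
            eigs = torch.linalg.eigvalsh(cov)
            worst = max(worst, ((eigs ** 2).mean() / eigs.mean() ** 2).item())
        return worst

    x = torch.randn(10000, 64)                # approximately white input
    print(whiteness_metric(x, num_groups=1))  # ~1.0, far below limits like 15.0
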
], batch size: 55, lr: 1.73e-03, grad_scale: 16.0 2023-11-25 22:23:00,328 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=3089686.6666666665, ans=0.125 2023-11-25 22:23:36,506 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.082e+01 8.654e+01 9.097e+01 9.635e+01 1.212e+02, threshold=1.819e+02, percent-clipped=0.0 2023-11-25 22:23:47,519 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 463500 2023-11-25 22:23:52,786 INFO [train_asr.py:1235] (3/4) Epoch 39, batch 6600, loss[loss=0.06894, simple_loss=0.09653, pruned_loss=0.01385, audio_tagging_loss=0.006829, over 15584.00 frames. ], tot_loss[loss=0.06728, simple_loss=0.09091, pruned_loss=0.0128, audio_tagging_loss=0.009025, over 3045006.32 frames. ], batch size: 58, lr: 1.73e-03, grad_scale: 16.0 2023-11-25 22:24:20,519 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=11.13 vs. limit=22.5 2023-11-25 22:24:36,228 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3090286.6666666665, ans=0.1 2023-11-25 22:24:38,903 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=11.78 vs. limit=15.0 2023-11-25 22:24:39,985 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3090286.6666666665, ans=0.125 2023-11-25 22:24:43,430 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 463550 2023-11-25 22:24:49,298 INFO [train_asr.py:1235] (3/4) Epoch 39, batch 6650, loss[loss=0.06912, simple_loss=0.08868, pruned_loss=0.01673, audio_tagging_loss=0.008046, over 14166.00 frames. ], tot_loss[loss=0.06766, simple_loss=0.09147, pruned_loss=0.01295, audio_tagging_loss=0.008969, over 3042015.37 frames. ], batch size: 56, lr: 1.73e-03, grad_scale: 16.0 2023-11-25 22:25:27,777 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.165e+01 8.685e+01 9.321e+01 9.946e+01 1.416e+02, threshold=1.864e+02, percent-clipped=0.0 2023-11-25 22:25:38,703 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 463600 2023-11-25 22:25:44,053 INFO [train_asr.py:1235] (3/4) Epoch 39, batch 6700, loss[loss=0.05509, simple_loss=0.06437, pruned_loss=0.01007, audio_tagging_loss=0.01284, over 15575.00 frames. ], tot_loss[loss=0.06741, simple_loss=0.0911, pruned_loss=0.01291, audio_tagging_loss=0.008953, over 3042490.58 frames. ], batch size: 61, lr: 1.73e-03, grad_scale: 16.0 2023-11-25 22:25:51,078 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.max_abs, batch_count=3090686.6666666665, ans=10.0 2023-11-25 22:26:08,807 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3090820.0, ans=0.125 2023-11-25 22:26:33,552 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 463650 2023-11-25 22:26:38,701 INFO [train_asr.py:1235] (3/4) Epoch 39, batch 6750, loss[loss=0.07917, simple_loss=0.1013, pruned_loss=0.017, audio_tagging_loss=0.01153, over 15599.00 frames. ], tot_loss[loss=0.06687, simple_loss=0.09023, pruned_loss=0.01272, audio_tagging_loss=0.009043, over 3044287.51 frames. 
], batch size: 57, lr: 1.73e-03, grad_scale: 16.0 2023-11-25 22:26:41,026 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3091020.0, ans=0.125 2023-11-25 22:26:49,288 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=3091086.6666666665, ans=0.125 2023-11-25 22:26:56,357 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=11.06 vs. limit=15.0 2023-11-25 22:27:05,505 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=3091153.3333333335, ans=0.125 2023-11-25 22:27:13,978 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3091220.0, ans=0.125 2023-11-25 22:27:15,382 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer_na.min_abs, batch_count=3091220.0, ans=0.02 2023-11-25 22:27:17,262 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.384e+01 8.613e+01 9.173e+01 9.716e+01 1.152e+02, threshold=1.835e+02, percent-clipped=0.0 2023-11-25 22:27:24,779 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=3091286.6666666665, ans=0.125 2023-11-25 22:27:28,296 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 463700 2023-11-25 22:27:33,940 INFO [train_asr.py:1235] (3/4) Epoch 39, batch 6800, loss[loss=0.07289, simple_loss=0.1034, pruned_loss=0.01395, audio_tagging_loss=0.007241, over 15436.00 frames. ], tot_loss[loss=0.06721, simple_loss=0.09067, pruned_loss=0.01284, audio_tagging_loss=0.009034, over 3039751.69 frames. ], batch size: 56, lr: 1.73e-03, grad_scale: 32.0 2023-11-25 22:27:40,139 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=16.02 vs. limit=22.5 2023-11-25 22:27:43,030 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=3091353.3333333335, ans=0.2 2023-11-25 22:27:58,210 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3091486.6666666665, ans=0.1 2023-11-25 22:27:58,566 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=14.04 vs. limit=15.0 2023-11-25 22:28:01,794 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=11.41 vs. limit=15.0 2023-11-25 22:28:12,907 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=3091553.3333333335, ans=0.0 2023-11-25 22:28:23,728 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 463750 2023-11-25 22:28:26,038 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3091620.0, ans=0.1 2023-11-25 22:28:28,852 INFO [train_asr.py:1235] (3/4) Epoch 39, batch 6850, loss[loss=0.07009, simple_loss=0.09972, pruned_loss=0.01256, audio_tagging_loss=0.007666, over 14913.00 frames. 
], tot_loss[loss=0.06777, simple_loss=0.09163, pruned_loss=0.01297, audio_tagging_loss=0.008989, over 3042530.45 frames. ], batch size: 55, lr: 1.73e-03, grad_scale: 32.0 2023-11-25 22:28:36,309 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=3091686.6666666665, ans=0.0 2023-11-25 22:28:37,407 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=3091686.6666666665, ans=0.125 2023-11-25 22:28:45,391 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=3091753.3333333335, ans=0.09899494936611666 2023-11-25 22:28:53,172 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=3091820.0, ans=0.0 2023-11-25 22:28:54,761 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=3091820.0, ans=0.0 2023-11-25 22:28:56,782 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3091820.0, ans=0.1 2023-11-25 22:28:57,946 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=3091820.0, ans=0.125 2023-11-25 22:29:05,745 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=3091886.6666666665, ans=0.2 2023-11-25 22:29:07,614 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 6.919e+01 8.654e+01 9.393e+01 1.015e+02 1.220e+02, threshold=1.879e+02, percent-clipped=0.0 2023-11-25 22:29:18,079 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 463800 2023-11-25 22:29:23,607 INFO [train_asr.py:1235] (3/4) Epoch 39, batch 6900, loss[loss=0.07565, simple_loss=0.1034, pruned_loss=0.01626, audio_tagging_loss=0.007689, over 14595.00 frames. ], tot_loss[loss=0.06787, simple_loss=0.09199, pruned_loss=0.013, audio_tagging_loss=0.008882, over 3047490.05 frames. ], batch size: 55, lr: 1.73e-03, grad_scale: 32.0 2023-11-25 22:29:38,240 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00 2023-11-25 22:29:39,248 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.max_abs, batch_count=3092086.6666666665, ans=10.0 2023-11-25 22:29:56,586 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.min_positive, batch_count=3092220.0, ans=0.05 2023-11-25 22:29:56,636 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.min_abs, batch_count=3092220.0, ans=0.5 2023-11-25 22:30:07,024 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3092286.6666666665, ans=0.1 2023-11-25 22:30:08,456 WARNING [train_asr.py:1481] (3/4) Exclude cut with ID unbalanced/Xez1ffAcb0w_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. 
Number of tokens: 24 2023-11-25 22:30:09,647 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=3092286.6666666665, ans=0.125 2023-11-25 22:30:13,783 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 463850 2023-11-25 22:30:20,049 INFO [train_asr.py:1235] (3/4) Epoch 39, batch 6950, loss[loss=0.05418, simple_loss=0.06553, pruned_loss=0.008527, audio_tagging_loss=0.01289, over 13686.00 frames. ], tot_loss[loss=0.06784, simple_loss=0.09197, pruned_loss=0.01294, audio_tagging_loss=0.008912, over 3048456.37 frames. ], batch size: 55, lr: 1.73e-03, grad_scale: 32.0 2023-11-25 22:30:37,824 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=8.18 vs. limit=10.0 2023-11-25 22:30:40,715 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-25 22:30:45,935 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=3092486.6666666665, ans=0.0 2023-11-25 22:30:54,940 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=3092553.3333333335, ans=0.0 2023-11-25 22:30:56,082 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.min_positive, batch_count=3092553.3333333335, ans=0.05 2023-11-25 22:30:58,398 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.370e+01 8.697e+01 9.205e+01 9.794e+01 1.442e+02, threshold=1.841e+02, percent-clipped=0.0 2023-11-25 22:30:58,652 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=3092553.3333333335, ans=0.0 2023-11-25 22:31:09,782 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 463900 2023-11-25 22:31:14,232 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=3092686.6666666665, ans=0.0 2023-11-25 22:31:15,081 INFO [train_asr.py:1235] (3/4) Epoch 39, batch 7000, loss[loss=0.08846, simple_loss=0.119, pruned_loss=0.02145, audio_tagging_loss=0.007485, over 15226.00 frames. ], tot_loss[loss=0.06785, simple_loss=0.09198, pruned_loss=0.01292, audio_tagging_loss=0.00894, over 3047645.54 frames. ], batch size: 55, lr: 1.73e-03, grad_scale: 32.0 2023-11-25 22:31:16,410 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3092686.6666666665, ans=0.125 2023-11-25 22:31:35,550 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=3092820.0, ans=0.2 2023-11-25 22:32:04,326 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 463950 2023-11-25 22:32:09,447 INFO [train_asr.py:1235] (3/4) Epoch 39, batch 7050, loss[loss=0.06327, simple_loss=0.08783, pruned_loss=0.01294, audio_tagging_loss=0.006414, over 17198.00 frames. ], tot_loss[loss=0.06735, simple_loss=0.09098, pruned_loss=0.01286, audio_tagging_loss=0.009001, over 3053088.95 frames. 
], batch size: 66, lr: 1.73e-03, grad_scale: 32.0 2023-11-25 22:32:10,721 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=3093020.0, ans=0.0 2023-11-25 22:32:11,070 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.25 vs. limit=15.0 2023-11-25 22:32:17,106 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=3093020.0, ans=0.0 2023-11-25 22:32:28,513 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-25 22:32:29,478 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.min_positive, batch_count=3093086.6666666665, ans=0.025 2023-11-25 22:32:35,774 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=3093153.3333333335, ans=0.0 2023-11-25 22:32:48,077 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.379e+01 8.460e+01 9.019e+01 9.979e+01 1.338e+02, threshold=1.804e+02, percent-clipped=0.0 2023-11-25 22:32:58,723 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 464000 2023-11-25 22:33:05,086 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=3093286.6666666665, ans=0.125 2023-11-25 22:33:07,454 INFO [train_asr.py:1235] (3/4) Epoch 39, batch 7100, loss[loss=0.07126, simple_loss=0.09775, pruned_loss=0.01359, audio_tagging_loss=0.008795, over 15132.00 frames. ], tot_loss[loss=0.06768, simple_loss=0.09141, pruned_loss=0.01292, audio_tagging_loss=0.009056, over 3058969.76 frames. ], batch size: 55, lr: 1.73e-03, grad_scale: 32.0 2023-11-25 22:33:57,200 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 464050 2023-11-25 22:33:57,332 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3093620.0, ans=0.0 2023-11-25 22:34:02,475 INFO [train_asr.py:1235] (3/4) Epoch 39, batch 7150, loss[loss=0.06354, simple_loss=0.08777, pruned_loss=0.01028, audio_tagging_loss=0.009371, over 14819.00 frames. ], tot_loss[loss=0.06754, simple_loss=0.09116, pruned_loss=0.01285, audio_tagging_loss=0.009111, over 3064631.41 frames. ], batch size: 55, lr: 1.73e-03, grad_scale: 32.0 2023-11-25 22:34:15,758 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=3.49 vs. 
limit=15.0 2023-11-25 22:34:18,507 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=3093753.3333333335, ans=0.0 2023-11-25 22:34:18,527 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-11-25 22:34:28,704 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.out_whiten.whitening_limit, batch_count=3093820.0, ans=15.0 2023-11-25 22:34:30,937 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=3093820.0, ans=0.04949747468305833 2023-11-25 22:34:40,749 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.230e+01 8.669e+01 9.271e+01 1.002e+02 1.351e+02, threshold=1.854e+02, percent-clipped=0.0 2023-11-25 22:34:46,317 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=3093953.3333333335, ans=0.0 2023-11-25 22:34:51,297 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 464100 2023-11-25 22:34:56,591 INFO [train_asr.py:1235] (3/4) Epoch 39, batch 7200, loss[loss=0.06589, simple_loss=0.08723, pruned_loss=0.009668, audio_tagging_loss=0.01261, over 15525.00 frames. ], tot_loss[loss=0.06758, simple_loss=0.09122, pruned_loss=0.01282, audio_tagging_loss=0.009153, over 3060402.83 frames. ], batch size: 58, lr: 1.73e-03, grad_scale: 32.0 2023-11-25 22:35:06,559 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=8.84 vs. limit=15.0 2023-11-25 22:35:23,095 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=3094153.3333333335, ans=0.0 2023-11-25 22:35:32,351 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=3094220.0, ans=0.0 2023-11-25 22:35:45,898 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 464150 2023-11-25 22:35:50,319 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3094353.3333333335, ans=0.125 2023-11-25 22:35:51,592 INFO [train_asr.py:1235] (3/4) Epoch 39, batch 7250, loss[loss=0.05382, simple_loss=0.06437, pruned_loss=0.00882, audio_tagging_loss=0.01282, over 14949.00 frames. ], tot_loss[loss=0.06789, simple_loss=0.0915, pruned_loss=0.0129, audio_tagging_loss=0.009238, over 3049767.35 frames. 
], batch size: 58, lr: 1.73e-03, grad_scale: 16.0 2023-11-25 22:36:20,894 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=3094486.6666666665, ans=0.125 2023-11-25 22:36:29,323 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=3094553.3333333335, ans=0.0 2023-11-25 22:36:31,149 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.617e+01 8.827e+01 9.307e+01 1.005e+02 1.461e+02, threshold=1.861e+02, percent-clipped=0.0 2023-11-25 22:36:41,820 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3094620.0, ans=0.125 2023-11-25 22:36:42,602 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 464200 2023-11-25 22:36:46,174 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=3094620.0, ans=0.2 2023-11-25 22:36:46,419 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=512, metric=17.71 vs. limit=22.5 2023-11-25 22:36:48,016 INFO [train_asr.py:1235] (3/4) Epoch 39, batch 7300, loss[loss=0.08737, simple_loss=0.1152, pruned_loss=0.02132, audio_tagging_loss=0.00846, over 16753.00 frames. ], tot_loss[loss=0.0677, simple_loss=0.09148, pruned_loss=0.01284, audio_tagging_loss=0.009125, over 3051066.41 frames. ], batch size: 62, lr: 1.73e-03, grad_scale: 16.0 2023-11-25 22:36:57,768 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=5.26 vs. limit=15.0 2023-11-25 22:37:21,970 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3094886.6666666665, ans=0.125 2023-11-25 22:37:35,510 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=512, metric=4.18 vs. limit=15.0 2023-11-25 22:37:37,117 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 464250 2023-11-25 22:37:42,318 INFO [train_asr.py:1235] (3/4) Epoch 39, batch 7350, loss[loss=0.05864, simple_loss=0.08473, pruned_loss=0.008559, audio_tagging_loss=0.007713, over 14823.00 frames. ], tot_loss[loss=0.06717, simple_loss=0.09073, pruned_loss=0.01279, audio_tagging_loss=0.009013, over 3049628.72 frames. ], batch size: 56, lr: 1.73e-03, grad_scale: 8.0 2023-11-25 22:37:43,607 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=3095020.0, ans=0.0 2023-11-25 22:37:50,047 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=6.91 vs. 
limit=15.0 2023-11-25 22:37:53,950 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=3095086.6666666665, ans=0.0 2023-11-25 22:37:55,054 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=3095086.6666666665, ans=0.125 2023-11-25 22:37:58,803 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=3095086.6666666665, ans=0.125 2023-11-25 22:38:05,571 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=3095153.3333333335, ans=0.125 2023-11-25 22:38:23,146 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 6.376e+01 8.699e+01 9.551e+01 1.020e+02 2.458e+02, threshold=1.910e+02, percent-clipped=1.0 2023-11-25 22:38:31,657 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 464300 2023-11-25 22:38:36,925 INFO [train_asr.py:1235] (3/4) Epoch 39, batch 7400, loss[loss=0.06562, simple_loss=0.08941, pruned_loss=0.01206, audio_tagging_loss=0.008849, over 14305.00 frames. ], tot_loss[loss=0.06723, simple_loss=0.09087, pruned_loss=0.01286, audio_tagging_loss=0.008932, over 3045721.28 frames. ], batch size: 52, lr: 1.73e-03, grad_scale: 8.0 2023-11-25 22:38:54,458 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=3095420.0, ans=0.0 2023-11-25 22:38:59,769 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=3095486.6666666665, ans=0.125 2023-11-25 22:39:18,854 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=6.70 vs. limit=10.0 2023-11-25 22:39:19,631 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=3095620.0, ans=0.125 2023-11-25 22:39:24,883 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=3095620.0, ans=0.125 2023-11-25 22:39:26,756 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 464350 2023-11-25 22:39:32,920 INFO [train_asr.py:1235] (3/4) Epoch 39, batch 7450, loss[loss=0.0745, simple_loss=0.1029, pruned_loss=0.01709, audio_tagging_loss=0.00597, over 16216.00 frames. ], tot_loss[loss=0.06681, simple_loss=0.09022, pruned_loss=0.01276, audio_tagging_loss=0.008944, over 3043524.09 frames. ], batch size: 59, lr: 1.73e-03, grad_scale: 8.0 2023-11-25 22:39:51,070 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=4.28 vs. 
limit=6.0 2023-11-25 22:39:57,009 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=3095820.0, ans=0.125 2023-11-25 22:40:13,439 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 6.982e+01 8.803e+01 9.393e+01 1.013e+02 1.307e+02, threshold=1.879e+02, percent-clipped=0.0 2023-11-25 22:40:15,840 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=3095953.3333333335, ans=0.125 2023-11-25 22:40:21,927 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 464400 2023-11-25 22:40:25,959 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.out_whiten.whitening_limit, batch_count=3095953.3333333335, ans=15.0 2023-11-25 22:40:27,441 INFO [train_asr.py:1235] (3/4) Epoch 39, batch 7500, loss[loss=0.0486, simple_loss=0.05816, pruned_loss=0.008248, audio_tagging_loss=0.01128, over 14815.00 frames. ], tot_loss[loss=0.06719, simple_loss=0.09086, pruned_loss=0.01288, audio_tagging_loss=0.008879, over 3050249.31 frames. ], batch size: 57, lr: 1.73e-03, grad_scale: 8.0 2023-11-25 22:40:28,602 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=3096020.0, ans=0.125 2023-11-25 22:40:39,282 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=3096086.6666666665, ans=0.0 2023-11-25 22:40:41,831 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=4.81 vs. limit=12.0 2023-11-25 22:40:45,231 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3096086.6666666665, ans=0.0 2023-11-25 22:40:48,927 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=3096153.3333333335, ans=0.0 2023-11-25 22:40:58,535 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.whiten, num_groups=1, num_channels=512, metric=9.31 vs. limit=12.0 2023-11-25 22:41:03,720 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=11.07 vs. limit=15.0 2023-11-25 22:41:05,529 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=3096220.0, ans=0.09899494936611666 2023-11-25 22:41:10,143 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=7.13 vs. limit=15.0 2023-11-25 22:41:16,962 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 464450 2023-11-25 22:41:22,276 INFO [train_asr.py:1235] (3/4) Epoch 39, batch 7550, loss[loss=0.0502, simple_loss=0.06799, pruned_loss=0.007902, audio_tagging_loss=0.008302, over 15471.00 frames. ], tot_loss[loss=0.06648, simple_loss=0.0897, pruned_loss=0.01278, audio_tagging_loss=0.00886, over 3046773.96 frames. 
], batch size: 60, lr: 1.73e-03, grad_scale: 8.0 2023-11-25 22:41:28,788 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3096353.3333333335, ans=0.125 2023-11-25 22:41:35,458 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3096420.0, ans=0.125 2023-11-25 22:41:36,467 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3096420.0, ans=0.125 2023-11-25 22:41:37,026 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=7.12 vs. limit=15.0 2023-11-25 22:41:59,040 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3096553.3333333335, ans=0.125 2023-11-25 22:42:02,990 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 6.821e+01 8.728e+01 9.410e+01 1.018e+02 1.180e+02, threshold=1.882e+02, percent-clipped=0.0 2023-11-25 22:42:12,504 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 464500 2023-11-25 22:42:12,617 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3096620.0, ans=0.1 2023-11-25 22:42:18,134 INFO [train_asr.py:1235] (3/4) Epoch 39, batch 7600, loss[loss=0.04308, simple_loss=0.05576, pruned_loss=0.005431, audio_tagging_loss=0.00977, over 15560.00 frames. ], tot_loss[loss=0.06593, simple_loss=0.08867, pruned_loss=0.0126, audio_tagging_loss=0.009001, over 3050445.92 frames. ], batch size: 61, lr: 1.73e-03, grad_scale: 16.0 2023-11-25 22:42:43,173 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.min_positive, batch_count=3096820.0, ans=0.025 2023-11-25 22:42:57,290 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=1.93 vs. limit=6.0 2023-11-25 22:43:07,187 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=6.63 vs. limit=15.0 2023-11-25 22:43:07,865 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 464550 2023-11-25 22:43:13,167 INFO [train_asr.py:1235] (3/4) Epoch 39, batch 7650, loss[loss=0.06303, simple_loss=0.08556, pruned_loss=0.01084, audio_tagging_loss=0.009412, over 14758.00 frames. ], tot_loss[loss=0.06627, simple_loss=0.08921, pruned_loss=0.01269, audio_tagging_loss=0.008977, over 3049608.39 frames. 
], batch size: 56, lr: 1.73e-03, grad_scale: 16.0 2023-11-25 22:43:17,597 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=3097020.0, ans=0.125 2023-11-25 22:43:23,848 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=3097086.6666666665, ans=0.0 2023-11-25 22:43:23,912 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3097086.6666666665, ans=0.125 2023-11-25 22:43:25,950 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=3097086.6666666665, ans=0.0 2023-11-25 22:43:55,108 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.389e+01 8.616e+01 9.118e+01 9.857e+01 1.270e+02, threshold=1.824e+02, percent-clipped=0.0 2023-11-25 22:43:57,501 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=3097286.6666666665, ans=0.0 2023-11-25 22:43:57,715 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=7.71 vs. limit=12.0 2023-11-25 22:43:58,505 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=3097286.6666666665, ans=0.0 2023-11-25 22:44:02,516 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 464600 2023-11-25 22:44:02,678 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=3097286.6666666665, ans=0.125 2023-11-25 22:44:08,190 INFO [train_asr.py:1235] (3/4) Epoch 39, batch 7700, loss[loss=0.06227, simple_loss=0.08987, pruned_loss=0.008308, audio_tagging_loss=0.009027, over 13368.00 frames. ], tot_loss[loss=0.06582, simple_loss=0.08869, pruned_loss=0.01243, audio_tagging_loss=0.009046, over 3039590.04 frames. ], batch size: 52, lr: 1.73e-03, grad_scale: 8.0 2023-11-25 22:44:12,672 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=3097353.3333333335, ans=0.07 2023-11-25 22:44:34,563 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=3097486.6666666665, ans=0.2 2023-11-25 22:44:44,359 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=9.36 vs. limit=15.0 2023-11-25 22:44:49,382 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3097553.3333333335, ans=0.125 2023-11-25 22:44:57,412 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=3097620.0, ans=0.2 2023-11-25 22:44:58,687 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 464650 2023-11-25 22:45:04,316 INFO [train_asr.py:1235] (3/4) Epoch 39, batch 7750, loss[loss=0.08506, simple_loss=0.1114, pruned_loss=0.01737, audio_tagging_loss=0.01197, over 15661.00 frames. ], tot_loss[loss=0.06641, simple_loss=0.08961, pruned_loss=0.01252, audio_tagging_loss=0.009086, over 3043874.41 frames. 
], batch size: 56, lr: 1.73e-03, grad_scale: 8.0 2023-11-25 22:45:08,753 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=3097686.6666666665, ans=0.2 2023-11-25 22:45:10,862 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=3097686.6666666665, ans=0.0 2023-11-25 22:45:39,262 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=13.21 vs. limit=15.0 2023-11-25 22:45:40,701 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=3097886.6666666665, ans=0.125 2023-11-25 22:45:43,739 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3097886.6666666665, ans=0.125 2023-11-25 22:45:46,218 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.373e+01 8.755e+01 9.240e+01 9.987e+01 1.306e+02, threshold=1.848e+02, percent-clipped=0.0 2023-11-25 22:45:53,706 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 464700 2023-11-25 22:45:53,961 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3097953.3333333335, ans=0.125 2023-11-25 22:45:56,965 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-25 22:45:59,384 INFO [train_asr.py:1235] (3/4) Epoch 39, batch 7800, loss[loss=0.05172, simple_loss=0.06512, pruned_loss=0.01123, audio_tagging_loss=0.007939, over 14196.00 frames. ], tot_loss[loss=0.06644, simple_loss=0.08984, pruned_loss=0.01254, audio_tagging_loss=0.008981, over 3036942.40 frames. ], batch size: 55, lr: 1.73e-03, grad_scale: 8.0 2023-11-25 22:46:09,519 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=4.84 vs. limit=12.0 2023-11-25 22:46:21,442 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=512, metric=4.47 vs. limit=15.0 2023-11-25 22:46:23,795 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=3098153.3333333335, ans=0.0 2023-11-25 22:46:23,806 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-11-25 22:46:25,123 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=16.00 vs. 
limit=22.5 2023-11-25 22:46:43,809 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=3098286.6666666665, ans=0.04949747468305833 2023-11-25 22:46:45,781 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3098286.6666666665, ans=0.125 2023-11-25 22:46:48,783 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 464750 2023-11-25 22:46:50,079 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3098286.6666666665, ans=0.1 2023-11-25 22:46:52,134 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3098286.6666666665, ans=0.125 2023-11-25 22:46:54,005 INFO [train_asr.py:1235] (3/4) Epoch 39, batch 7850, loss[loss=0.0724, simple_loss=0.1076, pruned_loss=0.01189, audio_tagging_loss=0.006693, over 15288.00 frames. ], tot_loss[loss=0.0665, simple_loss=0.08973, pruned_loss=0.01259, audio_tagging_loss=0.009043, over 3035321.45 frames. ], batch size: 55, lr: 1.73e-03, grad_scale: 8.0 2023-11-25 22:46:54,482 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.80 vs. limit=6.0 2023-11-25 22:47:01,468 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3098353.3333333335, ans=0.125 2023-11-25 22:47:35,821 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.757e+01 8.791e+01 9.341e+01 1.014e+02 1.334e+02, threshold=1.868e+02, percent-clipped=0.0 2023-11-25 22:47:43,167 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 464800 2023-11-25 22:47:49,381 INFO [train_asr.py:1235] (3/4) Epoch 39, batch 7900, loss[loss=0.0643, simple_loss=0.08913, pruned_loss=0.01113, audio_tagging_loss=0.008613, over 15281.00 frames. ], tot_loss[loss=0.06673, simple_loss=0.08999, pruned_loss=0.01262, audio_tagging_loss=0.009114, over 3036986.19 frames. ], batch size: 57, lr: 1.73e-03, grad_scale: 8.0 2023-11-25 22:47:54,267 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=3098686.6666666665, ans=0.0 2023-11-25 22:48:06,870 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=3098753.3333333335, ans=0.2 2023-11-25 22:48:12,618 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=3098820.0, ans=0.0 2023-11-25 22:48:18,846 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=3098820.0, ans=0.0 2023-11-25 22:48:39,184 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 464850 2023-11-25 22:48:44,373 INFO [train_asr.py:1235] (3/4) Epoch 39, batch 7950, loss[loss=0.06639, simple_loss=0.0924, pruned_loss=0.01287, audio_tagging_loss=0.007316, over 15797.00 frames. ], tot_loss[loss=0.06687, simple_loss=0.09015, pruned_loss=0.01265, audio_tagging_loss=0.009145, over 3036276.09 frames. 
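The scaling.py:1022 Whitening records compare a whitening metric of a module's activations against a limit (6.0, 12.0, 15.0, 22.5 above); only when the metric exceeds the limit does the Whiten module apply a corrective gradient pushing the feature covariance toward a multiple of the identity. One plausible formulation of such a metric, normalized so that perfectly white features score about 1.0 (treat the exact expression as an assumption; the real definition lives in icefall's scaling.py):

```python
import torch

def whitening_metric(x: torch.Tensor) -> torch.Tensor:
    """Sketch of a whitening metric for features x of shape (frames, channels).

    Returns ~1.0 when the feature covariance is proportional to the identity
    and grows as the energy concentrates in few directions. Illustrative
    formulation, not necessarily icefall's exact one.
    """
    x = x - x.mean(dim=0, keepdim=True)
    cov = x.t() @ x                                 # unnormalized covariance (c, c)
    c = cov.shape[0]
    mean_diag = cov.diagonal().mean()
    return (cov ** 2).sum() / (c * mean_diag ** 2 + 1e-20)

white = torch.randn(10_000, 256)                    # near-white features -> metric ~1
peaky = torch.randn(10_000, 1) * torch.ones(1, 256) # rank-1 features -> metric ~256
assert whitening_metric(white).item() < 1.5
assert whitening_metric(peaky).item() > 15.0        # would trip e.g. limit=15.0
```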
], batch size: 57, lr: 1.73e-03, grad_scale: 8.0 2023-11-25 22:48:55,589 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3099086.6666666665, ans=0.125 2023-11-25 22:48:58,463 WARNING [train_asr.py:1481] (3/4) Exclude cut with ID unbalanced/uQjH4tNUZ_g_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-25 22:49:00,860 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=3099086.6666666665, ans=0.0 2023-11-25 22:49:16,269 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.whiten_keys.whitening_limit, batch_count=3099153.3333333335, ans=6.0 2023-11-25 22:49:20,728 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=11.50 vs. limit=15.0 2023-11-25 22:49:21,157 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=3099220.0, ans=0.0 2023-11-25 22:49:26,281 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.637e+01 8.696e+01 9.334e+01 1.006e+02 1.500e+02, threshold=1.867e+02, percent-clipped=0.0 2023-11-25 22:49:26,608 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=3099220.0, ans=0.04949747468305833 2023-11-25 22:49:34,209 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 464900 2023-11-25 22:49:39,391 INFO [train_asr.py:1235] (3/4) Epoch 39, batch 8000, loss[loss=0.04811, simple_loss=0.06023, pruned_loss=0.006974, audio_tagging_loss=0.01102, over 13800.00 frames. ], tot_loss[loss=0.06678, simple_loss=0.09002, pruned_loss=0.0126, audio_tagging_loss=0.009171, over 3038991.06 frames. ], batch size: 54, lr: 1.73e-03, grad_scale: 16.0 2023-11-25 22:49:41,911 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.28 vs. limit=6.0 2023-11-25 22:49:49,123 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=3099420.0, ans=0.2 2023-11-25 22:49:52,991 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=3099420.0, ans=0.125 2023-11-25 22:49:57,925 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3099420.0, ans=0.1 2023-11-25 22:50:01,512 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=4.27 vs. 
limit=12.0 2023-11-25 22:50:10,443 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=3099486.6666666665, ans=0.125 2023-11-25 22:50:28,790 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 464950 2023-11-25 22:50:34,225 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=3099686.6666666665, ans=0.125 2023-11-25 22:50:34,985 INFO [train_asr.py:1235] (3/4) Epoch 39, batch 8050, loss[loss=0.05036, simple_loss=0.0616, pruned_loss=0.005637, audio_tagging_loss=0.01393, over 15908.00 frames. ], tot_loss[loss=0.06663, simple_loss=0.08976, pruned_loss=0.01255, audio_tagging_loss=0.009198, over 3039896.14 frames. ], batch size: 61, lr: 1.73e-03, grad_scale: 16.0 2023-11-25 22:50:44,567 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=8.08 vs. limit=15.0 2023-11-25 22:50:48,216 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=3099753.3333333335, ans=0.2 2023-11-25 22:51:16,987 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.196e+01 8.622e+01 9.226e+01 9.839e+01 1.205e+02, threshold=1.845e+02, percent-clipped=0.0 2023-11-25 22:51:19,273 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=3099953.3333333335, ans=0.0 2023-11-25 22:51:24,928 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 465000 2023-11-25 22:51:27,341 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.min_positive, batch_count=3099953.3333333335, ans=0.05 2023-11-25 22:51:30,379 INFO [train_asr.py:1235] (3/4) Epoch 39, batch 8100, loss[loss=0.046, simple_loss=0.05993, pruned_loss=0.005699, audio_tagging_loss=0.01033, over 15789.00 frames. ], tot_loss[loss=0.06675, simple_loss=0.08984, pruned_loss=0.01266, audio_tagging_loss=0.009176, over 3040488.94 frames. ], batch size: 59, lr: 1.73e-03, grad_scale: 16.0 2023-11-25 22:51:43,453 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=512, metric=9.38 vs. limit=15.0 2023-11-25 22:51:57,028 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=10.02 vs. limit=15.0 2023-11-25 22:52:14,753 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=12.05 vs. limit=15.0 2023-11-25 22:52:19,613 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 465050 2023-11-25 22:52:24,811 INFO [train_asr.py:1235] (3/4) Epoch 39, batch 8150, loss[loss=0.06531, simple_loss=0.08741, pruned_loss=0.0118, audio_tagging_loss=0.009805, over 15080.00 frames. ], tot_loss[loss=0.06713, simple_loss=0.09068, pruned_loss=0.01281, audio_tagging_loss=0.008981, over 3041438.71 frames. ], batch size: 57, lr: 1.73e-03, grad_scale: 16.0 2023-11-25 22:52:25,034 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-25 22:52:25,370 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=18.54 vs. 
limit=22.5 2023-11-25 22:52:31,770 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer_ff2.min_abs, batch_count=3100353.3333333335, ans=0.1 2023-11-25 22:52:33,971 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=3100353.3333333335, ans=0.125 2023-11-25 22:52:37,117 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=3100420.0, ans=0.0 2023-11-25 22:52:42,262 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3100420.0, ans=0.125 2023-11-25 22:53:06,745 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.378e+01 8.506e+01 9.069e+01 1.015e+02 1.632e+02, threshold=1.814e+02, percent-clipped=0.0 2023-11-25 22:53:11,236 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=3100620.0, ans=0.125 2023-11-25 22:53:13,883 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=3100620.0, ans=0.2 2023-11-25 22:53:14,751 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 465100 2023-11-25 22:53:20,588 INFO [train_asr.py:1235] (3/4) Epoch 39, batch 8200, loss[loss=0.03938, simple_loss=0.05035, pruned_loss=0.006698, audio_tagging_loss=0.00751, over 15417.00 frames. ], tot_loss[loss=0.06683, simple_loss=0.09038, pruned_loss=0.01265, audio_tagging_loss=0.008995, over 3044416.83 frames. ], batch size: 60, lr: 1.73e-03, grad_scale: 16.0 2023-11-25 22:53:23,173 WARNING [train_asr.py:1481] (3/4) Exclude cut with ID unbalanced/8C7biyx9TQ4_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-25 22:53:33,447 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=3100753.3333333335, ans=0.0 2023-11-25 22:53:42,234 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=9.50 vs. 
limit=15.0 2023-11-25 22:53:44,809 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=3100820.0, ans=0.125 2023-11-25 22:53:47,863 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=3100820.0, ans=0.125 2023-11-25 22:53:51,056 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-25 22:53:55,842 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=3100886.6666666665, ans=0.0 2023-11-25 22:54:02,561 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3100886.6666666665, ans=0.125 2023-11-25 22:54:11,016 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 465150 2023-11-25 22:54:12,158 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3100953.3333333335, ans=0.125 2023-11-25 22:54:16,188 INFO [train_asr.py:1235] (3/4) Epoch 39, batch 8250, loss[loss=0.0647, simple_loss=0.09038, pruned_loss=0.01235, audio_tagging_loss=0.00716, over 16134.00 frames. ], tot_loss[loss=0.06669, simple_loss=0.09033, pruned_loss=0.0126, audio_tagging_loss=0.008924, over 3046837.03 frames. ], batch size: 60, lr: 1.73e-03, grad_scale: 8.0 2023-11-25 22:54:20,080 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.1.self_attn1.whiten, num_groups=1, num_channels=192, metric=11.09 vs. limit=22.5 2023-11-25 22:54:20,984 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=18.99 vs. limit=22.5 2023-11-25 22:54:58,439 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.486e+01 8.606e+01 9.259e+01 1.021e+02 1.240e+02, threshold=1.852e+02, percent-clipped=0.0 2023-11-25 22:55:04,731 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 465200 2023-11-25 22:55:10,185 INFO [train_asr.py:1235] (3/4) Epoch 39, batch 8300, loss[loss=0.07578, simple_loss=0.09964, pruned_loss=0.01486, audio_tagging_loss=0.0111, over 15534.00 frames. ], tot_loss[loss=0.06626, simple_loss=0.0897, pruned_loss=0.01239, audio_tagging_loss=0.009017, over 3052317.83 frames. ], batch size: 58, lr: 1.73e-03, grad_scale: 8.0 2023-11-25 22:55:15,654 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3101353.3333333335, ans=0.125 2023-11-25 22:55:21,280 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.max_abs, batch_count=3101420.0, ans=10.0 2023-11-25 22:55:43,236 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3101553.3333333335, ans=0.125 2023-11-25 22:55:44,398 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=3101553.3333333335, ans=0.0 2023-11-25 22:55:46,535 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=3101553.3333333335, ans=0.2 2023-11-25 22:55:49,913 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=18.88 vs. 
limit=22.5 2023-11-25 22:55:58,982 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 465250 2023-11-25 22:56:04,593 INFO [train_asr.py:1235] (3/4) Epoch 39, batch 8350, loss[loss=0.07847, simple_loss=0.1052, pruned_loss=0.01712, audio_tagging_loss=0.008753, over 15075.00 frames. ], tot_loss[loss=0.06625, simple_loss=0.08941, pruned_loss=0.01244, audio_tagging_loss=0.009113, over 3046676.54 frames. ], batch size: 57, lr: 1.73e-03, grad_scale: 8.0 2023-11-25 22:56:11,634 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3101686.6666666665, ans=0.1 2023-11-25 22:56:14,222 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3101686.6666666665, ans=0.1 2023-11-25 22:56:46,889 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.088e+01 8.512e+01 9.293e+01 1.012e+02 1.242e+02, threshold=1.859e+02, percent-clipped=0.0 2023-11-25 22:56:54,224 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 465300 2023-11-25 22:56:59,880 INFO [train_asr.py:1235] (3/4) Epoch 39, batch 8400, loss[loss=0.05729, simple_loss=0.08417, pruned_loss=0.006538, audio_tagging_loss=0.008667, over 14151.00 frames. ], tot_loss[loss=0.06615, simple_loss=0.08939, pruned_loss=0.01248, audio_tagging_loss=0.008977, over 3041122.46 frames. ], batch size: 53, lr: 1.73e-03, grad_scale: 16.0 2023-11-25 22:57:03,532 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=10.55 vs. limit=15.0 2023-11-25 22:57:03,614 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=7.09 vs. limit=15.0 2023-11-25 22:57:20,940 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=3102153.3333333335, ans=0.125 2023-11-25 22:57:24,121 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=3102153.3333333335, ans=0.0 2023-11-25 22:57:28,273 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3102153.3333333335, ans=0.125 2023-11-25 22:57:46,538 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3102286.6666666665, ans=0.1 2023-11-25 22:57:48,445 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 465350 2023-11-25 22:57:53,639 INFO [train_asr.py:1235] (3/4) Epoch 39, batch 8450, loss[loss=0.07733, simple_loss=0.1035, pruned_loss=0.016, audio_tagging_loss=0.009586, over 16047.00 frames. ], tot_loss[loss=0.06644, simple_loss=0.09009, pruned_loss=0.01254, audio_tagging_loss=0.008859, over 3042723.95 frames. ], batch size: 58, lr: 1.73e-03, grad_scale: 16.0 2023-11-25 22:57:54,919 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-25 22:58:17,800 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=3102486.6666666665, ans=0.07 2023-11-25 22:58:27,226 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.1.self_attn1.whiten, num_groups=1, num_channels=192, metric=11.84 vs. 
limit=22.5 2023-11-25 22:58:33,052 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3102553.3333333335, ans=0.1 2023-11-25 22:58:35,876 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.363e+01 8.915e+01 9.393e+01 9.975e+01 1.301e+02, threshold=1.879e+02, percent-clipped=0.0 2023-11-25 22:58:39,601 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=512, metric=19.13 vs. limit=22.5 2023-11-25 22:58:40,319 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=3102620.0, ans=0.0 2023-11-25 22:58:41,389 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=3102620.0, ans=0.0 2023-11-25 22:58:42,280 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 465400 2023-11-25 22:58:47,822 INFO [train_asr.py:1235] (3/4) Epoch 39, batch 8500, loss[loss=0.05186, simple_loss=0.0711, pruned_loss=0.007653, audio_tagging_loss=0.008654, over 14809.00 frames. ], tot_loss[loss=0.06688, simple_loss=0.09045, pruned_loss=0.01274, audio_tagging_loss=0.008922, over 3039898.28 frames. ], batch size: 57, lr: 1.73e-03, grad_scale: 16.0 2023-11-25 22:58:55,364 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-25 22:59:16,719 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3102820.0, ans=0.1 2023-11-25 22:59:22,286 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=14.90 vs. limit=22.5 2023-11-25 22:59:24,208 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=3102886.6666666665, ans=0.2 2023-11-25 22:59:36,652 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=5.62 vs. limit=10.0 2023-11-25 22:59:37,806 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 465450 2023-11-25 22:59:43,605 INFO [train_asr.py:1235] (3/4) Epoch 39, batch 8550, loss[loss=0.06976, simple_loss=0.09585, pruned_loss=0.01288, audio_tagging_loss=0.008962, over 15462.00 frames. ], tot_loss[loss=0.06682, simple_loss=0.09054, pruned_loss=0.01261, audio_tagging_loss=0.008941, over 3049022.10 frames. ], batch size: 59, lr: 1.73e-03, grad_scale: 16.0 2023-11-25 22:59:46,270 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=2.74 vs. 
limit=15.0 2023-11-25 22:59:53,070 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3103086.6666666665, ans=0.125 2023-11-25 22:59:58,230 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3103086.6666666665, ans=0.125 2023-11-25 23:00:11,808 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=3103153.3333333335, ans=0.125 2023-11-25 23:00:24,115 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3103220.0, ans=0.1 2023-11-25 23:00:25,865 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.126e+01 8.599e+01 9.050e+01 9.776e+01 1.276e+02, threshold=1.810e+02, percent-clipped=0.0 2023-11-25 23:00:27,217 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3103286.6666666665, ans=0.125 2023-11-25 23:00:31,368 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=3103286.6666666665, ans=0.125 2023-11-25 23:00:32,221 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 465500 2023-11-25 23:00:37,359 INFO [train_asr.py:1235] (3/4) Epoch 39, batch 8600, loss[loss=0.09044, simple_loss=0.1185, pruned_loss=0.02388, audio_tagging_loss=0.007316, over 15603.00 frames. ], tot_loss[loss=0.06681, simple_loss=0.09033, pruned_loss=0.01269, audio_tagging_loss=0.008957, over 3050334.78 frames. ], batch size: 55, lr: 1.73e-03, grad_scale: 16.0 2023-11-25 23:01:00,876 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=3103486.6666666665, ans=0.0 2023-11-25 23:01:07,522 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=17.50 vs. limit=22.5 2023-11-25 23:01:26,018 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 465550 2023-11-25 23:01:31,131 INFO [train_asr.py:1235] (3/4) Epoch 39, batch 8650, loss[loss=0.07444, simple_loss=0.09309, pruned_loss=0.01493, audio_tagging_loss=0.01296, over 16054.00 frames. ], tot_loss[loss=0.0672, simple_loss=0.09087, pruned_loss=0.01283, audio_tagging_loss=0.008931, over 3047295.63 frames. ], batch size: 62, lr: 1.73e-03, grad_scale: 16.0 2023-11-25 23:01:39,203 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=3103686.6666666665, ans=0.2 2023-11-25 23:01:42,341 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=9.82 vs. limit=15.0 2023-11-25 23:02:13,488 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.107e+01 8.616e+01 9.272e+01 9.852e+01 1.304e+02, threshold=1.854e+02, percent-clipped=0.0 2023-11-25 23:02:20,396 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 465600 2023-11-25 23:02:23,852 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=3103953.3333333335, ans=0.2 2023-11-25 23:02:25,781 INFO [train_asr.py:1235] (3/4) Epoch 39, batch 8700, loss[loss=0.06427, simple_loss=0.08707, pruned_loss=0.01183, audio_tagging_loss=0.008901, over 16176.00 frames. 
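The WARNING records above (unbalanced/uQjH4tNUZ_g..., unbalanced/8C7biyx9TQ4...) show the filter that drops 1-second AudioSet cuts carrying placeholder transcripts: after the encoder_embed's convolutional subsampling, 100 input frames shrink to 23, one fewer than the 24 BPE tokens, and a transducer alignment needs at least one frame per token. A sketch of that check; the subsampling arithmetic below reproduces the logged 100 -> 23 but is our assumption about the exact conv geometry:

```python
def frames_after_subsampling(t: int) -> int:
    # Two conv subsampling stages; this arithmetic reproduces the
    # logged 100 -> 23, but treat it as an assumption about the recipe.
    t = (t - 7) // 2 + 1
    t = (t - 3) // 2 + 1
    return t

def keep_cut(num_frames: int, num_tokens: int) -> bool:
    # A transducer alignment needs at least one frame per token.
    return frames_after_subsampling(num_frames) >= num_tokens

assert frames_after_subsampling(100) == 23
assert not keep_cut(num_frames=100, num_tokens=24)   # excluded, as logged
```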
], tot_loss[loss=0.06691, simple_loss=0.09014, pruned_loss=0.01276, audio_tagging_loss=0.009083, over 3047635.25 frames. ], batch size: 60, lr: 1.73e-03, grad_scale: 16.0 2023-11-25 23:02:31,238 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3104020.0, ans=0.125 2023-11-25 23:02:39,568 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3104086.6666666665, ans=0.125 2023-11-25 23:02:39,630 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=3104086.6666666665, ans=0.125 2023-11-25 23:02:42,677 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=3104086.6666666665, ans=0.125 2023-11-25 23:02:46,814 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=3104153.3333333335, ans=0.2 2023-11-25 23:02:49,397 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=14.21 vs. limit=22.5 2023-11-25 23:03:12,925 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.13 vs. limit=15.0 2023-11-25 23:03:15,437 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 465650 2023-11-25 23:03:20,573 INFO [train_asr.py:1235] (3/4) Epoch 39, batch 8750, loss[loss=0.06007, simple_loss=0.0708, pruned_loss=0.01324, audio_tagging_loss=0.01143, over 15316.00 frames. ], tot_loss[loss=0.06735, simple_loss=0.09088, pruned_loss=0.01282, audio_tagging_loss=0.00909, over 3043797.81 frames. ], batch size: 58, lr: 1.73e-03, grad_scale: 16.0 2023-11-25 23:03:28,189 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3104353.3333333335, ans=0.1 2023-11-25 23:03:28,192 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=3104353.3333333335, ans=0.0 2023-11-25 23:03:28,550 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=22.87 vs. limit=22.5 2023-11-25 23:03:29,084 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=3104353.3333333335, ans=0.125 2023-11-25 23:03:33,812 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.1.whiten, num_groups=1, num_channels=192, metric=3.56 vs. 
limit=12.0 2023-11-25 23:03:36,552 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=3104420.0, ans=0.1 2023-11-25 23:03:36,587 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3104420.0, ans=0.125 2023-11-25 23:03:38,674 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3104420.0, ans=0.125 2023-11-25 23:03:45,383 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=3104486.6666666665, ans=0.125 2023-11-25 23:04:03,117 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.234e+01 8.696e+01 9.362e+01 9.858e+01 1.375e+02, threshold=1.872e+02, percent-clipped=0.0 2023-11-25 23:04:07,548 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=3104620.0, ans=0.125 2023-11-25 23:04:09,649 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 465700 2023-11-25 23:04:14,843 INFO [train_asr.py:1235] (3/4) Epoch 39, batch 8800, loss[loss=0.05818, simple_loss=0.07363, pruned_loss=0.01175, audio_tagging_loss=0.009612, over 15283.00 frames. ], tot_loss[loss=0.06796, simple_loss=0.09179, pruned_loss=0.01296, audio_tagging_loss=0.009106, over 3044028.09 frames. ], batch size: 60, lr: 1.73e-03, grad_scale: 32.0 2023-11-25 23:04:18,149 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=3104686.6666666665, ans=0.0 2023-11-25 23:04:52,831 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3104886.6666666665, ans=0.125 2023-11-25 23:05:04,363 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 465750 2023-11-25 23:05:10,609 INFO [train_asr.py:1235] (3/4) Epoch 39, batch 8850, loss[loss=0.06385, simple_loss=0.08863, pruned_loss=0.01169, audio_tagging_loss=0.007845, over 15286.00 frames. ], tot_loss[loss=0.06762, simple_loss=0.09106, pruned_loss=0.0129, audio_tagging_loss=0.009191, over 3039698.13 frames. ], batch size: 57, lr: 1.73e-03, grad_scale: 32.0 2023-11-25 23:05:18,657 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.max_abs, batch_count=3105020.0, ans=10.0 2023-11-25 23:05:23,104 WARNING [train_asr.py:1481] (3/4) Exclude cut with ID unbalanced/1Dq7QH61iXQ_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-25 23:05:28,895 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=10.66 vs. 
limit=15.0 2023-11-25 23:05:43,237 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=3105220.0, ans=0.2 2023-11-25 23:05:43,299 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=3105220.0, ans=0.0 2023-11-25 23:05:53,529 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.603e+01 8.482e+01 9.169e+01 1.001e+02 1.243e+02, threshold=1.834e+02, percent-clipped=0.0 2023-11-25 23:05:59,499 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=3105286.6666666665, ans=0.125 2023-11-25 23:06:00,499 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 465800 2023-11-25 23:06:05,125 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3105353.3333333335, ans=0.125 2023-11-25 23:06:06,440 INFO [train_asr.py:1235] (3/4) Epoch 39, batch 8900, loss[loss=0.06116, simple_loss=0.0825, pruned_loss=0.01109, audio_tagging_loss=0.008828, over 15285.00 frames. ], tot_loss[loss=0.06791, simple_loss=0.09168, pruned_loss=0.01302, audio_tagging_loss=0.009052, over 3050971.02 frames. ], batch size: 59, lr: 1.73e-03, grad_scale: 32.0 2023-11-25 23:06:08,683 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3105353.3333333335, ans=0.1 2023-11-25 23:06:09,782 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=3105353.3333333335, ans=0.09899494936611666 2023-11-25 23:06:12,246 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.whiten, num_groups=1, num_channels=512, metric=10.07 vs. limit=12.0 2023-11-25 23:06:13,924 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3105353.3333333335, ans=0.125 2023-11-25 23:06:26,442 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=3105486.6666666665, ans=0.0 2023-11-25 23:06:31,677 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3105486.6666666665, ans=0.0 2023-11-25 23:06:41,626 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=3105553.3333333335, ans=0.0 2023-11-25 23:06:43,624 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3105553.3333333335, ans=0.125 2023-11-25 23:06:55,567 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 465850 2023-11-25 23:07:00,854 INFO [train_asr.py:1235] (3/4) Epoch 39, batch 8950, loss[loss=0.09319, simple_loss=0.1253, pruned_loss=0.02257, audio_tagging_loss=0.007962, over 15007.00 frames. ], tot_loss[loss=0.06833, simple_loss=0.09264, pruned_loss=0.01306, audio_tagging_loss=0.008952, over 3049737.24 frames. 
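The optim.py:476 records print the 0/25/50/75/100% quantiles of recent parameter-gradient norms plus a clipping threshold; in every record in this section the threshold equals Clipping_scale (2.0) times the logged median, and percent-clipped=0.0 says no recent step exceeded it. A sketch of those diagnostics, with ScaledAdam's windowing and per-parameter details simplified away:

```python
import torch

def clipping_stats(grad_norms: torch.Tensor, clipping_scale: float = 2.0):
    """Sketch: quantiles of recent grad norms plus a median-based threshold.

    The threshold = clipping_scale * median relation matches the logged
    numbers above; everything else about ScaledAdam is simplified here.
    """
    q = torch.quantile(grad_norms, torch.tensor([0.0, 0.25, 0.5, 0.75, 1.0]))
    threshold = clipping_scale * q[2]
    percent_clipped = 100.0 * (grad_norms > threshold).float().mean()
    return q, threshold, percent_clipped

# Numbers from the record above: 7.603e+01 8.482e+01 9.169e+01 1.001e+02 1.243e+02
norms = torch.tensor([76.03, 84.82, 91.69, 100.1, 124.3])
q, thr, pct = clipping_stats(norms)
print(thr.item(), pct.item())   # ~1.834e+02 and 0.0, as logged
```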
], batch size: 56, lr: 1.73e-03, grad_scale: 32.0 2023-11-25 23:07:03,182 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=3105686.6666666665, ans=0.125 2023-11-25 23:07:04,205 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3105686.6666666665, ans=0.1 2023-11-25 23:07:14,720 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=3105753.3333333335, ans=0.125 2023-11-25 23:07:17,907 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3105753.3333333335, ans=0.1 2023-11-25 23:07:40,990 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3105886.6666666665, ans=0.125 2023-11-25 23:07:43,926 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.330e+01 8.637e+01 9.614e+01 1.032e+02 1.612e+02, threshold=1.923e+02, percent-clipped=0.0 2023-11-25 23:07:50,281 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 465900 2023-11-25 23:07:56,439 INFO [train_asr.py:1235] (3/4) Epoch 39, batch 9000, loss[loss=0.1078, simple_loss=0.154, pruned_loss=0.02211, audio_tagging_loss=0.008643, over 16382.00 frames. ], tot_loss[loss=0.06863, simple_loss=0.09329, pruned_loss=0.01314, audio_tagging_loss=0.008843, over 3046491.51 frames. ], batch size: 56, lr: 1.73e-03, grad_scale: 32.0 2023-11-25 23:07:56,440 INFO [train_asr.py:1258] (3/4) Computing validation loss 2023-11-25 23:08:22,720 INFO [zipformer.py:1877] (3/4) name=encoder.encoders.2.encoder.layers.2.self_attn_weights, attn_weights_entropy = tensor([4.4577, 3.7990, 4.3370, 3.5624], device='cuda:3') 2023-11-25 23:08:28,207 INFO [train_asr.py:1267] (3/4) Epoch 39, validation: loss=0.05899, simple_loss=0.0507, pruned_loss=0.005227, audio_tagging_loss=0.02841, over 4681554.00 frames. 2023-11-25 23:08:28,208 INFO [train_asr.py:1268] (3/4) Maximum memory allocated so far is 24894MB 2023-11-25 23:08:39,296 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=3106086.6666666665, ans=0.0 2023-11-25 23:08:41,388 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=3106086.6666666665, ans=0.0 2023-11-25 23:08:42,395 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=3106086.6666666665, ans=0.0 2023-11-25 23:08:45,032 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=13.31 vs. limit=22.5 2023-11-25 23:08:55,200 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=5.85 vs. limit=15.0 2023-11-25 23:09:17,800 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 465950 2023-11-25 23:09:21,072 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=3106286.6666666665, ans=0.0 2023-11-25 23:09:22,964 INFO [train_asr.py:1235] (3/4) Epoch 39, batch 9050, loss[loss=0.06491, simple_loss=0.08141, pruned_loss=0.01496, audio_tagging_loss=0.009246, over 15491.00 frames. 
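The zipformer.py:1877 diagnostic emitted during validation prints attn_weights_entropy, one entropy value per attention head; values near log(num_keys) indicate diffuse attention, small values indicate peaky heads. A sketch of that computation, assuming row-normalized attention weights (how the real hook averages over batch and time is not visible in the log):

```python
import torch

def attn_entropy(attn_weights: torch.Tensor) -> torch.Tensor:
    """Mean entropy (nats) of attention distributions, one value per head.

    attn_weights: (num_heads, query_positions, key_positions), rows sum to 1.
    A sketch of the zipformer.py diagnostic; the exact reduction used by the
    real hook is an assumption.
    """
    eps = 1e-20
    h = -(attn_weights * (attn_weights + eps).log()).sum(dim=-1)  # (heads, queries)
    return h.mean(dim=-1)                                         # (heads,)

w = torch.softmax(torch.randn(4, 16, 128), dim=-1)
print(attn_entropy(w))  # softmax over 128 keys -> entropies a bit below log(128) ~ 4.85
```

The logged tensor([4.4577, 3.7990, 4.3370, 3.5624]) is in exactly this range: four heads, none of them collapsed onto a single key.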
], tot_loss[loss=0.06847, simple_loss=0.09299, pruned_loss=0.01312, audio_tagging_loss=0.008851, over 3044812.58 frames. ], batch size: 56, lr: 1.73e-03, grad_scale: 16.0 2023-11-25 23:09:23,697 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=7.56 vs. limit=15.0 2023-11-25 23:09:37,798 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3106420.0, ans=0.125 2023-11-25 23:09:59,875 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3106553.3333333335, ans=0.125 2023-11-25 23:10:06,230 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=3106620.0, ans=0.04949747468305833 2023-11-25 23:10:07,085 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.183e+01 8.869e+01 9.445e+01 1.003e+02 1.420e+02, threshold=1.889e+02, percent-clipped=0.0 2023-11-25 23:10:08,368 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=3106620.0, ans=0.125 2023-11-25 23:10:08,567 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=9.92 vs. limit=15.0 2023-11-25 23:10:12,404 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 466000 2023-11-25 23:10:17,918 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=3106686.6666666665, ans=0.125 2023-11-25 23:10:18,645 INFO [train_asr.py:1235] (3/4) Epoch 39, batch 9100, loss[loss=0.09387, simple_loss=0.1248, pruned_loss=0.02398, audio_tagging_loss=0.00749, over 14479.00 frames. ], tot_loss[loss=0.0685, simple_loss=0.09281, pruned_loss=0.01323, audio_tagging_loss=0.008866, over 3048268.76 frames. ], batch size: 53, lr: 1.73e-03, grad_scale: 16.0 2023-11-25 23:10:33,321 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=512, metric=2.83 vs. limit=15.0 2023-11-25 23:10:40,715 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3106820.0, ans=0.1 2023-11-25 23:10:47,224 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=3106820.0, ans=0.04949747468305833 2023-11-25 23:10:59,511 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=3106886.6666666665, ans=0.125 2023-11-25 23:11:08,264 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 466050 2023-11-25 23:11:13,469 INFO [train_asr.py:1235] (3/4) Epoch 39, batch 9150, loss[loss=0.07263, simple_loss=0.09808, pruned_loss=0.01581, audio_tagging_loss=0.007777, over 14387.00 frames. ], tot_loss[loss=0.06811, simple_loss=0.0922, pruned_loss=0.01316, audio_tagging_loss=0.008844, over 3043045.62 frames. 
], batch size: 56, lr: 1.73e-03, grad_scale: 16.0 2023-11-25 23:11:17,829 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=3107020.0, ans=0.125 2023-11-25 23:11:28,879 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3107086.6666666665, ans=0.125 2023-11-25 23:11:43,969 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=3107153.3333333335, ans=0.2 2023-11-25 23:11:48,850 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=3107220.0, ans=0.125 2023-11-25 23:11:50,918 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=3107220.0, ans=0.125 2023-11-25 23:11:56,886 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.260e+01 8.490e+01 9.148e+01 9.794e+01 1.489e+02, threshold=1.830e+02, percent-clipped=0.0 2023-11-25 23:12:02,231 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 466100 2023-11-25 23:12:07,885 INFO [train_asr.py:1235] (3/4) Epoch 39, batch 9200, loss[loss=0.06261, simple_loss=0.09047, pruned_loss=0.009009, audio_tagging_loss=0.008368, over 15218.00 frames. ], tot_loss[loss=0.0683, simple_loss=0.09269, pruned_loss=0.01321, audio_tagging_loss=0.008744, over 3048190.21 frames. ], batch size: 56, lr: 1.73e-03, grad_scale: 32.0 2023-11-25 23:12:13,513 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=3107353.3333333335, ans=0.0 2023-11-25 23:12:34,719 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=3107486.6666666665, ans=0.0 2023-11-25 23:12:55,139 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=3107620.0, ans=0.2 2023-11-25 23:12:57,035 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 466150 2023-11-25 23:13:02,444 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=3107686.6666666665, ans=0.125 2023-11-25 23:13:03,216 INFO [train_asr.py:1235] (3/4) Epoch 39, batch 9250, loss[loss=0.09439, simple_loss=0.1357, pruned_loss=0.01922, audio_tagging_loss=0.007337, over 15134.00 frames. ], tot_loss[loss=0.06849, simple_loss=0.0926, pruned_loss=0.01336, audio_tagging_loss=0.008834, over 3045361.38 frames. ], batch size: 56, lr: 1.73e-03, grad_scale: 32.0 2023-11-25 23:13:36,054 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.max_abs, batch_count=3107886.6666666665, ans=10.0 2023-11-25 23:13:37,069 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=3107886.6666666665, ans=0.0 2023-11-25 23:13:46,850 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.067e+01 8.554e+01 9.246e+01 1.012e+02 1.216e+02, threshold=1.849e+02, percent-clipped=0.0 2023-11-25 23:13:52,721 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 466200 2023-11-25 23:13:58,063 INFO [train_asr.py:1235] (3/4) Epoch 39, batch 9300, loss[loss=0.04514, simple_loss=0.0666, pruned_loss=0.00555, audio_tagging_loss=0.00629, over 14619.00 frames. 
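Every batch record carries lr: 1.73e-03. That value is consistent with icefall's Eden schedule evaluated with this run's base_lr 0.045, lr_batches 7500 and lr_epochs 3.5 at roughly epoch 38-39, batch ~466k. A sketch, with warmup omitted (warm_step is long past) and the exact epoch/batch inputs treated as assumptions:

```python
def eden_lr(base_lr: float, batch: float, epoch: float,
            lr_batches: float = 7500.0, lr_epochs: float = 3.5) -> float:
    # Eden schedule as used in icefall's zipformer recipes; the precise
    # epoch/batch arguments the trainer passes in are assumptions here.
    batch_factor = ((batch ** 2 + lr_batches ** 2) / lr_batches ** 2) ** -0.25
    epoch_factor = ((epoch ** 2 + lr_epochs ** 2) / lr_epochs ** 2) ** -0.25
    return base_lr * batch_factor * epoch_factor

lr = eden_lr(0.045, batch=466_000, epoch=38.5)
print(f"{lr:.2e}")   # ~1.7e-03, matching the logged lr: 1.73e-03
```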
], tot_loss[loss=0.06787, simple_loss=0.0916, pruned_loss=0.01321, audio_tagging_loss=0.008855, over 3041123.94 frames. ], batch size: 57, lr: 1.73e-03, grad_scale: 32.0 2023-11-25 23:14:07,988 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=512, metric=4.32 vs. limit=15.0 2023-11-25 23:14:15,544 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=3108086.6666666665, ans=0.2 2023-11-25 23:14:28,810 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=3108153.3333333335, ans=0.0 2023-11-25 23:14:31,837 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3108220.0, ans=0.125 2023-11-25 23:14:46,891 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 466250 2023-11-25 23:14:52,127 INFO [train_asr.py:1235] (3/4) Epoch 39, batch 9350, loss[loss=0.07065, simple_loss=0.1002, pruned_loss=0.01185, audio_tagging_loss=0.008684, over 15126.00 frames. ], tot_loss[loss=0.06721, simple_loss=0.09061, pruned_loss=0.01298, audio_tagging_loss=0.008917, over 3043161.66 frames. ], batch size: 57, lr: 1.73e-03, grad_scale: 16.0 2023-11-25 23:15:00,042 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=3108353.3333333335, ans=0.0 2023-11-25 23:15:02,507 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.whiten, num_groups=1, num_channels=512, metric=7.80 vs. limit=12.0 2023-11-25 23:15:06,803 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=3108420.0, ans=0.04949747468305833 2023-11-25 23:15:19,067 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=5.00 vs. limit=15.0 2023-11-25 23:15:19,142 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.39 vs. limit=6.0 2023-11-25 23:15:21,140 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=4.83 vs. limit=10.0 2023-11-25 23:15:21,842 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=3108486.6666666665, ans=0.0 2023-11-25 23:15:26,101 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.min_abs, batch_count=3108553.3333333335, ans=0.5 2023-11-25 23:15:36,369 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 6.736e+01 8.512e+01 9.083e+01 9.779e+01 1.171e+02, threshold=1.817e+02, percent-clipped=0.0 2023-11-25 23:15:36,531 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer_ff3.min_abs, batch_count=3108620.0, ans=0.2 2023-11-25 23:15:41,170 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 466300 2023-11-25 23:15:44,408 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=3108620.0, ans=0.0 2023-11-25 23:15:46,820 INFO [train_asr.py:1235] (3/4) Epoch 39, batch 9400, loss[loss=0.07262, simple_loss=0.09601, pruned_loss=0.01505, audio_tagging_loss=0.009558, over 13865.00 frames. 
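The grad_scale field hops between 8.0, 16.0 and 32.0 across the batch records in this section. That is standard torch.cuda.amp.GradScaler behavior under use_fp16 training: the scale is halved whenever a backward pass yields inf/nan gradients (that optimizer step is skipped) and doubled again after a run of clean steps. A sketch of the loop that produces the logged value; model, loss_fn and batch are placeholders, and the constructor arguments are illustrative defaults, not the recipe's:

```python
import torch

# Halved on inf/nan gradients, doubled after `growth_interval` clean steps:
# that dynamic is why the log hops between 8.0, 16.0 and 32.0.
scaler = torch.cuda.amp.GradScaler(init_scale=32.0,
                                   growth_factor=2.0,
                                   backoff_factor=0.5,
                                   growth_interval=2000)

def train_step(model, optimizer, loss_fn, batch):
    optimizer.zero_grad()
    with torch.cuda.amp.autocast(dtype=torch.float16):
        loss = loss_fn(model, batch)
    scaler.scale(loss).backward()
    scaler.step(optimizer)          # skipped internally if grads are inf/nan
    scaler.update()                 # grows or backs off the scale
    return scaler.get_scale()       # the "grad_scale" the recipe logs
```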
], tot_loss[loss=0.06676, simple_loss=0.08977, pruned_loss=0.01279, audio_tagging_loss=0.009078, over 3047370.27 frames. ], batch size: 53, lr: 1.73e-03, grad_scale: 16.0 2023-11-25 23:15:54,740 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3108686.6666666665, ans=0.0 2023-11-25 23:15:58,242 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_whiten.whitening_limit, batch_count=3108753.3333333335, ans=15.0 2023-11-25 23:16:00,067 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.min_abs, batch_count=3108753.3333333335, ans=0.5 2023-11-25 23:16:09,951 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=6.44 vs. limit=10.0 2023-11-25 23:16:25,632 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=3108886.6666666665, ans=0.0 2023-11-25 23:16:25,768 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-25 23:16:35,599 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 466350 2023-11-25 23:16:37,847 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=9.52 vs. limit=15.0 2023-11-25 23:16:40,780 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=11.95 vs. limit=15.0 2023-11-25 23:16:41,288 INFO [train_asr.py:1235] (3/4) Epoch 39, batch 9450, loss[loss=0.06347, simple_loss=0.09117, pruned_loss=0.009438, audio_tagging_loss=0.008451, over 14796.00 frames. ], tot_loss[loss=0.06693, simple_loss=0.08978, pruned_loss=0.01284, audio_tagging_loss=0.009201, over 3044903.28 frames. ], batch size: 55, lr: 1.73e-03, grad_scale: 16.0 2023-11-25 23:16:42,347 WARNING [train_asr.py:1481] (3/4) Exclude cut with ID unbalanced/jmSuJWEIizA_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-25 23:16:46,861 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=3109020.0, ans=0.2 2023-11-25 23:16:49,059 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3109020.0, ans=0.1 2023-11-25 23:16:55,265 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=3109086.6666666665, ans=0.125 2023-11-25 23:16:58,658 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=11.61 vs. 
limit=15.0 2023-11-25 23:17:04,158 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=3109153.3333333335, ans=0.07 2023-11-25 23:17:04,446 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=6.13 vs. limit=15.0 2023-11-25 23:17:06,246 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=3109153.3333333335, ans=0.1 2023-11-25 23:17:08,352 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3109153.3333333335, ans=0.125 2023-11-25 23:17:14,145 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3109220.0, ans=0.1 2023-11-25 23:17:16,237 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3109220.0, ans=0.0 2023-11-25 23:17:25,498 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=4.54 vs. limit=6.0 2023-11-25 23:17:25,978 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 6.965e+01 8.505e+01 9.184e+01 9.882e+01 1.417e+02, threshold=1.837e+02, percent-clipped=0.0 2023-11-25 23:17:26,225 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=3109286.6666666665, ans=0.2 2023-11-25 23:17:30,190 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 466400 2023-11-25 23:17:31,478 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=3109286.6666666665, ans=0.125 2023-11-25 23:17:35,700 INFO [train_asr.py:1235] (3/4) Epoch 39, batch 9500, loss[loss=0.05863, simple_loss=0.07572, pruned_loss=0.007768, audio_tagging_loss=0.013, over 13555.00 frames. ], tot_loss[loss=0.06627, simple_loss=0.08895, pruned_loss=0.01248, audio_tagging_loss=0.009317, over 3043670.69 frames. ], batch size: 53, lr: 1.73e-03, grad_scale: 16.0 2023-11-25 23:17:49,877 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=3109420.0, ans=0.125 2023-11-25 23:17:59,432 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=3109486.6666666665, ans=0.0 2023-11-25 23:18:10,353 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3109553.3333333335, ans=0.125 2023-11-25 23:18:24,966 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 466450 2023-11-25 23:18:25,410 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=7.66 vs. limit=15.0 2023-11-25 23:18:28,354 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=3109620.0, ans=0.125 2023-11-25 23:18:30,754 INFO [train_asr.py:1235] (3/4) Epoch 39, batch 9550, loss[loss=0.06898, simple_loss=0.1005, pruned_loss=0.01068, audio_tagging_loss=0.008079, over 14912.00 frames. ], tot_loss[loss=0.06677, simple_loss=0.08992, pruned_loss=0.01254, audio_tagging_loss=0.00927, over 3045026.09 frames. 
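Each train_asr.py:1235 record reports both loss[...] for the current batch and tot_loss[...] over roughly 3.0e6 frames, a frame-weighted running average of the same components (the fractional frame totals such as 3045026.09 suggest weighted accumulation, and the window is reset periodically). A sketch of such an accumulator; icefall's MetricsTracker is the real implementation, and the weighting and reset details below are assumptions:

```python
from collections import defaultdict

class MetricsSketch:
    """Frame-weighted running average of loss components.

    A sketch of how a tot_loss[...] line can be produced from per-batch
    loss[...] lines; icefall's MetricsTracker is the real implementation.
    """

    def __init__(self):
        self.sums = defaultdict(float)
        self.frames = 0.0

    def update(self, num_frames: float, **losses: float):
        self.frames += num_frames
        for name, value in losses.items():
            self.sums[name] += value * num_frames   # frame-weighted

    def averages(self) -> dict:
        return {k: v / self.frames for k, v in self.sums.items()}

tracker = MetricsSketch()
tracker.update(15471.0, loss=0.1316, simple_loss=0.1078)
tracker.update(15560.0, loss=0.04308, simple_loss=0.05576)
print(tracker.averages())   # tot_loss-style frame-weighted means
```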
2023-11-25 23:18:50,306 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3109753.3333333335, ans=0.1 2023-11-25 23:19:16,195 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.243e+01 8.693e+01 9.287e+01 1.001e+02 1.223e+02, threshold=1.857e+02, percent-clipped=0.0 2023-11-25 23:19:17,513 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=3109953.3333333335, ans=0.125 2023-11-25 23:19:20,375 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 466500 2023-11-25 23:19:26,133 INFO [train_asr.py:1235] (3/4) Epoch 39, batch 9600, loss[loss=0.06767, simple_loss=0.09376, pruned_loss=0.01247, audio_tagging_loss=0.008325, over 14607.00 frames. ], tot_loss[loss=0.06716, simple_loss=0.0905, pruned_loss=0.01265, audio_tagging_loss=0.009252, over 3046141.42 frames. ], batch size: 54, lr: 1.73e-03, grad_scale: 32.0 2023-11-25 23:19:26,197 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=3110020.0, ans=0.015 2023-11-25 23:20:10,646 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=3110286.6666666665, ans=0.125 2023-11-25 23:20:14,829 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 466550 2023-11-25 23:20:18,172 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=3110286.6666666665, ans=0.0 2023-11-25 23:20:20,018 INFO [train_asr.py:1235] (3/4) Epoch 39, batch 9650, loss[loss=0.07602, simple_loss=0.0998, pruned_loss=0.01459, audio_tagging_loss=0.01153, over 15527.00 frames. ], tot_loss[loss=0.06669, simple_loss=0.08986, pruned_loss=0.01254, audio_tagging_loss=0.009217, over 3047800.61 frames. ], batch size: 57, lr: 1.73e-03, grad_scale: 32.0 2023-11-25 23:20:23,475 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=3110353.3333333335, ans=0.0 2023-11-25 23:20:24,438 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=3110353.3333333335, ans=0.0 2023-11-25 23:20:26,485 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.max_positive, batch_count=3110353.3333333335, ans=0.95 2023-11-25 23:20:27,562 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=3110353.3333333335, ans=0.0 2023-11-25 23:20:42,772 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.min_positive, batch_count=3110486.6666666665, ans=0.05 2023-11-25 23:20:53,237 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=9.46 vs. limit=15.0
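Most of the scaling.py:213 entries print a ScheduledFloat: a hyperparameter (dropout rate, skip rate, balancer bound, whitening limit, ...) whose current value `ans` is a function of the global `batch_count`. The values here sit at their end-of-schedule constants (e.g. ans=0.0 for the various skip rates) because batch_count is past three million. Below is a minimal re-implementation of the idea, assuming a piecewise-linear schedule over (batch_count, value) breakpoints; icefall's actual class has more machinery, and the example breakpoints are hypothetical.

```python
import bisect

class ScheduledFloat:
    """A float-valued hyperparameter interpolated piecewise-linearly over
    the global batch count (a sketch of the idea behind the
    'ScheduledFloat: name=..., batch_count=..., ans=...' log lines)."""

    def __init__(self, *points: tuple[float, float]):
        self.xs, self.ys = zip(*sorted(points))  # (batch_count, value) pairs
        self.batch_count = 0.0

    def value(self) -> float:
        x = self.batch_count
        if x <= self.xs[0]:
            return self.ys[0]
        if x >= self.xs[-1]:
            return self.ys[-1]
        i = bisect.bisect_right(self.xs, x)
        t = (x - self.xs[i - 1]) / (self.xs[i] - self.xs[i - 1])
        return self.ys[i - 1] + t * (self.ys[i] - self.ys[i - 1])

# e.g. a skip rate that decays 0.5 -> 0.0 over the first 4000 batches
# (hypothetical breakpoints) and then stays at 0.0, which is why this
# late-training log shows ans=0.0 for the skip rates:
skip_rate = ScheduledFloat((0.0, 0.5), (4000.0, 0.0))
skip_rate.batch_count = 3110353.0
print(skip_rate.value())  # 0.0
```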
2023-11-25 23:21:05,152 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.725e+01 8.886e+01 9.411e+01 1.006e+02 1.308e+02, threshold=1.882e+02, percent-clipped=0.0 2023-11-25 23:21:07,356 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3110620.0, ans=0.0 2023-11-25 23:21:09,291 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 466600 2023-11-25 23:21:14,690 INFO [train_asr.py:1235] (3/4) Epoch 39, batch 9700, loss[loss=0.07235, simple_loss=0.1024, pruned_loss=0.0148, audio_tagging_loss=0.006354, over 16288.00 frames. ], tot_loss[loss=0.0667, simple_loss=0.09005, pruned_loss=0.0126, audio_tagging_loss=0.009075, over 3046165.44 frames. ], batch size: 57, lr: 1.73e-03, grad_scale: 32.0 2023-11-25 23:22:04,880 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 466650 2023-11-25 23:22:11,121 INFO [train_asr.py:1235] (3/4) Epoch 39, batch 9750, loss[loss=0.07299, simple_loss=0.09646, pruned_loss=0.01751, audio_tagging_loss=0.007249, over 14548.00 frames. ], tot_loss[loss=0.06673, simple_loss=0.09034, pruned_loss=0.01261, audio_tagging_loss=0.008946, over 3044335.81 frames. ], batch size: 57, lr: 1.73e-03, grad_scale: 16.0 2023-11-25 23:22:17,593 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=3111020.0, ans=0.0 2023-11-25 23:22:18,588 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3111020.0, ans=0.1 2023-11-25 23:22:21,916 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=3111086.6666666665, ans=0.0 2023-11-25 23:22:32,449 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=3111153.3333333335, ans=0.125 2023-11-25 23:22:33,395 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3111153.3333333335, ans=0.0 2023-11-25 23:22:35,765 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=5.48 vs. limit=10.0 2023-11-25 23:22:39,692 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3111153.3333333335, ans=0.125 2023-11-25 23:22:48,122 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=3111220.0, ans=0.125 2023-11-25 23:22:57,247 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.121e+01 8.598e+01 9.280e+01 1.031e+02 1.262e+02, threshold=1.856e+02, percent-clipped=0.0 2023-11-25 23:22:57,478 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=3111286.6666666665, ans=0.125 2023-11-25 23:23:00,479 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 466700 2023-11-25 23:23:00,877 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=16.38 vs. limit=22.5 2023-11-25 23:23:05,716 INFO [train_asr.py:1235] (3/4) Epoch 39, batch 9800, loss[loss=0.05638, simple_loss=0.07177, pruned_loss=0.01169, audio_tagging_loss=0.008803, over 14065.00 frames. ], tot_loss[loss=0.06654, simple_loss=0.08995, pruned_loss=0.01272, audio_tagging_loss=0.008845, over 3046631.53 frames. ], batch size: 55, lr: 1.73e-03, grad_scale: 16.0
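The tot_loss summaries are consistent with the total being a fixed weighted sum of the three logged components, with the simple (linear) transducer loss down-weighted by 0.5 and the pruned RNN-T and audio-tagging losses at full weight. The weights below are read off the logged numbers themselves, not the training code, and the same identity holds (to logging precision) for the other tot_loss entries in this section, e.g. 0.5 * 0.08977 + 0.01279 + 0.009078 ~= 0.06676.

```python
# Components from the "batch 9800" tot_loss entry just above:
simple_loss = 0.08995
pruned_loss = 0.01272
audio_tagging_loss = 0.008845

# Inferred combination (weights deduced from the logged numbers):
loss = 0.5 * simple_loss + 1.0 * pruned_loss + 1.0 * audio_tagging_loss
print(round(loss, 5))  # 0.06654 -- matches tot_loss[loss=0.06654, ...]
```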
2023-11-25 23:23:23,751 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=3111420.0, ans=0.125 2023-11-25 23:23:25,358 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=3111420.0, ans=0.2 2023-11-25 23:23:28,053 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3111486.6666666665, ans=0.125 2023-11-25 23:23:47,994 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=3111553.3333333335, ans=0.0 2023-11-25 23:23:55,148 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 466750 2023-11-25 23:23:56,121 WARNING [train_asr.py:1481] (3/4) Exclude cut with ID unbalanced/Bo4LcZjitzU_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-25 23:24:00,443 INFO [train_asr.py:1235] (3/4) Epoch 39, batch 9850, loss[loss=0.07454, simple_loss=0.1036, pruned_loss=0.01603, audio_tagging_loss=0.006721, over 15090.00 frames. ], tot_loss[loss=0.0664, simple_loss=0.08967, pruned_loss=0.01273, audio_tagging_loss=0.00884, over 3047905.85 frames. ], batch size: 55, lr: 1.73e-03, grad_scale: 16.0 2023-11-25 23:24:25,206 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3111820.0, ans=0.1 2023-11-25 23:24:33,438 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3111886.6666666665, ans=0.125 2023-11-25 23:24:38,668 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=3111886.6666666665, ans=0.0 2023-11-25 23:24:45,828 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 6.982e+01 8.652e+01 9.205e+01 1.019e+02 1.596e+02, threshold=1.841e+02, percent-clipped=0.0 2023-11-25 23:24:50,055 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 466800 2023-11-25 23:24:54,686 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=3112020.0, ans=0.125 2023-11-25 23:24:55,435 INFO [train_asr.py:1235] (3/4) Epoch 39, batch 9900, loss[loss=0.05261, simple_loss=0.06618, pruned_loss=0.008186, audio_tagging_loss=0.01133, over 14444.00 frames. ], tot_loss[loss=0.06644, simple_loss=0.08973, pruned_loss=0.01267, audio_tagging_loss=0.008903, over 3042023.94 frames. ], batch size: 57, lr: 1.73e-03, grad_scale: 16.0
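The scaling.py:1022 "Whitening" entries compare a whiteness statistic of a module's activations against a scheduled whitening_limit (the same limits show up as ScheduledFloat entries, e.g. ans=15.0); "metric=X vs. limit=Y" with X below Y means no corrective penalty fires. One natural statistic with the logged behavior is d * E[lam^2] / (E[lam])^2 over the eigenvalues lam of the per-group feature covariance: it equals 1.0 for perfectly white (isotropic) features and grows toward num_channels as the covariance collapses toward low rank. The sketch below implements that statistic; its exact correspondence with the logged metric is an assumption.

```python
import torch

def whitening_metric(x: torch.Tensor) -> torch.Tensor:
    """x: (num_frames, num_channels) activations for one group.

    Returns d * E[lam^2] / (E[lam])^2 for the eigenvalues lam of the
    feature covariance, computed via the Frobenius norm so that no
    eigendecomposition is needed.  1.0 means perfectly white.
    """
    x = x - x.mean(dim=0)                        # center the features
    cov = x.t() @ x / x.shape[0]                 # (d, d) covariance
    d = cov.shape[0]
    return d * (cov ** 2).sum() / cov.diagonal().sum() ** 2

white = torch.randn(4000, 192)                   # near-isotropic features
print(whitening_metric(white))                   # ~1.05, well under limit=10.0
collapsed = torch.randn(4000, 1).expand(4000, 192)
print(whitening_metric(collapsed))               # ~192.0, would trigger a penalty
```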
2023-11-25 23:24:56,547 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=3112020.0, ans=0.125 2023-11-25 23:24:59,340 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=3112020.0, ans=0.2 2023-11-25 23:25:04,084 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3112020.0, ans=0.125 2023-11-25 23:25:05,256 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=5.61 vs. limit=10.0 2023-11-25 23:25:15,425 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=3112086.6666666665, ans=0.0 2023-11-25 23:25:24,055 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3112153.3333333335, ans=0.125 2023-11-25 23:25:32,546 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=17.31 vs. limit=22.5 2023-11-25 23:25:45,816 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 466850 2023-11-25 23:25:50,421 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=14.27 vs. limit=22.5 2023-11-25 23:25:51,054 INFO [train_asr.py:1235] (3/4) Epoch 39, batch 9950, loss[loss=0.0613, simple_loss=0.08157, pruned_loss=0.0116, audio_tagging_loss=0.008916, over 14860.00 frames. ], tot_loss[loss=0.06693, simple_loss=0.09059, pruned_loss=0.01283, audio_tagging_loss=0.0088, over 3043664.21 frames. ], batch size: 54, lr: 1.73e-03, grad_scale: 16.0 2023-11-25 23:25:55,445 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=3112353.3333333335, ans=0.0 2023-11-25 23:26:08,080 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3112420.0, ans=0.125 2023-11-25 23:26:31,085 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=3112553.3333333335, ans=0.2 2023-11-25 23:26:37,238 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.460e+01 8.531e+01 9.197e+01 9.885e+01 1.494e+02, threshold=1.839e+02, percent-clipped=0.0 2023-11-25 23:26:40,484 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 466900 2023-11-25 23:26:45,709 INFO [train_asr.py:1235] (3/4) Epoch 39, batch 10000, loss[loss=0.07581, simple_loss=0.1044, pruned_loss=0.0147, audio_tagging_loss=0.008925, over 16568.00 frames. ], tot_loss[loss=0.0669, simple_loss=0.09085, pruned_loss=0.0127, audio_tagging_loss=0.008774, over 3048187.84 frames.
], batch size: 58, lr: 1.73e-03, grad_scale: 32.0 2023-11-25 23:26:46,871 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=3112686.6666666665, ans=0.2 2023-11-25 23:26:51,026 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=3112686.6666666665, ans=0.0 2023-11-25 23:26:58,465 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=3112753.3333333335, ans=0.0 2023-11-25 23:27:14,001 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=3112820.0, ans=0.0 2023-11-25 23:27:34,929 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 466950 2023-11-25 23:27:41,128 INFO [train_asr.py:1235] (3/4) Epoch 39, batch 10050, loss[loss=0.0482, simple_loss=0.06162, pruned_loss=0.007786, audio_tagging_loss=0.009604, over 15343.00 frames. ], tot_loss[loss=0.06637, simple_loss=0.09005, pruned_loss=0.01258, audio_tagging_loss=0.008763, over 3044828.95 frames. ], batch size: 58, lr: 1.73e-03, grad_scale: 32.0 2023-11-25 23:27:50,569 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=512, metric=8.61 vs. limit=15.0 2023-11-25 23:27:53,766 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3113086.6666666665, ans=0.125 2023-11-25 23:27:54,873 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3113086.6666666665, ans=0.1 2023-11-25 23:27:58,018 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3113086.6666666665, ans=0.125 2023-11-25 23:28:02,769 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=1.84 vs. limit=6.0 2023-11-25 23:28:10,712 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3113153.3333333335, ans=0.1 2023-11-25 23:28:13,750 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=3113220.0, ans=0.0 2023-11-25 23:28:20,973 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3113220.0, ans=0.125 2023-11-25 23:28:20,975 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3113220.0, ans=0.1 2023-11-25 23:28:28,024 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.248e+01 8.554e+01 9.112e+01 9.756e+01 1.275e+02, threshold=1.822e+02, percent-clipped=0.0 2023-11-25 23:28:30,151 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=3.41 vs. limit=15.0 2023-11-25 23:28:30,654 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 467000 2023-11-25 23:28:35,516 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.whiten, num_groups=1, num_channels=512, metric=10.06 vs. 
limit=12.0 2023-11-25 23:28:36,530 INFO [train_asr.py:1235] (3/4) Epoch 39, batch 10100, loss[loss=0.0801, simple_loss=0.1173, pruned_loss=0.01236, audio_tagging_loss=0.009081, over 14629.00 frames. ], tot_loss[loss=0.06713, simple_loss=0.09133, pruned_loss=0.01267, audio_tagging_loss=0.008797, over 3045002.28 frames. ], batch size: 56, lr: 1.73e-03, grad_scale: 16.0 2023-11-25 23:28:56,644 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=3113486.6666666665, ans=0.125 2023-11-25 23:28:56,694 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00 2023-11-25 23:29:05,408 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=8.23 vs. limit=15.0 2023-11-25 23:29:07,343 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-25 23:29:15,235 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=3113553.3333333335, ans=0.0 2023-11-25 23:29:22,941 WARNING [train_asr.py:1481] (3/4) Exclude cut with ID unbalanced/_eq1Ry0UZGU_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-25 23:29:23,090 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3113620.0, ans=0.0 2023-11-25 23:29:23,202 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3113620.0, ans=0.125 2023-11-25 23:29:26,097 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 467050 2023-11-25 23:29:31,216 INFO [train_asr.py:1235] (3/4) Epoch 39, batch 10150, loss[loss=0.04896, simple_loss=0.0673, pruned_loss=0.006017, audio_tagging_loss=0.009287, over 16006.00 frames. ], tot_loss[loss=0.06722, simple_loss=0.09117, pruned_loss=0.01281, audio_tagging_loss=0.008817, over 3045764.99 frames. ], batch size: 60, lr: 1.73e-03, grad_scale: 16.0 2023-11-25 23:29:58,940 WARNING [train_asr.py:1481] (3/4) Exclude cut with ID unbalanced/cw-21cbk02A_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-25 23:30:04,861 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=3113886.6666666665, ans=0.2 2023-11-25 23:30:09,021 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=3113886.6666666665, ans=0.125 2023-11-25 23:30:12,479 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module2.whiten, num_groups=1, num_channels=512, metric=8.68 vs. 
limit=15.0 2023-11-25 23:30:18,235 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.487e+01 8.705e+01 9.387e+01 9.994e+01 1.374e+02, threshold=1.877e+02, percent-clipped=0.0 2023-11-25 23:30:20,428 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 467100 2023-11-25 23:30:26,713 INFO [train_asr.py:1235] (3/4) Epoch 39, batch 10200, loss[loss=0.04817, simple_loss=0.06253, pruned_loss=0.009127, audio_tagging_loss=0.007771, over 14564.00 frames. ], tot_loss[loss=0.06709, simple_loss=0.09092, pruned_loss=0.01278, audio_tagging_loss=0.008843, over 3052860.01 frames. ], batch size: 56, lr: 1.73e-03, grad_scale: 16.0 2023-11-25 23:30:28,041 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3114020.0, ans=0.125 2023-11-25 23:30:28,913 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3114020.0, ans=0.125 2023-11-25 23:30:30,121 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=3114020.0, ans=0.2 2023-11-25 23:30:41,942 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3114086.6666666665, ans=0.125 2023-11-25 23:30:42,032 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3114086.6666666665, ans=0.125 2023-11-25 23:30:45,450 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=3114086.6666666665, ans=0.125 2023-11-25 23:30:49,586 WARNING [train_asr.py:1481] (3/4) Exclude cut with ID unbalanced/hOT6Yokob90_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-25 23:30:53,870 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3114153.3333333335, ans=0.1 2023-11-25 23:30:55,063 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=3114153.3333333335, ans=0.04949747468305833 2023-11-25 23:30:59,196 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3114220.0, ans=0.125 2023-11-25 23:31:11,910 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=3114286.6666666665, ans=0.07 2023-11-25 23:31:15,709 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3114286.6666666665, ans=0.1 2023-11-25 23:31:16,499 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 467150 2023-11-25 23:31:21,723 INFO [train_asr.py:1235] (3/4) Epoch 39, batch 10250, loss[loss=0.05876, simple_loss=0.07155, pruned_loss=0.01171, audio_tagging_loss=0.01127, over 15573.00 frames. ], tot_loss[loss=0.06688, simple_loss=0.09029, pruned_loss=0.01277, audio_tagging_loss=0.008964, over 3048609.11 frames. 
], batch size: 59, lr: 1.73e-03, grad_scale: 16.0 2023-11-25 23:31:30,165 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=512, metric=17.69 vs. limit=22.5 2023-11-25 23:31:32,989 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer_na.min_abs, batch_count=3114420.0, ans=0.02 2023-11-25 23:31:37,122 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00 2023-11-25 23:31:38,023 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=3114420.0, ans=0.125 2023-11-25 23:31:40,258 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3114420.0, ans=0.125 2023-11-25 23:31:44,383 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3114486.6666666665, ans=0.125 2023-11-25 23:32:04,906 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3114620.0, ans=0.125 2023-11-25 23:32:08,829 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.548e+01 8.876e+01 9.394e+01 1.009e+02 1.335e+02, threshold=1.879e+02, percent-clipped=0.0 2023-11-25 23:32:11,016 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 467200 2023-11-25 23:32:16,998 INFO [train_asr.py:1235] (3/4) Epoch 39, batch 10300, loss[loss=0.06651, simple_loss=0.08905, pruned_loss=0.01179, audio_tagging_loss=0.0102, over 15038.00 frames. ], tot_loss[loss=0.06688, simple_loss=0.09011, pruned_loss=0.01271, audio_tagging_loss=0.009115, over 3047804.00 frames. ], batch size: 55, lr: 1.73e-03, grad_scale: 16.0 2023-11-25 23:32:31,182 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=3114753.3333333335, ans=0.0 2023-11-25 23:32:34,806 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=3114753.3333333335, ans=0.0 2023-11-25 23:32:39,567 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=3114820.0, ans=0.0 2023-11-25 23:32:42,649 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=3114820.0, ans=0.0 2023-11-25 23:32:44,794 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3114820.0, ans=0.1 2023-11-25 23:32:46,212 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.12 vs. limit=15.0 2023-11-25 23:33:06,201 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 467250 2023-11-25 23:33:06,426 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3114953.3333333335, ans=0.125 2023-11-25 23:33:11,015 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=11.00 vs. limit=15.0 2023-11-25 23:33:11,816 INFO [train_asr.py:1235] (3/4) Epoch 39, batch 10350, loss[loss=0.07649, simple_loss=0.1128, pruned_loss=0.01455, audio_tagging_loss=0.005559, over 15037.00 frames. 
], tot_loss[loss=0.0672, simple_loss=0.09054, pruned_loss=0.01269, audio_tagging_loss=0.009241, over 3047855.04 frames. ], batch size: 55, lr: 1.73e-03, grad_scale: 16.0 2023-11-25 23:33:29,782 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3115086.6666666665, ans=0.1 2023-11-25 23:33:51,304 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=15.50 vs. limit=22.5 2023-11-25 23:33:58,300 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3115286.6666666665, ans=0.1 2023-11-25 23:33:59,133 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.500e+01 8.695e+01 9.211e+01 9.915e+01 1.210e+02, threshold=1.842e+02, percent-clipped=0.0 2023-11-25 23:34:00,436 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3115286.6666666665, ans=0.1 2023-11-25 23:34:01,285 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 467300 2023-11-25 23:34:07,015 INFO [train_asr.py:1235] (3/4) Epoch 39, batch 10400, loss[loss=0.06619, simple_loss=0.08764, pruned_loss=0.01092, audio_tagging_loss=0.01144, over 15344.00 frames. ], tot_loss[loss=0.0674, simple_loss=0.09083, pruned_loss=0.0127, audio_tagging_loss=0.009291, over 3046990.89 frames. ], batch size: 57, lr: 1.73e-03, grad_scale: 32.0 2023-11-25 23:34:07,744 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.1.conv_module1.whiten, num_groups=1, num_channels=192, metric=14.82 vs. limit=15.0 2023-11-25 23:34:14,658 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=3115353.3333333335, ans=0.0 2023-11-25 23:34:34,204 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.max_abs, batch_count=3115486.6666666665, ans=10.0 2023-11-25 23:34:48,608 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=16.45 vs. limit=22.5 2023-11-25 23:34:56,626 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 467350 2023-11-25 23:35:01,767 INFO [train_asr.py:1235] (3/4) Epoch 39, batch 10450, loss[loss=0.08296, simple_loss=0.1053, pruned_loss=0.02334, audio_tagging_loss=0.006974, over 15916.00 frames. ], tot_loss[loss=0.06734, simple_loss=0.0905, pruned_loss=0.01281, audio_tagging_loss=0.00928, over 3046385.82 frames. ], batch size: 59, lr: 1.73e-03, grad_scale: 32.0 2023-11-25 23:35:05,605 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=3115686.6666666665, ans=0.04949747468305833 2023-11-25 23:35:05,853 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=5.90 vs. limit=12.0 2023-11-25 23:35:17,612 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3115753.3333333335, ans=0.125 2023-11-25 23:35:32,765 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=6.56 vs. 
limit=12.0 2023-11-25 23:35:38,520 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3115886.6666666665, ans=0.0 2023-11-25 23:35:42,796 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=3115886.6666666665, ans=0.125 2023-11-25 23:35:49,277 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.174e+01 8.667e+01 9.396e+01 1.018e+02 1.785e+02, threshold=1.879e+02, percent-clipped=0.0 2023-11-25 23:35:51,419 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 467400 2023-11-25 23:35:56,809 INFO [train_asr.py:1235] (3/4) Epoch 39, batch 10500, loss[loss=0.06683, simple_loss=0.09771, pruned_loss=0.01001, audio_tagging_loss=0.007957, over 15380.00 frames. ], tot_loss[loss=0.06739, simple_loss=0.09084, pruned_loss=0.01286, audio_tagging_loss=0.009116, over 3043013.53 frames. ], batch size: 56, lr: 1.73e-03, grad_scale: 32.0 2023-11-25 23:36:01,399 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=6.85 vs. limit=15.0 2023-11-25 23:36:16,191 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.max_positive, batch_count=3116086.6666666665, ans=0.95 2023-11-25 23:36:17,323 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-25 23:36:19,318 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=3116153.3333333335, ans=0.125 2023-11-25 23:36:31,342 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3116220.0, ans=0.125 2023-11-25 23:36:37,710 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=3116220.0, ans=0.2 2023-11-25 23:36:46,216 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=4.44 vs. limit=15.0 2023-11-25 23:36:46,893 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 467450 2023-11-25 23:36:52,574 INFO [train_asr.py:1235] (3/4) Epoch 39, batch 10550, loss[loss=0.05471, simple_loss=0.07569, pruned_loss=0.006651, audio_tagging_loss=0.01022, over 15496.00 frames. ], tot_loss[loss=0.06742, simple_loss=0.09125, pruned_loss=0.01292, audio_tagging_loss=0.008877, over 3047718.91 frames. 
], batch size: 58, lr: 1.73e-03, grad_scale: 16.0 2023-11-25 23:36:52,709 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3116353.3333333335, ans=0.125 2023-11-25 23:36:58,028 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=3116353.3333333335, ans=0.125 2023-11-25 23:37:01,068 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=3116353.3333333335, ans=0.0 2023-11-25 23:37:16,270 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3116486.6666666665, ans=0.125 2023-11-25 23:37:40,575 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.206e+01 8.690e+01 9.247e+01 9.972e+01 1.800e+02, threshold=1.849e+02, percent-clipped=0.0 2023-11-25 23:37:40,888 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=3116620.0, ans=0.2 2023-11-25 23:37:41,726 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 467500 2023-11-25 23:37:42,922 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=3116620.0, ans=0.125 2023-11-25 23:37:42,949 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.max_abs, batch_count=3116620.0, ans=10.0 2023-11-25 23:37:46,815 INFO [train_asr.py:1235] (3/4) Epoch 39, batch 10600, loss[loss=0.06991, simple_loss=0.09861, pruned_loss=0.01325, audio_tagging_loss=0.007355, over 15726.00 frames. ], tot_loss[loss=0.06749, simple_loss=0.09134, pruned_loss=0.01292, audio_tagging_loss=0.0089, over 3049633.88 frames. ], batch size: 60, lr: 1.73e-03, grad_scale: 16.0 2023-11-25 23:37:48,016 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=3116686.6666666665, ans=0.125 2023-11-25 23:37:55,990 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=3116686.6666666665, ans=0.0 2023-11-25 23:38:11,578 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=3116820.0, ans=0.125 2023-11-25 23:38:36,000 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 467550 2023-11-25 23:38:39,759 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3116953.3333333335, ans=0.125 2023-11-25 23:38:41,685 INFO [train_asr.py:1235] (3/4) Epoch 39, batch 10650, loss[loss=0.06035, simple_loss=0.07576, pruned_loss=0.009776, audio_tagging_loss=0.01269, over 13829.00 frames. ], tot_loss[loss=0.06761, simple_loss=0.09167, pruned_loss=0.01288, audio_tagging_loss=0.008894, over 3055877.50 frames. 
], batch size: 53, lr: 1.73e-03, grad_scale: 16.0 2023-11-25 23:38:52,209 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3117086.6666666665, ans=0.125 2023-11-25 23:39:06,195 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=3117153.3333333335, ans=0.125 2023-11-25 23:39:11,403 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=3117153.3333333335, ans=0.0 2023-11-25 23:39:30,341 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.082e+01 8.807e+01 9.255e+01 1.012e+02 1.355e+02, threshold=1.851e+02, percent-clipped=0.0 2023-11-25 23:39:31,451 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 467600 2023-11-25 23:39:31,672 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=3117286.6666666665, ans=0.0 2023-11-25 23:39:36,808 INFO [train_asr.py:1235] (3/4) Epoch 39, batch 10700, loss[loss=0.06846, simple_loss=0.09755, pruned_loss=0.01339, audio_tagging_loss=0.006289, over 16128.00 frames. ], tot_loss[loss=0.06739, simple_loss=0.09123, pruned_loss=0.01284, audio_tagging_loss=0.00893, over 3058272.32 frames. ], batch size: 59, lr: 1.73e-03, grad_scale: 16.0 2023-11-25 23:40:26,120 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 467650 2023-11-25 23:40:27,280 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=3117620.0, ans=0.0 2023-11-25 23:40:31,270 INFO [train_asr.py:1235] (3/4) Epoch 39, batch 10750, loss[loss=0.08231, simple_loss=0.1083, pruned_loss=0.01802, audio_tagging_loss=0.01013, over 16679.00 frames. ], tot_loss[loss=0.06745, simple_loss=0.09128, pruned_loss=0.01289, audio_tagging_loss=0.008908, over 3060896.60 frames. ], batch size: 62, lr: 1.73e-03, grad_scale: 16.0 2023-11-25 23:41:10,993 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=3117886.6666666665, ans=0.0 2023-11-25 23:41:12,338 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=512, metric=21.20 vs. limit=22.5 2023-11-25 23:41:18,428 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=3117953.3333333335, ans=0.0 2023-11-25 23:41:19,195 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 6.994e+01 8.803e+01 9.280e+01 9.939e+01 1.365e+02, threshold=1.856e+02, percent-clipped=0.0 2023-11-25 23:41:20,308 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 467700 2023-11-25 23:41:24,763 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.min_positive, batch_count=3118020.0, ans=0.05 2023-11-25 23:41:25,498 INFO [train_asr.py:1235] (3/4) Epoch 39, batch 10800, loss[loss=0.07493, simple_loss=0.1037, pruned_loss=0.01731, audio_tagging_loss=0.005743, over 15748.00 frames. ], tot_loss[loss=0.06691, simple_loss=0.09079, pruned_loss=0.01264, audio_tagging_loss=0.008875, over 3057025.71 frames. 
], batch size: 57, lr: 1.73e-03, grad_scale: 32.0 2023-11-25 23:41:35,573 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3118020.0, ans=0.125 2023-11-25 23:41:40,833 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=3118086.6666666665, ans=0.0 2023-11-25 23:41:54,806 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3118153.3333333335, ans=0.125 2023-11-25 23:41:59,917 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3118220.0, ans=0.125 2023-11-25 23:42:11,334 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=3118286.6666666665, ans=0.0 2023-11-25 23:42:15,808 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 467750 2023-11-25 23:42:19,064 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=3118286.6666666665, ans=0.0 2023-11-25 23:42:21,017 INFO [train_asr.py:1235] (3/4) Epoch 39, batch 10850, loss[loss=0.05237, simple_loss=0.06716, pruned_loss=0.009912, audio_tagging_loss=0.008874, over 16119.00 frames. ], tot_loss[loss=0.0671, simple_loss=0.09108, pruned_loss=0.01266, audio_tagging_loss=0.008894, over 3057791.42 frames. ], batch size: 62, lr: 1.73e-03, grad_scale: 16.0 2023-11-25 23:42:24,865 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3118353.3333333335, ans=0.0 2023-11-25 23:43:03,555 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=12.64 vs. limit=22.5 2023-11-25 23:43:09,854 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.511e+01 8.755e+01 9.381e+01 1.019e+02 1.994e+02, threshold=1.876e+02, percent-clipped=1.0 2023-11-25 23:43:09,941 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 467800 2023-11-25 23:43:14,316 WARNING [train_asr.py:1481] (3/4) Exclude cut with ID unbalanced/XMxq2pgttuY_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-25 23:43:15,316 INFO [train_asr.py:1235] (3/4) Epoch 39, batch 10900, loss[loss=0.0662, simple_loss=0.0964, pruned_loss=0.01082, audio_tagging_loss=0.007183, over 14732.00 frames. ], tot_loss[loss=0.06679, simple_loss=0.09065, pruned_loss=0.01257, audio_tagging_loss=0.008895, over 3048997.76 frames. ], batch size: 57, lr: 1.73e-03, grad_scale: 16.0 2023-11-25 23:43:18,854 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=6.11 vs. 
limit=12.0 2023-11-25 23:43:57,207 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3118886.6666666665, ans=0.1 2023-11-25 23:44:04,184 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 467850 2023-11-25 23:44:09,326 INFO [train_asr.py:1235] (3/4) Epoch 39, batch 10950, loss[loss=0.05429, simple_loss=0.07381, pruned_loss=0.006992, audio_tagging_loss=0.01039, over 14672.00 frames. ], tot_loss[loss=0.06695, simple_loss=0.09083, pruned_loss=0.01256, audio_tagging_loss=0.008974, over 3047333.66 frames. ], batch size: 55, lr: 1.73e-03, grad_scale: 16.0 2023-11-25 23:44:18,478 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=3119020.0, ans=0.04949747468305833 2023-11-25 23:44:23,444 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=3119086.6666666665, ans=0.1 2023-11-25 23:44:23,555 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=3119086.6666666665, ans=0.2 2023-11-25 23:44:34,601 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3119153.3333333335, ans=0.1 2023-11-25 23:44:39,625 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=3119153.3333333335, ans=0.125 2023-11-25 23:44:53,144 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=3119286.6666666665, ans=0.125 2023-11-25 23:44:58,213 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.075e+01 8.372e+01 9.128e+01 9.666e+01 1.249e+02, threshold=1.826e+02, percent-clipped=0.0 2023-11-25 23:44:58,300 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 467900 2023-11-25 23:45:04,526 INFO [train_asr.py:1235] (3/4) Epoch 39, batch 11000, loss[loss=0.06243, simple_loss=0.07365, pruned_loss=0.01434, audio_tagging_loss=0.01127, over 14835.00 frames. ], tot_loss[loss=0.06695, simple_loss=0.09078, pruned_loss=0.01254, audio_tagging_loss=0.009023, over 3048220.06 frames. ], batch size: 56, lr: 1.73e-03, grad_scale: 16.0 2023-11-25 23:45:06,354 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3119353.3333333335, ans=0.125 2023-11-25 23:45:13,304 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module2.whiten, num_groups=1, num_channels=512, metric=11.04 vs. limit=15.0 2023-11-25 23:45:15,971 WARNING [train_asr.py:1481] (3/4) Exclude cut with ID unbalanced/h6R5rMXN6pY_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. 
Number of tokens: 24 2023-11-25 23:45:17,266 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=3119420.0, ans=0.0 2023-11-25 23:45:19,304 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=3119420.0, ans=0.0 2023-11-25 23:45:26,518 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=3119486.6666666665, ans=0.2 2023-11-25 23:45:28,555 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=3119486.6666666665, ans=0.125 2023-11-25 23:45:43,131 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3119553.3333333335, ans=0.125 2023-11-25 23:45:46,177 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3119553.3333333335, ans=0.1 2023-11-25 23:45:46,307 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=3119553.3333333335, ans=0.0 2023-11-25 23:45:54,393 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 467950 2023-11-25 23:45:59,568 INFO [train_asr.py:1235] (3/4) Epoch 39, batch 11050, loss[loss=0.06129, simple_loss=0.08247, pruned_loss=0.01262, audio_tagging_loss=0.007437, over 14390.00 frames. ], tot_loss[loss=0.06701, simple_loss=0.09067, pruned_loss=0.01256, audio_tagging_loss=0.009125, over 3042420.08 frames. ], batch size: 54, lr: 1.73e-03, grad_scale: 16.0 2023-11-25 23:46:05,924 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3119686.6666666665, ans=0.125 2023-11-25 23:46:28,918 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=3119820.0, ans=0.125 2023-11-25 23:46:48,347 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.121e+01 8.692e+01 9.297e+01 1.029e+02 1.368e+02, threshold=1.859e+02, percent-clipped=0.0 2023-11-25 23:46:48,447 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 468000 2023-11-25 23:46:55,488 INFO [train_asr.py:1235] (3/4) Epoch 39, batch 11100, loss[loss=0.05477, simple_loss=0.07109, pruned_loss=0.008025, audio_tagging_loss=0.0112, over 15561.00 frames. ], tot_loss[loss=0.06668, simple_loss=0.08999, pruned_loss=0.01247, audio_tagging_loss=0.009218, over 3051334.25 frames. ], batch size: 57, lr: 1.73e-03, grad_scale: 16.0 2023-11-25 23:47:08,187 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=3120086.6666666665, ans=0.125 2023-11-25 23:47:13,481 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=1.77 vs. 
limit=6.0 2023-11-25 23:47:27,803 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=3120220.0, ans=0.2 2023-11-25 23:47:37,314 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.whiten.whitening_limit, batch_count=3120220.0, ans=12.0 2023-11-25 23:47:40,305 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3120286.6666666665, ans=0.125 2023-11-25 23:47:44,325 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 468050 2023-11-25 23:47:50,056 INFO [train_asr.py:1235] (3/4) Epoch 39, batch 11150, loss[loss=0.0649, simple_loss=0.09189, pruned_loss=0.01194, audio_tagging_loss=0.007011, over 16211.00 frames. ], tot_loss[loss=0.06701, simple_loss=0.09012, pruned_loss=0.01263, audio_tagging_loss=0.009317, over 3047267.36 frames. ], batch size: 60, lr: 1.73e-03, grad_scale: 16.0 2023-11-25 23:48:20,361 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-11-25 23:48:34,337 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=3120620.0, ans=0.0 2023-11-25 23:48:37,394 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3120620.0, ans=0.125 2023-11-25 23:48:38,193 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.101e+01 8.667e+01 9.262e+01 9.903e+01 1.395e+02, threshold=1.852e+02, percent-clipped=0.0 2023-11-25 23:48:38,287 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 468100 2023-11-25 23:48:43,926 INFO [train_asr.py:1235] (3/4) Epoch 39, batch 11200, loss[loss=0.05961, simple_loss=0.08006, pruned_loss=0.01244, audio_tagging_loss=0.007142, over 15195.00 frames. ], tot_loss[loss=0.06703, simple_loss=0.09015, pruned_loss=0.01255, audio_tagging_loss=0.009399, over 3052427.38 frames. ], batch size: 56, lr: 1.72e-03, grad_scale: 32.0 2023-11-25 23:48:47,294 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-25 23:48:50,467 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3120686.6666666665, ans=0.125 2023-11-25 23:48:59,905 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3120753.3333333335, ans=0.1 2023-11-25 23:49:10,629 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3120820.0, ans=0.1 2023-11-25 23:49:19,521 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=3120886.6666666665, ans=0.125 2023-11-25 23:49:31,667 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 468150 2023-11-25 23:49:36,746 INFO [train_asr.py:1235] (3/4) Epoch 39, batch 11250, loss[loss=0.05505, simple_loss=0.07003, pruned_loss=0.01059, audio_tagging_loss=0.009439, over 15892.00 frames. ], tot_loss[loss=0.06651, simple_loss=0.08923, pruned_loss=0.01253, audio_tagging_loss=0.009363, over 3052395.48 frames. 
], batch size: 62, lr: 1.72e-03, grad_scale: 16.0 2023-11-25 23:49:38,981 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=3121020.0, ans=0.0 2023-11-25 23:49:53,515 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=3121086.6666666665, ans=0.09899494936611666 2023-11-25 23:50:16,733 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.79 vs. limit=6.0 2023-11-25 23:50:25,544 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 468200 2023-11-25 23:50:26,480 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.350e+01 8.668e+01 9.346e+01 1.011e+02 2.547e+02, threshold=1.869e+02, percent-clipped=1.0 2023-11-25 23:50:31,484 INFO [train_asr.py:1235] (3/4) Epoch 39, batch 11300, loss[loss=0.05962, simple_loss=0.08139, pruned_loss=0.009943, audio_tagging_loss=0.008979, over 15278.00 frames. ], tot_loss[loss=0.06701, simple_loss=0.09026, pruned_loss=0.0127, audio_tagging_loss=0.009186, over 3052920.53 frames. ], batch size: 58, lr: 1.72e-03, grad_scale: 16.0 2023-11-25 23:50:36,317 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3121353.3333333335, ans=0.1 2023-11-25 23:50:56,838 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=3121486.6666666665, ans=0.0 2023-11-25 23:51:00,997 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=3121486.6666666665, ans=0.0 2023-11-25 23:51:04,069 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-25 23:51:06,605 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=6.84 vs. limit=15.0 2023-11-25 23:51:19,466 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3121620.0, ans=0.125 2023-11-25 23:51:20,343 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 468250 2023-11-25 23:51:23,209 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=3121620.0, ans=0.125 2023-11-25 23:51:24,557 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=4.79 vs. limit=12.0 2023-11-25 23:51:25,989 INFO [train_asr.py:1235] (3/4) Epoch 39, batch 11350, loss[loss=0.07337, simple_loss=0.1108, pruned_loss=0.01301, audio_tagging_loss=0.004953, over 15189.00 frames. ], tot_loss[loss=0.06723, simple_loss=0.09065, pruned_loss=0.01279, audio_tagging_loss=0.009109, over 3057692.69 frames. 
], batch size: 55, lr: 1.72e-03, grad_scale: 16.0 2023-11-25 23:51:36,034 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=3121753.3333333335, ans=0.0 2023-11-25 23:52:15,190 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 468300 2023-11-25 23:52:16,137 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.581e+01 8.715e+01 9.313e+01 1.012e+02 1.423e+02, threshold=1.863e+02, percent-clipped=0.0 2023-11-25 23:52:20,318 INFO [train_asr.py:1235] (3/4) Epoch 39, batch 11400, loss[loss=0.06111, simple_loss=0.08637, pruned_loss=0.01247, audio_tagging_loss=0.005458, over 16469.00 frames. ], tot_loss[loss=0.06712, simple_loss=0.09052, pruned_loss=0.0127, audio_tagging_loss=0.009153, over 3055680.93 frames. ], batch size: 65, lr: 1.72e-03, grad_scale: 16.0 2023-11-25 23:52:35,790 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=7.02 vs. limit=12.0 2023-11-25 23:52:59,734 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3122220.0, ans=0.1 2023-11-25 23:53:00,880 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3122220.0, ans=0.125 2023-11-25 23:53:07,288 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=12.38 vs. limit=15.0 2023-11-25 23:53:09,037 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 468350 2023-11-25 23:53:12,338 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=3122286.6666666665, ans=0.0 2023-11-25 23:53:14,125 INFO [train_asr.py:1235] (3/4) Epoch 39, batch 11450, loss[loss=0.06548, simple_loss=0.08954, pruned_loss=0.01159, audio_tagging_loss=0.009122, over 16320.00 frames. ], tot_loss[loss=0.06726, simple_loss=0.09069, pruned_loss=0.01286, audio_tagging_loss=0.009048, over 3057089.80 frames. ], batch size: 63, lr: 1.72e-03, grad_scale: 16.0 2023-11-25 23:53:52,496 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3122553.3333333335, ans=0.125 2023-11-25 23:54:00,345 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3122620.0, ans=0.125 2023-11-25 23:54:03,228 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 468400 2023-11-25 23:54:04,160 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.427e+01 8.283e+01 9.283e+01 1.005e+02 1.593e+02, threshold=1.857e+02, percent-clipped=0.0 2023-11-25 23:54:05,928 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=10.21 vs. limit=15.0 2023-11-25 23:54:09,221 INFO [train_asr.py:1235] (3/4) Epoch 39, batch 11500, loss[loss=0.06698, simple_loss=0.08842, pruned_loss=0.01045, audio_tagging_loss=0.01233, over 15565.00 frames. ], tot_loss[loss=0.06708, simple_loss=0.09056, pruned_loss=0.01283, audio_tagging_loss=0.008969, over 3049415.32 frames. 
], batch size: 57, lr: 1.72e-03, grad_scale: 16.0 2023-11-25 23:54:22,386 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=3122753.3333333335, ans=0.125 2023-11-25 23:54:30,632 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=3122820.0, ans=0.125 2023-11-25 23:54:30,747 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=3122820.0, ans=0.125 2023-11-25 23:54:42,056 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=3122886.6666666665, ans=0.0 2023-11-25 23:54:57,824 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 468450 2023-11-25 23:54:58,048 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=3122953.3333333335, ans=0.0 2023-11-25 23:55:01,691 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=3122953.3333333335, ans=0.0 2023-11-25 23:55:03,506 INFO [train_asr.py:1235] (3/4) Epoch 39, batch 11550, loss[loss=0.06851, simple_loss=0.09917, pruned_loss=0.01268, audio_tagging_loss=0.006247, over 16034.00 frames. ], tot_loss[loss=0.06706, simple_loss=0.09084, pruned_loss=0.01278, audio_tagging_loss=0.008861, over 3048058.13 frames. ], batch size: 57, lr: 1.72e-03, grad_scale: 16.0 2023-11-25 23:55:22,367 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=3123086.6666666665, ans=0.125 2023-11-25 23:55:29,527 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=3123153.3333333335, ans=0.0 2023-11-25 23:55:35,231 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.min_positive, batch_count=3123220.0, ans=0.025 2023-11-25 23:55:38,136 WARNING [train_asr.py:1481] (3/4) Exclude cut with ID unbalanced/NeYOsnhOi4k_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-25 23:55:51,190 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.min_abs, batch_count=3123286.6666666665, ans=0.5 2023-11-25 23:55:52,171 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 468500 2023-11-25 23:55:53,110 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.591e+01 8.807e+01 9.353e+01 9.870e+01 1.294e+02, threshold=1.871e+02, percent-clipped=0.0 2023-11-25 23:55:57,354 INFO [train_asr.py:1235] (3/4) Epoch 39, batch 11600, loss[loss=0.06517, simple_loss=0.09479, pruned_loss=0.009814, audio_tagging_loss=0.007968, over 14208.00 frames. ], tot_loss[loss=0.06734, simple_loss=0.09149, pruned_loss=0.01276, audio_tagging_loss=0.008832, over 3047757.85 frames. ], batch size: 55, lr: 1.72e-03, grad_scale: 32.0 2023-11-25 23:56:02,422 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=4.68 vs. 
limit=15.0 2023-11-25 23:56:10,002 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3123420.0, ans=0.1 2023-11-25 23:56:17,800 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00 2023-11-25 23:56:22,952 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer_ff2.min_abs, batch_count=3123486.6666666665, ans=0.1 2023-11-25 23:56:23,012 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=3123486.6666666665, ans=0.07 2023-11-25 23:56:25,907 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=3123486.6666666665, ans=0.015 2023-11-25 23:56:29,068 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=19.80 vs. limit=22.5 2023-11-25 23:56:41,088 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=3123620.0, ans=0.125 2023-11-25 23:56:47,260 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 468550 2023-11-25 23:56:47,452 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=3123620.0, ans=0.0 2023-11-25 23:56:52,391 INFO [train_asr.py:1235] (3/4) Epoch 39, batch 11650, loss[loss=0.05319, simple_loss=0.07611, pruned_loss=0.006412, audio_tagging_loss=0.008721, over 15727.00 frames. ], tot_loss[loss=0.06725, simple_loss=0.09114, pruned_loss=0.0128, audio_tagging_loss=0.008882, over 3041435.87 frames. ], batch size: 61, lr: 1.72e-03, grad_scale: 16.0 2023-11-25 23:57:04,715 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=3123753.3333333335, ans=0.2 2023-11-25 23:57:06,657 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3123753.3333333335, ans=0.0 2023-11-25 23:57:39,062 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=3123953.3333333335, ans=0.0 2023-11-25 23:57:41,964 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 468600 2023-11-25 23:57:44,268 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.340e+01 8.612e+01 9.119e+01 9.760e+01 1.208e+02, threshold=1.824e+02, percent-clipped=0.0 2023-11-25 23:57:47,434 INFO [train_asr.py:1235] (3/4) Epoch 39, batch 11700, loss[loss=0.06644, simple_loss=0.07577, pruned_loss=0.0155, audio_tagging_loss=0.01306, over 14194.00 frames. ], tot_loss[loss=0.06689, simple_loss=0.09038, pruned_loss=0.01275, audio_tagging_loss=0.00895, over 3041087.59 frames. 
], batch size: 56, lr: 1.72e-03, grad_scale: 16.0 2023-11-25 23:57:53,423 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.min_positive, batch_count=3124020.0, ans=0.05 2023-11-25 23:58:00,777 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3124086.6666666665, ans=0.125 2023-11-25 23:58:04,885 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=3124086.6666666665, ans=0.1 2023-11-25 23:58:36,905 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 468650 2023-11-25 23:58:42,064 INFO [train_asr.py:1235] (3/4) Epoch 39, batch 11750, loss[loss=0.06383, simple_loss=0.07643, pruned_loss=0.01354, audio_tagging_loss=0.01207, over 14468.00 frames. ], tot_loss[loss=0.06625, simple_loss=0.08928, pruned_loss=0.0126, audio_tagging_loss=0.009008, over 3045457.94 frames. ], batch size: 55, lr: 1.72e-03, grad_scale: 16.0 2023-11-25 23:58:49,305 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=7.22 vs. limit=15.0 2023-11-25 23:59:01,652 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-11-25 23:59:03,995 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=18.06 vs. limit=22.5 2023-11-25 23:59:07,014 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=9.28 vs. limit=15.0 2023-11-25 23:59:07,865 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=3124486.6666666665, ans=0.0 2023-11-25 23:59:13,014 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=3124486.6666666665, ans=0.125 2023-11-25 23:59:17,248 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=3124553.3333333335, ans=0.0 2023-11-25 23:59:18,249 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-25 23:59:21,799 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=3124553.3333333335, ans=0.0 2023-11-25 23:59:32,164 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 468700 2023-11-25 23:59:33,519 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=1.97 vs. limit=6.0 2023-11-25 23:59:34,131 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.396e+01 8.692e+01 9.354e+01 9.925e+01 1.548e+02, threshold=1.871e+02, percent-clipped=0.0 2023-11-25 23:59:37,269 INFO [train_asr.py:1235] (3/4) Epoch 39, batch 11800, loss[loss=0.07251, simple_loss=0.09967, pruned_loss=0.01476, audio_tagging_loss=0.007916, over 15195.00 frames. ], tot_loss[loss=0.06631, simple_loss=0.08935, pruned_loss=0.01258, audio_tagging_loss=0.009055, over 3044357.09 frames. 
], batch size: 55, lr: 1.72e-03, grad_scale: 16.0 2023-11-26 00:00:02,653 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=4.24 vs. limit=15.0 2023-11-26 00:00:03,545 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.min_positive, batch_count=3124820.0, ans=0.025 2023-11-26 00:00:03,927 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=11.76 vs. limit=15.0 2023-11-26 00:00:17,467 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=3124886.6666666665, ans=0.0 2023-11-26 00:00:19,555 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3124953.3333333335, ans=0.1 2023-11-26 00:00:26,140 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 468750 2023-11-26 00:00:29,476 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=3124953.3333333335, ans=0.2 2023-11-26 00:00:31,223 INFO [train_asr.py:1235] (3/4) Epoch 39, batch 11850, loss[loss=0.07604, simple_loss=0.1062, pruned_loss=0.01472, audio_tagging_loss=0.008204, over 16192.00 frames. ], tot_loss[loss=0.06636, simple_loss=0.08912, pruned_loss=0.01261, audio_tagging_loss=0.00919, over 3042366.87 frames. ], batch size: 62, lr: 1.72e-03, grad_scale: 16.0 2023-11-26 00:00:31,916 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=6.09 vs. limit=10.0 2023-11-26 00:00:44,597 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3125086.6666666665, ans=0.125 2023-11-26 00:00:46,587 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=3125086.6666666665, ans=0.125 2023-11-26 00:00:59,899 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=4.54 vs. limit=15.0 2023-11-26 00:01:10,689 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=3125220.0, ans=0.0 2023-11-26 00:01:12,767 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3125220.0, ans=0.0 2023-11-26 00:01:20,119 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 468800 2023-11-26 00:01:22,401 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.466e+01 8.741e+01 9.224e+01 1.012e+02 1.182e+02, threshold=1.845e+02, percent-clipped=0.0 2023-11-26 00:01:22,620 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3125286.6666666665, ans=0.125 2023-11-26 00:01:22,643 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=3125286.6666666665, ans=0.07 2023-11-26 00:01:25,624 INFO [train_asr.py:1235] (3/4) Epoch 39, batch 11900, loss[loss=0.07694, simple_loss=0.09826, pruned_loss=0.01756, audio_tagging_loss=0.01025, over 15573.00 frames. 
], tot_loss[loss=0.06674, simple_loss=0.08983, pruned_loss=0.01264, audio_tagging_loss=0.009184, over 3044287.47 frames. ], batch size: 60, lr: 1.72e-03, grad_scale: 16.0 2023-11-26 00:01:40,532 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=3125420.0, ans=0.0 2023-11-26 00:01:55,673 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=3125486.6666666665, ans=0.125 2023-11-26 00:02:02,974 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3125553.3333333335, ans=0.1 2023-11-26 00:02:15,009 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 468850 2023-11-26 00:02:20,694 INFO [train_asr.py:1235] (3/4) Epoch 39, batch 11950, loss[loss=0.07259, simple_loss=0.09065, pruned_loss=0.01825, audio_tagging_loss=0.009015, over 15144.00 frames. ], tot_loss[loss=0.06712, simple_loss=0.09042, pruned_loss=0.01271, audio_tagging_loss=0.009197, over 3048115.29 frames. ], batch size: 55, lr: 1.72e-03, grad_scale: 16.0 2023-11-26 00:02:28,782 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=3125686.6666666665, ans=0.0 2023-11-26 00:02:35,079 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=3125753.3333333335, ans=0.0 2023-11-26 00:02:36,152 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=3125753.3333333335, ans=0.09899494936611666 2023-11-26 00:02:41,433 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3125820.0, ans=0.125 2023-11-26 00:02:42,506 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=3125820.0, ans=0.2 2023-11-26 00:02:56,287 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3125886.6666666665, ans=0.125 2023-11-26 00:03:07,591 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=3125953.3333333335, ans=0.125 2023-11-26 00:03:09,046 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 468900 2023-11-26 00:03:11,571 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.232e+01 8.661e+01 9.250e+01 9.933e+01 1.391e+02, threshold=1.850e+02, percent-clipped=0.0 2023-11-26 00:03:11,831 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3125953.3333333335, ans=0.125 2023-11-26 00:03:14,636 INFO [train_asr.py:1235] (3/4) Epoch 39, batch 12000, loss[loss=0.05206, simple_loss=0.0636, pruned_loss=0.007025, audio_tagging_loss=0.01324, over 15001.00 frames. ], tot_loss[loss=0.0669, simple_loss=0.08972, pruned_loss=0.01263, audio_tagging_loss=0.009412, over 3051926.89 frames. ], batch size: 58, lr: 1.72e-03, grad_scale: 32.0 2023-11-26 00:03:14,637 INFO [train_asr.py:1258] (3/4) Computing validation loss 2023-11-26 00:03:47,120 INFO [train_asr.py:1267] (3/4) Epoch 39, validation: loss=0.05809, simple_loss=0.05065, pruned_loss=0.005132, audio_tagging_loss=0.02764, over 4681554.00 frames. 
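[Annotation] Reading the loss records above: each "loss[...]" / "tot_loss[...]" entry reports the combined training objective alongside its components, and the logged totals are consistent with the combination loss = 0.5 * simple_loss + pruned_loss + 1.0 * audio_tagging_loss. A minimal sketch of that relationship, assuming exactly this weighting (the helper below is illustrative, not an icefall function):

    # Sketch of how the logged tot_loss appears to combine its components.
    # Assumption: loss = simple_scale * simple + pruned + at_scale * audio_tagging.
    def combined_loss(simple: float, pruned: float, audio_tagging: float,
                      simple_scale: float = 0.5, at_scale: float = 1.0) -> float:
        return simple_scale * simple + pruned + at_scale * audio_tagging

    # Cross-check against epoch 39, batch 11300 above:
    # 0.5 * 0.09026 + 0.0127 + 1.0 * 0.009186 ~= 0.06701, the logged tot_loss.
    assert abs(combined_loss(0.09026, 0.0127, 0.009186) - 0.06701) < 1e-4

The same arithmetic reproduces the other records here (e.g. batch 11400: 0.5 * 0.09052 + 0.0127 + 0.009153 ~= 0.06712).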
2023-11-26 00:03:47,121 INFO [train_asr.py:1268] (3/4) Maximum memory allocated so far is 24894MB 2023-11-26 00:03:57,511 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3126086.6666666665, ans=0.1 2023-11-26 00:03:58,413 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=3126086.6666666665, ans=0.125 2023-11-26 00:04:01,544 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3126086.6666666665, ans=0.125 2023-11-26 00:04:03,660 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=3126086.6666666665, ans=0.125 2023-11-26 00:04:06,809 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=3126153.3333333335, ans=0.0 2023-11-26 00:04:40,564 INFO [train_asr.py:1235] (3/4) Epoch 40, batch 0, loss[loss=0.07668, simple_loss=0.1006, pruned_loss=0.00831, audio_tagging_loss=0.01805, over 15514.00 frames. ], tot_loss[loss=0.07668, simple_loss=0.1006, pruned_loss=0.00831, audio_tagging_loss=0.01805, over 15514.00 frames. ], batch size: 55, lr: 1.70e-03, grad_scale: 32.0 2023-11-26 00:04:40,565 INFO [train_asr.py:1258] (3/4) Computing validation loss 2023-11-26 00:05:12,143 INFO [train_asr.py:1267] (3/4) Epoch 40, validation: loss=0.05782, simple_loss=0.05064, pruned_loss=0.005121, audio_tagging_loss=0.02738, over 4681554.00 frames. 2023-11-26 00:05:12,144 INFO [train_asr.py:1268] (3/4) Maximum memory allocated so far is 24894MB 2023-11-26 00:05:18,764 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3126186.6666666665, ans=0.125 2023-11-26 00:05:23,081 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=3126253.3333333335, ans=0.125 2023-11-26 00:05:27,814 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=3126253.3333333335, ans=0.125 2023-11-26 00:05:28,074 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=12.53 vs. limit=22.5 2023-11-26 00:05:33,100 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=3126320.0, ans=0.0 2023-11-26 00:05:34,108 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 468950 2023-11-26 00:05:39,843 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=4.66 vs. limit=15.0 2023-11-26 00:05:57,839 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=3126453.3333333335, ans=0.125 2023-11-26 00:05:58,790 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3126453.3333333335, ans=0.125 2023-11-26 00:06:06,503 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=7.11 vs. 
limit=12.0 2023-11-26 00:06:07,103 INFO [train_asr.py:1235] (3/4) Epoch 40, batch 50, loss[loss=0.08051, simple_loss=0.1106, pruned_loss=0.01209, audio_tagging_loss=0.01312, over 15304.00 frames. ], tot_loss[loss=0.07394, simple_loss=0.08854, pruned_loss=0.01223, audio_tagging_loss=0.01744, over 688931.47 frames. ], batch size: 55, lr: 1.70e-03, grad_scale: 16.0 2023-11-26 00:06:10,532 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.1.conv_module1.whiten, num_groups=1, num_channels=192, metric=12.33 vs. limit=15.0 2023-11-26 00:06:28,960 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 469000 2023-11-26 00:06:32,409 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.568e+01 9.207e+01 9.971e+01 1.067e+02 1.313e+02, threshold=1.994e+02, percent-clipped=0.0 2023-11-26 00:06:48,764 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=12.15 vs. limit=22.5 2023-11-26 00:06:55,509 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=3126786.6666666665, ans=0.0 2023-11-26 00:06:55,512 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=3126786.6666666665, ans=0.07 2023-11-26 00:07:01,846 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3126853.3333333335, ans=0.1 2023-11-26 00:07:02,687 INFO [train_asr.py:1235] (3/4) Epoch 40, batch 100, loss[loss=0.07017, simple_loss=0.09027, pruned_loss=0.01384, audio_tagging_loss=0.0112, over 15321.00 frames. ], tot_loss[loss=0.07489, simple_loss=0.09099, pruned_loss=0.0129, audio_tagging_loss=0.01649, over 1207560.39 frames. ], batch size: 57, lr: 1.70e-03, grad_scale: 16.0 2023-11-26 00:07:03,910 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=3126853.3333333335, ans=0.2 2023-11-26 00:07:05,546 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.0.self_attn2.whiten, num_groups=1, num_channels=192, metric=10.59 vs. limit=22.5 2023-11-26 00:07:07,432 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=7.40 vs. limit=15.0 2023-11-26 00:07:25,677 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 469050 2023-11-26 00:07:29,075 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=3126986.6666666665, ans=0.0 2023-11-26 00:07:32,077 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3126986.6666666665, ans=0.125 2023-11-26 00:07:47,705 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=3127120.0, ans=0.2 2023-11-26 00:07:53,398 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=4.94 vs. limit=12.0 2023-11-26 00:07:58,578 INFO [train_asr.py:1235] (3/4) Epoch 40, batch 150, loss[loss=0.06531, simple_loss=0.0824, pruned_loss=0.01068, audio_tagging_loss=0.01342, over 15609.00 frames. ], tot_loss[loss=0.07309, simple_loss=0.09099, pruned_loss=0.0127, audio_tagging_loss=0.0149, over 1619466.18 frames. 
], batch size: 59, lr: 1.70e-03, grad_scale: 16.0 2023-11-26 00:08:15,617 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=3127253.3333333335, ans=0.0 2023-11-26 00:08:18,786 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=3127253.3333333335, ans=0.2 2023-11-26 00:08:19,267 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.41 vs. limit=6.0 2023-11-26 00:08:19,833 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=3127253.3333333335, ans=0.0 2023-11-26 00:08:21,754 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 469100 2023-11-26 00:08:24,930 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.565e+01 9.020e+01 9.615e+01 1.041e+02 1.301e+02, threshold=1.923e+02, percent-clipped=0.0 2023-11-26 00:08:41,580 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=3127386.6666666665, ans=0.0 2023-11-26 00:08:48,956 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=3127453.3333333335, ans=0.125 2023-11-26 00:08:54,984 INFO [train_asr.py:1235] (3/4) Epoch 40, batch 200, loss[loss=0.06182, simple_loss=0.08494, pruned_loss=0.01063, audio_tagging_loss=0.008726, over 15355.00 frames. ], tot_loss[loss=0.07107, simple_loss=0.09034, pruned_loss=0.01266, audio_tagging_loss=0.01324, over 1933444.67 frames. ], batch size: 60, lr: 1.70e-03, grad_scale: 16.0 2023-11-26 00:08:55,159 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=3127520.0, ans=0.0 2023-11-26 00:09:16,961 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 469150 2023-11-26 00:09:27,078 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=3127720.0, ans=0.0 2023-11-26 00:09:30,843 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=3127720.0, ans=0.125 2023-11-26 00:09:36,163 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=3127720.0, ans=0.0 2023-11-26 00:09:36,322 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3127720.0, ans=0.125 2023-11-26 00:09:45,185 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3127786.6666666665, ans=0.1 2023-11-26 00:09:50,288 INFO [train_asr.py:1235] (3/4) Epoch 40, batch 250, loss[loss=0.0706, simple_loss=0.09678, pruned_loss=0.01516, audio_tagging_loss=0.00705, over 16006.00 frames. ], tot_loss[loss=0.06975, simple_loss=0.09026, pruned_loss=0.0127, audio_tagging_loss=0.01192, over 2176341.71 frames. ], batch size: 60, lr: 1.70e-03, grad_scale: 8.0 2023-11-26 00:09:50,587 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=3127853.3333333335, ans=0.0 2023-11-26 00:09:52,927 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=512, metric=19.55 vs. 
limit=22.5 2023-11-26 00:10:03,281 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=3127920.0, ans=0.125 2023-11-26 00:10:09,949 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=17.23 vs. limit=22.5 2023-11-26 00:10:12,209 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=10.21 vs. limit=15.0 2023-11-26 00:10:12,767 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 469200 2023-11-26 00:10:17,834 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.652e+01 8.769e+01 9.325e+01 1.022e+02 1.435e+02, threshold=1.865e+02, percent-clipped=0.0 2023-11-26 00:10:30,356 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3128053.3333333335, ans=0.125 2023-11-26 00:10:35,661 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.min_positive, batch_count=3128120.0, ans=0.05 2023-11-26 00:10:46,187 INFO [train_asr.py:1235] (3/4) Epoch 40, batch 300, loss[loss=0.06007, simple_loss=0.07773, pruned_loss=0.01008, audio_tagging_loss=0.01113, over 15354.00 frames. ], tot_loss[loss=0.06918, simple_loss=0.09068, pruned_loss=0.0129, audio_tagging_loss=0.01095, over 2377517.82 frames. ], batch size: 59, lr: 1.70e-03, grad_scale: 8.0 2023-11-26 00:11:09,522 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 469250 2023-11-26 00:11:25,877 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=6.11 vs. limit=12.0 2023-11-26 00:11:35,307 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=3128453.3333333335, ans=0.125 2023-11-26 00:11:42,965 INFO [train_asr.py:1235] (3/4) Epoch 40, batch 350, loss[loss=0.068, simple_loss=0.09154, pruned_loss=0.0144, audio_tagging_loss=0.007831, over 15528.00 frames. ], tot_loss[loss=0.06886, simple_loss=0.0913, pruned_loss=0.0129, audio_tagging_loss=0.01032, over 2527298.47 frames. ], batch size: 59, lr: 1.70e-03, grad_scale: 8.0 2023-11-26 00:11:43,705 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=9.50 vs. 
limit=15.0 2023-11-26 00:11:47,419 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=3128520.0, ans=0.125 2023-11-26 00:11:55,361 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=3128586.6666666665, ans=0.0 2023-11-26 00:12:04,821 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 469300 2023-11-26 00:12:08,967 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 6.623e+01 8.708e+01 9.325e+01 9.980e+01 1.485e+02, threshold=1.865e+02, percent-clipped=0.0 2023-11-26 00:12:15,285 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=3128720.0, ans=0.0 2023-11-26 00:12:16,335 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3128720.0, ans=0.125 2023-11-26 00:12:17,693 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=9.83 vs. limit=10.0 2023-11-26 00:12:19,509 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=3128720.0, ans=0.2 2023-11-26 00:12:31,293 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3128786.6666666665, ans=0.125 2023-11-26 00:12:31,429 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=3128786.6666666665, ans=0.2 2023-11-26 00:12:38,373 INFO [train_asr.py:1235] (3/4) Epoch 40, batch 400, loss[loss=0.04893, simple_loss=0.06134, pruned_loss=0.007103, audio_tagging_loss=0.01116, over 15792.00 frames. ], tot_loss[loss=0.0681, simple_loss=0.09065, pruned_loss=0.01278, audio_tagging_loss=0.009994, over 2639281.98 frames. ], batch size: 62, lr: 1.70e-03, grad_scale: 16.0 2023-11-26 00:12:40,925 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module2.whiten, num_groups=1, num_channels=512, metric=11.75 vs. limit=15.0 2023-11-26 00:12:47,452 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.1.self_attn2.whiten, num_groups=1, num_channels=192, metric=12.00 vs. limit=22.5 2023-11-26 00:13:00,062 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 469350 2023-11-26 00:13:32,806 INFO [train_asr.py:1235] (3/4) Epoch 40, batch 450, loss[loss=0.06854, simple_loss=0.08821, pruned_loss=0.01382, audio_tagging_loss=0.01062, over 15237.00 frames. ], tot_loss[loss=0.06715, simple_loss=0.08983, pruned_loss=0.01247, audio_tagging_loss=0.009764, over 2727311.31 frames. ], batch size: 57, lr: 1.70e-03, grad_scale: 16.0 2023-11-26 00:13:37,457 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=3129186.6666666665, ans=0.125 2023-11-26 00:13:40,987 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=8.71 vs. 
limit=15.0 2023-11-26 00:13:41,726 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3129186.6666666665, ans=0.125 2023-11-26 00:13:56,330 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 469400 2023-11-26 00:14:00,098 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=3.79 vs. limit=15.0 2023-11-26 00:14:00,682 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.213e+01 8.743e+01 9.299e+01 9.864e+01 1.390e+02, threshold=1.860e+02, percent-clipped=0.0 2023-11-26 00:14:03,037 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3129320.0, ans=0.125 2023-11-26 00:14:04,150 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=3129320.0, ans=0.125 2023-11-26 00:14:11,429 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=3129386.6666666665, ans=0.125 2023-11-26 00:14:26,035 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.min_positive, batch_count=3129453.3333333335, ans=0.05 2023-11-26 00:14:26,139 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3129453.3333333335, ans=0.125 2023-11-26 00:14:28,984 INFO [train_asr.py:1235] (3/4) Epoch 40, batch 500, loss[loss=0.07833, simple_loss=0.1045, pruned_loss=0.01652, audio_tagging_loss=0.00958, over 14556.00 frames. ], tot_loss[loss=0.06693, simple_loss=0.0896, pruned_loss=0.01261, audio_tagging_loss=0.009514, over 2795336.37 frames. ], batch size: 52, lr: 1.70e-03, grad_scale: 16.0 2023-11-26 00:14:36,644 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=3129520.0, ans=0.125 2023-11-26 00:14:51,414 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 469450 2023-11-26 00:14:53,614 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3129653.3333333335, ans=0.125 2023-11-26 00:14:59,055 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3129653.3333333335, ans=0.125 2023-11-26 00:15:02,190 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=3129720.0, ans=0.0 2023-11-26 00:15:10,187 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=4.51 vs. limit=15.0 2023-11-26 00:15:14,220 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=3129786.6666666665, ans=0.2 2023-11-26 00:15:24,641 INFO [train_asr.py:1235] (3/4) Epoch 40, batch 550, loss[loss=0.07208, simple_loss=0.1008, pruned_loss=0.01249, audio_tagging_loss=0.00921, over 14635.00 frames. ], tot_loss[loss=0.06644, simple_loss=0.08911, pruned_loss=0.0125, audio_tagging_loss=0.009389, over 2849054.72 frames. 
], batch size: 55, lr: 1.70e-03, grad_scale: 16.0 2023-11-26 00:15:26,007 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=3129853.3333333335, ans=0.0 2023-11-26 00:15:31,358 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3129853.3333333335, ans=0.1 2023-11-26 00:15:34,440 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3129920.0, ans=0.1 2023-11-26 00:15:38,691 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=3129920.0, ans=0.0 2023-11-26 00:15:46,112 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=7.09 vs. limit=15.0 2023-11-26 00:15:46,730 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 469500 2023-11-26 00:15:46,875 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=3129986.6666666665, ans=0.0 2023-11-26 00:15:49,022 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=3129986.6666666665, ans=0.125 2023-11-26 00:15:50,839 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 6.933e+01 8.611e+01 9.176e+01 9.917e+01 4.186e+02, threshold=1.835e+02, percent-clipped=1.0 2023-11-26 00:16:02,907 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3130053.3333333335, ans=0.0 2023-11-26 00:16:14,693 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=3130120.0, ans=0.2 2023-11-26 00:16:18,166 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=8.83 vs. limit=15.0 2023-11-26 00:16:19,919 INFO [train_asr.py:1235] (3/4) Epoch 40, batch 600, loss[loss=0.06901, simple_loss=0.1011, pruned_loss=0.01156, audio_tagging_loss=0.00689, over 15670.00 frames. ], tot_loss[loss=0.06608, simple_loss=0.08858, pruned_loss=0.0124, audio_tagging_loss=0.009388, over 2900111.90 frames. ], batch size: 56, lr: 1.70e-03, grad_scale: 16.0 2023-11-26 00:16:25,986 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=11.64 vs. limit=15.0 2023-11-26 00:16:29,822 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.max_abs, batch_count=3130186.6666666665, ans=10.0 2023-11-26 00:16:35,552 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=6.84 vs. limit=10.0 2023-11-26 00:16:40,995 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=20.58 vs. 
limit=22.5 2023-11-26 00:16:43,243 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 469550 2023-11-26 00:16:45,509 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3130320.0, ans=0.125 2023-11-26 00:16:46,011 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=6.72 vs. limit=15.0 2023-11-26 00:16:49,288 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=3130320.0, ans=0.0 2023-11-26 00:16:55,984 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=13.93 vs. limit=22.5 2023-11-26 00:17:16,593 INFO [train_asr.py:1235] (3/4) Epoch 40, batch 650, loss[loss=0.05361, simple_loss=0.0683, pruned_loss=0.007487, audio_tagging_loss=0.01198, over 14828.00 frames. ], tot_loss[loss=0.06568, simple_loss=0.08788, pruned_loss=0.01235, audio_tagging_loss=0.009393, over 2932191.43 frames. ], batch size: 56, lr: 1.70e-03, grad_scale: 16.0 2023-11-26 00:17:19,045 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3130520.0, ans=0.1 2023-11-26 00:17:23,205 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=3130520.0, ans=0.125 2023-11-26 00:17:32,855 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3130586.6666666665, ans=0.1 2023-11-26 00:17:39,107 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 469600 2023-11-26 00:17:43,458 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.523e+01 8.552e+01 9.119e+01 9.990e+01 1.151e+02, threshold=1.824e+02, percent-clipped=0.0 2023-11-26 00:18:12,544 INFO [train_asr.py:1235] (3/4) Epoch 40, batch 700, loss[loss=0.08201, simple_loss=0.1163, pruned_loss=0.01542, audio_tagging_loss=0.008448, over 16161.00 frames. ], tot_loss[loss=0.06627, simple_loss=0.08902, pruned_loss=0.01242, audio_tagging_loss=0.009333, over 2959532.10 frames. ], batch size: 60, lr: 1.70e-03, grad_scale: 16.0 2023-11-26 00:18:15,816 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=4.44 vs. limit=6.0 2023-11-26 00:18:18,949 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=9.80 vs. limit=15.0 2023-11-26 00:18:34,304 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 469650 2023-11-26 00:18:59,509 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=3131120.0, ans=0.07 2023-11-26 00:19:00,621 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3131120.0, ans=0.125 2023-11-26 00:19:07,763 INFO [train_asr.py:1235] (3/4) Epoch 40, batch 750, loss[loss=0.07677, simple_loss=0.106, pruned_loss=0.0147, audio_tagging_loss=0.009097, over 15763.00 frames. ], tot_loss[loss=0.0668, simple_loss=0.09027, pruned_loss=0.0125, audio_tagging_loss=0.009161, over 2983715.53 frames. 
], batch size: 57, lr: 1.70e-03, grad_scale: 16.0 2023-11-26 00:19:11,320 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=3131186.6666666665, ans=0.125 2023-11-26 00:19:12,162 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.min_abs, batch_count=3131186.6666666665, ans=0.5 2023-11-26 00:19:15,743 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=7.40 vs. limit=10.0 2023-11-26 00:19:24,842 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=7.57 vs. limit=10.0 2023-11-26 00:19:29,615 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 469700 2023-11-26 00:19:29,736 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=3131320.0, ans=0.125 2023-11-26 00:19:34,321 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.489e+01 8.495e+01 9.390e+01 9.960e+01 1.200e+02, threshold=1.878e+02, percent-clipped=0.0 2023-11-26 00:19:45,735 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3131386.6666666665, ans=0.125 2023-11-26 00:19:57,404 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=3131453.3333333335, ans=0.125 2023-11-26 00:20:03,196 INFO [train_asr.py:1235] (3/4) Epoch 40, batch 800, loss[loss=0.05554, simple_loss=0.07769, pruned_loss=0.009772, audio_tagging_loss=0.006918, over 15116.00 frames. ], tot_loss[loss=0.06678, simple_loss=0.0901, pruned_loss=0.01246, audio_tagging_loss=0.009272, over 3001120.57 frames. ], batch size: 57, lr: 1.70e-03, grad_scale: 32.0 2023-11-26 00:20:25,565 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 469750 2023-11-26 00:20:30,049 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3131653.3333333335, ans=0.1 2023-11-26 00:20:37,737 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3131720.0, ans=0.125 2023-11-26 00:20:38,879 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3131720.0, ans=0.125 2023-11-26 00:20:52,672 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3131786.6666666665, ans=0.125 2023-11-26 00:20:59,434 INFO [train_asr.py:1235] (3/4) Epoch 40, batch 850, loss[loss=0.09668, simple_loss=0.1335, pruned_loss=0.02038, audio_tagging_loss=0.009554, over 16415.00 frames. ], tot_loss[loss=0.06738, simple_loss=0.09097, pruned_loss=0.01258, audio_tagging_loss=0.009318, over 3015021.82 frames. 
], batch size: 55, lr: 1.70e-03, grad_scale: 32.0 2023-11-26 00:21:16,015 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=3131920.0, ans=0.0 2023-11-26 00:21:21,153 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 469800 2023-11-26 00:21:26,642 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.048e+01 8.683e+01 9.047e+01 1.001e+02 1.303e+02, threshold=1.809e+02, percent-clipped=0.0 2023-11-26 00:21:33,396 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3132053.3333333335, ans=0.125 2023-11-26 00:21:36,476 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3132053.3333333335, ans=0.1 2023-11-26 00:21:42,331 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer_ff3.min_abs, batch_count=3132053.3333333335, ans=0.2 2023-11-26 00:21:43,575 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3132120.0, ans=0.125 2023-11-26 00:21:47,793 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=3132120.0, ans=0.09899494936611666 2023-11-26 00:21:55,551 INFO [train_asr.py:1235] (3/4) Epoch 40, batch 900, loss[loss=0.06379, simple_loss=0.08484, pruned_loss=0.01378, audio_tagging_loss=0.00759, over 15159.00 frames. ], tot_loss[loss=0.06698, simple_loss=0.09033, pruned_loss=0.01247, audio_tagging_loss=0.009353, over 3028756.36 frames. ], batch size: 56, lr: 1.70e-03, grad_scale: 16.0 2023-11-26 00:22:16,294 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=3132253.3333333335, ans=0.09899494936611666 2023-11-26 00:22:18,259 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 469850 2023-11-26 00:22:22,623 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3132320.0, ans=0.125 2023-11-26 00:22:32,053 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=512, metric=2.71 vs. limit=15.0 2023-11-26 00:22:50,899 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=3132520.0, ans=0.2 2023-11-26 00:22:52,231 INFO [train_asr.py:1235] (3/4) Epoch 40, batch 950, loss[loss=0.04982, simple_loss=0.06278, pruned_loss=0.006882, audio_tagging_loss=0.01154, over 15033.00 frames. ], tot_loss[loss=0.06746, simple_loss=0.09128, pruned_loss=0.01269, audio_tagging_loss=0.00913, over 3027766.48 frames. 
], batch size: 58, lr: 1.70e-03, grad_scale: 16.0 2023-11-26 00:23:03,048 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=3132586.6666666665, ans=0.125 2023-11-26 00:23:06,932 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=3132586.6666666665, ans=0.0 2023-11-26 00:23:14,126 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 469900 2023-11-26 00:23:19,283 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.398e+01 8.462e+01 9.352e+01 1.021e+02 1.286e+02, threshold=1.870e+02, percent-clipped=0.0 2023-11-26 00:23:31,689 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3132720.0, ans=0.125 2023-11-26 00:23:34,890 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=3132720.0, ans=0.2 2023-11-26 00:23:38,717 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=11.81 vs. limit=15.0 2023-11-26 00:23:47,646 INFO [train_asr.py:1235] (3/4) Epoch 40, batch 1000, loss[loss=0.05455, simple_loss=0.07163, pruned_loss=0.007009, audio_tagging_loss=0.01172, over 15951.00 frames. ], tot_loss[loss=0.06716, simple_loss=0.09095, pruned_loss=0.01267, audio_tagging_loss=0.009015, over 3027296.92 frames. ], batch size: 59, lr: 1.70e-03, grad_scale: 16.0 2023-11-26 00:24:10,094 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 469950 2023-11-26 00:24:12,219 WARNING [train_asr.py:1481] (3/4) Exclude cut with ID unbalanced/5Y6u9AlD9S0_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-26 00:24:21,050 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=3133053.3333333335, ans=0.035 2023-11-26 00:24:23,213 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=3133053.3333333335, ans=0.125 2023-11-26 00:24:34,500 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3133120.0, ans=0.0 2023-11-26 00:24:38,937 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=12.62 vs. limit=22.5 2023-11-26 00:24:43,927 INFO [train_asr.py:1235] (3/4) Epoch 40, batch 1050, loss[loss=0.06499, simple_loss=0.08348, pruned_loss=0.0115, audio_tagging_loss=0.01175, over 15415.00 frames. ], tot_loss[loss=0.06671, simple_loss=0.09025, pruned_loss=0.01258, audio_tagging_loss=0.008996, over 3027844.86 frames. ], batch size: 59, lr: 1.70e-03, grad_scale: 16.0 2023-11-26 00:24:47,760 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=20.51 vs. 
limit=22.5 2023-11-26 00:25:00,696 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3133253.3333333335, ans=0.1 2023-11-26 00:25:01,843 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=3133253.3333333335, ans=0.0 2023-11-26 00:25:05,357 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=18.98 vs. limit=22.5 2023-11-26 00:25:06,938 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 470000 2023-11-26 00:25:12,412 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.119e+01 8.549e+01 9.309e+01 1.004e+02 1.287e+02, threshold=1.862e+02, percent-clipped=0.0 2023-11-26 00:25:17,912 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=10.24 vs. limit=15.0 2023-11-26 00:25:40,148 INFO [train_asr.py:1235] (3/4) Epoch 40, batch 1100, loss[loss=0.06315, simple_loss=0.08577, pruned_loss=0.01121, audio_tagging_loss=0.00906, over 14452.00 frames. ], tot_loss[loss=0.06587, simple_loss=0.08924, pruned_loss=0.01229, audio_tagging_loss=0.008967, over 3029654.49 frames. ], batch size: 55, lr: 1.70e-03, grad_scale: 16.0 2023-11-26 00:25:40,315 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=3133520.0, ans=0.2 2023-11-26 00:25:44,371 WARNING [train_asr.py:1481] (3/4) Exclude cut with ID unbalanced/AWHnJAqurec_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-26 00:25:44,894 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=18.72 vs. limit=22.5 2023-11-26 00:25:47,823 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=3133520.0, ans=0.0 2023-11-26 00:25:49,775 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer_ff3.min_abs, batch_count=3133520.0, ans=0.2 2023-11-26 00:25:55,043 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-26 00:25:57,754 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=3133586.6666666665, ans=0.0 2023-11-26 00:26:02,878 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 470050 2023-11-26 00:26:14,291 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=13.82 vs. limit=22.5 2023-11-26 00:26:31,742 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=6.95 vs. limit=12.0 2023-11-26 00:26:36,637 INFO [train_asr.py:1235] (3/4) Epoch 40, batch 1150, loss[loss=0.0631, simple_loss=0.08886, pruned_loss=0.01184, audio_tagging_loss=0.006829, over 14422.00 frames. 
], tot_loss[loss=0.06567, simple_loss=0.08876, pruned_loss=0.01233, audio_tagging_loss=0.008965, over 3029617.62 frames. ], batch size: 56, lr: 1.70e-03, grad_scale: 16.0 2023-11-26 00:26:48,018 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=3133920.0, ans=0.0 2023-11-26 00:26:49,013 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=3133920.0, ans=0.2 2023-11-26 00:26:58,331 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 470100 2023-11-26 00:27:04,014 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.254e+01 8.538e+01 9.163e+01 1.008e+02 1.257e+02, threshold=1.833e+02, percent-clipped=0.0 2023-11-26 00:27:05,403 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3133986.6666666665, ans=0.125 2023-11-26 00:27:10,781 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=3134053.3333333335, ans=0.125 2023-11-26 00:27:13,831 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.min_positive, batch_count=3134053.3333333335, ans=0.025 2023-11-26 00:27:15,001 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3134053.3333333335, ans=0.125 2023-11-26 00:27:28,204 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3134120.0, ans=0.125 2023-11-26 00:27:28,631 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=19.78 vs. limit=22.5 2023-11-26 00:27:31,397 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3134186.6666666665, ans=0.125 2023-11-26 00:27:32,274 INFO [train_asr.py:1235] (3/4) Epoch 40, batch 1200, loss[loss=0.07305, simple_loss=0.102, pruned_loss=0.01386, audio_tagging_loss=0.008215, over 15123.00 frames. ], tot_loss[loss=0.06599, simple_loss=0.08944, pruned_loss=0.01241, audio_tagging_loss=0.008857, over 3024350.26 frames. ], batch size: 56, lr: 1.70e-03, grad_scale: 32.0 2023-11-26 00:27:33,489 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.min_positive, batch_count=3134186.6666666665, ans=0.05 2023-11-26 00:27:45,770 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=8.93 vs. limit=15.0 2023-11-26 00:27:55,243 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 470150 2023-11-26 00:28:14,411 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=6.99 vs. 
limit=15.0 2023-11-26 00:28:15,143 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=3134386.6666666665, ans=0.2 2023-11-26 00:28:15,160 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=3134386.6666666665, ans=0.125 2023-11-26 00:28:21,387 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3134453.3333333335, ans=0.1 2023-11-26 00:28:27,667 INFO [train_asr.py:1235] (3/4) Epoch 40, batch 1250, loss[loss=0.06672, simple_loss=0.09064, pruned_loss=0.01201, audio_tagging_loss=0.009393, over 15299.00 frames. ], tot_loss[loss=0.06606, simple_loss=0.08977, pruned_loss=0.01241, audio_tagging_loss=0.008759, over 3030320.18 frames. ], batch size: 58, lr: 1.70e-03, grad_scale: 32.0 2023-11-26 00:28:48,507 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=3134586.6666666665, ans=0.125 2023-11-26 00:28:49,658 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3134653.3333333335, ans=0.1 2023-11-26 00:28:50,654 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 470200 2023-11-26 00:28:50,770 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3134653.3333333335, ans=0.125 2023-11-26 00:28:50,868 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=3134653.3333333335, ans=0.125 2023-11-26 00:28:56,137 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.365e+01 8.537e+01 9.082e+01 9.508e+01 1.462e+02, threshold=1.816e+02, percent-clipped=0.0 2023-11-26 00:29:23,793 INFO [train_asr.py:1235] (3/4) Epoch 40, batch 1300, loss[loss=0.07928, simple_loss=0.1029, pruned_loss=0.01624, audio_tagging_loss=0.01161, over 14073.00 frames. ], tot_loss[loss=0.06599, simple_loss=0.08955, pruned_loss=0.01232, audio_tagging_loss=0.008897, over 3028336.85 frames. ], batch size: 53, lr: 1.70e-03, grad_scale: 32.0 2023-11-26 00:29:45,579 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 470250 2023-11-26 00:30:08,529 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=3135120.0, ans=0.0 2023-11-26 00:30:10,641 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=3135120.0, ans=0.125 2023-11-26 00:30:19,480 INFO [train_asr.py:1235] (3/4) Epoch 40, batch 1350, loss[loss=0.08382, simple_loss=0.1172, pruned_loss=0.01621, audio_tagging_loss=0.009028, over 15521.00 frames. ], tot_loss[loss=0.06547, simple_loss=0.08865, pruned_loss=0.01218, audio_tagging_loss=0.008964, over 3027844.07 frames. ], batch size: 56, lr: 1.70e-03, grad_scale: 32.0 2023-11-26 00:30:41,949 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 470300 2023-11-26 00:30:47,641 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.032e+01 8.371e+01 9.120e+01 9.741e+01 1.134e+02, threshold=1.824e+02, percent-clipped=0.0 2023-11-26 00:30:58,080 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=7.45 vs. 
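
The dense ScheduledFloat lines record hyperparameters (dropout probabilities, skip rates, balancer probs) that are deterministic, piecewise-linear functions of batch_count; the log prints the value ("ans") currently in effect. A toy version of the idea, with hypothetical breakpoints; the real scaling.py class supports more than this:

    def scheduled_float(batch_count, schedule):
        """schedule: ascending [(batch_count, value), ...] breakpoints."""
        if batch_count <= schedule[0][0]:
            return schedule[0][1]
        if batch_count >= schedule[-1][0]:
            return schedule[-1][1]
        for (a, va), (b, vb) in zip(schedule, schedule[1:]):
            if a <= batch_count <= b:  # linear interpolation inside [a, b]
                return va + (vb - va) * (batch_count - a) / (b - a)

    # e.g. a (0.0, 0.3) -> (20000.0, 0.1) dropout schedule is long since flat
    # at 0.1 by batch_count ~3.13e6, matching the ans=0.1 values printed above
    scheduled_float(3134453.0, [(0.0, 0.3), (20000.0, 0.1)])  # -> 0.1
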
limit=12.0 2023-11-26 00:31:00,783 WARNING [train_asr.py:1481] (3/4) Exclude cut with ID unbalanced/XdmbboqRBmQ_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-26 00:31:11,795 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3135453.3333333335, ans=0.1 2023-11-26 00:31:14,382 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=9.79 vs. limit=15.0 2023-11-26 00:31:14,792 INFO [train_asr.py:1235] (3/4) Epoch 40, batch 1400, loss[loss=0.07006, simple_loss=0.09233, pruned_loss=0.01389, audio_tagging_loss=0.01001, over 15321.00 frames. ], tot_loss[loss=0.0655, simple_loss=0.08855, pruned_loss=0.01226, audio_tagging_loss=0.008965, over 3032922.97 frames. ], batch size: 57, lr: 1.70e-03, grad_scale: 16.0 2023-11-26 00:31:17,603 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=8.54 vs. limit=10.0 2023-11-26 00:31:22,544 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=3135520.0, ans=0.0 2023-11-26 00:31:38,593 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 470350 2023-11-26 00:31:56,861 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=3135720.0, ans=0.0 2023-11-26 00:32:04,977 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=3135786.6666666665, ans=0.0 2023-11-26 00:32:11,771 INFO [train_asr.py:1235] (3/4) Epoch 40, batch 1450, loss[loss=0.07963, simple_loss=0.1051, pruned_loss=0.0181, audio_tagging_loss=0.008978, over 15280.00 frames. ], tot_loss[loss=0.06597, simple_loss=0.08916, pruned_loss=0.01244, audio_tagging_loss=0.008946, over 3040579.63 frames. ], batch size: 54, lr: 1.70e-03, grad_scale: 16.0 2023-11-26 00:32:22,590 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=12.21 vs. limit=15.0 2023-11-26 00:32:33,899 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 470400 2023-11-26 00:32:40,435 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.317e+01 8.703e+01 9.390e+01 1.022e+02 1.337e+02, threshold=1.878e+02, percent-clipped=0.0 2023-11-26 00:32:45,073 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=3136053.3333333335, ans=0.125 2023-11-26 00:32:45,342 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.whiten, num_groups=1, num_channels=512, metric=9.47 vs. 
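
The WARNING above (and its repeats below) documents a guard: with the 4x subsampling used here, a 1.0 s AudioSet clip of 100 feature frames keeps only 23 encoder frames, fewer than its 24 BPE tokens, so the transducer loss is undefined and the cut is skipped; these cuts carry placeholder text anyway, so presumably only their audio-tagging targets matter. A sketch of the predicate, with the post-subsampling formula inferred from the logged pair (100 -> 23):

    def keep_cut(num_frames, num_tokens, subsampling_factor=4):
        # (100 - 7) // 4 = 23, matching "after subsampling" in the WARNING
        frames_after = (num_frames - 7) // subsampling_factor
        return frames_after >= num_tokens

    keep_cut(100, 24)  # -> False: excluded, as logged
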
limit=12.0 2023-11-26 00:32:51,424 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=3136053.3333333335, ans=0.2 2023-11-26 00:33:04,102 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3136120.0, ans=0.125 2023-11-26 00:33:06,185 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=3136120.0, ans=0.125 2023-11-26 00:33:08,192 INFO [train_asr.py:1235] (3/4) Epoch 40, batch 1500, loss[loss=0.04745, simple_loss=0.05758, pruned_loss=0.008508, audio_tagging_loss=0.01015, over 14171.00 frames. ], tot_loss[loss=0.06642, simple_loss=0.08975, pruned_loss=0.01256, audio_tagging_loss=0.008987, over 3043126.35 frames. ], batch size: 53, lr: 1.70e-03, grad_scale: 16.0 2023-11-26 00:33:08,529 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3136186.6666666665, ans=0.125 2023-11-26 00:33:10,074 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=7.96 vs. limit=10.0 2023-11-26 00:33:17,900 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3136253.3333333335, ans=0.125 2023-11-26 00:33:23,790 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=8.82 vs. limit=10.0 2023-11-26 00:33:30,741 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 470450 2023-11-26 00:33:41,153 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3136386.6666666665, ans=0.125 2023-11-26 00:33:51,201 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=3136386.6666666665, ans=0.04949747468305833 2023-11-26 00:33:51,354 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=4.79 vs. limit=10.0 2023-11-26 00:34:03,666 INFO [train_asr.py:1235] (3/4) Epoch 40, batch 1550, loss[loss=0.05795, simple_loss=0.07567, pruned_loss=0.008948, audio_tagging_loss=0.01117, over 17121.00 frames. ], tot_loss[loss=0.06714, simple_loss=0.09061, pruned_loss=0.01276, audio_tagging_loss=0.009076, over 3036549.46 frames. ], batch size: 66, lr: 1.70e-03, grad_scale: 16.0 2023-11-26 00:34:10,712 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=2.23 vs. 
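
The Whitening lines compare a statistic of a layer's output against a scheduled limit (metric=9.47 vs. limit=12.0 just above): the metric measures how far the per-group channel covariance is from a multiple of the identity, 1.0 for perfectly white features and up to num_channels for rank-one ones, and when it exceeds the limit a corrective gradient pushes the features back toward whiteness. One plausible normalization with these properties, mirroring the idea rather than scaling.py's exact code:

    import torch

    def whitening_metric(x: torch.Tensor) -> torch.Tensor:
        # x: (..., num_channels); num_groups=1, i.e. one group spans all channels
        x = x.reshape(-1, x.shape[-1])
        cov = x.t() @ x / x.shape[0]
        num_channels = cov.shape[0]
        # trace-based ratio: equals 1.0 when cov is proportional to the identity
        return (cov ** 2).sum() * num_channels / torch.diag(cov).sum() ** 2
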
limit=15.0 2023-11-26 00:34:13,490 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=3136520.0, ans=0.0 2023-11-26 00:34:13,545 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3136520.0, ans=0.1 2023-11-26 00:34:14,615 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3136586.6666666665, ans=0.1 2023-11-26 00:34:26,673 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 470500 2023-11-26 00:34:33,563 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.285e+01 8.668e+01 9.304e+01 9.957e+01 1.824e+02, threshold=1.861e+02, percent-clipped=0.0 2023-11-26 00:34:34,914 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=3136653.3333333335, ans=0.125 2023-11-26 00:34:35,960 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.max_abs, batch_count=3136653.3333333335, ans=10.0 2023-11-26 00:34:38,083 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=3136720.0, ans=0.2 2023-11-26 00:34:41,080 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=3136720.0, ans=0.125 2023-11-26 00:34:43,204 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=3136720.0, ans=0.2 2023-11-26 00:34:45,675 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=512, metric=3.66 vs. limit=15.0 2023-11-26 00:34:48,533 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.min_positive, batch_count=3136786.6666666665, ans=0.05 2023-11-26 00:34:48,837 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=9.81 vs. limit=15.0 2023-11-26 00:34:53,234 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=7.60 vs. limit=15.0 2023-11-26 00:34:56,539 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=3136786.6666666665, ans=0.2 2023-11-26 00:34:59,525 INFO [train_asr.py:1235] (3/4) Epoch 40, batch 1600, loss[loss=0.07765, simple_loss=0.1125, pruned_loss=0.0133, audio_tagging_loss=0.008089, over 16452.00 frames. ], tot_loss[loss=0.06712, simple_loss=0.0906, pruned_loss=0.01275, audio_tagging_loss=0.009065, over 3045046.48 frames. 
], batch size: 57, lr: 1.70e-03, grad_scale: 32.0 2023-11-26 00:35:04,227 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3136853.3333333335, ans=0.125 2023-11-26 00:35:22,163 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 470550 2023-11-26 00:35:23,456 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3136986.6666666665, ans=0.125 2023-11-26 00:35:26,627 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=3136986.6666666665, ans=0.0 2023-11-26 00:35:36,891 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=3137053.3333333335, ans=0.0 2023-11-26 00:35:55,997 INFO [train_asr.py:1235] (3/4) Epoch 40, batch 1650, loss[loss=0.06734, simple_loss=0.09281, pruned_loss=0.01132, audio_tagging_loss=0.009613, over 14994.00 frames. ], tot_loss[loss=0.06716, simple_loss=0.09063, pruned_loss=0.0127, audio_tagging_loss=0.009148, over 3048524.73 frames. ], batch size: 55, lr: 1.70e-03, grad_scale: 32.0 2023-11-26 00:35:57,258 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3137186.6666666665, ans=0.1 2023-11-26 00:36:03,474 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3137186.6666666665, ans=0.125 2023-11-26 00:36:06,697 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3137253.3333333335, ans=0.1 2023-11-26 00:36:07,935 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=3137253.3333333335, ans=0.125 2023-11-26 00:36:17,245 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 470600 2023-11-26 00:36:24,354 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.265e+01 8.506e+01 9.125e+01 1.020e+02 1.203e+02, threshold=1.825e+02, percent-clipped=0.0 2023-11-26 00:36:26,613 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3137320.0, ans=0.125 2023-11-26 00:36:37,214 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3137386.6666666665, ans=0.125 2023-11-26 00:36:51,029 INFO [train_asr.py:1235] (3/4) Epoch 40, batch 1700, loss[loss=0.06355, simple_loss=0.09081, pruned_loss=0.009332, audio_tagging_loss=0.008812, over 15409.00 frames. ], tot_loss[loss=0.06665, simple_loss=0.0901, pruned_loss=0.01246, audio_tagging_loss=0.00914, over 3055773.46 frames. 
], batch size: 58, lr: 1.70e-03, grad_scale: 16.0 2023-11-26 00:36:59,819 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=3137520.0, ans=0.0 2023-11-26 00:37:04,466 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3137586.6666666665, ans=0.125 2023-11-26 00:37:06,815 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=3137586.6666666665, ans=0.0 2023-11-26 00:37:08,849 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=3137586.6666666665, ans=0.2 2023-11-26 00:37:09,942 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=3137586.6666666665, ans=0.04949747468305833 2023-11-26 00:37:13,062 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 470650 2023-11-26 00:37:15,922 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3137653.3333333335, ans=0.125 2023-11-26 00:37:17,038 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3137653.3333333335, ans=0.0 2023-11-26 00:37:24,610 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3137720.0, ans=0.125 2023-11-26 00:37:25,908 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn1.whiten.whitening_limit, batch_count=3137720.0, ans=22.5 2023-11-26 00:37:46,333 INFO [train_asr.py:1235] (3/4) Epoch 40, batch 1750, loss[loss=0.05264, simple_loss=0.06779, pruned_loss=0.009727, audio_tagging_loss=0.009013, over 15302.00 frames. ], tot_loss[loss=0.06655, simple_loss=0.08986, pruned_loss=0.0125, audio_tagging_loss=0.009111, over 3049340.87 frames. ], batch size: 58, lr: 1.70e-03, grad_scale: 16.0 2023-11-26 00:37:49,918 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=12.87 vs. limit=15.0 2023-11-26 00:37:55,002 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3137853.3333333335, ans=0.1 2023-11-26 00:37:55,045 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3137853.3333333335, ans=0.1 2023-11-26 00:37:56,037 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=3137920.0, ans=0.2 2023-11-26 00:37:58,350 INFO [scaling.py:1022] (3/4) Whitening: name=encoder_embed.out_whiten, num_groups=1, num_channels=192, metric=7.32 vs. 
limit=8.0 2023-11-26 00:38:06,565 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=3137920.0, ans=0.125 2023-11-26 00:38:08,749 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 470700 2023-11-26 00:38:16,169 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.336e+01 8.541e+01 8.977e+01 9.696e+01 1.531e+02, threshold=1.795e+02, percent-clipped=0.0 2023-11-26 00:38:21,910 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=3138053.3333333335, ans=0.125 2023-11-26 00:38:28,499 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=8.13 vs. limit=15.0 2023-11-26 00:38:31,690 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=512, metric=8.72 vs. limit=15.0 2023-11-26 00:38:35,853 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=4.04 vs. limit=15.0 2023-11-26 00:38:42,297 INFO [train_asr.py:1235] (3/4) Epoch 40, batch 1800, loss[loss=0.0753, simple_loss=0.1113, pruned_loss=0.01222, audio_tagging_loss=0.007452, over 14711.00 frames. ], tot_loss[loss=0.06629, simple_loss=0.08944, pruned_loss=0.01245, audio_tagging_loss=0.00912, over 3056449.96 frames. ], batch size: 56, lr: 1.70e-03, grad_scale: 16.0 2023-11-26 00:39:03,982 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 470750 2023-11-26 00:39:14,126 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3138386.6666666665, ans=0.125 2023-11-26 00:39:26,414 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3138453.3333333335, ans=0.1 2023-11-26 00:39:30,120 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=3138453.3333333335, ans=0.125 2023-11-26 00:39:33,383 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3138453.3333333335, ans=0.125 2023-11-26 00:39:37,383 INFO [train_asr.py:1235] (3/4) Epoch 40, batch 1850, loss[loss=0.08948, simple_loss=0.1324, pruned_loss=0.01823, audio_tagging_loss=0.005033, over 16424.00 frames. ], tot_loss[loss=0.06678, simple_loss=0.09025, pruned_loss=0.01266, audio_tagging_loss=0.008995, over 3053132.85 frames. 
], batch size: 58, lr: 1.70e-03, grad_scale: 16.0 2023-11-26 00:39:40,816 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=3138520.0, ans=0.07 2023-11-26 00:39:51,916 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=3138586.6666666665, ans=0.0 2023-11-26 00:39:59,030 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 470800 2023-11-26 00:40:02,717 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3138653.3333333335, ans=0.125 2023-11-26 00:40:03,705 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=3138653.3333333335, ans=0.0 2023-11-26 00:40:07,150 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.457e+01 8.736e+01 9.136e+01 9.723e+01 1.171e+02, threshold=1.827e+02, percent-clipped=0.0 2023-11-26 00:40:11,098 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=3138720.0, ans=0.2 2023-11-26 00:40:19,655 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=3138720.0, ans=0.125 2023-11-26 00:40:32,805 INFO [train_asr.py:1235] (3/4) Epoch 40, batch 1900, loss[loss=0.05443, simple_loss=0.0718, pruned_loss=0.007805, audio_tagging_loss=0.01073, over 16023.00 frames. ], tot_loss[loss=0.06552, simple_loss=0.08862, pruned_loss=0.01226, audio_tagging_loss=0.008952, over 3057326.80 frames. ], batch size: 59, lr: 1.70e-03, grad_scale: 16.0 2023-11-26 00:40:40,596 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3138853.3333333335, ans=0.1 2023-11-26 00:40:55,381 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 470850 2023-11-26 00:41:02,049 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=7.38 vs. limit=10.0 2023-11-26 00:41:11,253 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-11-26 00:41:15,494 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-26 00:41:28,558 INFO [train_asr.py:1235] (3/4) Epoch 40, batch 1950, loss[loss=0.06702, simple_loss=0.09802, pruned_loss=0.01271, audio_tagging_loss=0.005304, over 14685.00 frames. ], tot_loss[loss=0.0656, simple_loss=0.08865, pruned_loss=0.01234, audio_tagging_loss=0.008938, over 3061751.22 frames. ], batch size: 53, lr: 1.70e-03, grad_scale: 16.0 2023-11-26 00:41:44,584 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=3.83 vs. 
limit=6.0 2023-11-26 00:41:50,570 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 470900 2023-11-26 00:41:59,048 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.531e+01 8.427e+01 9.159e+01 1.002e+02 1.233e+02, threshold=1.832e+02, percent-clipped=0.0 2023-11-26 00:42:01,480 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=3139386.6666666665, ans=0.125 2023-11-26 00:42:05,529 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=3139386.6666666665, ans=0.2 2023-11-26 00:42:24,494 INFO [train_asr.py:1235] (3/4) Epoch 40, batch 2000, loss[loss=0.08093, simple_loss=0.1166, pruned_loss=0.01421, audio_tagging_loss=0.008429, over 14126.00 frames. ], tot_loss[loss=0.06588, simple_loss=0.08914, pruned_loss=0.01238, audio_tagging_loss=0.008926, over 3054415.28 frames. ], batch size: 55, lr: 1.70e-03, grad_scale: 32.0 2023-11-26 00:42:31,044 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=3139520.0, ans=0.125 2023-11-26 00:42:40,588 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=3139586.6666666665, ans=0.2 2023-11-26 00:42:46,821 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 470950 2023-11-26 00:42:48,043 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=3139653.3333333335, ans=0.0 2023-11-26 00:43:09,294 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=3139786.6666666665, ans=0.04949747468305833 2023-11-26 00:43:16,562 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=3139786.6666666665, ans=0.125 2023-11-26 00:43:19,545 INFO [train_asr.py:1235] (3/4) Epoch 40, batch 2050, loss[loss=0.06705, simple_loss=0.08973, pruned_loss=0.01462, audio_tagging_loss=0.007563, over 15611.00 frames. ], tot_loss[loss=0.06585, simple_loss=0.08911, pruned_loss=0.01239, audio_tagging_loss=0.008912, over 3055572.17 frames. 
], batch size: 59, lr: 1.70e-03, grad_scale: 32.0 2023-11-26 00:43:28,729 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=3139853.3333333335, ans=0.125 2023-11-26 00:43:41,861 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 471000 2023-11-26 00:43:49,194 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3139986.6666666665, ans=0.1 2023-11-26 00:43:50,084 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.223e+01 8.583e+01 9.206e+01 9.963e+01 1.276e+02, threshold=1.841e+02, percent-clipped=0.0 2023-11-26 00:43:52,369 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=3140053.3333333335, ans=0.0 2023-11-26 00:43:55,653 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3140053.3333333335, ans=0.125 2023-11-26 00:44:06,084 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=3140120.0, ans=0.125 2023-11-26 00:44:13,562 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3140120.0, ans=0.1 2023-11-26 00:44:15,648 INFO [train_asr.py:1235] (3/4) Epoch 40, batch 2100, loss[loss=0.0749, simple_loss=0.1041, pruned_loss=0.01575, audio_tagging_loss=0.007106, over 15235.00 frames. ], tot_loss[loss=0.06627, simple_loss=0.08973, pruned_loss=0.01254, audio_tagging_loss=0.008866, over 3049227.94 frames. ], batch size: 57, lr: 1.70e-03, grad_scale: 32.0 2023-11-26 00:44:15,893 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3140186.6666666665, ans=0.1 2023-11-26 00:44:22,875 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=3140186.6666666665, ans=0.125 2023-11-26 00:44:26,128 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3140253.3333333335, ans=0.125 2023-11-26 00:44:38,000 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 471050 2023-11-26 00:44:39,323 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3140320.0, ans=0.125 2023-11-26 00:45:04,856 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3140453.3333333335, ans=0.125 2023-11-26 00:45:11,030 INFO [train_asr.py:1235] (3/4) Epoch 40, batch 2150, loss[loss=0.06796, simple_loss=0.0852, pruned_loss=0.01621, audio_tagging_loss=0.009149, over 13766.00 frames. ], tot_loss[loss=0.06634, simple_loss=0.08977, pruned_loss=0.01259, audio_tagging_loss=0.00887, over 3044730.92 frames. ], batch size: 51, lr: 1.70e-03, grad_scale: 16.0 2023-11-26 00:45:24,804 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=7.39 vs. 
limit=12.0 2023-11-26 00:45:27,792 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=3140586.6666666665, ans=0.125 2023-11-26 00:45:33,537 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 471100 2023-11-26 00:45:41,970 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.211e+01 8.773e+01 9.255e+01 9.995e+01 1.124e+02, threshold=1.851e+02, percent-clipped=0.0 2023-11-26 00:45:45,891 WARNING [train_asr.py:1481] (3/4) Exclude cut with ID unbalanced/XkQ8YVd8u38_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-26 00:45:49,359 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3140720.0, ans=0.125 2023-11-26 00:45:54,843 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3140786.6666666665, ans=0.1 2023-11-26 00:45:58,016 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-26 00:46:06,709 INFO [train_asr.py:1235] (3/4) Epoch 40, batch 2200, loss[loss=0.06097, simple_loss=0.08386, pruned_loss=0.009628, audio_tagging_loss=0.009412, over 14326.00 frames. ], tot_loss[loss=0.06646, simple_loss=0.08984, pruned_loss=0.01263, audio_tagging_loss=0.008909, over 3040947.41 frames. ], batch size: 57, lr: 1.70e-03, grad_scale: 16.0 2023-11-26 00:46:07,153 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=6.89 vs. limit=12.0 2023-11-26 00:46:23,191 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=3140920.0, ans=0.125 2023-11-26 00:46:24,277 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3140920.0, ans=0.125 2023-11-26 00:46:29,025 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 471150 2023-11-26 00:46:39,719 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=3141053.3333333335, ans=0.2 2023-11-26 00:46:51,893 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3141120.0, ans=0.1 2023-11-26 00:46:59,725 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3141120.0, ans=0.1 2023-11-26 00:47:01,680 INFO [train_asr.py:1235] (3/4) Epoch 40, batch 2250, loss[loss=0.07185, simple_loss=0.106, pruned_loss=0.01224, audio_tagging_loss=0.006597, over 15122.00 frames. ], tot_loss[loss=0.06701, simple_loss=0.09074, pruned_loss=0.01275, audio_tagging_loss=0.008889, over 3043467.29 frames. 
], batch size: 57, lr: 1.70e-03, grad_scale: 16.0 2023-11-26 00:47:23,524 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 471200 2023-11-26 00:47:32,699 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.277e+01 8.619e+01 9.398e+01 1.009e+02 1.153e+02, threshold=1.880e+02, percent-clipped=0.0 2023-11-26 00:47:35,085 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=3141386.6666666665, ans=0.2 2023-11-26 00:47:39,839 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=3141386.6666666665, ans=0.125 2023-11-26 00:47:48,371 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3141453.3333333335, ans=0.125 2023-11-26 00:47:52,245 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3141453.3333333335, ans=0.125 2023-11-26 00:47:57,268 INFO [train_asr.py:1235] (3/4) Epoch 40, batch 2300, loss[loss=0.07641, simple_loss=0.1113, pruned_loss=0.01474, audio_tagging_loss=0.00601, over 16011.00 frames. ], tot_loss[loss=0.06654, simple_loss=0.09001, pruned_loss=0.01263, audio_tagging_loss=0.008909, over 3045841.84 frames. ], batch size: 55, lr: 1.70e-03, grad_scale: 16.0 2023-11-26 00:48:07,114 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3141586.6666666665, ans=0.1 2023-11-26 00:48:19,698 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 471250 2023-11-26 00:48:32,019 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.min_positive, batch_count=3141720.0, ans=0.025 2023-11-26 00:48:33,138 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3141720.0, ans=0.1 2023-11-26 00:48:41,362 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=3141786.6666666665, ans=0.125 2023-11-26 00:48:43,529 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3141786.6666666665, ans=0.125 2023-11-26 00:48:43,893 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.whiten, num_groups=1, num_channels=512, metric=8.58 vs. limit=12.0 2023-11-26 00:48:46,565 WARNING [train_asr.py:1481] (3/4) Exclude cut with ID unbalanced/mx9RcUz8sr0_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-26 00:48:52,334 INFO [train_asr.py:1235] (3/4) Epoch 40, batch 2350, loss[loss=0.07282, simple_loss=0.09949, pruned_loss=0.01167, audio_tagging_loss=0.01141, over 13787.00 frames. ], tot_loss[loss=0.06709, simple_loss=0.09093, pruned_loss=0.01264, audio_tagging_loss=0.008988, over 3051639.28 frames. 
], batch size: 54, lr: 1.70e-03, grad_scale: 16.0 2023-11-26 00:48:53,684 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=3141853.3333333335, ans=0.125 2023-11-26 00:49:06,968 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=3141920.0, ans=0.125 2023-11-26 00:49:07,320 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=3.21 vs. limit=15.0 2023-11-26 00:49:11,427 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=8.56 vs. limit=15.0 2023-11-26 00:49:14,616 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 471300 2023-11-26 00:49:19,307 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=13.53 vs. limit=15.0 2023-11-26 00:49:20,141 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=3141986.6666666665, ans=0.0 2023-11-26 00:49:23,015 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.222e+01 8.557e+01 9.249e+01 9.915e+01 1.418e+02, threshold=1.850e+02, percent-clipped=0.0 2023-11-26 00:49:24,529 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=13.33 vs. limit=22.5 2023-11-26 00:49:27,674 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=3142053.3333333335, ans=0.1 2023-11-26 00:49:44,108 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=3142120.0, ans=0.0 2023-11-26 00:49:48,008 INFO [train_asr.py:1235] (3/4) Epoch 40, batch 2400, loss[loss=0.08526, simple_loss=0.108, pruned_loss=0.02086, audio_tagging_loss=0.01041, over 16193.00 frames. ], tot_loss[loss=0.06735, simple_loss=0.09082, pruned_loss=0.01274, audio_tagging_loss=0.009204, over 3054274.43 frames. ], batch size: 63, lr: 1.70e-03, grad_scale: 32.0 2023-11-26 00:49:56,211 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3142186.6666666665, ans=0.1 2023-11-26 00:49:59,496 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=3142253.3333333335, ans=0.1 2023-11-26 00:50:05,723 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3142253.3333333335, ans=0.125 2023-11-26 00:50:09,803 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 471350 2023-11-26 00:50:19,980 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=3142386.6666666665, ans=0.0 2023-11-26 00:50:30,515 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=3.20 vs. limit=15.0 2023-11-26 00:50:43,039 INFO [train_asr.py:1235] (3/4) Epoch 40, batch 2450, loss[loss=0.0605, simple_loss=0.08673, pruned_loss=0.007842, audio_tagging_loss=0.009292, over 15511.00 frames. 
], tot_loss[loss=0.06735, simple_loss=0.0907, pruned_loss=0.01275, audio_tagging_loss=0.009248, over 3055676.41 frames. ], batch size: 62, lr: 1.70e-03, grad_scale: 32.0 2023-11-26 00:50:53,676 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3142586.6666666665, ans=0.125 2023-11-26 00:51:04,624 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 471400 2023-11-26 00:51:13,854 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.339e+01 8.694e+01 9.441e+01 1.025e+02 1.251e+02, threshold=1.888e+02, percent-clipped=0.0 2023-11-26 00:51:17,296 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3142720.0, ans=0.125 2023-11-26 00:51:20,074 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3142720.0, ans=0.125 2023-11-26 00:51:25,329 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3142720.0, ans=0.1 2023-11-26 00:51:37,662 INFO [train_asr.py:1235] (3/4) Epoch 40, batch 2500, loss[loss=0.06463, simple_loss=0.08749, pruned_loss=0.01045, audio_tagging_loss=0.01044, over 14674.00 frames. ], tot_loss[loss=0.06703, simple_loss=0.09028, pruned_loss=0.01267, audio_tagging_loss=0.009232, over 3055526.25 frames. ], batch size: 58, lr: 1.70e-03, grad_scale: 32.0 2023-11-26 00:52:00,684 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 471450 2023-11-26 00:52:14,412 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3143053.3333333335, ans=0.125 2023-11-26 00:52:16,512 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=3143053.3333333335, ans=0.2 2023-11-26 00:52:23,063 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3143120.0, ans=0.1 2023-11-26 00:52:33,327 INFO [train_asr.py:1235] (3/4) Epoch 40, batch 2550, loss[loss=0.06808, simple_loss=0.08892, pruned_loss=0.01313, audio_tagging_loss=0.01049, over 15818.00 frames. ], tot_loss[loss=0.06695, simple_loss=0.09029, pruned_loss=0.01268, audio_tagging_loss=0.009123, over 3053420.76 frames. ], batch size: 59, lr: 1.70e-03, grad_scale: 32.0 2023-11-26 00:52:34,580 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3143186.6666666665, ans=0.125 2023-11-26 00:52:38,809 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3143186.6666666665, ans=0.125 2023-11-26 00:52:54,898 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 471500 2023-11-26 00:53:03,216 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.431e+01 8.571e+01 9.048e+01 1.003e+02 1.375e+02, threshold=1.810e+02, percent-clipped=0.0 2023-11-26 00:53:27,801 INFO [train_asr.py:1235] (3/4) Epoch 40, batch 2600, loss[loss=0.07326, simple_loss=0.09728, pruned_loss=0.01584, audio_tagging_loss=0.008784, over 16727.00 frames. ], tot_loss[loss=0.06669, simple_loss=0.09026, pruned_loss=0.01264, audio_tagging_loss=0.008927, over 3054975.39 frames. 
], batch size: 61, lr: 1.70e-03, grad_scale: 32.0 2023-11-26 00:53:33,361 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=3143520.0, ans=0.2 2023-11-26 00:53:49,065 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 471550 2023-11-26 00:54:21,741 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=3143853.3333333335, ans=0.0 2023-11-26 00:54:22,456 INFO [train_asr.py:1235] (3/4) Epoch 40, batch 2650, loss[loss=0.06082, simple_loss=0.08602, pruned_loss=0.01209, audio_tagging_loss=0.005714, over 14393.00 frames. ], tot_loss[loss=0.06666, simple_loss=0.09017, pruned_loss=0.01266, audio_tagging_loss=0.008914, over 3053102.65 frames. ], batch size: 56, lr: 1.70e-03, grad_scale: 32.0 2023-11-26 00:54:26,909 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=3143853.3333333335, ans=0.2 2023-11-26 00:54:44,942 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 471600 2023-11-26 00:54:46,148 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=3143986.6666666665, ans=0.09899494936611666 2023-11-26 00:54:54,179 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.039e+01 8.622e+01 9.253e+01 1.030e+02 1.251e+02, threshold=1.851e+02, percent-clipped=0.0 2023-11-26 00:54:56,488 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3144053.3333333335, ans=0.1 2023-11-26 00:55:18,669 INFO [train_asr.py:1235] (3/4) Epoch 40, batch 2700, loss[loss=0.07154, simple_loss=0.09972, pruned_loss=0.01428, audio_tagging_loss=0.007399, over 15102.00 frames. ], tot_loss[loss=0.06675, simple_loss=0.09039, pruned_loss=0.01269, audio_tagging_loss=0.008866, over 3055602.52 frames. ], batch size: 55, lr: 1.70e-03, grad_scale: 32.0 2023-11-26 00:55:22,715 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=3144186.6666666665, ans=0.09899494936611666 2023-11-26 00:55:26,960 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=3144186.6666666665, ans=0.2 2023-11-26 00:55:37,466 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.19 vs. limit=15.0 2023-11-26 00:55:41,205 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 471650 2023-11-26 00:56:15,118 INFO [train_asr.py:1235] (3/4) Epoch 40, batch 2750, loss[loss=0.0948, simple_loss=0.1359, pruned_loss=0.02221, audio_tagging_loss=0.004651, over 15309.00 frames. ], tot_loss[loss=0.06623, simple_loss=0.08939, pruned_loss=0.01264, audio_tagging_loss=0.00889, over 3059165.54 frames. 
], batch size: 54, lr: 1.70e-03, grad_scale: 32.0 2023-11-26 00:56:36,239 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 471700 2023-11-26 00:56:45,693 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 6.793e+01 8.564e+01 9.385e+01 1.025e+02 1.216e+02, threshold=1.877e+02, percent-clipped=0.0 2023-11-26 00:57:01,924 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=3144786.6666666665, ans=0.0 2023-11-26 00:57:03,944 WARNING [train_asr.py:1481] (3/4) Exclude cut with ID unbalanced/IMdT8_tuNp0_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-26 00:57:10,306 INFO [train_asr.py:1235] (3/4) Epoch 40, batch 2800, loss[loss=0.06137, simple_loss=0.08985, pruned_loss=0.008954, audio_tagging_loss=0.007491, over 15526.00 frames. ], tot_loss[loss=0.0652, simple_loss=0.08791, pruned_loss=0.01239, audio_tagging_loss=0.008859, over 3055001.36 frames. ], batch size: 57, lr: 1.70e-03, grad_scale: 32.0 2023-11-26 00:57:19,271 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3144853.3333333335, ans=0.0 2023-11-26 00:57:25,553 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.max_abs, batch_count=3144920.0, ans=10.0 2023-11-26 00:57:33,158 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 471750 2023-11-26 00:57:36,567 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=3144986.6666666665, ans=0.0 2023-11-26 00:58:05,871 INFO [train_asr.py:1235] (3/4) Epoch 40, batch 2850, loss[loss=0.08491, simple_loss=0.11, pruned_loss=0.02136, audio_tagging_loss=0.008528, over 15082.00 frames. ], tot_loss[loss=0.06621, simple_loss=0.08931, pruned_loss=0.01275, audio_tagging_loss=0.008806, over 3054149.99 frames. 
], batch size: 56, lr: 1.70e-03, grad_scale: 16.0 2023-11-26 00:58:09,785 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.max_abs, batch_count=3145186.6666666665, ans=10.0 2023-11-26 00:58:13,996 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3145186.6666666665, ans=0.125 2023-11-26 00:58:14,074 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3145186.6666666665, ans=0.125 2023-11-26 00:58:28,843 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 471800 2023-11-26 00:58:35,579 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3145320.0, ans=0.125 2023-11-26 00:58:38,477 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.919e+01 8.895e+01 9.329e+01 9.789e+01 1.221e+02, threshold=1.866e+02, percent-clipped=0.0 2023-11-26 00:58:47,153 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3145386.6666666665, ans=0.125 2023-11-26 00:59:02,280 INFO [train_asr.py:1235] (3/4) Epoch 40, batch 2900, loss[loss=0.06749, simple_loss=0.08957, pruned_loss=0.01236, audio_tagging_loss=0.01035, over 14247.00 frames. ], tot_loss[loss=0.06594, simple_loss=0.08906, pruned_loss=0.01259, audio_tagging_loss=0.00882, over 3049248.03 frames. ], batch size: 56, lr: 1.70e-03, grad_scale: 16.0 2023-11-26 00:59:03,149 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3145520.0, ans=0.125 2023-11-26 00:59:05,273 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=3145520.0, ans=0.0 2023-11-26 00:59:18,218 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3145586.6666666665, ans=0.1 2023-11-26 00:59:20,281 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00 2023-11-26 00:59:24,314 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 471850 2023-11-26 00:59:24,713 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.56 vs. limit=15.0 2023-11-26 00:59:28,559 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3145653.3333333335, ans=0.125 2023-11-26 00:59:58,016 INFO [train_asr.py:1235] (3/4) Epoch 40, batch 2950, loss[loss=0.04704, simple_loss=0.06953, pruned_loss=0.005695, audio_tagging_loss=0.006579, over 14644.00 frames. ], tot_loss[loss=0.0662, simple_loss=0.0893, pruned_loss=0.01263, audio_tagging_loss=0.008918, over 3051706.69 frames. 
], batch size: 57, lr: 1.70e-03, grad_scale: 16.0 2023-11-26 01:00:20,332 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 471900 2023-11-26 01:00:31,951 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.912e+01 8.856e+01 9.351e+01 9.999e+01 2.175e+02, threshold=1.870e+02, percent-clipped=2.0 2023-11-26 01:00:46,028 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=3146120.0, ans=0.125 2023-11-26 01:00:53,310 INFO [train_asr.py:1235] (3/4) Epoch 40, batch 3000, loss[loss=0.07106, simple_loss=0.09247, pruned_loss=0.01468, audio_tagging_loss=0.01014, over 15653.00 frames. ], tot_loss[loss=0.0665, simple_loss=0.08943, pruned_loss=0.01271, audio_tagging_loss=0.009072, over 3055212.82 frames. ], batch size: 59, lr: 1.70e-03, grad_scale: 8.0 2023-11-26 01:00:53,311 INFO [train_asr.py:1258] (3/4) Computing validation loss 2023-11-26 01:01:19,823 INFO [zipformer.py:1877] (3/4) name=encoder.encoders.2.encoder.layers.2.self_attn_weights, attn_weights_entropy = tensor([4.4722, 3.8444, 4.3886, 3.5842], device='cuda:3') 2023-11-26 01:01:25,505 INFO [train_asr.py:1267] (3/4) Epoch 40, validation: loss=0.05777, simple_loss=0.05069, pruned_loss=0.005189, audio_tagging_loss=0.02724, over 4681554.00 frames. 2023-11-26 01:01:25,506 INFO [train_asr.py:1268] (3/4) Maximum memory allocated so far is 24894MB 2023-11-26 01:01:26,724 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=3146186.6666666665, ans=0.2 2023-11-26 01:01:35,702 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=20.31 vs. limit=22.5 2023-11-26 01:01:46,799 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 471950 2023-11-26 01:02:20,158 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=11.64 vs. limit=15.0 2023-11-26 01:02:20,661 INFO [train_asr.py:1235] (3/4) Epoch 40, batch 3050, loss[loss=0.07854, simple_loss=0.1194, pruned_loss=0.01176, audio_tagging_loss=0.007094, over 15752.00 frames. ], tot_loss[loss=0.06654, simple_loss=0.08963, pruned_loss=0.01265, audio_tagging_loss=0.009077, over 3048557.60 frames. ], batch size: 55, lr: 1.70e-03, grad_scale: 8.0 2023-11-26 01:02:32,747 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3146586.6666666665, ans=0.125 2023-11-26 01:02:37,655 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=3146586.6666666665, ans=0.2 2023-11-26 01:02:40,913 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=3146586.6666666665, ans=0.0 2023-11-26 01:02:42,826 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 472000 2023-11-26 01:02:47,518 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3146653.3333333335, ans=0.125 2023-11-26 01:02:56,921 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.352e+01 8.712e+01 9.411e+01 1.021e+02 1.458e+02, threshold=1.882e+02, percent-clipped=0.0 2023-11-26 01:02:56,985 WARNING [train_asr.py:1481] (3/4) Exclude cut with ID unbalanced/h0neUGB6j_g_0.000_1.000.wav from training. Number of frames (before subsampling): 100. 
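
The periodic validation pass at batch 3000 also dumps attention diagnostics: the attn_weights_entropy tensors are per-head entropies of attention distributions, about 4-5 nats above, i.e. attention spread over the equivalent of roughly e^4.5 ~ 90 keys. A direct way to compute such numbers; the exact batch/time reduction in zipformer.py may differ:

    import torch

    def attn_weights_entropy(attn: torch.Tensor) -> torch.Tensor:
        # attn: (num_heads, query_len, key_len), each row a distribution over keys
        p = attn.clamp(min=1e-20)
        return -(p * p.log()).sum(dim=-1).mean(dim=-1)  # one value per head
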
Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-26 01:03:05,693 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=3146720.0, ans=0.125 2023-11-26 01:03:08,980 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=3146786.6666666665, ans=0.0 2023-11-26 01:03:18,256 INFO [train_asr.py:1235] (3/4) Epoch 40, batch 3100, loss[loss=0.08329, simple_loss=0.1182, pruned_loss=0.0174, audio_tagging_loss=0.006799, over 16372.00 frames. ], tot_loss[loss=0.06676, simple_loss=0.09014, pruned_loss=0.01265, audio_tagging_loss=0.009045, over 3045638.69 frames. ], batch size: 59, lr: 1.70e-03, grad_scale: 8.0 2023-11-26 01:03:23,261 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=3146853.3333333335, ans=0.0 2023-11-26 01:03:35,716 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=3.16 vs. limit=15.0 2023-11-26 01:03:36,044 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.05 vs. limit=6.0 2023-11-26 01:03:41,119 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 472050 2023-11-26 01:03:48,246 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=8.58 vs. limit=15.0 2023-11-26 01:03:53,073 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3147053.3333333335, ans=0.125 2023-11-26 01:03:55,331 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=3147053.3333333335, ans=0.125 2023-11-26 01:04:14,259 INFO [train_asr.py:1235] (3/4) Epoch 40, batch 3150, loss[loss=0.09407, simple_loss=0.1329, pruned_loss=0.01876, audio_tagging_loss=0.008876, over 14792.00 frames. ], tot_loss[loss=0.06711, simple_loss=0.09085, pruned_loss=0.01263, audio_tagging_loss=0.009052, over 3042942.29 frames. 
2023-11-26 01:04:20,903 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=3147186.6666666665, ans=0.125
2023-11-26 01:04:21,012 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=3147186.6666666665, ans=0.0
2023-11-26 01:04:36,243 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 472100
2023-11-26 01:04:47,286 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 6.998e+01 8.868e+01 9.358e+01 9.908e+01 1.230e+02, threshold=1.872e+02, percent-clipped=0.0
2023-11-26 01:04:48,517 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=3147386.6666666665, ans=0.2
2023-11-26 01:04:51,846 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3147386.6666666665, ans=0.125
2023-11-26 01:05:09,986 INFO [train_asr.py:1235] (3/4) Epoch 40, batch 3200, loss[loss=0.07041, simple_loss=0.08989, pruned_loss=0.01593, audio_tagging_loss=0.009542, over 16229.00 frames. ], tot_loss[loss=0.06665, simple_loss=0.09018, pruned_loss=0.01249, audio_tagging_loss=0.009067, over 3040834.35 frames. ], batch size: 62, lr: 1.70e-03, grad_scale: 16.0
2023-11-26 01:05:10,268 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3147520.0, ans=0.1
2023-11-26 01:05:15,429 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=3147520.0, ans=0.0
2023-11-26 01:05:32,076 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 472150
2023-11-26 01:05:37,525 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3147653.3333333335, ans=0.1
2023-11-26 01:05:44,877 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=11.48 vs. limit=15.0
2023-11-26 01:05:54,748 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=6.78 vs. limit=10.0
2023-11-26 01:06:02,909 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3147786.6666666665, ans=0.125
2023-11-26 01:06:04,939 INFO [train_asr.py:1235] (3/4) Epoch 40, batch 3250, loss[loss=0.06948, simple_loss=0.09485, pruned_loss=0.01359, audio_tagging_loss=0.008464, over 15278.00 frames. ], tot_loss[loss=0.06654, simple_loss=0.08994, pruned_loss=0.01243, audio_tagging_loss=0.009136, over 3044290.50 frames. ], batch size: 56, lr: 1.70e-03, grad_scale: 16.0
2023-11-26 01:06:11,137 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3147853.3333333335, ans=0.1
2023-11-26 01:06:19,520 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3147920.0, ans=0.125
2023-11-26 01:06:23,224 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=3147920.0, ans=0.0
2023-11-26 01:06:26,452 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=3147986.6666666665, ans=0.125
2023-11-26 01:06:27,284 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 472200
2023-11-26 01:06:37,073 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=18.23 vs. limit=22.5
2023-11-26 01:06:38,676 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.433e+01 8.733e+01 9.362e+01 1.015e+02 1.651e+02, threshold=1.872e+02, percent-clipped=0.0
2023-11-26 01:06:57,362 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_whiten.whitening_limit, batch_count=3148120.0, ans=15.0
2023-11-26 01:07:01,107 INFO [train_asr.py:1235] (3/4) Epoch 40, batch 3300, loss[loss=0.07022, simple_loss=0.09514, pruned_loss=0.01435, audio_tagging_loss=0.008309, over 15573.00 frames. ], tot_loss[loss=0.06647, simple_loss=0.08975, pruned_loss=0.01234, audio_tagging_loss=0.009254, over 3049486.26 frames. ], batch size: 60, lr: 1.70e-03, grad_scale: 16.0
2023-11-26 01:07:03,847 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=3148186.6666666665, ans=0.0
2023-11-26 01:07:23,472 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 472250
2023-11-26 01:07:23,573 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=3148320.0, ans=0.0
2023-11-26 01:07:36,451 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=3148386.6666666665, ans=0.0
2023-11-26 01:07:45,531 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3148453.3333333335, ans=0.1
2023-11-26 01:07:55,328 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=13.58 vs. limit=15.0
2023-11-26 01:07:57,002 INFO [train_asr.py:1235] (3/4) Epoch 40, batch 3350, loss[loss=0.06209, simple_loss=0.07916, pruned_loss=0.01208, audio_tagging_loss=0.01043, over 14039.00 frames. ], tot_loss[loss=0.06638, simple_loss=0.08953, pruned_loss=0.01248, audio_tagging_loss=0.009135, over 3052750.30 frames. ], batch size: 54, lr: 1.70e-03, grad_scale: 16.0
2023-11-26 01:08:05,570 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=10.79 vs. limit=15.0
2023-11-26 01:08:19,833 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 472300
2023-11-26 01:08:30,882 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.038e+01 8.683e+01 9.249e+01 1.019e+02 1.203e+02, threshold=1.850e+02, percent-clipped=0.0
2023-11-26 01:08:36,816 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=11.08 vs. limit=15.0
2023-11-26 01:08:49,119 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=19.34 vs. limit=22.5
2023-11-26 01:08:52,803 INFO [train_asr.py:1235] (3/4) Epoch 40, batch 3400, loss[loss=0.04471, simple_loss=0.06515, pruned_loss=0.00496, audio_tagging_loss=0.007175, over 14835.00 frames. ], tot_loss[loss=0.0666, simple_loss=0.09006, pruned_loss=0.01255, audio_tagging_loss=0.009015, over 3052165.56 frames. ], batch size: 58, lr: 1.70e-03, grad_scale: 16.0
2023-11-26 01:08:56,242 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3148853.3333333335, ans=0.1
2023-11-26 01:09:02,577 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=3148853.3333333335, ans=0.0
2023-11-26 01:09:06,803 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=3148920.0, ans=0.125
2023-11-26 01:09:06,879 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2023-11-26 01:09:15,660 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 472350
2023-11-26 01:09:36,550 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=3149120.0, ans=0.125
2023-11-26 01:09:41,254 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00
2023-11-26 01:09:48,857 INFO [train_asr.py:1235] (3/4) Epoch 40, batch 3450, loss[loss=0.055, simple_loss=0.06384, pruned_loss=0.01231, audio_tagging_loss=0.01077, over 14117.00 frames. ], tot_loss[loss=0.06677, simple_loss=0.09071, pruned_loss=0.01252, audio_tagging_loss=0.008902, over 3048933.33 frames. ], batch size: 54, lr: 1.70e-03, grad_scale: 16.0
2023-11-26 01:10:05,598 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=3149253.3333333335, ans=0.2
2023-11-26 01:10:09,884 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3149320.0, ans=0.125
2023-11-26 01:10:11,420 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 472400
2023-11-26 01:10:17,118 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3149320.0, ans=0.125
2023-11-26 01:10:22,056 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.081e+01 8.810e+01 9.451e+01 1.004e+02 1.366e+02, threshold=1.890e+02, percent-clipped=0.0
2023-11-26 01:10:37,800 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3149453.3333333335, ans=0.125
2023-11-26 01:10:42,065 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.min_positive, batch_count=3149453.3333333335, ans=0.05
2023-11-26 01:10:45,052 INFO [train_asr.py:1235] (3/4) Epoch 40, batch 3500, loss[loss=0.05189, simple_loss=0.07398, pruned_loss=0.007912, audio_tagging_loss=0.006987, over 16342.00 frames. ], tot_loss[loss=0.06676, simple_loss=0.09058, pruned_loss=0.01262, audio_tagging_loss=0.008849, over 3049612.42 frames. ], batch size: 64, lr: 1.70e-03, grad_scale: 16.0
2023-11-26 01:10:59,632 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=3149586.6666666665, ans=0.07
2023-11-26 01:11:02,730 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3149586.6666666665, ans=0.0
2023-11-26 01:11:07,033 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=2.99 vs. limit=15.0
2023-11-26 01:11:08,080 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 472450
2023-11-26 01:11:15,498 WARNING [train_asr.py:1481] (3/4) Exclude cut with ID unbalanced/DdDpuDqOyrA_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24
2023-11-26 01:11:40,888 INFO [train_asr.py:1235] (3/4) Epoch 40, batch 3550, loss[loss=0.04845, simple_loss=0.0601, pruned_loss=0.008434, audio_tagging_loss=0.009965, over 17259.00 frames. ], tot_loss[loss=0.06674, simple_loss=0.09045, pruned_loss=0.01263, audio_tagging_loss=0.008884, over 3052946.87 frames. ], batch size: 68, lr: 1.69e-03, grad_scale: 16.0
2023-11-26 01:12:02,126 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=3149920.0, ans=0.0
2023-11-26 01:12:03,031 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=3149986.6666666665, ans=0.0
2023-11-26 01:12:04,007 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 472500
2023-11-26 01:12:14,563 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 6.986e+01 8.583e+01 9.059e+01 9.596e+01 1.364e+02, threshold=1.812e+02, percent-clipped=0.0
2023-11-26 01:12:19,219 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3150053.3333333335, ans=0.0
2023-11-26 01:12:29,317 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3150120.0, ans=0.125
2023-11-26 01:12:30,325 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3150120.0, ans=0.125
2023-11-26 01:12:37,535 INFO [train_asr.py:1235] (3/4) Epoch 40, batch 3600, loss[loss=0.07653, simple_loss=0.1093, pruned_loss=0.01428, audio_tagging_loss=0.007604, over 15606.00 frames. ], tot_loss[loss=0.06641, simple_loss=0.08986, pruned_loss=0.01255, audio_tagging_loss=0.008924, over 3049356.48 frames. ], batch size: 55, lr: 1.69e-03, grad_scale: 32.0
2023-11-26 01:12:43,050 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=6.84 vs. limit=15.0
2023-11-26 01:12:46,121 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=10.39 vs. limit=15.0
2023-11-26 01:12:47,868 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3150253.3333333335, ans=0.125
2023-11-26 01:12:55,903 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.1.conv_module2.whiten, num_groups=1, num_channels=192, metric=11.17 vs. limit=15.0
2023-11-26 01:12:57,464 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=3150253.3333333335, ans=0.04949747468305833
2023-11-26 01:12:58,499 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=3150320.0, ans=0.0
2023-11-26 01:12:59,412 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 472550
2023-11-26 01:13:14,541 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=3150386.6666666665, ans=0.0
2023-11-26 01:13:14,598 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=3150386.6666666665, ans=0.2
2023-11-26 01:13:22,548 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=3150453.3333333335, ans=0.125
2023-11-26 01:13:33,441 INFO [train_asr.py:1235] (3/4) Epoch 40, batch 3650, loss[loss=0.08631, simple_loss=0.1149, pruned_loss=0.01982, audio_tagging_loss=0.009049, over 15246.00 frames. ], tot_loss[loss=0.06651, simple_loss=0.09002, pruned_loss=0.01261, audio_tagging_loss=0.008898, over 3045079.19 frames. ], batch size: 56, lr: 1.69e-03, grad_scale: 16.0
2023-11-26 01:13:35,906 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2023-11-26 01:13:37,950 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=3150520.0, ans=0.0
2023-11-26 01:13:42,481 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.whiten, num_groups=1, num_channels=512, metric=8.97 vs. limit=12.0
2023-11-26 01:13:46,500 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=3150586.6666666665, ans=0.125
2023-11-26 01:13:53,380 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=3150586.6666666665, ans=0.125
2023-11-26 01:13:54,923 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=6.85 vs. limit=15.0
2023-11-26 01:13:55,259 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 472600
2023-11-26 01:14:08,546 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.534e+01 8.763e+01 9.129e+01 9.774e+01 1.635e+02, threshold=1.826e+02, percent-clipped=0.0
2023-11-26 01:14:22,730 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=3150786.6666666665, ans=0.125
2023-11-26 01:14:22,962 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=512, metric=21.94 vs. limit=22.5
2023-11-26 01:14:26,143 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=19.26 vs. limit=22.5
2023-11-26 01:14:27,794 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3150853.3333333335, ans=0.0
2023-11-26 01:14:28,702 INFO [train_asr.py:1235] (3/4) Epoch 40, batch 3700, loss[loss=0.0618, simple_loss=0.07381, pruned_loss=0.01203, audio_tagging_loss=0.01286, over 15067.00 frames. ], tot_loss[loss=0.06739, simple_loss=0.09148, pruned_loss=0.01283, audio_tagging_loss=0.008815, over 3053878.12 frames. ], batch size: 56, lr: 1.69e-03, grad_scale: 16.0
2023-11-26 01:14:43,228 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=3150920.0, ans=0.2
2023-11-26 01:14:49,206 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=11.79 vs. limit=15.0
2023-11-26 01:14:52,241 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 472650
2023-11-26 01:15:01,984 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.min_positive, batch_count=3151053.3333333335, ans=0.05
2023-11-26 01:15:08,302 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3151053.3333333335, ans=0.1
2023-11-26 01:15:09,335 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3151053.3333333335, ans=0.125
2023-11-26 01:15:09,419 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=3151053.3333333335, ans=0.125
2023-11-26 01:15:25,866 INFO [train_asr.py:1235] (3/4) Epoch 40, batch 3750, loss[loss=0.0559, simple_loss=0.07147, pruned_loss=0.01044, audio_tagging_loss=0.009728, over 14756.00 frames. ], tot_loss[loss=0.06833, simple_loss=0.09244, pruned_loss=0.01323, audio_tagging_loss=0.008887, over 3050622.56 frames. ], batch size: 55, lr: 1.69e-03, grad_scale: 16.0
2023-11-26 01:15:27,242 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=3151186.6666666665, ans=0.0
2023-11-26 01:15:44,920 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=11.35 vs. limit=15.0
2023-11-26 01:15:47,756 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 472700
2023-11-26 01:15:52,222 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3151320.0, ans=0.1
2023-11-26 01:15:59,311 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.274e+01 8.900e+01 9.433e+01 1.035e+02 1.729e+02, threshold=1.887e+02, percent-clipped=0.0
2023-11-26 01:16:06,236 WARNING [train_asr.py:1481] (3/4) Exclude cut with ID unbalanced/ZY_Bsi-RNuk_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24
2023-11-26 01:16:21,666 INFO [train_asr.py:1235] (3/4) Epoch 40, batch 3800, loss[loss=0.05668, simple_loss=0.07206, pruned_loss=0.00986, audio_tagging_loss=0.01079, over 14533.00 frames. ], tot_loss[loss=0.06798, simple_loss=0.09208, pruned_loss=0.01301, audio_tagging_loss=0.008931, over 3045642.05 frames. ], batch size: 57, lr: 1.69e-03, grad_scale: 16.0
2023-11-26 01:16:26,207 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=3151520.0, ans=0.0
2023-11-26 01:16:30,615 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=3151520.0, ans=0.0
2023-11-26 01:16:43,122 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 472750
2023-11-26 01:16:43,240 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=3151653.3333333335, ans=0.2
2023-11-26 01:17:16,308 INFO [train_asr.py:1235] (3/4) Epoch 40, batch 3850, loss[loss=0.06667, simple_loss=0.09485, pruned_loss=0.01324, audio_tagging_loss=0.006013, over 15514.00 frames. ], tot_loss[loss=0.06767, simple_loss=0.09141, pruned_loss=0.01292, audio_tagging_loss=0.00904, over 3040777.14 frames. ], batch size: 58, lr: 1.69e-03, grad_scale: 16.0
2023-11-26 01:17:18,984 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.whiten, num_groups=1, num_channels=512, metric=8.56 vs. limit=12.0
2023-11-26 01:17:37,746 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=8.34 vs. limit=10.0
2023-11-26 01:17:39,145 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 472800
2023-11-26 01:17:43,252 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=8.51 vs. limit=15.0
2023-11-26 01:17:47,352 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3151986.6666666665, ans=0.125
2023-11-26 01:17:51,398 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.434e+01 8.590e+01 9.252e+01 9.700e+01 1.619e+02, threshold=1.850e+02, percent-clipped=0.0
2023-11-26 01:18:12,592 INFO [train_asr.py:1235] (3/4) Epoch 40, batch 3900, loss[loss=0.08434, simple_loss=0.1124, pruned_loss=0.01649, audio_tagging_loss=0.01168, over 15556.00 frames. ], tot_loss[loss=0.06734, simple_loss=0.09094, pruned_loss=0.01287, audio_tagging_loss=0.008999, over 3042419.55 frames. ], batch size: 57, lr: 1.69e-03, grad_scale: 16.0
2023-11-26 01:18:14,364 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=3152186.6666666665, ans=0.2
2023-11-26 01:18:15,344 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=3152186.6666666665, ans=0.2
2023-11-26 01:18:34,790 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 472850
2023-11-26 01:18:36,029 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3152320.0, ans=0.1
2023-11-26 01:18:40,380 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=4.15 vs. limit=15.0
2023-11-26 01:18:49,732 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=3152386.6666666665, ans=0.0
2023-11-26 01:19:02,542 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3152453.3333333335, ans=0.0
2023-11-26 01:19:02,643 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=3152453.3333333335, ans=0.2
2023-11-26 01:19:08,181 INFO [train_asr.py:1235] (3/4) Epoch 40, batch 3950, loss[loss=0.05967, simple_loss=0.07869, pruned_loss=0.01267, audio_tagging_loss=0.007655, over 14091.00 frames. ], tot_loss[loss=0.06748, simple_loss=0.09101, pruned_loss=0.01287, audio_tagging_loss=0.0091, over 3040441.82 frames. ], batch size: 53, lr: 1.69e-03, grad_scale: 16.0
2023-11-26 01:19:19,047 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3152586.6666666665, ans=0.1
2023-11-26 01:19:29,415 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 472900
2023-11-26 01:19:29,635 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00
2023-11-26 01:19:38,014 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3152653.3333333335, ans=0.125
2023-11-26 01:19:42,516 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.416e+01 8.671e+01 9.267e+01 9.996e+01 1.170e+02, threshold=1.853e+02, percent-clipped=0.0
2023-11-26 01:19:58,055 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.min_positive, batch_count=3152786.6666666665, ans=0.025
2023-11-26 01:20:03,229 INFO [train_asr.py:1235] (3/4) Epoch 40, batch 4000, loss[loss=0.06992, simple_loss=0.09597, pruned_loss=0.01025, audio_tagging_loss=0.01169, over 15853.00 frames. ], tot_loss[loss=0.06813, simple_loss=0.09194, pruned_loss=0.01302, audio_tagging_loss=0.009137, over 3049245.26 frames. ], batch size: 59, lr: 1.69e-03, grad_scale: 32.0
2023-11-26 01:20:17,906 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3152920.0, ans=0.1
2023-11-26 01:20:23,117 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3152920.0, ans=0.125
2023-11-26 01:20:26,157 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 472950
2023-11-26 01:20:26,297 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2023-11-26 01:20:29,825 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.whiten, num_groups=1, num_channels=512, metric=9.34 vs. limit=12.0
2023-11-26 01:20:36,345 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer_ff2.min_abs, batch_count=3153053.3333333335, ans=0.1
2023-11-26 01:20:38,529 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3153053.3333333335, ans=0.125
2023-11-26 01:20:42,835 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=3153053.3333333335, ans=0.0
2023-11-26 01:20:46,048 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=3153053.3333333335, ans=0.0
2023-11-26 01:20:48,250 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=3153120.0, ans=0.0
2023-11-26 01:20:52,521 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=3153120.0, ans=0.0
2023-11-26 01:20:58,604 INFO [train_asr.py:1235] (3/4) Epoch 40, batch 4050, loss[loss=0.08055, simple_loss=0.1132, pruned_loss=0.01492, audio_tagging_loss=0.009012, over 15429.00 frames. ], tot_loss[loss=0.06839, simple_loss=0.09243, pruned_loss=0.01305, audio_tagging_loss=0.009125, over 3046579.93 frames. ], batch size: 55, lr: 1.69e-03, grad_scale: 32.0
2023-11-26 01:20:59,062 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=15.31 vs. limit=22.5
2023-11-26 01:21:03,963 WARNING [train_asr.py:1481] (3/4) Exclude cut with ID unbalanced/-7b0f9TyPFU_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24
2023-11-26 01:21:08,458 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.21 vs. limit=6.0
2023-11-26 01:21:19,181 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3153253.3333333335, ans=0.0
2023-11-26 01:21:20,008 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=3153253.3333333335, ans=0.125
2023-11-26 01:21:22,158 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 473000
2023-11-26 01:21:33,106 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=3153386.6666666665, ans=0.125
2023-11-26 01:21:35,080 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.157e+01 8.884e+01 9.464e+01 1.024e+02 1.208e+02, threshold=1.893e+02, percent-clipped=0.0
2023-11-26 01:21:55,528 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=8.15 vs. limit=12.0
2023-11-26 01:21:55,902 INFO [train_asr.py:1235] (3/4) Epoch 40, batch 4100, loss[loss=0.05872, simple_loss=0.08298, pruned_loss=0.007914, audio_tagging_loss=0.009317, over 13581.00 frames. ], tot_loss[loss=0.06829, simple_loss=0.09236, pruned_loss=0.013, audio_tagging_loss=0.009112, over 3041856.62 frames. ], batch size: 52, lr: 1.69e-03, grad_scale: 16.0
2023-11-26 01:22:05,172 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3153520.0, ans=0.1
2023-11-26 01:22:09,501 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3153586.6666666665, ans=0.1
2023-11-26 01:22:16,832 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3153653.3333333335, ans=0.1
2023-11-26 01:22:17,713 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 473050
2023-11-26 01:22:22,114 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.min_positive, batch_count=3153653.3333333335, ans=0.05
2023-11-26 01:22:40,061 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3153786.6666666665, ans=0.125
2023-11-26 01:22:41,171 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3153786.6666666665, ans=0.1
2023-11-26 01:22:51,523 INFO [train_asr.py:1235] (3/4) Epoch 40, batch 4150, loss[loss=0.06406, simple_loss=0.1005, pruned_loss=0.009442, audio_tagging_loss=0.004361, over 15954.00 frames. ], tot_loss[loss=0.06809, simple_loss=0.09226, pruned_loss=0.013, audio_tagging_loss=0.008952, over 3040239.20 frames. ], batch size: 57, lr: 1.69e-03, grad_scale: 16.0
2023-11-26 01:22:52,781 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3153853.3333333335, ans=0.1
2023-11-26 01:23:08,389 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=7.26 vs. limit=15.0
2023-11-26 01:23:12,049 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=13.60 vs. limit=15.0
2023-11-26 01:23:13,776 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 473100
2023-11-26 01:23:24,361 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer_ff3.min_abs, batch_count=3154053.3333333335, ans=0.2
2023-11-26 01:23:27,361 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 6.811e+01 8.761e+01 9.353e+01 9.782e+01 1.109e+02, threshold=1.871e+02, percent-clipped=0.0
2023-11-26 01:23:28,701 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=3154053.3333333335, ans=0.0
2023-11-26 01:23:30,148 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=2.31 vs. limit=15.0
2023-11-26 01:23:32,738 WARNING [train_asr.py:1481] (3/4) Exclude cut with ID unbalanced/5BkClLNthIQ_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24
2023-11-26 01:23:36,272 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3154120.0, ans=0.125
2023-11-26 01:23:40,406 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=3154120.0, ans=0.125
2023-11-26 01:23:44,792 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=3154120.0, ans=0.0
2023-11-26 01:23:46,537 INFO [train_asr.py:1235] (3/4) Epoch 40, batch 4200, loss[loss=0.08272, simple_loss=0.1217, pruned_loss=0.0136, audio_tagging_loss=0.008263, over 16047.00 frames. ], tot_loss[loss=0.06755, simple_loss=0.09172, pruned_loss=0.01285, audio_tagging_loss=0.008839, over 3047827.13 frames. ], batch size: 58, lr: 1.69e-03, grad_scale: 16.0
2023-11-26 01:23:50,644 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=4.65 vs. limit=12.0
2023-11-26 01:23:53,054 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=3154186.6666666665, ans=0.125
2023-11-26 01:23:57,377 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=3154253.3333333335, ans=0.07
2023-11-26 01:24:07,197 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=3154253.3333333335, ans=0.125
2023-11-26 01:24:10,146 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 473150
2023-11-26 01:24:42,233 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3154520.0, ans=0.125
2023-11-26 01:24:42,952 INFO [train_asr.py:1235] (3/4) Epoch 40, batch 4250, loss[loss=0.05228, simple_loss=0.06866, pruned_loss=0.008744, audio_tagging_loss=0.0092, over 14826.00 frames. ], tot_loss[loss=0.06733, simple_loss=0.09134, pruned_loss=0.01285, audio_tagging_loss=0.008813, over 3049608.09 frames. ], batch size: 58, lr: 1.69e-03, grad_scale: 16.0
2023-11-26 01:24:46,295 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=3154520.0, ans=0.125
2023-11-26 01:24:55,977 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3154586.6666666665, ans=0.1
2023-11-26 01:24:57,119 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=3154586.6666666665, ans=0.0
2023-11-26 01:25:05,332 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 473200
2023-11-26 01:25:19,424 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.652e+01 8.719e+01 9.230e+01 1.020e+02 1.385e+02, threshold=1.846e+02, percent-clipped=0.0
2023-11-26 01:25:37,572 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=8.38 vs. limit=15.0
2023-11-26 01:25:39,090 INFO [train_asr.py:1235] (3/4) Epoch 40, batch 4300, loss[loss=0.05806, simple_loss=0.07967, pruned_loss=0.009181, audio_tagging_loss=0.009045, over 14721.00 frames. ], tot_loss[loss=0.0679, simple_loss=0.09225, pruned_loss=0.01306, audio_tagging_loss=0.008722, over 3045987.98 frames. ], batch size: 54, lr: 1.69e-03, grad_scale: 16.0
2023-11-26 01:25:44,831 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=5.26 vs. limit=12.0
2023-11-26 01:25:57,827 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.min_abs, batch_count=3154920.0, ans=0.5
2023-11-26 01:25:59,534 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3154920.0, ans=0.125
2023-11-26 01:26:01,418 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 473250
2023-11-26 01:26:06,957 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=3154986.6666666665, ans=0.1
2023-11-26 01:26:14,277 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=3155053.3333333335, ans=0.0
2023-11-26 01:26:34,030 INFO [train_asr.py:1235] (3/4) Epoch 40, batch 4350, loss[loss=0.04131, simple_loss=0.04968, pruned_loss=0.006894, audio_tagging_loss=0.00958, over 16729.00 frames. ], tot_loss[loss=0.06731, simple_loss=0.09149, pruned_loss=0.01285, audio_tagging_loss=0.008716, over 3049689.54 frames. ], batch size: 67, lr: 1.69e-03, grad_scale: 16.0
2023-11-26 01:26:34,285 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.min_positive, batch_count=3155186.6666666665, ans=0.05
2023-11-26 01:26:53,693 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=3155253.3333333335, ans=0.125
2023-11-26 01:26:56,912 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 473300
2023-11-26 01:26:56,998 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.max_abs, batch_count=3155320.0, ans=10.0
2023-11-26 01:27:01,739 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3155320.0, ans=0.125
2023-11-26 01:27:09,927 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 6.910e+01 8.594e+01 9.290e+01 1.001e+02 1.319e+02, threshold=1.858e+02, percent-clipped=0.0
2023-11-26 01:27:28,392 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=4.14 vs. limit=15.0
2023-11-26 01:27:30,057 INFO [train_asr.py:1235] (3/4) Epoch 40, batch 4400, loss[loss=0.05464, simple_loss=0.0674, pruned_loss=0.009974, audio_tagging_loss=0.01096, over 14696.00 frames. ], tot_loss[loss=0.06647, simple_loss=0.09017, pruned_loss=0.01261, audio_tagging_loss=0.008767, over 3048992.79 frames. ], batch size: 55, lr: 1.69e-03, grad_scale: 32.0
2023-11-26 01:27:48,451 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=3155586.6666666665, ans=0.0
2023-11-26 01:27:52,522 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 473350
2023-11-26 01:27:58,726 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=8.72 vs. limit=15.0
2023-11-26 01:28:11,137 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=11.38 vs. limit=15.0
2023-11-26 01:28:23,937 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=3155786.6666666665, ans=0.125
2023-11-26 01:28:26,503 INFO [train_asr.py:1235] (3/4) Epoch 40, batch 4450, loss[loss=0.06875, simple_loss=0.0973, pruned_loss=0.01363, audio_tagging_loss=0.006471, over 16947.00 frames. ], tot_loss[loss=0.06664, simple_loss=0.09022, pruned_loss=0.01273, audio_tagging_loss=0.008794, over 3055252.69 frames. ], batch size: 62, lr: 1.69e-03, grad_scale: 32.0
2023-11-26 01:28:29,901 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=3155853.3333333335, ans=0.125
2023-11-26 01:28:30,055 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=11.89 vs. limit=22.5
2023-11-26 01:28:39,447 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3155920.0, ans=0.125
2023-11-26 01:28:48,836 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 473400
2023-11-26 01:28:57,816 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3155986.6666666665, ans=0.125
2023-11-26 01:29:02,392 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.554e+01 8.765e+01 9.296e+01 9.987e+01 1.152e+02, threshold=1.859e+02, percent-clipped=0.0
2023-11-26 01:29:03,865 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=11.09 vs. limit=15.0
2023-11-26 01:29:13,768 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=3156120.0, ans=0.2
2023-11-26 01:29:20,298 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=8.81 vs. limit=15.0
2023-11-26 01:29:22,006 INFO [train_asr.py:1235] (3/4) Epoch 40, batch 4500, loss[loss=0.0492, simple_loss=0.06427, pruned_loss=0.009503, audio_tagging_loss=0.007566, over 15040.00 frames. ], tot_loss[loss=0.06657, simple_loss=0.09022, pruned_loss=0.01268, audio_tagging_loss=0.008783, over 3051952.42 frames. ], batch size: 58, lr: 1.69e-03, grad_scale: 8.0
2023-11-26 01:29:44,478 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.1.conv_module1.whiten, num_groups=1, num_channels=192, metric=13.15 vs. limit=15.0
2023-11-26 01:29:44,854 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 473450
2023-11-26 01:29:48,403 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=7.45 vs. limit=15.0
2023-11-26 01:29:49,206 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3156320.0, ans=0.125
2023-11-26 01:29:54,988 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.min_positive, batch_count=3156386.6666666665, ans=0.05
2023-11-26 01:30:14,391 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=3156453.3333333335, ans=0.0
2023-11-26 01:30:18,336 INFO [train_asr.py:1235] (3/4) Epoch 40, batch 4550, loss[loss=0.04551, simple_loss=0.05759, pruned_loss=0.004836, audio_tagging_loss=0.01188, over 14946.00 frames. ], tot_loss[loss=0.06614, simple_loss=0.08941, pruned_loss=0.01254, audio_tagging_loss=0.008897, over 3043873.92 frames. ], batch size: 58, lr: 1.69e-03, grad_scale: 8.0
2023-11-26 01:30:22,042 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=3.78 vs. limit=6.0
2023-11-26 01:30:27,707 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=3156520.0, ans=0.125
2023-11-26 01:30:35,007 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3156586.6666666665, ans=0.125
2023-11-26 01:30:37,739 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=3156586.6666666665, ans=0.0
2023-11-26 01:30:40,765 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 473500
2023-11-26 01:30:43,018 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3156653.3333333335, ans=0.1
2023-11-26 01:30:56,712 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.568e+01 8.762e+01 9.232e+01 1.004e+02 1.439e+02, threshold=1.846e+02, percent-clipped=0.0
2023-11-26 01:31:00,029 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=3156720.0, ans=0.0
2023-11-26 01:31:02,015 WARNING [train_asr.py:1481] (3/4) Exclude cut with ID unbalanced/_II2Klfnn4Y_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24
2023-11-26 01:31:04,428 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3156786.6666666665, ans=0.125
2023-11-26 01:31:06,966 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=3156786.6666666665, ans=0.0
2023-11-26 01:31:14,184 INFO [train_asr.py:1235] (3/4) Epoch 40, batch 4600, loss[loss=0.04794, simple_loss=0.05667, pruned_loss=0.008865, audio_tagging_loss=0.01075, over 14794.00 frames. ], tot_loss[loss=0.0664, simple_loss=0.08964, pruned_loss=0.01252, audio_tagging_loss=0.009058, over 3045691.09 frames. ], batch size: 56, lr: 1.69e-03, grad_scale: 8.0
2023-11-26 01:31:24,650 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3156920.0, ans=0.1
2023-11-26 01:31:26,961 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.whiten, num_groups=1, num_channels=512, metric=10.00 vs. limit=12.0
2023-11-26 01:31:29,666 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=3156920.0, ans=0.04949747468305833
2023-11-26 01:31:35,984 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 473550
2023-11-26 01:31:36,538 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=11.76 vs. limit=15.0
2023-11-26 01:31:42,631 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=3156986.6666666665, ans=0.125
2023-11-26 01:31:50,501 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=3157053.3333333335, ans=0.125
2023-11-26 01:31:59,205 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=4.78 vs. limit=15.0
2023-11-26 01:32:07,062 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=3157120.0, ans=0.2
2023-11-26 01:32:10,103 INFO [train_asr.py:1235] (3/4) Epoch 40, batch 4650, loss[loss=0.07331, simple_loss=0.0973, pruned_loss=0.01406, audio_tagging_loss=0.0106, over 15072.00 frames. ], tot_loss[loss=0.06647, simple_loss=0.08969, pruned_loss=0.0125, audio_tagging_loss=0.00912, over 3039387.85 frames. ], batch size: 55, lr: 1.69e-03, grad_scale: 8.0
2023-11-26 01:32:11,299 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3157186.6666666665, ans=0.1
2023-11-26 01:32:15,480 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3157186.6666666665, ans=0.1
2023-11-26 01:32:32,771 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 473600
2023-11-26 01:32:33,190 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=8.13 vs. limit=15.0
2023-11-26 01:32:35,306 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.min_abs, batch_count=3157320.0, ans=0.5
2023-11-26 01:32:47,888 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 6.585e+01 8.767e+01 9.190e+01 1.011e+02 1.594e+02, threshold=1.838e+02, percent-clipped=0.0
2023-11-26 01:32:49,781 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.max_abs, batch_count=3157386.6666666665, ans=10.0
2023-11-26 01:32:56,162 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3157453.3333333335, ans=0.1
2023-11-26 01:33:03,064 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=3157453.3333333335, ans=0.0
2023-11-26 01:33:05,716 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=3157520.0, ans=0.0
2023-11-26 01:33:06,569 INFO [train_asr.py:1235] (3/4) Epoch 40, batch 4700, loss[loss=0.06612, simple_loss=0.09048, pruned_loss=0.01264, audio_tagging_loss=0.008239, over 15627.00 frames. ], tot_loss[loss=0.06691, simple_loss=0.09026, pruned_loss=0.01263, audio_tagging_loss=0.009145, over 3043003.17 frames. ], batch size: 58, lr: 1.69e-03, grad_scale: 8.0
2023-11-26 01:33:07,859 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=3157520.0, ans=0.0
2023-11-26 01:33:09,537 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=10.55 vs. limit=15.0
2023-11-26 01:33:12,179 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=3157520.0, ans=0.0
2023-11-26 01:33:16,942 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=3157586.6666666665, ans=0.0
2023-11-26 01:33:28,382 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 473650
2023-11-26 01:33:36,493 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=3157653.3333333335, ans=0.125
2023-11-26 01:33:45,871 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=7.38 vs. limit=12.0
2023-11-26 01:33:48,585 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=6.89 vs. limit=10.0
2023-11-26 01:33:57,444 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=8.07 vs. limit=15.0
2023-11-26 01:34:02,326 INFO [train_asr.py:1235] (3/4) Epoch 40, batch 4750, loss[loss=0.06352, simple_loss=0.09509, pruned_loss=0.008808, audio_tagging_loss=0.007167, over 14406.00 frames. ], tot_loss[loss=0.06702, simple_loss=0.09037, pruned_loss=0.01269, audio_tagging_loss=0.009144, over 3043058.12 frames. ], batch size: 54, lr: 1.69e-03, grad_scale: 8.0
2023-11-26 01:34:24,130 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 473700
2023-11-26 01:34:40,726 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.460e+01 8.549e+01 9.197e+01 1.001e+02 1.331e+02, threshold=1.839e+02, percent-clipped=0.0
2023-11-26 01:34:42,089 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=3158053.3333333335, ans=0.125
2023-11-26 01:34:52,948 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=10.02 vs. limit=15.0
2023-11-26 01:34:57,734 INFO [train_asr.py:1235] (3/4) Epoch 40, batch 4800, loss[loss=0.05471, simple_loss=0.07774, pruned_loss=0.006483, audio_tagging_loss=0.009362, over 13701.00 frames. ], tot_loss[loss=0.06707, simple_loss=0.09016, pruned_loss=0.01275, audio_tagging_loss=0.009235, over 3044659.16 frames. ], batch size: 55, lr: 1.69e-03, grad_scale: 16.0
2023-11-26 01:35:14,898 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.0.self_attn1.whiten, num_groups=1, num_channels=192, metric=12.06 vs. limit=22.5
2023-11-26 01:35:19,766 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3158320.0, ans=0.125
2023-11-26 01:35:20,606 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 473750
2023-11-26 01:35:41,395 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=3158453.3333333335, ans=0.125
2023-11-26 01:35:53,151 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=3158520.0, ans=0.125
2023-11-26 01:35:54,548 INFO [train_asr.py:1235] (3/4) Epoch 40, batch 4850, loss[loss=0.06414, simple_loss=0.08036, pruned_loss=0.01302, audio_tagging_loss=0.01095, over 14357.00 frames. ], tot_loss[loss=0.06698, simple_loss=0.09019, pruned_loss=0.01259, audio_tagging_loss=0.009297, over 3045236.33 frames. ], batch size: 54, lr: 1.69e-03, grad_scale: 16.0
2023-11-26 01:36:02,369 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=7.00 vs. limit=15.0
2023-11-26 01:36:03,279 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3158520.0, ans=0.1
2023-11-26 01:36:10,786 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=8.86 vs. limit=15.0
2023-11-26 01:36:12,299 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3158586.6666666665, ans=0.125
2023-11-26 01:36:13,298 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3158586.6666666665, ans=0.1
2023-11-26 01:36:16,365 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 473800
2023-11-26 01:36:32,056 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.019e+01 8.661e+01 9.165e+01 9.886e+01 1.284e+02, threshold=1.833e+02, percent-clipped=0.0
2023-11-26 01:36:36,110 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=3158720.0, ans=0.125
2023-11-26 01:36:41,890 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=3158786.6666666665, ans=0.1
2023-11-26 01:36:44,990 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.min_positive, batch_count=3158786.6666666665, ans=0.05
2023-11-26 01:36:50,746 INFO [train_asr.py:1235] (3/4) Epoch 40, batch 4900, loss[loss=0.06891, simple_loss=0.09269, pruned_loss=0.01309, audio_tagging_loss=0.009469, over 16315.00 frames. ], tot_loss[loss=0.06643, simple_loss=0.08966, pruned_loss=0.01236, audio_tagging_loss=0.009236, over 3038871.21 frames. ], batch size: 61, lr: 1.69e-03, grad_scale: 16.0
2023-11-26 01:37:02,325 INFO [scaling.py:1022] (3/4) Whitening: name=encoder_embed.convnext.out_whiten, num_groups=1, num_channels=128, metric=4.45 vs. limit=5.0
2023-11-26 01:37:09,429 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=5.25 vs. limit=10.0
2023-11-26 01:37:12,772 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 473850
2023-11-26 01:37:17,098 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3158986.6666666665, ans=0.125
2023-11-26 01:37:21,350 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=3158986.6666666665, ans=0.125
2023-11-26 01:37:35,600 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.min_positive, batch_count=3159120.0, ans=0.05
2023-11-26 01:37:46,069 INFO [train_asr.py:1235] (3/4) Epoch 40, batch 4950, loss[loss=0.07619, simple_loss=0.1027, pruned_loss=0.01668, audio_tagging_loss=0.008149, over 16252.00 frames. ], tot_loss[loss=0.06638, simple_loss=0.08978, pruned_loss=0.01239, audio_tagging_loss=0.009109, over 3043740.89 frames. ], batch size: 61, lr: 1.69e-03, grad_scale: 16.0
2023-11-26 01:38:09,197 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 473900
2023-11-26 01:38:09,404 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3159320.0, ans=0.125
2023-11-26 01:38:17,247 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3159320.0, ans=0.125
2023-11-26 01:38:19,388 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=3159386.6666666665, ans=0.125
2023-11-26 01:38:24,452 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 6.979e+01 8.576e+01 9.071e+01 1.006e+02 1.445e+02, threshold=1.814e+02, percent-clipped=0.0
2023-11-26 01:38:27,806 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3159386.6666666665, ans=0.125
2023-11-26 01:38:41,907 INFO [train_asr.py:1235] (3/4) Epoch 40, batch 5000, loss[loss=0.06265, simple_loss=0.08846, pruned_loss=0.01098, audio_tagging_loss=0.00744, over 15404.00 frames. ], tot_loss[loss=0.06653, simple_loss=0.09042, pruned_loss=0.01244, audio_tagging_loss=0.008881, over 3042141.49 frames. ], batch size: 58, lr: 1.69e-03, grad_scale: 16.0
2023-11-26 01:38:47,333 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=3159520.0, ans=0.2
2023-11-26 01:38:50,746 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=3.94 vs. limit=15.0
2023-11-26 01:38:57,382 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3159586.6666666665, ans=0.1
2023-11-26 01:39:04,633 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 473950
2023-11-26 01:39:04,796 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3159653.3333333335, ans=0.1
2023-11-26 01:39:09,021 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00
2023-11-26 01:39:20,654 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3159720.0, ans=0.125
2023-11-26 01:39:32,722 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=3159786.6666666665, ans=0.0
2023-11-26 01:39:38,327 INFO [train_asr.py:1235] (3/4) Epoch 40, batch 5050, loss[loss=0.06559, simple_loss=0.08544, pruned_loss=0.01243, audio_tagging_loss=0.01044, over 15795.00 frames. ], tot_loss[loss=0.06611, simple_loss=0.08963, pruned_loss=0.01247, audio_tagging_loss=0.00883, over 3044399.00 frames. ], batch size: 59, lr: 1.69e-03, grad_scale: 16.0
2023-11-26 01:39:40,733 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3159853.3333333335, ans=0.1
2023-11-26 01:39:53,808 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=16.19 vs. limit=22.5
2023-11-26 01:39:59,870 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 474000
2023-11-26 01:40:02,339 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=3159986.6666666665, ans=0.0
2023-11-26 01:40:02,449 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.max_abs, batch_count=3159986.6666666665, ans=10.0
2023-11-26 01:40:16,448 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.715e+01 8.830e+01 9.284e+01 9.877e+01 1.399e+02, threshold=1.857e+02, percent-clipped=0.0
2023-11-26 01:40:21,422 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=3160053.3333333335, ans=0.0
2023-11-26 01:40:26,818 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=3160120.0, ans=0.0
2023-11-26 01:40:34,112 INFO [train_asr.py:1235] (3/4) Epoch 40, batch 5100, loss[loss=0.07695, simple_loss=0.1108, pruned_loss=0.017, audio_tagging_loss=0.004551, over 16365.00 frames. ], tot_loss[loss=0.0657, simple_loss=0.0891, pruned_loss=0.01241, audio_tagging_loss=0.008738, over 3037300.24 frames. ], batch size: 57, lr: 1.69e-03, grad_scale: 16.0
2023-11-26 01:40:34,280 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3160186.6666666665, ans=0.125
2023-11-26 01:40:44,821 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=3160253.3333333335, ans=0.2
2023-11-26 01:40:56,379 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 474050
2023-11-26 01:41:03,715 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=7.60 vs. limit=15.0
2023-11-26 01:41:22,563 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=3160453.3333333335, ans=0.125
2023-11-26 01:41:23,520 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=3160453.3333333335, ans=0.2
2023-11-26 01:41:24,741 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=3160453.3333333335, ans=0.2
2023-11-26 01:41:28,751 INFO [train_asr.py:1235] (3/4) Epoch 40, batch 5150, loss[loss=0.06975, simple_loss=0.09513, pruned_loss=0.01406, audio_tagging_loss=0.008125, over 14556.00 frames. ], tot_loss[loss=0.06654, simple_loss=0.09026, pruned_loss=0.01267, audio_tagging_loss=0.008737, over 3037111.47 frames. ], batch size: 55, lr: 1.69e-03, grad_scale: 8.0
2023-11-26 01:41:36,360 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=3160520.0, ans=0.0
2023-11-26 01:41:51,708 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 474100
2023-11-26 01:42:07,593 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.243e+01 8.876e+01 9.273e+01 9.906e+01 1.225e+02, threshold=1.855e+02, percent-clipped=0.0
2023-11-26 01:42:22,499 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=7.37 vs. limit=15.0
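The scaling.py:213 lines dump ScheduledFloat values: scalar hyperparameters (dropout_p, skip_rate, balancer prob, bypass scale_min, ...) whose value is a function of batch_count rather than a constant. The sketch below shows the general idea as simple piecewise-linear interpolation between (batch_count, value) breakpoints; it is an illustrative stand-in, not the actual class from icefall's scaling.py. By batch_count around 3.16e6 the schedules above have settled at their final values (ans=0.125, ans=0.1, ans=0.0, ...).

from bisect import bisect_right

def scheduled_float(batch_count: float, schedule: list) -> float:
    """schedule: sorted (batch_count, value) breakpoints, e.g.
    [(0.0, 0.3), (20000.0, 0.1)]; the value is clamped outside the range."""
    xs = [x for x, _ in schedule]
    i = bisect_right(xs, batch_count)
    if i == 0:
        return schedule[0][1]
    if i == len(xs):
        return schedule[-1][1]
    (x0, y0), (x1, y1) = schedule[i - 1], schedule[i]
    return y0 + (y1 - y0) * (batch_count - x0) / (x1 - x0)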
2023-11-26 01:42:24,397 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3160853.3333333335, ans=0.1
2023-11-26 01:42:25,186 INFO [train_asr.py:1235] (3/4) Epoch 40, batch 5200, loss[loss=0.09404, simple_loss=0.1375, pruned_loss=0.02059, audio_tagging_loss=0.004682, over 15454.00 frames. ], tot_loss[loss=0.06737, simple_loss=0.09137, pruned_loss=0.01301, audio_tagging_loss=0.008672, over 3033259.59 frames. ], batch size: 55, lr: 1.69e-03, grad_scale: 16.0
2023-11-26 01:42:34,243 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=3160853.3333333335, ans=0.125
2023-11-26 01:42:46,951 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 474150
2023-11-26 01:43:11,404 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3161120.0, ans=0.1
2023-11-26 01:43:18,893 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=3161120.0, ans=0.125
2023-11-26 01:43:20,745 INFO [train_asr.py:1235] (3/4) Epoch 40, batch 5250, loss[loss=0.06928, simple_loss=0.08558, pruned_loss=0.01442, audio_tagging_loss=0.01207, over 14504.00 frames. ], tot_loss[loss=0.06768, simple_loss=0.09201, pruned_loss=0.01302, audio_tagging_loss=0.008657, over 3031097.62 frames. ], batch size: 56, lr: 1.69e-03, grad_scale: 16.0
2023-11-26 01:43:27,189 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3161186.6666666665, ans=0.125
2023-11-26 01:43:35,808 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=3161253.3333333335, ans=0.125
2023-11-26 01:43:35,871 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3161253.3333333335, ans=0.125
2023-11-26 01:43:43,068 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 474200
2023-11-26 01:43:48,255 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=3161320.0, ans=0.0
2023-11-26 01:43:51,935 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3161320.0, ans=0.1
2023-11-26 01:44:00,247 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.455e+01 8.738e+01 9.374e+01 1.008e+02 2.043e+02, threshold=1.875e+02, percent-clipped=1.0
2023-11-26 01:44:01,513 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3161386.6666666665, ans=0.1
2023-11-26 01:44:16,296 INFO [train_asr.py:1235] (3/4) Epoch 40, batch 5300, loss[loss=0.07734, simple_loss=0.106, pruned_loss=0.01859, audio_tagging_loss=0.005747, over 15160.00 frames. ], tot_loss[loss=0.0674, simple_loss=0.09161, pruned_loss=0.0129, audio_tagging_loss=0.008698, over 3035492.66 frames. ], batch size: 53, lr: 1.69e-03, grad_scale: 16.0
2023-11-26 01:44:24,896 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3161520.0, ans=0.125
2023-11-26 01:44:25,974 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3161520.0, ans=0.125
2023-11-26 01:44:31,822 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=3161586.6666666665, ans=0.2
2023-11-26 01:44:31,843 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=3161586.6666666665, ans=0.2
2023-11-26 01:44:32,856 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2023-11-26 01:44:39,573 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 474250
2023-11-26 01:44:39,914 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=1.99 vs. limit=6.0
2023-11-26 01:44:48,056 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3161653.3333333335, ans=0.1
2023-11-26 01:45:03,821 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=3161786.6666666665, ans=0.0
2023-11-26 01:45:12,043 INFO [train_asr.py:1235] (3/4) Epoch 40, batch 5350, loss[loss=0.09389, simple_loss=0.1356, pruned_loss=0.01982, audio_tagging_loss=0.006279, over 15473.00 frames. ], tot_loss[loss=0.06762, simple_loss=0.0921, pruned_loss=0.0129, audio_tagging_loss=0.008672, over 3034613.29 frames. ], batch size: 56, lr: 1.69e-03, grad_scale: 16.0
2023-11-26 01:45:14,150 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=12.85 vs. limit=15.0
2023-11-26 01:45:21,066 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=18.61 vs. limit=22.5
2023-11-26 01:45:25,400 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.0.conv_module1.whiten, num_groups=1, num_channels=192, metric=14.45 vs. limit=15.0
2023-11-26 01:45:34,097 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 474300
2023-11-26 01:45:38,030 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.0.conv_module1.whiten, num_groups=1, num_channels=192, metric=13.98 vs. limit=15.0
2023-11-26 01:45:40,108 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.02 vs. limit=6.0
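The scaling.py:1022 lines compare a per-module whitening metric against a limit (e.g. metric=2.02 vs. limit=6.0 just above); the module only intervenes when the metric drifts past the limit. One plausible way to express such a metric, assumed here rather than copied from scaling.py, is the normalized eigenvalue spread of the feature covariance per channel group: exactly 1.0 for perfectly white features and larger as the variance concentrates in a few directions.

import torch

def whitening_metric(x: torch.Tensor, num_groups: int = 1) -> float:
    """x: (num_frames, num_channels). Illustrative metric only."""
    num_frames, num_channels = x.shape
    group_size = num_channels // num_groups
    metrics = []
    for g in range(num_groups):
        xg = x[:, g * group_size:(g + 1) * group_size]
        xg = xg - xg.mean(dim=0)
        cov = (xg.t() @ xg) / num_frames
        eigs = torch.linalg.eigvalsh(cov)
        # sum(l^2) / sum(l)^2 equals 1/group_size when all eigenvalues
        # are equal (white), so scale by group_size to make white == 1.0.
        metrics.append(group_size * (eigs ** 2).sum() / eigs.sum() ** 2)
    return float(torch.stack(metrics).mean())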
2023-11-26 01:45:42,776 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2023-11-26 01:45:43,741 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=3162053.3333333335, ans=0.125
2023-11-26 01:45:50,940 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.249e+01 8.668e+01 9.369e+01 1.006e+02 1.281e+02, threshold=1.874e+02, percent-clipped=0.0
2023-11-26 01:46:08,070 INFO [train_asr.py:1235] (3/4) Epoch 40, batch 5400, loss[loss=0.05934, simple_loss=0.07685, pruned_loss=0.009753, audio_tagging_loss=0.01117, over 15467.00 frames. ], tot_loss[loss=0.06708, simple_loss=0.09128, pruned_loss=0.01267, audio_tagging_loss=0.008768, over 3032813.92 frames. ], batch size: 57, lr: 1.69e-03, grad_scale: 16.0
2023-11-26 01:46:11,504 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=3162186.6666666665, ans=0.0
2023-11-26 01:46:16,092 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.62 vs. limit=6.0
2023-11-26 01:46:21,889 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=3162253.3333333335, ans=0.125
2023-11-26 01:46:23,028 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=3162253.3333333335, ans=0.2
2023-11-26 01:46:29,629 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 474350
2023-11-26 01:46:39,071 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=12.84 vs. limit=15.0
2023-11-26 01:47:02,406 INFO [train_asr.py:1235] (3/4) Epoch 40, batch 5450, loss[loss=0.06568, simple_loss=0.0916, pruned_loss=0.01319, audio_tagging_loss=0.006687, over 15241.00 frames. ], tot_loss[loss=0.06676, simple_loss=0.09051, pruned_loss=0.01263, audio_tagging_loss=0.008881, over 3028893.07 frames. ], batch size: 55, lr: 1.69e-03, grad_scale: 16.0
2023-11-26 01:47:03,876 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=11.23 vs. limit=15.0
2023-11-26 01:47:25,641 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 474400
2023-11-26 01:47:29,192 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=3162653.3333333335, ans=0.0
2023-11-26 01:47:34,416 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=3162653.3333333335, ans=0.0
2023-11-26 01:47:41,656 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 6.978e+01 8.837e+01 9.304e+01 1.039e+02 1.325e+02, threshold=1.861e+02, percent-clipped=0.0
2023-11-26 01:47:58,530 INFO [train_asr.py:1235] (3/4) Epoch 40, batch 5500, loss[loss=0.0566, simple_loss=0.07518, pruned_loss=0.009432, audio_tagging_loss=0.009575, over 14678.00 frames. ], tot_loss[loss=0.06679, simple_loss=0.09052, pruned_loss=0.01255, audio_tagging_loss=0.008985, over 3026597.85 frames. ], batch size: 56, lr: 1.69e-03, grad_scale: 16.0
2023-11-26 01:48:04,986 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=512, metric=8.35 vs. limit=15.0
2023-11-26 01:48:20,958 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 474450
2023-11-26 01:48:23,338 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=3162986.6666666665, ans=0.0
2023-11-26 01:48:28,701 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=6.88 vs. limit=12.0
2023-11-26 01:48:42,302 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3163120.0, ans=0.1
2023-11-26 01:48:54,822 INFO [train_asr.py:1235] (3/4) Epoch 40, batch 5550, loss[loss=0.06189, simple_loss=0.08215, pruned_loss=0.01084, audio_tagging_loss=0.009972, over 16179.00 frames. ], tot_loss[loss=0.06714, simple_loss=0.09097, pruned_loss=0.01263, audio_tagging_loss=0.009016, over 3031069.88 frames. ], batch size: 61, lr: 1.69e-03, grad_scale: 8.0
2023-11-26 01:49:01,337 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=3163186.6666666665, ans=0.125
2023-11-26 01:49:16,216 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 474500
2023-11-26 01:49:17,577 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3163320.0, ans=0.0
2023-11-26 01:49:34,523 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.448e+01 8.884e+01 9.348e+01 1.017e+02 1.220e+02, threshold=1.870e+02, percent-clipped=0.0
2023-11-26 01:49:49,103 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.min_positive, batch_count=3163520.0, ans=0.05
2023-11-26 01:49:49,613 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=10.62 vs. limit=22.5
2023-11-26 01:49:49,997 INFO [train_asr.py:1235] (3/4) Epoch 40, batch 5600, loss[loss=0.05845, simple_loss=0.0746, pruned_loss=0.01034, audio_tagging_loss=0.01081, over 15172.00 frames. ], tot_loss[loss=0.06741, simple_loss=0.09147, pruned_loss=0.01263, audio_tagging_loss=0.009048, over 3036180.29 frames. ], batch size: 56, lr: 1.69e-03, grad_scale: 16.0
2023-11-26 01:50:07,657 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.0.self_attn1.whiten, num_groups=1, num_channels=192, metric=12.14 vs. limit=22.5
2023-11-26 01:50:08,311 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=13.07 vs. limit=15.0
2023-11-26 01:50:12,813 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 474550
2023-11-26 01:50:24,220 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3163720.0, ans=0.125
2023-11-26 01:50:31,295 WARNING [train_asr.py:1481] (3/4) Exclude cut with ID unbalanced/ze0LsBtoDm0_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24
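The WARNING above shows the cut filter at work: this AudioSet cut carries placeholder text, and after the 4x subsampling front-end its 100 input frames shrink to 23 encoder frames, fewer than its 24 BPE tokens, so no valid transducer alignment exists and the cut is dropped. A sketch of that kind of predicate follows; the function name and the exact frame arithmetic are assumptions, not code from train_asr.py, though the arithmetic does reproduce the logged 100 -> 23.

def keep_cut(num_frames: int, num_tokens: int,
             subsampling_factor: int = 4) -> bool:
    # Assumed conv front-end arithmetic (trims edges, then subsamples).
    frames_after_subsampling = (num_frames - 7) // subsampling_factor
    # A transducer needs at least one encoder frame per output token.
    return frames_after_subsampling >= num_tokens

# The excluded cut: 100 frames -> 23 after subsampling, but 24 tokens.
assert keep_cut(100, 24) is False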
2023-11-26 01:50:32,568 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=3163720.0, ans=0.0
2023-11-26 01:50:44,853 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=3163853.3333333335, ans=0.125
2023-11-26 01:50:46,146 INFO [train_asr.py:1235] (3/4) Epoch 40, batch 5650, loss[loss=0.06019, simple_loss=0.07482, pruned_loss=0.01414, audio_tagging_loss=0.00864, over 14496.00 frames. ], tot_loss[loss=0.06687, simple_loss=0.09058, pruned_loss=0.01243, audio_tagging_loss=0.009156, over 3040538.14 frames. ], batch size: 57, lr: 1.69e-03, grad_scale: 16.0
2023-11-26 01:50:48,426 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3163853.3333333335, ans=0.125
2023-11-26 01:50:58,825 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=3163920.0, ans=0.2
2023-11-26 01:50:59,877 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=3163920.0, ans=0.125
2023-11-26 01:50:59,945 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3163920.0, ans=0.125
2023-11-26 01:51:09,006 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 474600
2023-11-26 01:51:14,736 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=3163986.6666666665, ans=0.0
2023-11-26 01:51:19,645 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.0.conv_module1.whiten, num_groups=1, num_channels=192, metric=12.67 vs. limit=15.0
2023-11-26 01:51:26,855 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.372e+01 8.458e+01 9.048e+01 9.939e+01 1.364e+02, threshold=1.810e+02, percent-clipped=0.0
2023-11-26 01:51:42,962 INFO [train_asr.py:1235] (3/4) Epoch 40, batch 5700, loss[loss=0.06906, simple_loss=0.0881, pruned_loss=0.0184, audio_tagging_loss=0.006611, over 15399.00 frames. ], tot_loss[loss=0.06728, simple_loss=0.09137, pruned_loss=0.01257, audio_tagging_loss=0.00903, over 3044997.94 frames. ], batch size: 59, lr: 1.69e-03, grad_scale: 16.0
2023-11-26 01:51:52,206 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3164186.6666666665, ans=0.1
2023-11-26 01:51:53,280 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=3164253.3333333335, ans=0.125
2023-11-26 01:52:04,722 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 474650
2023-11-26 01:52:07,059 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=3164320.0, ans=0.125
2023-11-26 01:52:08,767 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=3164320.0, ans=0.2
2023-11-26 01:52:12,474 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=3164320.0, ans=0.125
2023-11-26 01:52:15,901 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=4.89 vs. limit=15.0
2023-11-26 01:52:18,015 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=6.57 vs. limit=15.0
2023-11-26 01:52:30,629 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=4.84 vs. limit=12.0
2023-11-26 01:52:38,767 INFO [train_asr.py:1235] (3/4) Epoch 40, batch 5750, loss[loss=0.04631, simple_loss=0.05581, pruned_loss=0.008025, audio_tagging_loss=0.01038, over 14720.00 frames. ], tot_loss[loss=0.06681, simple_loss=0.09072, pruned_loss=0.01253, audio_tagging_loss=0.008918, over 3046570.51 frames. ], batch size: 57, lr: 1.69e-03, grad_scale: 8.0
2023-11-26 01:52:39,047 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3164520.0, ans=0.125
2023-11-26 01:52:47,600 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=13.46 vs. limit=22.5
2023-11-26 01:52:59,283 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=3164586.6666666665, ans=0.0
2023-11-26 01:53:01,291 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 474700
2023-11-26 01:53:20,235 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.530e+01 8.765e+01 9.381e+01 1.013e+02 1.424e+02, threshold=1.876e+02, percent-clipped=0.0
2023-11-26 01:53:28,863 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=3164786.6666666665, ans=0.125
2023-11-26 01:53:29,988 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=3164786.6666666665, ans=0.125
2023-11-26 01:53:34,545 INFO [train_asr.py:1235] (3/4) Epoch 40, batch 5800, loss[loss=0.07279, simple_loss=0.09608, pruned_loss=0.01514, audio_tagging_loss=0.009609, over 16227.00 frames. ], tot_loss[loss=0.06695, simple_loss=0.09107, pruned_loss=0.01252, audio_tagging_loss=0.0089, over 3047980.89 frames. ], batch size: 60, lr: 1.69e-03, grad_scale: 8.0
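Many of the ScheduledFloat entries belong to balancer modules (balancer.prob, min_positive, max_positive, min_abs, max_abs). In the Zipformer recipe these watch per-channel activation statistics and, with the scheduled probability, nudge gradients so the fraction of positive activations and the mean absolute value stay in range. The sketch below only measures those constraints rather than enforcing them; it is a conceptual illustration, not the module from scaling.py.

import torch

def balancer_violations(x: torch.Tensor,
                        min_positive: float = 0.05,
                        max_positive: float = 0.95,
                        max_abs: float = 10.0) -> dict:
    """x: (num_frames, num_channels). Count channels outside the ranges
    that the logged balancer constraints refer to."""
    pos_frac = (x > 0).float().mean(dim=0)
    abs_mean = x.abs().mean(dim=0)
    return {
        "too_few_positive": int((pos_frac < min_positive).sum()),
        "too_many_positive": int((pos_frac > max_positive).sum()),
        "abs_too_large": int((abs_mean > max_abs).sum()),
    }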
2023-11-26 01:53:47,572 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3164920.0, ans=0.125
2023-11-26 01:53:50,757 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3164920.0, ans=0.1
2023-11-26 01:53:57,409 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 474750
2023-11-26 01:54:06,356 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=16.50 vs. limit=22.5
2023-11-26 01:54:09,324 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3165053.3333333335, ans=0.125
2023-11-26 01:54:12,959 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=10.98 vs. limit=15.0
2023-11-26 01:54:21,882 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3165120.0, ans=0.125
2023-11-26 01:54:22,966 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=3165120.0, ans=0.0
2023-11-26 01:54:27,206 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=3165120.0, ans=0.0
2023-11-26 01:54:30,607 INFO [train_asr.py:1235] (3/4) Epoch 40, batch 5850, loss[loss=0.05564, simple_loss=0.08038, pruned_loss=0.007669, audio_tagging_loss=0.007781, over 14601.00 frames. ], tot_loss[loss=0.06712, simple_loss=0.09141, pruned_loss=0.0126, audio_tagging_loss=0.008812, over 3047317.02 frames. ], batch size: 54, lr: 1.69e-03, grad_scale: 8.0
2023-11-26 01:54:32,899 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=3165186.6666666665, ans=0.2
2023-11-26 01:54:32,965 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3165186.6666666665, ans=0.1
2023-11-26 01:54:46,040 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3165253.3333333335, ans=0.125
2023-11-26 01:54:51,481 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=3165320.0, ans=0.0
2023-11-26 01:54:52,371 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 474800
2023-11-26 01:55:05,424 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3165386.6666666665, ans=0.125
2023-11-26 01:55:11,938 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 6.931e+01 8.672e+01 9.332e+01 1.005e+02 2.095e+02, threshold=1.866e+02, percent-clipped=1.0
2023-11-26 01:55:14,380 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.min_abs, batch_count=3165453.3333333335, ans=0.5
2023-11-26 01:55:20,561 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3165453.3333333335, ans=0.125
2023-11-26 01:55:26,150 INFO [train_asr.py:1235] (3/4) Epoch 40, batch 5900, loss[loss=0.07783, simple_loss=0.0962, pruned_loss=0.01995, audio_tagging_loss=0.009786, over 15759.00 frames. ], tot_loss[loss=0.06675, simple_loss=0.09083, pruned_loss=0.01252, audio_tagging_loss=0.00881, over 3054277.96 frames. ], batch size: 59, lr: 1.69e-03, grad_scale: 8.0
2023-11-26 01:55:27,444 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=3165520.0, ans=0.0
2023-11-26 01:55:27,641 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=12.36 vs. limit=15.0
2023-11-26 01:55:46,502 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00
2023-11-26 01:55:48,424 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 474850
2023-11-26 01:55:51,703 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=3165653.3333333335, ans=0.125
2023-11-26 01:55:54,375 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=3165653.3333333335, ans=0.125
2023-11-26 01:55:59,105 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=11.50 vs. limit=15.0
2023-11-26 01:56:01,830 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=3165720.0, ans=0.0
2023-11-26 01:56:02,996 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=3165720.0, ans=0.2
2023-11-26 01:56:21,178 INFO [train_asr.py:1235] (3/4) Epoch 40, batch 5950, loss[loss=0.0606, simple_loss=0.07692, pruned_loss=0.01073, audio_tagging_loss=0.01141, over 15178.00 frames. ], tot_loss[loss=0.0671, simple_loss=0.09153, pruned_loss=0.01265, audio_tagging_loss=0.008685, over 3057803.96 frames. ], batch size: 56, lr: 1.69e-03, grad_scale: 8.0
2023-11-26 01:56:27,081 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=10.58 vs. limit=15.0
2023-11-26 01:56:44,131 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 474900
2023-11-26 01:56:45,300 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3165986.6666666665, ans=0.1
2023-11-26 01:57:02,561 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.309e+01 8.504e+01 9.073e+01 9.680e+01 1.067e+02, threshold=1.815e+02, percent-clipped=0.0
2023-11-26 01:57:10,247 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3166120.0, ans=0.1
2023-11-26 01:57:17,305 INFO [train_asr.py:1235] (3/4) Epoch 40, batch 6000, loss[loss=0.05654, simple_loss=0.07056, pruned_loss=0.01272, audio_tagging_loss=0.008537, over 16272.00 frames. ], tot_loss[loss=0.0672, simple_loss=0.09162, pruned_loss=0.01265, audio_tagging_loss=0.008743, over 3061394.75 frames. ], batch size: 63, lr: 1.69e-03, grad_scale: 16.0
2023-11-26 01:57:17,306 INFO [train_asr.py:1258] (3/4) Computing validation loss
2023-11-26 01:57:37,450 INFO [zipformer.py:1877] (3/4) name=encoder.encoders.2.encoder.layers.1.self_attn_weights, attn_weights_entropy = tensor([4.6165, 3.7195, 4.0489, 3.4855], device='cuda:3')
2023-11-26 01:57:42,473 INFO [zipformer.py:1877] (3/4) name=encoder.encoders.4.encoder.layers.1.self_attn_weights, attn_weights_entropy = tensor([4.4693, 3.8169, 2.9261, 3.8873], device='cuda:3')
2023-11-26 01:57:49,486 INFO [train_asr.py:1267] (3/4) Epoch 40, validation: loss=0.0577, simple_loss=0.05067, pruned_loss=0.005162, audio_tagging_loss=0.0272, over 4681554.00 frames.
2023-11-26 01:57:49,487 INFO [train_asr.py:1268] (3/4) Maximum memory allocated so far is 24894MB
2023-11-26 01:58:12,981 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 474950
2023-11-26 01:58:15,161 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=3166320.0, ans=0.04949747468305833
2023-11-26 01:58:27,856 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=3166386.6666666665, ans=0.0
2023-11-26 01:58:30,857 WARNING [train_asr.py:1481] (3/4) Exclude cut with ID unbalanced/NoNxFjwXuuc_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24
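At batch 6000 the loop pauses for a validation pass (train_asr.py:1258/1267 above): it sweeps a fixed validation cut set, dumps attention-entropy diagnostics from zipformer.py:1877, reports the component losses over 4681554.00 frames, and prints the peak CUDA memory. A hedged sketch of that cadence, with illustrative names and an assumed interval consistent with validating at a multiple of 3000 batches:

import torch

def maybe_validate(batch_idx: int, model, valid_loader, compute_loss,
                   valid_interval: int = 3000) -> None:
    if batch_idx % valid_interval != 0:
        return
    model.eval()
    with torch.no_grad():
        losses = [compute_loss(model, batch) for batch in valid_loader]
    model.train()
    print(f"validation: loss={sum(losses) / len(losses):.4f}")
    mb = torch.cuda.max_memory_allocated() // (1024 * 1024)
    print(f"Maximum memory allocated so far is {mb}MB")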
2023-11-26 01:58:32,104 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=3166386.6666666665, ans=0.0
2023-11-26 01:58:38,865 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3166453.3333333335, ans=0.125
2023-11-26 01:58:44,011 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=8.67 vs. limit=15.0
2023-11-26 01:58:45,671 INFO [train_asr.py:1235] (3/4) Epoch 40, batch 6050, loss[loss=0.08809, simple_loss=0.1184, pruned_loss=0.02141, audio_tagging_loss=0.007485, over 15149.00 frames. ], tot_loss[loss=0.06711, simple_loss=0.0916, pruned_loss=0.01263, audio_tagging_loss=0.008682, over 3061265.11 frames. ], batch size: 56, lr: 1.69e-03, grad_scale: 16.0
2023-11-26 01:58:55,694 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3166520.0, ans=0.1
2023-11-26 01:59:05,258 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3166586.6666666665, ans=0.125
2023-11-26 01:59:08,352 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 475000
2023-11-26 01:59:15,050 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=3166653.3333333335, ans=0.0
2023-11-26 01:59:16,529 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=8.38 vs. limit=10.0
2023-11-26 01:59:22,063 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3166720.0, ans=0.125
2023-11-26 01:59:25,089 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=3166720.0, ans=0.0
2023-11-26 01:59:27,535 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.174e+01 8.819e+01 9.341e+01 9.960e+01 1.201e+02, threshold=1.868e+02, percent-clipped=0.0
2023-11-26 01:59:35,659 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3166786.6666666665, ans=0.1
2023-11-26 01:59:39,329 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.min_positive, batch_count=3166786.6666666665, ans=0.025
2023-11-26 01:59:42,377 INFO [train_asr.py:1235] (3/4) Epoch 40, batch 6100, loss[loss=0.08215, simple_loss=0.1153, pruned_loss=0.01564, audio_tagging_loss=0.008882, over 15906.00 frames. ], tot_loss[loss=0.06681, simple_loss=0.09115, pruned_loss=0.01252, audio_tagging_loss=0.008712, over 3060299.99 frames. ], batch size: 58, lr: 1.69e-03, grad_scale: 16.0
2023-11-26 01:59:42,547 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=3166853.3333333335, ans=0.2
2023-11-26 01:59:50,057 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=3166853.3333333335, ans=0.125
2023-11-26 01:59:57,758 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=17.63 vs. limit=22.5
2023-11-26 01:59:58,389 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=3166920.0, ans=0.0
2023-11-26 01:59:58,493 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3166920.0, ans=0.1
2023-11-26 02:00:01,819 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=3166920.0, ans=0.0
2023-11-26 02:00:04,230 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 475050
2023-11-26 02:00:11,644 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.whiten, num_groups=1, num_channels=512, metric=8.75 vs. limit=12.0
2023-11-26 02:00:27,452 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=9.97 vs. limit=15.0
2023-11-26 02:00:33,567 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=3167120.0, ans=0.09899494936611666
2023-11-26 02:00:37,605 INFO [train_asr.py:1235] (3/4) Epoch 40, batch 6150, loss[loss=0.08148, simple_loss=0.1187, pruned_loss=0.01525, audio_tagging_loss=0.006872, over 15466.00 frames. ], tot_loss[loss=0.06674, simple_loss=0.09105, pruned_loss=0.01249, audio_tagging_loss=0.008732, over 3061725.38 frames. ], batch size: 56, lr: 1.69e-03, grad_scale: 16.0
2023-11-26 02:00:38,798 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=3167186.6666666665, ans=0.125
2023-11-26 02:00:38,961 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.min_abs, batch_count=3167186.6666666665, ans=0.5
2023-11-26 02:00:42,987 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=3167186.6666666665, ans=0.1
2023-11-26 02:00:46,361 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3167186.6666666665, ans=0.1
2023-11-26 02:01:00,322 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 475100
2023-11-26 02:01:15,862 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=3167386.6666666665, ans=0.0
2023-11-26 02:01:18,777 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.517e+01 8.792e+01 9.265e+01 9.786e+01 1.351e+02, threshold=1.853e+02, percent-clipped=0.0
2023-11-26 02:01:33,676 INFO [train_asr.py:1235] (3/4) Epoch 40, batch 6200, loss[loss=0.09343, simple_loss=0.1247, pruned_loss=0.02181, audio_tagging_loss=0.009293, over 15505.00 frames. ], tot_loss[loss=0.06653, simple_loss=0.09047, pruned_loss=0.01244, audio_tagging_loss=0.00886, over 3054501.65 frames. ], batch size: 55, lr: 1.69e-03, grad_scale: 16.0
2023-11-26 02:01:35,245 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=7.08 vs. limit=15.0
2023-11-26 02:01:39,834 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3167520.0, ans=0.1
2023-11-26 02:01:56,165 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 475150
2023-11-26 02:02:30,242 INFO [train_asr.py:1235] (3/4) Epoch 40, batch 6250, loss[loss=0.06005, simple_loss=0.07508, pruned_loss=0.01018, audio_tagging_loss=0.01233, over 15409.00 frames. ], tot_loss[loss=0.06622, simple_loss=0.08988, pruned_loss=0.01234, audio_tagging_loss=0.008944, over 3056394.34 frames. ], batch size: 57, lr: 1.69e-03, grad_scale: 16.0
2023-11-26 02:02:51,600 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 475200
2023-11-26 02:03:11,824 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.104e+01 8.743e+01 9.437e+01 1.009e+02 1.277e+02, threshold=1.887e+02, percent-clipped=0.0
2023-11-26 02:03:24,735 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=3168186.6666666665, ans=0.2
2023-11-26 02:03:25,467 INFO [train_asr.py:1235] (3/4) Epoch 40, batch 6300, loss[loss=0.06486, simple_loss=0.08133, pruned_loss=0.01332, audio_tagging_loss=0.01088, over 15871.00 frames. ], tot_loss[loss=0.06657, simple_loss=0.09021, pruned_loss=0.01239, audio_tagging_loss=0.009073, over 3053204.67 frames. ], batch size: 60, lr: 1.69e-03, grad_scale: 16.0
2023-11-26 02:03:29,361 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=4.11 vs. limit=15.0
2023-11-26 02:03:32,213 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00
2023-11-26 02:03:37,611 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=3168253.3333333335, ans=0.0
2023-11-26 02:03:48,042 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=10.87 vs. limit=15.0
2023-11-26 02:03:48,493 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 475250
2023-11-26 02:03:48,631 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=3168320.0, ans=0.0
2023-11-26 02:03:50,744 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=3168320.0, ans=0.2
2023-11-26 02:03:54,510 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3168320.0, ans=0.125
2023-11-26 02:04:01,894 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=3168386.6666666665, ans=0.125
2023-11-26 02:04:12,841 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=7.64 vs. limit=15.0
2023-11-26 02:04:13,586 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=3168453.3333333335, ans=0.0
2023-11-26 02:04:20,887 INFO [train_asr.py:1235] (3/4) Epoch 40, batch 6350, loss[loss=0.07978, simple_loss=0.1126, pruned_loss=0.01567, audio_tagging_loss=0.007801, over 14016.00 frames. ], tot_loss[loss=0.06689, simple_loss=0.09079, pruned_loss=0.01245, audio_tagging_loss=0.009043, over 3044097.37 frames. ], batch size: 52, lr: 1.69e-03, grad_scale: 16.0
], tot_loss[loss=0.06689, simple_loss=0.09079, pruned_loss=0.01245, audio_tagging_loss=0.009043, over 3044097.37 frames. ], batch size: 52, lr: 1.69e-03, grad_scale: 16.0 2023-11-26 02:04:44,219 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 475300 2023-11-26 02:04:55,397 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=11.53 vs. limit=15.0 2023-11-26 02:05:00,369 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=3168720.0, ans=0.2 2023-11-26 02:05:02,216 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.087e+01 8.674e+01 9.185e+01 1.006e+02 1.507e+02, threshold=1.837e+02, percent-clipped=0.0 2023-11-26 02:05:17,594 INFO [train_asr.py:1235] (3/4) Epoch 40, batch 6400, loss[loss=0.09806, simple_loss=0.1394, pruned_loss=0.01841, audio_tagging_loss=0.009946, over 15613.00 frames. ], tot_loss[loss=0.06665, simple_loss=0.09032, pruned_loss=0.01232, audio_tagging_loss=0.009165, over 3048178.91 frames. ], batch size: 56, lr: 1.69e-03, grad_scale: 16.0 2023-11-26 02:05:23,100 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=3168853.3333333335, ans=0.125 2023-11-26 02:05:38,907 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 475350 2023-11-26 02:05:52,155 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=3169053.3333333335, ans=0.125 2023-11-26 02:06:12,562 INFO [train_asr.py:1235] (3/4) Epoch 40, batch 6450, loss[loss=0.0852, simple_loss=0.1182, pruned_loss=0.01868, audio_tagging_loss=0.007403, over 15912.00 frames. ], tot_loss[loss=0.06668, simple_loss=0.09026, pruned_loss=0.01238, audio_tagging_loss=0.009176, over 3048701.98 frames. ], batch size: 57, lr: 1.69e-03, grad_scale: 16.0 2023-11-26 02:06:18,079 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3169186.6666666665, ans=0.125 2023-11-26 02:06:20,157 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=3169186.6666666665, ans=0.2 2023-11-26 02:06:34,476 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 475400 2023-11-26 02:06:55,473 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 6.943e+01 8.666e+01 9.241e+01 9.984e+01 1.381e+02, threshold=1.848e+02, percent-clipped=0.0 2023-11-26 02:06:55,767 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3169386.6666666665, ans=0.125 2023-11-26 02:07:04,144 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.min_abs, batch_count=3169453.3333333335, ans=0.5 2023-11-26 02:07:04,165 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=3169453.3333333335, ans=0.2 2023-11-26 02:07:06,237 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=3169453.3333333335, ans=0.1 2023-11-26 02:07:08,093 INFO [train_asr.py:1235] (3/4) Epoch 40, batch 6500, loss[loss=0.06406, simple_loss=0.09142, pruned_loss=0.01048, audio_tagging_loss=0.007872, over 15422.00 frames. 
2023-11-26 02:07:17,457 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3169520.0, ans=0.125
2023-11-26 02:07:20,076 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=3169586.6666666665, ans=0.2
2023-11-26 02:07:28,679 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3169586.6666666665, ans=0.1
2023-11-26 02:07:30,054 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=7.29 vs. limit=15.0
2023-11-26 02:07:30,818 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3169653.3333333335, ans=0.0
2023-11-26 02:07:31,672 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 475450
2023-11-26 02:07:39,688 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.71 vs. limit=15.0
2023-11-26 02:07:39,842 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=8.10 vs. limit=15.0
2023-11-26 02:08:04,752 INFO [train_asr.py:1235] (3/4) Epoch 40, batch 6550, loss[loss=0.09496, simple_loss=0.1302, pruned_loss=0.02174, audio_tagging_loss=0.00815, over 15573.00 frames. ], tot_loss[loss=0.06638, simple_loss=0.09017, pruned_loss=0.01225, audio_tagging_loss=0.009048, over 3052540.80 frames. ], batch size: 55, lr: 1.69e-03, grad_scale: 16.0
2023-11-26 02:08:07,195 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=3169853.3333333335, ans=0.125
2023-11-26 02:08:12,874 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=512, metric=3.96 vs. limit=15.0
2023-11-26 02:08:27,232 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 475500
2023-11-26 02:08:37,547 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=3170053.3333333335, ans=0.1
2023-11-26 02:08:47,307 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.189e+01 8.547e+01 9.134e+01 1.014e+02 1.239e+02, threshold=1.827e+02, percent-clipped=0.0
2023-11-26 02:08:48,554 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.min_positive, batch_count=3170120.0, ans=0.05
2023-11-26 02:08:54,848 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=5.03 vs. limit=15.0
2023-11-26 02:09:00,041 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=3170186.6666666665, ans=0.0
2023-11-26 02:09:00,792 INFO [train_asr.py:1235] (3/4) Epoch 40, batch 6600, loss[loss=0.07853, simple_loss=0.1081, pruned_loss=0.01689, audio_tagging_loss=0.007612, over 15625.00 frames. ], tot_loss[loss=0.06573, simple_loss=0.08911, pruned_loss=0.0122, audio_tagging_loss=0.008973, over 3051330.17 frames. ], batch size: 58, lr: 1.69e-03, grad_scale: 16.0
2023-11-26 02:09:22,502 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 475550
2023-11-26 02:09:28,978 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=10.46 vs. limit=15.0
2023-11-26 02:09:55,024 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=3170520.0, ans=0.125
2023-11-26 02:09:55,782 INFO [train_asr.py:1235] (3/4) Epoch 40, batch 6650, loss[loss=0.051, simple_loss=0.07313, pruned_loss=0.008841, audio_tagging_loss=0.005595, over 15550.00 frames. ], tot_loss[loss=0.06561, simple_loss=0.08866, pruned_loss=0.01226, audio_tagging_loss=0.009019, over 3050842.97 frames. ], batch size: 58, lr: 1.69e-03, grad_scale: 16.0
2023-11-26 02:10:09,878 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2023-11-26 02:10:19,357 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 475600
2023-11-26 02:10:19,570 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=3170653.3333333335, ans=0.125
2023-11-26 02:10:38,693 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.085e+01 8.810e+01 9.306e+01 1.020e+02 1.538e+02, threshold=1.861e+02, percent-clipped=0.0
2023-11-26 02:10:40,994 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=3170786.6666666665, ans=0.125
2023-11-26 02:10:46,940 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=3170786.6666666665, ans=0.0
2023-11-26 02:10:48,383 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.60 vs. limit=6.0
2023-11-26 02:10:52,576 INFO [train_asr.py:1235] (3/4) Epoch 40, batch 6700, loss[loss=0.06789, simple_loss=0.0973, pruned_loss=0.01219, audio_tagging_loss=0.007047, over 15759.00 frames. ], tot_loss[loss=0.06537, simple_loss=0.08848, pruned_loss=0.01217, audio_tagging_loss=0.008954, over 3051542.73 frames. ], batch size: 59, lr: 1.69e-03, grad_scale: 16.0
2023-11-26 02:11:14,724 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 475650
2023-11-26 02:11:14,818 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3170986.6666666665, ans=0.125
2023-11-26 02:11:16,958 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3170986.6666666665, ans=0.125
2023-11-26 02:11:22,390 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=3170986.6666666665, ans=0.0
2023-11-26 02:11:48,605 INFO [train_asr.py:1235] (3/4) Epoch 40, batch 6750, loss[loss=0.05508, simple_loss=0.07329, pruned_loss=0.01153, audio_tagging_loss=0.006907, over 14455.00 frames. ], tot_loss[loss=0.06544, simple_loss=0.08893, pruned_loss=0.01216, audio_tagging_loss=0.008818, over 3048689.26 frames. ], batch size: 57, lr: 1.69e-03, grad_scale: 16.0
2023-11-26 02:11:58,311 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=3171253.3333333335, ans=0.07
2023-11-26 02:12:10,177 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 475700
2023-11-26 02:12:27,502 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=3171386.6666666665, ans=0.2
2023-11-26 02:12:30,957 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.458e+01 8.571e+01 9.016e+01 1.004e+02 1.567e+02, threshold=1.803e+02, percent-clipped=0.0
2023-11-26 02:12:34,648 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=6.92 vs. limit=12.0
2023-11-26 02:12:38,687 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=3171453.3333333335, ans=0.2
2023-11-26 02:12:43,784 INFO [train_asr.py:1235] (3/4) Epoch 40, batch 6800, loss[loss=0.05536, simple_loss=0.08063, pruned_loss=0.00699, audio_tagging_loss=0.008056, over 14438.00 frames. ], tot_loss[loss=0.06568, simple_loss=0.08923, pruned_loss=0.01225, audio_tagging_loss=0.008817, over 3059260.21 frames. ], batch size: 54, lr: 1.69e-03, grad_scale: 32.0
2023-11-26 02:13:02,081 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=3171586.6666666665, ans=0.125
2023-11-26 02:13:06,603 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 475750
2023-11-26 02:13:16,788 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=3171720.0, ans=0.125
2023-11-26 02:13:26,376 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3171720.0, ans=0.1
2023-11-26 02:13:37,633 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3171786.6666666665, ans=0.1
2023-11-26 02:13:39,501 INFO [train_asr.py:1235] (3/4) Epoch 40, batch 6850, loss[loss=0.04468, simple_loss=0.05166, pruned_loss=0.007064, audio_tagging_loss=0.01179, over 15023.00 frames. ], tot_loss[loss=0.06598, simple_loss=0.08957, pruned_loss=0.0124, audio_tagging_loss=0.008793, over 3053855.59 frames. ], batch size: 57, lr: 1.69e-03, grad_scale: 32.0
2023-11-26 02:13:49,593 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=6.70 vs. limit=15.0
2023-11-26 02:14:02,223 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 475800
2023-11-26 02:14:07,032 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=3171986.6666666665, ans=0.0
2023-11-26 02:14:22,090 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 6.977e+01 8.606e+01 9.440e+01 1.002e+02 1.257e+02, threshold=1.888e+02, percent-clipped=0.0
2023-11-26 02:14:29,770 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=3172120.0, ans=0.125
2023-11-26 02:14:35,827 INFO [train_asr.py:1235] (3/4) Epoch 40, batch 6900, loss[loss=0.0746, simple_loss=0.1033, pruned_loss=0.01513, audio_tagging_loss=0.007809, over 16186.00 frames. ], tot_loss[loss=0.06634, simple_loss=0.09046, pruned_loss=0.01239, audio_tagging_loss=0.008727, over 3057638.47 frames. ], batch size: 58, lr: 1.69e-03, grad_scale: 32.0
], tot_loss[loss=0.06634, simple_loss=0.09046, pruned_loss=0.01239, audio_tagging_loss=0.008727, over 3057638.47 frames. ], batch size: 58, lr: 1.69e-03, grad_scale: 32.0 2023-11-26 02:14:58,175 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 475850 2023-11-26 02:15:03,654 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=3172320.0, ans=0.04949747468305833 2023-11-26 02:15:19,169 WARNING [train_asr.py:1481] (3/4) Exclude cut with ID unbalanced/Xez1ffAcb0w_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-26 02:15:20,900 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.1.conv_module1.whiten, num_groups=1, num_channels=192, metric=12.39 vs. limit=15.0 2023-11-26 02:15:24,692 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=3172453.3333333335, ans=0.2 2023-11-26 02:15:30,552 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=3172520.0, ans=0.0 2023-11-26 02:15:31,307 INFO [train_asr.py:1235] (3/4) Epoch 40, batch 6950, loss[loss=0.06558, simple_loss=0.09385, pruned_loss=0.01278, audio_tagging_loss=0.00588, over 15886.00 frames. ], tot_loss[loss=0.06662, simple_loss=0.09087, pruned_loss=0.01241, audio_tagging_loss=0.008771, over 3055169.34 frames. ], batch size: 61, lr: 1.69e-03, grad_scale: 32.0 2023-11-26 02:15:40,179 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=6.22 vs. limit=15.0 2023-11-26 02:15:42,638 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=3172586.6666666665, ans=0.09899494936611666 2023-11-26 02:15:44,663 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3172586.6666666665, ans=0.125 2023-11-26 02:15:54,163 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 475900 2023-11-26 02:16:03,786 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3172720.0, ans=0.125 2023-11-26 02:16:14,714 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.019e+01 8.522e+01 9.109e+01 9.823e+01 1.262e+02, threshold=1.822e+02, percent-clipped=0.0 2023-11-26 02:16:18,255 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3172786.6666666665, ans=0.125 2023-11-26 02:16:26,957 INFO [train_asr.py:1235] (3/4) Epoch 40, batch 7000, loss[loss=0.06513, simple_loss=0.0888, pruned_loss=0.009922, audio_tagging_loss=0.01081, over 14363.00 frames. ], tot_loss[loss=0.06622, simple_loss=0.0902, pruned_loss=0.01232, audio_tagging_loss=0.008805, over 3053213.04 frames. 
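The WARNING above shows why the cut is dropped: after subsampling it has fewer encoder frames than BPE tokens (23 frames vs. 24 tokens), so the transducer loss cannot align it. A sketch of that filter, assuming the factor-4 convolutional subsampling arithmetic that reproduces the logged 100 -> 23 mapping:

def frames_after_subsampling(num_frames: int) -> int:
    # Assumed to match the conv front end (subsampling factor 4);
    # this formula reproduces the logged 100 -> 23 mapping.
    return ((num_frames - 7) // 2 + 1) // 2

def keep_cut(num_frames: int, num_tokens: int) -> bool:
    """Drop cuts the transducer loss cannot align (frames T < tokens U)."""
    return frames_after_subsampling(num_frames) >= num_tokens

assert frames_after_subsampling(100) == 23
assert not keep_cut(100, 24)  # the excluded AudioSet cut above: 23 frames < 24 tokens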
], batch size: 55, lr: 1.69e-03, grad_scale: 16.0 2023-11-26 02:16:27,233 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.max_positive, batch_count=3172853.3333333335, ans=0.95 2023-11-26 02:16:35,688 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=3172853.3333333335, ans=0.0 2023-11-26 02:16:48,079 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=3172986.6666666665, ans=0.2 2023-11-26 02:16:49,055 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 475950 2023-11-26 02:17:22,599 INFO [train_asr.py:1235] (3/4) Epoch 40, batch 7050, loss[loss=0.06236, simple_loss=0.09012, pruned_loss=0.008102, audio_tagging_loss=0.009191, over 15770.00 frames. ], tot_loss[loss=0.0664, simple_loss=0.09024, pruned_loss=0.01238, audio_tagging_loss=0.008898, over 3050273.32 frames. ], batch size: 56, lr: 1.69e-03, grad_scale: 16.0 2023-11-26 02:17:34,734 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.whiten.whitening_limit, batch_count=3173253.3333333335, ans=15.0 2023-11-26 02:17:44,238 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 476000 2023-11-26 02:17:50,197 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3173320.0, ans=0.1 2023-11-26 02:18:08,093 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 6.940e+01 8.411e+01 9.041e+01 9.968e+01 1.223e+02, threshold=1.808e+02, percent-clipped=0.0 2023-11-26 02:18:14,158 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.0.conv_module1.whiten, num_groups=1, num_channels=192, metric=13.31 vs. limit=15.0 2023-11-26 02:18:19,680 INFO [train_asr.py:1235] (3/4) Epoch 40, batch 7100, loss[loss=0.06841, simple_loss=0.08818, pruned_loss=0.01463, audio_tagging_loss=0.009687, over 15470.00 frames. ], tot_loss[loss=0.06675, simple_loss=0.09051, pruned_loss=0.01255, audio_tagging_loss=0.008948, over 3046525.95 frames. ], batch size: 61, lr: 1.69e-03, grad_scale: 16.0 2023-11-26 02:18:21,837 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=12.70 vs. limit=15.0 2023-11-26 02:18:31,108 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3173586.6666666665, ans=0.1 2023-11-26 02:18:36,123 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=2.30 vs. limit=15.0 2023-11-26 02:18:40,424 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=3173586.6666666665, ans=0.0 2023-11-26 02:18:42,513 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 476050 2023-11-26 02:18:47,988 INFO [scaling.py:1022] (3/4) Whitening: name=encoder_embed.convnext.out_whiten, num_groups=1, num_channels=128, metric=4.54 vs. limit=5.0 2023-11-26 02:18:49,851 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=12.09 vs. 
limit=15.0 2023-11-26 02:19:15,650 INFO [train_asr.py:1235] (3/4) Epoch 40, batch 7150, loss[loss=0.05911, simple_loss=0.07995, pruned_loss=0.01132, audio_tagging_loss=0.007818, over 15564.00 frames. ], tot_loss[loss=0.06688, simple_loss=0.09076, pruned_loss=0.01257, audio_tagging_loss=0.008926, over 3044061.18 frames. ], batch size: 57, lr: 1.69e-03, grad_scale: 16.0 2023-11-26 02:19:15,930 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3173853.3333333335, ans=0.0 2023-11-26 02:19:37,913 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 476100 2023-11-26 02:19:58,941 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.217e+01 8.796e+01 9.304e+01 9.946e+01 1.523e+02, threshold=1.861e+02, percent-clipped=0.0 2023-11-26 02:20:06,004 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-26 02:20:08,274 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=3174120.0, ans=0.0 2023-11-26 02:20:10,958 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=3174186.6666666665, ans=0.0 2023-11-26 02:20:11,700 INFO [train_asr.py:1235] (3/4) Epoch 40, batch 7200, loss[loss=0.07628, simple_loss=0.103, pruned_loss=0.01703, audio_tagging_loss=0.007773, over 15442.00 frames. ], tot_loss[loss=0.06676, simple_loss=0.09049, pruned_loss=0.01248, audio_tagging_loss=0.009034, over 3041657.66 frames. ], batch size: 57, lr: 1.69e-03, grad_scale: 32.0 2023-11-26 02:20:15,126 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=3174186.6666666665, ans=0.2 2023-11-26 02:20:17,152 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=3174186.6666666665, ans=0.035 2023-11-26 02:20:33,551 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 476150 2023-11-26 02:20:34,732 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3174320.0, ans=0.125 2023-11-26 02:20:59,470 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=3174453.3333333335, ans=0.0 2023-11-26 02:21:06,646 INFO [train_asr.py:1235] (3/4) Epoch 40, batch 7250, loss[loss=0.07382, simple_loss=0.09848, pruned_loss=0.01661, audio_tagging_loss=0.007976, over 14770.00 frames. ], tot_loss[loss=0.06642, simple_loss=0.08991, pruned_loss=0.01241, audio_tagging_loss=0.009057, over 3036690.42 frames. 
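In the Clipping_scale lines, the five numbers summarize recent per-batch gradient norms (min, 25%, 50%, 75%, max), and the logged threshold consistently equals Clipping_scale times the median (e.g. 2.0 x 9.304e+01 ~ 1.861e+02 just above); percent-clipped is the share of batches whose norm exceeded that threshold. A sketch of this bookkeeping, assuming a simple window of recent norms:

import statistics

def clipping_report(grad_norms: list[float], clipping_scale: float = 2.0):
    """Summarize recent gradient norms the way the optimizer log does.

    Returns (quartiles, threshold, percent_clipped); the threshold is
    clipping_scale * median, matching the logged numbers above.
    """
    qs = statistics.quantiles(grad_norms, n=4)           # 25%, 50%, 75%
    quartiles = [min(grad_norms), *qs, max(grad_norms)]
    threshold = clipping_scale * qs[1]                   # 2.0 x median
    pct = 100.0 * sum(g > threshold for g in grad_norms) / len(grad_norms)
    return quartiles, threshold, pct

# Hypothetical window of norms centred near the logged median of ~93:
_, thr, pct = clipping_report([71.0, 88.0, 93.0, 99.0, 152.0])
print(thr, pct)  # -> 186.0 0.0, i.e. nothing clipped, as in the log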
], batch size: 54, lr: 1.69e-03, grad_scale: 32.0 2023-11-26 02:21:10,174 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=3174520.0, ans=0.0 2023-11-26 02:21:23,079 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=3174586.6666666665, ans=0.0 2023-11-26 02:21:27,196 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3174586.6666666665, ans=0.125 2023-11-26 02:21:29,704 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 476200 2023-11-26 02:21:34,943 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=3174653.3333333335, ans=0.09899494936611666 2023-11-26 02:21:51,696 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.120e+01 8.421e+01 9.196e+01 9.750e+01 1.203e+02, threshold=1.839e+02, percent-clipped=0.0 2023-11-26 02:21:58,952 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=8.82 vs. limit=10.0 2023-11-26 02:22:02,892 INFO [train_asr.py:1235] (3/4) Epoch 40, batch 7300, loss[loss=0.08069, simple_loss=0.11, pruned_loss=0.01799, audio_tagging_loss=0.007705, over 15367.00 frames. ], tot_loss[loss=0.06716, simple_loss=0.091, pruned_loss=0.01273, audio_tagging_loss=0.008938, over 3046285.69 frames. ], batch size: 57, lr: 1.69e-03, grad_scale: 16.0 2023-11-26 02:22:05,911 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=3174853.3333333335, ans=0.125 2023-11-26 02:22:17,504 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=3174920.0, ans=0.0 2023-11-26 02:22:25,740 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 476250 2023-11-26 02:22:50,208 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=3175120.0, ans=0.0 2023-11-26 02:22:53,272 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=3175120.0, ans=0.0 2023-11-26 02:22:55,452 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.34 vs. limit=6.0 2023-11-26 02:22:58,974 INFO [train_asr.py:1235] (3/4) Epoch 40, batch 7350, loss[loss=0.06569, simple_loss=0.09385, pruned_loss=0.01102, audio_tagging_loss=0.007739, over 16572.00 frames. ], tot_loss[loss=0.06715, simple_loss=0.09117, pruned_loss=0.01278, audio_tagging_loss=0.008784, over 3049332.88 frames. 
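The Whitening lines fire when a layer's activations drift away from a decorrelated, equal-variance ("white") distribution: the metric is, roughly, mean(eig^2)/mean(eig)^2 over the eigenvalues of the per-group channel covariance, which is 1.0 for perfectly white features and grows with imbalance; a penalty engages once it passes the limit. A hedged sketch of that measurement (the exact normalization in scaling.py may differ):

import torch

def whitening_metric(x: torch.Tensor, num_groups: int = 1) -> float:
    """x: (num_frames, num_channels). Returns mean(eig^2)/mean(eig)^2 of the
    per-group channel covariance, computed via traces so no eigendecomposition
    is needed: (trace(C @ C)/d) / (trace(C)/d)^2."""
    n, c = x.shape
    d = c // num_groups
    x = x.reshape(n, num_groups, d).transpose(0, 1)      # (groups, frames, d)
    cov = torch.matmul(x.transpose(1, 2), x) / n         # (groups, d, d)
    tr = cov.diagonal(dim1=1, dim2=2).sum(-1)            # trace(C) per group
    tr2 = torch.matmul(cov, cov).diagonal(dim1=1, dim2=2).sum(-1)  # trace(C^2)
    metric = (tr2 / d) / (tr / d) ** 2
    return metric.mean().item()

# White noise scores near the floor: ~1.1 here (exactly 1.0 only in the
# infinite-sample limit), far below the logged limits of 10-22.5.
print(whitening_metric(torch.randn(4000, 384)))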
], batch size: 60, lr: 1.69e-03, grad_scale: 8.0 2023-11-26 02:23:09,266 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=3175253.3333333335, ans=0.0 2023-11-26 02:23:13,234 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=3175253.3333333335, ans=0.0 2023-11-26 02:23:14,301 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3175253.3333333335, ans=0.125 2023-11-26 02:23:20,519 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 476300 2023-11-26 02:23:44,140 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-26 02:23:44,887 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.015e+01 8.628e+01 9.246e+01 9.778e+01 1.248e+02, threshold=1.849e+02, percent-clipped=0.0 2023-11-26 02:23:54,450 INFO [train_asr.py:1235] (3/4) Epoch 40, batch 7400, loss[loss=0.07817, simple_loss=0.1126, pruned_loss=0.01471, audio_tagging_loss=0.007141, over 14485.00 frames. ], tot_loss[loss=0.06752, simple_loss=0.09205, pruned_loss=0.01291, audio_tagging_loss=0.008586, over 3050581.21 frames. ], batch size: 54, lr: 1.69e-03, grad_scale: 8.0 2023-11-26 02:24:09,304 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=3175586.6666666665, ans=0.125 2023-11-26 02:24:16,196 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 476350 2023-11-26 02:24:20,350 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=17.40 vs. limit=22.5 2023-11-26 02:24:34,440 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=3175720.0, ans=0.0 2023-11-26 02:24:46,161 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-26 02:24:49,170 INFO [train_asr.py:1235] (3/4) Epoch 40, batch 7450, loss[loss=0.06048, simple_loss=0.08634, pruned_loss=0.009662, audio_tagging_loss=0.007647, over 15873.00 frames. ], tot_loss[loss=0.06674, simple_loss=0.09074, pruned_loss=0.01271, audio_tagging_loss=0.008667, over 3054478.02 frames. ], batch size: 59, lr: 1.69e-03, grad_scale: 8.0 2023-11-26 02:25:02,625 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3175920.0, ans=0.0 2023-11-26 02:25:12,732 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 476400 2023-11-26 02:25:23,775 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=3176053.3333333335, ans=0.07 2023-11-26 02:25:25,119 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=19.79 vs. limit=22.5 2023-11-26 02:25:30,991 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=3176053.3333333335, ans=0.0 2023-11-26 02:25:31,114 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=3176053.3333333335, ans=0.0 2023-11-26 02:25:31,402 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=14.06 vs. 
limit=22.5 2023-11-26 02:25:35,539 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.103e+01 8.524e+01 9.234e+01 9.933e+01 1.379e+02, threshold=1.847e+02, percent-clipped=0.0 2023-11-26 02:25:45,207 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-11-26 02:25:46,043 INFO [train_asr.py:1235] (3/4) Epoch 40, batch 7500, loss[loss=0.05205, simple_loss=0.066, pruned_loss=0.007728, audio_tagging_loss=0.01132, over 15197.00 frames. ], tot_loss[loss=0.06657, simple_loss=0.09044, pruned_loss=0.01263, audio_tagging_loss=0.008714, over 3054519.40 frames. ], batch size: 57, lr: 1.69e-03, grad_scale: 8.0 2023-11-26 02:25:52,202 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=3176186.6666666665, ans=0.2 2023-11-26 02:26:07,828 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 476450 2023-11-26 02:26:08,231 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=6.57 vs. limit=15.0 2023-11-26 02:26:14,377 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-26 02:26:26,571 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=3176386.6666666665, ans=0.125 2023-11-26 02:26:41,560 INFO [train_asr.py:1235] (3/4) Epoch 40, batch 7550, loss[loss=0.05967, simple_loss=0.07918, pruned_loss=0.009925, audio_tagging_loss=0.01016, over 15627.00 frames. ], tot_loss[loss=0.06633, simple_loss=0.09022, pruned_loss=0.01257, audio_tagging_loss=0.008658, over 3059958.80 frames. ], batch size: 60, lr: 1.69e-03, grad_scale: 8.0 2023-11-26 02:26:46,995 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3176520.0, ans=0.0 2023-11-26 02:26:54,412 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3176586.6666666665, ans=0.125 2023-11-26 02:27:03,132 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 476500 2023-11-26 02:27:11,609 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3176653.3333333335, ans=0.0 2023-11-26 02:27:13,718 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=6.37 vs. limit=15.0 2023-11-26 02:27:21,615 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3176720.0, ans=0.1 2023-11-26 02:27:26,803 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.676e+01 8.615e+01 8.990e+01 9.647e+01 1.278e+02, threshold=1.798e+02, percent-clipped=0.0 2023-11-26 02:27:30,562 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module1.whiten, num_groups=1, num_channels=512, metric=3.77 vs. limit=15.0 2023-11-26 02:27:36,401 INFO [train_asr.py:1235] (3/4) Epoch 40, batch 7600, loss[loss=0.07715, simple_loss=0.1063, pruned_loss=0.01582, audio_tagging_loss=0.008165, over 15588.00 frames. ], tot_loss[loss=0.06626, simple_loss=0.08999, pruned_loss=0.01258, audio_tagging_loss=0.008687, over 3054483.35 frames. 
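The grad_scale value in the batch lines (dropping from 16.0 to 8.0 across the batches above and recovering to 16.0 and 32.0 below) is the fp16 loss scale of a dynamic gradient scaler: it is halved whenever an overflow produces inf/nan gradients and grown again after a run of clean steps. A minimal sketch of that loop using torch's GradScaler, with a hypothetical stand-in model and optimizer:

import torch

model = torch.nn.Linear(80, 500).cuda()                 # hypothetical stand-in model
opt = torch.optim.AdamW(model.parameters(), lr=1.69e-3)
scaler = torch.cuda.amp.GradScaler(init_scale=16.0)     # matches the logged grad_scale

for step in range(3):
    x = torch.randn(55, 80, device="cuda")
    with torch.cuda.amp.autocast():                     # fp16 forward pass
        loss = model(x).square().mean()
    opt.zero_grad()
    scaler.scale(loss).backward()                       # scale loss so fp16 grads stay finite
    scaler.step(opt)                                    # skips the update if grads overflowed
    scaler.update()                                     # halve on overflow, grow after clean runs
    print(step, scaler.get_scale())                     # the value logged as grad_scale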
], batch size: 59, lr: 1.69e-03, grad_scale: 16.0 2023-11-26 02:27:47,829 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=3176920.0, ans=0.125 2023-11-26 02:27:49,388 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3176920.0, ans=0.125 2023-11-26 02:27:51,974 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=3176920.0, ans=0.0 2023-11-26 02:27:59,756 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 476550 2023-11-26 02:28:02,067 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=3176986.6666666665, ans=0.125 2023-11-26 02:28:17,561 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=3177053.3333333335, ans=0.1 2023-11-26 02:28:31,836 INFO [train_asr.py:1235] (3/4) Epoch 40, batch 7650, loss[loss=0.05318, simple_loss=0.07311, pruned_loss=0.00786, audio_tagging_loss=0.008768, over 14162.00 frames. ], tot_loss[loss=0.06645, simple_loss=0.09012, pruned_loss=0.01271, audio_tagging_loss=0.008684, over 3049519.93 frames. ], batch size: 53, lr: 1.69e-03, grad_scale: 16.0 2023-11-26 02:28:54,251 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 476600 2023-11-26 02:29:17,466 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3177453.3333333335, ans=0.1 2023-11-26 02:29:18,274 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 6.997e+01 8.582e+01 9.244e+01 1.012e+02 1.285e+02, threshold=1.849e+02, percent-clipped=0.0 2023-11-26 02:29:28,320 INFO [train_asr.py:1235] (3/4) Epoch 40, batch 7700, loss[loss=0.06176, simple_loss=0.07835, pruned_loss=0.01168, audio_tagging_loss=0.0109, over 17042.00 frames. ], tot_loss[loss=0.06658, simple_loss=0.09022, pruned_loss=0.01272, audio_tagging_loss=0.008747, over 3049121.39 frames. ], batch size: 65, lr: 1.69e-03, grad_scale: 16.0 2023-11-26 02:29:40,080 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3177586.6666666665, ans=0.125 2023-11-26 02:29:41,159 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-26 02:29:46,585 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=3177586.6666666665, ans=0.0 2023-11-26 02:29:50,124 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 476650 2023-11-26 02:29:51,394 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3177653.3333333335, ans=0.1 2023-11-26 02:29:58,443 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=8.04 vs. limit=15.0 2023-11-26 02:30:23,203 INFO [train_asr.py:1235] (3/4) Epoch 40, batch 7750, loss[loss=0.06582, simple_loss=0.08837, pruned_loss=0.0128, audio_tagging_loss=0.008835, over 15360.00 frames. ], tot_loss[loss=0.06604, simple_loss=0.08929, pruned_loss=0.01253, audio_tagging_loss=0.008866, over 3048979.41 frames. 
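Each tot_loss[...] entry is not a single batch's loss but a running aggregate: per-term loss sums are accumulated together with the number of frames they cover, and the printed values are sum/frames, which is why the 'over N frames' count sits in the millions. A sketch of that tracker, assuming simple dict-based bookkeeping rather than the actual icefall class:

class LossTracker(dict):
    """Frame-weighted loss accumulator; an assumed sketch, not the icefall MetricsTracker."""

    def accumulate(self, frames: float, **loss_sums: float) -> None:
        self["frames"] = self.get("frames", 0.0) + frames
        for name, value in loss_sums.items():
            self[name] = self.get(name, 0.0) + value

    def normalized(self) -> dict:
        frames = self["frames"]
        return {k: v / frames for k, v in self.items() if k != "frames"}

tracker = LossTracker()
# Hypothetical batches: (frames, loss-sum over those frames)
tracker.accumulate(15550.0, loss=0.051 * 15550.0)
tracker.accumulate(15759.0, loss=0.06789 * 15759.0)
print(tracker.normalized())  # frame-weighted mean, printed like tot_loss[...] above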
], batch size: 56, lr: 1.69e-03, grad_scale: 16.0 2023-11-26 02:30:28,649 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=3177853.3333333335, ans=0.2 2023-11-26 02:30:45,804 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 476700 2023-11-26 02:30:46,050 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=3177986.6666666665, ans=0.125 2023-11-26 02:30:51,297 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=3177986.6666666665, ans=0.09899494936611666 2023-11-26 02:30:58,592 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=3178053.3333333335, ans=0.07 2023-11-26 02:31:01,754 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3178053.3333333335, ans=0.125 2023-11-26 02:31:07,840 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 6.997e+01 8.684e+01 9.280e+01 1.001e+02 1.211e+02, threshold=1.856e+02, percent-clipped=0.0 2023-11-26 02:31:17,896 INFO [train_asr.py:1235] (3/4) Epoch 40, batch 7800, loss[loss=0.06414, simple_loss=0.09017, pruned_loss=0.00865, audio_tagging_loss=0.01041, over 14526.00 frames. ], tot_loss[loss=0.06683, simple_loss=0.09045, pruned_loss=0.0127, audio_tagging_loss=0.008906, over 3050483.39 frames. ], batch size: 54, lr: 1.69e-03, grad_scale: 16.0 2023-11-26 02:31:40,631 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 476750 2023-11-26 02:31:44,013 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=3178320.0, ans=0.0 2023-11-26 02:31:44,109 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.min_abs, batch_count=3178320.0, ans=0.5 2023-11-26 02:31:52,530 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=3178386.6666666665, ans=0.2 2023-11-26 02:31:57,189 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.min_positive, batch_count=3178386.6666666665, ans=0.025 2023-11-26 02:31:58,254 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3178386.6666666665, ans=0.1 2023-11-26 02:32:05,054 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=3178453.3333333335, ans=0.125 2023-11-26 02:32:13,361 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=3178520.0, ans=0.0 2023-11-26 02:32:14,219 INFO [train_asr.py:1235] (3/4) Epoch 40, batch 7850, loss[loss=0.06072, simple_loss=0.071, pruned_loss=0.0116, audio_tagging_loss=0.01362, over 15209.00 frames. ], tot_loss[loss=0.06658, simple_loss=0.08992, pruned_loss=0.01259, audio_tagging_loss=0.009038, over 3036957.76 frames. 
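The recurring 'Freeze_encoder: False; Current batch idx: ...' lines are a periodic status print, emitted every 50 batches, reporting whether the encoder's parameters are currently trainable. A sketch of the switch such a flag would control, assuming a model with an 'encoder' submodule:

import torch

def set_encoder_frozen(model: torch.nn.Module, freeze: bool) -> None:
    """Toggle gradient flow through the encoder; frozen params get no updates."""
    for p in model.encoder.parameters():
        p.requires_grad_(not freeze)

class TinyModel(torch.nn.Module):        # hypothetical stand-in
    def __init__(self):
        super().__init__()
        self.encoder = torch.nn.Linear(80, 256)
        self.joiner = torch.nn.Linear(256, 500)

m = TinyModel()
set_encoder_frozen(m, freeze=False)      # matches 'Freeze_encoder: False' above
print(all(p.requires_grad for p in m.encoder.parameters()))  # True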
], batch size: 57, lr: 1.69e-03, grad_scale: 16.0 2023-11-26 02:32:16,614 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=3178520.0, ans=0.0 2023-11-26 02:32:23,962 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=3178586.6666666665, ans=0.125 2023-11-26 02:32:29,094 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3178586.6666666665, ans=0.125 2023-11-26 02:32:33,682 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=20.15 vs. limit=22.5 2023-11-26 02:32:35,296 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 476800 2023-11-26 02:32:59,772 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.148e+01 8.601e+01 9.556e+01 1.008e+02 1.371e+02, threshold=1.911e+02, percent-clipped=0.0 2023-11-26 02:33:01,154 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3178786.6666666665, ans=0.125 2023-11-26 02:33:03,450 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=512, metric=4.06 vs. limit=15.0 2023-11-26 02:33:06,236 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=3178786.6666666665, ans=0.0 2023-11-26 02:33:09,235 INFO [train_asr.py:1235] (3/4) Epoch 40, batch 7900, loss[loss=0.07857, simple_loss=0.1096, pruned_loss=0.01694, audio_tagging_loss=0.006829, over 14743.00 frames. ], tot_loss[loss=0.0668, simple_loss=0.08991, pruned_loss=0.01276, audio_tagging_loss=0.009083, over 3038947.30 frames. ], batch size: 56, lr: 1.69e-03, grad_scale: 16.0 2023-11-26 02:33:31,528 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 476850 2023-11-26 02:33:44,345 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer_ff2.min_abs, batch_count=3179053.3333333335, ans=0.1 2023-11-26 02:34:04,797 INFO [train_asr.py:1235] (3/4) Epoch 40, batch 7950, loss[loss=0.07799, simple_loss=0.1109, pruned_loss=0.01193, audio_tagging_loss=0.0106, over 15667.00 frames. ], tot_loss[loss=0.06694, simple_loss=0.08994, pruned_loss=0.01272, audio_tagging_loss=0.009241, over 3041104.31 frames. ], batch size: 55, lr: 1.69e-03, grad_scale: 16.0 2023-11-26 02:34:15,736 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=17.17 vs. limit=22.5 2023-11-26 02:34:17,550 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3179253.3333333335, ans=0.125 2023-11-26 02:34:17,605 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=3179253.3333333335, ans=0.125 2023-11-26 02:34:17,615 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3179253.3333333335, ans=0.125 2023-11-26 02:34:19,496 WARNING [train_asr.py:1481] (3/4) Exclude cut with ID unbalanced/uQjH4tNUZ_g_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. 
Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-26 02:34:21,389 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.min_positive, batch_count=3179253.3333333335, ans=0.025 2023-11-26 02:34:27,421 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 476900 2023-11-26 02:34:28,588 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=3179320.0, ans=0.2 2023-11-26 02:34:31,841 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3179320.0, ans=0.0 2023-11-26 02:34:47,554 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn2.whiten.whitening_limit, batch_count=3179386.6666666665, ans=22.5 2023-11-26 02:34:50,081 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.398e+01 8.906e+01 9.549e+01 1.026e+02 1.284e+02, threshold=1.910e+02, percent-clipped=0.0 2023-11-26 02:35:00,599 INFO [train_asr.py:1235] (3/4) Epoch 40, batch 8000, loss[loss=0.05551, simple_loss=0.07002, pruned_loss=0.01188, audio_tagging_loss=0.008619, over 14548.00 frames. ], tot_loss[loss=0.067, simple_loss=0.08985, pruned_loss=0.01273, audio_tagging_loss=0.009348, over 3042198.34 frames. ], batch size: 55, lr: 1.69e-03, grad_scale: 32.0 2023-11-26 02:35:09,170 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.31 vs. limit=6.0 2023-11-26 02:35:09,740 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=3179520.0, ans=0.125 2023-11-26 02:35:22,358 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 476950 2023-11-26 02:35:24,565 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3179653.3333333335, ans=0.0 2023-11-26 02:35:27,311 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=3179653.3333333335, ans=0.2 2023-11-26 02:35:29,244 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=3179653.3333333335, ans=0.125 2023-11-26 02:35:32,444 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=3179720.0, ans=0.0 2023-11-26 02:35:35,018 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=12.76 vs. limit=15.0 2023-11-26 02:35:39,768 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=3179720.0, ans=0.0 2023-11-26 02:35:39,814 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=3179720.0, ans=0.2 2023-11-26 02:35:49,171 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=512, metric=3.08 vs. 
limit=15.0 2023-11-26 02:35:56,009 INFO [train_asr.py:1235] (3/4) Epoch 40, batch 8050, loss[loss=0.08343, simple_loss=0.1189, pruned_loss=0.01655, audio_tagging_loss=0.007444, over 15712.00 frames. ], tot_loss[loss=0.06589, simple_loss=0.08814, pruned_loss=0.0124, audio_tagging_loss=0.009421, over 3036721.70 frames. ], batch size: 58, lr: 1.69e-03, grad_scale: 32.0 2023-11-26 02:35:56,298 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=3179853.3333333335, ans=0.125 2023-11-26 02:36:08,393 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3179920.0, ans=0.125 2023-11-26 02:36:18,272 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 477000 2023-11-26 02:36:18,498 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=3179986.6666666665, ans=0.2 2023-11-26 02:36:41,467 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=8.44 vs. limit=15.0 2023-11-26 02:36:41,823 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.105e+01 8.827e+01 9.529e+01 1.030e+02 1.385e+02, threshold=1.906e+02, percent-clipped=0.0 2023-11-26 02:36:51,848 INFO [train_asr.py:1235] (3/4) Epoch 40, batch 8100, loss[loss=0.08036, simple_loss=0.1199, pruned_loss=0.01655, audio_tagging_loss=0.003842, over 14857.00 frames. ], tot_loss[loss=0.0657, simple_loss=0.08804, pruned_loss=0.01239, audio_tagging_loss=0.009283, over 3023452.50 frames. ], batch size: 54, lr: 1.69e-03, grad_scale: 32.0 2023-11-26 02:36:52,357 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module2.whiten, num_groups=1, num_channels=512, metric=11.54 vs. limit=15.0 2023-11-26 02:36:55,537 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=11.01 vs. limit=15.0 2023-11-26 02:36:59,582 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=3180186.6666666665, ans=0.07 2023-11-26 02:37:08,967 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=3180253.3333333335, ans=0.125 2023-11-26 02:37:09,964 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3180253.3333333335, ans=0.125 2023-11-26 02:37:14,057 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 477050 2023-11-26 02:37:19,368 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.17 vs. limit=6.0 2023-11-26 02:37:23,091 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3180320.0, ans=0.125 2023-11-26 02:37:28,289 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=3180386.6666666665, ans=0.125 2023-11-26 02:37:46,187 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=5.18 vs. 
limit=15.0 2023-11-26 02:37:47,791 INFO [train_asr.py:1235] (3/4) Epoch 40, batch 8150, loss[loss=0.0665, simple_loss=0.09564, pruned_loss=0.01002, audio_tagging_loss=0.008654, over 15418.00 frames. ], tot_loss[loss=0.06543, simple_loss=0.08792, pruned_loss=0.01237, audio_tagging_loss=0.009099, over 3029076.17 frames. ], batch size: 56, lr: 1.69e-03, grad_scale: 32.0 2023-11-26 02:37:57,400 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=3180586.6666666665, ans=0.2 2023-11-26 02:38:09,405 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 477100 2023-11-26 02:38:25,399 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3180720.0, ans=0.125 2023-11-26 02:38:33,157 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.717e+01 8.566e+01 9.339e+01 1.007e+02 1.243e+02, threshold=1.868e+02, percent-clipped=0.0 2023-11-26 02:38:43,164 INFO [train_asr.py:1235] (3/4) Epoch 40, batch 8200, loss[loss=0.04587, simple_loss=0.0633, pruned_loss=0.004722, audio_tagging_loss=0.009497, over 15472.00 frames. ], tot_loss[loss=0.06538, simple_loss=0.08816, pruned_loss=0.0123, audio_tagging_loss=0.008992, over 3035167.73 frames. ], batch size: 59, lr: 1.69e-03, grad_scale: 32.0 2023-11-26 02:38:44,421 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=3180853.3333333335, ans=0.125 2023-11-26 02:38:45,265 WARNING [train_asr.py:1481] (3/4) Exclude cut with ID unbalanced/8C7biyx9TQ4_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-26 02:38:53,734 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=3180920.0, ans=0.2 2023-11-26 02:39:04,723 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 477150 2023-11-26 02:39:15,583 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=3181053.3333333335, ans=0.125 2023-11-26 02:39:25,329 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=14.62 vs. limit=22.5 2023-11-26 02:39:28,668 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=3181120.0, ans=0.0 2023-11-26 02:39:38,064 INFO [train_asr.py:1235] (3/4) Epoch 40, batch 8250, loss[loss=0.05087, simple_loss=0.06377, pruned_loss=0.007232, audio_tagging_loss=0.01175, over 14229.00 frames. ], tot_loss[loss=0.06582, simple_loss=0.08917, pruned_loss=0.01238, audio_tagging_loss=0.008861, over 3037775.11 frames. ], batch size: 53, lr: 1.69e-03, grad_scale: 32.0 2023-11-26 02:39:46,602 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=5.35 vs. 
limit=15.0 2023-11-26 02:39:55,488 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3181253.3333333335, ans=0.1 2023-11-26 02:40:00,632 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 477200 2023-11-26 02:40:20,411 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=3181386.6666666665, ans=0.2 2023-11-26 02:40:23,954 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 6.842e+01 8.407e+01 9.078e+01 9.588e+01 1.625e+02, threshold=1.816e+02, percent-clipped=0.0 2023-11-26 02:40:34,005 INFO [train_asr.py:1235] (3/4) Epoch 40, batch 8300, loss[loss=0.06464, simple_loss=0.09342, pruned_loss=0.01136, audio_tagging_loss=0.00657, over 14871.00 frames. ], tot_loss[loss=0.06611, simple_loss=0.0898, pruned_loss=0.01242, audio_tagging_loss=0.008787, over 3042278.73 frames. ], batch size: 56, lr: 1.69e-03, grad_scale: 32.0 2023-11-26 02:40:44,774 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=20.07 vs. limit=22.5 2023-11-26 02:40:56,323 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 477250 2023-11-26 02:41:01,785 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=3181653.3333333335, ans=0.125 2023-11-26 02:41:08,740 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3181720.0, ans=0.125 2023-11-26 02:41:29,716 INFO [train_asr.py:1235] (3/4) Epoch 40, batch 8350, loss[loss=0.0617, simple_loss=0.08806, pruned_loss=0.01122, audio_tagging_loss=0.00645, over 14649.00 frames. ], tot_loss[loss=0.06647, simple_loss=0.09063, pruned_loss=0.01254, audio_tagging_loss=0.008619, over 3037130.34 frames. ], batch size: 55, lr: 1.69e-03, grad_scale: 16.0 2023-11-26 02:41:34,115 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=3181853.3333333335, ans=0.125 2023-11-26 02:41:37,951 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3181853.3333333335, ans=0.125 2023-11-26 02:41:45,923 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=12.04 vs. limit=15.0 2023-11-26 02:41:46,367 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3181920.0, ans=0.1 2023-11-26 02:41:52,062 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 477300 2023-11-26 02:41:55,413 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=3181986.6666666665, ans=0.0 2023-11-26 02:41:57,458 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.1.self_attn1.whiten, num_groups=1, num_channels=192, metric=12.74 vs. limit=22.5 2023-11-26 02:41:59,351 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=512, metric=3.82 vs. 
limit=15.0 2023-11-26 02:42:01,700 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3181986.6666666665, ans=0.125 2023-11-26 02:42:16,198 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.210e+01 8.717e+01 9.531e+01 1.032e+02 1.340e+02, threshold=1.906e+02, percent-clipped=0.0 2023-11-26 02:42:21,218 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=3182120.0, ans=0.125 2023-11-26 02:42:25,229 INFO [train_asr.py:1235] (3/4) Epoch 40, batch 8400, loss[loss=0.0626, simple_loss=0.08506, pruned_loss=0.009713, audio_tagging_loss=0.01036, over 16622.00 frames. ], tot_loss[loss=0.0666, simple_loss=0.09081, pruned_loss=0.01259, audio_tagging_loss=0.00861, over 3043564.83 frames. ], batch size: 63, lr: 1.69e-03, grad_scale: 32.0 2023-11-26 02:42:27,533 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3182186.6666666665, ans=0.125 2023-11-26 02:42:28,584 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=3182186.6666666665, ans=0.0 2023-11-26 02:42:33,215 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3182186.6666666665, ans=0.125 2023-11-26 02:42:33,299 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.min_abs, batch_count=3182186.6666666665, ans=0.5 2023-11-26 02:42:39,007 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=3182253.3333333335, ans=0.2 2023-11-26 02:42:47,847 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 477350 2023-11-26 02:42:49,098 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3182320.0, ans=0.1 2023-11-26 02:43:11,602 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=3182453.3333333335, ans=0.125 2023-11-26 02:43:20,963 INFO [train_asr.py:1235] (3/4) Epoch 40, batch 8450, loss[loss=0.05258, simple_loss=0.0669, pruned_loss=0.009097, audio_tagging_loss=0.01003, over 15319.00 frames. ], tot_loss[loss=0.06613, simple_loss=0.08976, pruned_loss=0.01251, audio_tagging_loss=0.008736, over 3043387.12 frames. ], batch size: 59, lr: 1.69e-03, grad_scale: 16.0 2023-11-26 02:43:34,269 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=3182586.6666666665, ans=0.0 2023-11-26 02:43:35,392 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=3182586.6666666665, ans=0.125 2023-11-26 02:43:42,524 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 477400 2023-11-26 02:43:57,855 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=3182720.0, ans=0.2 2023-11-26 02:43:59,072 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=4.98 vs. 
limit=15.0 2023-11-26 02:44:09,229 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.467e+01 8.762e+01 9.219e+01 9.767e+01 1.234e+02, threshold=1.844e+02, percent-clipped=0.0 2023-11-26 02:44:16,647 INFO [train_asr.py:1235] (3/4) Epoch 40, batch 8500, loss[loss=0.05714, simple_loss=0.07829, pruned_loss=0.009363, audio_tagging_loss=0.008636, over 15109.00 frames. ], tot_loss[loss=0.06612, simple_loss=0.08928, pruned_loss=0.01258, audio_tagging_loss=0.008903, over 3040614.41 frames. ], batch size: 57, lr: 1.69e-03, grad_scale: 16.0 2023-11-26 02:44:39,006 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 477450 2023-11-26 02:44:53,823 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3183053.3333333335, ans=0.125 2023-11-26 02:45:00,145 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3183120.0, ans=0.0 2023-11-26 02:45:09,696 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3183120.0, ans=0.125 2023-11-26 02:45:11,607 INFO [train_asr.py:1235] (3/4) Epoch 40, batch 8550, loss[loss=0.07121, simple_loss=0.09856, pruned_loss=0.01409, audio_tagging_loss=0.007845, over 15444.00 frames. ], tot_loss[loss=0.06629, simple_loss=0.08955, pruned_loss=0.01268, audio_tagging_loss=0.008831, over 3044396.93 frames. ], batch size: 57, lr: 1.69e-03, grad_scale: 16.0 2023-11-26 02:45:23,625 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=10.25 vs. limit=15.0 2023-11-26 02:45:25,431 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3183253.3333333335, ans=0.1 2023-11-26 02:45:26,566 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=3183253.3333333335, ans=0.125 2023-11-26 02:45:34,870 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 477500 2023-11-26 02:45:50,673 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=3183386.6666666665, ans=0.0 2023-11-26 02:45:59,536 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.757e+01 8.847e+01 9.604e+01 1.040e+02 3.358e+02, threshold=1.921e+02, percent-clipped=1.0 2023-11-26 02:46:07,456 INFO [train_asr.py:1235] (3/4) Epoch 40, batch 8600, loss[loss=0.04883, simple_loss=0.0588, pruned_loss=0.007581, audio_tagging_loss=0.01185, over 14663.00 frames. ], tot_loss[loss=0.06591, simple_loss=0.08871, pruned_loss=0.01257, audio_tagging_loss=0.008983, over 3046825.39 frames. 
], batch size: 57, lr: 1.69e-03, grad_scale: 16.0 2023-11-26 02:46:24,495 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-26 02:46:26,580 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3183586.6666666665, ans=0.125 2023-11-26 02:46:29,597 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 477550 2023-11-26 02:46:31,912 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=3183653.3333333335, ans=0.0 2023-11-26 02:46:43,499 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=3183720.0, ans=0.125 2023-11-26 02:47:03,345 INFO [train_asr.py:1235] (3/4) Epoch 40, batch 8650, loss[loss=0.05161, simple_loss=0.06506, pruned_loss=0.009976, audio_tagging_loss=0.009102, over 14469.00 frames. ], tot_loss[loss=0.06592, simple_loss=0.08894, pruned_loss=0.01247, audio_tagging_loss=0.008982, over 3045341.15 frames. ], batch size: 57, lr: 1.69e-03, grad_scale: 16.0 2023-11-26 02:47:04,663 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=3183853.3333333335, ans=0.125 2023-11-26 02:47:15,365 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=17.51 vs. limit=22.5 2023-11-26 02:47:19,411 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=3183920.0, ans=0.125 2023-11-26 02:47:20,359 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.max_abs, batch_count=3183920.0, ans=10.0 2023-11-26 02:47:20,468 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3183920.0, ans=0.125 2023-11-26 02:47:24,860 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 477600 2023-11-26 02:47:25,056 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3183986.6666666665, ans=0.0 2023-11-26 02:47:42,714 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=18.93 vs. limit=22.5 2023-11-26 02:47:50,491 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.395e+01 8.714e+01 9.384e+01 1.008e+02 1.495e+02, threshold=1.877e+02, percent-clipped=0.0 2023-11-26 02:47:55,903 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3184120.0, ans=0.1 2023-11-26 02:47:57,878 INFO [train_asr.py:1235] (3/4) Epoch 40, batch 8700, loss[loss=0.08526, simple_loss=0.1152, pruned_loss=0.01913, audio_tagging_loss=0.00856, over 15406.00 frames. ], tot_loss[loss=0.06655, simple_loss=0.08977, pruned_loss=0.01263, audio_tagging_loss=0.00903, over 3050580.22 frames. 
], batch size: 54, lr: 1.69e-03, grad_scale: 16.0 2023-11-26 02:48:00,227 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=3184186.6666666665, ans=0.0 2023-11-26 02:48:21,286 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 477650 2023-11-26 02:48:22,438 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3184320.0, ans=0.125 2023-11-26 02:48:28,701 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=3184320.0, ans=0.125 2023-11-26 02:48:34,627 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.1.whiten, num_groups=1, num_channels=192, metric=3.34 vs. limit=12.0 2023-11-26 02:48:53,190 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.0.whiten, num_groups=1, num_channels=192, metric=3.98 vs. limit=12.0 2023-11-26 02:48:53,990 INFO [train_asr.py:1235] (3/4) Epoch 40, batch 8750, loss[loss=0.07174, simple_loss=0.1038, pruned_loss=0.01106, audio_tagging_loss=0.008759, over 15087.00 frames. ], tot_loss[loss=0.06772, simple_loss=0.09142, pruned_loss=0.01302, audio_tagging_loss=0.00899, over 3050115.18 frames. ], batch size: 57, lr: 1.69e-03, grad_scale: 16.0 2023-11-26 02:49:11,075 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=3184586.6666666665, ans=0.2 2023-11-26 02:49:16,032 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 477700 2023-11-26 02:49:19,298 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3184653.3333333335, ans=0.125 2023-11-26 02:49:24,518 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=3184653.3333333335, ans=0.0 2023-11-26 02:49:35,715 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3184720.0, ans=0.125 2023-11-26 02:49:41,952 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.014e+01 8.883e+01 9.536e+01 1.029e+02 1.726e+02, threshold=1.907e+02, percent-clipped=0.0 2023-11-26 02:49:47,088 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=12.70 vs. limit=22.5 2023-11-26 02:49:49,891 INFO [train_asr.py:1235] (3/4) Epoch 40, batch 8800, loss[loss=0.05645, simple_loss=0.07738, pruned_loss=0.008677, audio_tagging_loss=0.009086, over 15096.00 frames. ], tot_loss[loss=0.06729, simple_loss=0.09075, pruned_loss=0.01282, audio_tagging_loss=0.009101, over 3047691.29 frames. ], batch size: 57, lr: 1.69e-03, grad_scale: 32.0 2023-11-26 02:50:10,362 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3184986.6666666665, ans=0.1 2023-11-26 02:50:11,196 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 477750 2023-11-26 02:50:26,617 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=3185053.3333333335, ans=0.0 2023-11-26 02:50:44,757 INFO [train_asr.py:1235] (3/4) Epoch 40, batch 8850, loss[loss=0.07097, simple_loss=0.1069, pruned_loss=0.009546, audio_tagging_loss=0.007951, over 14444.00 frames. 
], tot_loss[loss=0.06731, simple_loss=0.09101, pruned_loss=0.01269, audio_tagging_loss=0.009122, over 3046891.07 frames. ], batch size: 52, lr: 1.69e-03, grad_scale: 32.0 2023-11-26 02:50:51,596 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=20.27 vs. limit=22.5 2023-11-26 02:50:56,857 WARNING [train_asr.py:1481] (3/4) Exclude cut with ID unbalanced/1Dq7QH61iXQ_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-26 02:51:06,930 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 477800 2023-11-26 02:51:07,015 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=3185320.0, ans=0.125 2023-11-26 02:51:07,170 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=3185320.0, ans=0.0 2023-11-26 02:51:31,908 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.135e+01 8.669e+01 9.316e+01 9.941e+01 1.332e+02, threshold=1.863e+02, percent-clipped=0.0 2023-11-26 02:51:39,821 INFO [train_asr.py:1235] (3/4) Epoch 40, batch 8900, loss[loss=0.07622, simple_loss=0.1175, pruned_loss=0.01204, audio_tagging_loss=0.005415, over 15551.00 frames. ], tot_loss[loss=0.06748, simple_loss=0.0915, pruned_loss=0.01282, audio_tagging_loss=0.008906, over 3053099.27 frames. ], batch size: 56, lr: 1.69e-03, grad_scale: 32.0 2023-11-26 02:51:56,161 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=3185586.6666666665, ans=0.015 2023-11-26 02:51:57,529 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=11.85 vs. limit=15.0 2023-11-26 02:52:00,998 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.42 vs. limit=6.0 2023-11-26 02:52:02,493 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 477850 2023-11-26 02:52:04,113 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=20.81 vs. limit=22.5 2023-11-26 02:52:07,998 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=3185653.3333333335, ans=0.0 2023-11-26 02:52:31,774 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=3185786.6666666665, ans=0.2 2023-11-26 02:52:32,779 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3185786.6666666665, ans=0.125 2023-11-26 02:52:36,334 INFO [train_asr.py:1235] (3/4) Epoch 40, batch 8950, loss[loss=0.08526, simple_loss=0.1171, pruned_loss=0.02136, audio_tagging_loss=0.005335, over 14485.00 frames. ], tot_loss[loss=0.0671, simple_loss=0.09105, pruned_loss=0.01276, audio_tagging_loss=0.008812, over 3050947.84 frames. 
], batch size: 55, lr: 1.69e-03, grad_scale: 32.0 2023-11-26 02:52:41,718 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3185853.3333333335, ans=0.1 2023-11-26 02:52:51,315 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=3185920.0, ans=0.1 2023-11-26 02:52:51,839 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.0.conv_module2.whiten, num_groups=1, num_channels=192, metric=13.75 vs. limit=15.0 2023-11-26 02:52:57,353 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 477900 2023-11-26 02:52:59,623 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=3185986.6666666665, ans=0.125 2023-11-26 02:53:15,655 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.1.conv_module1.whiten, num_groups=1, num_channels=192, metric=13.44 vs. limit=15.0 2023-11-26 02:53:23,900 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.727e+01 8.837e+01 9.264e+01 1.015e+02 1.330e+02, threshold=1.853e+02, percent-clipped=0.0 2023-11-26 02:53:29,501 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-11-26 02:53:31,289 INFO [train_asr.py:1235] (3/4) Epoch 40, batch 9000, loss[loss=0.07409, simple_loss=0.1077, pruned_loss=0.01302, audio_tagging_loss=0.007222, over 15698.00 frames. ], tot_loss[loss=0.06716, simple_loss=0.09102, pruned_loss=0.01288, audio_tagging_loss=0.008763, over 3052926.81 frames. ], batch size: 57, lr: 1.69e-03, grad_scale: 16.0 2023-11-26 02:53:31,290 INFO [train_asr.py:1258] (3/4) Computing validation loss 2023-11-26 02:53:44,431 INFO [zipformer.py:1877] (3/4) name=encoder.encoders.2.encoder.layers.2.self_attn_weights, attn_weights_entropy = tensor([5.1547, 4.4344, 4.4643, 4.3868], device='cuda:3') 2023-11-26 02:54:03,242 INFO [train_asr.py:1267] (3/4) Epoch 40, validation: loss=0.05846, simple_loss=0.05059, pruned_loss=0.005121, audio_tagging_loss=0.02804, over 4681554.00 frames. 2023-11-26 02:54:03,243 INFO [train_asr.py:1268] (3/4) Maximum memory allocated so far is 24894MB 2023-11-26 02:54:25,317 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 477950 2023-11-26 02:54:29,797 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-26 02:54:56,077 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=3186453.3333333335, ans=0.125 2023-11-26 02:54:59,519 INFO [train_asr.py:1235] (3/4) Epoch 40, batch 9050, loss[loss=0.07874, simple_loss=0.108, pruned_loss=0.01878, audio_tagging_loss=0.005967, over 14458.00 frames. ], tot_loss[loss=0.06773, simple_loss=0.09189, pruned_loss=0.01305, audio_tagging_loss=0.008732, over 3056998.06 frames. 
], batch size: 54, lr: 1.69e-03, grad_scale: 16.0 2023-11-26 02:55:01,803 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=3186520.0, ans=0.125 2023-11-26 02:55:04,044 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=3186520.0, ans=0.2 2023-11-26 02:55:12,462 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=3186586.6666666665, ans=0.2 2023-11-26 02:55:20,702 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 478000 2023-11-26 02:55:25,963 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=3186653.3333333335, ans=0.07 2023-11-26 02:55:45,640 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=3186786.6666666665, ans=0.0 2023-11-26 02:55:48,459 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 6.924e+01 8.738e+01 9.239e+01 1.027e+02 1.338e+02, threshold=1.848e+02, percent-clipped=0.0 2023-11-26 02:55:54,936 INFO [train_asr.py:1235] (3/4) Epoch 40, batch 9100, loss[loss=0.07575, simple_loss=0.1054, pruned_loss=0.01572, audio_tagging_loss=0.007341, over 15757.00 frames. ], tot_loss[loss=0.06772, simple_loss=0.09206, pruned_loss=0.01305, audio_tagging_loss=0.008647, over 3058432.97 frames. ], batch size: 59, lr: 1.69e-03, grad_scale: 16.0 2023-11-26 02:56:07,494 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-26 02:56:16,832 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=3.96 vs. limit=6.0 2023-11-26 02:56:17,379 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 478050 2023-11-26 02:56:24,844 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=3186986.6666666665, ans=0.0 2023-11-26 02:56:34,475 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=3187053.3333333335, ans=0.0 2023-11-26 02:56:36,629 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=3187053.3333333335, ans=0.125 2023-11-26 02:56:37,539 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3187053.3333333335, ans=0.125 2023-11-26 02:56:41,834 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=3187120.0, ans=0.0 2023-11-26 02:56:50,661 INFO [train_asr.py:1235] (3/4) Epoch 40, batch 9150, loss[loss=0.05085, simple_loss=0.06743, pruned_loss=0.006597, audio_tagging_loss=0.01054, over 14289.00 frames. ], tot_loss[loss=0.06706, simple_loss=0.09087, pruned_loss=0.0129, audio_tagging_loss=0.008726, over 3049982.72 frames. ], batch size: 55, lr: 1.68e-03, grad_scale: 16.0 2023-11-26 02:57:10,784 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=7.75 vs. 
limit=15.0 2023-11-26 02:57:13,557 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 478100 2023-11-26 02:57:39,820 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.230e+01 8.630e+01 9.008e+01 9.650e+01 1.509e+02, threshold=1.802e+02, percent-clipped=0.0 2023-11-26 02:57:41,113 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3187453.3333333335, ans=0.125 2023-11-26 02:57:46,738 INFO [train_asr.py:1235] (3/4) Epoch 40, batch 9200, loss[loss=0.05787, simple_loss=0.07567, pruned_loss=0.009521, audio_tagging_loss=0.01051, over 16095.00 frames. ], tot_loss[loss=0.06681, simple_loss=0.09075, pruned_loss=0.01273, audio_tagging_loss=0.008708, over 3046647.07 frames. ], batch size: 60, lr: 1.68e-03, grad_scale: 32.0 2023-11-26 02:57:48,020 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3187520.0, ans=0.1 2023-11-26 02:58:07,790 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=3187653.3333333335, ans=0.0 2023-11-26 02:58:08,657 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 478150 2023-11-26 02:58:20,118 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3187720.0, ans=0.0 2023-11-26 02:58:42,626 INFO [train_asr.py:1235] (3/4) Epoch 40, batch 9250, loss[loss=0.08136, simple_loss=0.1102, pruned_loss=0.01569, audio_tagging_loss=0.01057, over 15415.00 frames. ], tot_loss[loss=0.06655, simple_loss=0.09017, pruned_loss=0.01278, audio_tagging_loss=0.008683, over 3049668.69 frames. ], batch size: 57, lr: 1.68e-03, grad_scale: 32.0 2023-11-26 02:58:48,382 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=18.00 vs. limit=22.5 2023-11-26 02:58:55,619 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3187920.0, ans=0.1 2023-11-26 02:58:58,764 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=11.27 vs. limit=15.0 2023-11-26 02:59:04,969 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 478200 2023-11-26 02:59:07,576 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=3187986.6666666665, ans=0.2 2023-11-26 02:59:27,991 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=5.88 vs. limit=15.0 2023-11-26 02:59:31,492 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 6.730e+01 8.687e+01 9.467e+01 1.008e+02 1.287e+02, threshold=1.893e+02, percent-clipped=0.0 2023-11-26 02:59:38,524 INFO [train_asr.py:1235] (3/4) Epoch 40, batch 9300, loss[loss=0.09693, simple_loss=0.1326, pruned_loss=0.02488, audio_tagging_loss=0.005769, over 15618.00 frames. ], tot_loss[loss=0.06643, simple_loss=0.09018, pruned_loss=0.01269, audio_tagging_loss=0.008648, over 3046403.03 frames. 
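[Editorial note] In every optim.py:476 record above, the five grad-norm numbers read like (min, 25%, median, 75%, max) quantiles of recently observed gradient norms, and the threshold equals Clipping_scale × the median, e.g. 2.0 × 9.008e+01 ≈ 1.802e+02 here; percent-clipped stays at 0.0 while the max quantile sits below that threshold. A hedged re-implementation of such median-adaptive clipping (class and buffer names are made up; the real ScaledAdam logic differs in detail):

```python
import torch

class AdaptiveClipper:
    """Clip gradients to clipping_scale * median of recently seen grad norms."""

    def __init__(self, clipping_scale: float = 2.0, history: int = 200):
        self.clipping_scale = clipping_scale
        self.history = history
        self.norms = []          # recent total grad norms
        self.num_clipped = 0     # basis of the logged percent-clipped
        self.num_steps = 0

    def clip_(self, params) -> float:
        grads = [p.grad for p in params if p.grad is not None]
        norm = torch.norm(torch.stack([g.norm() for g in grads])).item()
        self.norms = (self.norms + [norm])[-self.history:]
        quartiles = torch.tensor(self.norms).quantile(
            torch.tensor([0.0, 0.25, 0.5, 0.75, 1.0]))   # the five logged numbers
        threshold = self.clipping_scale * quartiles[2].item()  # 2.0 * median
        self.num_steps += 1
        if norm > threshold:     # rare here: percent-clipped is mostly 0.0
            self.num_clipped += 1
            for g in grads:
                g.mul_(threshold / norm)
        return threshold
```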
], batch size: 56, lr: 1.68e-03, grad_scale: 32.0 2023-11-26 02:59:38,827 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.min_abs, batch_count=3188186.6666666665, ans=0.5 2023-11-26 02:59:53,510 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-26 03:00:00,892 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 478250 2023-11-26 03:00:10,093 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=3188320.0, ans=0.0 2023-11-26 03:00:19,593 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3188386.6666666665, ans=0.125 2023-11-26 03:00:24,529 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=3188453.3333333335, ans=0.125 2023-11-26 03:00:27,937 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=18.45 vs. limit=22.5 2023-11-26 03:00:34,955 INFO [train_asr.py:1235] (3/4) Epoch 40, batch 9350, loss[loss=0.06857, simple_loss=0.08988, pruned_loss=0.0136, audio_tagging_loss=0.01003, over 14654.00 frames. ], tot_loss[loss=0.06665, simple_loss=0.09038, pruned_loss=0.01276, audio_tagging_loss=0.008701, over 3050926.95 frames. ], batch size: 55, lr: 1.68e-03, grad_scale: 32.0 2023-11-26 03:00:35,094 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=3188520.0, ans=0.2 2023-11-26 03:00:40,589 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=3188520.0, ans=0.125 2023-11-26 03:00:52,635 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3188586.6666666665, ans=0.125 2023-11-26 03:00:56,914 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 478300 2023-11-26 03:01:00,154 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=3188653.3333333335, ans=0.125 2023-11-26 03:01:10,554 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=3188720.0, ans=0.125 2023-11-26 03:01:12,087 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=3188720.0, ans=0.0 2023-11-26 03:01:19,145 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=13.29 vs. limit=15.0 2023-11-26 03:01:19,975 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3188786.6666666665, ans=0.125 2023-11-26 03:01:22,466 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=2.29 vs. limit=15.0 2023-11-26 03:01:23,926 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.316e+01 8.528e+01 9.262e+01 1.004e+02 1.387e+02, threshold=1.852e+02, percent-clipped=0.0 2023-11-26 03:01:30,310 INFO [train_asr.py:1235] (3/4) Epoch 40, batch 9400, loss[loss=0.07208, simple_loss=0.106, pruned_loss=0.01078, audio_tagging_loss=0.008283, over 16096.00 frames. 
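[Editorial note] The scaling.py:213 lines report module constants (skip rates, balancer probabilities, bypass scales) that are functions of the global batch count rather than fixed values; by batch_count ≈ 3.19e6 they have all settled at their final values (ans=0.0, 0.125, 0.2, ...). A plausible sketch of such a batch-count schedule, assuming piecewise-linear interpolation that holds the last value; the breakpoints below are made up:

```python
class ScheduledFloatSketch:
    """A float whose value is piecewise-linear in the global batch count."""

    def __init__(self, *points):
        self.points = sorted(points)   # (batch_count, value) breakpoints

    def value(self, batch_count: float) -> float:
        pts = self.points
        if batch_count <= pts[0][0]:
            return pts[0][1]
        for (x0, y0), (x1, y1) in zip(pts, pts[1:]):
            if batch_count <= x1:
                t = (batch_count - x0) / (x1 - x0)
                return y0 + t * (y1 - y0)
        return pts[-1][1]              # held constant past the last breakpoint

# Made-up breakpoints; at batch_count ~ 3.19e6 the value has long been final,
# matching the "ans=0.0" readings for the skip rates logged above.
skip_rate = ScheduledFloatSketch((0.0, 0.3), (4000.0, 0.1), (16000.0, 0.0))
assert skip_rate.value(3188320.0) == 0.0
```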
], tot_loss[loss=0.06673, simple_loss=0.09021, pruned_loss=0.01277, audio_tagging_loss=0.008854, over 3051981.98 frames. ], batch size: 57, lr: 1.68e-03, grad_scale: 32.0 2023-11-26 03:01:52,674 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 478350 2023-11-26 03:02:16,716 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3189120.0, ans=0.1 2023-11-26 03:02:24,992 WARNING [train_asr.py:1481] (3/4) Exclude cut with ID unbalanced/jmSuJWEIizA_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-26 03:02:26,071 INFO [train_asr.py:1235] (3/4) Epoch 40, batch 9450, loss[loss=0.04799, simple_loss=0.05151, pruned_loss=0.007415, audio_tagging_loss=0.01483, over 15062.00 frames. ], tot_loss[loss=0.06677, simple_loss=0.0901, pruned_loss=0.01276, audio_tagging_loss=0.008953, over 3049400.09 frames. ], batch size: 58, lr: 1.68e-03, grad_scale: 32.0 2023-11-26 03:02:27,313 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3189186.6666666665, ans=0.0 2023-11-26 03:02:46,054 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=3189253.3333333335, ans=0.125 2023-11-26 03:02:47,041 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=3189253.3333333335, ans=0.0 2023-11-26 03:02:49,012 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 478400 2023-11-26 03:02:52,664 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=3189320.0, ans=0.09899494936611666 2023-11-26 03:02:56,951 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=3189320.0, ans=0.0 2023-11-26 03:03:06,959 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=3189386.6666666665, ans=0.125 2023-11-26 03:03:13,938 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=3189453.3333333335, ans=0.0 2023-11-26 03:03:16,854 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.310e+01 8.757e+01 9.306e+01 1.009e+02 1.204e+02, threshold=1.861e+02, percent-clipped=0.0 2023-11-26 03:03:22,608 INFO [train_asr.py:1235] (3/4) Epoch 40, batch 9500, loss[loss=0.08456, simple_loss=0.1124, pruned_loss=0.01919, audio_tagging_loss=0.009163, over 15183.00 frames. ], tot_loss[loss=0.06695, simple_loss=0.09006, pruned_loss=0.01288, audio_tagging_loss=0.009034, over 3038080.77 frames. ], batch size: 56, lr: 1.68e-03, grad_scale: 16.0 2023-11-26 03:03:27,678 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-26 03:03:32,235 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=8.19 vs. 
limit=10.0 2023-11-26 03:03:35,122 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3189586.6666666665, ans=0.1 2023-11-26 03:03:41,371 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=3189586.6666666665, ans=0.0 2023-11-26 03:03:45,024 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 478450 2023-11-26 03:03:45,192 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3189653.3333333335, ans=0.125 2023-11-26 03:03:46,315 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=3189653.3333333335, ans=0.0 2023-11-26 03:03:56,403 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=3189720.0, ans=0.0 2023-11-26 03:04:18,263 INFO [train_asr.py:1235] (3/4) Epoch 40, batch 9550, loss[loss=0.06379, simple_loss=0.08829, pruned_loss=0.01183, audio_tagging_loss=0.007813, over 13805.00 frames. ], tot_loss[loss=0.0669, simple_loss=0.09007, pruned_loss=0.01279, audio_tagging_loss=0.009082, over 3032834.39 frames. ], batch size: 53, lr: 1.68e-03, grad_scale: 16.0 2023-11-26 03:04:22,870 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3189853.3333333335, ans=0.125 2023-11-26 03:04:34,138 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3189920.0, ans=0.1 2023-11-26 03:04:40,875 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 478500 2023-11-26 03:04:48,784 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=3189986.6666666665, ans=0.0 2023-11-26 03:05:01,109 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=3190053.3333333335, ans=0.2 2023-11-26 03:05:08,238 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.252e+01 8.706e+01 9.165e+01 9.979e+01 1.359e+02, threshold=1.833e+02, percent-clipped=0.0 2023-11-26 03:05:08,769 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module2.whiten, num_groups=1, num_channels=512, metric=11.47 vs. limit=15.0 2023-11-26 03:05:14,052 INFO [train_asr.py:1235] (3/4) Epoch 40, batch 9600, loss[loss=0.06041, simple_loss=0.08868, pruned_loss=0.009209, audio_tagging_loss=0.006858, over 15423.00 frames. ], tot_loss[loss=0.06723, simple_loss=0.09069, pruned_loss=0.01276, audio_tagging_loss=0.009123, over 3038861.52 frames. ], batch size: 58, lr: 1.68e-03, grad_scale: 32.0 2023-11-26 03:05:28,510 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=512, metric=19.59 vs. limit=22.5 2023-11-26 03:05:28,580 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=20.12 vs. limit=22.5 2023-11-26 03:05:37,114 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 478550 2023-11-26 03:05:39,979 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.0.self_attn2.whiten, num_groups=1, num_channels=192, metric=11.64 vs. 
limit=22.5 2023-11-26 03:05:49,323 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=6.03 vs. limit=15.0 2023-11-26 03:06:03,735 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=3190453.3333333335, ans=0.125 2023-11-26 03:06:09,346 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3190520.0, ans=0.125 2023-11-26 03:06:09,893 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.whiten.whitening_limit, batch_count=3190520.0, ans=12.0 2023-11-26 03:06:10,231 INFO [train_asr.py:1235] (3/4) Epoch 40, batch 9650, loss[loss=0.0599, simple_loss=0.07843, pruned_loss=0.01071, audio_tagging_loss=0.009974, over 14515.00 frames. ], tot_loss[loss=0.06718, simple_loss=0.0906, pruned_loss=0.01274, audio_tagging_loss=0.009136, over 3038489.36 frames. ], batch size: 57, lr: 1.68e-03, grad_scale: 32.0 2023-11-26 03:06:16,112 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=3190520.0, ans=0.0 2023-11-26 03:06:31,799 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 478600 2023-11-26 03:06:32,432 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=5.48 vs. limit=15.0 2023-11-26 03:06:56,720 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3190786.6666666665, ans=0.125 2023-11-26 03:06:59,841 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3190786.6666666665, ans=0.125 2023-11-26 03:07:00,683 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 6.911e+01 8.580e+01 9.247e+01 1.001e+02 1.210e+02, threshold=1.849e+02, percent-clipped=0.0 2023-11-26 03:07:06,034 INFO [train_asr.py:1235] (3/4) Epoch 40, batch 9700, loss[loss=0.06774, simple_loss=0.1013, pruned_loss=0.00865, audio_tagging_loss=0.008463, over 15755.00 frames. ], tot_loss[loss=0.06708, simple_loss=0.09073, pruned_loss=0.01277, audio_tagging_loss=0.008947, over 3041193.05 frames. 
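[Editorial note] The scaling.py:1022 Whitening lines compare a whiteness metric of a named activation against a scheduled limit (e.g. metric=6.03 vs. limit=15.0 above); the module intervenes only when the metric exceeds the limit. One consistent way to define such a metric is the mean squared eigenvalue of the feature covariance divided by its squared mean eigenvalue, which is 1.0 for perfectly white activations and grows as variance concentrates in a few directions. A sketch under that assumption, not the exact scaling.py code:

```python
import torch

def whitening_metric(x: torch.Tensor, num_groups: int = 1) -> float:
    """E[eigenvalue^2] / E[eigenvalue]^2 of the feature covariance (1.0 = white)."""
    x = x.reshape(-1, x.shape[-1])
    num_frames, num_channels = x.shape
    cpg = num_channels // num_groups
    x = x.reshape(num_frames, num_groups, cpg).transpose(0, 1)
    x = x - x.mean(dim=1, keepdim=True)            # centered covariance
    cov = torch.matmul(x.transpose(1, 2), x)       # (num_groups, cpg, cpg)
    mean_eig = cov.diagonal(dim1=1, dim2=2).mean()          # trace / dim
    mean_eig_sq = (cov ** 2).sum() / (num_groups * cpg)     # trace(cov^2) / dim
    return (mean_eig_sq / (mean_eig ** 2 + 1e-20)).item()

white = torch.randn(4000, 384)
assert whitening_metric(white) < 1.5    # white features score near 1.0
```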
], batch size: 58, lr: 1.68e-03, grad_scale: 32.0 2023-11-26 03:07:14,789 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=3190853.3333333335, ans=0.125 2023-11-26 03:07:17,833 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=3190920.0, ans=0.0 2023-11-26 03:07:18,907 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3190920.0, ans=0.125 2023-11-26 03:07:28,291 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 478650 2023-11-26 03:07:28,427 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=3190986.6666666665, ans=0.125 2023-11-26 03:07:47,348 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3191053.3333333335, ans=0.125 2023-11-26 03:07:48,424 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3191053.3333333335, ans=0.125 2023-11-26 03:07:57,112 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-26 03:08:01,014 INFO [train_asr.py:1235] (3/4) Epoch 40, batch 9750, loss[loss=0.07045, simple_loss=0.09156, pruned_loss=0.01553, audio_tagging_loss=0.009141, over 15032.00 frames. ], tot_loss[loss=0.06677, simple_loss=0.09053, pruned_loss=0.01263, audio_tagging_loss=0.008874, over 3039100.41 frames. ], batch size: 56, lr: 1.68e-03, grad_scale: 32.0 2023-11-26 03:08:06,046 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3191186.6666666665, ans=0.125 2023-11-26 03:08:15,194 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=3191253.3333333335, ans=0.125 2023-11-26 03:08:23,710 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3191320.0, ans=0.125 2023-11-26 03:08:24,556 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 478700 2023-11-26 03:08:24,658 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=3191320.0, ans=0.0 2023-11-26 03:08:35,264 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3191386.6666666665, ans=0.1 2023-11-26 03:08:42,728 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=3191386.6666666665, ans=0.2 2023-11-26 03:08:43,877 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3191386.6666666665, ans=0.1 2023-11-26 03:08:48,632 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3191453.3333333335, ans=0.125 2023-11-26 03:08:51,220 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3191453.3333333335, ans=0.125 2023-11-26 03:08:52,040 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.291e+01 8.589e+01 9.136e+01 9.841e+01 1.317e+02, threshold=1.827e+02, 
percent-clipped=0.0 2023-11-26 03:08:57,335 INFO [train_asr.py:1235] (3/4) Epoch 40, batch 9800, loss[loss=0.07631, simple_loss=0.1076, pruned_loss=0.01352, audio_tagging_loss=0.008982, over 14518.00 frames. ], tot_loss[loss=0.06588, simple_loss=0.08906, pruned_loss=0.0124, audio_tagging_loss=0.008941, over 3036919.67 frames. ], batch size: 55, lr: 1.68e-03, grad_scale: 32.0 2023-11-26 03:08:57,527 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3191520.0, ans=0.125 2023-11-26 03:08:59,210 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-11-26 03:09:03,353 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3191520.0, ans=0.1 2023-11-26 03:09:19,732 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 478750 2023-11-26 03:09:37,123 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=18.93 vs. limit=22.5 2023-11-26 03:09:48,428 WARNING [train_asr.py:1481] (3/4) Exclude cut with ID unbalanced/Bo4LcZjitzU_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-26 03:09:48,817 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=5.65 vs. limit=10.0 2023-11-26 03:09:53,712 INFO [train_asr.py:1235] (3/4) Epoch 40, batch 9850, loss[loss=0.07577, simple_loss=0.1013, pruned_loss=0.01726, audio_tagging_loss=0.007841, over 15254.00 frames. ], tot_loss[loss=0.06672, simple_loss=0.09045, pruned_loss=0.01259, audio_tagging_loss=0.008906, over 3046665.30 frames. ], batch size: 59, lr: 1.68e-03, grad_scale: 16.0 2023-11-26 03:10:01,463 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3191853.3333333335, ans=0.125 2023-11-26 03:10:08,716 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3191920.0, ans=0.125 2023-11-26 03:10:14,957 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 478800 2023-11-26 03:10:37,447 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.17 vs. limit=6.0 2023-11-26 03:10:42,995 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=15.03 vs. limit=22.5 2023-11-26 03:10:44,533 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 6.907e+01 8.656e+01 9.156e+01 9.841e+01 1.408e+02, threshold=1.831e+02, percent-clipped=0.0 2023-11-26 03:10:48,796 INFO [train_asr.py:1235] (3/4) Epoch 40, batch 9900, loss[loss=0.06058, simple_loss=0.08564, pruned_loss=0.01013, audio_tagging_loss=0.00763, over 15540.00 frames. ], tot_loss[loss=0.06679, simple_loss=0.09083, pruned_loss=0.01257, audio_tagging_loss=0.008808, over 3049391.53 frames. 
], batch size: 58, lr: 1.68e-03, grad_scale: 16.0 2023-11-26 03:11:00,448 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=512, metric=3.77 vs. limit=15.0 2023-11-26 03:11:03,610 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.whiten, num_groups=1, num_channels=512, metric=8.63 vs. limit=12.0 2023-11-26 03:11:11,609 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 478850 2023-11-26 03:11:39,322 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.max_abs, batch_count=3192453.3333333335, ans=10.0 2023-11-26 03:11:44,262 INFO [train_asr.py:1235] (3/4) Epoch 40, batch 9950, loss[loss=0.06902, simple_loss=0.0929, pruned_loss=0.01445, audio_tagging_loss=0.008129, over 16144.00 frames. ], tot_loss[loss=0.06592, simple_loss=0.08981, pruned_loss=0.01225, audio_tagging_loss=0.008771, over 3048187.65 frames. ], batch size: 61, lr: 1.68e-03, grad_scale: 16.0 2023-11-26 03:11:51,340 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3192520.0, ans=0.125 2023-11-26 03:11:54,133 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3192520.0, ans=0.1 2023-11-26 03:11:56,247 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=3192586.6666666665, ans=0.0 2023-11-26 03:12:06,720 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 478900 2023-11-26 03:12:18,746 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=3192720.0, ans=0.09899494936611666 2023-11-26 03:12:30,961 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=3192786.6666666665, ans=0.2 2023-11-26 03:12:34,422 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=9.04 vs. limit=15.0 2023-11-26 03:12:36,545 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.281e+01 8.464e+01 9.280e+01 9.880e+01 1.249e+02, threshold=1.856e+02, percent-clipped=0.0 2023-11-26 03:12:40,810 INFO [train_asr.py:1235] (3/4) Epoch 40, batch 10000, loss[loss=0.05958, simple_loss=0.07815, pruned_loss=0.01184, audio_tagging_loss=0.008669, over 16725.00 frames. ], tot_loss[loss=0.06558, simple_loss=0.08933, pruned_loss=0.0122, audio_tagging_loss=0.00872, over 3045513.29 frames. 
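[Editorial note] Each train_asr.py:1235 record pairs the current batch's loss (over ~15k frames) with tot_loss over roughly 3.0e6 frames, which is why tot_loss moves so slowly. The fractional frame counts (e.g. 3045513.29) suggest a leaky, frame-weighted accumulator rather than a hard window: decaying by (1 - 1/200) per batch settles at about 200 × 15k ≈ 3.0e6 frames. The decay constant 200 is an inference from those counts, not a value read from the script:

```python
class RunningLoss:
    """Leaky frame-weighted accumulator producing tot_loss-style numbers."""

    def __init__(self, interval: int = 200):     # assumed decay window
        self.decay = 1.0 - 1.0 / interval
        self.loss_sum = 0.0
        self.frames = 0.0

    def update(self, batch_loss_sum: float, batch_frames: float) -> float:
        # With ~15k frames per batch, self.frames settles near 3.0e6,
        # matching the "over 3045513.29 frames" readings above.
        self.loss_sum = self.loss_sum * self.decay + batch_loss_sum
        self.frames = self.frames * self.decay + batch_frames
        return self.loss_sum / self.frames        # the normalized tot_loss
```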
], batch size: 64, lr: 1.68e-03, grad_scale: 32.0 2023-11-26 03:12:54,736 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=3192920.0, ans=0.125 2023-11-26 03:13:00,174 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=3192920.0, ans=0.2 2023-11-26 03:13:02,311 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 478950 2023-11-26 03:13:12,044 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3192986.6666666665, ans=0.125 2023-11-26 03:13:12,181 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-26 03:13:15,817 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3193053.3333333335, ans=0.1 2023-11-26 03:13:16,817 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=3193053.3333333335, ans=0.0 2023-11-26 03:13:16,899 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=3193053.3333333335, ans=0.125 2023-11-26 03:13:20,640 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3193053.3333333335, ans=0.125 2023-11-26 03:13:36,315 INFO [train_asr.py:1235] (3/4) Epoch 40, batch 10050, loss[loss=0.06369, simple_loss=0.0846, pruned_loss=0.011, audio_tagging_loss=0.01039, over 15381.00 frames. ], tot_loss[loss=0.06578, simple_loss=0.08969, pruned_loss=0.01218, audio_tagging_loss=0.008749, over 3048172.38 frames. ], batch size: 56, lr: 1.68e-03, grad_scale: 32.0 2023-11-26 03:13:37,532 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=3193186.6666666665, ans=0.04949747468305833 2023-11-26 03:13:51,856 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3193253.3333333335, ans=0.1 2023-11-26 03:13:58,684 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 479000 2023-11-26 03:14:26,600 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.210e+01 8.389e+01 9.159e+01 9.802e+01 1.976e+02, threshold=1.832e+02, percent-clipped=1.0 2023-11-26 03:14:31,505 INFO [train_asr.py:1235] (3/4) Epoch 40, batch 10100, loss[loss=0.07092, simple_loss=0.0921, pruned_loss=0.01651, audio_tagging_loss=0.008368, over 13975.00 frames. ], tot_loss[loss=0.06658, simple_loss=0.09076, pruned_loss=0.0124, audio_tagging_loss=0.008798, over 3047291.83 frames. ], batch size: 54, lr: 1.68e-03, grad_scale: 32.0 2023-11-26 03:14:34,110 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=6.69 vs. limit=12.0 2023-11-26 03:14:36,247 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=7.75 vs. 
limit=12.0 2023-11-26 03:14:53,764 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 479050 2023-11-26 03:15:01,214 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3193653.3333333335, ans=0.0 2023-11-26 03:15:06,535 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.min_positive, batch_count=3193720.0, ans=0.05 2023-11-26 03:15:16,537 WARNING [train_asr.py:1481] (3/4) Exclude cut with ID unbalanced/_eq1Ry0UZGU_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-26 03:15:17,291 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=8.56 vs. limit=10.0 2023-11-26 03:15:25,695 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=3193786.6666666665, ans=0.0 2023-11-26 03:15:26,717 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3193853.3333333335, ans=0.125 2023-11-26 03:15:27,574 INFO [train_asr.py:1235] (3/4) Epoch 40, batch 10150, loss[loss=0.0604, simple_loss=0.08562, pruned_loss=0.01018, audio_tagging_loss=0.007407, over 15516.00 frames. ], tot_loss[loss=0.06636, simple_loss=0.09029, pruned_loss=0.01239, audio_tagging_loss=0.008829, over 3045949.73 frames. ], batch size: 57, lr: 1.68e-03, grad_scale: 16.0 2023-11-26 03:15:36,354 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=3193853.3333333335, ans=0.125 2023-11-26 03:15:49,040 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 479100 2023-11-26 03:15:53,168 WARNING [train_asr.py:1481] (3/4) Exclude cut with ID unbalanced/cw-21cbk02A_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-26 03:16:10,310 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3194053.3333333335, ans=0.125 2023-11-26 03:16:12,563 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3194120.0, ans=0.125 2023-11-26 03:16:19,621 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.429e+01 8.867e+01 9.466e+01 1.017e+02 1.236e+02, threshold=1.893e+02, percent-clipped=0.0 2023-11-26 03:16:22,874 INFO [train_asr.py:1235] (3/4) Epoch 40, batch 10200, loss[loss=0.05535, simple_loss=0.07394, pruned_loss=0.00913, audio_tagging_loss=0.009248, over 15989.00 frames. ], tot_loss[loss=0.06563, simple_loss=0.08887, pruned_loss=0.01221, audio_tagging_loss=0.008986, over 3047347.75 frames. 
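[Editorial note] The grad_scale field in these records oscillates between power-of-two values (16.0 and 32.0 throughout this stretch), the signature of dynamic fp16 loss scaling: the scale doubles after a run of overflow-free steps and halves whenever a step overflows. A sketch of the standard torch.cuda.amp pattern that produces such behavior; the hyperparameters and step function are illustrative, not taken from the training script:

```python
import torch

scaler = torch.cuda.amp.GradScaler(init_scale=32.0, growth_interval=2000)

def training_step(model, optimizer, batch, loss_fn):
    optimizer.zero_grad()
    with torch.cuda.amp.autocast():
        loss = loss_fn(model, batch)
    scaler.scale(loss).backward()   # backprop the scaled fp16 loss
    scaler.step(optimizer)          # silently skips the step on overflow
    scaler.update()                 # halve the scale on overflow, grow when stable
    return scaler.get_scale()       # the value logged as grad_scale above
```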
], batch size: 60, lr: 1.68e-03, grad_scale: 16.0 2023-11-26 03:16:45,238 WARNING [train_asr.py:1481] (3/4) Exclude cut with ID unbalanced/hOT6Yokob90_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-26 03:16:45,300 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 479150 2023-11-26 03:17:05,517 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=9.55 vs. limit=15.0 2023-11-26 03:17:17,602 INFO [train_asr.py:1235] (3/4) Epoch 40, batch 10250, loss[loss=0.06089, simple_loss=0.0772, pruned_loss=0.01118, audio_tagging_loss=0.01112, over 14858.00 frames. ], tot_loss[loss=0.06544, simple_loss=0.08837, pruned_loss=0.01219, audio_tagging_loss=0.009064, over 3043242.91 frames. ], batch size: 56, lr: 1.68e-03, grad_scale: 16.0 2023-11-26 03:17:20,499 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=3194520.0, ans=0.125 2023-11-26 03:17:25,823 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=3194520.0, ans=0.0 2023-11-26 03:17:41,079 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 479200 2023-11-26 03:17:44,738 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=3194653.3333333335, ans=0.0 2023-11-26 03:17:45,712 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3194653.3333333335, ans=0.125 2023-11-26 03:17:50,122 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=3194653.3333333335, ans=0.0 2023-11-26 03:18:10,889 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.032e+01 8.528e+01 9.326e+01 1.007e+02 1.324e+02, threshold=1.865e+02, percent-clipped=0.0 2023-11-26 03:18:14,624 INFO [train_asr.py:1235] (3/4) Epoch 40, batch 10300, loss[loss=0.07371, simple_loss=0.1019, pruned_loss=0.01388, audio_tagging_loss=0.008863, over 14689.00 frames. ], tot_loss[loss=0.06571, simple_loss=0.08863, pruned_loss=0.01233, audio_tagging_loss=0.009058, over 3046417.02 frames. 
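[Editorial note] The learning rate in these records creeps down smoothly (1.69e-03 earlier, 1.68e-03 here) rather than stepping, which matches an Eden-style schedule as used in icefall: the base lr is damped by power-law factors in both step and epoch. A sketch under assumed constants; base_lr=0.045, lr_batches=7500 and lr_epochs=3.5 are typical zipformer settings, not read from this section of the log:

```python
def eden_lr(base_lr: float, step: int, epoch: float,
            lr_batches: float = 7500.0, lr_epochs: float = 3.5) -> float:
    # Smooth power-law decay in both the batch index and the epoch.
    batch_factor = ((step ** 2 + lr_batches ** 2) / lr_batches ** 2) ** -0.25
    epoch_factor = ((epoch ** 2 + lr_epochs ** 2) / lr_epochs ** 2) ** -0.25
    return base_lr * batch_factor * epoch_factor

# Around batch idx 479000 in epoch 40 this lands near the logged value:
print(f"{eden_lr(0.045, 479000, 40):.2e}")   # ~1.66e-03, vs. "lr: 1.68e-03"
```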
], batch size: 54, lr: 1.68e-03, grad_scale: 16.0 2023-11-26 03:18:19,626 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=3194853.3333333335, ans=0.0 2023-11-26 03:18:21,557 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3194853.3333333335, ans=0.125 2023-11-26 03:18:25,915 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=3194920.0, ans=0.2 2023-11-26 03:18:33,341 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=3194920.0, ans=0.09899494936611666 2023-11-26 03:18:36,417 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 479250 2023-11-26 03:19:01,147 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=3195120.0, ans=0.0 2023-11-26 03:19:10,598 INFO [train_asr.py:1235] (3/4) Epoch 40, batch 10350, loss[loss=0.05713, simple_loss=0.07622, pruned_loss=0.007647, audio_tagging_loss=0.01137, over 16013.00 frames. ], tot_loss[loss=0.06587, simple_loss=0.08894, pruned_loss=0.01235, audio_tagging_loss=0.009047, over 3047198.01 frames. ], batch size: 59, lr: 1.68e-03, grad_scale: 16.0 2023-11-26 03:19:10,899 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=3195186.6666666665, ans=0.2 2023-11-26 03:19:32,377 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 479300 2023-11-26 03:19:40,037 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=3195320.0, ans=0.0 2023-11-26 03:19:43,254 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-26 03:19:44,306 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3195386.6666666665, ans=0.125 2023-11-26 03:19:59,028 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=11.95 vs. limit=15.0 2023-11-26 03:20:03,836 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.518e+01 8.860e+01 9.371e+01 1.044e+02 1.411e+02, threshold=1.874e+02, percent-clipped=0.0 2023-11-26 03:20:06,037 INFO [train_asr.py:1235] (3/4) Epoch 40, batch 10400, loss[loss=0.06608, simple_loss=0.09944, pruned_loss=0.009738, audio_tagging_loss=0.006624, over 15754.00 frames. ], tot_loss[loss=0.06614, simple_loss=0.08911, pruned_loss=0.01244, audio_tagging_loss=0.009147, over 3044206.67 frames. 
], batch size: 56, lr: 1.68e-03, grad_scale: 16.0 2023-11-26 03:20:13,120 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.min_positive, batch_count=3195520.0, ans=0.05 2023-11-26 03:20:24,775 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=3195586.6666666665, ans=0.0 2023-11-26 03:20:29,527 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 479350 2023-11-26 03:20:47,836 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3195720.0, ans=0.1 2023-11-26 03:20:59,149 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=3195786.6666666665, ans=0.125 2023-11-26 03:21:02,552 INFO [train_asr.py:1235] (3/4) Epoch 40, batch 10450, loss[loss=0.065, simple_loss=0.08919, pruned_loss=0.01235, audio_tagging_loss=0.008062, over 14513.00 frames. ], tot_loss[loss=0.06616, simple_loss=0.08912, pruned_loss=0.01241, audio_tagging_loss=0.009181, over 3043562.53 frames. ], batch size: 57, lr: 1.68e-03, grad_scale: 16.0 2023-11-26 03:21:10,812 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-26 03:21:25,156 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 479400 2023-11-26 03:21:56,746 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.499e+01 8.740e+01 9.314e+01 1.011e+02 1.304e+02, threshold=1.863e+02, percent-clipped=0.0 2023-11-26 03:21:59,375 INFO [train_asr.py:1235] (3/4) Epoch 40, batch 10500, loss[loss=0.06938, simple_loss=0.09316, pruned_loss=0.01452, audio_tagging_loss=0.008274, over 16175.00 frames. ], tot_loss[loss=0.0664, simple_loss=0.08986, pruned_loss=0.01256, audio_tagging_loss=0.008906, over 3045864.45 frames. ], batch size: 59, lr: 1.68e-03, grad_scale: 16.0 2023-11-26 03:22:21,257 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 479450 2023-11-26 03:22:26,001 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=2.99 vs. limit=15.0 2023-11-26 03:22:28,842 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=3196320.0, ans=0.125 2023-11-26 03:22:37,239 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3196386.6666666665, ans=0.125 2023-11-26 03:22:37,732 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=19.61 vs. limit=22.5 2023-11-26 03:22:38,860 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=6.20 vs. limit=10.0 2023-11-26 03:22:39,405 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3196386.6666666665, ans=0.1 2023-11-26 03:22:41,594 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3196386.6666666665, ans=0.125 2023-11-26 03:22:54,784 INFO [train_asr.py:1235] (3/4) Epoch 40, batch 10550, loss[loss=0.06395, simple_loss=0.08439, pruned_loss=0.01164, audio_tagging_loss=0.01012, over 15870.00 frames. 
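[Editorial note] The scaling.py:1118 WithLoss lines attach an auxiliary loss directly to a named activation (loss-sum=0.000e+00 meaning the attached penalty is currently zero). One way to implement that pattern is a custom autograd function that returns the activation unchanged but routes a unit gradient into the auxiliary loss during backward, so the penalty is trained without ever being added to the main loss. The sketch below is a reconstruction of that idea, not the exact scaling.py code:

```python
import torch

class WithLossSketch(torch.autograd.Function):
    @staticmethod
    def forward(ctx, x: torch.Tensor, aux_loss: torch.Tensor) -> torch.Tensor:
        ctx.loss_shape = aux_loss.shape
        return x  # activations flow through untouched

    @staticmethod
    def backward(ctx, x_grad: torch.Tensor):
        # Main-path gradient passes through; the auxiliary loss receives
        # gradient 1, i.e. it is minimized with unit weight.
        ones = torch.ones(ctx.loss_shape, dtype=x_grad.dtype, device=x_grad.device)
        return x_grad, ones

def with_loss(x: torch.Tensor, aux_loss: torch.Tensor) -> torch.Tensor:
    return WithLossSketch.apply(x, aux_loss)
```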
], tot_loss[loss=0.06621, simple_loss=0.08952, pruned_loss=0.01254, audio_tagging_loss=0.008905, over 3037235.36 frames. ], batch size: 59, lr: 1.68e-03, grad_scale: 16.0 2023-11-26 03:22:54,997 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3196520.0, ans=0.1 2023-11-26 03:22:56,031 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=3196520.0, ans=0.0 2023-11-26 03:22:57,204 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3196520.0, ans=0.125 2023-11-26 03:23:03,359 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.20 vs. limit=6.0 2023-11-26 03:23:17,487 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=5.88 vs. limit=10.0 2023-11-26 03:23:17,873 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 479500 2023-11-26 03:23:24,328 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=3196653.3333333335, ans=0.07 2023-11-26 03:23:31,410 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=6.24 vs. limit=15.0 2023-11-26 03:23:44,415 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3196786.6666666665, ans=0.125 2023-11-26 03:23:46,721 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=3196786.6666666665, ans=0.125 2023-11-26 03:23:48,108 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=19.45 vs. limit=22.5 2023-11-26 03:23:48,528 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.377e+01 8.656e+01 9.441e+01 1.040e+02 1.486e+02, threshold=1.888e+02, percent-clipped=0.0 2023-11-26 03:23:48,785 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=3196786.6666666665, ans=0.125 2023-11-26 03:23:49,824 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=3196853.3333333335, ans=0.125 2023-11-26 03:23:50,639 INFO [train_asr.py:1235] (3/4) Epoch 40, batch 10600, loss[loss=0.0632, simple_loss=0.09523, pruned_loss=0.009661, audio_tagging_loss=0.005925, over 16147.00 frames. ], tot_loss[loss=0.06596, simple_loss=0.08938, pruned_loss=0.01241, audio_tagging_loss=0.008858, over 3038707.85 frames. 
], batch size: 61, lr: 1.68e-03, grad_scale: 16.0 2023-11-26 03:23:54,963 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3196853.3333333335, ans=0.1 2023-11-26 03:24:02,465 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=3196920.0, ans=0.0 2023-11-26 03:24:13,339 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 479550 2023-11-26 03:24:25,566 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-26 03:24:31,900 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.min_positive, batch_count=3197053.3333333335, ans=0.025 2023-11-26 03:24:40,316 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3197120.0, ans=0.125 2023-11-26 03:24:46,464 INFO [train_asr.py:1235] (3/4) Epoch 40, batch 10650, loss[loss=0.0819, simple_loss=0.1181, pruned_loss=0.01467, audio_tagging_loss=0.008176, over 14693.00 frames. ], tot_loss[loss=0.06549, simple_loss=0.08878, pruned_loss=0.01224, audio_tagging_loss=0.008857, over 3036467.93 frames. ], batch size: 54, lr: 1.68e-03, grad_scale: 16.0 2023-11-26 03:24:48,040 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=17.59 vs. limit=22.5 2023-11-26 03:25:08,810 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 479600 2023-11-26 03:25:31,595 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=3197453.3333333335, ans=0.0 2023-11-26 03:25:41,500 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 6.674e+01 8.472e+01 9.133e+01 9.975e+01 1.277e+02, threshold=1.827e+02, percent-clipped=0.0 2023-11-26 03:25:42,603 INFO [train_asr.py:1235] (3/4) Epoch 40, batch 10700, loss[loss=0.06089, simple_loss=0.08739, pruned_loss=0.01144, audio_tagging_loss=0.005759, over 14384.00 frames. ], tot_loss[loss=0.06586, simple_loss=0.08919, pruned_loss=0.01238, audio_tagging_loss=0.008888, over 3037259.67 frames. ], batch size: 54, lr: 1.68e-03, grad_scale: 8.0 2023-11-26 03:25:42,853 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-11-26 03:26:05,521 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 479650 2023-11-26 03:26:05,668 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=3197653.3333333335, ans=0.0 2023-11-26 03:26:09,141 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=10.20 vs. limit=15.0 2023-11-26 03:26:16,676 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=8.41 vs. limit=15.0 2023-11-26 03:26:38,383 INFO [train_asr.py:1235] (3/4) Epoch 40, batch 10750, loss[loss=0.07578, simple_loss=0.1019, pruned_loss=0.01697, audio_tagging_loss=0.007853, over 15601.00 frames. ], tot_loss[loss=0.0656, simple_loss=0.08905, pruned_loss=0.01231, audio_tagging_loss=0.008767, over 3045910.10 frames. 
], batch size: 57, lr: 1.68e-03, grad_scale: 8.0 2023-11-26 03:26:40,130 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=9.68 vs. limit=15.0 2023-11-26 03:26:43,314 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=3197853.3333333335, ans=0.0 2023-11-26 03:26:52,507 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.min_positive, batch_count=3197920.0, ans=0.05 2023-11-26 03:27:00,718 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 479700 2023-11-26 03:27:07,732 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3197986.6666666665, ans=0.125 2023-11-26 03:27:14,641 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3198053.3333333335, ans=0.125 2023-11-26 03:27:30,622 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3198120.0, ans=0.125 2023-11-26 03:27:31,744 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=3198120.0, ans=0.125 2023-11-26 03:27:33,595 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.243e+01 8.525e+01 9.180e+01 9.934e+01 1.273e+02, threshold=1.836e+02, percent-clipped=0.0 2023-11-26 03:27:34,667 INFO [train_asr.py:1235] (3/4) Epoch 40, batch 10800, loss[loss=0.05861, simple_loss=0.07611, pruned_loss=0.0111, audio_tagging_loss=0.009453, over 14266.00 frames. ], tot_loss[loss=0.06572, simple_loss=0.08889, pruned_loss=0.0124, audio_tagging_loss=0.008876, over 3040967.43 frames. ], batch size: 56, lr: 1.68e-03, grad_scale: 16.0 2023-11-26 03:27:52,433 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=3.80 vs. limit=15.0 2023-11-26 03:27:56,921 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 479750 2023-11-26 03:28:17,350 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=3198386.6666666665, ans=0.0 2023-11-26 03:28:29,926 INFO [train_asr.py:1235] (3/4) Epoch 40, batch 10850, loss[loss=0.0684, simple_loss=0.08169, pruned_loss=0.01659, audio_tagging_loss=0.01097, over 14860.00 frames. ], tot_loss[loss=0.06572, simple_loss=0.08881, pruned_loss=0.01241, audio_tagging_loss=0.008904, over 3039633.51 frames. 
], batch size: 58, lr: 1.68e-03, grad_scale: 16.0 2023-11-26 03:28:32,771 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=3198520.0, ans=0.0 2023-11-26 03:28:48,313 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3198586.6666666665, ans=0.1 2023-11-26 03:28:53,343 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 479800 2023-11-26 03:29:10,660 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=3198720.0, ans=0.125 2023-11-26 03:29:11,792 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=3198720.0, ans=0.0 2023-11-26 03:29:20,712 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3198786.6666666665, ans=0.125 2023-11-26 03:29:23,493 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3198786.6666666665, ans=0.125 2023-11-26 03:29:24,319 WARNING [train_asr.py:1481] (3/4) Exclude cut with ID unbalanced/XMxq2pgttuY_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-26 03:29:25,266 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.001e+01 8.656e+01 9.276e+01 9.852e+01 1.367e+02, threshold=1.855e+02, percent-clipped=0.0 2023-11-26 03:29:26,355 INFO [train_asr.py:1235] (3/4) Epoch 40, batch 10900, loss[loss=0.05713, simple_loss=0.08176, pruned_loss=0.007992, audio_tagging_loss=0.008264, over 15567.00 frames. ], tot_loss[loss=0.06583, simple_loss=0.08911, pruned_loss=0.01252, audio_tagging_loss=0.008754, over 3045067.86 frames. ], batch size: 57, lr: 1.68e-03, grad_scale: 16.0 2023-11-26 03:29:31,163 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=18.58 vs. limit=22.5 2023-11-26 03:29:48,680 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 479850 2023-11-26 03:29:58,379 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.min_positive, batch_count=3199053.3333333335, ans=0.025 2023-11-26 03:30:05,939 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=3199053.3333333335, ans=0.125 2023-11-26 03:30:14,783 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=3199120.0, ans=0.0 2023-11-26 03:30:22,727 INFO [train_asr.py:1235] (3/4) Epoch 40, batch 10950, loss[loss=0.07628, simple_loss=0.1076, pruned_loss=0.01568, audio_tagging_loss=0.00679, over 15306.00 frames. ], tot_loss[loss=0.06612, simple_loss=0.0895, pruned_loss=0.01256, audio_tagging_loss=0.008808, over 3042273.23 frames. 
], batch size: 57, lr: 1.68e-03, grad_scale: 16.0 2023-11-26 03:30:44,443 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 479900 2023-11-26 03:30:52,499 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3199320.0, ans=0.125 2023-11-26 03:30:53,397 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=3199320.0, ans=0.015 2023-11-26 03:31:04,260 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3199386.6666666665, ans=0.125 2023-11-26 03:31:13,971 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=3199453.3333333335, ans=0.125 2023-11-26 03:31:16,797 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.102e+01 8.536e+01 9.275e+01 9.846e+01 1.244e+02, threshold=1.855e+02, percent-clipped=0.0 2023-11-26 03:31:17,920 INFO [train_asr.py:1235] (3/4) Epoch 40, batch 11000, loss[loss=0.06079, simple_loss=0.08073, pruned_loss=0.009515, audio_tagging_loss=0.01091, over 14180.00 frames. ], tot_loss[loss=0.06597, simple_loss=0.08896, pruned_loss=0.01247, audio_tagging_loss=0.009021, over 3041824.57 frames. ], batch size: 56, lr: 1.68e-03, grad_scale: 16.0 2023-11-26 03:31:25,887 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=19.28 vs. limit=22.5 2023-11-26 03:31:27,933 WARNING [train_asr.py:1481] (3/4) Exclude cut with ID unbalanced/h6R5rMXN6pY_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-26 03:31:28,381 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=8.33 vs. limit=15.0 2023-11-26 03:31:32,894 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3199586.6666666665, ans=0.1 2023-11-26 03:31:40,779 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 479950 2023-11-26 03:31:47,740 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=3199653.3333333335, ans=0.0 2023-11-26 03:31:48,841 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=3199653.3333333335, ans=0.0 2023-11-26 03:31:52,104 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.min_abs, batch_count=3199720.0, ans=0.5 2023-11-26 03:31:53,093 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=3199720.0, ans=0.125 2023-11-26 03:32:14,138 INFO [train_asr.py:1235] (3/4) Epoch 40, batch 11050, loss[loss=0.0602, simple_loss=0.07828, pruned_loss=0.01042, audio_tagging_loss=0.01064, over 16473.00 frames. ], tot_loss[loss=0.06647, simple_loss=0.08964, pruned_loss=0.01265, audio_tagging_loss=0.009001, over 3045978.25 frames. 
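
A consistency check worth noting: every loss[...] and tot_loss[...] record in this section satisfies loss = 0.5 * simple_loss + pruned_loss + audio_tagging_loss to display precision, i.e. the simple (linear) transducer loss appears to enter at half weight while the pruned RNN-T loss and the distilled audio-tagging loss enter at full weight. Checking against the batch 11000 record above (the 0.5 and 1.0 weights are inferred from this arithmetic, not read out of the code):

    simple_loss = 0.08896
    pruned_loss = 0.01247
    audio_tagging_loss = 0.009021

    loss = 0.5 * simple_loss + pruned_loss + 1.0 * audio_tagging_loss
    print(round(loss, 5))   # 0.06597 -- matches the logged tot_loss
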
], batch size: 63, lr: 1.68e-03, grad_scale: 16.0 2023-11-26 03:32:30,239 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3199920.0, ans=0.125 2023-11-26 03:32:32,546 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3199920.0, ans=0.125 2023-11-26 03:32:36,590 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 480000 2023-11-26 03:32:41,071 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3199986.6666666665, ans=0.125 2023-11-26 03:33:11,099 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.470e+01 8.630e+01 9.436e+01 1.031e+02 1.953e+02, threshold=1.887e+02, percent-clipped=1.0 2023-11-26 03:33:12,216 INFO [train_asr.py:1235] (3/4) Epoch 40, batch 11100, loss[loss=0.0611, simple_loss=0.08213, pruned_loss=0.01099, audio_tagging_loss=0.009048, over 14573.00 frames. ], tot_loss[loss=0.06679, simple_loss=0.09033, pruned_loss=0.01259, audio_tagging_loss=0.009045, over 3054683.79 frames. ], batch size: 56, lr: 1.68e-03, grad_scale: 16.0 2023-11-26 03:33:21,901 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=3200253.3333333335, ans=0.125 2023-11-26 03:33:33,449 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 480050 2023-11-26 03:33:39,908 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=3200320.0, ans=0.0 2023-11-26 03:33:44,236 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=3200386.6666666665, ans=0.0 2023-11-26 03:33:51,688 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=3200386.6666666665, ans=0.125 2023-11-26 03:34:03,221 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=3200453.3333333335, ans=0.0 2023-11-26 03:34:07,198 INFO [train_asr.py:1235] (3/4) Epoch 40, batch 11150, loss[loss=0.06341, simple_loss=0.0909, pruned_loss=0.007728, audio_tagging_loss=0.01023, over 14677.00 frames. ], tot_loss[loss=0.06672, simple_loss=0.08993, pruned_loss=0.0126, audio_tagging_loss=0.009165, over 3055524.16 frames. ], batch size: 56, lr: 1.68e-03, grad_scale: 16.0 2023-11-26 03:34:14,769 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3200520.0, ans=0.1 2023-11-26 03:34:24,708 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=512, metric=18.04 vs. limit=22.5 2023-11-26 03:34:29,459 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 480100 2023-11-26 03:34:35,195 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=4.82 vs. 
limit=10.0 2023-11-26 03:34:38,005 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=3200653.3333333335, ans=0.125 2023-11-26 03:34:41,210 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.max_positive, batch_count=3200720.0, ans=0.95 2023-11-26 03:34:43,233 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3200720.0, ans=0.1 2023-11-26 03:35:00,980 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.121e+01 8.918e+01 9.641e+01 1.031e+02 1.753e+02, threshold=1.928e+02, percent-clipped=0.0 2023-11-26 03:35:02,651 INFO [train_asr.py:1235] (3/4) Epoch 40, batch 11200, loss[loss=0.07838, simple_loss=0.1104, pruned_loss=0.01483, audio_tagging_loss=0.008377, over 17382.00 frames. ], tot_loss[loss=0.06638, simple_loss=0.08954, pruned_loss=0.0124, audio_tagging_loss=0.009214, over 3050285.04 frames. ], batch size: 61, lr: 1.68e-03, grad_scale: 32.0 2023-11-26 03:35:03,959 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=3200853.3333333335, ans=0.0 2023-11-26 03:35:08,798 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3200853.3333333335, ans=0.1 2023-11-26 03:35:13,033 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=3200920.0, ans=0.125 2023-11-26 03:35:21,421 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=3200920.0, ans=0.125 2023-11-26 03:35:25,407 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 480150 2023-11-26 03:35:29,721 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=3200986.6666666665, ans=0.035 2023-11-26 03:35:52,182 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3201120.0, ans=0.125 2023-11-26 03:35:59,477 INFO [train_asr.py:1235] (3/4) Epoch 40, batch 11250, loss[loss=0.07961, simple_loss=0.1077, pruned_loss=0.01847, audio_tagging_loss=0.007271, over 15927.00 frames. ], tot_loss[loss=0.06673, simple_loss=0.0901, pruned_loss=0.01257, audio_tagging_loss=0.009115, over 3053531.27 frames. ], batch size: 59, lr: 1.68e-03, grad_scale: 32.0 2023-11-26 03:36:08,116 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3201186.6666666665, ans=0.125 2023-11-26 03:36:14,700 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=9.96 vs. limit=15.0 2023-11-26 03:36:20,655 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 480200 2023-11-26 03:36:32,170 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=3201386.6666666665, ans=0.0 2023-11-26 03:36:53,606 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.596e+01 8.665e+01 9.306e+01 1.002e+02 1.136e+02, threshold=1.861e+02, percent-clipped=0.0 2023-11-26 03:36:54,681 INFO [train_asr.py:1235] (3/4) Epoch 40, batch 11300, loss[loss=0.07496, simple_loss=0.1121, pruned_loss=0.01425, audio_tagging_loss=0.004664, over 16457.00 frames. 
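
The scaling.py:1022 "Whitening" lines compare a per-module statistic of the activations against a limit (metric=9.96 vs. limit=15.0 and similar). A statistic with the right behaviour is the ratio mean(eig^2) / mean(eig)^2 over the eigenvalues of the channel covariance: it equals 1.0 for perfectly "white" (isotropic) features and grows as the spectrum concentrates, with a corrective penalty engaging only once the limit is exceeded. This is a plausible reading of the logged quantity, not a line-for-line copy of scaling.py:

    import torch

    def whitening_metric(x: torch.Tensor) -> torch.Tensor:
        """mean(eig^2) / mean(eig)^2 of the feature covariance
        (1.0 = fully white; larger = more lopsided spectrum)."""
        x = x.reshape(-1, x.shape[-1])        # (frames, channels)
        cov = (x.T @ x) / x.shape[0]
        eigs = torch.linalg.eigvalsh(cov)
        return (eigs ** 2).mean() / (eigs.mean() ** 2 + 1e-20)

    feats = torch.randn(4000, 256)            # near-white activations
    print(whitening_metric(feats))            # ~1.1, far below limit=15.0
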
], tot_loss[loss=0.06655, simple_loss=0.09002, pruned_loss=0.01252, audio_tagging_loss=0.009016, over 3048013.70 frames. ], batch size: 56, lr: 1.68e-03, grad_scale: 32.0 2023-11-26 03:36:55,990 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=3201520.0, ans=0.0 2023-11-26 03:36:59,089 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=3201520.0, ans=0.2 2023-11-26 03:37:07,622 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=3201586.6666666665, ans=0.125 2023-11-26 03:37:10,292 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-26 03:37:16,552 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 480250 2023-11-26 03:37:30,064 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=3201720.0, ans=0.0 2023-11-26 03:37:35,305 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=3201720.0, ans=0.0 2023-11-26 03:37:40,571 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=3201786.6666666665, ans=0.0 2023-11-26 03:37:47,009 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3201786.6666666665, ans=0.1 2023-11-26 03:37:49,990 INFO [train_asr.py:1235] (3/4) Epoch 40, batch 11350, loss[loss=0.08609, simple_loss=0.1248, pruned_loss=0.01802, audio_tagging_loss=0.005696, over 15637.00 frames. ], tot_loss[loss=0.0662, simple_loss=0.0897, pruned_loss=0.01246, audio_tagging_loss=0.008885, over 3048921.62 frames. ], batch size: 55, lr: 1.68e-03, grad_scale: 32.0 2023-11-26 03:37:52,982 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=3201853.3333333335, ans=0.2 2023-11-26 03:37:56,022 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3201853.3333333335, ans=0.0 2023-11-26 03:37:57,313 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3201853.3333333335, ans=0.125 2023-11-26 03:38:10,048 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=3201920.0, ans=0.125 2023-11-26 03:38:13,017 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 480300 2023-11-26 03:38:13,205 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.min_abs, batch_count=3201986.6666666665, ans=0.5 2023-11-26 03:38:35,424 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.48 vs. limit=6.0 2023-11-26 03:38:44,350 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.296e+01 8.638e+01 9.308e+01 1.022e+02 1.333e+02, threshold=1.862e+02, percent-clipped=0.0 2023-11-26 03:38:45,435 INFO [train_asr.py:1235] (3/4) Epoch 40, batch 11400, loss[loss=0.06706, simple_loss=0.09076, pruned_loss=0.01472, audio_tagging_loss=0.006961, over 16014.00 frames. 
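
The dense scaling.py:213 traffic records ScheduledFloat values: regularisation hyper-parameters (skip rates, balancer probabilities, dropout_p) that are functions of the global batch_count rather than constants, which is why each line carries a batch_count and an ans value. A toy piecewise-linear version of the idea (breakpoints invented for illustration; the real schedules are defined in scaling.py):

    import bisect

    class ScheduledFloat:
        """Value interpolated linearly between (batch_count, value) points."""

        def __init__(self, *points):
            self.xs = [x for x, _ in points]
            self.ys = [y for _, y in points]

        def value(self, batch_count: float) -> float:
            if batch_count <= self.xs[0]:
                return self.ys[0]
            if batch_count >= self.xs[-1]:
                return self.ys[-1]
            i = bisect.bisect_right(self.xs, batch_count)
            x0, x1 = self.xs[i - 1], self.xs[i]
            y0, y1 = self.ys[i - 1], self.ys[i]
            return y0 + (y1 - y0) * (batch_count - x0) / (x1 - x0)

    # e.g. a conv_skip_rate annealed from 0.1 to 0.0 early in training
    skip = ScheduledFloat((0.0, 0.1), (16000.0, 0.0))
    print(skip.value(3201520.0))   # 0.0, as in the ans=0.0 records above
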
], tot_loss[loss=0.06564, simple_loss=0.0892, pruned_loss=0.01227, audio_tagging_loss=0.008762, over 3048387.42 frames. ], batch size: 58, lr: 1.68e-03, grad_scale: 32.0 2023-11-26 03:39:07,939 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 480350 2023-11-26 03:39:08,391 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module2.whiten, num_groups=1, num_channels=512, metric=9.31 vs. limit=15.0 2023-11-26 03:39:24,511 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=3202386.6666666665, ans=0.125 2023-11-26 03:39:38,760 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=3202453.3333333335, ans=0.125 2023-11-26 03:39:41,783 INFO [train_asr.py:1235] (3/4) Epoch 40, batch 11450, loss[loss=0.06229, simple_loss=0.08445, pruned_loss=0.008941, audio_tagging_loss=0.01112, over 14981.00 frames. ], tot_loss[loss=0.06573, simple_loss=0.08952, pruned_loss=0.01226, audio_tagging_loss=0.008714, over 3053527.11 frames. ], batch size: 54, lr: 1.68e-03, grad_scale: 16.0 2023-11-26 03:39:46,265 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=3202520.0, ans=0.2 2023-11-26 03:39:52,676 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3202586.6666666665, ans=0.125 2023-11-26 03:39:52,862 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=15.95 vs. limit=22.5 2023-11-26 03:39:57,069 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=12.12 vs. limit=22.5 2023-11-26 03:40:03,719 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 480400 2023-11-26 03:40:16,493 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=3202720.0, ans=0.125 2023-11-26 03:40:19,636 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3202720.0, ans=0.125 2023-11-26 03:40:32,347 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3202786.6666666665, ans=0.125 2023-11-26 03:40:34,637 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=8.31 vs. limit=15.0 2023-11-26 03:40:37,359 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.451e+01 8.830e+01 9.338e+01 1.004e+02 1.564e+02, threshold=1.868e+02, percent-clipped=0.0 2023-11-26 03:40:37,386 INFO [train_asr.py:1235] (3/4) Epoch 40, batch 11500, loss[loss=0.05317, simple_loss=0.06549, pruned_loss=0.01028, audio_tagging_loss=0.01014, over 15464.00 frames. ], tot_loss[loss=0.06523, simple_loss=0.08859, pruned_loss=0.01212, audio_tagging_loss=0.008819, over 3050276.52 frames. 
], batch size: 60, lr: 1.68e-03, grad_scale: 16.0 2023-11-26 03:40:38,539 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=3202853.3333333335, ans=0.0 2023-11-26 03:41:00,697 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 480450 2023-11-26 03:41:08,377 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=3202986.6666666665, ans=0.125 2023-11-26 03:41:20,013 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3203053.3333333335, ans=0.1 2023-11-26 03:41:21,051 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=3203120.0, ans=0.0 2023-11-26 03:41:32,253 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=3203186.6666666665, ans=0.2 2023-11-26 03:41:33,099 INFO [train_asr.py:1235] (3/4) Epoch 40, batch 11550, loss[loss=0.0713, simple_loss=0.1005, pruned_loss=0.01376, audio_tagging_loss=0.007275, over 15081.00 frames. ], tot_loss[loss=0.06573, simple_loss=0.08908, pruned_loss=0.0124, audio_tagging_loss=0.00879, over 3048730.85 frames. ], batch size: 57, lr: 1.68e-03, grad_scale: 16.0 2023-11-26 03:41:41,805 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3203186.6666666665, ans=0.125 2023-11-26 03:41:49,711 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=3203253.3333333335, ans=0.0 2023-11-26 03:41:55,978 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 480500 2023-11-26 03:42:09,074 WARNING [train_asr.py:1481] (3/4) Exclude cut with ID unbalanced/NeYOsnhOi4k_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-26 03:42:22,501 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3203453.3333333335, ans=0.125 2023-11-26 03:42:25,578 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=3203453.3333333335, ans=0.0 2023-11-26 03:42:26,679 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=3203453.3333333335, ans=0.0 2023-11-26 03:42:29,043 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.474e+01 8.932e+01 9.599e+01 1.033e+02 1.724e+02, threshold=1.920e+02, percent-clipped=0.0 2023-11-26 03:42:29,070 INFO [train_asr.py:1235] (3/4) Epoch 40, batch 11600, loss[loss=0.07148, simple_loss=0.1005, pruned_loss=0.0127, audio_tagging_loss=0.00855, over 16407.00 frames. ], tot_loss[loss=0.06612, simple_loss=0.08964, pruned_loss=0.01261, audio_tagging_loss=0.008694, over 3050036.23 frames. 
], batch size: 61, lr: 1.68e-03, grad_scale: 32.0 2023-11-26 03:42:32,425 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=3203520.0, ans=0.0 2023-11-26 03:42:40,123 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn1.whiten.whitening_limit, batch_count=3203586.6666666665, ans=22.5 2023-11-26 03:42:45,150 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3203586.6666666665, ans=0.125 2023-11-26 03:42:49,918 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=3203653.3333333335, ans=0.0 2023-11-26 03:42:50,962 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 480550 2023-11-26 03:43:17,079 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=3203786.6666666665, ans=0.0 2023-11-26 03:43:20,131 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3203786.6666666665, ans=0.125 2023-11-26 03:43:22,382 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3203786.6666666665, ans=0.125 2023-11-26 03:43:24,211 INFO [train_asr.py:1235] (3/4) Epoch 40, batch 11650, loss[loss=0.08025, simple_loss=0.09975, pruned_loss=0.01997, audio_tagging_loss=0.01041, over 14139.00 frames. ], tot_loss[loss=0.06579, simple_loss=0.08912, pruned_loss=0.01252, audio_tagging_loss=0.008714, over 3045098.41 frames. ], batch size: 55, lr: 1.68e-03, grad_scale: 32.0 2023-11-26 03:43:25,521 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer_ff3.min_abs, batch_count=3203853.3333333335, ans=0.2 2023-11-26 03:43:44,968 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=3203920.0, ans=0.125 2023-11-26 03:43:46,881 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 480600 2023-11-26 03:44:11,634 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=17.75 vs. limit=22.5 2023-11-26 03:44:15,073 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=3204120.0, ans=0.125 2023-11-26 03:44:15,097 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3204120.0, ans=0.125 2023-11-26 03:44:19,953 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.338e+01 8.387e+01 9.006e+01 9.801e+01 1.650e+02, threshold=1.801e+02, percent-clipped=0.0 2023-11-26 03:44:19,980 INFO [train_asr.py:1235] (3/4) Epoch 40, batch 11700, loss[loss=0.07265, simple_loss=0.1051, pruned_loss=0.01351, audio_tagging_loss=0.006591, over 16076.00 frames. ], tot_loss[loss=0.06569, simple_loss=0.08891, pruned_loss=0.01244, audio_tagging_loss=0.008802, over 3041787.27 frames. 
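
grad_scale in the batch records is the fp16 dynamic loss scale, and its walk through 8.0 -> 16.0 -> 32.0 -> 16.0 -> 32.0 across this stretch is the usual backoff/growth behaviour: halve when a batch overflows to inf/nan, double again after a run of clean steps. The generic torch mechanism looks like the sketch below; the growth interval and the exact policy in train_asr.py are assumptions here:

    import torch

    scaler = torch.cuda.amp.GradScaler(
        init_scale=8.0,        # matches the smallest grad_scale seen above
        backoff_factor=0.5,    # halve on an inf/nan gradient
        growth_factor=2.0,     # double after growth_interval clean steps
        growth_interval=2000,  # torch's default; the run's value is unknown
    )

    def fp16_step(model, optimizer, batch, criterion):
        optimizer.zero_grad()
        with torch.cuda.amp.autocast():
            loss = criterion(model(batch["inputs"]), batch["targets"])
        scaler.scale(loss).backward()
        scaler.step(optimizer)   # skipped internally if grads overflowed
        scaler.update()          # this is where grad_scale halves/doubles
        return loss.detach()
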
], batch size: 58, lr: 1.68e-03, grad_scale: 32.0 2023-11-26 03:44:30,799 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer_ff3.min_abs, batch_count=3204253.3333333335, ans=0.2 2023-11-26 03:44:38,141 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3204253.3333333335, ans=0.1 2023-11-26 03:44:42,872 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 480650 2023-11-26 03:44:52,518 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=3204386.6666666665, ans=0.125 2023-11-26 03:44:52,779 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=12.09 vs. limit=15.0 2023-11-26 03:45:04,585 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=3204453.3333333335, ans=0.2 2023-11-26 03:45:09,793 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=3204453.3333333335, ans=0.09899494936611666 2023-11-26 03:45:14,097 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=3204453.3333333335, ans=0.125 2023-11-26 03:45:15,983 INFO [train_asr.py:1235] (3/4) Epoch 40, batch 11750, loss[loss=0.06325, simple_loss=0.08706, pruned_loss=0.01221, audio_tagging_loss=0.007503, over 13958.00 frames. ], tot_loss[loss=0.06594, simple_loss=0.0892, pruned_loss=0.01251, audio_tagging_loss=0.008831, over 3040077.34 frames. ], batch size: 54, lr: 1.68e-03, grad_scale: 32.0 2023-11-26 03:45:19,506 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=3204520.0, ans=0.2 2023-11-26 03:45:38,306 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 480700 2023-11-26 03:45:45,833 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3204653.3333333335, ans=0.125 2023-11-26 03:46:11,480 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.706e+01 8.820e+01 9.557e+01 1.032e+02 1.520e+02, threshold=1.911e+02, percent-clipped=0.0 2023-11-26 03:46:11,508 INFO [train_asr.py:1235] (3/4) Epoch 40, batch 11800, loss[loss=0.06303, simple_loss=0.08354, pruned_loss=0.01176, audio_tagging_loss=0.009496, over 15533.00 frames. ], tot_loss[loss=0.06627, simple_loss=0.08987, pruned_loss=0.01252, audio_tagging_loss=0.008813, over 3043466.88 frames. ], batch size: 58, lr: 1.68e-03, grad_scale: 32.0 2023-11-26 03:46:29,350 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=3204920.0, ans=0.07 2023-11-26 03:46:30,910 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.06 vs. limit=6.0 2023-11-26 03:46:34,374 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 480750 2023-11-26 03:47:07,426 INFO [train_asr.py:1235] (3/4) Epoch 40, batch 11850, loss[loss=0.05819, simple_loss=0.077, pruned_loss=0.01, audio_tagging_loss=0.009689, over 15620.00 frames. ], tot_loss[loss=0.06664, simple_loss=0.09015, pruned_loss=0.01264, audio_tagging_loss=0.008917, over 3046863.83 frames. 
], batch size: 56, lr: 1.68e-03, grad_scale: 16.0 2023-11-26 03:47:14,034 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=3205186.6666666665, ans=0.07 2023-11-26 03:47:29,854 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 480800 2023-11-26 03:47:38,193 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=3205320.0, ans=0.0 2023-11-26 03:47:47,242 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=3205386.6666666665, ans=0.125 2023-11-26 03:48:03,907 INFO [train_asr.py:1235] (3/4) Epoch 40, batch 11900, loss[loss=0.07436, simple_loss=0.1007, pruned_loss=0.01302, audio_tagging_loss=0.01101, over 15207.00 frames. ], tot_loss[loss=0.06656, simple_loss=0.09003, pruned_loss=0.01254, audio_tagging_loss=0.009003, over 3045802.74 frames. ], batch size: 57, lr: 1.68e-03, grad_scale: 16.0 2023-11-26 03:48:04,896 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.914e+01 8.863e+01 9.443e+01 1.007e+02 1.384e+02, threshold=1.889e+02, percent-clipped=0.0 2023-11-26 03:48:10,473 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=3205520.0, ans=0.0 2023-11-26 03:48:19,705 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=12.97 vs. limit=22.5 2023-11-26 03:48:25,792 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 480850 2023-11-26 03:48:27,619 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3205653.3333333335, ans=0.1 2023-11-26 03:48:32,834 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=3205653.3333333335, ans=0.125 2023-11-26 03:48:34,998 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=3205653.3333333335, ans=0.0 2023-11-26 03:48:47,634 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=3205786.6666666665, ans=0.125 2023-11-26 03:48:49,933 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=1.90 vs. limit=6.0 2023-11-26 03:48:52,937 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=3205786.6666666665, ans=0.0 2023-11-26 03:48:59,010 INFO [train_asr.py:1235] (3/4) Epoch 40, batch 11950, loss[loss=0.06656, simple_loss=0.087, pruned_loss=0.01511, audio_tagging_loss=0.00796, over 15033.00 frames. ], tot_loss[loss=0.06675, simple_loss=0.09022, pruned_loss=0.01264, audio_tagging_loss=0.008994, over 3047619.92 frames. ], batch size: 57, lr: 1.68e-03, grad_scale: 16.0 2023-11-26 03:49:04,161 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=3205853.3333333335, ans=0.2 2023-11-26 03:49:04,186 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=3205853.3333333335, ans=0.125 2023-11-26 03:49:05,672 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=11.19 vs. 
limit=15.0 2023-11-26 03:49:06,367 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=3205853.3333333335, ans=0.2 2023-11-26 03:49:21,505 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 480900 2023-11-26 03:49:53,274 INFO [train_asr.py:1235] (3/4) Epoch 40, batch 12000, loss[loss=0.06433, simple_loss=0.08349, pruned_loss=0.01364, audio_tagging_loss=0.008943, over 15189.00 frames. ], tot_loss[loss=0.06745, simple_loss=0.09097, pruned_loss=0.0128, audio_tagging_loss=0.009161, over 3050398.24 frames. ], batch size: 55, lr: 1.68e-03, grad_scale: 32.0 2023-11-26 03:49:53,274 INFO [train_asr.py:1258] (3/4) Computing validation loss 2023-11-26 03:50:03,937 INFO [zipformer.py:1877] (3/4) name=encoder.encoders.3.encoder.layers.0.self_attn_weights, attn_weights_entropy = tensor([3.7853, 2.1710, 3.4009, 3.4925, 3.2075, 3.4154, 3.1401, 3.4186], device='cuda:3') 2023-11-26 03:50:22,414 INFO [zipformer.py:1877] (3/4) name=encoder.encoders.2.encoder.layers.1.self_attn_weights, attn_weights_entropy = tensor([4.6120, 3.6991, 3.9861, 3.4787], device='cuda:3') 2023-11-26 03:50:25,664 INFO [train_asr.py:1267] (3/4) Epoch 40, validation: loss=0.0579, simple_loss=0.05064, pruned_loss=0.005235, audio_tagging_loss=0.02734, over 4681554.00 frames. 2023-11-26 03:50:25,665 INFO [train_asr.py:1268] (3/4) Maximum memory allocated so far is 24894MB 2023-11-26 03:50:26,639 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.710e+01 8.771e+01 9.492e+01 1.018e+02 1.259e+02, threshold=1.898e+02, percent-clipped=0.0 2023-11-26 03:50:29,831 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3206186.6666666665, ans=0.1 2023-11-26 03:50:39,621 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3206253.3333333335, ans=0.1 2023-11-26 03:50:47,184 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 480950 2023-11-26 03:50:49,355 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3206320.0, ans=0.1 2023-11-26 03:51:24,317 INFO [train_asr.py:1235] (3/4) Epoch 41, batch 0, loss[loss=0.05612, simple_loss=0.05259, pruned_loss=0.003863, audio_tagging_loss=0.02596, over 15231.00 frames. ], tot_loss[loss=0.05612, simple_loss=0.05259, pruned_loss=0.003863, audio_tagging_loss=0.02596, over 15231.00 frames. ], batch size: 59, lr: 1.66e-03, grad_scale: 32.0 2023-11-26 03:51:24,317 INFO [train_asr.py:1258] (3/4) Computing validation loss 2023-11-26 03:51:55,671 INFO [train_asr.py:1267] (3/4) Epoch 41, validation: loss=0.05811, simple_loss=0.05068, pruned_loss=0.005302, audio_tagging_loss=0.02746, over 4681554.00 frames. 2023-11-26 03:51:55,672 INFO [train_asr.py:1268] (3/4) Maximum memory allocated so far is 24894MB 2023-11-26 03:52:13,851 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.min_positive, batch_count=3206426.6666666665, ans=0.05 2023-11-26 03:52:16,124 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=3206426.6666666665, ans=0.0 2023-11-26 03:52:39,822 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.1.whiten, num_groups=1, num_channels=192, metric=3.57 vs. 
limit=12.0 2023-11-26 03:52:44,405 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 481000 2023-11-26 03:52:51,486 INFO [train_asr.py:1235] (3/4) Epoch 41, batch 50, loss[loss=0.05761, simple_loss=0.06221, pruned_loss=0.008955, audio_tagging_loss=0.01755, over 15381.00 frames. ], tot_loss[loss=0.07572, simple_loss=0.0918, pruned_loss=0.01246, audio_tagging_loss=0.01735, over 685444.28 frames. ], batch size: 59, lr: 1.66e-03, grad_scale: 32.0 2023-11-26 03:52:52,794 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=3206693.3333333335, ans=0.125 2023-11-26 03:53:00,239 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.0.conv_module1.whiten, num_groups=1, num_channels=192, metric=14.17 vs. limit=15.0 2023-11-26 03:53:06,050 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3206760.0, ans=0.125 2023-11-26 03:53:11,515 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3206760.0, ans=0.125 2023-11-26 03:53:17,910 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=13.78 vs. limit=15.0 2023-11-26 03:53:19,600 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.787e+01 9.351e+01 1.009e+02 1.085e+02 1.541e+02, threshold=2.017e+02, percent-clipped=0.0 2023-11-26 03:53:25,230 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=3206893.3333333335, ans=0.0 2023-11-26 03:53:27,415 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3206893.3333333335, ans=0.125 2023-11-26 03:53:41,039 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 481050 2023-11-26 03:53:45,461 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3206960.0, ans=0.1 2023-11-26 03:53:47,339 INFO [train_asr.py:1235] (3/4) Epoch 41, batch 100, loss[loss=0.06635, simple_loss=0.07305, pruned_loss=0.01038, audio_tagging_loss=0.01945, over 14526.00 frames. ], tot_loss[loss=0.07467, simple_loss=0.09126, pruned_loss=0.01231, audio_tagging_loss=0.01672, over 1204284.16 frames. ], batch size: 56, lr: 1.66e-03, grad_scale: 32.0 2023-11-26 03:53:52,270 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3207026.6666666665, ans=0.125 2023-11-26 03:53:57,521 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=3207093.3333333335, ans=0.2 2023-11-26 03:54:01,669 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=3207093.3333333335, ans=0.2 2023-11-26 03:54:22,988 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=3207226.6666666665, ans=0.125 2023-11-26 03:54:36,746 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 481100 2023-11-26 03:54:43,048 INFO [train_asr.py:1235] (3/4) Epoch 41, batch 150, loss[loss=0.06557, simple_loss=0.09323, pruned_loss=0.009435, audio_tagging_loss=0.00952, over 15543.00 frames. 
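
The zipformer.py:1877 lines emitted during the validation passes above print attn_weights_entropy, one value per attention head (note the 8-element tensor for the stack whose layers have 8 heads and 4-element tensors elsewhere). It is a quick collapse diagnostic: entropy near log(src_len) means a head attends broadly, entropy near 0 means it has locked onto single frames. A hedged sketch of the statistic (the real code may average over batches or positions differently):

    import torch

    def attn_weights_entropy(attn: torch.Tensor) -> torch.Tensor:
        """Mean per-head entropy of already-softmaxed attention weights.

        attn: (num_heads, tgt_len, src_len); returns (num_heads,).
        """
        ent = -(attn * (attn + 1e-20).log()).sum(dim=-1)  # (heads, tgt)
        return ent.mean(dim=-1)

    attn = torch.softmax(torch.randn(4, 50, 50), dim=-1)
    print(attn_weights_entropy(attn))   # high entropies, the same ballpark
                                        # as the logged 3.x-5.x values
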
], tot_loss[loss=0.07261, simple_loss=0.09062, pruned_loss=0.01233, audio_tagging_loss=0.01497, over 1607637.17 frames. ], batch size: 57, lr: 1.66e-03, grad_scale: 32.0 2023-11-26 03:54:51,910 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3207360.0, ans=0.0 2023-11-26 03:54:59,420 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.min_positive, batch_count=3207426.6666666665, ans=0.025 2023-11-26 03:55:10,788 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.583e+01 9.007e+01 9.477e+01 1.014e+02 1.465e+02, threshold=1.895e+02, percent-clipped=0.0 2023-11-26 03:55:18,670 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=8.24 vs. limit=15.0 2023-11-26 03:55:30,313 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3207626.6666666665, ans=0.1 2023-11-26 03:55:32,195 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 481150 2023-11-26 03:55:34,960 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=13.86 vs. limit=22.5 2023-11-26 03:55:38,447 INFO [train_asr.py:1235] (3/4) Epoch 41, batch 200, loss[loss=0.07915, simple_loss=0.1041, pruned_loss=0.01655, audio_tagging_loss=0.01055, over 15308.00 frames. ], tot_loss[loss=0.07071, simple_loss=0.09006, pruned_loss=0.01246, audio_tagging_loss=0.01322, over 1927227.43 frames. ], batch size: 58, lr: 1.66e-03, grad_scale: 32.0 2023-11-26 03:55:43,420 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=3207693.3333333335, ans=0.0 2023-11-26 03:55:48,346 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=3207693.3333333335, ans=0.125 2023-11-26 03:56:04,199 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=3207826.6666666665, ans=0.07 2023-11-26 03:56:15,988 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3207893.3333333335, ans=0.1 2023-11-26 03:56:28,206 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 481200 2023-11-26 03:56:35,418 INFO [train_asr.py:1235] (3/4) Epoch 41, batch 250, loss[loss=0.06453, simple_loss=0.09065, pruned_loss=0.008951, audio_tagging_loss=0.01025, over 14944.00 frames. ], tot_loss[loss=0.07016, simple_loss=0.09117, pruned_loss=0.01264, audio_tagging_loss=0.01194, over 2176196.02 frames. 
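
A reading note for the tot_loss records: at Epoch 41, batch 0 the tot_loss equals the single-batch loss "over 15231.00 frames.", and the frame count then climbs (685444 at batch 50, 1607637 at batch 150, 2176196 at batch 250) toward the ~3.0e6 plateau seen late in epoch 40. The shrinking increments suggest a frame-weighted average in which older batches are geometrically down-weighted and which restarts at each epoch boundary; a sketch under that assumption (the decay constant is a guess):

    class TotLoss:
        """Frame-weighted, geometrically decayed average of batch losses."""

        def __init__(self, decay=0.995):     # decay value is a guess
            self.decay = decay
            self.sums = {}                   # metric -> sum(loss * frames)
            self.frames = 0.0                # the "over N frames." count

        def update(self, batch_losses: dict, num_frames: float):
            self.frames = self.frames * self.decay + num_frames
            for name, value in batch_losses.items():
                self.sums[name] = (self.sums.get(name, 0.0) * self.decay
                                   + value * num_frames)

        def averages(self) -> dict:
            return {k: v / self.frames for k, v in self.sums.items()}

    tot = TotLoss()
    tot.update({"loss": 0.05612}, 15231.0)
    print(tot.averages(), tot.frames)   # at batch 0, tot_loss == batch loss
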
], batch size: 56, lr: 1.66e-03, grad_scale: 16.0 2023-11-26 03:56:35,634 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-26 03:56:36,805 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3208026.6666666665, ans=0.125 2023-11-26 03:56:53,873 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=3208093.3333333335, ans=0.125 2023-11-26 03:57:04,325 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.671e+01 8.798e+01 9.430e+01 1.056e+02 1.787e+02, threshold=1.886e+02, percent-clipped=0.0 2023-11-26 03:57:24,967 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 481250 2023-11-26 03:57:31,759 INFO [train_asr.py:1235] (3/4) Epoch 41, batch 300, loss[loss=0.07081, simple_loss=0.09952, pruned_loss=0.01309, audio_tagging_loss=0.007956, over 16021.00 frames. ], tot_loss[loss=0.06957, simple_loss=0.09137, pruned_loss=0.01282, audio_tagging_loss=0.01106, over 2374479.45 frames. ], batch size: 56, lr: 1.66e-03, grad_scale: 16.0 2023-11-26 03:57:46,788 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=3208426.6666666665, ans=0.0 2023-11-26 03:57:58,595 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3208493.3333333335, ans=0.1 2023-11-26 03:58:05,137 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=3208560.0, ans=0.09899494936611666 2023-11-26 03:58:05,521 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=2.25 vs. limit=15.0 2023-11-26 03:58:07,723 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=3208560.0, ans=0.125 2023-11-26 03:58:10,364 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=3208560.0, ans=0.2 2023-11-26 03:58:20,679 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 481300 2023-11-26 03:58:26,975 INFO [train_asr.py:1235] (3/4) Epoch 41, batch 350, loss[loss=0.06659, simple_loss=0.09359, pruned_loss=0.007907, audio_tagging_loss=0.01189, over 14624.00 frames. ], tot_loss[loss=0.06873, simple_loss=0.09123, pruned_loss=0.0127, audio_tagging_loss=0.01042, over 2524572.79 frames. ], batch size: 52, lr: 1.66e-03, grad_scale: 8.0 2023-11-26 03:58:28,258 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=3208693.3333333335, ans=0.2 2023-11-26 03:58:36,138 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3208693.3333333335, ans=0.125 2023-11-26 03:58:48,843 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=11.20 vs. 
limit=15.0 2023-11-26 03:58:52,536 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=3208826.6666666665, ans=0.0 2023-11-26 03:58:56,946 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=3208826.6666666665, ans=0.2 2023-11-26 03:58:57,803 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 6.712e+01 8.469e+01 9.311e+01 1.023e+02 1.499e+02, threshold=1.862e+02, percent-clipped=0.0 2023-11-26 03:58:58,082 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=3208826.6666666665, ans=0.2 2023-11-26 03:59:01,196 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=3208893.3333333335, ans=0.125 2023-11-26 03:59:16,448 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 481350 2023-11-26 03:59:18,104 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=18.25 vs. limit=22.5 2023-11-26 03:59:19,670 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3208960.0, ans=0.125 2023-11-26 03:59:22,709 INFO [train_asr.py:1235] (3/4) Epoch 41, batch 400, loss[loss=0.05389, simple_loss=0.06911, pruned_loss=0.006792, audio_tagging_loss=0.01254, over 16788.00 frames. ], tot_loss[loss=0.06767, simple_loss=0.09037, pruned_loss=0.01245, audio_tagging_loss=0.01003, over 2645521.83 frames. ], batch size: 62, lr: 1.66e-03, grad_scale: 16.0 2023-11-26 03:59:42,380 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=3209093.3333333335, ans=0.0 2023-11-26 03:59:43,456 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3209093.3333333335, ans=0.125 2023-11-26 04:00:11,917 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 481400 2023-11-26 04:00:18,077 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.1.conv_module1.whiten, num_groups=1, num_channels=192, metric=13.98 vs. limit=15.0 2023-11-26 04:00:19,427 INFO [train_asr.py:1235] (3/4) Epoch 41, batch 450, loss[loss=0.07057, simple_loss=0.1006, pruned_loss=0.01352, audio_tagging_loss=0.006728, over 14788.00 frames. ], tot_loss[loss=0.06705, simple_loss=0.0896, pruned_loss=0.0124, audio_tagging_loss=0.00985, over 2732434.26 frames. 
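
Note that the printed lr is frozen at 1.68e-03 through all of epoch 40 and steps to 1.66e-03 only at the epoch-41 boundary. That is the signature of an Eden-style schedule (icefall's optim.py), which decays in both batch count and epoch: near batch 480,000 the batch factor is essentially flat, so only the epoch factor still moves. A sketch, with settings chosen so the output reproduces the logged values; whether they are the run's actual settings cannot be confirmed from this section alone:

    def eden_lr(base_lr, batch, epoch, lr_batches, lr_epochs):
        """Eden-style lr: smooth decay in both batch and epoch."""
        batch_factor = ((batch ** 2 + lr_batches ** 2)
                        / lr_batches ** 2) ** -0.25
        epoch_factor = ((epoch ** 2 + lr_epochs ** 2)
                        / lr_epochs ** 2) ** -0.25
        return base_lr * batch_factor * epoch_factor

    for epoch in (39, 40):   # epoch counter assumed 0-based here
        print(f"{eden_lr(0.045, 480000, epoch, 7500, 3.5):.2e}")
    # -> 1.68e-03 then 1.66e-03, stepping only at the epoch boundary
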
], batch size: 55, lr: 1.66e-03, grad_scale: 16.0 2023-11-26 04:00:19,684 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=3209360.0, ans=0.2 2023-11-26 04:00:39,824 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=3209493.3333333335, ans=0.2 2023-11-26 04:00:48,531 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 6.983e+01 8.492e+01 9.023e+01 9.553e+01 1.244e+02, threshold=1.805e+02, percent-clipped=0.0 2023-11-26 04:01:08,237 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 481450 2023-11-26 04:01:08,407 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3209626.6666666665, ans=0.125 2023-11-26 04:01:14,668 INFO [train_asr.py:1235] (3/4) Epoch 41, batch 500, loss[loss=0.07438, simple_loss=0.09172, pruned_loss=0.0157, audio_tagging_loss=0.01281, over 14414.00 frames. ], tot_loss[loss=0.06697, simple_loss=0.08958, pruned_loss=0.01259, audio_tagging_loss=0.009586, over 2804396.32 frames. ], batch size: 54, lr: 1.66e-03, grad_scale: 16.0 2023-11-26 04:01:23,926 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=3209693.3333333335, ans=0.2 2023-11-26 04:01:29,625 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=15.72 vs. limit=22.5 2023-11-26 04:01:39,642 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=3209826.6666666665, ans=0.2 2023-11-26 04:01:45,920 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=3209826.6666666665, ans=0.125 2023-11-26 04:01:59,904 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.min_positive, batch_count=3209960.0, ans=0.05 2023-11-26 04:02:04,049 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 481500 2023-11-26 04:02:08,963 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=3209960.0, ans=0.2 2023-11-26 04:02:10,840 INFO [train_asr.py:1235] (3/4) Epoch 41, batch 550, loss[loss=0.07748, simple_loss=0.0984, pruned_loss=0.01995, audio_tagging_loss=0.008328, over 15941.00 frames. ], tot_loss[loss=0.06679, simple_loss=0.08996, pruned_loss=0.01238, audio_tagging_loss=0.009429, over 2860185.96 frames. 
], batch size: 61, lr: 1.66e-03, grad_scale: 16.0 2023-11-26 04:02:11,001 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=3210026.6666666665, ans=0.0 2023-11-26 04:02:34,021 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3210160.0, ans=0.1 2023-11-26 04:02:41,090 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.145e+01 8.521e+01 9.213e+01 9.979e+01 1.259e+02, threshold=1.843e+02, percent-clipped=0.0 2023-11-26 04:02:51,392 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3210226.6666666665, ans=0.125 2023-11-26 04:02:57,689 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=3210293.3333333335, ans=0.0 2023-11-26 04:02:59,638 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 481550 2023-11-26 04:03:06,630 INFO [train_asr.py:1235] (3/4) Epoch 41, batch 600, loss[loss=0.05031, simple_loss=0.065, pruned_loss=0.008618, audio_tagging_loss=0.009193, over 13449.00 frames. ], tot_loss[loss=0.06586, simple_loss=0.08882, pruned_loss=0.01212, audio_tagging_loss=0.00933, over 2899159.45 frames. ], batch size: 53, lr: 1.66e-03, grad_scale: 16.0 2023-11-26 04:03:10,059 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.max_abs, batch_count=3210360.0, ans=10.0 2023-11-26 04:03:17,432 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=3210426.6666666665, ans=0.0 2023-11-26 04:03:29,086 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3210493.3333333335, ans=0.125 2023-11-26 04:03:31,794 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-11-26 04:03:55,067 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 481600 2023-11-26 04:04:01,751 INFO [train_asr.py:1235] (3/4) Epoch 41, batch 650, loss[loss=0.06405, simple_loss=0.08973, pruned_loss=0.01127, audio_tagging_loss=0.007919, over 14296.00 frames. ], tot_loss[loss=0.06628, simple_loss=0.08966, pruned_loss=0.01219, audio_tagging_loss=0.009268, over 2928101.85 frames. ], batch size: 55, lr: 1.66e-03, grad_scale: 16.0 2023-11-26 04:04:05,079 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3210693.3333333335, ans=0.125 2023-11-26 04:04:32,178 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.220e+01 8.848e+01 9.324e+01 9.991e+01 1.249e+02, threshold=1.865e+02, percent-clipped=0.0 2023-11-26 04:04:48,542 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=3210960.0, ans=0.0 2023-11-26 04:04:50,517 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 481650 2023-11-26 04:04:57,405 INFO [train_asr.py:1235] (3/4) Epoch 41, batch 700, loss[loss=0.04473, simple_loss=0.05631, pruned_loss=0.007042, audio_tagging_loss=0.00953, over 15647.00 frames. ], tot_loss[loss=0.06602, simple_loss=0.08935, pruned_loss=0.01211, audio_tagging_loss=0.009246, over 2953530.97 frames. 
], batch size: 59, lr: 1.66e-03, grad_scale: 16.0 2023-11-26 04:05:13,193 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=3211093.3333333335, ans=0.125 2023-11-26 04:05:15,806 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=3211093.3333333335, ans=0.125 2023-11-26 04:05:33,337 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=19.73 vs. limit=22.5 2023-11-26 04:05:38,790 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=3211226.6666666665, ans=0.0 2023-11-26 04:05:43,586 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=512, metric=3.78 vs. limit=15.0 2023-11-26 04:05:46,373 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 481700 2023-11-26 04:05:52,714 INFO [train_asr.py:1235] (3/4) Epoch 41, batch 750, loss[loss=0.0669, simple_loss=0.07971, pruned_loss=0.01684, audio_tagging_loss=0.01021, over 13696.00 frames. ], tot_loss[loss=0.06643, simple_loss=0.09004, pruned_loss=0.01229, audio_tagging_loss=0.00912, over 2979464.26 frames. ], batch size: 55, lr: 1.66e-03, grad_scale: 16.0 2023-11-26 04:06:00,749 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=7.02 vs. limit=15.0 2023-11-26 04:06:15,445 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=3211493.3333333335, ans=0.0 2023-11-26 04:06:23,103 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.257e+01 8.760e+01 9.267e+01 1.006e+02 1.673e+02, threshold=1.853e+02, percent-clipped=0.0 2023-11-26 04:06:26,687 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3211560.0, ans=0.1 2023-11-26 04:06:32,022 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=3211560.0, ans=0.04949747468305833 2023-11-26 04:06:32,943 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3211560.0, ans=0.125 2023-11-26 04:06:40,196 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=10.00 vs. limit=15.0 2023-11-26 04:06:40,715 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=3211626.6666666665, ans=0.125 2023-11-26 04:06:41,747 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 481750 2023-11-26 04:06:48,718 INFO [train_asr.py:1235] (3/4) Epoch 41, batch 800, loss[loss=0.08566, simple_loss=0.1231, pruned_loss=0.01802, audio_tagging_loss=0.006101, over 15878.00 frames. ], tot_loss[loss=0.06631, simple_loss=0.08966, pruned_loss=0.01231, audio_tagging_loss=0.009162, over 2993823.61 frames. 
], batch size: 56, lr: 1.66e-03, grad_scale: 32.0 2023-11-26 04:06:52,097 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3211693.3333333335, ans=0.125 2023-11-26 04:07:36,267 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=3211960.0, ans=0.2 2023-11-26 04:07:37,239 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 481800 2023-11-26 04:07:44,315 INFO [train_asr.py:1235] (3/4) Epoch 41, batch 850, loss[loss=0.08671, simple_loss=0.1224, pruned_loss=0.01748, audio_tagging_loss=0.00803, over 14891.00 frames. ], tot_loss[loss=0.06694, simple_loss=0.09049, pruned_loss=0.01254, audio_tagging_loss=0.009158, over 3008276.84 frames. ], batch size: 53, lr: 1.66e-03, grad_scale: 32.0 2023-11-26 04:07:49,961 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3212026.6666666665, ans=0.1 2023-11-26 04:08:14,112 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.717e+01 8.719e+01 9.497e+01 1.051e+02 1.257e+02, threshold=1.899e+02, percent-clipped=0.0 2023-11-26 04:08:14,472 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=3212160.0, ans=0.04949747468305833 2023-11-26 04:08:15,481 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3212160.0, ans=0.125 2023-11-26 04:08:20,029 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=11.32 vs. limit=15.0 2023-11-26 04:08:24,854 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=3212226.6666666665, ans=0.2 2023-11-26 04:08:28,994 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten2.whitening_limit, batch_count=3212293.3333333335, ans=15.0 2023-11-26 04:08:32,631 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 481850 2023-11-26 04:08:33,987 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=3212293.3333333335, ans=0.04949747468305833 2023-11-26 04:08:38,952 INFO [train_asr.py:1235] (3/4) Epoch 41, batch 900, loss[loss=0.08074, simple_loss=0.1091, pruned_loss=0.01889, audio_tagging_loss=0.00731, over 14680.00 frames. ], tot_loss[loss=0.06709, simple_loss=0.09033, pruned_loss=0.01272, audio_tagging_loss=0.009206, over 3014882.92 frames. 
], batch size: 56, lr: 1.66e-03, grad_scale: 32.0 2023-11-26 04:08:52,410 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3212426.6666666665, ans=0.0 2023-11-26 04:09:05,670 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3212493.3333333335, ans=0.125 2023-11-26 04:09:17,685 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=3212560.0, ans=0.2 2023-11-26 04:09:24,655 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=3212626.6666666665, ans=0.0 2023-11-26 04:09:27,756 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 481900 2023-11-26 04:09:30,991 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=3212626.6666666665, ans=0.125 2023-11-26 04:09:34,172 INFO [train_asr.py:1235] (3/4) Epoch 41, batch 950, loss[loss=0.05502, simple_loss=0.07477, pruned_loss=0.008641, audio_tagging_loss=0.008992, over 15464.00 frames. ], tot_loss[loss=0.06764, simple_loss=0.09119, pruned_loss=0.01296, audio_tagging_loss=0.009077, over 3024196.98 frames. ], batch size: 58, lr: 1.66e-03, grad_scale: 32.0 2023-11-26 04:09:41,311 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3212693.3333333335, ans=0.125 2023-11-26 04:09:42,461 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3212693.3333333335, ans=0.0 2023-11-26 04:09:46,747 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3212760.0, ans=0.1 2023-11-26 04:10:04,068 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.539e+01 8.675e+01 9.421e+01 1.013e+02 1.384e+02, threshold=1.884e+02, percent-clipped=0.0 2023-11-26 04:10:22,261 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=10.79 vs. limit=15.0 2023-11-26 04:10:23,871 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 481950 2023-11-26 04:10:25,132 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3212960.0, ans=0.125 2023-11-26 04:10:30,173 INFO [train_asr.py:1235] (3/4) Epoch 41, batch 1000, loss[loss=0.07451, simple_loss=0.1038, pruned_loss=0.01463, audio_tagging_loss=0.00798, over 14102.00 frames. ], tot_loss[loss=0.0672, simple_loss=0.09066, pruned_loss=0.01289, audio_tagging_loss=0.008976, over 3021791.38 frames. ], batch size: 54, lr: 1.66e-03, grad_scale: 32.0 2023-11-26 04:10:43,609 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=3213093.3333333335, ans=0.125 2023-11-26 04:10:54,192 WARNING [train_asr.py:1481] (3/4) Exclude cut with ID unbalanced/5Y6u9AlD9S0_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. 
Number of tokens: 24 2023-11-26 04:11:13,920 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=3213293.3333333335, ans=0.125 2023-11-26 04:11:18,677 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3213293.3333333335, ans=0.125 2023-11-26 04:11:18,788 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=3213293.3333333335, ans=0.125 2023-11-26 04:11:19,744 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 482000 2023-11-26 04:11:20,951 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3213293.3333333335, ans=0.1 2023-11-26 04:11:26,292 INFO [train_asr.py:1235] (3/4) Epoch 41, batch 1050, loss[loss=0.05768, simple_loss=0.07828, pruned_loss=0.01118, audio_tagging_loss=0.007364, over 15422.00 frames. ], tot_loss[loss=0.06737, simple_loss=0.09103, pruned_loss=0.01299, audio_tagging_loss=0.00887, over 3027757.66 frames. ], batch size: 57, lr: 1.66e-03, grad_scale: 16.0 2023-11-26 04:11:41,402 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3213426.6666666665, ans=0.1 2023-11-26 04:11:49,343 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=3213493.3333333335, ans=0.125 2023-11-26 04:11:57,697 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.120e+01 8.643e+01 9.285e+01 1.025e+02 1.343e+02, threshold=1.857e+02, percent-clipped=0.0 2023-11-26 04:12:13,414 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=3213626.6666666665, ans=0.0 2023-11-26 04:12:16,599 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 482050 2023-11-26 04:12:22,926 INFO [train_asr.py:1235] (3/4) Epoch 41, batch 1100, loss[loss=0.07634, simple_loss=0.09784, pruned_loss=0.01681, audio_tagging_loss=0.01061, over 15513.00 frames. ], tot_loss[loss=0.06721, simple_loss=0.09082, pruned_loss=0.01292, audio_tagging_loss=0.008881, over 3037707.56 frames. ], batch size: 58, lr: 1.66e-03, grad_scale: 16.0 2023-11-26 04:12:25,130 WARNING [train_asr.py:1481] (3/4) Exclude cut with ID unbalanced/AWHnJAqurec_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-26 04:13:11,055 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 482100 2023-11-26 04:13:14,941 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.1.conv_module1.whiten, num_groups=1, num_channels=192, metric=14.84 vs. limit=15.0 2023-11-26 04:13:17,888 INFO [train_asr.py:1235] (3/4) Epoch 41, batch 1150, loss[loss=0.07223, simple_loss=0.1019, pruned_loss=0.01313, audio_tagging_loss=0.00816, over 14929.00 frames. ], tot_loss[loss=0.06747, simple_loss=0.09155, pruned_loss=0.01291, audio_tagging_loss=0.008778, over 3042959.24 frames. 
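Note on the "Exclude cut" WARNING above (it recurs below for other *_0.000_1.000.wav cuts): these are AudioSet clips carrying the placeholder transcript "Dummy text added as a place holder...". That text tokenizes to 24 BPE tokens, but a 1-second cut of 100 frames is only 23 frames after the 4x subsampling, and a transducer cannot align more output tokens than it has frames, so the cut is dropped from training. A sketch of that filter as we read it (names are ours; the subsampling formula is one standard 4x reduction that reproduces 100 -> 23):

```python
def frames_after_subsampling(num_frames: int) -> int:
    # Conv2dSubsampling-style 4x reduction: 100 input frames -> 23.
    return ((num_frames - 7) // 2 + 1) // 2

def keep_cut(num_frames: int, num_tokens: int) -> bool:
    # A transducer needs at least one frame per output token;
    # 23 frames < 24 tokens, hence the "Exclude cut" warnings.
    return frames_after_subsampling(num_frames) >= num_tokens

assert frames_after_subsampling(100) == 23
assert not keep_cut(100, 24)
```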
], batch size: 55, lr: 1.66e-03, grad_scale: 16.0 2023-11-26 04:13:38,358 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3214093.3333333335, ans=0.1 2023-11-26 04:13:39,474 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=3214160.0, ans=0.125 2023-11-26 04:13:48,699 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.669e+01 8.737e+01 9.281e+01 9.829e+01 1.139e+02, threshold=1.856e+02, percent-clipped=0.0 2023-11-26 04:14:06,865 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 482150 2023-11-26 04:14:13,200 INFO [train_asr.py:1235] (3/4) Epoch 41, batch 1200, loss[loss=0.094, simple_loss=0.1312, pruned_loss=0.02181, audio_tagging_loss=0.006607, over 15341.00 frames. ], tot_loss[loss=0.0676, simple_loss=0.09174, pruned_loss=0.01299, audio_tagging_loss=0.008743, over 3052415.83 frames. ], batch size: 55, lr: 1.66e-03, grad_scale: 32.0 2023-11-26 04:14:27,376 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=3214426.6666666665, ans=0.2 2023-11-26 04:14:27,547 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=6.81 vs. limit=15.0 2023-11-26 04:14:35,900 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3214493.3333333335, ans=0.125 2023-11-26 04:14:43,196 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3214493.3333333335, ans=0.125 2023-11-26 04:15:00,026 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3214626.6666666665, ans=0.125 2023-11-26 04:15:00,045 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=3214626.6666666665, ans=0.0 2023-11-26 04:15:02,002 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 482200 2023-11-26 04:15:06,184 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3214626.6666666665, ans=0.125 2023-11-26 04:15:09,165 INFO [train_asr.py:1235] (3/4) Epoch 41, batch 1250, loss[loss=0.06402, simple_loss=0.08135, pruned_loss=0.01291, audio_tagging_loss=0.01044, over 15072.00 frames. ], tot_loss[loss=0.06723, simple_loss=0.09128, pruned_loss=0.01289, audio_tagging_loss=0.008708, over 3046707.83 frames. ], batch size: 57, lr: 1.66e-03, grad_scale: 32.0 2023-11-26 04:15:39,739 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.598e+01 8.848e+01 9.499e+01 1.001e+02 1.397e+02, threshold=1.900e+02, percent-clipped=0.0 2023-11-26 04:15:57,542 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 482250 2023-11-26 04:16:03,851 INFO [train_asr.py:1235] (3/4) Epoch 41, batch 1300, loss[loss=0.04997, simple_loss=0.06249, pruned_loss=0.009629, audio_tagging_loss=0.009093, over 15309.00 frames. ], tot_loss[loss=0.06709, simple_loss=0.09115, pruned_loss=0.01281, audio_tagging_loss=0.008708, over 3045157.39 frames. 
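Note on the "Clipping_scale=2.0, grad-norm quartiles" lines: they are internally consistent in every instance in this section. The five numbers are min/Q1/median/Q3/max of recent gradient norms, and the reported threshold is always Clipping_scale times the median, e.g. 2.0 * 9.281e+01 = 1.856e+02 in the entry above; percent-clipped says how many recent norms exceeded it. A small reconstruction of that report (not icefall's actual code):

```python
import statistics

def clipping_report(grad_norms: list[float], clipping_scale: float = 2.0):
    """Reproduce the quartile line: min/Q1/median/Q3/max of recent grad
    norms, plus a clipping threshold of clipping_scale * median."""
    q1, median, q3 = statistics.quantiles(grad_norms, n=4)
    quartiles = [min(grad_norms), q1, median, q3, max(grad_norms)]
    threshold = clipping_scale * median
    pct_clipped = 100.0 * sum(g > threshold for g in grad_norms) / len(grad_norms)
    return quartiles, threshold, pct_clipped
```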
], batch size: 57, lr: 1.66e-03, grad_scale: 32.0 2023-11-26 04:16:22,280 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=3215093.3333333335, ans=0.125 2023-11-26 04:16:24,342 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3215093.3333333335, ans=0.0 2023-11-26 04:16:28,643 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=3215160.0, ans=0.125 2023-11-26 04:16:34,839 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=3215160.0, ans=0.125 2023-11-26 04:16:34,917 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3215160.0, ans=0.1 2023-11-26 04:16:39,492 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=3.39 vs. limit=15.0 2023-11-26 04:16:53,406 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 482300 2023-11-26 04:16:59,833 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=10.05 vs. limit=15.0 2023-11-26 04:17:00,349 INFO [train_asr.py:1235] (3/4) Epoch 41, batch 1350, loss[loss=0.06359, simple_loss=0.08837, pruned_loss=0.01136, audio_tagging_loss=0.008048, over 14668.00 frames. ], tot_loss[loss=0.06644, simple_loss=0.09016, pruned_loss=0.01257, audio_tagging_loss=0.008787, over 3047528.14 frames. ], batch size: 55, lr: 1.66e-03, grad_scale: 32.0 2023-11-26 04:17:14,648 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=12.30 vs. limit=22.5 2023-11-26 04:17:19,764 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3215426.6666666665, ans=0.125 2023-11-26 04:17:31,337 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.022e+01 8.487e+01 8.991e+01 9.732e+01 2.025e+02, threshold=1.798e+02, percent-clipped=1.0 2023-11-26 04:17:37,445 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=3215560.0, ans=0.125 2023-11-26 04:17:41,001 WARNING [train_asr.py:1481] (3/4) Exclude cut with ID unbalanced/XdmbboqRBmQ_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-26 04:17:41,349 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=11.66 vs. 
limit=15.0 2023-11-26 04:17:49,431 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 482350 2023-11-26 04:17:50,565 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=3215626.6666666665, ans=0.125 2023-11-26 04:17:51,221 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3215626.6666666665, ans=0.125 2023-11-26 04:17:56,836 INFO [train_asr.py:1235] (3/4) Epoch 41, batch 1400, loss[loss=0.06122, simple_loss=0.08102, pruned_loss=0.01285, audio_tagging_loss=0.007854, over 14673.00 frames. ], tot_loss[loss=0.06638, simple_loss=0.08988, pruned_loss=0.01258, audio_tagging_loss=0.008861, over 3045616.41 frames. ], batch size: 57, lr: 1.66e-03, grad_scale: 32.0 2023-11-26 04:18:09,629 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3215760.0, ans=0.125 2023-11-26 04:18:20,149 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=17.25 vs. limit=22.5 2023-11-26 04:18:27,122 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=2.61 vs. limit=15.0 2023-11-26 04:18:34,001 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=3215893.3333333335, ans=0.0 2023-11-26 04:18:37,645 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3215893.3333333335, ans=0.125 2023-11-26 04:18:45,873 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 482400 2023-11-26 04:18:47,394 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=6.42 vs. limit=15.0 2023-11-26 04:18:51,668 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=3216026.6666666665, ans=0.0 2023-11-26 04:18:52,445 INFO [train_asr.py:1235] (3/4) Epoch 41, batch 1450, loss[loss=0.0692, simple_loss=0.09778, pruned_loss=0.01225, audio_tagging_loss=0.008062, over 15137.00 frames. ], tot_loss[loss=0.06714, simple_loss=0.09074, pruned_loss=0.01282, audio_tagging_loss=0.008952, over 3051403.59 frames. ], batch size: 59, lr: 1.66e-03, grad_scale: 32.0 2023-11-26 04:19:24,160 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.356e+01 8.656e+01 9.210e+01 9.975e+01 1.432e+02, threshold=1.842e+02, percent-clipped=0.0 2023-11-26 04:19:24,807 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.77 vs. limit=6.0 2023-11-26 04:19:35,061 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=3216226.6666666665, ans=0.0 2023-11-26 04:19:41,264 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 482450 2023-11-26 04:19:48,062 INFO [train_asr.py:1235] (3/4) Epoch 41, batch 1500, loss[loss=0.0568, simple_loss=0.06756, pruned_loss=0.01048, audio_tagging_loss=0.01255, over 14699.00 frames. ], tot_loss[loss=0.06702, simple_loss=0.09045, pruned_loss=0.01275, audio_tagging_loss=0.009047, over 3040552.57 frames. 
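Note on the "Whitening:" lines: each compares a measured statistic against its scheduled whitening_limit, and they appear even when the metric is far below the limit (metric=3.39 vs. limit=15.0 above), so they read as periodic diagnostics rather than violations. As we understand the statistic, it is a scale-free measure of how anisotropic the feature covariance is, equal to 1.0 for perfectly "white" features; a sketch under that assumption, not the module's actual code:

```python
import torch

def whitening_metric(x: torch.Tensor) -> float:
    """Anisotropy of the (d x d) feature covariance C of x (shape [n, d]):
    d * trace(C @ C) / trace(C)**2, which equals 1.0 iff all eigenvalues
    of C are equal (features fully 'white'). Our reconstruction of the
    quantity these log lines appear to report."""
    x = x - x.mean(dim=0, keepdim=True)
    c = (x.T @ x) / x.shape[0]
    d = c.shape[0]
    return (d * torch.trace(c @ c) / torch.trace(c) ** 2).item()

# ~1 + d/n for white Gaussian noise, approaching 1.0 as n grows:
print(whitening_metric(torch.randn(20000, 512)))
```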
], batch size: 57, lr: 1.66e-03, grad_scale: 32.0 2023-11-26 04:20:04,598 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.min_abs, batch_count=3216426.6666666665, ans=0.5 2023-11-26 04:20:04,799 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=15.49 vs. limit=22.5 2023-11-26 04:20:16,201 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3216493.3333333335, ans=0.125 2023-11-26 04:20:17,146 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3216493.3333333335, ans=0.125 2023-11-26 04:20:19,428 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=3216493.3333333335, ans=0.125 2023-11-26 04:20:30,524 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=3216560.0, ans=0.0 2023-11-26 04:20:37,465 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 482500 2023-11-26 04:20:44,833 INFO [train_asr.py:1235] (3/4) Epoch 41, batch 1550, loss[loss=0.08241, simple_loss=0.1199, pruned_loss=0.01646, audio_tagging_loss=0.005994, over 14650.00 frames. ], tot_loss[loss=0.06683, simple_loss=0.0905, pruned_loss=0.01255, audio_tagging_loss=0.009032, over 3046393.81 frames. ], batch size: 54, lr: 1.66e-03, grad_scale: 32.0 2023-11-26 04:20:46,068 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=3216693.3333333335, ans=0.2 2023-11-26 04:20:46,126 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3216693.3333333335, ans=0.1 2023-11-26 04:20:52,543 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=3216693.3333333335, ans=0.125 2023-11-26 04:21:05,202 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=3216826.6666666665, ans=0.0 2023-11-26 04:21:15,035 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.470e+01 8.764e+01 9.258e+01 1.010e+02 1.215e+02, threshold=1.852e+02, percent-clipped=0.0 2023-11-26 04:21:23,216 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=3216893.3333333335, ans=0.0 2023-11-26 04:21:31,771 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.max_positive, batch_count=3216960.0, ans=0.95 2023-11-26 04:21:33,701 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 482550 2023-11-26 04:21:40,015 INFO [train_asr.py:1235] (3/4) Epoch 41, batch 1600, loss[loss=0.06426, simple_loss=0.08482, pruned_loss=0.01198, audio_tagging_loss=0.009875, over 15315.00 frames. ], tot_loss[loss=0.06652, simple_loss=0.08995, pruned_loss=0.01243, audio_tagging_loss=0.009113, over 3044817.00 frames. 
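Note on the "ScheduledFloat:" entries: each reports a module hyperparameter (dropout_p, skip rates, balancer probs, bypass scale_min) as a function of batch_count, and by ~3.2M batches nearly all sit at their final values: skip rates at 0.0, balancer probs at 0.125, out_proj dropout at 0.1. Our reading is a piecewise-linear schedule over batch count; a minimal sketch (the class name mirrors the log, but the breakpoints below are invented for illustration):

```python
import bisect

class ScheduledFloat:
    """Piecewise-linear value of batch_count, e.g. a dropout rate that
    decays from 0.3 to 0.1 over the first 20k batches and then stays put.
    Breakpoints here are illustrative, not this run's actual schedule."""
    def __init__(self, *points: tuple[float, float]):
        self.xs = [p[0] for p in points]
        self.ys = [p[1] for p in points]

    def __call__(self, batch_count: float) -> float:
        i = bisect.bisect_right(self.xs, batch_count)
        if i == 0:
            return self.ys[0]
        if i == len(self.xs):
            return self.ys[-1]
        x0, x1 = self.xs[i - 1], self.xs[i]
        y0, y1 = self.ys[i - 1], self.ys[i]
        return y0 + (y1 - y0) * (batch_count - x0) / (x1 - x0)

dropout_p = ScheduledFloat((0.0, 0.3), (20000.0, 0.1))
assert dropout_p(3211560.0) == 0.1   # long past the ramp, pinned at 0.1
```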
], batch size: 59, lr: 1.66e-03, grad_scale: 32.0 2023-11-26 04:21:44,394 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=3217026.6666666665, ans=0.2 2023-11-26 04:21:44,783 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.whiten, num_groups=1, num_channels=512, metric=9.30 vs. limit=12.0 2023-11-26 04:21:45,605 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=3217026.6666666665, ans=0.125 2023-11-26 04:21:47,743 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=3217026.6666666665, ans=0.0 2023-11-26 04:21:59,389 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=8.50 vs. limit=15.0 2023-11-26 04:22:27,989 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3217293.3333333335, ans=0.1 2023-11-26 04:22:28,870 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 482600 2023-11-26 04:22:36,041 INFO [train_asr.py:1235] (3/4) Epoch 41, batch 1650, loss[loss=0.06562, simple_loss=0.08197, pruned_loss=0.01353, audio_tagging_loss=0.01111, over 15130.00 frames. ], tot_loss[loss=0.06632, simple_loss=0.08988, pruned_loss=0.0123, audio_tagging_loss=0.009081, over 3039857.13 frames. ], batch size: 57, lr: 1.66e-03, grad_scale: 32.0 2023-11-26 04:22:42,792 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=3217360.0, ans=0.125 2023-11-26 04:22:53,288 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=3217426.6666666665, ans=0.2 2023-11-26 04:22:53,365 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=3217426.6666666665, ans=0.0 2023-11-26 04:23:07,732 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.278e+01 8.467e+01 9.120e+01 9.826e+01 1.173e+02, threshold=1.824e+02, percent-clipped=0.0 2023-11-26 04:23:24,374 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 482650 2023-11-26 04:23:31,214 INFO [train_asr.py:1235] (3/4) Epoch 41, batch 1700, loss[loss=0.07447, simple_loss=0.09679, pruned_loss=0.0188, audio_tagging_loss=0.007275, over 15841.00 frames. ], tot_loss[loss=0.0662, simple_loss=0.08938, pruned_loss=0.01236, audio_tagging_loss=0.009149, over 3039448.18 frames. ], batch size: 61, lr: 1.66e-03, grad_scale: 16.0 2023-11-26 04:23:43,493 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3217760.0, ans=0.1 2023-11-26 04:23:46,814 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3217760.0, ans=0.125 2023-11-26 04:23:50,090 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=3.46 vs. 
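Note on the *_skip_rate entries (attention_skip_rate, conv_skip_rate, ff2/ff3_skip_rate) all logging ans=0.0: this suggests stochastic-depth-style regularization that has annealed away. Early in training a sub-module's branch is randomly skipped with some probability; by this point the probability has decayed to zero, so every sub-module runs on every batch. A hedged sketch of that mechanism (the helper is ours, not the model's code):

```python
import torch

def maybe_skip(module, x: torch.Tensor, skip_rate: float,
               training: bool) -> torch.Tensor:
    """Stochastically bypass a sub-module (attention / conv / feed-forward)
    with probability skip_rate during training -- our reading of the
    *_skip_rate entries, which have annealed to 0.0 by this point."""
    if training and skip_rate > 0.0 and torch.rand(()) < skip_rate:
        return x                 # skip: identity on the residual branch
    return x + module(x)         # normal residual connection
```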
limit=10.0 2023-11-26 04:24:09,386 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=3217893.3333333335, ans=0.0 2023-11-26 04:24:09,808 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=512, metric=20.99 vs. limit=22.5 2023-11-26 04:24:19,591 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.max_positive, batch_count=3217960.0, ans=0.95 2023-11-26 04:24:20,460 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 482700 2023-11-26 04:24:26,773 INFO [train_asr.py:1235] (3/4) Epoch 41, batch 1750, loss[loss=0.06181, simple_loss=0.07865, pruned_loss=0.01221, audio_tagging_loss=0.01028, over 14868.00 frames. ], tot_loss[loss=0.06615, simple_loss=0.08938, pruned_loss=0.01238, audio_tagging_loss=0.009086, over 3047975.31 frames. ], batch size: 55, lr: 1.66e-03, grad_scale: 16.0 2023-11-26 04:24:29,071 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=3218026.6666666665, ans=0.0 2023-11-26 04:24:48,870 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.06 vs. limit=6.0 2023-11-26 04:24:59,626 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.364e+01 8.710e+01 9.428e+01 1.004e+02 1.247e+02, threshold=1.886e+02, percent-clipped=0.0 2023-11-26 04:25:11,523 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=3218293.3333333335, ans=0.0 2023-11-26 04:25:15,514 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 482750 2023-11-26 04:25:22,298 INFO [train_asr.py:1235] (3/4) Epoch 41, batch 1800, loss[loss=0.06186, simple_loss=0.07904, pruned_loss=0.01463, audio_tagging_loss=0.007707, over 15353.00 frames. ], tot_loss[loss=0.06579, simple_loss=0.08902, pruned_loss=0.01231, audio_tagging_loss=0.008964, over 3046162.63 frames. ], batch size: 58, lr: 1.66e-03, grad_scale: 16.0 2023-11-26 04:25:25,783 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=3218360.0, ans=0.125 2023-11-26 04:25:35,906 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=3218426.6666666665, ans=0.125 2023-11-26 04:25:47,808 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=3218493.3333333335, ans=0.125 2023-11-26 04:25:55,077 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3218560.0, ans=0.125 2023-11-26 04:26:07,527 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=3218626.6666666665, ans=0.125 2023-11-26 04:26:10,734 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=3218626.6666666665, ans=0.07 2023-11-26 04:26:11,628 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 482800 2023-11-26 04:26:16,619 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=13.03 vs. 
limit=15.0 2023-11-26 04:26:18,153 INFO [train_asr.py:1235] (3/4) Epoch 41, batch 1850, loss[loss=0.04621, simple_loss=0.06001, pruned_loss=0.009893, audio_tagging_loss=0.006306, over 14409.00 frames. ], tot_loss[loss=0.06548, simple_loss=0.08853, pruned_loss=0.01229, audio_tagging_loss=0.008933, over 3045915.75 frames. ], batch size: 57, lr: 1.66e-03, grad_scale: 16.0 2023-11-26 04:26:28,567 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3218693.3333333335, ans=0.125 2023-11-26 04:26:39,286 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3218760.0, ans=0.125 2023-11-26 04:26:39,352 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=3218760.0, ans=0.125 2023-11-26 04:26:39,374 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=3218760.0, ans=0.125 2023-11-26 04:26:48,371 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=3218826.6666666665, ans=0.0 2023-11-26 04:26:51,386 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 6.983e+01 8.663e+01 9.346e+01 1.025e+02 1.313e+02, threshold=1.869e+02, percent-clipped=0.0 2023-11-26 04:26:53,743 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3218893.3333333335, ans=0.1 2023-11-26 04:26:55,141 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=6.74 vs. limit=15.0 2023-11-26 04:27:03,852 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=3218960.0, ans=0.125 2023-11-26 04:27:08,505 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 482850 2023-11-26 04:27:15,284 INFO [train_asr.py:1235] (3/4) Epoch 41, batch 1900, loss[loss=0.06383, simple_loss=0.08278, pruned_loss=0.01169, audio_tagging_loss=0.01075, over 15141.00 frames. ], tot_loss[loss=0.06574, simple_loss=0.08897, pruned_loss=0.01232, audio_tagging_loss=0.008937, over 3047378.99 frames. 
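Note on the tot_loss[...] summaries: they are decayed running averages, not plain sums. A plain sum of per-cut frame counts could not end in a fraction, yet the window above is "over 3045915.75 frames". With reset_interval=200 and roughly 15k frames per batch, a per-batch decay of (1 - 1/200) has a steady-state window of about 200 * 15k = 3.0M frames, which is exactly the scale the log reports. A sketch under that reading:

```python
class DecayingSum:
    """Exponentially decayed accumulator: our reading of tot_loss[...],
    whose fractional frame counts indicate a decayed sum."""
    def __init__(self, reset_interval: int = 200):
        self.decay = 1.0 - 1.0 / reset_interval
        self.loss_sum = 0.0
        self.frames = 0.0

    def update(self, batch_loss_sum: float, batch_frames: int) -> None:
        self.loss_sum = self.loss_sum * self.decay + batch_loss_sum
        self.frames = self.frames * self.decay + batch_frames

    @property
    def value(self) -> float:
        return self.loss_sum / self.frames

tracker = DecayingSum()
for _ in range(2000):                     # long past the ~200-batch window
    tracker.update(0.066 * 15000, 15000)  # constant per-frame loss of 0.066
print(round(tracker.frames))              # ~3_000_000 frames, like the log
```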
], batch size: 57, lr: 1.66e-03, grad_scale: 16.0 2023-11-26 04:27:29,736 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.min_positive, batch_count=3219093.3333333335, ans=0.025 2023-11-26 04:27:35,219 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=3219093.3333333335, ans=0.0 2023-11-26 04:27:47,876 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3219226.6666666665, ans=0.125 2023-11-26 04:27:47,952 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=3219226.6666666665, ans=0.125 2023-11-26 04:27:52,169 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3219226.6666666665, ans=0.125 2023-11-26 04:27:55,967 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3219226.6666666665, ans=0.125 2023-11-26 04:28:04,357 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 482900 2023-11-26 04:28:11,346 INFO [train_asr.py:1235] (3/4) Epoch 41, batch 1950, loss[loss=0.07711, simple_loss=0.09182, pruned_loss=0.02038, audio_tagging_loss=0.01082, over 14830.00 frames. ], tot_loss[loss=0.06547, simple_loss=0.08871, pruned_loss=0.01218, audio_tagging_loss=0.008933, over 3044356.93 frames. ], batch size: 57, lr: 1.66e-03, grad_scale: 16.0 2023-11-26 04:28:22,417 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=7.30 vs. limit=15.0 2023-11-26 04:28:30,525 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3219426.6666666665, ans=0.125 2023-11-26 04:28:32,147 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=8.15 vs. limit=15.0 2023-11-26 04:28:43,714 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.058e+01 8.481e+01 9.198e+01 9.869e+01 1.193e+02, threshold=1.840e+02, percent-clipped=0.0 2023-11-26 04:29:00,264 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 482950 2023-11-26 04:29:04,587 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=3219626.6666666665, ans=0.07 2023-11-26 04:29:06,470 INFO [train_asr.py:1235] (3/4) Epoch 41, batch 2000, loss[loss=0.08976, simple_loss=0.1325, pruned_loss=0.01622, audio_tagging_loss=0.007292, over 14474.00 frames. ], tot_loss[loss=0.06555, simple_loss=0.08861, pruned_loss=0.01235, audio_tagging_loss=0.008894, over 3041080.37 frames. 
], batch size: 55, lr: 1.66e-03, grad_scale: 32.0 2023-11-26 04:29:06,626 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=3219693.3333333335, ans=0.125 2023-11-26 04:29:27,727 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=3219760.0, ans=0.0 2023-11-26 04:29:33,007 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3219826.6666666665, ans=0.1 2023-11-26 04:29:56,810 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 483000 2023-11-26 04:30:03,469 INFO [train_asr.py:1235] (3/4) Epoch 41, batch 2050, loss[loss=0.069, simple_loss=0.09451, pruned_loss=0.01387, audio_tagging_loss=0.007873, over 13675.00 frames. ], tot_loss[loss=0.0658, simple_loss=0.08906, pruned_loss=0.01244, audio_tagging_loss=0.008831, over 3042577.65 frames. ], batch size: 53, lr: 1.66e-03, grad_scale: 32.0 2023-11-26 04:30:14,040 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3220093.3333333335, ans=0.1 2023-11-26 04:30:22,956 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.min_positive, batch_count=3220093.3333333335, ans=0.05 2023-11-26 04:30:30,535 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=3220160.0, ans=0.0 2023-11-26 04:30:32,589 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.min_positive, batch_count=3220160.0, ans=0.05 2023-11-26 04:30:36,115 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.199e+01 8.605e+01 9.268e+01 1.003e+02 1.182e+02, threshold=1.854e+02, percent-clipped=0.0 2023-11-26 04:30:48,984 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=10.86 vs. limit=15.0 2023-11-26 04:30:53,351 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 483050 2023-11-26 04:30:55,638 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=3220293.3333333335, ans=0.125 2023-11-26 04:30:57,841 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3220293.3333333335, ans=0.125 2023-11-26 04:30:59,861 INFO [train_asr.py:1235] (3/4) Epoch 41, batch 2100, loss[loss=0.0667, simple_loss=0.09052, pruned_loss=0.01123, audio_tagging_loss=0.0102, over 16725.00 frames. ], tot_loss[loss=0.0662, simple_loss=0.0901, pruned_loss=0.01248, audio_tagging_loss=0.008676, over 3047014.58 frames. ], batch size: 62, lr: 1.66e-03, grad_scale: 32.0 2023-11-26 04:31:20,278 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3220426.6666666665, ans=0.125 2023-11-26 04:31:26,267 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=512, metric=2.56 vs. 
limit=15.0 2023-11-26 04:31:37,961 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3220560.0, ans=0.1 2023-11-26 04:31:41,239 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=3220560.0, ans=0.0 2023-11-26 04:31:47,005 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3220626.6666666665, ans=0.125 2023-11-26 04:31:49,122 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 483100 2023-11-26 04:31:55,336 INFO [train_asr.py:1235] (3/4) Epoch 41, batch 2150, loss[loss=0.07285, simple_loss=0.09289, pruned_loss=0.0164, audio_tagging_loss=0.01, over 14266.00 frames. ], tot_loss[loss=0.06633, simple_loss=0.0903, pruned_loss=0.0125, audio_tagging_loss=0.008679, over 3046080.97 frames. ], batch size: 53, lr: 1.66e-03, grad_scale: 32.0 2023-11-26 04:32:00,446 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3220693.3333333335, ans=0.125 2023-11-26 04:32:10,692 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=3220760.0, ans=0.125 2023-11-26 04:32:19,295 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=3220826.6666666665, ans=0.0 2023-11-26 04:32:28,583 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 6.889e+01 8.606e+01 9.465e+01 1.020e+02 1.219e+02, threshold=1.893e+02, percent-clipped=0.0 2023-11-26 04:32:29,711 WARNING [train_asr.py:1481] (3/4) Exclude cut with ID unbalanced/XkQ8YVd8u38_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-26 04:32:43,343 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3220960.0, ans=0.125 2023-11-26 04:32:45,728 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 483150 2023-11-26 04:32:47,016 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=3220960.0, ans=0.0 2023-11-26 04:32:49,706 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=8.16 vs. limit=10.0 2023-11-26 04:32:50,629 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=5.48 vs. limit=12.0 2023-11-26 04:32:52,099 INFO [train_asr.py:1235] (3/4) Epoch 41, batch 2200, loss[loss=0.07311, simple_loss=0.09463, pruned_loss=0.01421, audio_tagging_loss=0.01158, over 15073.00 frames. ], tot_loss[loss=0.06653, simple_loss=0.09045, pruned_loss=0.01255, audio_tagging_loss=0.00875, over 3047494.11 frames. 
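Note on the many balancer*.prob entries pinned at ans=0.125: this suggests the balancer's activation-statistics correction is applied stochastically, on about one step in eight, presumably to keep its overhead low. A heavily hedged sketch of that pattern (constrain_grad stands in for whatever correction the real module applies; this is not icefall's code):

```python
import torch

def balancer_forward(x: torch.Tensor, constrain_grad, prob: float,
                     training: bool) -> torch.Tensor:
    """Apply an activation-statistics constraint only on a random
    fraction `prob` of training steps -- our reading of the
    balancer2.prob entries, which sit at 0.125 (one step in eight)."""
    if training and float(torch.rand(())) < prob:
        return constrain_grad(x)   # apply the (hypothetical) correction
    return x
```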
], batch size: 56, lr: 1.66e-03, grad_scale: 32.0 2023-11-26 04:33:04,123 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.min_positive, batch_count=3221093.3333333335, ans=0.025 2023-11-26 04:33:07,906 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-26 04:33:08,919 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=3221093.3333333335, ans=0.125 2023-11-26 04:33:21,232 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=4.01 vs. limit=10.0 2023-11-26 04:33:30,673 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=3221226.6666666665, ans=0.125 2023-11-26 04:33:36,345 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=11.51 vs. limit=15.0 2023-11-26 04:33:41,023 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 483200 2023-11-26 04:33:44,594 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=3221293.3333333335, ans=0.125 2023-11-26 04:33:47,555 INFO [train_asr.py:1235] (3/4) Epoch 41, batch 2250, loss[loss=0.07146, simple_loss=0.09794, pruned_loss=0.01339, audio_tagging_loss=0.0091, over 15338.00 frames. ], tot_loss[loss=0.06643, simple_loss=0.09003, pruned_loss=0.01258, audio_tagging_loss=0.00883, over 3049284.38 frames. ], batch size: 56, lr: 1.66e-03, grad_scale: 16.0 2023-11-26 04:34:08,593 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=3221426.6666666665, ans=0.0 2023-11-26 04:34:21,562 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.244e+01 8.817e+01 9.211e+01 9.808e+01 1.275e+02, threshold=1.842e+02, percent-clipped=0.0 2023-11-26 04:34:23,475 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=8.16 vs. limit=15.0 2023-11-26 04:34:23,915 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=3221560.0, ans=0.125 2023-11-26 04:34:37,731 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 483250 2023-11-26 04:34:38,906 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3221626.6666666665, ans=0.125 2023-11-26 04:34:44,125 INFO [train_asr.py:1235] (3/4) Epoch 41, batch 2300, loss[loss=0.07852, simple_loss=0.1165, pruned_loss=0.01429, audio_tagging_loss=0.005968, over 16803.00 frames. ], tot_loss[loss=0.06663, simple_loss=0.09017, pruned_loss=0.01273, audio_tagging_loss=0.008816, over 3049516.70 frames. ], batch size: 63, lr: 1.66e-03, grad_scale: 16.0 2023-11-26 04:34:55,022 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=3221760.0, ans=0.125 2023-11-26 04:35:32,601 WARNING [train_asr.py:1481] (3/4) Exclude cut with ID unbalanced/mx9RcUz8sr0_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. 
Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-26 04:35:32,653 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 483300 2023-11-26 04:35:37,255 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3221960.0, ans=0.125 2023-11-26 04:35:40,106 INFO [train_asr.py:1235] (3/4) Epoch 41, batch 2350, loss[loss=0.04666, simple_loss=0.05009, pruned_loss=0.008704, audio_tagging_loss=0.01291, over 14457.00 frames. ], tot_loss[loss=0.06623, simple_loss=0.08946, pruned_loss=0.01259, audio_tagging_loss=0.008901, over 3045045.21 frames. ], batch size: 57, lr: 1.65e-03, grad_scale: 16.0 2023-11-26 04:36:01,734 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.14 vs. limit=6.0 2023-11-26 04:36:13,184 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.504e+01 8.775e+01 9.413e+01 9.957e+01 1.252e+02, threshold=1.883e+02, percent-clipped=0.0 2023-11-26 04:36:14,599 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=3222226.6666666665, ans=0.2 2023-11-26 04:36:17,141 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=3222226.6666666665, ans=0.04949747468305833 2023-11-26 04:36:29,240 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 483350 2023-11-26 04:36:34,211 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=10.11 vs. limit=15.0 2023-11-26 04:36:35,655 INFO [train_asr.py:1235] (3/4) Epoch 41, batch 2400, loss[loss=0.07085, simple_loss=0.0957, pruned_loss=0.01473, audio_tagging_loss=0.008269, over 15468.00 frames. ], tot_loss[loss=0.06629, simple_loss=0.08954, pruned_loss=0.01249, audio_tagging_loss=0.009024, over 3047020.84 frames. 
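Note on the learning rate ticking from 1.66e-03 to 1.65e-03 at batch 2350 above: that slow drift is consistent with an Eden-style schedule, lr = base_lr * (1 + (batch/lr_batches)^2)^(-1/4) * (1 + (epoch/lr_epochs)^2)^(-1/4), with this run's base_lr=0.045, lr_batches=7500, lr_epochs=3.5. Taking the global batch near 483,300 and the scheduler's epoch as ~40.0 (how the fractional epoch is counted mid-run is our assumption), it lands on the logged value:

```python
def eden_lr(base_lr: float, batch: float, epoch: float,
            lr_batches: float = 7500.0, lr_epochs: float = 3.5) -> float:
    # Eden-style schedule: smooth power-law decay in both the global
    # batch index and the (fractional) epoch.
    batch_factor = (1.0 + (batch / lr_batches) ** 2) ** -0.25
    epoch_factor = (1.0 + (epoch / lr_epochs) ** 2) ** -0.25
    return base_lr * batch_factor * epoch_factor

print(f"{eden_lr(0.045, 483_300, 40.0):.2e}")   # ~1.65e-03, as logged
```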
], batch size: 58, lr: 1.65e-03, grad_scale: 32.0 2023-11-26 04:36:52,792 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=3222426.6666666665, ans=0.09899494936611666 2023-11-26 04:37:02,355 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3222493.3333333335, ans=0.1 2023-11-26 04:37:04,468 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3222493.3333333335, ans=0.1 2023-11-26 04:37:07,617 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=3222493.3333333335, ans=0.125 2023-11-26 04:37:07,729 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=3222493.3333333335, ans=0.0 2023-11-26 04:37:17,264 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-26 04:37:19,352 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=3222626.6666666665, ans=0.2 2023-11-26 04:37:19,466 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=3222626.6666666665, ans=0.0 2023-11-26 04:37:23,692 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3222626.6666666665, ans=0.125 2023-11-26 04:37:24,482 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 483400 2023-11-26 04:37:32,121 INFO [train_asr.py:1235] (3/4) Epoch 41, batch 2450, loss[loss=0.07197, simple_loss=0.1035, pruned_loss=0.01331, audio_tagging_loss=0.006924, over 14985.00 frames. ], tot_loss[loss=0.06635, simple_loss=0.08951, pruned_loss=0.01247, audio_tagging_loss=0.009118, over 3045279.01 frames. ], batch size: 57, lr: 1.65e-03, grad_scale: 32.0 2023-11-26 04:37:34,549 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=3222693.3333333335, ans=0.2 2023-11-26 04:37:45,079 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=3222760.0, ans=0.2 2023-11-26 04:37:47,270 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=3222760.0, ans=0.125 2023-11-26 04:37:59,003 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=3222826.6666666665, ans=0.04949747468305833 2023-11-26 04:38:05,006 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.815e+01 8.820e+01 9.460e+01 9.914e+01 1.229e+02, threshold=1.892e+02, percent-clipped=0.0 2023-11-26 04:38:20,995 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 483450 2023-11-26 04:38:28,468 INFO [train_asr.py:1235] (3/4) Epoch 41, batch 2500, loss[loss=0.0535, simple_loss=0.06488, pruned_loss=0.009034, audio_tagging_loss=0.01203, over 15140.00 frames. ], tot_loss[loss=0.06623, simple_loss=0.08905, pruned_loss=0.01249, audio_tagging_loss=0.009224, over 3040467.15 frames. ], batch size: 58, lr: 1.65e-03, grad_scale: 32.0 2023-11-26 04:38:48,053 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=3.80 vs. 
limit=15.0 2023-11-26 04:39:01,607 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3223226.6666666665, ans=0.125 2023-11-26 04:39:17,465 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 483500 2023-11-26 04:39:19,684 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=3223293.3333333335, ans=0.0 2023-11-26 04:39:23,659 INFO [train_asr.py:1235] (3/4) Epoch 41, batch 2550, loss[loss=0.09306, simple_loss=0.1222, pruned_loss=0.02424, audio_tagging_loss=0.00773, over 15369.00 frames. ], tot_loss[loss=0.06586, simple_loss=0.08856, pruned_loss=0.01245, audio_tagging_loss=0.009133, over 3044920.73 frames. ], batch size: 57, lr: 1.65e-03, grad_scale: 32.0 2023-11-26 04:39:28,053 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=3223360.0, ans=0.2 2023-11-26 04:39:39,406 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=3223426.6666666665, ans=0.125 2023-11-26 04:39:58,033 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.362e+01 8.655e+01 9.369e+01 9.898e+01 1.233e+02, threshold=1.874e+02, percent-clipped=0.0 2023-11-26 04:39:58,371 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=3223560.0, ans=0.2 2023-11-26 04:39:59,384 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=3223560.0, ans=0.0 2023-11-26 04:40:05,822 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3223560.0, ans=0.125 2023-11-26 04:40:13,207 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 483550 2023-11-26 04:40:19,302 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3223693.3333333335, ans=0.125 2023-11-26 04:40:20,118 INFO [train_asr.py:1235] (3/4) Epoch 41, batch 2600, loss[loss=0.05281, simple_loss=0.07355, pruned_loss=0.007672, audio_tagging_loss=0.008366, over 15170.00 frames. ], tot_loss[loss=0.06559, simple_loss=0.08869, pruned_loss=0.01227, audio_tagging_loss=0.008974, over 3045159.91 frames. ], batch size: 59, lr: 1.65e-03, grad_scale: 32.0 2023-11-26 04:40:21,429 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-26 04:41:09,396 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 483600 2023-11-26 04:41:17,196 INFO [train_asr.py:1235] (3/4) Epoch 41, batch 2650, loss[loss=0.07594, simple_loss=0.1117, pruned_loss=0.01285, audio_tagging_loss=0.00724, over 15558.00 frames. ], tot_loss[loss=0.06572, simple_loss=0.08893, pruned_loss=0.01234, audio_tagging_loss=0.008911, over 3044515.38 frames. ], batch size: 56, lr: 1.65e-03, grad_scale: 32.0 2023-11-26 04:41:22,195 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3224026.6666666665, ans=0.125 2023-11-26 04:41:37,273 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=8.14 vs. 
limit=15.0 2023-11-26 04:41:40,200 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=3224160.0, ans=0.07 2023-11-26 04:41:50,091 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.141e+01 8.492e+01 9.203e+01 1.002e+02 1.237e+02, threshold=1.841e+02, percent-clipped=0.0 2023-11-26 04:42:06,580 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 483650 2023-11-26 04:42:12,275 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=12.08 vs. limit=15.0 2023-11-26 04:42:12,970 INFO [train_asr.py:1235] (3/4) Epoch 41, batch 2700, loss[loss=0.07104, simple_loss=0.09301, pruned_loss=0.01485, audio_tagging_loss=0.00968, over 14887.00 frames. ], tot_loss[loss=0.06608, simple_loss=0.08981, pruned_loss=0.01239, audio_tagging_loss=0.008782, over 3047090.52 frames. ], batch size: 56, lr: 1.65e-03, grad_scale: 16.0 2023-11-26 04:42:14,216 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=3224360.0, ans=0.0 2023-11-26 04:42:14,243 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=3224360.0, ans=0.125 2023-11-26 04:42:27,555 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=3224426.6666666665, ans=0.0 2023-11-26 04:42:50,001 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.whiten, num_groups=1, num_channels=512, metric=8.79 vs. limit=12.0 2023-11-26 04:42:50,659 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.min_abs, batch_count=3224560.0, ans=0.5 2023-11-26 04:42:50,758 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=3224560.0, ans=0.125 2023-11-26 04:42:57,027 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3224626.6666666665, ans=0.125 2023-11-26 04:43:02,275 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 483700 2023-11-26 04:43:08,503 INFO [train_asr.py:1235] (3/4) Epoch 41, batch 2750, loss[loss=0.06962, simple_loss=0.08773, pruned_loss=0.01616, audio_tagging_loss=0.009596, over 13703.00 frames. ], tot_loss[loss=0.06571, simple_loss=0.08917, pruned_loss=0.0123, audio_tagging_loss=0.008826, over 3054272.46 frames. ], batch size: 53, lr: 1.65e-03, grad_scale: 16.0 2023-11-26 04:43:12,528 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-26 04:43:15,842 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=15.75 vs. limit=22.5 2023-11-26 04:43:19,213 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=9.19 vs. limit=15.0 2023-11-26 04:43:32,196 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3224826.6666666665, ans=0.125 2023-11-26 04:43:40,317 INFO [scaling.py:1022] (3/4) Whitening: name=encoder_embed.out_whiten, num_groups=1, num_channels=192, metric=8.02 vs. 
limit=8.0 2023-11-26 04:43:43,678 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.484e+01 8.913e+01 9.370e+01 9.874e+01 1.312e+02, threshold=1.874e+02, percent-clipped=0.0 2023-11-26 04:43:54,940 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3224960.0, ans=0.125 2023-11-26 04:43:55,854 WARNING [train_asr.py:1481] (3/4) Exclude cut with ID unbalanced/IMdT8_tuNp0_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-26 04:43:57,983 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 483750 2023-11-26 04:44:04,799 INFO [train_asr.py:1235] (3/4) Epoch 41, batch 2800, loss[loss=0.06767, simple_loss=0.09336, pruned_loss=0.01171, audio_tagging_loss=0.009275, over 14454.00 frames. ], tot_loss[loss=0.06601, simple_loss=0.08959, pruned_loss=0.01243, audio_tagging_loss=0.00879, over 3051743.62 frames. ], batch size: 54, lr: 1.65e-03, grad_scale: 32.0 2023-11-26 04:44:13,554 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=3225026.6666666665, ans=0.0 2023-11-26 04:44:16,825 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.min_abs, batch_count=3225093.3333333335, ans=0.5 2023-11-26 04:44:19,880 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=3225093.3333333335, ans=0.125 2023-11-26 04:44:22,112 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=3225093.3333333335, ans=0.125 2023-11-26 04:44:30,955 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=17.24 vs. limit=22.5 2023-11-26 04:44:31,683 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3225160.0, ans=0.1 2023-11-26 04:44:41,148 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=19.98 vs. limit=22.5 2023-11-26 04:44:48,438 INFO [scaling.py:1022] (3/4) Whitening: name=encoder_embed.convnext.out_whiten, num_groups=1, num_channels=128, metric=4.54 vs. limit=5.0 2023-11-26 04:44:55,140 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 483800 2023-11-26 04:45:01,822 INFO [train_asr.py:1235] (3/4) Epoch 41, batch 2850, loss[loss=0.08653, simple_loss=0.1202, pruned_loss=0.01941, audio_tagging_loss=0.007026, over 14132.00 frames. ], tot_loss[loss=0.06603, simple_loss=0.08979, pruned_loss=0.01241, audio_tagging_loss=0.008722, over 3043929.30 frames. 
], batch size: 52, lr: 1.65e-03, grad_scale: 32.0 2023-11-26 04:45:03,083 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=3225360.0, ans=0.0 2023-11-26 04:45:07,350 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=3225360.0, ans=0.0 2023-11-26 04:45:32,003 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=3225493.3333333335, ans=0.0 2023-11-26 04:45:36,990 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.154e+01 8.846e+01 9.347e+01 1.008e+02 1.244e+02, threshold=1.869e+02, percent-clipped=0.0 2023-11-26 04:45:40,788 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=4.91 vs. limit=6.0 2023-11-26 04:45:50,791 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 483850 2023-11-26 04:45:53,044 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=3225626.6666666665, ans=0.0 2023-11-26 04:45:57,146 INFO [train_asr.py:1235] (3/4) Epoch 41, batch 2900, loss[loss=0.05883, simple_loss=0.07897, pruned_loss=0.009626, audio_tagging_loss=0.009725, over 15534.00 frames. ], tot_loss[loss=0.06641, simple_loss=0.0901, pruned_loss=0.01253, audio_tagging_loss=0.008823, over 3046274.15 frames. ], batch size: 59, lr: 1.65e-03, grad_scale: 32.0 2023-11-26 04:46:12,343 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=3225760.0, ans=0.0 2023-11-26 04:46:16,787 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=6.96 vs. limit=15.0 2023-11-26 04:46:28,266 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3225826.6666666665, ans=0.0 2023-11-26 04:46:46,743 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 483900 2023-11-26 04:46:52,985 INFO [train_asr.py:1235] (3/4) Epoch 41, batch 2950, loss[loss=0.05083, simple_loss=0.07073, pruned_loss=0.006674, audio_tagging_loss=0.008795, over 14635.00 frames. ], tot_loss[loss=0.06677, simple_loss=0.0905, pruned_loss=0.01261, audio_tagging_loss=0.008909, over 3043623.22 frames. ], batch size: 56, lr: 1.65e-03, grad_scale: 32.0 2023-11-26 04:47:05,376 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=3226093.3333333335, ans=0.125 2023-11-26 04:47:10,173 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=9.89 vs. 
limit=15.0 2023-11-26 04:47:12,732 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=3226093.3333333335, ans=0.0 2023-11-26 04:47:21,126 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3226160.0, ans=0.125 2023-11-26 04:47:27,902 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.443e+01 8.828e+01 9.406e+01 1.023e+02 1.338e+02, threshold=1.881e+02, percent-clipped=0.0 2023-11-26 04:47:42,877 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 483950 2023-11-26 04:47:43,525 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=8.25 vs. limit=15.0 2023-11-26 04:47:45,762 INFO [scaling.py:1022] (3/4) Whitening: name=encoder_embed.out_whiten, num_groups=1, num_channels=192, metric=6.58 vs. limit=8.0 2023-11-26 04:47:49,767 INFO [train_asr.py:1235] (3/4) Epoch 41, batch 3000, loss[loss=0.07759, simple_loss=0.09987, pruned_loss=0.01681, audio_tagging_loss=0.01085, over 15159.00 frames. ], tot_loss[loss=0.0669, simple_loss=0.09063, pruned_loss=0.01261, audio_tagging_loss=0.00897, over 3047486.17 frames. ], batch size: 55, lr: 1.65e-03, grad_scale: 32.0 2023-11-26 04:47:49,768 INFO [train_asr.py:1258] (3/4) Computing validation loss 2023-11-26 04:48:14,497 INFO [zipformer.py:1877] (3/4) name=encoder.encoders.3.encoder.layers.1.self_attn_weights, attn_weights_entropy = tensor([1.7912, 3.0341, 2.8132, 2.7489, 3.4423, 3.3441, 3.2438, 3.6230], device='cuda:3') 2023-11-26 04:48:22,227 INFO [train_asr.py:1267] (3/4) Epoch 41, validation: loss=0.05755, simple_loss=0.05064, pruned_loss=0.005227, audio_tagging_loss=0.02701, over 4681554.00 frames. 2023-11-26 04:48:22,228 INFO [train_asr.py:1268] (3/4) Maximum memory allocated so far is 24894MB 2023-11-26 04:48:52,517 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=3226493.3333333335, ans=0.0 2023-11-26 04:49:11,263 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 484000 2023-11-26 04:49:11,436 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=3226626.6666666665, ans=0.125 2023-11-26 04:49:20,475 INFO [train_asr.py:1235] (3/4) Epoch 41, batch 3050, loss[loss=0.06236, simple_loss=0.08099, pruned_loss=0.01299, audio_tagging_loss=0.008867, over 15120.00 frames. ], tot_loss[loss=0.06706, simple_loss=0.09079, pruned_loss=0.0127, audio_tagging_loss=0.008965, over 3047583.06 frames. ], batch size: 58, lr: 1.65e-03, grad_scale: 32.0 2023-11-26 04:49:33,560 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-26 04:49:52,011 WARNING [train_asr.py:1481] (3/4) Exclude cut with ID unbalanced/h0neUGB6j_g_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. 
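
The zipformer.py:1877 line above dumps attn_weights_entropy for one self-attention module during validation: the average entropy (in nats) of each head's attention distribution. A head attending uniformly over T keys scores log(T), while a sharply peaked head scores near 0, so these tensors are a quick sanity check that heads are neither collapsed nor uniform. A sketch of the computation under assumed shapes; the real diagnostic may mask or average differently:

# Sketch: per-head attention entropy, as in the attn_weights_entropy dump.
# Assumes weights of shape (num_heads, num_queries, num_keys) that sum to 1
# over the key dimension; shapes and names are illustrative assumptions.

import torch

def attn_weights_entropy(attn: torch.Tensor, eps: float = 1e-20) -> torch.Tensor:
    ent = -(attn * (attn + eps).log()).sum(dim=-1)   # (num_heads, num_queries)
    return ent.mean(dim=-1)                          # one value per head

weights = torch.softmax(torch.randn(4, 10, 50), dim=-1)
print(attn_weights_entropy(weights))  # a bit below the uniform bound log(50) ~ 3.9
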
Number of tokens: 24 2023-11-26 04:49:54,330 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3226893.3333333335, ans=0.125 2023-11-26 04:49:55,239 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.335e+01 8.733e+01 9.255e+01 1.004e+02 1.259e+02, threshold=1.851e+02, percent-clipped=0.0 2023-11-26 04:49:56,562 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3226893.3333333335, ans=0.1 2023-11-26 04:49:58,808 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=3226893.3333333335, ans=0.125 2023-11-26 04:50:09,337 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3226960.0, ans=0.0 2023-11-26 04:50:10,305 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 484050 2023-11-26 04:50:13,178 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=3226960.0, ans=0.0 2023-11-26 04:50:17,073 INFO [train_asr.py:1235] (3/4) Epoch 41, batch 3100, loss[loss=0.04945, simple_loss=0.06118, pruned_loss=0.00701, audio_tagging_loss=0.01185, over 14878.00 frames. ], tot_loss[loss=0.06732, simple_loss=0.0913, pruned_loss=0.01272, audio_tagging_loss=0.008949, over 3055642.43 frames. ], batch size: 55, lr: 1.65e-03, grad_scale: 32.0 2023-11-26 04:50:22,611 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.max_positive, batch_count=3227026.6666666665, ans=0.95 2023-11-26 04:50:30,056 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=3227093.3333333335, ans=0.09899494936611666 2023-11-26 04:50:57,776 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=3227226.6666666665, ans=0.125 2023-11-26 04:50:58,798 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer_na.min_abs, batch_count=3227226.6666666665, ans=0.02 2023-11-26 04:51:06,108 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 484100 2023-11-26 04:51:11,839 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.07 vs. limit=6.0 2023-11-26 04:51:12,460 INFO [train_asr.py:1235] (3/4) Epoch 41, batch 3150, loss[loss=0.0856, simple_loss=0.1255, pruned_loss=0.01626, audio_tagging_loss=0.00658, over 14847.00 frames. ], tot_loss[loss=0.06748, simple_loss=0.09147, pruned_loss=0.01269, audio_tagging_loss=0.009054, over 3056529.99 frames. 
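
The optim.py:476 lines summarize recent gradient norms as five quantiles (min, 25%, median, 75%, max) and derive the clipping threshold from the median: with Clipping_scale=2.0 the printed threshold is twice the median (here 2 * 9.255e+01 = 1.851e+02), and percent-clipped reports how often a batch actually exceeded it. A hedged sketch of that bookkeeping; the window size and helper names are assumptions, not ScaledAdam's internals:

# Sketch: median-based gradient clipping, as suggested by the log lines.
# Keep a window of recent grad norms; clip when the current norm exceeds
# clipping_scale * median; report quantiles and percent-clipped.

from collections import deque
import torch

class GradNormClipper:
    def __init__(self, clipping_scale: float = 2.0, window: int = 200):
        self.clipping_scale = clipping_scale
        self.norms = deque(maxlen=window)
        self.clipped = 0
        self.seen = 0

    def __call__(self, parameters) -> float:
        grads = [p.grad.norm() for p in parameters if p.grad is not None]
        norm = torch.norm(torch.stack(grads))
        self.norms.append(norm.item())
        self.seen += 1
        median = torch.median(torch.tensor(list(self.norms)))
        threshold = self.clipping_scale * median
        if norm > threshold:
            self.clipped += 1
            for p in parameters:
                if p.grad is not None:
                    p.grad.mul_(threshold / norm)
        return norm.item()

    def summary(self) -> str:
        q = torch.quantile(torch.tensor(list(self.norms)),
                           torch.tensor([0.0, 0.25, 0.5, 0.75, 1.0]))
        pct = 100.0 * self.clipped / max(1, self.seen)
        return f"grad-norm quartiles {q.tolist()}, percent-clipped={pct:.1f}"
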
], batch size: 55, lr: 1.65e-03, grad_scale: 16.0 2023-11-26 04:51:31,948 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.min_positive, batch_count=3227426.6666666665, ans=0.05 2023-11-26 04:51:38,779 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3227493.3333333335, ans=0.1 2023-11-26 04:51:48,598 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 6.901e+01 8.686e+01 9.278e+01 1.012e+02 1.304e+02, threshold=1.856e+02, percent-clipped=0.0 2023-11-26 04:51:50,943 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=3227560.0, ans=0.2 2023-11-26 04:51:56,177 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3227626.6666666665, ans=0.1 2023-11-26 04:52:01,920 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 484150 2023-11-26 04:52:06,319 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3227626.6666666665, ans=0.125 2023-11-26 04:52:06,329 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3227626.6666666665, ans=0.1 2023-11-26 04:52:08,290 INFO [train_asr.py:1235] (3/4) Epoch 41, batch 3200, loss[loss=0.07616, simple_loss=0.09913, pruned_loss=0.01792, audio_tagging_loss=0.008675, over 16222.00 frames. ], tot_loss[loss=0.06762, simple_loss=0.09173, pruned_loss=0.01276, audio_tagging_loss=0.008994, over 3056670.68 frames. ], batch size: 62, lr: 1.65e-03, grad_scale: 32.0 2023-11-26 04:52:10,567 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=6.52 vs. limit=15.0 2023-11-26 04:52:14,256 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=3227693.3333333335, ans=0.125 2023-11-26 04:52:27,456 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.whiten, num_groups=1, num_channels=512, metric=9.15 vs. limit=12.0 2023-11-26 04:52:29,315 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=3227760.0, ans=0.125 2023-11-26 04:52:46,182 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=8.67 vs. limit=15.0 2023-11-26 04:52:47,715 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=3227893.3333333335, ans=0.035 2023-11-26 04:52:51,067 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=3227893.3333333335, ans=0.125 2023-11-26 04:52:57,679 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 484200 2023-11-26 04:53:01,857 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3227960.0, ans=0.125 2023-11-26 04:53:04,801 INFO [train_asr.py:1235] (3/4) Epoch 41, batch 3250, loss[loss=0.07826, simple_loss=0.1028, pruned_loss=0.01717, audio_tagging_loss=0.009698, over 15250.00 frames. 
], tot_loss[loss=0.06682, simple_loss=0.09018, pruned_loss=0.01259, audio_tagging_loss=0.009143, over 3054056.80 frames. ], batch size: 56, lr: 1.65e-03, grad_scale: 32.0 2023-11-26 04:53:12,951 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=3228026.6666666665, ans=0.0 2023-11-26 04:53:37,576 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=3228226.6666666665, ans=0.0 2023-11-26 04:53:40,491 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 6.780e+01 8.676e+01 9.295e+01 9.800e+01 1.223e+02, threshold=1.859e+02, percent-clipped=0.0 2023-11-26 04:53:40,672 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=3228226.6666666665, ans=0.2 2023-11-26 04:53:51,173 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=7.93 vs. limit=15.0 2023-11-26 04:53:53,397 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=3228293.3333333335, ans=0.125 2023-11-26 04:53:54,367 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 484250 2023-11-26 04:53:59,846 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=3228360.0, ans=0.0 2023-11-26 04:54:00,665 INFO [train_asr.py:1235] (3/4) Epoch 41, batch 3300, loss[loss=0.07156, simple_loss=0.09824, pruned_loss=0.01344, audio_tagging_loss=0.009003, over 15374.00 frames. ], tot_loss[loss=0.06665, simple_loss=0.08957, pruned_loss=0.0126, audio_tagging_loss=0.009268, over 3053683.13 frames. ], batch size: 58, lr: 1.65e-03, grad_scale: 32.0 2023-11-26 04:54:22,176 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3228493.3333333335, ans=0.1 2023-11-26 04:54:23,266 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3228493.3333333335, ans=0.1 2023-11-26 04:54:34,621 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=512, metric=18.95 vs. limit=22.5 2023-11-26 04:54:43,453 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=3228560.0, ans=0.0 2023-11-26 04:54:50,302 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 484300 2023-11-26 04:54:53,567 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.min_positive, batch_count=3228626.6666666665, ans=0.05 2023-11-26 04:54:56,667 INFO [train_asr.py:1235] (3/4) Epoch 41, batch 3350, loss[loss=0.07889, simple_loss=0.1009, pruned_loss=0.01818, audio_tagging_loss=0.01028, over 16214.00 frames. ], tot_loss[loss=0.067, simple_loss=0.09034, pruned_loss=0.0127, audio_tagging_loss=0.009125, over 3050155.97 frames. 
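
Each batch entry also reports lr: 1.65e-03. That value is consistent with an Eden-style schedule that multiplies base_lr by inverse-quarter-power decay factors in both the optimizer-step count and the epoch count: with base_lr=0.045, lr_batches=7500, lr_epochs=3.5, roughly 4.84e5 steps and 40 completed epochs, the formula below reproduces 1.65e-03. The formula is stated as an assumption about the scheduler, not quoted from its code:

# Sketch: Eden-style LR factor consistent with the logged lr values.
# Assumed form: lr = base_lr * ((b^2 + B^2)/B^2)^-0.25 * ((e^2 + E^2)/E^2)^-0.25

def eden_lr(base_lr: float, batch: float, epoch: float,
            lr_batches: float = 7500.0, lr_epochs: float = 3.5) -> float:
    batch_factor = ((batch ** 2 + lr_batches ** 2) / lr_batches ** 2) ** -0.25
    epoch_factor = ((epoch ** 2 + lr_epochs ** 2) / lr_epochs ** 2) ** -0.25
    return base_lr * batch_factor * epoch_factor

# ~484k optimizer steps, ~40 completed epochs, base_lr=0.045:
print(f"{eden_lr(0.045, 484_000, 40):.2e}")  # ~1.65e-03, as in the log
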
], batch size: 58, lr: 1.65e-03, grad_scale: 32.0 2023-11-26 04:55:13,716 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=3228760.0, ans=0.125 2023-11-26 04:55:19,054 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=3228826.6666666665, ans=0.0 2023-11-26 04:55:31,788 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=3228893.3333333335, ans=0.0 2023-11-26 04:55:32,662 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.567e+01 8.894e+01 9.635e+01 1.028e+02 1.225e+02, threshold=1.927e+02, percent-clipped=0.0 2023-11-26 04:55:46,121 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 484350 2023-11-26 04:55:50,955 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=3228960.0, ans=0.2 2023-11-26 04:55:52,833 INFO [train_asr.py:1235] (3/4) Epoch 41, batch 3400, loss[loss=0.07449, simple_loss=0.1055, pruned_loss=0.01323, audio_tagging_loss=0.008526, over 13686.00 frames. ], tot_loss[loss=0.06717, simple_loss=0.09086, pruned_loss=0.01285, audio_tagging_loss=0.008893, over 3050438.39 frames. ], batch size: 54, lr: 1.65e-03, grad_scale: 32.0 2023-11-26 04:56:05,311 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3229093.3333333335, ans=0.0 2023-11-26 04:56:41,947 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 484400 2023-11-26 04:56:49,093 INFO [train_asr.py:1235] (3/4) Epoch 41, batch 3450, loss[loss=0.05742, simple_loss=0.07465, pruned_loss=0.009853, audio_tagging_loss=0.01024, over 14658.00 frames. ], tot_loss[loss=0.06716, simple_loss=0.09091, pruned_loss=0.0129, audio_tagging_loss=0.008809, over 3046673.81 frames. ], batch size: 59, lr: 1.65e-03, grad_scale: 32.0 2023-11-26 04:57:14,037 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=3229493.3333333335, ans=0.1 2023-11-26 04:57:21,207 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=7.17 vs. limit=15.0 2023-11-26 04:57:24,102 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=3229560.0, ans=0.2 2023-11-26 04:57:24,870 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 6.793e+01 8.632e+01 9.209e+01 1.007e+02 1.265e+02, threshold=1.842e+02, percent-clipped=0.0 2023-11-26 04:57:28,271 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=3229560.0, ans=0.125 2023-11-26 04:57:38,929 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 484450 2023-11-26 04:57:45,202 INFO [train_asr.py:1235] (3/4) Epoch 41, batch 3500, loss[loss=0.07322, simple_loss=0.1016, pruned_loss=0.01496, audio_tagging_loss=0.007461, over 14565.00 frames. ], tot_loss[loss=0.06682, simple_loss=0.09073, pruned_loss=0.01272, audio_tagging_loss=0.008735, over 3044016.60 frames. 
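
The scaling.py:213 lines that dominate this log each print the current value of a ScheduledFloat: a scalar hyperparameter (a dropout_p, skip_rate, prob, min_abs, ...) that follows a piecewise-linear function of batch_count instead of staying constant, so regularization can start strong and relax as training progresses. A simplified sketch of that idea; the actual module in icefall's scaling.py is more elaborate:

# Sketch: a piecewise-linear schedule over batch_count, in the spirit of the
# ScheduledFloat values printed above. Simplified illustration only.

import bisect

class PiecewiseLinearSchedule:
    def __init__(self, *points):
        # points: (batch_count, value) pairs, sorted by batch_count
        self.xs = [p[0] for p in points]
        self.ys = [p[1] for p in points]

    def value(self, batch_count: float) -> float:
        if batch_count <= self.xs[0]:
            return self.ys[0]
        if batch_count >= self.xs[-1]:
            return self.ys[-1]
        i = bisect.bisect_right(self.xs, batch_count)
        x0, x1 = self.xs[i - 1], self.xs[i]
        y0, y1 = self.ys[i - 1], self.ys[i]
        return y0 + (y1 - y0) * (batch_count - x0) / (x1 - x0)

# e.g. a skip rate decaying from 0.5 to 0.02 over the first 4000 batches,
# flat afterwards (numbers illustrative only):
skip_rate = PiecewiseLinearSchedule((0, 0.5), (4000, 0.02))
print(skip_rate.value(3_224_360))  # 0.02 -- long past the ramp, as here
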
], batch size: 55, lr: 1.65e-03, grad_scale: 32.0 2023-11-26 04:58:08,716 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3229826.6666666665, ans=0.125 2023-11-26 04:58:12,777 WARNING [train_asr.py:1481] (3/4) Exclude cut with ID unbalanced/DdDpuDqOyrA_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-26 04:58:12,958 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3229826.6666666665, ans=0.0 2023-11-26 04:58:15,159 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer_na.min_abs, batch_count=3229826.6666666665, ans=0.02 2023-11-26 04:58:18,921 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3229893.3333333335, ans=0.1 2023-11-26 04:58:19,920 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=3229893.3333333335, ans=0.0 2023-11-26 04:58:25,924 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=3229893.3333333335, ans=0.2 2023-11-26 04:58:26,877 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3229893.3333333335, ans=0.125 2023-11-26 04:58:27,997 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=3229893.3333333335, ans=0.2 2023-11-26 04:58:29,101 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=3229960.0, ans=0.0 2023-11-26 04:58:34,254 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=5.58 vs. limit=15.0 2023-11-26 04:58:34,889 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 484500 2023-11-26 04:58:38,182 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=3229960.0, ans=0.0 2023-11-26 04:58:41,683 INFO [train_asr.py:1235] (3/4) Epoch 41, batch 3550, loss[loss=0.06035, simple_loss=0.0806, pruned_loss=0.01, audio_tagging_loss=0.01005, over 15527.00 frames. ], tot_loss[loss=0.06638, simple_loss=0.09005, pruned_loss=0.01262, audio_tagging_loss=0.00874, over 3041744.18 frames. ], batch size: 59, lr: 1.65e-03, grad_scale: 16.0 2023-11-26 04:58:42,969 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3230026.6666666665, ans=0.1 2023-11-26 04:58:58,869 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.min_abs, batch_count=3230093.3333333335, ans=0.5 2023-11-26 04:59:00,942 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=18.72 vs. 
limit=22.5 2023-11-26 04:59:13,868 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=3230226.6666666665, ans=0.0 2023-11-26 04:59:14,064 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=7.45 vs. limit=15.0 2023-11-26 04:59:17,991 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=6.56 vs. limit=15.0 2023-11-26 04:59:18,488 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 6.656e+01 8.467e+01 9.253e+01 9.852e+01 1.320e+02, threshold=1.851e+02, percent-clipped=0.0 2023-11-26 04:59:28,825 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=3230293.3333333335, ans=0.125 2023-11-26 04:59:28,930 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3230293.3333333335, ans=0.1 2023-11-26 04:59:30,882 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 484550 2023-11-26 04:59:37,120 INFO [train_asr.py:1235] (3/4) Epoch 41, batch 3600, loss[loss=0.06995, simple_loss=0.09546, pruned_loss=0.01195, audio_tagging_loss=0.01028, over 15077.00 frames. ], tot_loss[loss=0.06593, simple_loss=0.08929, pruned_loss=0.01258, audio_tagging_loss=0.008704, over 3037383.78 frames. ], batch size: 57, lr: 1.65e-03, grad_scale: 32.0 2023-11-26 04:59:55,453 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=3230426.6666666665, ans=0.2 2023-11-26 05:00:05,998 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.min_positive, batch_count=3230493.3333333335, ans=0.025 2023-11-26 05:00:08,539 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=4.74 vs. limit=12.0 2023-11-26 05:00:25,917 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 484600 2023-11-26 05:00:29,221 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.11 vs. limit=6.0 2023-11-26 05:00:32,959 INFO [train_asr.py:1235] (3/4) Epoch 41, batch 3650, loss[loss=0.06125, simple_loss=0.07938, pruned_loss=0.01253, audio_tagging_loss=0.009031, over 15271.00 frames. ], tot_loss[loss=0.0661, simple_loss=0.08981, pruned_loss=0.0126, audio_tagging_loss=0.008597, over 3034209.93 frames. 
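
The scaling.py:1022 lines compare a per-module whitening metric against a limit (metric=X vs. limit=Y). One reading consistent with these numbers: for the channel covariance with eigenvalues l_i, the ratio mean(l_i^2) / mean(l_i)^2 equals 1.0 for perfectly "white" activations and grows as variance concentrates in a few directions, and a penalty is applied only while the metric exceeds the limit. The sketch below is that interpretation, not the exact icefall code:

# Sketch: a covariance-whiteness metric of the kind logged above.
# metric = mean(eig^2) / mean(eig)^2, computed via traces so no explicit
# eigendecomposition is needed; 1.0 means perfectly white channels.

import torch

def whitening_metric(x: torch.Tensor) -> float:
    # x: (num_frames, num_channels) activations
    x = x - x.mean(dim=0, keepdim=True)
    cov = (x.T @ x) / x.shape[0]                  # (C, C), symmetric
    d = cov.shape[0]
    mean_eig = torch.diagonal(cov).mean()         # trace(cov) / d
    mean_eig_sq = (cov * cov).sum() / d           # trace(cov @ cov) / d
    return (mean_eig_sq / (mean_eig ** 2 + 1e-20)).item()

white = torch.randn(10_000, 256)
print(whitening_metric(white))                    # ~1.0
print(whitening_metric(white * torch.rand(256)))  # > 1.0: unequal variances
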
], batch size: 57, lr: 1.65e-03, grad_scale: 32.0 2023-11-26 05:00:34,357 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3230693.3333333335, ans=0.1 2023-11-26 05:00:45,405 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3230760.0, ans=0.1 2023-11-26 05:00:58,686 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.min_positive, batch_count=3230826.6666666665, ans=0.025 2023-11-26 05:01:08,864 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.138e+01 8.921e+01 9.497e+01 1.030e+02 1.167e+02, threshold=1.899e+02, percent-clipped=0.0 2023-11-26 05:01:21,829 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 484650 2023-11-26 05:01:28,668 INFO [train_asr.py:1235] (3/4) Epoch 41, batch 3700, loss[loss=0.04649, simple_loss=0.05854, pruned_loss=0.00614, audio_tagging_loss=0.01108, over 15749.00 frames. ], tot_loss[loss=0.06644, simple_loss=0.09035, pruned_loss=0.01268, audio_tagging_loss=0.008583, over 3042806.71 frames. ], batch size: 61, lr: 1.65e-03, grad_scale: 32.0 2023-11-26 05:01:33,669 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=3231026.6666666665, ans=0.0 2023-11-26 05:01:33,686 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=3231026.6666666665, ans=0.2 2023-11-26 05:01:44,312 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=3231093.3333333335, ans=0.0 2023-11-26 05:01:48,640 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.min_abs, batch_count=3231093.3333333335, ans=0.5 2023-11-26 05:02:17,231 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=3231293.3333333335, ans=0.0 2023-11-26 05:02:18,218 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 484700 2023-11-26 05:02:18,386 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=3231293.3333333335, ans=0.125 2023-11-26 05:02:24,649 INFO [train_asr.py:1235] (3/4) Epoch 41, batch 3750, loss[loss=0.0825, simple_loss=0.1138, pruned_loss=0.01943, audio_tagging_loss=0.006193, over 15484.00 frames. ], tot_loss[loss=0.06694, simple_loss=0.0909, pruned_loss=0.01289, audio_tagging_loss=0.008605, over 3045124.46 frames. ], batch size: 57, lr: 1.65e-03, grad_scale: 32.0 2023-11-26 05:02:27,008 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=3231360.0, ans=0.125 2023-11-26 05:02:50,905 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=3231493.3333333335, ans=0.0 2023-11-26 05:02:57,706 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3231560.0, ans=0.1 2023-11-26 05:02:59,328 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=8.55 vs. 
limit=15.0 2023-11-26 05:03:01,944 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=3231560.0, ans=0.125 2023-11-26 05:03:02,812 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.701e+01 8.844e+01 9.429e+01 1.038e+02 1.452e+02, threshold=1.886e+02, percent-clipped=0.0 2023-11-26 05:03:02,863 WARNING [train_asr.py:1481] (3/4) Exclude cut with ID unbalanced/ZY_Bsi-RNuk_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-26 05:03:04,082 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3231560.0, ans=0.125 2023-11-26 05:03:04,590 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.1.self_attn2.whiten, num_groups=1, num_channels=192, metric=9.25 vs. limit=22.5 2023-11-26 05:03:13,393 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 484750 2023-11-26 05:03:20,217 INFO [train_asr.py:1235] (3/4) Epoch 41, batch 3800, loss[loss=0.06449, simple_loss=0.08792, pruned_loss=0.01161, audio_tagging_loss=0.008914, over 14768.00 frames. ], tot_loss[loss=0.06682, simple_loss=0.09055, pruned_loss=0.01285, audio_tagging_loss=0.008697, over 3047762.30 frames. ], batch size: 55, lr: 1.65e-03, grad_scale: 16.0 2023-11-26 05:03:27,220 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=3231693.3333333335, ans=0.125 2023-11-26 05:03:28,330 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3231693.3333333335, ans=0.1 2023-11-26 05:03:59,044 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=3231893.3333333335, ans=0.0 2023-11-26 05:04:08,680 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=3231960.0, ans=0.2 2023-11-26 05:04:09,527 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 484800 2023-11-26 05:04:10,862 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3231960.0, ans=0.125 2023-11-26 05:04:16,584 INFO [train_asr.py:1235] (3/4) Epoch 41, batch 3850, loss[loss=0.07924, simple_loss=0.1119, pruned_loss=0.01493, audio_tagging_loss=0.008352, over 15865.00 frames. ], tot_loss[loss=0.06641, simple_loss=0.09011, pruned_loss=0.01261, audio_tagging_loss=0.008748, over 3050036.95 frames. 
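
The WARNING above shows why some AudioSet clips are dropped: each 1-second cut carries a dummy transcript, its 100 input frames shrink to 23 after the convolutional front-end, and 23 frames cannot align the 24 BPE tokens under a transducer loss, so the cut is excluded. A sketch of such a filter; the helper names and the exact frame arithmetic are assumptions chosen to reproduce the logged 100 -> 23:

# Sketch: filtering cuts a transducer cannot align, as in the
# "Exclude cut ..." warnings. Names here are illustrative.

def frames_after_subsampling(num_frames: int) -> int:
    # e.g. a conv front-end with ~4x overall downsampling: 100 -> 23
    return ((num_frames - 7) // 2 + 1) // 2

def keep_cut(num_frames: int, num_tokens: int) -> bool:
    # RNN-T alignment needs at least one frame per output token
    return frames_after_subsampling(num_frames) >= num_tokens

print(frames_after_subsampling(100))  # 23
print(keep_cut(100, 24))              # False -> excluded, as in the log
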
], batch size: 60, lr: 1.65e-03, grad_scale: 16.0 2023-11-26 05:04:20,475 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3232026.6666666665, ans=0.1 2023-11-26 05:04:35,315 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3232093.3333333335, ans=0.1 2023-11-26 05:04:44,424 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=3232160.0, ans=0.125 2023-11-26 05:04:45,381 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=3232160.0, ans=0.0 2023-11-26 05:04:52,279 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=3232226.6666666665, ans=0.0 2023-11-26 05:04:54,117 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.559e+01 8.842e+01 9.367e+01 1.019e+02 1.484e+02, threshold=1.873e+02, percent-clipped=0.0 2023-11-26 05:05:05,897 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 484850 2023-11-26 05:05:11,425 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=3232360.0, ans=0.1 2023-11-26 05:05:12,185 INFO [train_asr.py:1235] (3/4) Epoch 41, batch 3900, loss[loss=0.05501, simple_loss=0.06362, pruned_loss=0.01085, audio_tagging_loss=0.01235, over 14434.00 frames. ], tot_loss[loss=0.06623, simple_loss=0.08959, pruned_loss=0.01255, audio_tagging_loss=0.008875, over 3047890.01 frames. ], batch size: 56, lr: 1.65e-03, grad_scale: 16.0 2023-11-26 05:05:24,277 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3232426.6666666665, ans=0.125 2023-11-26 05:05:25,387 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=3232426.6666666665, ans=0.125 2023-11-26 05:05:37,213 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=3232493.3333333335, ans=0.2 2023-11-26 05:05:44,421 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=2.34 vs. limit=15.0 2023-11-26 05:05:53,174 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3232560.0, ans=0.125 2023-11-26 05:06:01,423 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 484900 2023-11-26 05:06:07,640 INFO [train_asr.py:1235] (3/4) Epoch 41, batch 3950, loss[loss=0.07057, simple_loss=0.09077, pruned_loss=0.01589, audio_tagging_loss=0.009296, over 15746.00 frames. ], tot_loss[loss=0.0658, simple_loss=0.08897, pruned_loss=0.01242, audio_tagging_loss=0.008897, over 3051367.54 frames. 
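
grad_scale in the batch entries moves between 8, 16 and 32 because the run trains in fp16 with a dynamic loss scaler: the scale is halved whenever scaled gradients overflow and grows back after a run of clean steps. A generic sketch of that loop using PyTorch's standard AMP API (illustrative, not the training script itself):

# Sketch: dynamic loss scaling, the source of the grad_scale values above.
# GradScaler halves its scale on inf/nan gradients and doubles it after
# growth_interval consecutive clean steps (standard PyTorch AMP behavior).

import torch

scaler = torch.cuda.amp.GradScaler(init_scale=32.0)

def train_step(model, optimizer, features, targets, loss_fn):
    optimizer.zero_grad()
    with torch.cuda.amp.autocast(dtype=torch.float16):
        loss = loss_fn(model(features), targets)
    scaler.scale(loss).backward()   # backprop on the scaled loss
    scaler.step(optimizer)          # unscales grads, skips step on overflow
    scaler.update()                 # adjust the scale for the next batch
    return loss.detach(), scaler.get_scale()   # -> the logged grad_scale
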
], batch size: 58, lr: 1.65e-03, grad_scale: 16.0 2023-11-26 05:06:10,537 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=3232693.3333333335, ans=0.0 2023-11-26 05:06:15,667 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3232693.3333333335, ans=0.1 2023-11-26 05:06:16,752 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3232693.3333333335, ans=0.1 2023-11-26 05:06:42,070 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=3232893.3333333335, ans=0.125 2023-11-26 05:06:45,964 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.408e+01 8.918e+01 9.453e+01 1.012e+02 1.260e+02, threshold=1.891e+02, percent-clipped=0.0 2023-11-26 05:06:47,785 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=6.69 vs. limit=15.0 2023-11-26 05:06:53,081 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3232960.0, ans=0.0 2023-11-26 05:06:57,173 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 484950 2023-11-26 05:07:04,060 INFO [train_asr.py:1235] (3/4) Epoch 41, batch 4000, loss[loss=0.06768, simple_loss=0.08975, pruned_loss=0.01377, audio_tagging_loss=0.009041, over 15932.00 frames. ], tot_loss[loss=0.06653, simple_loss=0.09005, pruned_loss=0.01271, audio_tagging_loss=0.008791, over 3040805.67 frames. ], batch size: 62, lr: 1.65e-03, grad_scale: 32.0 2023-11-26 05:07:17,162 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=3233093.3333333335, ans=0.0 2023-11-26 05:07:44,382 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=3233226.6666666665, ans=0.125 2023-11-26 05:07:54,547 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 485000 2023-11-26 05:08:01,193 INFO [train_asr.py:1235] (3/4) Epoch 41, batch 4050, loss[loss=0.106, simple_loss=0.1352, pruned_loss=0.03169, audio_tagging_loss=0.006704, over 16004.00 frames. ], tot_loss[loss=0.06664, simple_loss=0.08999, pruned_loss=0.01276, audio_tagging_loss=0.008889, over 3038303.84 frames. ], batch size: 57, lr: 1.65e-03, grad_scale: 32.0 2023-11-26 05:08:02,574 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=3233360.0, ans=0.0 2023-11-26 05:08:03,380 WARNING [train_asr.py:1481] (3/4) Exclude cut with ID unbalanced/-7b0f9TyPFU_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-26 05:08:03,845 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=8.01 vs. 
limit=15.0 2023-11-26 05:08:07,875 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=3233360.0, ans=0.0 2023-11-26 05:08:15,233 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=3233426.6666666665, ans=0.0 2023-11-26 05:08:38,910 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.299e+01 8.932e+01 9.380e+01 1.022e+02 1.358e+02, threshold=1.876e+02, percent-clipped=0.0 2023-11-26 05:08:41,608 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten2.whitening_limit, batch_count=3233560.0, ans=15.0 2023-11-26 05:08:46,696 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=3233626.6666666665, ans=0.0 2023-11-26 05:08:49,768 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 485050 2023-11-26 05:08:49,925 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3233626.6666666665, ans=0.1 2023-11-26 05:08:56,102 INFO [train_asr.py:1235] (3/4) Epoch 41, batch 4100, loss[loss=0.0893, simple_loss=0.1302, pruned_loss=0.0163, audio_tagging_loss=0.007917, over 15190.00 frames. ], tot_loss[loss=0.06705, simple_loss=0.09072, pruned_loss=0.01285, audio_tagging_loss=0.008843, over 3045855.48 frames. ], batch size: 56, lr: 1.65e-03, grad_scale: 32.0 2023-11-26 05:09:06,358 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=3233760.0, ans=0.125 2023-11-26 05:09:36,841 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=4.64 vs. limit=12.0 2023-11-26 05:09:45,638 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 485100 2023-11-26 05:09:51,892 INFO [train_asr.py:1235] (3/4) Epoch 41, batch 4150, loss[loss=0.08933, simple_loss=0.1184, pruned_loss=0.02314, audio_tagging_loss=0.007002, over 14814.00 frames. ], tot_loss[loss=0.06677, simple_loss=0.09062, pruned_loss=0.01271, audio_tagging_loss=0.008748, over 3044588.34 frames. ], batch size: 57, lr: 1.65e-03, grad_scale: 32.0 2023-11-26 05:10:10,659 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=3234093.3333333335, ans=0.125 2023-11-26 05:10:11,649 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=3234093.3333333335, ans=0.0 2023-11-26 05:10:17,988 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3234160.0, ans=0.1 2023-11-26 05:10:30,097 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.245e+01 8.624e+01 9.472e+01 1.019e+02 1.478e+02, threshold=1.894e+02, percent-clipped=0.0 2023-11-26 05:10:32,251 WARNING [train_asr.py:1481] (3/4) Exclude cut with ID unbalanced/5BkClLNthIQ_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. 
Number of tokens: 24 2023-11-26 05:10:41,379 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 485150 2023-11-26 05:10:45,030 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=3234293.3333333335, ans=0.015 2023-11-26 05:10:46,140 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=3234293.3333333335, ans=0.125 2023-11-26 05:10:48,105 INFO [train_asr.py:1235] (3/4) Epoch 41, batch 4200, loss[loss=0.0757, simple_loss=0.1082, pruned_loss=0.01403, audio_tagging_loss=0.007559, over 14581.00 frames. ], tot_loss[loss=0.06697, simple_loss=0.09078, pruned_loss=0.01284, audio_tagging_loss=0.008744, over 3040834.01 frames. ], batch size: 53, lr: 1.65e-03, grad_scale: 32.0 2023-11-26 05:10:57,728 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.min_positive, batch_count=3234426.6666666665, ans=0.05 2023-11-26 05:11:04,277 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3234426.6666666665, ans=0.1 2023-11-26 05:11:08,066 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=3234426.6666666665, ans=0.125 2023-11-26 05:11:17,050 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=11.23 vs. limit=15.0 2023-11-26 05:11:37,144 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 485200 2023-11-26 05:11:37,306 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3234626.6666666665, ans=0.125 2023-11-26 05:11:43,693 INFO [train_asr.py:1235] (3/4) Epoch 41, batch 4250, loss[loss=0.06835, simple_loss=0.08453, pruned_loss=0.01495, audio_tagging_loss=0.01114, over 15523.00 frames. ], tot_loss[loss=0.06685, simple_loss=0.09081, pruned_loss=0.01279, audio_tagging_loss=0.008656, over 3041084.19 frames. ], batch size: 58, lr: 1.65e-03, grad_scale: 32.0 2023-11-26 05:11:44,931 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=3234693.3333333335, ans=0.125 2023-11-26 05:11:46,321 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=19.90 vs. limit=22.5 2023-11-26 05:12:16,758 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-26 05:12:21,682 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 6.997e+01 8.763e+01 9.377e+01 1.004e+02 4.197e+02, threshold=1.875e+02, percent-clipped=1.0 2023-11-26 05:12:24,036 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.min_positive, batch_count=3234893.3333333335, ans=0.05 2023-11-26 05:12:30,470 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3234960.0, ans=0.1 2023-11-26 05:12:33,021 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 485250 2023-11-26 05:12:39,365 INFO [train_asr.py:1235] (3/4) Epoch 41, batch 4300, loss[loss=0.08617, simple_loss=0.1168, pruned_loss=0.02303, audio_tagging_loss=0.004729, over 14734.00 frames. 
], tot_loss[loss=0.06775, simple_loss=0.09223, pruned_loss=0.01308, audio_tagging_loss=0.008556, over 3042396.41 frames. ], batch size: 56, lr: 1.65e-03, grad_scale: 32.0 2023-11-26 05:12:42,934 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-26 05:12:48,801 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-26 05:12:53,677 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3235093.3333333335, ans=0.1 2023-11-26 05:12:57,295 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer_ff2.min_abs, batch_count=3235093.3333333335, ans=0.1 2023-11-26 05:13:05,913 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3235160.0, ans=0.1 2023-11-26 05:13:10,650 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=14.16 vs. limit=15.0 2023-11-26 05:13:13,197 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-26 05:13:28,996 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 485300 2023-11-26 05:13:35,800 INFO [train_asr.py:1235] (3/4) Epoch 41, batch 4350, loss[loss=0.06115, simple_loss=0.08423, pruned_loss=0.01097, audio_tagging_loss=0.008057, over 15110.00 frames. ], tot_loss[loss=0.06791, simple_loss=0.09252, pruned_loss=0.01316, audio_tagging_loss=0.008483, over 3045873.46 frames. ], batch size: 58, lr: 1.65e-03, grad_scale: 16.0 2023-11-26 05:14:11,931 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=4.70 vs. limit=15.0 2023-11-26 05:14:14,583 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.430e+01 8.639e+01 9.414e+01 1.000e+02 1.262e+02, threshold=1.883e+02, percent-clipped=0.0 2023-11-26 05:14:25,144 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 485350 2023-11-26 05:14:25,233 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3235626.6666666665, ans=0.1 2023-11-26 05:14:31,452 INFO [train_asr.py:1235] (3/4) Epoch 41, batch 4400, loss[loss=0.05976, simple_loss=0.08579, pruned_loss=0.00731, audio_tagging_loss=0.009559, over 16224.00 frames. ], tot_loss[loss=0.06727, simple_loss=0.09164, pruned_loss=0.01297, audio_tagging_loss=0.008477, over 3050935.32 frames. 
], batch size: 62, lr: 1.65e-03, grad_scale: 32.0 2023-11-26 05:14:32,797 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3235693.3333333335, ans=0.125 2023-11-26 05:14:33,740 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3235693.3333333335, ans=0.125 2023-11-26 05:14:42,628 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3235760.0, ans=0.125 2023-11-26 05:14:45,880 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3235760.0, ans=0.1 2023-11-26 05:14:50,127 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=3235760.0, ans=0.125 2023-11-26 05:14:54,972 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=3235826.6666666665, ans=0.0 2023-11-26 05:15:19,906 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 485400 2023-11-26 05:15:27,056 INFO [train_asr.py:1235] (3/4) Epoch 41, batch 4450, loss[loss=0.06702, simple_loss=0.09142, pruned_loss=0.01212, audio_tagging_loss=0.009186, over 15355.00 frames. ], tot_loss[loss=0.06749, simple_loss=0.0921, pruned_loss=0.01291, audio_tagging_loss=0.008528, over 3055388.83 frames. ], batch size: 56, lr: 1.65e-03, grad_scale: 32.0 2023-11-26 05:15:43,475 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.whiten.whitening_limit, batch_count=3236093.3333333335, ans=15.0 2023-11-26 05:15:44,249 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3236093.3333333335, ans=0.1 2023-11-26 05:16:06,362 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.168e+01 8.912e+01 9.547e+01 1.021e+02 1.319e+02, threshold=1.909e+02, percent-clipped=0.0 2023-11-26 05:16:08,784 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3236226.6666666665, ans=0.0 2023-11-26 05:16:11,927 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=3236293.3333333335, ans=0.125 2023-11-26 05:16:13,060 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=3236293.3333333335, ans=0.125 2023-11-26 05:16:16,077 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 485450 2023-11-26 05:16:23,545 INFO [train_asr.py:1235] (3/4) Epoch 41, batch 4500, loss[loss=0.05849, simple_loss=0.08179, pruned_loss=0.009113, audio_tagging_loss=0.008481, over 15861.00 frames. ], tot_loss[loss=0.06732, simple_loss=0.09201, pruned_loss=0.0128, audio_tagging_loss=0.008517, over 3057650.78 frames. 
], batch size: 59, lr: 1.65e-03, grad_scale: 32.0 2023-11-26 05:17:02,744 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3236560.0, ans=0.125 2023-11-26 05:17:08,461 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=3236626.6666666665, ans=0.0 2023-11-26 05:17:11,614 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=3236626.6666666665, ans=0.07 2023-11-26 05:17:12,569 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 485500 2023-11-26 05:17:15,235 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=11.73 vs. limit=15.0 2023-11-26 05:17:19,340 INFO [train_asr.py:1235] (3/4) Epoch 41, batch 4550, loss[loss=0.07499, simple_loss=0.1069, pruned_loss=0.01446, audio_tagging_loss=0.007102, over 15242.00 frames. ], tot_loss[loss=0.06703, simple_loss=0.09145, pruned_loss=0.01274, audio_tagging_loss=0.008566, over 3049740.88 frames. ], batch size: 57, lr: 1.65e-03, grad_scale: 32.0 2023-11-26 05:17:20,540 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=3236693.3333333335, ans=0.125 2023-11-26 05:17:21,651 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=3236693.3333333335, ans=0.0 2023-11-26 05:17:23,724 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3236693.3333333335, ans=0.1 2023-11-26 05:17:34,830 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=3236760.0, ans=0.0 2023-11-26 05:17:36,906 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3236760.0, ans=0.125 2023-11-26 05:17:40,258 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=3236826.6666666665, ans=0.0 2023-11-26 05:17:40,298 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3236826.6666666665, ans=0.125 2023-11-26 05:17:42,852 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3236826.6666666665, ans=0.1 2023-11-26 05:17:57,819 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 6.811e+01 8.528e+01 9.112e+01 9.671e+01 1.236e+02, threshold=1.822e+02, percent-clipped=0.0 2023-11-26 05:17:58,394 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=512, metric=2.93 vs. limit=15.0 2023-11-26 05:18:00,022 WARNING [train_asr.py:1481] (3/4) Exclude cut with ID unbalanced/_II2Klfnn4Y_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. 
Number of tokens: 24 2023-11-26 05:18:04,022 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=3236960.0, ans=0.035 2023-11-26 05:18:08,207 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 485550 2023-11-26 05:18:09,778 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=7.08 vs. limit=15.0 2023-11-26 05:18:15,112 INFO [train_asr.py:1235] (3/4) Epoch 41, batch 4600, loss[loss=0.06881, simple_loss=0.08628, pruned_loss=0.01787, audio_tagging_loss=0.007798, over 15395.00 frames. ], tot_loss[loss=0.06715, simple_loss=0.09133, pruned_loss=0.01281, audio_tagging_loss=0.008674, over 3052284.10 frames. ], batch size: 57, lr: 1.65e-03, grad_scale: 32.0 2023-11-26 05:18:22,798 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.min_positive, batch_count=3237026.6666666665, ans=0.05 2023-11-26 05:19:03,804 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 485600 2023-11-26 05:19:10,778 INFO [train_asr.py:1235] (3/4) Epoch 41, batch 4650, loss[loss=0.06419, simple_loss=0.08683, pruned_loss=0.01334, audio_tagging_loss=0.007436, over 16375.00 frames. ], tot_loss[loss=0.06686, simple_loss=0.09079, pruned_loss=0.0127, audio_tagging_loss=0.008769, over 3050763.41 frames. ], batch size: 61, lr: 1.65e-03, grad_scale: 8.0 2023-11-26 05:19:13,602 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-26 05:19:40,047 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=3237493.3333333335, ans=0.0 2023-11-26 05:19:43,175 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=3237560.0, ans=0.0 2023-11-26 05:19:46,303 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=3237560.0, ans=0.0 2023-11-26 05:19:51,901 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.215e+01 8.706e+01 9.399e+01 1.022e+02 1.601e+02, threshold=1.880e+02, percent-clipped=0.0 2023-11-26 05:19:52,140 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=3237560.0, ans=0.0 2023-11-26 05:19:59,969 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 485650 2023-11-26 05:20:06,205 INFO [train_asr.py:1235] (3/4) Epoch 41, batch 4700, loss[loss=0.08155, simple_loss=0.1095, pruned_loss=0.01815, audio_tagging_loss=0.008636, over 15312.00 frames. ], tot_loss[loss=0.06654, simple_loss=0.08981, pruned_loss=0.01266, audio_tagging_loss=0.008978, over 3046646.95 frames. 
], batch size: 57, lr: 1.65e-03, grad_scale: 8.0 2023-11-26 05:20:09,482 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3237693.3333333335, ans=0.125 2023-11-26 05:20:13,180 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3237693.3333333335, ans=0.1 2023-11-26 05:20:25,466 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=3237760.0, ans=0.125 2023-11-26 05:20:28,785 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=19.91 vs. limit=22.5 2023-11-26 05:20:33,547 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.whiten.whitening_limit, batch_count=3237826.6666666665, ans=12.0 2023-11-26 05:20:46,379 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-26 05:20:49,863 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.28 vs. limit=6.0 2023-11-26 05:20:49,892 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=15.96 vs. limit=22.5 2023-11-26 05:20:54,773 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 485700 2023-11-26 05:21:02,120 INFO [train_asr.py:1235] (3/4) Epoch 41, batch 4750, loss[loss=0.06955, simple_loss=0.08889, pruned_loss=0.01676, audio_tagging_loss=0.008351, over 15315.00 frames. ], tot_loss[loss=0.06605, simple_loss=0.08887, pruned_loss=0.01256, audio_tagging_loss=0.009053, over 3042397.77 frames. ], batch size: 60, lr: 1.65e-03, grad_scale: 8.0 2023-11-26 05:21:06,582 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=3238026.6666666665, ans=0.125 2023-11-26 05:21:22,349 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=3238093.3333333335, ans=0.125 2023-11-26 05:21:23,449 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3238160.0, ans=0.1 2023-11-26 05:21:42,924 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.508e+01 8.672e+01 9.207e+01 9.886e+01 1.229e+02, threshold=1.841e+02, percent-clipped=0.0 2023-11-26 05:21:50,944 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 485750 2023-11-26 05:21:57,724 INFO [train_asr.py:1235] (3/4) Epoch 41, batch 4800, loss[loss=0.07757, simple_loss=0.111, pruned_loss=0.01527, audio_tagging_loss=0.006794, over 16480.00 frames. ], tot_loss[loss=0.06611, simple_loss=0.08897, pruned_loss=0.01253, audio_tagging_loss=0.009096, over 3045084.65 frames. 
], batch size: 58, lr: 1.65e-03, grad_scale: 16.0 2023-11-26 05:22:13,422 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.max_positive, batch_count=3238426.6666666665, ans=0.95 2023-11-26 05:22:20,713 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=3238493.3333333335, ans=0.0 2023-11-26 05:22:42,351 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3238626.6666666665, ans=0.125 2023-11-26 05:22:46,904 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 485800 2023-11-26 05:22:51,539 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=3238626.6666666665, ans=0.2 2023-11-26 05:22:53,460 INFO [train_asr.py:1235] (3/4) Epoch 41, batch 4850, loss[loss=0.05915, simple_loss=0.07784, pruned_loss=0.009609, audio_tagging_loss=0.01062, over 14287.00 frames. ], tot_loss[loss=0.06667, simple_loss=0.08986, pruned_loss=0.01268, audio_tagging_loss=0.009057, over 3041031.70 frames. ], batch size: 55, lr: 1.65e-03, grad_scale: 8.0 2023-11-26 05:22:56,851 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-26 05:23:20,689 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=3238826.6666666665, ans=0.0 2023-11-26 05:23:35,553 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 6.989e+01 8.610e+01 9.289e+01 1.009e+02 1.598e+02, threshold=1.858e+02, percent-clipped=0.0 2023-11-26 05:23:39,219 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=12.76 vs. limit=15.0 2023-11-26 05:23:41,944 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 485850 2023-11-26 05:23:42,438 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=7.60 vs. limit=15.0 2023-11-26 05:23:43,255 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3238960.0, ans=0.125 2023-11-26 05:23:48,250 INFO [train_asr.py:1235] (3/4) Epoch 41, batch 4900, loss[loss=0.04556, simple_loss=0.05672, pruned_loss=0.007535, audio_tagging_loss=0.009669, over 14222.00 frames. ], tot_loss[loss=0.06695, simple_loss=0.0906, pruned_loss=0.01268, audio_tagging_loss=0.008974, over 3039174.03 frames. ], batch size: 54, lr: 1.65e-03, grad_scale: 8.0 2023-11-26 05:24:00,535 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.max_positive, batch_count=3239093.3333333335, ans=0.95 2023-11-26 05:24:03,221 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=3239093.3333333335, ans=0.0 2023-11-26 05:24:09,054 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-26 05:24:14,621 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=4.04 vs. 
limit=15.0 2023-11-26 05:24:17,569 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=3239160.0, ans=0.125 2023-11-26 05:24:18,780 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=6.85 vs. limit=12.0 2023-11-26 05:24:20,610 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=3239226.6666666665, ans=0.125 2023-11-26 05:24:23,797 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=3239226.6666666665, ans=0.0 2023-11-26 05:24:37,350 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 485900 2023-11-26 05:24:41,209 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.0.conv_module2.whiten, num_groups=1, num_channels=192, metric=13.09 vs. limit=15.0 2023-11-26 05:24:43,574 INFO [train_asr.py:1235] (3/4) Epoch 41, batch 4950, loss[loss=0.07981, simple_loss=0.1066, pruned_loss=0.01675, audio_tagging_loss=0.009751, over 13166.00 frames. ], tot_loss[loss=0.0664, simple_loss=0.08997, pruned_loss=0.01254, audio_tagging_loss=0.008877, over 3034093.93 frames. ], batch size: 52, lr: 1.65e-03, grad_scale: 8.0 2023-11-26 05:25:06,961 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3239493.3333333335, ans=0.125 2023-11-26 05:25:08,292 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=17.12 vs. limit=22.5 2023-11-26 05:25:15,929 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3239560.0, ans=0.1 2023-11-26 05:25:18,615 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3239560.0, ans=0.125 2023-11-26 05:25:18,886 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=8.00 vs. limit=15.0 2023-11-26 05:25:22,880 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.max_positive, batch_count=3239560.0, ans=0.95 2023-11-26 05:25:25,805 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.598e+01 8.689e+01 9.233e+01 9.794e+01 1.211e+02, threshold=1.847e+02, percent-clipped=0.0 2023-11-26 05:25:33,269 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 485950 2023-11-26 05:25:35,872 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=11.03 vs. limit=15.0 2023-11-26 05:25:39,614 INFO [train_asr.py:1235] (3/4) Epoch 41, batch 5000, loss[loss=0.05904, simple_loss=0.08234, pruned_loss=0.007788, audio_tagging_loss=0.01008, over 15087.00 frames. ], tot_loss[loss=0.06625, simple_loss=0.08987, pruned_loss=0.01249, audio_tagging_loss=0.008822, over 3035880.97 frames. 
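The scaling.py:213 entries print module hyper-parameters (dropout probabilities, skip rates, bypass scale floors) as ScheduledFloat values, i.e. functions of batch_count rather than constants. A plausible mechanism is a piecewise-linear schedule that is clamped once training passes the last breakpoint, which would explain why the values are stable at batch_count around 3.24e6; the class below is a hypothetical stand-in, with made-up breakpoints:

```python
class PiecewiseSchedule:
    """Piecewise-linear value of batch_count, clamped at the end points.

    Hypothetical stand-in for the ScheduledFloat values printed in the log;
    the breakpoints here are illustrative, not taken from the recipe.
    """

    def __init__(self, *points):
        # points: (batch_count, value) pairs
        self.points = sorted(points)

    def __call__(self, batch_count: float) -> float:
        pts = self.points
        if batch_count <= pts[0][0]:
            return pts[0][1]
        if batch_count >= pts[-1][0]:
            return pts[-1][1]
        for (x0, y0), (x1, y1) in zip(pts, pts[1:]):
            if x0 <= batch_count <= x1:
                t = (batch_count - x0) / (x1 - x0)
                return y0 + t * (y1 - y0)

skip_rate = PiecewiseSchedule((0, 0.2), (4000, 0.05), (16000, 0.0))
print(skip_rate(3236960.0))  # far past the last breakpoint -> 0.0
```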
], batch size: 59, lr: 1.65e-03, grad_scale: 8.0 2023-11-26 05:25:46,107 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=3239693.3333333335, ans=0.2 2023-11-26 05:26:19,934 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=3239893.3333333335, ans=0.2 2023-11-26 05:26:28,174 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 486000 2023-11-26 05:26:34,234 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=11.11 vs. limit=15.0 2023-11-26 05:26:34,640 INFO [train_asr.py:1235] (3/4) Epoch 41, batch 5050, loss[loss=0.05181, simple_loss=0.06414, pruned_loss=0.01086, audio_tagging_loss=0.008877, over 15695.00 frames. ], tot_loss[loss=0.06591, simple_loss=0.08921, pruned_loss=0.0124, audio_tagging_loss=0.00891, over 3041320.40 frames. ], batch size: 61, lr: 1.65e-03, grad_scale: 8.0 2023-11-26 05:26:34,819 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3240026.6666666665, ans=0.125 2023-11-26 05:26:35,906 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3240026.6666666665, ans=0.125 2023-11-26 05:26:37,957 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3240026.6666666665, ans=0.125 2023-11-26 05:26:39,728 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3240026.6666666665, ans=0.125 2023-11-26 05:26:44,388 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=3240026.6666666665, ans=0.0 2023-11-26 05:27:01,469 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=10.11 vs. limit=15.0 2023-11-26 05:27:16,763 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.060e+01 8.723e+01 9.210e+01 1.029e+02 1.181e+02, threshold=1.842e+02, percent-clipped=0.0 2023-11-26 05:27:23,672 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 486050 2023-11-26 05:27:30,370 INFO [train_asr.py:1235] (3/4) Epoch 41, batch 5100, loss[loss=0.07122, simple_loss=0.1018, pruned_loss=0.01239, audio_tagging_loss=0.007941, over 15366.00 frames. ], tot_loss[loss=0.06588, simple_loss=0.08914, pruned_loss=0.01241, audio_tagging_loss=0.008897, over 3034717.76 frames. ], batch size: 56, lr: 1.65e-03, grad_scale: 8.0 2023-11-26 05:27:43,185 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3240426.6666666665, ans=0.0 2023-11-26 05:27:46,822 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=9.77 vs. limit=15.0 2023-11-26 05:28:09,246 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=3.37 vs. 
limit=15.0 2023-11-26 05:28:11,684 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=3240560.0, ans=0.125 2023-11-26 05:28:19,381 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 486100 2023-11-26 05:28:26,091 INFO [train_asr.py:1235] (3/4) Epoch 41, batch 5150, loss[loss=0.07666, simple_loss=0.1003, pruned_loss=0.01667, audio_tagging_loss=0.009814, over 15500.00 frames. ], tot_loss[loss=0.06599, simple_loss=0.08953, pruned_loss=0.0124, audio_tagging_loss=0.008823, over 3034900.63 frames. ], batch size: 56, lr: 1.65e-03, grad_scale: 8.0 2023-11-26 05:28:27,360 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=3240693.3333333335, ans=0.2 2023-11-26 05:28:59,460 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=3240893.3333333335, ans=0.0 2023-11-26 05:29:06,445 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=3240893.3333333335, ans=0.125 2023-11-26 05:29:08,265 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.744e+01 8.813e+01 9.450e+01 1.017e+02 1.282e+02, threshold=1.890e+02, percent-clipped=0.0 2023-11-26 05:29:14,701 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 486150 2023-11-26 05:29:15,903 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=3240960.0, ans=0.125 2023-11-26 05:29:21,085 INFO [train_asr.py:1235] (3/4) Epoch 41, batch 5200, loss[loss=0.09008, simple_loss=0.1291, pruned_loss=0.01842, audio_tagging_loss=0.007108, over 15077.00 frames. ], tot_loss[loss=0.06634, simple_loss=0.09006, pruned_loss=0.01255, audio_tagging_loss=0.00876, over 3038529.20 frames. ], batch size: 56, lr: 1.65e-03, grad_scale: 16.0 2023-11-26 05:29:21,974 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=11.53 vs. limit=15.0 2023-11-26 05:29:28,160 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3241026.6666666665, ans=0.125 2023-11-26 05:29:35,637 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=3241093.3333333335, ans=0.0 2023-11-26 05:29:54,170 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.min_positive, batch_count=3241226.6666666665, ans=0.05 2023-11-26 05:30:10,290 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 486200 2023-11-26 05:30:16,781 INFO [train_asr.py:1235] (3/4) Epoch 41, batch 5250, loss[loss=0.0589, simple_loss=0.07382, pruned_loss=0.01421, audio_tagging_loss=0.007781, over 14812.00 frames. ], tot_loss[loss=0.06686, simple_loss=0.09114, pruned_loss=0.01264, audio_tagging_loss=0.00865, over 3048753.23 frames. 
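The scaling.py:1022 Whitening entries compare a per-module statistic against a limit (e.g. metric=19.91 vs. limit=22.5 above). One statistic with exactly this behaviour is the eigenvalue spread of the feature covariance, which equals 1.0 for perfectly white features and grows as a few directions dominate; the formula below is an assumption for illustration, not necessarily what scaling.py computes:

```python
import torch

def whitening_metric(x: torch.Tensor) -> float:
    """How far the channel covariance of x is from a multiple of I.

    Assumed metric: mean(eig^2) / mean(eig)^2 of the covariance, which is
    1.0 for perfectly "white" features and grows as the spectrum spreads
    out -- consistent with the metric/limit pairs in the log, but not
    necessarily the exact formula in scaling.py.
    """
    x = x.reshape(-1, x.shape[-1])              # (frames, channels)
    x = x - x.mean(dim=0, keepdim=True)
    cov = (x.T @ x) / x.shape[0]                # (channels, channels)
    d = cov.shape[0]
    mean_eig = torch.trace(cov) / d             # mean eigenvalue
    mean_eig_sq = torch.trace(cov @ cov) / d    # mean squared eigenvalue
    return (mean_eig_sq / mean_eig**2).item()

white = torch.randn(10000, 512)
print(whitening_metric(white))    # ~1.0: already white
spiky = white.clone()
spiky[:, :32] *= 10.0             # a few dominant directions
print(whitening_metric(spiky))    # ~12: far from white
```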
], batch size: 56, lr: 1.65e-03, grad_scale: 16.0 2023-11-26 05:30:30,273 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=3241426.6666666665, ans=0.2 2023-11-26 05:30:35,437 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.max_abs, batch_count=3241426.6666666665, ans=10.0 2023-11-26 05:30:40,128 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=4.73 vs. limit=12.0 2023-11-26 05:30:44,874 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=3241493.3333333335, ans=0.0 2023-11-26 05:30:47,420 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=19.39 vs. limit=22.5 2023-11-26 05:30:52,252 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3241560.0, ans=0.1 2023-11-26 05:30:57,329 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=12.65 vs. limit=15.0 2023-11-26 05:30:58,894 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.332e+01 8.725e+01 9.409e+01 1.008e+02 1.630e+02, threshold=1.882e+02, percent-clipped=0.0 2023-11-26 05:31:04,968 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=3241626.6666666665, ans=0.0 2023-11-26 05:31:05,909 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 486250 2023-11-26 05:31:13,296 INFO [train_asr.py:1235] (3/4) Epoch 41, batch 5300, loss[loss=0.05602, simple_loss=0.07025, pruned_loss=0.009379, audio_tagging_loss=0.01152, over 15733.00 frames. ], tot_loss[loss=0.06721, simple_loss=0.09159, pruned_loss=0.01281, audio_tagging_loss=0.008608, over 3053072.89 frames. ], batch size: 59, lr: 1.65e-03, grad_scale: 16.0 2023-11-26 05:31:38,452 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3241826.6666666665, ans=0.1 2023-11-26 05:31:40,610 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=3241826.6666666665, ans=0.0 2023-11-26 05:31:47,517 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=3241893.3333333335, ans=0.0 2023-11-26 05:31:49,842 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=17.04 vs. limit=22.5 2023-11-26 05:32:01,933 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 486300 2023-11-26 05:32:08,104 INFO [train_asr.py:1235] (3/4) Epoch 41, batch 5350, loss[loss=0.04513, simple_loss=0.05536, pruned_loss=0.007148, audio_tagging_loss=0.01031, over 15844.00 frames. ], tot_loss[loss=0.06718, simple_loss=0.09158, pruned_loss=0.01275, audio_tagging_loss=0.008646, over 3045045.38 frames. 
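Names like attention_skip_rate, conv_skip_rate and ff2_skip_rate in the scaling.py:213 entries suggest stochastic-depth-style regularization: with the scheduled probability, a sub-module's output is dropped from its residual connection during training. A hedged sketch of that reading (maybe_skip is hypothetical, not a function from the codebase):

```python
import torch

def maybe_skip(module_out: torch.Tensor, residual: torch.Tensor,
               skip_rate: float, training: bool) -> torch.Tensor:
    """Hypothetical reading of the *_skip_rate values in the log: during
    training, drop a sub-module's contribution with probability skip_rate
    (stochastic depth), and always keep it in eval mode.
    """
    if training and torch.rand(()) < skip_rate:
        return residual
    return residual + module_out

# skip_rate=0.035 echoes the bypass.skip_rate ans=0.035 entry above.
y = maybe_skip(torch.randn(4, 8), torch.randn(4, 8),
               skip_rate=0.035, training=True)
```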
], batch size: 62, lr: 1.65e-03, grad_scale: 16.0 2023-11-26 05:32:18,433 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=3242093.3333333335, ans=0.2 2023-11-26 05:32:30,686 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=3242160.0, ans=0.2 2023-11-26 05:32:32,804 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-26 05:32:36,632 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=14.50 vs. limit=15.0 2023-11-26 05:32:49,969 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.283e+01 8.485e+01 9.147e+01 9.991e+01 1.214e+02, threshold=1.829e+02, percent-clipped=0.0 2023-11-26 05:32:56,339 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 486350 2023-11-26 05:33:03,202 INFO [train_asr.py:1235] (3/4) Epoch 41, batch 5400, loss[loss=0.05672, simple_loss=0.07409, pruned_loss=0.01052, audio_tagging_loss=0.009158, over 14940.00 frames. ], tot_loss[loss=0.06782, simple_loss=0.09247, pruned_loss=0.01292, audio_tagging_loss=0.008666, over 3050081.44 frames. ], batch size: 58, lr: 1.65e-03, grad_scale: 16.0 2023-11-26 05:33:25,006 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=3242493.3333333335, ans=0.2 2023-11-26 05:33:25,077 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=3242493.3333333335, ans=0.125 2023-11-26 05:33:39,771 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3242560.0, ans=0.1 2023-11-26 05:33:46,703 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=3242626.6666666665, ans=0.0 2023-11-26 05:33:49,755 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=3242626.6666666665, ans=0.0 2023-11-26 05:33:51,792 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 486400 2023-11-26 05:33:59,180 INFO [train_asr.py:1235] (3/4) Epoch 41, batch 5450, loss[loss=0.08746, simple_loss=0.106, pruned_loss=0.02261, audio_tagging_loss=0.01183, over 14221.00 frames. ], tot_loss[loss=0.06769, simple_loss=0.0921, pruned_loss=0.01292, audio_tagging_loss=0.008731, over 3050337.79 frames. ], batch size: 53, lr: 1.65e-03, grad_scale: 16.0 2023-11-26 05:34:16,637 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3242760.0, ans=0.125 2023-11-26 05:34:20,288 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=12.20 vs. limit=15.0 2023-11-26 05:34:41,185 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.720e+01 8.605e+01 9.179e+01 9.906e+01 1.952e+02, threshold=1.836e+02, percent-clipped=1.0 2023-11-26 05:34:48,117 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 486450 2023-11-26 05:34:53,053 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=11.68 vs. 
limit=15.0 2023-11-26 05:34:54,502 INFO [train_asr.py:1235] (3/4) Epoch 41, batch 5500, loss[loss=0.04205, simple_loss=0.0477, pruned_loss=0.007829, audio_tagging_loss=0.01037, over 14937.00 frames. ], tot_loss[loss=0.06795, simple_loss=0.09242, pruned_loss=0.013, audio_tagging_loss=0.008749, over 3051579.95 frames. ], batch size: 58, lr: 1.65e-03, grad_scale: 16.0 2023-11-26 05:35:08,402 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=7.28 vs. limit=12.0 2023-11-26 05:35:15,690 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=12.03 vs. limit=15.0 2023-11-26 05:35:34,888 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=12.85 vs. limit=15.0 2023-11-26 05:35:40,941 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=3243293.3333333335, ans=0.2 2023-11-26 05:35:42,917 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 486500 2023-11-26 05:35:45,146 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=3243293.3333333335, ans=0.0 2023-11-26 05:35:49,808 INFO [train_asr.py:1235] (3/4) Epoch 41, batch 5550, loss[loss=0.06676, simple_loss=0.08603, pruned_loss=0.01496, audio_tagging_loss=0.008787, over 14428.00 frames. ], tot_loss[loss=0.06734, simple_loss=0.09138, pruned_loss=0.01281, audio_tagging_loss=0.008843, over 3050680.16 frames. ], batch size: 54, lr: 1.65e-03, grad_scale: 16.0 2023-11-26 05:36:05,536 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=3243426.6666666665, ans=0.2 2023-11-26 05:36:20,361 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=3243493.3333333335, ans=0.2 2023-11-26 05:36:31,393 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=3243560.0, ans=0.125 2023-11-26 05:36:32,123 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.529e+01 8.745e+01 9.267e+01 1.002e+02 1.641e+02, threshold=1.853e+02, percent-clipped=0.0 2023-11-26 05:36:38,526 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 486550 2023-11-26 05:36:45,267 INFO [train_asr.py:1235] (3/4) Epoch 41, batch 5600, loss[loss=0.1011, simple_loss=0.1441, pruned_loss=0.02184, audio_tagging_loss=0.007189, over 17319.00 frames. ], tot_loss[loss=0.06762, simple_loss=0.09184, pruned_loss=0.01271, audio_tagging_loss=0.008994, over 3042177.16 frames. ], batch size: 63, lr: 1.65e-03, grad_scale: 32.0 2023-11-26 05:36:46,408 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=3243693.3333333335, ans=0.125 2023-11-26 05:36:47,500 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-26 05:36:47,970 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=4.47 vs. 
limit=12.0 2023-11-26 05:36:51,299 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.max_abs, batch_count=3243693.3333333335, ans=10.0 2023-11-26 05:37:06,428 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=3243826.6666666665, ans=0.2 2023-11-26 05:37:23,716 WARNING [train_asr.py:1481] (3/4) Exclude cut with ID unbalanced/ze0LsBtoDm0_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-26 05:37:28,562 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3243960.0, ans=0.1 2023-11-26 05:37:34,279 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 486600 2023-11-26 05:37:40,743 INFO [train_asr.py:1235] (3/4) Epoch 41, batch 5650, loss[loss=0.07011, simple_loss=0.09393, pruned_loss=0.01452, audio_tagging_loss=0.008633, over 16102.00 frames. ], tot_loss[loss=0.06722, simple_loss=0.09135, pruned_loss=0.01261, audio_tagging_loss=0.008938, over 3046034.84 frames. ], batch size: 61, lr: 1.65e-03, grad_scale: 32.0 2023-11-26 05:38:08,683 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer_na.min_abs, batch_count=3244160.0, ans=0.02 2023-11-26 05:38:23,772 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.303e+01 8.720e+01 9.280e+01 9.877e+01 1.364e+02, threshold=1.856e+02, percent-clipped=0.0 2023-11-26 05:38:29,540 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 486650 2023-11-26 05:38:35,196 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=7.96 vs. limit=10.0 2023-11-26 05:38:35,768 INFO [train_asr.py:1235] (3/4) Epoch 41, batch 5700, loss[loss=0.06152, simple_loss=0.09034, pruned_loss=0.00831, audio_tagging_loss=0.008034, over 14697.00 frames. ], tot_loss[loss=0.0672, simple_loss=0.09118, pruned_loss=0.01266, audio_tagging_loss=0.00895, over 3048389.41 frames. ], batch size: 54, lr: 1.65e-03, grad_scale: 16.0 2023-11-26 05:38:55,306 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.min_positive, batch_count=3244426.6666666665, ans=0.025 2023-11-26 05:39:05,757 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=3244493.3333333335, ans=0.0 2023-11-26 05:39:24,763 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 486700 2023-11-26 05:39:30,730 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3244693.3333333335, ans=0.125 2023-11-26 05:39:31,502 INFO [train_asr.py:1235] (3/4) Epoch 41, batch 5750, loss[loss=0.07342, simple_loss=0.1016, pruned_loss=0.01444, audio_tagging_loss=0.008177, over 14607.00 frames. ], tot_loss[loss=0.06645, simple_loss=0.09002, pruned_loss=0.01251, audio_tagging_loss=0.008923, over 3043229.93 frames. 
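The WARNING in the entry above drops an AudioSet cut whose 100 input frames shrink to 23 after subsampling while its dummy transcript has 24 tokens; a transducer loss cannot align more tokens than encoder frames, so such cuts are excluded. A sketch of the filter under assumed edge behaviour (the exact frame arithmetic is an assumption chosen to reproduce 100 -> 23):

```python
def keep_cut(num_frames: int, num_tokens: int,
             subsampling_factor: int = 4) -> bool:
    """Mirror the "Exclude cut" warnings in this log.

    Assumptions: the encoder subsamples roughly by 4 with a few frames of
    context lost at the edges (100 input frames -> 23 output frames here),
    and a cut is dropped when the subsampled length cannot cover the token
    sequence (23 frames vs. 24 tokens).
    """
    frames_after = (num_frames - 7) // subsampling_factor  # hypothetical edge loss
    return frames_after >= num_tokens

print(keep_cut(100, 24))   # False -> "Exclude cut ... from training"
print(keep_cut(1000, 24))  # True
```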
], batch size: 54, lr: 1.65e-03, grad_scale: 16.0 2023-11-26 05:39:44,335 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=8.81 vs. limit=15.0 2023-11-26 05:39:58,131 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=3244826.6666666665, ans=0.2 2023-11-26 05:39:59,167 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=3244826.6666666665, ans=0.2 2023-11-26 05:40:15,155 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=8.73 vs. limit=15.0 2023-11-26 05:40:15,739 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 6.860e+01 8.613e+01 9.170e+01 1.044e+02 1.478e+02, threshold=1.834e+02, percent-clipped=0.0 2023-11-26 05:40:20,577 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 486750 2023-11-26 05:40:25,125 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=6.52 vs. limit=10.0 2023-11-26 05:40:26,822 INFO [train_asr.py:1235] (3/4) Epoch 41, batch 5800, loss[loss=0.08676, simple_loss=0.1253, pruned_loss=0.01731, audio_tagging_loss=0.006802, over 15965.00 frames. ], tot_loss[loss=0.06693, simple_loss=0.091, pruned_loss=0.01264, audio_tagging_loss=0.008793, over 3043873.94 frames. ], batch size: 60, lr: 1.65e-03, grad_scale: 8.0 2023-11-26 05:40:46,213 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=3245093.3333333335, ans=0.0 2023-11-26 05:40:47,255 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=3245093.3333333335, ans=0.0 2023-11-26 05:41:05,943 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=3245226.6666666665, ans=0.125 2023-11-26 05:41:07,135 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=3245226.6666666665, ans=0.125 2023-11-26 05:41:13,586 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=3245293.3333333335, ans=0.125 2023-11-26 05:41:13,613 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=3245293.3333333335, ans=0.2 2023-11-26 05:41:15,442 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 486800 2023-11-26 05:41:21,967 INFO [train_asr.py:1235] (3/4) Epoch 41, batch 5850, loss[loss=0.04153, simple_loss=0.05961, pruned_loss=0.00437, audio_tagging_loss=0.007355, over 14743.00 frames. ], tot_loss[loss=0.06602, simple_loss=0.08987, pruned_loss=0.01232, audio_tagging_loss=0.008765, over 3045178.03 frames. 
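grad_scale in the per-batch entries moves between 8, 16 and 32, the standard dynamic loss-scaling pattern for fp16 training: back off (halve) when a step overflows, grow (double) after a run of clean steps. PyTorch's GradScaler implements this policy; the model, optimizer and constructor arguments below are illustrative stand-ins, not the recipe's:

```python
import torch

model = torch.nn.Linear(8, 8)
opt = torch.optim.SGD(model.parameters(), lr=0.1)
scaler = torch.cuda.amp.GradScaler(
    init_scale=32.0, growth_factor=2.0, backoff_factor=0.5,
    growth_interval=2000, enabled=torch.cuda.is_available(),
)
x = torch.randn(4, 8)
loss = model(x).pow(2).mean()
scaler.scale(loss).backward()   # backward on the scaled loss
scaler.step(opt)                # unscales grads; skips the step on inf/nan
scaler.update()                 # grows or backs off the scale
print(scaler.get_scale())       # current scale (1.0 on CPU, where disabled)
```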
], batch size: 57, lr: 1.65e-03, grad_scale: 8.0 2023-11-26 05:41:42,597 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=3245426.6666666665, ans=0.125 2023-11-26 05:41:44,458 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=3245493.3333333335, ans=0.125 2023-11-26 05:41:49,617 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=3245493.3333333335, ans=0.0 2023-11-26 05:42:02,331 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3245560.0, ans=0.125 2023-11-26 05:42:06,260 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.352e+01 8.540e+01 9.221e+01 1.014e+02 1.317e+02, threshold=1.844e+02, percent-clipped=0.0 2023-11-26 05:42:06,446 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=3245626.6666666665, ans=0.125 2023-11-26 05:42:11,732 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 486850 2023-11-26 05:42:13,239 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.10 vs. limit=6.0 2023-11-26 05:42:17,883 INFO [train_asr.py:1235] (3/4) Epoch 41, batch 5900, loss[loss=0.08301, simple_loss=0.1223, pruned_loss=0.01536, audio_tagging_loss=0.006504, over 15458.00 frames. ], tot_loss[loss=0.06655, simple_loss=0.09096, pruned_loss=0.01235, audio_tagging_loss=0.008716, over 3046044.26 frames. ], batch size: 55, lr: 1.65e-03, grad_scale: 8.0 2023-11-26 05:42:42,205 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=3245826.6666666665, ans=0.125 2023-11-26 05:42:53,103 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=7.16 vs. limit=12.0 2023-11-26 05:43:06,748 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 486900 2023-11-26 05:43:13,582 INFO [train_asr.py:1235] (3/4) Epoch 41, batch 5950, loss[loss=0.06992, simple_loss=0.1016, pruned_loss=0.01488, audio_tagging_loss=0.004249, over 15730.00 frames. ], tot_loss[loss=0.06672, simple_loss=0.0913, pruned_loss=0.01246, audio_tagging_loss=0.008613, over 3052806.82 frames. ], batch size: 58, lr: 1.65e-03, grad_scale: 8.0 2023-11-26 05:43:26,549 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=3246093.3333333335, ans=0.0 2023-11-26 05:43:44,838 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3246160.0, ans=0.1 2023-11-26 05:43:45,269 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=2.88 vs. limit=15.0 2023-11-26 05:43:46,300 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=21.36 vs. limit=22.5 2023-11-26 05:43:46,363 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=11.33 vs. 
limit=15.0 2023-11-26 05:43:57,759 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 6.996e+01 8.522e+01 9.337e+01 1.020e+02 1.344e+02, threshold=1.867e+02, percent-clipped=0.0 2023-11-26 05:44:02,128 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 486950 2023-11-26 05:44:08,294 INFO [train_asr.py:1235] (3/4) Epoch 41, batch 6000, loss[loss=0.07001, simple_loss=0.09564, pruned_loss=0.01437, audio_tagging_loss=0.007823, over 15684.00 frames. ], tot_loss[loss=0.06653, simple_loss=0.09111, pruned_loss=0.01241, audio_tagging_loss=0.008565, over 3044643.16 frames. ], batch size: 60, lr: 1.65e-03, grad_scale: 16.0 2023-11-26 05:44:08,295 INFO [train_asr.py:1258] (3/4) Computing validation loss 2023-11-26 05:44:40,559 INFO [train_asr.py:1267] (3/4) Epoch 41, validation: loss=0.05752, simple_loss=0.0506, pruned_loss=0.005164, audio_tagging_loss=0.02705, over 4681554.00 frames. 2023-11-26 05:44:40,560 INFO [train_asr.py:1268] (3/4) Maximum memory allocated so far is 24894MB 2023-11-26 05:44:45,433 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=3246360.0, ans=0.2 2023-11-26 05:44:55,531 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=9.10 vs. limit=10.0 2023-11-26 05:44:57,093 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3246426.6666666665, ans=0.125 2023-11-26 05:45:14,690 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3246560.0, ans=0.125 2023-11-26 05:45:15,827 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=3246560.0, ans=0.0 2023-11-26 05:45:17,961 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=3246560.0, ans=0.125 2023-11-26 05:45:20,429 WARNING [train_asr.py:1481] (3/4) Exclude cut with ID unbalanced/NoNxFjwXuuc_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-26 05:45:20,549 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3246560.0, ans=0.1 2023-11-26 05:45:22,258 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3246560.0, ans=0.1 2023-11-26 05:45:23,282 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-26 05:45:23,315 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=3246560.0, ans=0.2 2023-11-26 05:45:29,959 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 487000 2023-11-26 05:45:33,994 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=7.04 vs. 
limit=15.0 2023-11-26 05:45:36,453 INFO [train_asr.py:1235] (3/4) Epoch 41, batch 6050, loss[loss=0.0547, simple_loss=0.07244, pruned_loss=0.009924, audio_tagging_loss=0.008553, over 15144.00 frames. ], tot_loss[loss=0.06665, simple_loss=0.09103, pruned_loss=0.01249, audio_tagging_loss=0.008643, over 3047369.73 frames. ], batch size: 56, lr: 1.65e-03, grad_scale: 16.0 2023-11-26 05:45:39,856 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3246693.3333333335, ans=0.125 2023-11-26 05:45:48,567 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=3246760.0, ans=0.125 2023-11-26 05:45:50,631 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=3246760.0, ans=0.0 2023-11-26 05:45:55,379 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=3246760.0, ans=0.0 2023-11-26 05:45:56,619 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=3246760.0, ans=0.04949747468305833 2023-11-26 05:45:57,893 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=10.92 vs. limit=15.0 2023-11-26 05:45:59,716 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3246826.6666666665, ans=0.0 2023-11-26 05:46:00,886 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3246826.6666666665, ans=0.0 2023-11-26 05:46:08,292 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3246826.6666666665, ans=0.1 2023-11-26 05:46:21,412 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.073e+01 8.605e+01 9.174e+01 9.669e+01 1.333e+02, threshold=1.835e+02, percent-clipped=0.0 2023-11-26 05:46:21,685 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3246960.0, ans=0.1 2023-11-26 05:46:25,741 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 487050 2023-11-26 05:46:25,934 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=3246960.0, ans=0.125 2023-11-26 05:46:27,154 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=8.28 vs. limit=15.0 2023-11-26 05:46:32,117 INFO [train_asr.py:1235] (3/4) Epoch 41, batch 6100, loss[loss=0.06409, simple_loss=0.09407, pruned_loss=0.01108, audio_tagging_loss=0.00597, over 14845.00 frames. ], tot_loss[loss=0.0665, simple_loss=0.09075, pruned_loss=0.01256, audio_tagging_loss=0.008564, over 3040081.63 frames. 
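The validation block above ends with "Maximum memory allocated so far is 24894MB", the peak CUDA allocator statistic. A sketch of producing such a line with the standard PyTorch API (the device choice is illustrative):

```python
import torch

def log_peak_memory(device: torch.device) -> None:
    # torch.cuda.max_memory_allocated returns the peak in bytes since the
    # start of the process (or the last reset); the log rounds to whole MB.
    if device.type == "cuda":
        peak_mb = torch.cuda.max_memory_allocated(device) // (1024 * 1024)
        print(f"Maximum memory allocated so far is {peak_mb}MB")

log_peak_memory(torch.device("cuda:3" if torch.cuda.is_available() else "cpu"))
```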
], batch size: 55, lr: 1.65e-03, grad_scale: 16.0 2023-11-26 05:46:34,566 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3247026.6666666665, ans=0.125 2023-11-26 05:46:35,597 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.min_positive, batch_count=3247026.6666666665, ans=0.05 2023-11-26 05:46:43,475 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=3247093.3333333335, ans=0.015 2023-11-26 05:46:54,791 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3247160.0, ans=0.1 2023-11-26 05:46:55,191 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten2.whitening_limit, batch_count=3247160.0, ans=15.0 2023-11-26 05:46:57,951 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3247160.0, ans=0.1 2023-11-26 05:47:13,865 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3247226.6666666665, ans=0.125 2023-11-26 05:47:21,769 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 487100 2023-11-26 05:47:26,135 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=3247293.3333333335, ans=0.0 2023-11-26 05:47:28,039 INFO [train_asr.py:1235] (3/4) Epoch 41, batch 6150, loss[loss=0.0529, simple_loss=0.07647, pruned_loss=0.007345, audio_tagging_loss=0.007315, over 15160.00 frames. ], tot_loss[loss=0.0664, simple_loss=0.09051, pruned_loss=0.0125, audio_tagging_loss=0.00865, over 3041824.52 frames. ], batch size: 57, lr: 1.65e-03, grad_scale: 16.0 2023-11-26 05:47:33,125 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3247360.0, ans=0.125 2023-11-26 05:47:52,907 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3247493.3333333335, ans=0.1 2023-11-26 05:47:55,262 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn1.whiten.whitening_limit, batch_count=3247493.3333333335, ans=22.5 2023-11-26 05:48:00,289 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=3247560.0, ans=0.0 2023-11-26 05:48:02,994 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=3247560.0, ans=0.125 2023-11-26 05:48:03,065 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=3247560.0, ans=0.0 2023-11-26 05:48:09,370 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=7.09 vs. 
limit=15.0 2023-11-26 05:48:12,589 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.193e+01 8.728e+01 9.335e+01 1.012e+02 1.245e+02, threshold=1.867e+02, percent-clipped=0.0 2023-11-26 05:48:15,516 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3247626.6666666665, ans=0.1 2023-11-26 05:48:17,899 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 487150 2023-11-26 05:48:19,007 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=3247626.6666666665, ans=0.125 2023-11-26 05:48:24,208 INFO [train_asr.py:1235] (3/4) Epoch 41, batch 6200, loss[loss=0.06213, simple_loss=0.08668, pruned_loss=0.009786, audio_tagging_loss=0.009005, over 15844.00 frames. ], tot_loss[loss=0.06668, simple_loss=0.09052, pruned_loss=0.0126, audio_tagging_loss=0.008822, over 3045290.54 frames. ], batch size: 58, lr: 1.65e-03, grad_scale: 16.0 2023-11-26 05:48:26,528 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3247693.3333333335, ans=0.1 2023-11-26 05:48:51,495 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=3247826.6666666665, ans=0.125 2023-11-26 05:49:00,087 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=11.73 vs. limit=15.0 2023-11-26 05:49:10,473 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn2.whiten.whitening_limit, batch_count=3247960.0, ans=22.5 2023-11-26 05:49:13,220 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 487200 2023-11-26 05:49:14,467 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3247960.0, ans=0.125 2023-11-26 05:49:19,824 INFO [train_asr.py:1235] (3/4) Epoch 41, batch 6250, loss[loss=0.06135, simple_loss=0.08361, pruned_loss=0.01189, audio_tagging_loss=0.007656, over 14995.00 frames. ], tot_loss[loss=0.06549, simple_loss=0.08855, pruned_loss=0.01223, audio_tagging_loss=0.00899, over 3044837.27 frames. ], batch size: 56, lr: 1.65e-03, grad_scale: 16.0 2023-11-26 05:49:19,979 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer_ff3.min_abs, batch_count=3248026.6666666665, ans=0.2 2023-11-26 05:49:51,679 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3248160.0, ans=0.0 2023-11-26 05:49:52,674 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.min_positive, batch_count=3248226.6666666665, ans=0.05 2023-11-26 05:50:04,065 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.648e+01 8.628e+01 9.158e+01 1.005e+02 1.454e+02, threshold=1.832e+02, percent-clipped=0.0 2023-11-26 05:50:04,540 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.27 vs. 
limit=6.0 2023-11-26 05:50:08,389 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 487250 2023-11-26 05:50:12,344 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=3248293.3333333335, ans=0.125 2023-11-26 05:50:15,170 INFO [train_asr.py:1235] (3/4) Epoch 41, batch 6300, loss[loss=0.03978, simple_loss=0.04551, pruned_loss=0.006205, audio_tagging_loss=0.01082, over 14015.00 frames. ], tot_loss[loss=0.06562, simple_loss=0.08865, pruned_loss=0.01227, audio_tagging_loss=0.009022, over 3048129.92 frames. ], batch size: 57, lr: 1.65e-03, grad_scale: 16.0 2023-11-26 05:50:20,235 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=3248360.0, ans=0.2 2023-11-26 05:50:20,279 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=3248360.0, ans=0.0 2023-11-26 05:51:04,438 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 487300 2023-11-26 05:51:10,013 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3248626.6666666665, ans=0.125 2023-11-26 05:51:11,864 INFO [train_asr.py:1235] (3/4) Epoch 41, batch 6350, loss[loss=0.06105, simple_loss=0.08845, pruned_loss=0.009463, audio_tagging_loss=0.00736, over 14676.00 frames. ], tot_loss[loss=0.06585, simple_loss=0.08911, pruned_loss=0.01226, audio_tagging_loss=0.009027, over 3044938.18 frames. ], batch size: 54, lr: 1.65e-03, grad_scale: 16.0 2023-11-26 05:51:23,304 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.76 vs. limit=6.0 2023-11-26 05:51:33,261 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-11-26 05:51:36,019 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=12.82 vs. 
limit=22.5 2023-11-26 05:51:37,960 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=3248826.6666666665, ans=0.2 2023-11-26 05:51:38,012 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=3248826.6666666665, ans=0.0 2023-11-26 05:51:41,268 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=3248826.6666666665, ans=0.0 2023-11-26 05:51:45,021 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=3248893.3333333335, ans=0.0 2023-11-26 05:51:46,086 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=3248893.3333333335, ans=0.2 2023-11-26 05:51:56,308 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 6.938e+01 8.555e+01 9.166e+01 9.747e+01 1.455e+02, threshold=1.833e+02, percent-clipped=0.0 2023-11-26 05:51:56,615 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=3248960.0, ans=0.125 2023-11-26 05:51:58,631 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3248960.0, ans=0.1 2023-11-26 05:52:00,767 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 487350 2023-11-26 05:52:01,304 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=7.92 vs. limit=10.0 2023-11-26 05:52:06,951 INFO [train_asr.py:1235] (3/4) Epoch 41, batch 6400, loss[loss=0.06481, simple_loss=0.08581, pruned_loss=0.01001, audio_tagging_loss=0.0119, over 15198.00 frames. ], tot_loss[loss=0.06591, simple_loss=0.08916, pruned_loss=0.01221, audio_tagging_loss=0.009118, over 3048716.05 frames. ], batch size: 57, lr: 1.65e-03, grad_scale: 32.0 2023-11-26 05:52:26,945 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=6.87 vs. limit=15.0 2023-11-26 05:52:53,932 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=3249293.3333333335, ans=0.125 2023-11-26 05:52:55,834 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 487400 2023-11-26 05:53:02,864 INFO [train_asr.py:1235] (3/4) Epoch 41, batch 6450, loss[loss=0.06771, simple_loss=0.09628, pruned_loss=0.01236, audio_tagging_loss=0.007203, over 15178.00 frames. ], tot_loss[loss=0.06586, simple_loss=0.08917, pruned_loss=0.01217, audio_tagging_loss=0.009097, over 3046149.10 frames. ], batch size: 58, lr: 1.65e-03, grad_scale: 32.0 2023-11-26 05:53:28,124 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=12.40 vs. limit=15.0 2023-11-26 05:53:31,558 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=5.96 vs. 
limit=10.0 2023-11-26 05:53:47,383 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 6.968e+01 8.690e+01 9.179e+01 1.001e+02 1.387e+02, threshold=1.836e+02, percent-clipped=0.0 2023-11-26 05:53:50,887 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.20 vs. limit=6.0 2023-11-26 05:53:52,286 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 487450 2023-11-26 05:53:59,116 INFO [train_asr.py:1235] (3/4) Epoch 41, batch 6500, loss[loss=0.06463, simple_loss=0.09225, pruned_loss=0.01188, audio_tagging_loss=0.006629, over 16210.00 frames. ], tot_loss[loss=0.06572, simple_loss=0.08913, pruned_loss=0.01214, audio_tagging_loss=0.00902, over 3044222.45 frames. ], batch size: 63, lr: 1.65e-03, grad_scale: 32.0 2023-11-26 05:54:48,377 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 487500 2023-11-26 05:54:48,466 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=3249960.0, ans=0.0 2023-11-26 05:54:54,636 INFO [train_asr.py:1235] (3/4) Epoch 41, batch 6550, loss[loss=0.07324, simple_loss=0.1045, pruned_loss=0.01371, audio_tagging_loss=0.007266, over 16218.00 frames. ], tot_loss[loss=0.06567, simple_loss=0.08904, pruned_loss=0.01221, audio_tagging_loss=0.008941, over 3046090.66 frames. ], batch size: 61, lr: 1.65e-03, grad_scale: 32.0 2023-11-26 05:55:06,792 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=3250093.3333333335, ans=0.1 2023-11-26 05:55:39,376 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 6.986e+01 8.523e+01 8.995e+01 9.830e+01 1.214e+02, threshold=1.799e+02, percent-clipped=0.0 2023-11-26 05:55:43,719 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 487550 2023-11-26 05:55:45,913 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.max_abs, batch_count=3250293.3333333335, ans=10.0 2023-11-26 05:55:50,078 INFO [train_asr.py:1235] (3/4) Epoch 41, batch 6600, loss[loss=0.04095, simple_loss=0.04684, pruned_loss=0.006529, audio_tagging_loss=0.01099, over 13993.00 frames. ], tot_loss[loss=0.06543, simple_loss=0.08874, pruned_loss=0.01213, audio_tagging_loss=0.008929, over 3044801.98 frames. ], batch size: 55, lr: 1.65e-03, grad_scale: 32.0 2023-11-26 05:56:05,768 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=3250426.6666666665, ans=0.125 2023-11-26 05:56:40,097 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 487600 2023-11-26 05:56:43,646 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=3250626.6666666665, ans=0.125 2023-11-26 05:56:47,246 INFO [train_asr.py:1235] (3/4) Epoch 41, batch 6650, loss[loss=0.07181, simple_loss=0.09852, pruned_loss=0.01438, audio_tagging_loss=0.008175, over 14857.00 frames. ], tot_loss[loss=0.06546, simple_loss=0.08871, pruned_loss=0.01225, audio_tagging_loss=0.008852, over 3048599.89 frames. 
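The balancer entries carry per-channel limits such as min_positive=0.05, max_positive=0.95 and max_abs=10.0. A plausible reading is a constraint on each channel's fraction of positive activations and mean magnitude; the real module presumably acts through gradients with some probability `prob`, so the function below is only a hypothetical diagnostic for those limits:

```python
import torch

def balancer_violation(x: torch.Tensor, min_positive=0.05, max_positive=0.95,
                       max_abs=10.0) -> torch.Tensor:
    """Diagnostic for the balancer limits seen in the log: how far each
    channel's fraction of positive values strays outside
    [min_positive, max_positive], plus any excess of mean |x| over max_abs.
    Hypothetical illustration only -- not the scaling.py implementation.
    """
    pos_frac = (x > 0).float().mean(dim=0)   # per-channel positive fraction
    abs_mean = x.abs().mean(dim=0)           # per-channel mean magnitude
    return ((min_positive - pos_frac).clamp(min=0).sum()
            + (pos_frac - max_positive).clamp(min=0).sum()
            + (abs_mean - max_abs).clamp(min=0).sum())

x = torch.randn(1000, 256) * 3.0
print(balancer_violation(x))  # ~0 for roughly centered, moderate activations
```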
], batch size: 57, lr: 1.65e-03, grad_scale: 32.0 2023-11-26 05:57:09,280 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3250826.6666666665, ans=0.125 2023-11-26 05:57:13,552 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=3250826.6666666665, ans=0.0 2023-11-26 05:57:19,836 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=8.79 vs. limit=15.0 2023-11-26 05:57:27,433 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3250893.3333333335, ans=0.1 2023-11-26 05:57:32,013 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.412e+01 8.660e+01 9.061e+01 9.694e+01 1.150e+02, threshold=1.812e+02, percent-clipped=0.0 2023-11-26 05:57:36,386 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 487650 2023-11-26 05:57:42,736 INFO [train_asr.py:1235] (3/4) Epoch 41, batch 6700, loss[loss=0.07472, simple_loss=0.105, pruned_loss=0.01418, audio_tagging_loss=0.008058, over 15819.00 frames. ], tot_loss[loss=0.06557, simple_loss=0.08908, pruned_loss=0.01223, audio_tagging_loss=0.008798, over 3042296.99 frames. ], batch size: 56, lr: 1.65e-03, grad_scale: 32.0 2023-11-26 05:57:45,655 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=3251026.6666666665, ans=0.125 2023-11-26 05:57:46,860 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=11.26 vs. limit=15.0 2023-11-26 05:57:55,185 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=3251093.3333333335, ans=0.0 2023-11-26 05:58:12,781 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=16.02 vs. limit=22.5 2023-11-26 05:58:14,694 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer_na.min_abs, batch_count=3251160.0, ans=0.02 2023-11-26 05:58:15,855 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=3251226.6666666665, ans=0.07 2023-11-26 05:58:23,221 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3251226.6666666665, ans=0.125 2023-11-26 05:58:29,013 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=3251293.3333333335, ans=0.0 2023-11-26 05:58:32,026 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 487700 2023-11-26 05:58:34,334 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.min_positive, batch_count=3251293.3333333335, ans=0.05 2023-11-26 05:58:38,278 INFO [train_asr.py:1235] (3/4) Epoch 41, batch 6750, loss[loss=0.07805, simple_loss=0.1184, pruned_loss=0.01421, audio_tagging_loss=0.004651, over 15201.00 frames. ], tot_loss[loss=0.06553, simple_loss=0.08895, pruned_loss=0.01232, audio_tagging_loss=0.008736, over 3043796.90 frames. 
], batch size: 56, lr: 1.65e-03, grad_scale: 32.0 2023-11-26 05:58:39,867 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.out_whiten.whitening_limit, batch_count=3251360.0, ans=15.0 2023-11-26 05:58:45,354 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=3251360.0, ans=0.0 2023-11-26 05:59:15,622 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=3251560.0, ans=0.125 2023-11-26 05:59:24,320 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.344e+01 8.663e+01 9.356e+01 1.018e+02 1.599e+02, threshold=1.871e+02, percent-clipped=0.0 2023-11-26 05:59:27,694 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 487750 2023-11-26 05:59:34,855 INFO [train_asr.py:1235] (3/4) Epoch 41, batch 6800, loss[loss=0.07023, simple_loss=0.09358, pruned_loss=0.01578, audio_tagging_loss=0.007659, over 15363.00 frames. ], tot_loss[loss=0.06593, simple_loss=0.0891, pruned_loss=0.01251, audio_tagging_loss=0.008864, over 3048548.81 frames. ], batch size: 56, lr: 1.65e-03, grad_scale: 32.0 2023-11-26 05:59:42,337 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=8.08 vs. limit=15.0 2023-11-26 06:00:08,020 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.whiten, num_groups=1, num_channels=512, metric=10.04 vs. limit=12.0 2023-11-26 06:00:11,339 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=9.36 vs. limit=10.0 2023-11-26 06:00:24,531 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 487800 2023-11-26 06:00:31,087 INFO [train_asr.py:1235] (3/4) Epoch 41, batch 6850, loss[loss=0.08142, simple_loss=0.118, pruned_loss=0.01371, audio_tagging_loss=0.008729, over 15431.00 frames. ], tot_loss[loss=0.06607, simple_loss=0.08954, pruned_loss=0.01246, audio_tagging_loss=0.008839, over 3045280.65 frames. ], batch size: 55, lr: 1.65e-03, grad_scale: 32.0 2023-11-26 06:01:01,509 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3252160.0, ans=0.1 2023-11-26 06:01:13,736 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3252226.6666666665, ans=0.1 2023-11-26 06:01:16,556 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.387e+01 8.578e+01 9.183e+01 9.945e+01 1.364e+02, threshold=1.837e+02, percent-clipped=0.0 2023-11-26 06:01:17,748 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=3252293.3333333335, ans=0.125 2023-11-26 06:01:19,757 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 487850 2023-11-26 06:01:26,639 INFO [train_asr.py:1235] (3/4) Epoch 41, batch 6900, loss[loss=0.07425, simple_loss=0.1055, pruned_loss=0.01349, audio_tagging_loss=0.008031, over 17202.00 frames. ], tot_loss[loss=0.06614, simple_loss=0.08979, pruned_loss=0.01244, audio_tagging_loss=0.008806, over 3044332.64 frames. 
], batch size: 63, lr: 1.65e-03, grad_scale: 32.0 2023-11-26 06:01:38,609 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=3252426.6666666665, ans=0.0 2023-11-26 06:01:47,580 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=3252426.6666666665, ans=0.1 2023-11-26 06:01:52,796 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=3252493.3333333335, ans=0.125 2023-11-26 06:01:57,266 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=3252493.3333333335, ans=0.2 2023-11-26 06:02:09,412 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=3252560.0, ans=0.125 2023-11-26 06:02:10,251 WARNING [train_asr.py:1481] (3/4) Exclude cut with ID unbalanced/Xez1ffAcb0w_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-26 06:02:15,208 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3252626.6666666665, ans=0.125 2023-11-26 06:02:16,116 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 487900 2023-11-26 06:02:18,621 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=17.97 vs. limit=22.5 2023-11-26 06:02:22,933 INFO [train_asr.py:1235] (3/4) Epoch 41, batch 6950, loss[loss=0.06634, simple_loss=0.08469, pruned_loss=0.01513, audio_tagging_loss=0.008871, over 14889.00 frames. ], tot_loss[loss=0.06665, simple_loss=0.09049, pruned_loss=0.01259, audio_tagging_loss=0.008813, over 3043884.42 frames. ], batch size: 55, lr: 1.65e-03, grad_scale: 32.0 2023-11-26 06:02:34,529 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=8.97 vs. limit=10.0 2023-11-26 06:02:40,357 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=3252760.0, ans=0.0 2023-11-26 06:02:46,778 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3252826.6666666665, ans=0.0 2023-11-26 06:02:48,104 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.61 vs. limit=6.0 2023-11-26 06:02:57,365 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=3252893.3333333335, ans=0.0 2023-11-26 06:02:59,875 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=9.67 vs. limit=12.0
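The WARNING above (Exclude cut with ID unbalanced/Xez1ffAcb0w_0.000_1.000.wav) shows why a cut gets dropped: after the encoder's 4x subsampling, a 100-frame (one second) cut yields only 23 frames, fewer than its 24 BPE tokens, so a transducer alignment is impossible. The "Dummy text added as a place holder" transcript suggests these are audio-tagging cuts carrying placeholder text rather than real transcriptions. A sketch of the exclusion test follows; the helper names are hypothetical, and the subsampling formula (two stride-2 convolutions after a context of 7 frames) is an assumption chosen because it reproduces the logged 100 -> 23 mapping:

    def frames_after_subsampling(num_frames: int) -> int:
        # Assumed 4x subsampling arithmetic; reproduces 100 -> 23 as logged.
        return ((num_frames - 7) // 2 + 1) // 2

    def keep_cut(num_frames: int, num_tokens: int) -> bool:
        # A transducer cannot align more tokens than encoder output frames.
        return frames_after_subsampling(num_frames) >= num_tokens

    assert frames_after_subsampling(100) == 23   # as in the WARNING above
    assert not keep_cut(100, 24)                 # 23 frames < 24 tokens: drop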
2023-11-26 06:03:08,701 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=3252960.0, ans=0.09899494936611666 2023-11-26 06:03:10,032 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.635e+01 8.822e+01 9.326e+01 1.010e+02 2.073e+02, threshold=1.865e+02, percent-clipped=1.0 2023-11-26 06:03:12,259 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 487950 2023-11-26 06:03:18,660 INFO [train_asr.py:1235] (3/4) Epoch 41, batch 7000, loss[loss=0.05763, simple_loss=0.07937, pruned_loss=0.0107, audio_tagging_loss=0.007249, over 14439.00 frames. ], tot_loss[loss=0.06611, simple_loss=0.08981, pruned_loss=0.0124, audio_tagging_loss=0.008803, over 3043578.78 frames. ], batch size: 57, lr: 1.65e-03, grad_scale: 16.0 2023-11-26 06:03:37,326 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3253093.3333333335, ans=0.125 2023-11-26 06:03:56,368 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.out_whiten.whitening_limit, batch_count=3253226.6666666665, ans=15.0 2023-11-26 06:04:00,357 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3253226.6666666665, ans=0.125 2023-11-26 06:04:07,363 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=5.21 vs. limit=12.0 2023-11-26 06:04:07,804 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 488000 2023-11-26 06:04:16,302 INFO [train_asr.py:1235] (3/4) Epoch 41, batch 7050, loss[loss=0.1047, simple_loss=0.1513, pruned_loss=0.02396, audio_tagging_loss=0.005115, over 16570.00 frames. ], tot_loss[loss=0.06645, simple_loss=0.0902, pruned_loss=0.01248, audio_tagging_loss=0.008873, over 3050784.13 frames. ], batch size: 55, lr: 1.65e-03, grad_scale: 16.0 2023-11-26 06:04:48,293 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=14.97 vs. limit=22.5 2023-11-26 06:05:02,585 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.363e+01 8.684e+01 9.399e+01 1.022e+02 1.192e+02, threshold=1.880e+02, percent-clipped=0.0 2023-11-26 06:05:05,268 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 488050 2023-11-26 06:05:12,706 INFO [train_asr.py:1235] (3/4) Epoch 41, batch 7100, loss[loss=0.05808, simple_loss=0.07539, pruned_loss=0.0104, audio_tagging_loss=0.009983, over 14512.00 frames. ], tot_loss[loss=0.06608, simple_loss=0.08945, pruned_loss=0.01232, audio_tagging_loss=0.009038, over 3047641.12 frames. ], batch size: 54, lr: 1.65e-03, grad_scale: 16.0
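Each optim.py:476 line reports the min/25%/50%/75%/max quartiles of recent gradient norms, and in every entry in this section the threshold equals Clipping_scale times the median, e.g. 2.0 * 9.326e+01 = 1.865e+02 in the entry above; that same entry reports percent-clipped=1.0, and its max norm 2.073e+02 does exceed the threshold. A schematic reconstruction of this diagnostic, assuming a sliding window of recent norms; the real optimizer's bookkeeping may differ, and the class name is invented:

    import torch
    from collections import deque

    class GradNormClipper:
        # Schematic: clip at clipping_scale * median of recent gradient norms
        # and track how often clipping fires (cf. the percent-clipped figure).
        def __init__(self, clipping_scale: float = 2.0, window: int = 200):
            self.clipping_scale = clipping_scale
            self.norms: deque = deque(maxlen=window)
            self.clipped = 0
            self.total = 0

        def clip_(self, params: list) -> float:
            norm = torch.sqrt(
                sum((p.grad.detach() ** 2).sum() for p in params if p.grad is not None)
            )
            self.norms.append(float(norm))
            median = sorted(self.norms)[len(self.norms) // 2]
            threshold = self.clipping_scale * median  # e.g. 2.0 * 9.326e+01
            self.total += 1
            if norm > threshold:
                self.clipped += 1
                for p in params:
                    if p.grad is not None:
                        p.grad.mul_(threshold / norm)
            return threshold

    # percent_clipped would then be 100.0 * clipper.clipped / clipper.total
    # over the batches since the previous report.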
2023-11-26 06:05:12,972 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3253693.3333333335, ans=0.1 2023-11-26 06:05:21,902 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=3253693.3333333335, ans=0.125 2023-11-26 06:05:57,471 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=3253960.0, ans=0.2 2023-11-26 06:05:58,449 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3253960.0, ans=0.1 2023-11-26 06:06:02,063 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 488100 2023-11-26 06:06:08,414 INFO [train_asr.py:1235] (3/4) Epoch 41, batch 7150, loss[loss=0.08708, simple_loss=0.1292, pruned_loss=0.01652, audio_tagging_loss=0.005961, over 16100.00 frames. ], tot_loss[loss=0.06686, simple_loss=0.09053, pruned_loss=0.01257, audio_tagging_loss=0.009022, over 3044755.28 frames. ], batch size: 58, lr: 1.65e-03, grad_scale: 16.0 2023-11-26 06:06:27,887 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=3254093.3333333335, ans=0.04949747468305833 2023-11-26 06:06:31,550 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3254160.0, ans=0.125 2023-11-26 06:06:53,942 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=3254293.3333333335, ans=0.2 2023-11-26 06:06:54,818 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.635e+01 8.934e+01 9.396e+01 1.011e+02 1.220e+02, threshold=1.879e+02, percent-clipped=0.0 2023-11-26 06:06:56,984 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 488150 2023-11-26 06:06:57,150 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=3254293.3333333335, ans=0.0 2023-11-26 06:07:03,191 INFO [train_asr.py:1235] (3/4) Epoch 41, batch 7200, loss[loss=0.08541, simple_loss=0.1178, pruned_loss=0.01613, audio_tagging_loss=0.01038, over 15515.00 frames. ], tot_loss[loss=0.06667, simple_loss=0.09024, pruned_loss=0.01248, audio_tagging_loss=0.009074, over 3045441.74 frames. ], batch size: 59, lr: 1.65e-03, grad_scale: 32.0 2023-11-26 06:07:06,493 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3254360.0, ans=0.125 2023-11-26 06:07:06,632 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3254360.0, ans=0.125 2023-11-26 06:07:07,694 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=7.639e-03 2023-11-26 06:07:22,901 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=3254426.6666666665, ans=0.125 2023-11-26 06:07:23,103 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=5.34 vs. limit=10.0 2023-11-26 06:07:30,193 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=4.27 vs.
limit=15.0 2023-11-26 06:07:48,084 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3254626.6666666665, ans=0.125 2023-11-26 06:07:51,373 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=3254626.6666666665, ans=0.0 2023-11-26 06:07:52,230 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 488200 2023-11-26 06:07:59,922 INFO [train_asr.py:1235] (3/4) Epoch 41, batch 7250, loss[loss=0.07486, simple_loss=0.1029, pruned_loss=0.01294, audio_tagging_loss=0.01047, over 14778.00 frames. ], tot_loss[loss=0.06652, simple_loss=0.08979, pruned_loss=0.01245, audio_tagging_loss=0.009169, over 3042227.46 frames. ], batch size: 54, lr: 1.65e-03, grad_scale: 32.0 2023-11-26 06:08:02,729 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.min_positive, batch_count=3254693.3333333335, ans=0.025 2023-11-26 06:08:04,130 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=10.46 vs. limit=15.0 2023-11-26 06:08:13,813 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=3254760.0, ans=0.04949747468305833 2023-11-26 06:08:14,985 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.min_positive, batch_count=3254760.0, ans=0.05 2023-11-26 06:08:20,348 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3254760.0, ans=0.125 2023-11-26 06:08:23,772 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3254826.6666666665, ans=0.125 2023-11-26 06:08:34,421 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=3254893.3333333335, ans=0.0 2023-11-26 06:08:47,424 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.403e+01 8.575e+01 9.064e+01 9.788e+01 1.213e+02, threshold=1.813e+02, percent-clipped=0.0 2023-11-26 06:08:49,623 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 488250 2023-11-26 06:08:56,418 INFO [train_asr.py:1235] (3/4) Epoch 41, batch 7300, loss[loss=0.1071, simple_loss=0.1417, pruned_loss=0.02521, audio_tagging_loss=0.01104, over 16240.00 frames. ], tot_loss[loss=0.06666, simple_loss=0.09031, pruned_loss=0.01247, audio_tagging_loss=0.009037, over 3045012.89 frames. ], batch size: 56, lr: 1.65e-03, grad_scale: 16.0 2023-11-26 06:09:00,011 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=9.01 vs. limit=15.0 2023-11-26 06:09:02,138 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten1.whitening_limit, batch_count=3255026.6666666665, ans=10.0 2023-11-26 06:09:39,424 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=15.19 vs. limit=22.5 2023-11-26 06:09:42,651 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=8.18 vs. limit=15.0
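The scaling.py:1022 "Whitening" lines compare a whiteness statistic of a module's activations against a per-module limit (metric=8.18 vs. limit=15.0 and so on); the limit is the point past which a corrective penalty would act on the activations. One plausible reading of the metric, offered here as a reconstruction rather than a verbatim copy of scaling.py: with C the centered covariance of the activations within each channel group, metric = mean(eig(C)^2) / mean(eig(C))^2, which is 1.0 for perfectly white features and grows as the eigenvalue spread widens:

    import torch

    def whitening_metric(x: torch.Tensor, num_groups: int) -> torch.Tensor:
        # x: (..., num_channels); compare the logged num_groups/num_channels.
        x = x.reshape(-1, x.shape[-1])                     # (frames, channels)
        num_frames, num_channels = x.shape
        cpg = num_channels // num_groups                   # channels per group
        x = x.reshape(num_frames, num_groups, cpg).transpose(0, 1)
        x = x - x.mean(dim=1, keepdim=True)                # center per group
        covar = torch.matmul(x.transpose(1, 2), x)         # (groups, cpg, cpg)
        mean_diag = covar.diagonal(dim1=1, dim2=2).mean()  # mean eigenvalue
        mean_sq = (covar ** 2).sum() / (num_groups * cpg)  # mean diag of C @ C
        return mean_sq / (mean_diag ** 2 + 1e-20)

    # e.g. whitening_metric(torch.randn(1000, 128), num_groups=4) stays near
    # its floor, while strongly correlated channels push it toward the limit.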
2023-11-26 06:09:45,395 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 488300 2023-11-26 06:09:51,736 INFO [train_asr.py:1235] (3/4) Epoch 41, batch 7350, loss[loss=0.07528, simple_loss=0.101, pruned_loss=0.01757, audio_tagging_loss=0.007209, over 14509.00 frames. ], tot_loss[loss=0.06606, simple_loss=0.0893, pruned_loss=0.01247, audio_tagging_loss=0.008936, over 3035729.99 frames. ], batch size: 55, lr: 1.65e-03, grad_scale: 16.0 2023-11-26 06:10:02,595 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=3255426.6666666665, ans=0.125 2023-11-26 06:10:22,762 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3255493.3333333335, ans=0.125 2023-11-26 06:10:39,583 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.650e+01 8.543e+01 9.108e+01 9.776e+01 1.189e+02, threshold=1.822e+02, percent-clipped=0.0 2023-11-26 06:10:39,758 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=3255626.6666666665, ans=0.0 2023-11-26 06:10:40,722 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 488350 2023-11-26 06:10:47,681 INFO [train_asr.py:1235] (3/4) Epoch 41, batch 7400, loss[loss=0.07298, simple_loss=0.1041, pruned_loss=0.01177, audio_tagging_loss=0.009175, over 16180.00 frames. ], tot_loss[loss=0.06631, simple_loss=0.08979, pruned_loss=0.01257, audio_tagging_loss=0.008848, over 3041077.71 frames. ], batch size: 61, lr: 1.65e-03, grad_scale: 16.0 2023-11-26 06:10:48,989 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=3255693.3333333335, ans=0.125 2023-11-26 06:10:53,750 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3255693.3333333335, ans=0.125 2023-11-26 06:11:05,893 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=12.06 vs. limit=15.0 2023-11-26 06:11:07,674 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=3255760.0, ans=0.2 2023-11-26 06:11:21,211 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=3255893.3333333335, ans=0.125 2023-11-26 06:11:30,512 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=3255893.3333333335, ans=0.125 2023-11-26 06:11:30,621 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=7.43 vs. limit=15.0 2023-11-26 06:11:37,320 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 488400 2023-11-26 06:11:44,980 INFO [train_asr.py:1235] (3/4) Epoch 41, batch 7450, loss[loss=0.05966, simple_loss=0.08208, pruned_loss=0.01192, audio_tagging_loss=0.006703, over 14860.00 frames. ], tot_loss[loss=0.0672, simple_loss=0.09141, pruned_loss=0.0128, audio_tagging_loss=0.008697, over 3042847.79 frames.
], batch size: 56, lr: 1.65e-03, grad_scale: 16.0 2023-11-26 06:11:48,399 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=3256026.6666666665, ans=0.0 2023-11-26 06:12:23,892 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=512, metric=7.57 vs. limit=15.0 2023-11-26 06:12:28,509 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=3256293.3333333335, ans=0.07 2023-11-26 06:12:32,355 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.158e+01 8.793e+01 9.296e+01 1.001e+02 1.337e+02, threshold=1.859e+02, percent-clipped=0.0 2023-11-26 06:12:33,491 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 488450 2023-11-26 06:12:39,870 INFO [train_asr.py:1235] (3/4) Epoch 41, batch 7500, loss[loss=0.06242, simple_loss=0.08202, pruned_loss=0.01436, audio_tagging_loss=0.007052, over 14847.00 frames. ], tot_loss[loss=0.06649, simple_loss=0.09034, pruned_loss=0.01265, audio_tagging_loss=0.008668, over 3041467.75 frames. ], batch size: 55, lr: 1.65e-03, grad_scale: 16.0 2023-11-26 06:12:54,581 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=3256426.6666666665, ans=0.0 2023-11-26 06:13:21,714 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=3256560.0, ans=0.0 2023-11-26 06:13:29,016 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 488500 2023-11-26 06:13:35,299 INFO [train_asr.py:1235] (3/4) Epoch 41, batch 7550, loss[loss=0.08228, simple_loss=0.12, pruned_loss=0.01507, audio_tagging_loss=0.0072, over 15226.00 frames. ], tot_loss[loss=0.06664, simple_loss=0.09075, pruned_loss=0.01265, audio_tagging_loss=0.008623, over 3042103.96 frames. ], batch size: 57, lr: 1.65e-03, grad_scale: 16.0 2023-11-26 06:14:05,608 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=5.82 vs. limit=10.0 2023-11-26 06:14:06,197 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=3256826.6666666665, ans=0.2 2023-11-26 06:14:13,635 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.min_positive, batch_count=3256893.3333333335, ans=0.025 2023-11-26 06:14:19,675 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=8.10 vs. limit=15.0 2023-11-26 06:14:23,930 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.557e+01 9.000e+01 9.495e+01 1.038e+02 1.345e+02, threshold=1.899e+02, percent-clipped=0.0 2023-11-26 06:14:24,102 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=3256960.0, ans=0.125 2023-11-26 06:14:24,538 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=8.28 vs. limit=10.0 2023-11-26 06:14:25,073 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 488550 2023-11-26 06:14:31,446 INFO [train_asr.py:1235] (3/4) Epoch 41, batch 7600, loss[loss=0.06452, simple_loss=0.08671, pruned_loss=0.0111, audio_tagging_loss=0.01006, over 14982.00 frames. 
], tot_loss[loss=0.06648, simple_loss=0.09043, pruned_loss=0.01264, audio_tagging_loss=0.008628, over 3042388.91 frames. ], batch size: 57, lr: 1.65e-03, grad_scale: 32.0 2023-11-26 06:14:40,104 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=3257026.6666666665, ans=0.125 2023-11-26 06:14:42,443 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=3257093.3333333335, ans=0.125 2023-11-26 06:14:49,996 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=17.01 vs. limit=22.5 2023-11-26 06:15:13,048 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.min_positive, batch_count=3257226.6666666665, ans=0.05 2023-11-26 06:15:18,800 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3257293.3333333335, ans=0.1 2023-11-26 06:15:20,850 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 488600 2023-11-26 06:15:27,003 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3257360.0, ans=0.125 2023-11-26 06:15:27,863 INFO [train_asr.py:1235] (3/4) Epoch 41, batch 7650, loss[loss=0.06584, simple_loss=0.08234, pruned_loss=0.01305, audio_tagging_loss=0.01162, over 15018.00 frames. ], tot_loss[loss=0.06614, simple_loss=0.08987, pruned_loss=0.01257, audio_tagging_loss=0.00863, over 3038102.76 frames. ], batch size: 55, lr: 1.65e-03, grad_scale: 32.0 2023-11-26 06:15:35,704 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3257360.0, ans=0.125 2023-11-26 06:15:45,936 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_whiten.whitening_limit, batch_count=3257426.6666666665, ans=15.0 2023-11-26 06:15:58,281 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=3257493.3333333335, ans=0.125 2023-11-26 06:15:59,387 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3257493.3333333335, ans=0.125 2023-11-26 06:16:09,721 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=12.34 vs. limit=15.0 2023-11-26 06:16:15,849 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=3257626.6666666665, ans=0.0 2023-11-26 06:16:16,676 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.210e+01 8.718e+01 9.418e+01 1.004e+02 2.180e+02, threshold=1.884e+02, percent-clipped=1.0 2023-11-26 06:16:16,771 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 488650 2023-11-26 06:16:23,062 INFO [train_asr.py:1235] (3/4) Epoch 41, batch 7700, loss[loss=0.06297, simple_loss=0.08409, pruned_loss=0.009637, audio_tagging_loss=0.01129, over 14734.00 frames. ], tot_loss[loss=0.06602, simple_loss=0.08954, pruned_loss=0.01259, audio_tagging_loss=0.008662, over 3038836.48 frames. ], batch size: 56, lr: 1.65e-03, grad_scale: 16.0
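The ubiquitous scaling.py:213 lines print a name, the global batch_count, and ans, which fits float hyperparameters (dropout rates, skip rates, balancer probabilities) scheduled as piecewise-linear functions of the batch count and logged when sampled. A minimal sketch of such a schedule follows; the class shape and the breakpoints are invented for illustration, and at a batch_count near 3.25e6, far past any plausible breakpoint, the value has long since flattened to its final constant (compare the *_skip_rate entries above with ans=0.0):

    import bisect

    class ScheduledFloat:
        # Piecewise-linear value as a function of the global batch count.
        def __init__(self, *points):
            # points: (batch_count, value) pairs, sorted by batch_count
            self.xs = [p[0] for p in points]
            self.ys = [p[1] for p in points]

        def value(self, batch_count: float) -> float:
            if batch_count <= self.xs[0]:
                return self.ys[0]
            if batch_count >= self.xs[-1]:
                return self.ys[-1]
            i = bisect.bisect_right(self.xs, batch_count)
            x0, x1 = self.xs[i - 1], self.xs[i]
            y0, y1 = self.ys[i - 1], self.ys[i]
            return y0 + (y1 - y0) * (batch_count - x0) / (x1 - x0)

    conv_skip_rate = ScheduledFloat((0.0, 0.2), (20000.0, 0.05), (40000.0, 0.0))
    assert conv_skip_rate.value(3257026.67) == 0.0  # schedule fully decayed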
2023-11-26 06:16:53,250 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=10.35 vs. limit=15.0 2023-11-26 06:16:53,917 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer_ff2.min_abs, batch_count=3257826.6666666665, ans=0.1 2023-11-26 06:17:12,417 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 488700 2023-11-26 06:17:12,513 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=3257960.0, ans=0.2 2023-11-26 06:17:18,463 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.min_positive, batch_count=3258026.6666666665, ans=0.025 2023-11-26 06:17:19,365 INFO [train_asr.py:1235] (3/4) Epoch 41, batch 7750, loss[loss=0.07393, simple_loss=0.09687, pruned_loss=0.0165, audio_tagging_loss=0.008992, over 16562.00 frames. ], tot_loss[loss=0.06573, simple_loss=0.0892, pruned_loss=0.01245, audio_tagging_loss=0.008688, over 3037820.54 frames. ], batch size: 61, lr: 1.65e-03, grad_scale: 16.0 2023-11-26 06:17:43,040 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=3258160.0, ans=0.0 2023-11-26 06:18:06,211 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.31 vs. limit=6.0 2023-11-26 06:18:08,730 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.436e+01 8.599e+01 9.200e+01 9.734e+01 1.299e+02, threshold=1.840e+02, percent-clipped=0.0 2023-11-26 06:18:08,826 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 488750 2023-11-26 06:18:15,095 INFO [train_asr.py:1235] (3/4) Epoch 41, batch 7800, loss[loss=0.04264, simple_loss=0.05776, pruned_loss=0.003438, audio_tagging_loss=0.01033, over 14438.00 frames. ], tot_loss[loss=0.06597, simple_loss=0.08958, pruned_loss=0.01251, audio_tagging_loss=0.008677, over 3040790.53 frames. ], batch size: 56, lr: 1.65e-03, grad_scale: 16.0 2023-11-26 06:18:30,859 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=3258426.6666666665, ans=0.2 2023-11-26 06:18:37,722 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3258493.3333333335, ans=0.1 2023-11-26 06:18:54,184 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.72 vs. limit=6.0 2023-11-26 06:19:04,838 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 488800 2023-11-26 06:19:11,385 INFO [train_asr.py:1235] (3/4) Epoch 41, batch 7850, loss[loss=0.09, simple_loss=0.1117, pruned_loss=0.02294, audio_tagging_loss=0.0112, over 15228.00 frames. ], tot_loss[loss=0.06603, simple_loss=0.08945, pruned_loss=0.0125, audio_tagging_loss=0.008808, over 3035838.27 frames.
], batch size: 57, lr: 1.65e-03, grad_scale: 16.0 2023-11-26 06:19:19,555 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=3258693.3333333335, ans=0.0 2023-11-26 06:19:23,236 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3258760.0, ans=0.125 2023-11-26 06:19:32,975 INFO [scaling.py:1022] (3/4) Whitening: name=encoder_embed.convnext.out_whiten, num_groups=1, num_channels=128, metric=4.41 vs. limit=5.0 2023-11-26 06:19:40,887 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3258826.6666666665, ans=0.125 2023-11-26 06:19:55,093 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=3258960.0, ans=0.04949747468305833 2023-11-26 06:19:57,316 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.min_positive, batch_count=3258960.0, ans=0.05 2023-11-26 06:20:00,004 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3258960.0, ans=0.125 2023-11-26 06:20:00,751 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.651e+01 8.695e+01 9.194e+01 9.770e+01 1.489e+02, threshold=1.839e+02, percent-clipped=0.0 2023-11-26 06:20:00,850 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 488850 2023-11-26 06:20:04,155 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=3258960.0, ans=0.0 2023-11-26 06:20:07,639 INFO [train_asr.py:1235] (3/4) Epoch 41, batch 7900, loss[loss=0.06933, simple_loss=0.09607, pruned_loss=0.01353, audio_tagging_loss=0.007762, over 15984.00 frames. ], tot_loss[loss=0.06583, simple_loss=0.08918, pruned_loss=0.01225, audio_tagging_loss=0.008996, over 3040412.05 frames. ], batch size: 63, lr: 1.65e-03, grad_scale: 16.0 2023-11-26 06:20:07,808 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=3259026.6666666665, ans=0.2 2023-11-26 06:20:28,678 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=3259160.0, ans=0.125 2023-11-26 06:20:44,764 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3259226.6666666665, ans=0.1 2023-11-26 06:20:48,555 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3259226.6666666665, ans=0.125 2023-11-26 06:20:57,368 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 488900 2023-11-26 06:20:58,430 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=3259293.3333333335, ans=0.125 2023-11-26 06:21:03,736 INFO [train_asr.py:1235] (3/4) Epoch 41, batch 7950, loss[loss=0.03984, simple_loss=0.05187, pruned_loss=0.003336, audio_tagging_loss=0.01056, over 13909.00 frames. ], tot_loss[loss=0.06669, simple_loss=0.09052, pruned_loss=0.01243, audio_tagging_loss=0.009001, over 3039410.21 frames. ], batch size: 57, lr: 1.65e-03, grad_scale: 16.0 2023-11-26 06:21:16,900 WARNING [train_asr.py:1481] (3/4) Exclude cut with ID unbalanced/uQjH4tNUZ_g_0.000_1.000.wav from training. Number of frames (before subsampling): 100. 
Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-26 06:21:20,167 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-11-26 06:21:23,270 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-26 06:21:26,236 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=3.85 vs. limit=15.0 2023-11-26 06:21:49,322 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3259626.6666666665, ans=0.125 2023-11-26 06:21:52,302 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.130e+01 8.753e+01 9.407e+01 1.023e+02 1.871e+02, threshold=1.881e+02, percent-clipped=1.0 2023-11-26 06:21:52,401 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 488950 2023-11-26 06:21:53,623 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-26 06:21:59,113 INFO [train_asr.py:1235] (3/4) Epoch 41, batch 8000, loss[loss=0.06018, simple_loss=0.08482, pruned_loss=0.006113, audio_tagging_loss=0.01165, over 15830.00 frames. ], tot_loss[loss=0.06678, simple_loss=0.09059, pruned_loss=0.01247, audio_tagging_loss=0.00902, over 3039572.22 frames. ], batch size: 61, lr: 1.65e-03, grad_scale: 32.0 2023-11-26 06:22:01,534 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=3259693.3333333335, ans=0.125 2023-11-26 06:22:11,828 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=4.26 vs. limit=10.0 2023-11-26 06:22:19,559 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.min_positive, batch_count=3259760.0, ans=0.05 2023-11-26 06:22:19,756 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=11.91 vs. 
limit=15.0 2023-11-26 06:22:25,357 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3259826.6666666665, ans=0.125 2023-11-26 06:22:32,629 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3259893.3333333335, ans=0.125 2023-11-26 06:22:36,895 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=3259893.3333333335, ans=0.125 2023-11-26 06:22:45,934 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=3259960.0, ans=0.125 2023-11-26 06:22:47,046 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=3259960.0, ans=0.0 2023-11-26 06:22:48,444 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 489000 2023-11-26 06:22:53,103 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=3259960.0, ans=0.0 2023-11-26 06:22:53,260 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=3259960.0, ans=0.04949747468305833 2023-11-26 06:22:55,046 INFO [train_asr.py:1235] (3/4) Epoch 41, batch 8050, loss[loss=0.06771, simple_loss=0.09442, pruned_loss=0.01222, audio_tagging_loss=0.008279, over 16461.00 frames. ], tot_loss[loss=0.06665, simple_loss=0.09038, pruned_loss=0.01245, audio_tagging_loss=0.009007, over 3040258.72 frames. ], batch size: 60, lr: 1.65e-03, grad_scale: 16.0 2023-11-26 06:22:55,940 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=3260026.6666666665, ans=0.125 2023-11-26 06:23:08,095 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=3260093.3333333335, ans=0.125 2023-11-26 06:23:09,184 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3260093.3333333335, ans=0.125 2023-11-26 06:23:22,855 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=3260160.0, ans=0.0 2023-11-26 06:23:32,879 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.whiten, num_groups=1, num_channels=512, metric=8.19 vs. limit=12.0 2023-11-26 06:23:41,633 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=3260293.3333333335, ans=0.125 2023-11-26 06:23:44,638 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 489050 2023-11-26 06:23:44,728 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3260293.3333333335, ans=0.1 2023-11-26 06:23:46,128 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.345e+01 8.810e+01 9.339e+01 9.946e+01 1.266e+02, threshold=1.868e+02, percent-clipped=0.0 2023-11-26 06:23:51,453 INFO [train_asr.py:1235] (3/4) Epoch 41, batch 8100, loss[loss=0.03903, simple_loss=0.04463, pruned_loss=0.007439, audio_tagging_loss=0.009276, over 16361.00 frames. ], tot_loss[loss=0.06706, simple_loss=0.09086, pruned_loss=0.01267, audio_tagging_loss=0.008957, over 3040074.48 frames. 
], batch size: 66, lr: 1.65e-03, grad_scale: 16.0 2023-11-26 06:24:09,848 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3260426.6666666665, ans=0.125 2023-11-26 06:24:22,064 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=3260493.3333333335, ans=0.07 2023-11-26 06:24:23,495 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=19.68 vs. limit=22.5 2023-11-26 06:24:32,216 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=3260560.0, ans=0.0 2023-11-26 06:24:34,406 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=3260560.0, ans=0.125 2023-11-26 06:24:39,785 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=3260626.6666666665, ans=0.0 2023-11-26 06:24:40,657 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 489100 2023-11-26 06:24:46,948 INFO [train_asr.py:1235] (3/4) Epoch 41, batch 8150, loss[loss=0.07095, simple_loss=0.09781, pruned_loss=0.01363, audio_tagging_loss=0.00842, over 15289.00 frames. ], tot_loss[loss=0.06636, simple_loss=0.09005, pruned_loss=0.01252, audio_tagging_loss=0.008813, over 3036649.99 frames. ], batch size: 55, lr: 1.65e-03, grad_scale: 16.0 2023-11-26 06:24:48,170 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=3260693.3333333335, ans=0.0 2023-11-26 06:25:00,950 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3260760.0, ans=0.0 2023-11-26 06:25:22,274 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3260893.3333333335, ans=0.1 2023-11-26 06:25:25,414 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer_na.min_abs, batch_count=3260893.3333333335, ans=0.02 2023-11-26 06:25:34,404 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=11.92 vs. limit=22.5 2023-11-26 06:25:35,895 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 489150 2023-11-26 06:25:37,970 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 6.913e+01 8.636e+01 9.236e+01 1.005e+02 1.829e+02, threshold=1.847e+02, percent-clipped=0.0 2023-11-26 06:25:43,409 INFO [train_asr.py:1235] (3/4) Epoch 41, batch 8200, loss[loss=0.06324, simple_loss=0.08759, pruned_loss=0.01226, audio_tagging_loss=0.007182, over 14570.00 frames. ], tot_loss[loss=0.06618, simple_loss=0.09027, pruned_loss=0.01243, audio_tagging_loss=0.008615, over 3045846.74 frames. ], batch size: 56, lr: 1.65e-03, grad_scale: 8.0 2023-11-26 06:25:44,470 WARNING [train_asr.py:1481] (3/4) Exclude cut with ID unbalanced/8C7biyx9TQ4_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24
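The grad_scale field in the batch lines drifts between 32.0, 16.0, and 8.0 over this stretch of training (it is 8.0 around batch 8200 above and recovers to 16.0 and then 32.0 in later batches), which matches fp16 dynamic loss scaling: a step whose gradients overflow halves the scale, and a long enough run of clean steps doubles it again. A sketch of the standard torch.cuda.amp recipe that produces this pattern, shown as an illustration rather than a quote of train_asr.py; the model, loss, and init_scale here are placeholders, and a CUDA device is required:

    import torch

    model = torch.nn.Linear(80, 500).cuda()
    optimizer = torch.optim.AdamW(model.parameters(), lr=1e-3)
    # backoff_factor=0.5 halves the scale on inf/nan gradients; after
    # growth_interval clean steps the scale is doubled again.
    scaler = torch.cuda.amp.GradScaler(init_scale=32.0, growth_interval=2000)

    def train_step(features: torch.Tensor, targets: torch.Tensor) -> float:
        optimizer.zero_grad()
        with torch.cuda.amp.autocast():
            loss = torch.nn.functional.cross_entropy(model(features), targets)
        scaler.scale(loss).backward()
        scaler.step(optimizer)    # skipped internally if grads overflowed
        scaler.update()           # halves or (eventually) doubles the scale
        return scaler.get_scale() # the quantity logged as grad_scale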
2023-11-26 06:25:55,274 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=3261093.3333333335, ans=0.2 2023-11-26 06:25:57,363 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=3261093.3333333335, ans=0.125 2023-11-26 06:26:08,125 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=3261160.0, ans=0.2 2023-11-26 06:26:11,723 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=6.80 vs. limit=15.0 2023-11-26 06:26:33,281 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 489200 2023-11-26 06:26:36,983 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=3261293.3333333335, ans=0.125 2023-11-26 06:26:40,317 INFO [train_asr.py:1235] (3/4) Epoch 41, batch 8250, loss[loss=0.06006, simple_loss=0.0758, pruned_loss=0.01343, audio_tagging_loss=0.008728, over 15669.00 frames. ], tot_loss[loss=0.06647, simple_loss=0.09048, pruned_loss=0.01266, audio_tagging_loss=0.008569, over 3045264.21 frames. ], batch size: 61, lr: 1.64e-03, grad_scale: 8.0 2023-11-26 06:26:43,035 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=7.29 vs. limit=15.0 2023-11-26 06:26:49,037 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=3261360.0, ans=0.0 2023-11-26 06:26:54,470 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=3261426.6666666665, ans=0.125 2023-11-26 06:27:01,083 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=3261493.3333333335, ans=0.2 2023-11-26 06:27:23,510 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=3261560.0, ans=0.0 2023-11-26 06:27:29,809 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 489250 2023-11-26 06:27:31,876 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.668e+01 8.764e+01 9.523e+01 1.021e+02 1.378e+02, threshold=1.905e+02, percent-clipped=0.0 2023-11-26 06:27:36,100 INFO [train_asr.py:1235] (3/4) Epoch 41, batch 8300, loss[loss=0.05826, simple_loss=0.08179, pruned_loss=0.00819, audio_tagging_loss=0.009174, over 15183.00 frames. ], tot_loss[loss=0.06607, simple_loss=0.08999, pruned_loss=0.01242, audio_tagging_loss=0.008649, over 3048844.77 frames. ], batch size: 56, lr: 1.64e-03, grad_scale: 8.0 2023-11-26 06:27:36,339 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=3261693.3333333335, ans=0.0 2023-11-26 06:27:39,430 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=3261693.3333333335, ans=0.125 2023-11-26 06:27:39,891 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=6.73 vs.
limit=10.0 2023-11-26 06:27:47,030 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3261760.0, ans=0.125 2023-11-26 06:28:05,037 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3261826.6666666665, ans=0.125 2023-11-26 06:28:08,366 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3261826.6666666665, ans=0.0 2023-11-26 06:28:19,309 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=3.92 vs. limit=6.0 2023-11-26 06:28:25,257 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 489300 2023-11-26 06:28:32,185 INFO [train_asr.py:1235] (3/4) Epoch 41, batch 8350, loss[loss=0.07113, simple_loss=0.0952, pruned_loss=0.01551, audio_tagging_loss=0.008023, over 17163.00 frames. ], tot_loss[loss=0.06636, simple_loss=0.09037, pruned_loss=0.01257, audio_tagging_loss=0.008605, over 3051500.54 frames. ], batch size: 65, lr: 1.64e-03, grad_scale: 8.0 2023-11-26 06:29:21,862 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 489350 2023-11-26 06:29:23,909 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.424e+01 8.707e+01 9.107e+01 9.856e+01 1.432e+02, threshold=1.821e+02, percent-clipped=0.0 2023-11-26 06:29:28,770 INFO [train_asr.py:1235] (3/4) Epoch 41, batch 8400, loss[loss=0.06945, simple_loss=0.09445, pruned_loss=0.01553, audio_tagging_loss=0.006698, over 14440.00 frames. ], tot_loss[loss=0.06555, simple_loss=0.08933, pruned_loss=0.01225, audio_tagging_loss=0.008638, over 3041890.86 frames. ], batch size: 56, lr: 1.64e-03, grad_scale: 16.0 2023-11-26 06:29:34,467 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=3262360.0, ans=0.125 2023-11-26 06:29:37,578 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=3262360.0, ans=0.0 2023-11-26 06:29:41,828 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=3262426.6666666665, ans=0.125 2023-11-26 06:29:43,967 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=3262426.6666666665, ans=0.0 2023-11-26 06:29:54,087 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=7.14 vs. limit=15.0 2023-11-26 06:30:08,980 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3262560.0, ans=0.125 2023-11-26 06:30:13,055 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=512, metric=20.06 vs. limit=22.5 2023-11-26 06:30:16,984 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=3262626.6666666665, ans=0.0 2023-11-26 06:30:17,894 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 489400 2023-11-26 06:30:18,387 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=17.92 vs. 
limit=22.5 2023-11-26 06:30:23,960 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=6.13 vs. limit=15.0 2023-11-26 06:30:24,462 INFO [train_asr.py:1235] (3/4) Epoch 41, batch 8450, loss[loss=0.08237, simple_loss=0.1172, pruned_loss=0.01489, audio_tagging_loss=0.00887, over 14500.00 frames. ], tot_loss[loss=0.06545, simple_loss=0.08897, pruned_loss=0.01225, audio_tagging_loss=0.008715, over 3044061.95 frames. ], batch size: 55, lr: 1.64e-03, grad_scale: 16.0 2023-11-26 06:30:30,967 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3262693.3333333335, ans=0.0 2023-11-26 06:30:31,039 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=3262693.3333333335, ans=0.0 2023-11-26 06:30:31,113 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3262693.3333333335, ans=0.1 2023-11-26 06:30:39,052 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=3262760.0, ans=0.0 2023-11-26 06:30:54,921 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.34 vs. limit=6.0 2023-11-26 06:30:58,580 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3262893.3333333335, ans=0.1 2023-11-26 06:31:09,199 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=3262960.0, ans=0.0 2023-11-26 06:31:13,408 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 489450 2023-11-26 06:31:15,427 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.299e+01 8.909e+01 9.451e+01 1.011e+02 1.331e+02, threshold=1.890e+02, percent-clipped=0.0 2023-11-26 06:31:17,776 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=3262960.0, ans=0.125 2023-11-26 06:31:20,191 INFO [train_asr.py:1235] (3/4) Epoch 41, batch 8500, loss[loss=0.06518, simple_loss=0.08337, pruned_loss=0.012, audio_tagging_loss=0.0115, over 15691.00 frames. ], tot_loss[loss=0.06596, simple_loss=0.08935, pruned_loss=0.01247, audio_tagging_loss=0.008813, over 3047643.24 frames. ], batch size: 60, lr: 1.64e-03, grad_scale: 16.0 2023-11-26 06:31:46,691 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3263160.0, ans=0.125 2023-11-26 06:32:03,168 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3263226.6666666665, ans=0.1 2023-11-26 06:32:09,406 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 489500 2023-11-26 06:32:14,212 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=3263293.3333333335, ans=0.0 2023-11-26 06:32:16,188 INFO [train_asr.py:1235] (3/4) Epoch 41, batch 8550, loss[loss=0.07519, simple_loss=0.09231, pruned_loss=0.01994, audio_tagging_loss=0.009095, over 14444.00 frames. ], tot_loss[loss=0.06558, simple_loss=0.08896, pruned_loss=0.01229, audio_tagging_loss=0.008804, over 3052099.38 frames. 
], batch size: 56, lr: 1.64e-03, grad_scale: 16.0 2023-11-26 06:32:19,724 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer_na.min_abs, batch_count=3263360.0, ans=0.02 2023-11-26 06:32:21,723 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3263360.0, ans=0.1 2023-11-26 06:32:32,512 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=3263426.6666666665, ans=0.125 2023-11-26 06:32:33,904 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.54 vs. limit=6.0 2023-11-26 06:32:43,407 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=18.12 vs. limit=22.5 2023-11-26 06:32:54,572 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=3263560.0, ans=0.125 2023-11-26 06:33:04,756 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3263626.6666666665, ans=0.1 2023-11-26 06:33:05,656 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 489550 2023-11-26 06:33:05,763 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=3263626.6666666665, ans=0.0 2023-11-26 06:33:07,664 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.400e+01 8.883e+01 9.307e+01 9.956e+01 1.247e+02, threshold=1.861e+02, percent-clipped=0.0 2023-11-26 06:33:11,973 INFO [train_asr.py:1235] (3/4) Epoch 41, batch 8600, loss[loss=0.07257, simple_loss=0.1047, pruned_loss=0.01219, audio_tagging_loss=0.008033, over 14977.00 frames. ], tot_loss[loss=0.06626, simple_loss=0.08995, pruned_loss=0.01243, audio_tagging_loss=0.008853, over 3049243.63 frames. ], batch size: 53, lr: 1.64e-03, grad_scale: 16.0 2023-11-26 06:33:36,966 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=8.03 vs. limit=15.0 2023-11-26 06:33:39,718 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=3263826.6666666665, ans=0.0 2023-11-26 06:34:00,555 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 489600 2023-11-26 06:34:05,141 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3263960.0, ans=0.0 2023-11-26 06:34:06,169 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3264026.6666666665, ans=0.125 2023-11-26 06:34:07,034 INFO [train_asr.py:1235] (3/4) Epoch 41, batch 8650, loss[loss=0.06749, simple_loss=0.0915, pruned_loss=0.01354, audio_tagging_loss=0.008201, over 14140.00 frames. ], tot_loss[loss=0.06656, simple_loss=0.09024, pruned_loss=0.01258, audio_tagging_loss=0.008854, over 3042142.01 frames. 
], batch size: 56, lr: 1.64e-03, grad_scale: 16.0 2023-11-26 06:34:20,068 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=3264093.3333333335, ans=0.0 2023-11-26 06:34:24,188 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3264093.3333333335, ans=0.125 2023-11-26 06:34:31,225 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=8.17 vs. limit=15.0 2023-11-26 06:34:32,952 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=3264160.0, ans=0.125 2023-11-26 06:34:41,356 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=3264226.6666666665, ans=0.0 2023-11-26 06:34:42,437 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3264226.6666666665, ans=0.125 2023-11-26 06:34:48,729 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=3264226.6666666665, ans=0.125 2023-11-26 06:34:51,484 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=5.99 vs. limit=10.0 2023-11-26 06:34:56,647 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 489650 2023-11-26 06:34:58,650 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.548e+01 8.575e+01 9.501e+01 1.015e+02 1.798e+02, threshold=1.900e+02, percent-clipped=0.0 2023-11-26 06:35:03,401 INFO [train_asr.py:1235] (3/4) Epoch 41, batch 8700, loss[loss=0.08192, simple_loss=0.1159, pruned_loss=0.01515, audio_tagging_loss=0.008817, over 15017.00 frames. ], tot_loss[loss=0.06682, simple_loss=0.09066, pruned_loss=0.01266, audio_tagging_loss=0.008832, over 3051956.57 frames. ], batch size: 54, lr: 1.64e-03, grad_scale: 16.0 2023-11-26 06:35:05,042 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=17.44 vs. limit=22.5 2023-11-26 06:35:05,666 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=3264360.0, ans=0.0 2023-11-26 06:35:13,259 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3264360.0, ans=0.1 2023-11-26 06:35:29,031 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=3264493.3333333335, ans=0.2 2023-11-26 06:35:33,366 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=3264493.3333333335, ans=0.2 2023-11-26 06:35:39,149 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3264560.0, ans=0.125 2023-11-26 06:35:52,894 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 489700 2023-11-26 06:35:59,781 INFO [train_asr.py:1235] (3/4) Epoch 41, batch 8750, loss[loss=0.0621, simple_loss=0.07807, pruned_loss=0.01305, audio_tagging_loss=0.01002, over 14457.00 frames. ], tot_loss[loss=0.06731, simple_loss=0.09127, pruned_loss=0.01276, audio_tagging_loss=0.008922, over 3059716.83 frames. 
], batch size: 54, lr: 1.64e-03, grad_scale: 16.0 2023-11-26 06:36:48,811 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 489750 2023-11-26 06:36:50,786 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 6.923e+01 8.719e+01 9.577e+01 1.009e+02 1.331e+02, threshold=1.915e+02, percent-clipped=0.0 2023-11-26 06:36:54,261 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3265026.6666666665, ans=0.125 2023-11-26 06:36:55,059 INFO [train_asr.py:1235] (3/4) Epoch 41, batch 8800, loss[loss=0.05982, simple_loss=0.07931, pruned_loss=0.01043, audio_tagging_loss=0.009728, over 14253.00 frames. ], tot_loss[loss=0.06765, simple_loss=0.09169, pruned_loss=0.01286, audio_tagging_loss=0.00894, over 3054748.53 frames. ], batch size: 54, lr: 1.64e-03, grad_scale: 32.0 2023-11-26 06:36:59,531 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3265026.6666666665, ans=0.125 2023-11-26 06:37:05,318 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=3265093.3333333335, ans=0.0 2023-11-26 06:37:23,505 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=3265160.0, ans=0.0 2023-11-26 06:37:27,152 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=3265160.0, ans=0.0 2023-11-26 06:37:27,609 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=10.10 vs. limit=22.5 2023-11-26 06:37:30,501 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=3.96 vs. limit=15.0 2023-11-26 06:37:36,612 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3265226.6666666665, ans=0.125 2023-11-26 06:37:37,959 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=512, metric=18.13 vs. limit=22.5 2023-11-26 06:37:44,510 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 489800 2023-11-26 06:37:51,524 INFO [train_asr.py:1235] (3/4) Epoch 41, batch 8850, loss[loss=0.05985, simple_loss=0.08705, pruned_loss=0.008314, audio_tagging_loss=0.008009, over 15310.00 frames. ], tot_loss[loss=0.0678, simple_loss=0.09197, pruned_loss=0.01286, audio_tagging_loss=0.008957, over 3056971.90 frames. ], batch size: 58, lr: 1.64e-03, grad_scale: 32.0 2023-11-26 06:37:53,209 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn1.whiten.whitening_limit, batch_count=3265360.0, ans=22.5 2023-11-26 06:38:02,760 WARNING [train_asr.py:1481] (3/4) Exclude cut with ID unbalanced/1Dq7QH61iXQ_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. 
Number of tokens: 24 2023-11-26 06:38:08,657 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3265426.6666666665, ans=0.0 2023-11-26 06:38:30,407 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3265560.0, ans=0.1 2023-11-26 06:38:38,532 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=3265626.6666666665, ans=0.0 2023-11-26 06:38:40,968 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 489850 2023-11-26 06:38:44,014 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.272e+01 8.713e+01 9.492e+01 1.007e+02 1.202e+02, threshold=1.898e+02, percent-clipped=0.0 2023-11-26 06:38:47,353 INFO [train_asr.py:1235] (3/4) Epoch 41, batch 8900, loss[loss=0.04544, simple_loss=0.05645, pruned_loss=0.007383, audio_tagging_loss=0.009832, over 14485.00 frames. ], tot_loss[loss=0.06739, simple_loss=0.09153, pruned_loss=0.01275, audio_tagging_loss=0.008871, over 3054837.09 frames. ], batch size: 56, lr: 1.64e-03, grad_scale: 16.0 2023-11-26 06:38:52,360 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=3265693.3333333335, ans=0.07 2023-11-26 06:39:06,444 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module1.whiten, num_groups=1, num_channels=512, metric=3.90 vs. limit=15.0 2023-11-26 06:39:10,201 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.whiten, num_groups=1, num_channels=512, metric=8.49 vs. limit=12.0 2023-11-26 06:39:21,327 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=18.21 vs. limit=22.5 2023-11-26 06:39:32,717 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.1.whiten, num_groups=1, num_channels=192, metric=3.22 vs. limit=12.0 2023-11-26 06:39:36,866 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 489900 2023-11-26 06:39:41,295 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=3265960.0, ans=0.0 2023-11-26 06:39:43,102 INFO [train_asr.py:1235] (3/4) Epoch 41, batch 8950, loss[loss=0.08341, simple_loss=0.1121, pruned_loss=0.01742, audio_tagging_loss=0.009951, over 16185.00 frames. ], tot_loss[loss=0.06739, simple_loss=0.09177, pruned_loss=0.01284, audio_tagging_loss=0.008669, over 3059869.72 frames. ], batch size: 59, lr: 1.64e-03, grad_scale: 16.0 2023-11-26 06:40:04,673 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=5.28 vs. 
limit=12.0 2023-11-26 06:40:10,236 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=3266160.0, ans=0.0 2023-11-26 06:40:16,756 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=3266226.6666666665, ans=0.0 2023-11-26 06:40:32,187 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 489950 2023-11-26 06:40:33,369 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3266293.3333333335, ans=0.1 2023-11-26 06:40:35,291 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.267e+01 8.881e+01 9.559e+01 9.968e+01 1.237e+02, threshold=1.912e+02, percent-clipped=0.0 2023-11-26 06:40:38,547 INFO [train_asr.py:1235] (3/4) Epoch 41, batch 9000, loss[loss=0.07128, simple_loss=0.09769, pruned_loss=0.0155, audio_tagging_loss=0.006931, over 15376.00 frames. ], tot_loss[loss=0.06731, simple_loss=0.09164, pruned_loss=0.0129, audio_tagging_loss=0.008594, over 3058346.92 frames. ], batch size: 55, lr: 1.64e-03, grad_scale: 16.0 2023-11-26 06:40:38,548 INFO [train_asr.py:1258] (3/4) Computing validation loss 2023-11-26 06:41:10,845 INFO [train_asr.py:1267] (3/4) Epoch 41, validation: loss=0.05835, simple_loss=0.05057, pruned_loss=0.005166, audio_tagging_loss=0.0279, over 4681554.00 frames. 2023-11-26 06:41:10,846 INFO [train_asr.py:1268] (3/4) Maximum memory allocated so far is 24894MB 2023-11-26 06:41:24,310 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=3266426.6666666665, ans=0.2 2023-11-26 06:41:26,679 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=7.11 vs. limit=12.0 2023-11-26 06:41:59,923 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 490000 2023-11-26 06:42:03,603 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=3266626.6666666665, ans=0.125 2023-11-26 06:42:06,616 INFO [train_asr.py:1235] (3/4) Epoch 41, batch 9050, loss[loss=0.06382, simple_loss=0.08659, pruned_loss=0.0137, audio_tagging_loss=0.006831, over 15379.00 frames. ], tot_loss[loss=0.06689, simple_loss=0.09113, pruned_loss=0.01268, audio_tagging_loss=0.008648, over 3056131.65 frames. 
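The "Maximum memory allocated" figure is presumably the CUDA caching allocator's high-water mark for the current device; a small snippet that would produce such a line (the exact formatting is an assumption):

    import torch

    # Presumably how the "Maximum memory allocated" figure is obtained: the
    # CUDA caching allocator's high-water mark for the current device.
    if torch.cuda.is_available():
        peak_mb = torch.cuda.max_memory_allocated() // (1024 * 1024)
        print(f"Maximum memory allocated so far is {peak_mb}MB")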
], batch size: 59, lr: 1.64e-03, grad_scale: 16.0 2023-11-26 06:42:13,789 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=3266693.3333333335, ans=0.07 2023-11-26 06:42:15,945 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3266693.3333333335, ans=0.125 2023-11-26 06:42:41,492 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=3266893.3333333335, ans=0.07 2023-11-26 06:42:54,489 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=3266960.0, ans=0.125 2023-11-26 06:42:56,361 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 490050 2023-11-26 06:42:59,334 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.458e+01 8.756e+01 9.461e+01 1.032e+02 1.293e+02, threshold=1.892e+02, percent-clipped=0.0 2023-11-26 06:43:03,101 INFO [train_asr.py:1235] (3/4) Epoch 41, batch 9100, loss[loss=0.0563, simple_loss=0.07681, pruned_loss=0.007197, audio_tagging_loss=0.0107, over 15589.00 frames. ], tot_loss[loss=0.06664, simple_loss=0.09074, pruned_loss=0.01264, audio_tagging_loss=0.00863, over 3060253.97 frames. ], batch size: 58, lr: 1.64e-03, grad_scale: 16.0 2023-11-26 06:43:09,175 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=3267026.6666666665, ans=0.1 2023-11-26 06:43:18,557 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=3267093.3333333335, ans=0.125 2023-11-26 06:43:45,883 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=3267226.6666666665, ans=0.2 2023-11-26 06:43:52,690 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 490100 2023-11-26 06:43:58,970 INFO [train_asr.py:1235] (3/4) Epoch 41, batch 9150, loss[loss=0.06234, simple_loss=0.08698, pruned_loss=0.01125, audio_tagging_loss=0.007592, over 14948.00 frames. ], tot_loss[loss=0.06673, simple_loss=0.09088, pruned_loss=0.01269, audio_tagging_loss=0.0086, over 3058861.33 frames. 
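In the optim.py lines, the reported threshold consistently tracks Clipping_scale times the median of the grad-norm quartiles (e.g. 2.0 * 9.461e+01 ≈ 1.892e+02 above), which suggests clipping relative to a running median of recent gradient norms rather than a fixed constant; percent-clipped then reports how often the norm exceeded that threshold. A hedged sketch of that idea, not the actual optimizer code:

    import torch

    # Sketch of median-relative gradient clipping, assuming (as the logged
    # numbers suggest) threshold = clipping_scale * median(recent grad norms).
    def clip_to_median(parameters, recent_norms, clipping_scale=2.0):
        norms = torch.tensor(recent_norms)
        quartiles = torch.quantile(norms, torch.tensor([0.0, 0.25, 0.5, 0.75, 1.0]))
        threshold = clipping_scale * quartiles[2].item()  # scale the median
        total = torch.nn.utils.clip_grad_norm_(parameters, max_norm=threshold)
        return quartiles, threshold, total

    p = torch.nn.Parameter(torch.randn(10))
    p.grad = torch.randn(10)
    _, thr, _ = clip_to_median([p], [74.0, 88.8, 93.1, 99.6, 124.7])
    print(thr)  # 186.2, i.e. 2.0 * the median, matching the logged thresholds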
], batch size: 57, lr: 1.64e-03, grad_scale: 16.0 2023-11-26 06:43:59,123 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=3267360.0, ans=0.0 2023-11-26 06:44:04,281 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=3267360.0, ans=0.125 2023-11-26 06:44:11,296 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3267426.6666666665, ans=0.125 2023-11-26 06:44:23,528 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3267493.3333333335, ans=0.1 2023-11-26 06:44:28,267 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3267493.3333333335, ans=0.125 2023-11-26 06:44:30,377 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.min_positive, batch_count=3267493.3333333335, ans=0.025 2023-11-26 06:44:36,119 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3267560.0, ans=0.125 2023-11-26 06:44:44,047 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.whiten, num_groups=1, num_channels=512, metric=9.08 vs. limit=12.0 2023-11-26 06:44:47,818 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 490150 2023-11-26 06:44:50,862 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.417e+01 8.875e+01 9.458e+01 1.013e+02 1.353e+02, threshold=1.892e+02, percent-clipped=0.0 2023-11-26 06:44:54,033 INFO [train_asr.py:1235] (3/4) Epoch 41, batch 9200, loss[loss=0.06059, simple_loss=0.07667, pruned_loss=0.009739, audio_tagging_loss=0.01251, over 14360.00 frames. ], tot_loss[loss=0.06681, simple_loss=0.0909, pruned_loss=0.01277, audio_tagging_loss=0.008593, over 3059385.92 frames. ], batch size: 55, lr: 1.64e-03, grad_scale: 32.0 2023-11-26 06:45:07,613 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=3267760.0, ans=0.0 2023-11-26 06:45:40,184 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3267960.0, ans=0.125 2023-11-26 06:45:43,783 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 490200 2023-11-26 06:45:51,057 INFO [train_asr.py:1235] (3/4) Epoch 41, batch 9250, loss[loss=0.07368, simple_loss=0.1041, pruned_loss=0.01598, audio_tagging_loss=0.005638, over 14875.00 frames. ], tot_loss[loss=0.06643, simple_loss=0.09061, pruned_loss=0.01262, audio_tagging_loss=0.008506, over 3062606.29 frames. ], batch size: 54, lr: 1.64e-03, grad_scale: 32.0 2023-11-26 06:45:51,207 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3268026.6666666665, ans=0.1 2023-11-26 06:46:00,572 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=512, metric=18.72 vs. limit=22.5 2023-11-26 06:46:03,922 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=4.72 vs. 
limit=12.0 2023-11-26 06:46:14,316 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module2.whiten, num_groups=1, num_channels=512, metric=10.54 vs. limit=15.0 2023-11-26 06:46:17,351 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=3268160.0, ans=0.2 2023-11-26 06:46:20,386 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3268160.0, ans=0.125 2023-11-26 06:46:26,648 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3268226.6666666665, ans=0.125 2023-11-26 06:46:39,712 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 490250 2023-11-26 06:46:42,504 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=3268293.3333333335, ans=0.125 2023-11-26 06:46:43,462 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.386e+01 8.603e+01 9.080e+01 9.924e+01 1.383e+02, threshold=1.816e+02, percent-clipped=0.0 2023-11-26 06:46:46,732 INFO [train_asr.py:1235] (3/4) Epoch 41, batch 9300, loss[loss=0.07155, simple_loss=0.08985, pruned_loss=0.01709, audio_tagging_loss=0.009542, over 14284.00 frames. ], tot_loss[loss=0.06653, simple_loss=0.09058, pruned_loss=0.01264, audio_tagging_loss=0.008605, over 3060728.23 frames. ], batch size: 53, lr: 1.64e-03, grad_scale: 32.0 2023-11-26 06:46:59,910 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.11 vs. limit=6.0 2023-11-26 06:47:13,122 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=3268493.3333333335, ans=0.07 2023-11-26 06:47:23,705 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3268560.0, ans=0.125 2023-11-26 06:47:35,459 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 490300 2023-11-26 06:47:41,732 INFO [train_asr.py:1235] (3/4) Epoch 41, batch 9350, loss[loss=0.05483, simple_loss=0.07139, pruned_loss=0.009665, audio_tagging_loss=0.009465, over 15586.00 frames. ], tot_loss[loss=0.06705, simple_loss=0.09111, pruned_loss=0.01278, audio_tagging_loss=0.008713, over 3057816.97 frames. ], batch size: 60, lr: 1.64e-03, grad_scale: 32.0 2023-11-26 06:48:11,718 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3268826.6666666665, ans=0.125 2023-11-26 06:48:19,045 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3268893.3333333335, ans=0.125 2023-11-26 06:48:21,628 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=12.22 vs. limit=15.0 2023-11-26 06:48:31,098 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 490350 2023-11-26 06:48:34,655 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.576e+01 8.924e+01 9.559e+01 1.022e+02 1.389e+02, threshold=1.912e+02, percent-clipped=0.0 2023-11-26 06:48:37,933 INFO [train_asr.py:1235] (3/4) Epoch 41, batch 9400, loss[loss=0.04777, simple_loss=0.07152, pruned_loss=0.004769, audio_tagging_loss=0.007243, over 15189.00 frames. 
], tot_loss[loss=0.06732, simple_loss=0.09125, pruned_loss=0.01293, audio_tagging_loss=0.008763, over 3052198.96 frames. ], batch size: 57, lr: 1.64e-03, grad_scale: 32.0 2023-11-26 06:48:45,073 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3269026.6666666665, ans=0.125 2023-11-26 06:48:50,249 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=5.56 vs. limit=15.0 2023-11-26 06:48:57,126 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3269093.3333333335, ans=0.125 2023-11-26 06:49:27,317 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 490400 2023-11-26 06:49:29,906 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=3269293.3333333335, ans=0.0 2023-11-26 06:49:32,367 WARNING [train_asr.py:1481] (3/4) Exclude cut with ID unbalanced/jmSuJWEIizA_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-26 06:49:33,592 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3269360.0, ans=0.125 2023-11-26 06:49:34,430 INFO [train_asr.py:1235] (3/4) Epoch 41, batch 9450, loss[loss=0.05074, simple_loss=0.06489, pruned_loss=0.009471, audio_tagging_loss=0.008825, over 15710.00 frames. ], tot_loss[loss=0.06744, simple_loss=0.09131, pruned_loss=0.01294, audio_tagging_loss=0.008854, over 3041335.16 frames. ], batch size: 59, lr: 1.64e-03, grad_scale: 32.0 2023-11-26 06:49:34,926 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=9.67 vs. limit=15.0 2023-11-26 06:49:36,821 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3269360.0, ans=0.1 2023-11-26 06:50:23,256 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 490450 2023-11-26 06:50:26,315 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.546e+01 8.849e+01 9.435e+01 1.031e+02 1.248e+02, threshold=1.887e+02, percent-clipped=0.0 2023-11-26 06:50:29,505 INFO [train_asr.py:1235] (3/4) Epoch 41, batch 9500, loss[loss=0.06783, simple_loss=0.0795, pruned_loss=0.01606, audio_tagging_loss=0.01202, over 14208.00 frames. ], tot_loss[loss=0.06741, simple_loss=0.09106, pruned_loss=0.01297, audio_tagging_loss=0.008915, over 3043126.51 frames. ], batch size: 58, lr: 1.64e-03, grad_scale: 32.0 2023-11-26 06:50:31,899 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.min_positive, batch_count=3269693.3333333335, ans=0.05 2023-11-26 06:50:38,841 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=512, metric=18.43 vs. 
limit=22.5 2023-11-26 06:50:40,341 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=12.76 vs. limit=22.5 2023-11-26 06:50:51,790 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=3269826.6666666665, ans=0.0 2023-11-26 06:50:54,067 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=3269826.6666666665, ans=0.125 2023-11-26 06:51:18,660 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 490500 2023-11-26 06:51:25,487 INFO [train_asr.py:1235] (3/4) Epoch 41, batch 9550, loss[loss=0.0734, simple_loss=0.09695, pruned_loss=0.01477, audio_tagging_loss=0.01015, over 15004.00 frames. ], tot_loss[loss=0.06759, simple_loss=0.09126, pruned_loss=0.01297, audio_tagging_loss=0.008993, over 3046269.01 frames. ], batch size: 53, lr: 1.64e-03, grad_scale: 32.0 2023-11-26 06:51:38,966 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3270093.3333333335, ans=0.125 2023-11-26 06:51:54,214 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=13.63 vs. limit=22.5 2023-11-26 06:51:57,299 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=3270160.0, ans=0.2 2023-11-26 06:52:01,495 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3270226.6666666665, ans=0.125 2023-11-26 06:52:08,660 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=8.83 vs. limit=15.0 2023-11-26 06:52:15,543 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 490550 2023-11-26 06:52:18,582 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.011e+01 8.931e+01 9.591e+01 1.034e+02 1.211e+02, threshold=1.918e+02, percent-clipped=0.0 2023-11-26 06:52:22,419 INFO [train_asr.py:1235] (3/4) Epoch 41, batch 9600, loss[loss=0.07684, simple_loss=0.09891, pruned_loss=0.01641, audio_tagging_loss=0.01097, over 16196.00 frames. ], tot_loss[loss=0.06779, simple_loss=0.09166, pruned_loss=0.013, audio_tagging_loss=0.008959, over 3053235.19 frames. ], batch size: 63, lr: 1.64e-03, grad_scale: 32.0 2023-11-26 06:52:30,307 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=7.95 vs. limit=15.0 2023-11-26 06:53:11,613 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 490600 2023-11-26 06:53:12,295 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=7.29 vs. limit=15.0 2023-11-26 06:53:18,172 INFO [train_asr.py:1235] (3/4) Epoch 41, batch 9650, loss[loss=0.07288, simple_loss=0.1002, pruned_loss=0.0149, audio_tagging_loss=0.007864, over 15619.00 frames. ], tot_loss[loss=0.06799, simple_loss=0.09201, pruned_loss=0.01302, audio_tagging_loss=0.008966, over 3050296.63 frames. 
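Each scaling.py Whitening line compares a measured metric against a limit; a penalty is presumably applied only when the metric exceeds the limit, nudging activations toward a more isotropic covariance. One plausible formulation of such a metric, assumed here purely for illustration: the mean squared eigenvalue of the feature covariance divided by the squared mean eigenvalue, which is 1.0 for perfectly white features and grows as variance concentrates in a few directions:

    import torch

    # Illustrative whitening metric (an assumption, not the icefall definition):
    # metric = mean(eig(C)^2) / mean(eig(C))^2 for the feature covariance C.
    # It equals 1.0 when all eigenvalues are equal (perfectly "white" features)
    # and grows as variance concentrates in a few directions.
    def whitening_metric(x):
        # x: (num_frames, num_channels)
        x = x - x.mean(dim=0)
        cov = (x.t() @ x) / x.shape[0]
        eigs = torch.linalg.eigvalsh(cov)
        return ((eigs ** 2).mean() / eigs.mean() ** 2).item()

    x = torch.randn(1000, 384) @ torch.randn(384, 384)  # correlated features
    print(whitening_metric(x))  # well above 1.0; compared against limits like 15.0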
], batch size: 58, lr: 1.64e-03, grad_scale: 32.0 2023-11-26 06:53:50,238 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3270826.6666666665, ans=0.125 2023-11-26 06:53:50,252 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3270826.6666666665, ans=0.125 2023-11-26 06:54:02,190 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.out_whiten.whitening_limit, batch_count=3270960.0, ans=15.0 2023-11-26 06:54:07,190 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 490650 2023-11-26 06:54:07,331 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=3270960.0, ans=0.2 2023-11-26 06:54:10,212 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.227e+01 8.629e+01 9.120e+01 1.007e+02 1.405e+02, threshold=1.824e+02, percent-clipped=0.0 2023-11-26 06:54:13,991 INFO [train_asr.py:1235] (3/4) Epoch 41, batch 9700, loss[loss=0.06166, simple_loss=0.08456, pruned_loss=0.01037, audio_tagging_loss=0.009015, over 15098.00 frames. ], tot_loss[loss=0.06772, simple_loss=0.09176, pruned_loss=0.01299, audio_tagging_loss=0.008855, over 3045617.55 frames. ], batch size: 54, lr: 1.64e-03, grad_scale: 32.0 2023-11-26 06:54:28,226 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-26 06:54:53,170 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3271226.6666666665, ans=0.125 2023-11-26 06:54:53,451 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=12.34 vs. limit=15.0 2023-11-26 06:55:03,224 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 490700 2023-11-26 06:55:10,691 INFO [train_asr.py:1235] (3/4) Epoch 41, batch 9750, loss[loss=0.07958, simple_loss=0.1114, pruned_loss=0.01483, audio_tagging_loss=0.009028, over 14575.00 frames. ], tot_loss[loss=0.06747, simple_loss=0.09156, pruned_loss=0.01293, audio_tagging_loss=0.008762, over 3049735.49 frames. ], batch size: 55, lr: 1.64e-03, grad_scale: 32.0 2023-11-26 06:55:13,516 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3271360.0, ans=0.125 2023-11-26 06:55:14,588 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=3271360.0, ans=0.125 2023-11-26 06:55:22,391 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3271426.6666666665, ans=0.1 2023-11-26 06:55:31,857 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=3271493.3333333335, ans=0.0 2023-11-26 06:55:39,926 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=512, metric=2.63 vs. 
limit=15.0 2023-11-26 06:55:59,916 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 490750 2023-11-26 06:56:03,973 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.324e+01 8.706e+01 9.282e+01 1.012e+02 1.180e+02, threshold=1.856e+02, percent-clipped=0.0 2023-11-26 06:56:05,250 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=3271693.3333333335, ans=0.07 2023-11-26 06:56:06,096 INFO [train_asr.py:1235] (3/4) Epoch 41, batch 9800, loss[loss=0.06621, simple_loss=0.0936, pruned_loss=0.00949, audio_tagging_loss=0.009917, over 15129.00 frames. ], tot_loss[loss=0.06736, simple_loss=0.09142, pruned_loss=0.01287, audio_tagging_loss=0.008781, over 3041380.81 frames. ], batch size: 57, lr: 1.64e-03, grad_scale: 16.0 2023-11-26 06:56:08,419 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=3271693.3333333335, ans=0.125 2023-11-26 06:56:40,254 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=3271893.3333333335, ans=0.2 2023-11-26 06:56:48,868 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=3271893.3333333335, ans=0.125 2023-11-26 06:56:55,059 WARNING [train_asr.py:1481] (3/4) Exclude cut with ID unbalanced/Bo4LcZjitzU_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-26 06:56:55,086 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 490800 2023-11-26 06:56:55,195 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3271960.0, ans=0.125 2023-11-26 06:56:55,313 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=3271960.0, ans=0.0 2023-11-26 06:56:58,847 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=3271960.0, ans=0.0 2023-11-26 06:57:01,817 INFO [train_asr.py:1235] (3/4) Epoch 41, batch 9850, loss[loss=0.06763, simple_loss=0.09507, pruned_loss=0.01256, audio_tagging_loss=0.007534, over 15055.00 frames. ], tot_loss[loss=0.06777, simple_loss=0.09224, pruned_loss=0.01299, audio_tagging_loss=0.008664, over 3045604.21 frames. 
], batch size: 58, lr: 1.64e-03, grad_scale: 16.0 2023-11-26 06:57:16,869 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=3272093.3333333335, ans=0.0 2023-11-26 06:57:29,797 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=3272160.0, ans=0.2 2023-11-26 06:57:38,220 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=3272226.6666666665, ans=0.0 2023-11-26 06:57:44,790 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3272226.6666666665, ans=0.125 2023-11-26 06:57:51,701 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 490850 2023-11-26 06:57:56,501 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 6.954e+01 8.658e+01 9.556e+01 1.029e+02 1.537e+02, threshold=1.911e+02, percent-clipped=0.0 2023-11-26 06:57:58,694 INFO [train_asr.py:1235] (3/4) Epoch 41, batch 9900, loss[loss=0.05784, simple_loss=0.06921, pruned_loss=0.01132, audio_tagging_loss=0.01192, over 14112.00 frames. ], tot_loss[loss=0.0671, simple_loss=0.0912, pruned_loss=0.01278, audio_tagging_loss=0.008723, over 3044886.39 frames. ], batch size: 54, lr: 1.64e-03, grad_scale: 16.0 2023-11-26 06:58:09,552 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=3272426.6666666665, ans=0.2 2023-11-26 06:58:15,982 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3272426.6666666665, ans=0.125 2023-11-26 06:58:40,847 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=512, metric=4.18 vs. limit=15.0 2023-11-26 06:58:48,410 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 490900 2023-11-26 06:58:49,597 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.max_abs, batch_count=3272626.6666666665, ans=10.0 2023-11-26 06:58:55,381 INFO [train_asr.py:1235] (3/4) Epoch 41, batch 9950, loss[loss=0.08462, simple_loss=0.1174, pruned_loss=0.01908, audio_tagging_loss=0.006844, over 14376.00 frames. ], tot_loss[loss=0.06715, simple_loss=0.09134, pruned_loss=0.01275, audio_tagging_loss=0.008728, over 3050169.23 frames. 
], batch size: 53, lr: 1.64e-03, grad_scale: 16.0 2023-11-26 06:59:14,177 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=3272760.0, ans=0.07 2023-11-26 06:59:16,268 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-11-26 06:59:21,972 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=3272826.6666666665, ans=0.0 2023-11-26 06:59:22,028 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=3272826.6666666665, ans=0.125 2023-11-26 06:59:42,372 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-26 06:59:44,374 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 490950 2023-11-26 06:59:48,561 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.315e+01 8.546e+01 9.420e+01 1.008e+02 1.364e+02, threshold=1.884e+02, percent-clipped=0.0 2023-11-26 06:59:50,741 INFO [train_asr.py:1235] (3/4) Epoch 41, batch 10000, loss[loss=0.07094, simple_loss=0.1019, pruned_loss=0.01103, audio_tagging_loss=0.008959, over 15023.00 frames. ], tot_loss[loss=0.06721, simple_loss=0.09167, pruned_loss=0.01281, audio_tagging_loss=0.008568, over 3049509.39 frames. ], batch size: 56, lr: 1.64e-03, grad_scale: 32.0 2023-11-26 07:00:06,906 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=3273093.3333333335, ans=0.0 2023-11-26 07:00:26,167 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3273226.6666666665, ans=0.125 2023-11-26 07:00:40,154 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 491000 2023-11-26 07:00:47,307 INFO [train_asr.py:1235] (3/4) Epoch 41, batch 10050, loss[loss=0.05804, simple_loss=0.07913, pruned_loss=0.009381, audio_tagging_loss=0.009094, over 15851.00 frames. ], tot_loss[loss=0.06733, simple_loss=0.09204, pruned_loss=0.01274, audio_tagging_loss=0.008573, over 3057770.48 frames. ], batch size: 58, lr: 1.64e-03, grad_scale: 32.0 2023-11-26 07:00:49,648 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=3273360.0, ans=0.0 2023-11-26 07:00:54,390 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-26 07:01:09,802 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3273493.3333333335, ans=0.1 2023-11-26 07:01:21,602 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=10.88 vs. 
limit=15.0 2023-11-26 07:01:27,960 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3273560.0, ans=0.1 2023-11-26 07:01:36,838 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 491050 2023-11-26 07:01:41,054 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.352e+01 8.461e+01 9.073e+01 9.880e+01 1.259e+02, threshold=1.815e+02, percent-clipped=0.0 2023-11-26 07:01:43,275 INFO [train_asr.py:1235] (3/4) Epoch 41, batch 10100, loss[loss=0.07511, simple_loss=0.09585, pruned_loss=0.01699, audio_tagging_loss=0.0102, over 14720.00 frames. ], tot_loss[loss=0.06693, simple_loss=0.09124, pruned_loss=0.01256, audio_tagging_loss=0.008753, over 3048283.89 frames. ], batch size: 57, lr: 1.64e-03, grad_scale: 32.0 2023-11-26 07:01:48,260 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=3273693.3333333335, ans=0.0 2023-11-26 07:02:05,918 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3273826.6666666665, ans=0.125 2023-11-26 07:02:19,853 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.whiten.whitening_limit, batch_count=3273893.3333333335, ans=15.0 2023-11-26 07:02:28,944 WARNING [train_asr.py:1481] (3/4) Exclude cut with ID unbalanced/_eq1Ry0UZGU_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-26 07:02:32,757 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 491100 2023-11-26 07:02:34,943 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=3273960.0, ans=0.0 2023-11-26 07:02:36,102 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=3273960.0, ans=0.125 2023-11-26 07:02:37,397 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=7.48 vs. limit=15.0 2023-11-26 07:02:39,072 INFO [train_asr.py:1235] (3/4) Epoch 41, batch 10150, loss[loss=0.05745, simple_loss=0.07705, pruned_loss=0.01082, audio_tagging_loss=0.008108, over 15672.00 frames. ], tot_loss[loss=0.06693, simple_loss=0.09105, pruned_loss=0.01253, audio_tagging_loss=0.008881, over 3045462.26 frames. ], batch size: 59, lr: 1.64e-03, grad_scale: 32.0 2023-11-26 07:02:43,430 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3274026.6666666665, ans=0.1 2023-11-26 07:02:50,310 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=3274093.3333333335, ans=0.0 2023-11-26 07:03:03,128 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-26 07:03:06,044 WARNING [train_asr.py:1481] (3/4) Exclude cut with ID unbalanced/cw-21cbk02A_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. 
Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-26 07:03:10,535 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=3274160.0, ans=0.0 2023-11-26 07:03:19,520 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=3274226.6666666665, ans=0.2 2023-11-26 07:03:28,343 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 491150 2023-11-26 07:03:32,400 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.543e+01 8.834e+01 9.375e+01 1.026e+02 1.327e+02, threshold=1.875e+02, percent-clipped=0.0 2023-11-26 07:03:33,738 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=3274360.0, ans=0.0 2023-11-26 07:03:34,515 INFO [train_asr.py:1235] (3/4) Epoch 41, batch 10200, loss[loss=0.06211, simple_loss=0.08339, pruned_loss=0.01404, audio_tagging_loss=0.006373, over 14688.00 frames. ], tot_loss[loss=0.06697, simple_loss=0.09086, pruned_loss=0.01262, audio_tagging_loss=0.008919, over 3042853.97 frames. ], batch size: 54, lr: 1.64e-03, grad_scale: 32.0 2023-11-26 07:03:35,370 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=3274360.0, ans=0.2 2023-11-26 07:03:47,248 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3274426.6666666665, ans=0.1 2023-11-26 07:03:49,548 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=3274426.6666666665, ans=0.0 2023-11-26 07:03:54,667 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=3274426.6666666665, ans=0.125 2023-11-26 07:03:55,605 WARNING [train_asr.py:1481] (3/4) Exclude cut with ID unbalanced/hOT6Yokob90_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-26 07:04:12,806 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=3274560.0, ans=0.0 2023-11-26 07:04:20,832 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3274626.6666666665, ans=0.125 2023-11-26 07:04:23,794 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 491200 2023-11-26 07:04:30,702 INFO [train_asr.py:1235] (3/4) Epoch 41, batch 10250, loss[loss=0.05692, simple_loss=0.07602, pruned_loss=0.008701, audio_tagging_loss=0.01021, over 15078.00 frames. ], tot_loss[loss=0.06645, simple_loss=0.0901, pruned_loss=0.01248, audio_tagging_loss=0.00892, over 3047346.48 frames. 
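These recurring WARNING lines all drop 1-second AudioSet clips carrying the dummy placeholder transcript: 100 input frames shrink to 23 encoder frames after subsampling, fewer than the 24 BPE tokens, so no transducer alignment exists. A sketch of the kind of filter that emits such warnings, assuming lhotse-style cuts, a sentencepiece processor `sp`, and a convolutional front end that maps T input frames to roughly (T - 7) // 4 encoder frames:

    # Sketch of the kind of filter behind the WARNING lines above, assuming
    # lhotse-style cuts, a sentencepiece processor `sp`, and a front end that
    # maps T input frames to roughly (T - 7) // 4 encoder frames.
    def keep_cut(cut, sp, subsampling_factor=4):
        T = (cut.num_frames - 7) // subsampling_factor  # frames after subsampling
        tokens = sp.encode(cut.supervisions[0].text, out_type=str)
        # With fewer encoder frames than tokens, no transducer alignment exists.
        return T >= len(tokens)  # (100 - 7) // 4 = 23 < 24 tokens -> excluded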
], batch size: 56, lr: 1.64e-03, grad_scale: 32.0 2023-11-26 07:04:30,916 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer_ff3.min_abs, batch_count=3274693.3333333335, ans=0.2 2023-11-26 07:04:47,642 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=512, metric=2.91 vs. limit=15.0 2023-11-26 07:04:49,477 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-26 07:05:16,325 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=3274960.0, ans=0.0 2023-11-26 07:05:18,471 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=3274960.0, ans=0.0 2023-11-26 07:05:19,434 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 491250 2023-11-26 07:05:23,091 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.44 vs. limit=6.0 2023-11-26 07:05:23,640 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.179e+01 8.938e+01 9.745e+01 1.064e+02 1.415e+02, threshold=1.949e+02, percent-clipped=0.0 2023-11-26 07:05:25,868 INFO [train_asr.py:1235] (3/4) Epoch 41, batch 10300, loss[loss=0.06262, simple_loss=0.07952, pruned_loss=0.01252, audio_tagging_loss=0.01034, over 15295.00 frames. ], tot_loss[loss=0.06681, simple_loss=0.09044, pruned_loss=0.01265, audio_tagging_loss=0.008939, over 3051711.02 frames. ], batch size: 58, lr: 1.64e-03, grad_scale: 32.0 2023-11-26 07:05:40,349 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=7.71 vs. limit=15.0 2023-11-26 07:05:48,836 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=3275160.0, ans=0.125 2023-11-26 07:05:59,517 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=9.53 vs. limit=15.0 2023-11-26 07:06:15,255 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 491300 2023-11-26 07:06:15,661 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=16.73 vs. limit=22.5 2023-11-26 07:06:22,372 INFO [train_asr.py:1235] (3/4) Epoch 41, batch 10350, loss[loss=0.08413, simple_loss=0.1226, pruned_loss=0.01661, audio_tagging_loss=0.006232, over 15877.00 frames. ], tot_loss[loss=0.06705, simple_loss=0.09079, pruned_loss=0.0127, audio_tagging_loss=0.008949, over 3054400.91 frames. ], batch size: 57, lr: 1.64e-03, grad_scale: 32.0 2023-11-26 07:06:34,006 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3275426.6666666665, ans=0.1 2023-11-26 07:06:36,515 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=7.31 vs. 
limit=15.0 2023-11-26 07:06:37,171 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=3275426.6666666665, ans=0.0 2023-11-26 07:06:37,597 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=512, metric=2.79 vs. limit=15.0 2023-11-26 07:06:54,929 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=3275560.0, ans=0.125 2023-11-26 07:07:07,837 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=13.99 vs. limit=15.0 2023-11-26 07:07:11,734 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 491350 2023-11-26 07:07:15,025 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3275626.6666666665, ans=0.1 2023-11-26 07:07:16,377 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.482e+01 8.783e+01 9.372e+01 1.013e+02 2.774e+02, threshold=1.874e+02, percent-clipped=1.0 2023-11-26 07:07:18,538 INFO [train_asr.py:1235] (3/4) Epoch 41, batch 10400, loss[loss=0.06127, simple_loss=0.08454, pruned_loss=0.0105, audio_tagging_loss=0.00849, over 15052.00 frames. ], tot_loss[loss=0.06693, simple_loss=0.09053, pruned_loss=0.01259, audio_tagging_loss=0.009076, over 3056921.48 frames. ], batch size: 57, lr: 1.64e-03, grad_scale: 32.0 2023-11-26 07:07:30,487 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3275760.0, ans=0.125 2023-11-26 07:07:33,583 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=3275760.0, ans=0.125 2023-11-26 07:07:36,792 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-26 07:07:43,207 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=3275826.6666666665, ans=0.125 2023-11-26 07:07:48,591 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3275826.6666666665, ans=0.0 2023-11-26 07:07:51,217 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3275893.3333333335, ans=0.125 2023-11-26 07:07:53,355 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=3275893.3333333335, ans=0.125 2023-11-26 07:08:03,376 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3275960.0, ans=0.1 2023-11-26 07:08:07,526 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 491400 2023-11-26 07:08:14,163 INFO [train_asr.py:1235] (3/4) Epoch 41, batch 10450, loss[loss=0.06688, simple_loss=0.0905, pruned_loss=0.01327, audio_tagging_loss=0.008362, over 16082.00 frames. ], tot_loss[loss=0.06729, simple_loss=0.09117, pruned_loss=0.0128, audio_tagging_loss=0.008914, over 3056199.07 frames. 
], batch size: 59, lr: 1.64e-03, grad_scale: 32.0 2023-11-26 07:08:21,859 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=3276026.6666666665, ans=0.2 2023-11-26 07:08:31,540 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3276093.3333333335, ans=0.1 2023-11-26 07:08:33,196 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3276093.3333333335, ans=0.1 2023-11-26 07:08:48,372 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3276226.6666666665, ans=0.125 2023-11-26 07:08:55,950 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=3276226.6666666665, ans=0.2 2023-11-26 07:09:03,208 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 491450 2023-11-26 07:09:07,821 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.429e+01 8.707e+01 9.260e+01 9.868e+01 1.345e+02, threshold=1.852e+02, percent-clipped=0.0 2023-11-26 07:09:10,554 INFO [train_asr.py:1235] (3/4) Epoch 41, batch 10500, loss[loss=0.07132, simple_loss=0.1036, pruned_loss=0.01172, audio_tagging_loss=0.007796, over 15386.00 frames. ], tot_loss[loss=0.06673, simple_loss=0.09073, pruned_loss=0.01254, audio_tagging_loss=0.008831, over 3055585.56 frames. ], batch size: 57, lr: 1.64e-03, grad_scale: 32.0 2023-11-26 07:09:17,540 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=3276360.0, ans=0.0 2023-11-26 07:09:59,893 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 491500 2023-11-26 07:10:06,819 INFO [train_asr.py:1235] (3/4) Epoch 41, batch 10550, loss[loss=0.05753, simple_loss=0.07913, pruned_loss=0.01097, audio_tagging_loss=0.006991, over 15511.00 frames. ], tot_loss[loss=0.06632, simple_loss=0.09017, pruned_loss=0.01246, audio_tagging_loss=0.008774, over 3047083.08 frames. ], batch size: 57, lr: 1.64e-03, grad_scale: 16.0 2023-11-26 07:10:55,645 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 491550 2023-11-26 07:11:00,812 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.273e+01 8.562e+01 9.260e+01 9.916e+01 1.260e+02, threshold=1.852e+02, percent-clipped=0.0 2023-11-26 07:11:01,912 INFO [train_asr.py:1235] (3/4) Epoch 41, batch 10600, loss[loss=0.06788, simple_loss=0.08958, pruned_loss=0.01384, audio_tagging_loss=0.009245, over 15328.00 frames. ], tot_loss[loss=0.06593, simple_loss=0.08962, pruned_loss=0.01237, audio_tagging_loss=0.008756, over 3050084.42 frames. ], batch size: 57, lr: 1.64e-03, grad_scale: 16.0 2023-11-26 07:11:11,084 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=13.16 vs. 
limit=15.0 2023-11-26 07:11:25,938 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.min_abs, batch_count=3277160.0, ans=0.5 2023-11-26 07:11:38,015 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3277226.6666666665, ans=0.125 2023-11-26 07:11:40,362 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=3.84 vs. limit=10.0 2023-11-26 07:11:42,275 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3277226.6666666665, ans=0.125 2023-11-26 07:11:47,567 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=3277293.3333333335, ans=0.125 2023-11-26 07:11:49,822 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=3277293.3333333335, ans=0.125 2023-11-26 07:11:50,593 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 491600 2023-11-26 07:11:50,723 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=3277293.3333333335, ans=0.0 2023-11-26 07:11:57,750 INFO [train_asr.py:1235] (3/4) Epoch 41, batch 10650, loss[loss=0.06092, simple_loss=0.07584, pruned_loss=0.0114, audio_tagging_loss=0.0116, over 15293.00 frames. ], tot_loss[loss=0.06569, simple_loss=0.08939, pruned_loss=0.01228, audio_tagging_loss=0.008715, over 3052756.23 frames. ], batch size: 57, lr: 1.64e-03, grad_scale: 16.0 2023-11-26 07:12:16,558 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3277426.6666666665, ans=0.125 2023-11-26 07:12:23,931 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=3277493.3333333335, ans=0.125 2023-11-26 07:12:29,701 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.0.self_attn1.whiten, num_groups=1, num_channels=192, metric=14.12 vs. 
limit=22.5 2023-11-26 07:12:34,358 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=3277560.0, ans=0.2 2023-11-26 07:12:37,435 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=3277560.0, ans=0.07 2023-11-26 07:12:43,304 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=3277626.6666666665, ans=0.0 2023-11-26 07:12:44,377 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=3277626.6666666665, ans=0.125 2023-11-26 07:12:46,776 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 491650 2023-11-26 07:12:47,426 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3277626.6666666665, ans=0.125 2023-11-26 07:12:53,545 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.823e+01 8.757e+01 9.487e+01 1.015e+02 1.210e+02, threshold=1.897e+02, percent-clipped=0.0 2023-11-26 07:12:53,571 INFO [train_asr.py:1235] (3/4) Epoch 41, batch 10700, loss[loss=0.08135, simple_loss=0.102, pruned_loss=0.02075, audio_tagging_loss=0.009595, over 15287.00 frames. ], tot_loss[loss=0.06566, simple_loss=0.08924, pruned_loss=0.01235, audio_tagging_loss=0.008696, over 3049507.78 frames. ], batch size: 55, lr: 1.64e-03, grad_scale: 8.0 2023-11-26 07:12:57,665 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3277693.3333333335, ans=0.125 2023-11-26 07:13:11,405 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=3277760.0, ans=0.125 2023-11-26 07:13:12,554 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3277760.0, ans=0.1 2023-11-26 07:13:14,858 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=17.97 vs. limit=22.5 2023-11-26 07:13:41,714 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3277960.0, ans=0.125 2023-11-26 07:13:42,686 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 491700 2023-11-26 07:13:48,929 INFO [train_asr.py:1235] (3/4) Epoch 41, batch 10750, loss[loss=0.06481, simple_loss=0.09113, pruned_loss=0.01116, audio_tagging_loss=0.008086, over 15430.00 frames. ], tot_loss[loss=0.06567, simple_loss=0.08944, pruned_loss=0.01229, audio_tagging_loss=0.00865, over 3054932.57 frames. ], batch size: 55, lr: 1.64e-03, grad_scale: 8.0 2023-11-26 07:13:50,135 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=3278026.6666666665, ans=0.0 2023-11-26 07:13:52,799 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.50 vs. 
limit=6.0 2023-11-26 07:14:08,872 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=3278093.3333333335, ans=0.0 2023-11-26 07:14:15,245 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3278160.0, ans=0.125 2023-11-26 07:14:15,331 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3278160.0, ans=0.125 2023-11-26 07:14:33,643 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=3278293.3333333335, ans=0.0 2023-11-26 07:14:37,777 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 491750 2023-11-26 07:14:44,168 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.353e+01 8.438e+01 9.296e+01 1.012e+02 1.543e+02, threshold=1.859e+02, percent-clipped=0.0 2023-11-26 07:14:44,195 INFO [train_asr.py:1235] (3/4) Epoch 41, batch 10800, loss[loss=0.06554, simple_loss=0.08135, pruned_loss=0.01712, audio_tagging_loss=0.007739, over 16045.00 frames. ], tot_loss[loss=0.06603, simple_loss=0.09005, pruned_loss=0.01237, audio_tagging_loss=0.008636, over 3060084.56 frames. ], batch size: 62, lr: 1.64e-03, grad_scale: 16.0 2023-11-26 07:14:45,497 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=3278360.0, ans=0.125 2023-11-26 07:14:45,504 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=3278360.0, ans=0.0 2023-11-26 07:14:46,585 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=3278360.0, ans=0.125 2023-11-26 07:15:04,023 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=17.30 vs. limit=22.5 2023-11-26 07:15:06,126 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=18.74 vs. limit=22.5 2023-11-26 07:15:06,257 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=12.60 vs. limit=22.5 2023-11-26 07:15:23,732 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=3278560.0, ans=0.09899494936611666 2023-11-26 07:15:25,882 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=3278560.0, ans=0.2 2023-11-26 07:15:33,513 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 491800 2023-11-26 07:15:41,208 INFO [train_asr.py:1235] (3/4) Epoch 41, batch 10850, loss[loss=0.08123, simple_loss=0.09867, pruned_loss=0.02486, audio_tagging_loss=0.007031, over 14379.00 frames. ], tot_loss[loss=0.0656, simple_loss=0.0892, pruned_loss=0.01235, audio_tagging_loss=0.008645, over 3061292.54 frames. 
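
A note on the ScheduledFloat entries above: each one reports a module hyperparameter (a balancer probability, a skip rate, a min_abs floor, and so on) whose value is looked up from a schedule keyed on batch_count, so the printed ans= value is what the module uses at that point in training. A minimal sketch of such a schedule, assuming piecewise-linear interpolation between (batch_count, value) breakpoints; the breakpoints below are invented for illustration:

    # Sketch of a batch-count-keyed float schedule (piecewise-linear form assumed).
    class ScheduledFloat:
        def __init__(self, *points, default=0.0):
            # points: (batch_count, value) breakpoints, e.g. (0.0, 0.3), (20000.0, 0.1)
            self.points = sorted(points)
            self.batch_count = 0.0
            self.default = default

        def __float__(self):
            x, pts = self.batch_count, self.points
            if not pts:
                return self.default
            if x <= pts[0][0]:
                return pts[0][1]
            if x >= pts[-1][0]:
                return pts[-1][1]
            for (x0, y0), (x1, y1) in zip(pts, pts[1:]):
                if x0 <= x <= x1:
                    return y0 + (y1 - y0) * (x - x0) / (x1 - x0)
            return self.default

    sched = ScheduledFloat((0.0, 0.3), (20000.0, 0.1))
    sched.batch_count = 3277160.0  # a batch_count from the log above
    print(float(sched))            # 0.1, past the final breakpoint
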
], batch size: 53, lr: 1.64e-03, grad_scale: 16.0 2023-11-26 07:15:46,204 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3278693.3333333335, ans=0.1 2023-11-26 07:15:47,237 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3278693.3333333335, ans=0.125 2023-11-26 07:15:48,386 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=3278693.3333333335, ans=0.125 2023-11-26 07:15:50,478 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=3278693.3333333335, ans=0.2 2023-11-26 07:15:57,733 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=3278760.0, ans=0.0 2023-11-26 07:16:01,386 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=7.13 vs. limit=15.0 2023-11-26 07:16:06,208 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=3278826.6666666665, ans=0.0 2023-11-26 07:16:21,037 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3278893.3333333335, ans=0.125 2023-11-26 07:16:24,324 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3278960.0, ans=0.125 2023-11-26 07:16:30,342 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 491850 2023-11-26 07:16:32,829 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=12.64 vs. limit=22.5 2023-11-26 07:16:33,484 WARNING [train_asr.py:1481] (3/4) Exclude cut with ID unbalanced/XMxq2pgttuY_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-26 07:16:33,683 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=3278960.0, ans=0.2 2023-11-26 07:16:36,666 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.345e+01 8.766e+01 9.451e+01 1.013e+02 1.235e+02, threshold=1.890e+02, percent-clipped=0.0 2023-11-26 07:16:36,704 INFO [train_asr.py:1235] (3/4) Epoch 41, batch 10900, loss[loss=0.06049, simple_loss=0.08819, pruned_loss=0.008551, audio_tagging_loss=0.007843, over 16239.00 frames. ], tot_loss[loss=0.06584, simple_loss=0.08939, pruned_loss=0.01244, audio_tagging_loss=0.008707, over 3058080.96 frames. ], batch size: 59, lr: 1.64e-03, grad_scale: 16.0 2023-11-26 07:16:37,305 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=2.93 vs. limit=15.0 2023-11-26 07:16:55,472 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=12.96 vs. 
limit=15.0 2023-11-26 07:17:02,432 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=12.66 vs. limit=15.0 2023-11-26 07:17:05,221 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=3279160.0, ans=0.125 2023-11-26 07:17:06,314 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-26 07:17:06,619 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=15.44 vs. limit=22.5 2023-11-26 07:17:19,075 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=3279226.6666666665, ans=0.0 2023-11-26 07:17:25,290 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 491900 2023-11-26 07:17:26,411 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=3279293.3333333335, ans=0.0 2023-11-26 07:17:26,823 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.whiten.whitening_limit, batch_count=3279293.3333333335, ans=15.0 2023-11-26 07:17:31,492 INFO [train_asr.py:1235] (3/4) Epoch 41, batch 10950, loss[loss=0.08816, simple_loss=0.1213, pruned_loss=0.02113, audio_tagging_loss=0.006378, over 15564.00 frames. ], tot_loss[loss=0.06657, simple_loss=0.09029, pruned_loss=0.0127, audio_tagging_loss=0.008727, over 3052254.75 frames. ], batch size: 56, lr: 1.64e-03, grad_scale: 16.0 2023-11-26 07:17:33,350 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.0.self_attn2.whiten, num_groups=1, num_channels=192, metric=10.93 vs. limit=22.5 2023-11-26 07:17:35,980 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3279360.0, ans=0.125 2023-11-26 07:17:47,519 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=3279426.6666666665, ans=0.0 2023-11-26 07:17:51,158 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=3279426.6666666665, ans=0.125 2023-11-26 07:17:52,400 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3279426.6666666665, ans=0.125 2023-11-26 07:17:55,063 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.min_positive, batch_count=3279493.3333333335, ans=0.05 2023-11-26 07:18:15,019 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=3279626.6666666665, ans=0.2 2023-11-26 07:18:20,723 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 491950 2023-11-26 07:18:27,598 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.477e+01 8.778e+01 9.414e+01 1.024e+02 1.293e+02, threshold=1.883e+02, percent-clipped=0.0 2023-11-26 07:18:27,627 INFO [train_asr.py:1235] (3/4) Epoch 41, batch 11000, loss[loss=0.06624, simple_loss=0.09445, pruned_loss=0.009966, audio_tagging_loss=0.009043, over 14426.00 frames. ], tot_loss[loss=0.06635, simple_loss=0.08989, pruned_loss=0.01263, audio_tagging_loss=0.008771, over 3048720.99 frames. 
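
The Whitening lines print a per-module "metric vs. limit" pair. Roughly, the metric measures how far the centered, group-wise covariance of the activations is from a multiple of the identity: it is 1.0 for perfectly white features and grows as channels become correlated, and a gradient penalty is applied only when the metric exceeds the limit. A simplified sketch of such a metric (an approximation of the scaling.py computation, not a verbatim copy):

    import torch

    def whitening_metric(x: torch.Tensor, num_groups: int) -> torch.Tensor:
        # x: (num_frames, num_channels); returns a scalar >= 1.0 that equals 1.0
        # exactly when each group's covariance is a multiple of the identity.
        num_frames, num_channels = x.shape
        c = num_channels // num_groups
        x = x.reshape(num_frames, num_groups, c).transpose(0, 1)  # (groups, frames, c)
        x = x - x.mean(dim=1, keepdim=True)                       # centered covariance
        covar = torch.matmul(x.transpose(1, 2), x)                # (groups, c, c)
        mean_diag = covar.diagonal(dim1=1, dim2=2).mean()
        mean_sq = (covar ** 2).sum() / (num_groups * c)
        return mean_sq / (mean_diag ** 2 + 1e-20)

    x = torch.randn(1000, 192)
    print(whitening_metric(x, num_groups=1))  # close to 1.0 for white noise
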
], batch size: 54, lr: 1.64e-03, grad_scale: 16.0 2023-11-26 07:18:36,605 WARNING [train_asr.py:1481] (3/4) Exclude cut with ID unbalanced/h6R5rMXN6pY_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-26 07:18:44,713 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-26 07:18:46,837 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=3279760.0, ans=0.2 2023-11-26 07:18:55,286 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3279826.6666666665, ans=0.1 2023-11-26 07:18:57,462 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3279826.6666666665, ans=0.1 2023-11-26 07:19:03,236 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3279893.3333333335, ans=0.1 2023-11-26 07:19:11,152 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3279960.0, ans=0.125 2023-11-26 07:19:16,794 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 492000 2023-11-26 07:19:25,737 INFO [train_asr.py:1235] (3/4) Epoch 41, batch 11050, loss[loss=0.05503, simple_loss=0.07016, pruned_loss=0.007237, audio_tagging_loss=0.01271, over 14988.00 frames. ], tot_loss[loss=0.06656, simple_loss=0.08997, pruned_loss=0.01269, audio_tagging_loss=0.008886, over 3047963.44 frames. ], batch size: 57, lr: 1.64e-03, grad_scale: 16.0 2023-11-26 07:19:47,151 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=19.21 vs. limit=22.5 2023-11-26 07:19:49,807 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=3280160.0, ans=0.2 2023-11-26 07:20:01,557 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=3280226.6666666665, ans=0.125 2023-11-26 07:20:14,511 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 492050 2023-11-26 07:20:17,042 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.72 vs. limit=6.0 2023-11-26 07:20:20,697 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.475e+01 8.875e+01 9.418e+01 1.004e+02 1.333e+02, threshold=1.884e+02, percent-clipped=0.0 2023-11-26 07:20:20,725 INFO [train_asr.py:1235] (3/4) Epoch 41, batch 11100, loss[loss=0.07216, simple_loss=0.09157, pruned_loss=0.01435, audio_tagging_loss=0.01202, over 14579.00 frames. ], tot_loss[loss=0.0667, simple_loss=0.0899, pruned_loss=0.01269, audio_tagging_loss=0.009056, over 3052213.74 frames. 
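
As a sanity check on the loss fields in these lines: the reported loss is consistent with a weighted sum of its parts, loss = 0.5 * simple_loss + pruned_loss + 1.0 * audio_tagging_loss, i.e. the simple-loss and audio-tagging scales configured for this run. For the batch 11100 running average above, 0.5 * 0.0899 + 0.01269 + 0.009056 = 0.066696, which rounds to the printed tot_loss of 0.0667; the per-batch loss samples obey the same identity.
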
], batch size: 57, lr: 1.64e-03, grad_scale: 16.0 2023-11-26 07:20:27,197 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3280360.0, ans=0.125 2023-11-26 07:20:36,218 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=3280426.6666666665, ans=0.125 2023-11-26 07:20:38,223 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=9.46 vs. limit=10.0 2023-11-26 07:20:53,863 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=3280560.0, ans=0.2 2023-11-26 07:21:08,443 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3280626.6666666665, ans=0.1 2023-11-26 07:21:09,400 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 492100 2023-11-26 07:21:16,281 INFO [train_asr.py:1235] (3/4) Epoch 41, batch 11150, loss[loss=0.06392, simple_loss=0.08008, pruned_loss=0.01234, audio_tagging_loss=0.01154, over 16633.00 frames. ], tot_loss[loss=0.0665, simple_loss=0.08926, pruned_loss=0.01263, audio_tagging_loss=0.00924, over 3049231.97 frames. ], batch size: 64, lr: 1.64e-03, grad_scale: 16.0 2023-11-26 07:21:21,288 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=3280693.3333333335, ans=0.125 2023-11-26 07:21:26,114 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3280693.3333333335, ans=0.1 2023-11-26 07:21:35,364 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=5.61 vs. limit=12.0 2023-11-26 07:21:45,325 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.24 vs. limit=6.0 2023-11-26 07:22:04,348 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=22.16 vs. limit=22.5 2023-11-26 07:22:05,151 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=3280960.0, ans=0.2 2023-11-26 07:22:05,966 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 492150 2023-11-26 07:22:12,759 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.732e+01 8.937e+01 9.375e+01 1.012e+02 1.316e+02, threshold=1.875e+02, percent-clipped=0.0 2023-11-26 07:22:12,785 INFO [train_asr.py:1235] (3/4) Epoch 41, batch 11200, loss[loss=0.07661, simple_loss=0.0959, pruned_loss=0.01755, audio_tagging_loss=0.0111, over 14423.00 frames. ], tot_loss[loss=0.06641, simple_loss=0.08913, pruned_loss=0.01252, audio_tagging_loss=0.009328, over 3050581.98 frames. 
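
The optim.py lines report the quartiles (min, 25%, median, 75%, max) of recently observed gradient norms, and the printed threshold equals Clipping_scale times the median: in the entry above, 2.0 * 9.375e+01 = 1.875e+02. percent-clipped then tracks how often a batch norm exceeded that threshold. A sketch of median-based clipping under those assumptions (the window size is invented):

    import torch

    norms = []              # rolling window of recent gradient norms
    clipping_scale = 2.0

    def clip_step(model) -> float:
        # max_norm=inf makes clip_grad_norm_ just measure the total norm.
        norm = float(torch.nn.utils.clip_grad_norm_(model.parameters(), float("inf")))
        norms.append(norm)
        recent = torch.tensor(norms[-500:])
        # The quartiles line in the log corresponds to something like:
        # torch.quantile(recent, torch.tensor([0.0, 0.25, 0.5, 0.75, 1.0]))
        threshold = clipping_scale * recent.median().item()
        if norm > threshold:
            for p in model.parameters():
                if p.grad is not None:
                    p.grad.mul_(threshold / norm)
        return threshold
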
], batch size: 54, lr: 1.64e-03, grad_scale: 32.0 2023-11-26 07:22:17,803 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=3281026.6666666665, ans=0.2 2023-11-26 07:22:20,919 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3281026.6666666665, ans=0.1 2023-11-26 07:22:21,983 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=3281026.6666666665, ans=0.0 2023-11-26 07:22:25,493 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=512, metric=19.49 vs. limit=22.5 2023-11-26 07:22:30,363 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=3281093.3333333335, ans=0.0 2023-11-26 07:22:44,240 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=3281160.0, ans=0.0 2023-11-26 07:22:49,451 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=3281226.6666666665, ans=0.125 2023-11-26 07:23:01,944 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 492200 2023-11-26 07:23:02,158 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3281293.3333333335, ans=0.1 2023-11-26 07:23:08,503 INFO [train_asr.py:1235] (3/4) Epoch 41, batch 11250, loss[loss=0.06791, simple_loss=0.08691, pruned_loss=0.01636, audio_tagging_loss=0.0081, over 15816.00 frames. ], tot_loss[loss=0.06595, simple_loss=0.08858, pruned_loss=0.0124, audio_tagging_loss=0.009265, over 3048526.23 frames. ], batch size: 62, lr: 1.64e-03, grad_scale: 32.0 2023-11-26 07:23:10,156 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=15.24 vs. limit=15.0 2023-11-26 07:23:10,886 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3281360.0, ans=0.125 2023-11-26 07:23:11,749 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=3281360.0, ans=0.035 2023-11-26 07:23:14,071 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=12.72 vs. limit=15.0 2023-11-26 07:23:19,972 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=7.33 vs. limit=15.0 2023-11-26 07:23:26,635 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3281426.6666666665, ans=0.125 2023-11-26 07:23:42,191 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=6.79 vs. 
limit=10.0 2023-11-26 07:23:57,218 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 492250 2023-11-26 07:24:04,029 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.238e+01 8.640e+01 9.467e+01 1.012e+02 1.426e+02, threshold=1.893e+02, percent-clipped=0.0 2023-11-26 07:24:04,056 INFO [train_asr.py:1235] (3/4) Epoch 41, batch 11300, loss[loss=0.0777, simple_loss=0.1096, pruned_loss=0.01481, audio_tagging_loss=0.008107, over 14819.00 frames. ], tot_loss[loss=0.06645, simple_loss=0.08947, pruned_loss=0.01262, audio_tagging_loss=0.0091, over 3047094.20 frames. ], batch size: 55, lr: 1.64e-03, grad_scale: 32.0 2023-11-26 07:24:10,422 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=7.17 vs. limit=10.0 2023-11-26 07:24:13,324 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=3281693.3333333335, ans=0.0 2023-11-26 07:24:20,493 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=7.46 vs. limit=12.0 2023-11-26 07:24:31,233 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=3281826.6666666665, ans=0.125 2023-11-26 07:24:53,314 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 492300 2023-11-26 07:24:54,034 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.0.whiten, num_groups=1, num_channels=192, metric=4.25 vs. limit=12.0 2023-11-26 07:25:00,210 INFO [train_asr.py:1235] (3/4) Epoch 41, batch 11350, loss[loss=0.0587, simple_loss=0.07758, pruned_loss=0.00936, audio_tagging_loss=0.01055, over 14542.00 frames. ], tot_loss[loss=0.06637, simple_loss=0.08974, pruned_loss=0.01261, audio_tagging_loss=0.0089, over 3042182.45 frames. ], batch size: 56, lr: 1.64e-03, grad_scale: 32.0 2023-11-26 07:25:12,700 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3282093.3333333335, ans=0.125 2023-11-26 07:25:29,274 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=8.51 vs. limit=15.0 2023-11-26 07:25:48,940 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 492350 2023-11-26 07:25:55,294 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.562e+01 8.982e+01 9.660e+01 1.025e+02 3.694e+02, threshold=1.932e+02, percent-clipped=1.0 2023-11-26 07:25:55,321 INFO [train_asr.py:1235] (3/4) Epoch 41, batch 11400, loss[loss=0.0396, simple_loss=0.04775, pruned_loss=0.003894, audio_tagging_loss=0.01183, over 13459.00 frames. ], tot_loss[loss=0.06625, simple_loss=0.08988, pruned_loss=0.0125, audio_tagging_loss=0.008804, over 3039671.77 frames. 
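
The grad_scale field in the training lines is the dynamic loss scale used for fp16 training: it is halved when an overflow (inf/nan gradient) is detected and grows back after a run of clean steps, which matches the 16 -> 8 -> 16 -> 32 movement across batches 10650 to 11200 above. The standard PyTorch mechanism looks like this; the growth interval and factors below are assumptions for illustration, not values read from this log:

    import torch

    scaler = torch.cuda.amp.GradScaler(
        init_scale=32.0,      # same magnitude as the grad_scale values logged here
        backoff_factor=0.5,   # halve the scale on overflow
        growth_factor=2.0,    # double it after growth_interval clean steps
        growth_interval=500,  # assumed; tune to taste
    )

    def train_step(model, optimizer, loss_fn, batch):
        optimizer.zero_grad()
        with torch.cuda.amp.autocast():
            loss = loss_fn(model, batch)
        scaler.scale(loss).backward()
        scaler.step(optimizer)  # skipped internally if inf/nan gradients were found
        scaler.update()         # adjusts the scale for the next step
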
], batch size: 54, lr: 1.64e-03, grad_scale: 32.0 2023-11-26 07:26:02,399 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3282360.0, ans=0.1 2023-11-26 07:26:11,447 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=3282426.6666666665, ans=0.2 2023-11-26 07:26:44,832 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 492400 2023-11-26 07:26:48,287 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3282626.6666666665, ans=0.1 2023-11-26 07:26:50,969 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=3282693.3333333335, ans=0.0 2023-11-26 07:26:51,871 INFO [train_asr.py:1235] (3/4) Epoch 41, batch 11450, loss[loss=0.0681, simple_loss=0.08871, pruned_loss=0.0159, audio_tagging_loss=0.007841, over 15901.00 frames. ], tot_loss[loss=0.06619, simple_loss=0.08971, pruned_loss=0.01251, audio_tagging_loss=0.008828, over 3039400.52 frames. ], batch size: 59, lr: 1.64e-03, grad_scale: 32.0 2023-11-26 07:27:05,977 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3282760.0, ans=0.125 2023-11-26 07:27:41,105 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 492450 2023-11-26 07:27:45,456 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3282960.0, ans=0.0 2023-11-26 07:27:47,907 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.451e+01 8.866e+01 9.675e+01 1.039e+02 1.240e+02, threshold=1.935e+02, percent-clipped=0.0 2023-11-26 07:27:47,934 INFO [train_asr.py:1235] (3/4) Epoch 41, batch 11500, loss[loss=0.07656, simple_loss=0.1089, pruned_loss=0.0148, audio_tagging_loss=0.007292, over 15481.00 frames. ], tot_loss[loss=0.06601, simple_loss=0.08957, pruned_loss=0.0124, audio_tagging_loss=0.00882, over 3045751.91 frames. ], batch size: 56, lr: 1.64e-03, grad_scale: 32.0 2023-11-26 07:27:49,533 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.whiten.whitening_limit, batch_count=3283026.6666666665, ans=12.0 2023-11-26 07:27:52,324 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=3283026.6666666665, ans=0.1 2023-11-26 07:27:52,793 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module1.whiten, num_groups=1, num_channels=512, metric=4.02 vs. limit=15.0 2023-11-26 07:27:56,104 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=6.81 vs. limit=10.0 2023-11-26 07:27:56,770 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=3283026.6666666665, ans=0.09899494936611666 2023-11-26 07:27:58,871 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=3283093.3333333335, ans=0.125 2023-11-26 07:28:09,213 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=10.83 vs. 
limit=15.0 2023-11-26 07:28:25,146 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=9.80 vs. limit=15.0 2023-11-26 07:28:31,682 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=3283293.3333333335, ans=0.0 2023-11-26 07:28:32,689 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=3283293.3333333335, ans=0.0 2023-11-26 07:28:36,733 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 492500 2023-11-26 07:28:41,081 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-26 07:28:42,923 INFO [train_asr.py:1235] (3/4) Epoch 41, batch 11550, loss[loss=0.06689, simple_loss=0.08744, pruned_loss=0.0123, audio_tagging_loss=0.01088, over 14130.00 frames. ], tot_loss[loss=0.06659, simple_loss=0.09051, pruned_loss=0.01259, audio_tagging_loss=0.00874, over 3049510.78 frames. ], batch size: 54, lr: 1.64e-03, grad_scale: 32.0 2023-11-26 07:28:52,519 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=19.15 vs. limit=22.5 2023-11-26 07:29:02,270 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3283426.6666666665, ans=0.125 2023-11-26 07:29:03,334 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=3283426.6666666665, ans=0.125 2023-11-26 07:29:13,771 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3283493.3333333335, ans=0.1 2023-11-26 07:29:13,790 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.min_abs, batch_count=3283493.3333333335, ans=0.5 2023-11-26 07:29:16,758 WARNING [train_asr.py:1481] (3/4) Exclude cut with ID unbalanced/NeYOsnhOi4k_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-26 07:29:16,879 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=3283560.0, ans=0.035 2023-11-26 07:29:22,348 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=3283560.0, ans=0.2 2023-11-26 07:29:31,616 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 492550 2023-11-26 07:29:38,918 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.351e+01 8.923e+01 9.634e+01 1.014e+02 1.304e+02, threshold=1.927e+02, percent-clipped=0.0 2023-11-26 07:29:38,957 INFO [train_asr.py:1235] (3/4) Epoch 41, batch 11600, loss[loss=0.0824, simple_loss=0.1095, pruned_loss=0.01849, audio_tagging_loss=0.009173, over 14867.00 frames. ], tot_loss[loss=0.06699, simple_loss=0.09133, pruned_loss=0.01268, audio_tagging_loss=0.008641, over 3051075.99 frames. 
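
The WARNING above drops a one-second AudioSet cut because the transducer loss needs at least as many encoder frames as output tokens: 100 input frames shrink to 23 frames after the encoder front end's roughly 4x subsampling, while the dummy transcript tokenizes to 24 BPE tokens. A sketch of that filter, assuming the frame arithmetic ((T - 7) // 2 + 1) // 2, which reproduces the 100 -> 23 mapping in the warning:

    def keep_cut(num_frames: int, num_tokens: int) -> bool:
        # Assumed subsampling arithmetic: 100 input frames -> 23 output frames,
        # matching the before/after counts printed in the warning.
        frames_after = ((num_frames - 7) // 2 + 1) // 2
        return frames_after >= num_tokens

    print(keep_cut(100, 24))  # False -> "Exclude cut ... from training."
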
], batch size: 55, lr: 1.64e-03, grad_scale: 32.0 2023-11-26 07:29:45,463 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=3283693.3333333335, ans=0.2 2023-11-26 07:29:50,470 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3283760.0, ans=0.125 2023-11-26 07:30:18,363 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=3283893.3333333335, ans=0.125 2023-11-26 07:30:27,746 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 492600 2023-11-26 07:30:27,820 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3283960.0, ans=0.0 2023-11-26 07:30:34,816 INFO [train_asr.py:1235] (3/4) Epoch 41, batch 11650, loss[loss=0.0566, simple_loss=0.0788, pruned_loss=0.008577, audio_tagging_loss=0.008623, over 15010.00 frames. ], tot_loss[loss=0.0665, simple_loss=0.0904, pruned_loss=0.01257, audio_tagging_loss=0.008728, over 3050556.44 frames. ], batch size: 58, lr: 1.64e-03, grad_scale: 32.0 2023-11-26 07:30:36,058 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3284026.6666666665, ans=0.1 2023-11-26 07:30:50,824 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.whiten, num_groups=1, num_channels=512, metric=8.68 vs. limit=12.0 2023-11-26 07:31:03,855 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=3284160.0, ans=0.09899494936611666 2023-11-26 07:31:05,213 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=4.41 vs. limit=6.0 2023-11-26 07:31:11,238 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=3284226.6666666665, ans=0.2 2023-11-26 07:31:17,365 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=7.75 vs. limit=12.0 2023-11-26 07:31:22,851 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=3284293.3333333335, ans=0.0 2023-11-26 07:31:23,689 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 492650 2023-11-26 07:31:29,933 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.387e+01 8.493e+01 9.108e+01 9.754e+01 1.305e+02, threshold=1.822e+02, percent-clipped=0.0 2023-11-26 07:31:29,960 INFO [train_asr.py:1235] (3/4) Epoch 41, batch 11700, loss[loss=0.05119, simple_loss=0.07223, pruned_loss=0.007102, audio_tagging_loss=0.007973, over 15730.00 frames. ], tot_loss[loss=0.06628, simple_loss=0.08995, pruned_loss=0.01247, audio_tagging_loss=0.008831, over 3057776.57 frames. ], batch size: 58, lr: 1.64e-03, grad_scale: 32.0 2023-11-26 07:31:40,741 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=3284426.6666666665, ans=0.125 2023-11-26 07:31:42,867 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=3284426.6666666665, ans=0.0 2023-11-26 07:31:48,679 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=8.94 vs. 
limit=15.0 2023-11-26 07:31:58,245 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=3284493.3333333335, ans=0.0 2023-11-26 07:31:59,261 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-11-26 07:32:00,283 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-11-26 07:32:01,894 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00 2023-11-26 07:32:11,361 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=3284560.0, ans=0.125 2023-11-26 07:32:18,459 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 492700 2023-11-26 07:32:24,688 INFO [train_asr.py:1235] (3/4) Epoch 41, batch 11750, loss[loss=0.06143, simple_loss=0.07994, pruned_loss=0.01057, audio_tagging_loss=0.01088, over 15253.00 frames. ], tot_loss[loss=0.06593, simple_loss=0.08949, pruned_loss=0.01239, audio_tagging_loss=0.008795, over 3055869.29 frames. ], batch size: 58, lr: 1.64e-03, grad_scale: 32.0 2023-11-26 07:32:28,855 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=7.37 vs. limit=10.0 2023-11-26 07:32:35,433 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3284760.0, ans=0.1 2023-11-26 07:32:35,800 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=3.38 vs. limit=15.0 2023-11-26 07:33:00,669 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3284893.3333333335, ans=0.0 2023-11-26 07:33:12,943 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3284960.0, ans=0.1 2023-11-26 07:33:14,337 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 492750 2023-11-26 07:33:17,669 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-11-26 07:33:21,086 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.269e+01 8.644e+01 9.343e+01 1.016e+02 1.345e+02, threshold=1.869e+02, percent-clipped=0.0 2023-11-26 07:33:21,113 INFO [train_asr.py:1235] (3/4) Epoch 41, batch 11800, loss[loss=0.0795, simple_loss=0.1142, pruned_loss=0.01462, audio_tagging_loss=0.007773, over 15745.00 frames. ], tot_loss[loss=0.06605, simple_loss=0.08958, pruned_loss=0.0124, audio_tagging_loss=0.008865, over 3052618.68 frames. 
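
The WithLoss lines attach an auxiliary penalty to the attention weights and report its accumulated value; loss-sum=0.000e+00 means the activations stayed inside the permitted range, so no extra gradient was injected on those batches. Conceptually it behaves like a hinge penalty; the sketch below shows the idea, not the exact scaling.py operator:

    import torch

    def abs_value_penalty(x: torch.Tensor, limit: float, scale: float = 1.0) -> torch.Tensor:
        # Zero while |x| <= limit, hence "loss-sum=0.000e+00" in the healthy case;
        # grows linearly once values drift outside the allowed range.
        return scale * (x.abs() - limit).clamp(min=0.0).sum()

    w = 0.1 * torch.randn(4, 8)
    print(abs_value_penalty(w, limit=1.0))  # tensor(0.)
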
], batch size: 56, lr: 1.64e-03, grad_scale: 32.0 2023-11-26 07:34:04,398 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=3285293.3333333335, ans=0.0 2023-11-26 07:34:04,413 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=3285293.3333333335, ans=0.0 2023-11-26 07:34:05,520 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3285293.3333333335, ans=0.0 2023-11-26 07:34:05,589 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=3285293.3333333335, ans=0.2 2023-11-26 07:34:10,110 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 492800 2023-11-26 07:34:16,647 INFO [train_asr.py:1235] (3/4) Epoch 41, batch 11850, loss[loss=0.0659, simple_loss=0.08339, pruned_loss=0.01438, audio_tagging_loss=0.009826, over 15566.00 frames. ], tot_loss[loss=0.06568, simple_loss=0.08884, pruned_loss=0.01235, audio_tagging_loss=0.008906, over 3055744.61 frames. ], batch size: 60, lr: 1.64e-03, grad_scale: 16.0 2023-11-26 07:34:23,062 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3285360.0, ans=0.125 2023-11-26 07:34:59,078 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=3285560.0, ans=0.0 2023-11-26 07:35:05,305 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 492850 2023-11-26 07:35:11,678 INFO [train_asr.py:1235] (3/4) Epoch 41, batch 11900, loss[loss=0.07198, simple_loss=0.1016, pruned_loss=0.01165, audio_tagging_loss=0.009521, over 14324.00 frames. ], tot_loss[loss=0.06606, simple_loss=0.0895, pruned_loss=0.01232, audio_tagging_loss=0.008989, over 3058648.37 frames. ], batch size: 53, lr: 1.64e-03, grad_scale: 16.0 2023-11-26 07:35:11,846 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3285693.3333333335, ans=0.125 2023-11-26 07:35:12,696 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.488e+01 8.644e+01 9.176e+01 9.875e+01 1.365e+02, threshold=1.835e+02, percent-clipped=0.0 2023-11-26 07:35:26,905 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=3285760.0, ans=0.0 2023-11-26 07:35:27,904 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.min_positive, batch_count=3285760.0, ans=0.05 2023-11-26 07:35:44,398 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=3285893.3333333335, ans=0.125 2023-11-26 07:35:49,803 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=512, metric=18.69 vs. limit=22.5 2023-11-26 07:36:00,483 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 492900 2023-11-26 07:36:05,876 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3286026.6666666665, ans=0.1 2023-11-26 07:36:07,302 INFO [train_asr.py:1235] (3/4) Epoch 41, batch 11950, loss[loss=0.06952, simple_loss=0.09582, pruned_loss=0.01343, audio_tagging_loss=0.008174, over 15199.00 frames. 
], tot_loss[loss=0.06551, simple_loss=0.08869, pruned_loss=0.0121, audio_tagging_loss=0.009059, over 3056815.56 frames. ], batch size: 55, lr: 1.64e-03, grad_scale: 16.0 2023-11-26 07:36:18,032 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3286093.3333333335, ans=0.125 2023-11-26 07:36:22,259 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=3286093.3333333335, ans=0.0 2023-11-26 07:36:32,991 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=11.81 vs. limit=15.0 2023-11-26 07:36:46,501 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3286226.6666666665, ans=0.0 2023-11-26 07:36:50,543 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=3286293.3333333335, ans=0.0 2023-11-26 07:36:52,499 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3286293.3333333335, ans=0.125 2023-11-26 07:36:54,465 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 492950 2023-11-26 07:37:00,469 INFO [train_asr.py:1235] (3/4) Epoch 41, batch 12000, loss[loss=0.07073, simple_loss=0.09345, pruned_loss=0.01426, audio_tagging_loss=0.009739, over 14724.00 frames. ], tot_loss[loss=0.06576, simple_loss=0.08886, pruned_loss=0.01219, audio_tagging_loss=0.009143, over 3053431.66 frames. ], batch size: 56, lr: 1.64e-03, grad_scale: 16.0 2023-11-26 07:37:00,470 INFO [train_asr.py:1258] (3/4) Computing validation loss 2023-11-26 07:37:27,821 INFO [zipformer.py:1877] (3/4) name=encoder.encoders.0.layers.0.self_attn_weights, attn_weights_entropy = tensor([5.9915, 5.8503, 5.6569, 5.5555], device='cuda:3') 2023-11-26 07:37:31,429 INFO [zipformer.py:1877] (3/4) name=encoder.encoders.4.encoder.layers.0.self_attn_weights, attn_weights_entropy = tensor([3.0394, 2.5538, 2.8774, 2.7494], device='cuda:3') 2023-11-26 07:37:33,026 INFO [train_asr.py:1267] (3/4) Epoch 41, validation: loss=0.05803, simple_loss=0.05068, pruned_loss=0.005323, audio_tagging_loss=0.02736, over 4681554.00 frames. 2023-11-26 07:37:33,027 INFO [train_asr.py:1268] (3/4) Maximum memory allocated so far is 24894MB 2023-11-26 07:37:35,014 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.239e+01 8.785e+01 9.392e+01 1.025e+02 1.388e+02, threshold=1.878e+02, percent-clipped=0.0 2023-11-26 07:37:51,600 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3286426.6666666665, ans=0.1 2023-11-26 07:38:28,137 INFO [train_asr.py:1235] (3/4) Epoch 42, batch 0, loss[loss=0.06994, simple_loss=0.08031, pruned_loss=0.01016, audio_tagging_loss=0.01962, over 14886.00 frames. ], tot_loss[loss=0.06994, simple_loss=0.08031, pruned_loss=0.01016, audio_tagging_loss=0.01962, over 14886.00 frames. 
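
During the validation pass at batch 12000 the script also dumps the entropy of each attention module's weight distribution, one value per head (four heads in the stacks shown); higher entropy means flatter, less peaked attention. A plausible reading of that diagnostic as code, assuming the weights are softmax distributions over key positions:

    import torch

    def attn_weights_entropy(attn: torch.Tensor) -> torch.Tensor:
        # attn: (num_heads, query_len, key_len); each row sums to 1.
        h = -(attn * (attn + 1e-20).log()).sum(dim=-1)  # (num_heads, query_len)
        return h.mean(dim=-1)                           # one entropy per head

    attn = torch.softmax(torch.randn(4, 50, 50), dim=-1)
    print(attn_weights_entropy(attn))  # four values, as in the tensors above
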
], batch size: 55, lr: 1.62e-03, grad_scale: 32.0 2023-11-26 07:38:28,138 INFO [train_asr.py:1258] (3/4) Computing validation loss 2023-11-26 07:38:41,160 INFO [zipformer.py:1877] (3/4) name=encoder.encoders.4.encoder.layers.0.self_attn_weights, attn_weights_entropy = tensor([4.3625, 3.3312, 3.7567, 3.5725], device='cuda:3') 2023-11-26 07:38:59,432 INFO [train_asr.py:1267] (3/4) Epoch 42, validation: loss=0.05791, simple_loss=0.05064, pruned_loss=0.005256, audio_tagging_loss=0.02733, over 4681554.00 frames. 2023-11-26 07:38:59,433 INFO [train_asr.py:1268] (3/4) Maximum memory allocated so far is 24894MB 2023-11-26 07:39:01,759 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.min_abs, batch_count=3286513.3333333335, ans=0.5 2023-11-26 07:39:01,789 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=3286513.3333333335, ans=0.0 2023-11-26 07:39:08,765 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.min_abs, batch_count=3286513.3333333335, ans=0.5 2023-11-26 07:39:23,564 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 493000 2023-11-26 07:39:35,209 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=15.40 vs. limit=22.5 2023-11-26 07:39:38,698 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.08 vs. limit=6.0 2023-11-26 07:39:54,084 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=3286780.0, ans=0.0 2023-11-26 07:39:54,120 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=3286780.0, ans=0.0 2023-11-26 07:39:56,093 INFO [train_asr.py:1235] (3/4) Epoch 42, batch 50, loss[loss=0.08408, simple_loss=0.1164, pruned_loss=0.01271, audio_tagging_loss=0.01318, over 15841.00 frames. ], tot_loss[loss=0.0711, simple_loss=0.08469, pruned_loss=0.01136, audio_tagging_loss=0.01739, over 687944.65 frames. ], batch size: 57, lr: 1.62e-03, grad_scale: 32.0 2023-11-26 07:39:56,709 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=6.95 vs. limit=10.0 2023-11-26 07:40:01,696 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3286846.6666666665, ans=0.1 2023-11-26 07:40:19,851 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 493050 2023-11-26 07:40:22,722 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3286980.0, ans=0.125 2023-11-26 07:40:25,223 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=5.35 vs. limit=15.0 2023-11-26 07:40:29,378 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.809e+01 9.630e+01 1.022e+02 1.088e+02 1.448e+02, threshold=2.045e+02, percent-clipped=0.0 2023-11-26 07:40:52,961 INFO [train_asr.py:1235] (3/4) Epoch 42, batch 100, loss[loss=0.05802, simple_loss=0.0635, pruned_loss=0.007658, audio_tagging_loss=0.01861, over 15189.00 frames. 
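
Entering epoch 42, the learning rate steps from 1.64e-03 to 1.62e-03. That is consistent with an Eden-style schedule that decays with both the batch index and the completed-epoch count; a quick numerical check under that assumption, using base_lr=0.045, lr_batches=7500 and lr_epochs=3.5 from this run:

    # Assumed Eden schedule:
    #   lr = base_lr * ((b^2 + B^2) / B^2) ** -0.25 * ((e^2 + E^2) / E^2) ** -0.25
    base_lr, B, E = 0.045, 7500.0, 3.5

    def eden_lr(batch: float, epoch: float) -> float:
        return (base_lr
                * ((batch ** 2 + B ** 2) / B ** 2) ** -0.25
                * ((epoch ** 2 + E ** 2) / E ** 2) ** -0.25)

    print(round(eden_lr(492000, 40), 5))  # ~0.00164 -> "lr: 1.64e-03" during epoch 41
    print(round(eden_lr(493000, 41), 5))  # ~0.00162 -> "lr: 1.62e-03" during epoch 42
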
], tot_loss[loss=0.07215, simple_loss=0.08733, pruned_loss=0.012, audio_tagging_loss=0.01649, over 1204197.09 frames. ], batch size: 59, lr: 1.62e-03, grad_scale: 32.0 2023-11-26 07:41:16,409 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 493100 2023-11-26 07:41:26,209 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.17 vs. limit=15.0 2023-11-26 07:41:39,646 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=6.28 vs. limit=10.0 2023-11-26 07:41:48,989 INFO [train_asr.py:1235] (3/4) Epoch 42, batch 150, loss[loss=0.06035, simple_loss=0.08487, pruned_loss=0.008394, audio_tagging_loss=0.009522, over 15963.00 frames. ], tot_loss[loss=0.07099, simple_loss=0.08807, pruned_loss=0.0122, audio_tagging_loss=0.01475, over 1614997.98 frames. ], batch size: 59, lr: 1.62e-03, grad_scale: 32.0 2023-11-26 07:41:52,397 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3287513.3333333335, ans=0.0 2023-11-26 07:41:57,643 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=3.76 vs. limit=15.0 2023-11-26 07:41:59,435 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3287580.0, ans=0.125 2023-11-26 07:42:03,605 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=3287580.0, ans=0.09899494936611666 2023-11-26 07:42:13,176 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 493150 2023-11-26 07:42:21,489 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.833e+01 9.081e+01 9.641e+01 1.033e+02 1.343e+02, threshold=1.928e+02, percent-clipped=0.0 2023-11-26 07:42:39,795 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3287780.0, ans=0.125 2023-11-26 07:42:41,927 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3287780.0, ans=0.125 2023-11-26 07:42:44,912 INFO [train_asr.py:1235] (3/4) Epoch 42, batch 200, loss[loss=0.0931, simple_loss=0.134, pruned_loss=0.01988, audio_tagging_loss=0.00624, over 15386.00 frames. ], tot_loss[loss=0.06998, simple_loss=0.08886, pruned_loss=0.01245, audio_tagging_loss=0.0131, over 1936505.64 frames. ], batch size: 56, lr: 1.62e-03, grad_scale: 32.0 2023-11-26 07:43:08,493 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 493200 2023-11-26 07:43:12,686 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.min_positive, batch_count=3287980.0, ans=0.05 2023-11-26 07:43:15,860 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3287980.0, ans=0.125 2023-11-26 07:43:32,042 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=12.16 vs. 
limit=15.0 2023-11-26 07:43:39,980 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3288113.3333333335, ans=0.125 2023-11-26 07:43:41,875 INFO [train_asr.py:1235] (3/4) Epoch 42, batch 250, loss[loss=0.0796, simple_loss=0.1141, pruned_loss=0.01793, audio_tagging_loss=0.004644, over 15811.00 frames. ], tot_loss[loss=0.06945, simple_loss=0.09016, pruned_loss=0.01262, audio_tagging_loss=0.01175, over 2177275.63 frames. ], batch size: 60, lr: 1.62e-03, grad_scale: 32.0 2023-11-26 07:43:43,157 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3288180.0, ans=0.125 2023-11-26 07:43:59,221 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=9.25 vs. limit=15.0 2023-11-26 07:44:05,419 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 493250 2023-11-26 07:44:06,190 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.90 vs. limit=6.0 2023-11-26 07:44:14,641 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 6.995e+01 8.741e+01 9.429e+01 1.027e+02 1.277e+02, threshold=1.886e+02, percent-clipped=0.0 2023-11-26 07:44:17,942 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=2.83 vs. limit=15.0 2023-11-26 07:44:23,728 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=4.12 vs. limit=15.0 2023-11-26 07:44:31,676 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3288446.6666666665, ans=0.125 2023-11-26 07:44:37,411 INFO [train_asr.py:1235] (3/4) Epoch 42, batch 300, loss[loss=0.05417, simple_loss=0.07445, pruned_loss=0.008076, audio_tagging_loss=0.008867, over 15541.00 frames. ], tot_loss[loss=0.06892, simple_loss=0.09069, pruned_loss=0.01271, audio_tagging_loss=0.01086, over 2372458.90 frames. ], batch size: 59, lr: 1.62e-03, grad_scale: 16.0 2023-11-26 07:44:37,626 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.min_positive, batch_count=3288513.3333333335, ans=0.025 2023-11-26 07:45:00,680 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 493300 2023-11-26 07:45:12,397 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.31 vs. limit=6.0 2023-11-26 07:45:13,101 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=3288713.3333333335, ans=0.0 2023-11-26 07:45:24,371 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3288780.0, ans=0.1 2023-11-26 07:45:31,094 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=3288780.0, ans=0.0 2023-11-26 07:45:33,046 INFO [train_asr.py:1235] (3/4) Epoch 42, batch 350, loss[loss=0.06186, simple_loss=0.0879, pruned_loss=0.01132, audio_tagging_loss=0.006585, over 16153.00 frames. 
], tot_loss[loss=0.06786, simple_loss=0.09047, pruned_loss=0.01243, audio_tagging_loss=0.01019, over 2519838.58 frames. ], batch size: 63, lr: 1.62e-03, grad_scale: 16.0 2023-11-26 07:45:56,624 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 493350 2023-11-26 07:46:05,934 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3289046.6666666665, ans=0.1 2023-11-26 07:46:06,686 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 6.530e+01 8.599e+01 9.325e+01 1.001e+02 1.376e+02, threshold=1.865e+02, percent-clipped=0.0 2023-11-26 07:46:16,951 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=3289113.3333333335, ans=0.125 2023-11-26 07:46:29,010 INFO [train_asr.py:1235] (3/4) Epoch 42, batch 400, loss[loss=0.06423, simple_loss=0.09065, pruned_loss=0.01095, audio_tagging_loss=0.007948, over 16746.00 frames. ], tot_loss[loss=0.06788, simple_loss=0.09102, pruned_loss=0.0126, audio_tagging_loss=0.009781, over 2642308.65 frames. ], batch size: 63, lr: 1.62e-03, grad_scale: 32.0 2023-11-26 07:46:31,838 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=6.13 vs. limit=10.0 2023-11-26 07:46:52,930 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 493400 2023-11-26 07:46:53,130 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3289313.3333333335, ans=0.0 2023-11-26 07:47:00,662 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=3289313.3333333335, ans=0.125 2023-11-26 07:47:15,136 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=3289446.6666666665, ans=0.125 2023-11-26 07:47:21,003 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=3289446.6666666665, ans=0.125 2023-11-26 07:47:25,004 INFO [train_asr.py:1235] (3/4) Epoch 42, batch 450, loss[loss=0.04908, simple_loss=0.06109, pruned_loss=0.009257, audio_tagging_loss=0.009277, over 14864.00 frames. ], tot_loss[loss=0.06683, simple_loss=0.08973, pruned_loss=0.01246, audio_tagging_loss=0.009513, over 2727117.47 frames. ], batch size: 56, lr: 1.62e-03, grad_scale: 16.0 2023-11-26 07:47:28,960 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer_na.min_abs, batch_count=3289513.3333333335, ans=0.02 2023-11-26 07:47:38,981 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=3289580.0, ans=0.07 2023-11-26 07:47:48,964 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 493450 2023-11-26 07:47:59,456 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.849e+01 8.870e+01 9.366e+01 1.009e+02 1.216e+02, threshold=1.873e+02, percent-clipped=0.0 2023-11-26 07:48:02,343 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3289713.3333333335, ans=0.125 2023-11-26 07:48:21,425 INFO [train_asr.py:1235] (3/4) Epoch 42, batch 500, loss[loss=0.06602, simple_loss=0.09702, pruned_loss=0.007759, audio_tagging_loss=0.009752, over 14930.00 frames. 
], tot_loss[loss=0.0669, simple_loss=0.08986, pruned_loss=0.01263, audio_tagging_loss=0.00934, over 2795818.70 frames. ], batch size: 54, lr: 1.62e-03, grad_scale: 16.0 2023-11-26 07:48:31,697 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3289913.3333333335, ans=0.1 2023-11-26 07:48:33,570 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=3289913.3333333335, ans=0.125 2023-11-26 07:48:36,936 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3289913.3333333335, ans=0.125 2023-11-26 07:48:44,916 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 493500 2023-11-26 07:48:46,463 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=6.74 vs. limit=15.0 2023-11-26 07:49:13,340 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3290113.3333333335, ans=0.125 2023-11-26 07:49:17,261 INFO [train_asr.py:1235] (3/4) Epoch 42, batch 550, loss[loss=0.06304, simple_loss=0.08853, pruned_loss=0.01248, audio_tagging_loss=0.006304, over 15166.00 frames. ], tot_loss[loss=0.06705, simple_loss=0.0902, pruned_loss=0.01272, audio_tagging_loss=0.009228, over 2853989.60 frames. ], batch size: 60, lr: 1.62e-03, grad_scale: 16.0 2023-11-26 07:49:30,787 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-26 07:49:40,294 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=19.82 vs. limit=22.5 2023-11-26 07:49:40,758 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 493550 2023-11-26 07:49:48,431 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=9.18 vs. limit=15.0 2023-11-26 07:49:51,813 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.224e+01 8.897e+01 9.489e+01 1.022e+02 1.296e+02, threshold=1.898e+02, percent-clipped=0.0 2023-11-26 07:49:54,120 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3290380.0, ans=0.125 2023-11-26 07:49:55,764 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-26 07:49:56,936 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.min_positive, batch_count=3290380.0, ans=0.05 2023-11-26 07:50:13,174 INFO [train_asr.py:1235] (3/4) Epoch 42, batch 600, loss[loss=0.06652, simple_loss=0.08822, pruned_loss=0.0102, audio_tagging_loss=0.01221, over 14649.00 frames. ], tot_loss[loss=0.06705, simple_loss=0.0907, pruned_loss=0.01266, audio_tagging_loss=0.009037, over 2893660.25 frames. 
], batch size: 57, lr: 1.62e-03, grad_scale: 16.0 2023-11-26 07:50:19,420 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3290513.3333333335, ans=0.125 2023-11-26 07:50:20,384 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3290513.3333333335, ans=0.1 2023-11-26 07:50:21,444 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3290513.3333333335, ans=0.125 2023-11-26 07:50:34,091 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=12.35 vs. limit=15.0 2023-11-26 07:50:36,746 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 493600 2023-11-26 07:50:40,906 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-26 07:50:51,586 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=3290713.3333333335, ans=0.0 2023-11-26 07:51:09,748 INFO [train_asr.py:1235] (3/4) Epoch 42, batch 650, loss[loss=0.06001, simple_loss=0.08381, pruned_loss=0.01049, audio_tagging_loss=0.007609, over 13934.00 frames. ], tot_loss[loss=0.06733, simple_loss=0.09098, pruned_loss=0.01285, audio_tagging_loss=0.008992, over 2929550.74 frames. ], batch size: 53, lr: 1.62e-03, grad_scale: 16.0 2023-11-26 07:51:11,009 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=3290846.6666666665, ans=0.0 2023-11-26 07:51:21,057 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=3290913.3333333335, ans=0.04949747468305833 2023-11-26 07:51:32,502 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 493650 2023-11-26 07:51:36,934 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=3290980.0, ans=0.0 2023-11-26 07:51:44,419 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.211e+01 8.602e+01 9.245e+01 1.014e+02 1.320e+02, threshold=1.849e+02, percent-clipped=0.0 2023-11-26 07:51:44,570 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3291046.6666666665, ans=0.1 2023-11-26 07:52:02,365 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=512, metric=2.74 vs. limit=15.0 2023-11-26 07:52:05,560 INFO [train_asr.py:1235] (3/4) Epoch 42, batch 700, loss[loss=0.08262, simple_loss=0.1256, pruned_loss=0.01465, audio_tagging_loss=0.005151, over 16745.00 frames. ], tot_loss[loss=0.06821, simple_loss=0.09229, pruned_loss=0.01303, audio_tagging_loss=0.009036, over 2954330.86 frames. 
], batch size: 58, lr: 1.62e-03, grad_scale: 16.0 2023-11-26 07:52:29,163 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 493700 2023-11-26 07:52:39,351 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=3291380.0, ans=0.125 2023-11-26 07:52:42,657 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.min_positive, batch_count=3291380.0, ans=0.05 2023-11-26 07:52:50,772 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3291446.6666666665, ans=0.125 2023-11-26 07:52:53,766 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3291446.6666666665, ans=0.0 2023-11-26 07:52:55,963 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=3291446.6666666665, ans=0.0 2023-11-26 07:53:01,092 INFO [train_asr.py:1235] (3/4) Epoch 42, batch 750, loss[loss=0.05542, simple_loss=0.07784, pruned_loss=0.007162, audio_tagging_loss=0.009333, over 17288.00 frames. ], tot_loss[loss=0.06713, simple_loss=0.09093, pruned_loss=0.01263, audio_tagging_loss=0.009035, over 2981109.28 frames. ], batch size: 64, lr: 1.62e-03, grad_scale: 16.0 2023-11-26 07:53:05,560 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3291513.3333333335, ans=0.125 2023-11-26 07:53:17,833 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=3291580.0, ans=0.125 2023-11-26 07:53:25,239 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 493750 2023-11-26 07:53:33,190 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3291646.6666666665, ans=0.125 2023-11-26 07:53:36,199 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.399e+01 8.579e+01 9.292e+01 9.836e+01 1.327e+02, threshold=1.858e+02, percent-clipped=0.0 2023-11-26 07:53:40,740 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=3291713.3333333335, ans=0.0 2023-11-26 07:53:42,870 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=3291713.3333333335, ans=0.0 2023-11-26 07:53:43,414 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.0.self_attn2.whiten, num_groups=1, num_channels=192, metric=13.17 vs. limit=22.5 2023-11-26 07:53:44,232 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=15.33 vs. limit=22.5 2023-11-26 07:53:58,241 INFO [train_asr.py:1235] (3/4) Epoch 42, batch 800, loss[loss=0.076, simple_loss=0.1138, pruned_loss=0.01084, audio_tagging_loss=0.008248, over 15350.00 frames. ], tot_loss[loss=0.06698, simple_loss=0.09039, pruned_loss=0.01269, audio_tagging_loss=0.009101, over 2998111.77 frames. 
], batch size: 56, lr: 1.62e-03, grad_scale: 32.0 2023-11-26 07:54:04,852 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=3291846.6666666665, ans=0.2 2023-11-26 07:54:16,012 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3291913.3333333335, ans=0.125 2023-11-26 07:54:21,043 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 493800 2023-11-26 07:54:26,735 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=3291980.0, ans=0.125 2023-11-26 07:54:36,063 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.min_positive, batch_count=3292046.6666666665, ans=0.05 2023-11-26 07:54:41,857 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3292113.3333333335, ans=0.125 2023-11-26 07:54:42,970 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=3292113.3333333335, ans=0.2 2023-11-26 07:54:50,557 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3292113.3333333335, ans=0.125 2023-11-26 07:54:53,894 INFO [train_asr.py:1235] (3/4) Epoch 42, batch 850, loss[loss=0.06716, simple_loss=0.09598, pruned_loss=0.01016, audio_tagging_loss=0.009014, over 14813.00 frames. ], tot_loss[loss=0.06679, simple_loss=0.09043, pruned_loss=0.01249, audio_tagging_loss=0.009087, over 3011011.09 frames. ], batch size: 57, lr: 1.62e-03, grad_scale: 16.0 2023-11-26 07:55:04,764 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=3292246.6666666665, ans=0.2 2023-11-26 07:55:06,713 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3292246.6666666665, ans=0.125 2023-11-26 07:55:16,667 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 493850 2023-11-26 07:55:25,161 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3292313.3333333335, ans=0.125 2023-11-26 07:55:29,199 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 6.705e+01 8.678e+01 9.372e+01 1.019e+02 1.445e+02, threshold=1.874e+02, percent-clipped=0.0 2023-11-26 07:55:29,542 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=3292380.0, ans=0.0 2023-11-26 07:55:48,986 INFO [train_asr.py:1235] (3/4) Epoch 42, batch 900, loss[loss=0.0582, simple_loss=0.07647, pruned_loss=0.01033, audio_tagging_loss=0.009632, over 15434.00 frames. ], tot_loss[loss=0.0672, simple_loss=0.09114, pruned_loss=0.01254, audio_tagging_loss=0.009092, over 3020616.51 frames. ], batch size: 59, lr: 1.62e-03, grad_scale: 16.0 2023-11-26 07:55:49,527 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=16.30 vs. 
limit=22.5 2023-11-26 07:55:52,411 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3292513.3333333335, ans=0.125 2023-11-26 07:56:07,802 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3292580.0, ans=0.0 2023-11-26 07:56:13,198 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 493900 2023-11-26 07:56:45,002 INFO [train_asr.py:1235] (3/4) Epoch 42, batch 950, loss[loss=0.0753, simple_loss=0.107, pruned_loss=0.01392, audio_tagging_loss=0.007906, over 16496.00 frames. ], tot_loss[loss=0.06708, simple_loss=0.09112, pruned_loss=0.01247, audio_tagging_loss=0.009051, over 3028163.42 frames. ], batch size: 61, lr: 1.62e-03, grad_scale: 16.0 2023-11-26 07:57:09,232 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 493950 2023-11-26 07:57:09,331 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3292980.0, ans=0.125 2023-11-26 07:57:09,333 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=3292980.0, ans=0.125 2023-11-26 07:57:10,471 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=3292980.0, ans=0.125 2023-11-26 07:57:20,925 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.052e+01 8.738e+01 9.325e+01 9.888e+01 1.254e+02, threshold=1.865e+02, percent-clipped=0.0 2023-11-26 07:57:23,239 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3293046.6666666665, ans=0.1 2023-11-26 07:57:25,219 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=15.43 vs. limit=22.5 2023-11-26 07:57:34,263 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=13.07 vs. limit=22.5 2023-11-26 07:57:37,242 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3293113.3333333335, ans=0.1 2023-11-26 07:57:39,466 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=3293113.3333333335, ans=0.0 2023-11-26 07:57:41,281 INFO [train_asr.py:1235] (3/4) Epoch 42, batch 1000, loss[loss=0.04166, simple_loss=0.04096, pruned_loss=0.007623, audio_tagging_loss=0.01356, over 14763.00 frames. ], tot_loss[loss=0.06639, simple_loss=0.09016, pruned_loss=0.01234, audio_tagging_loss=0.008964, over 3023805.72 frames. ], batch size: 57, lr: 1.62e-03, grad_scale: 16.0 2023-11-26 07:57:41,575 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=3293180.0, ans=0.125 2023-11-26 07:57:50,075 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.10 vs. limit=6.0 2023-11-26 07:58:04,308 WARNING [train_asr.py:1481] (3/4) Exclude cut with ID unbalanced/5Y6u9AlD9S0_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. 
Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-26 07:58:04,340 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 494000 2023-11-26 07:58:17,824 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=3293380.0, ans=0.125 2023-11-26 07:58:21,096 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3293380.0, ans=0.1 2023-11-26 07:58:36,679 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=3293513.3333333335, ans=0.2 2023-11-26 07:58:37,592 INFO [train_asr.py:1235] (3/4) Epoch 42, batch 1050, loss[loss=0.05208, simple_loss=0.07453, pruned_loss=0.006556, audio_tagging_loss=0.008262, over 15101.00 frames. ], tot_loss[loss=0.06619, simple_loss=0.08996, pruned_loss=0.01238, audio_tagging_loss=0.008833, over 3030885.05 frames. ], batch size: 57, lr: 1.62e-03, grad_scale: 16.0 2023-11-26 07:58:42,110 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3293513.3333333335, ans=0.0 2023-11-26 07:58:51,230 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3293580.0, ans=0.1 2023-11-26 07:58:56,958 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3293580.0, ans=0.1 2023-11-26 07:59:01,667 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 494050 2023-11-26 07:59:08,233 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=3293646.6666666665, ans=0.2 2023-11-26 07:59:13,865 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.295e+01 8.710e+01 9.431e+01 1.020e+02 1.408e+02, threshold=1.886e+02, percent-clipped=0.0 2023-11-26 07:59:23,202 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=6.41 vs. limit=15.0 2023-11-26 07:59:33,784 INFO [train_asr.py:1235] (3/4) Epoch 42, batch 1100, loss[loss=0.06086, simple_loss=0.08198, pruned_loss=0.0091, audio_tagging_loss=0.01078, over 15625.00 frames. ], tot_loss[loss=0.06561, simple_loss=0.08934, pruned_loss=0.01217, audio_tagging_loss=0.008771, over 3036118.40 frames. ], batch size: 58, lr: 1.62e-03, grad_scale: 16.0 2023-11-26 07:59:36,065 WARNING [train_asr.py:1481] (3/4) Exclude cut with ID unbalanced/AWHnJAqurec_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. 
Number of tokens: 24 2023-11-26 07:59:38,847 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=3293846.6666666665, ans=0.0 2023-11-26 07:59:39,983 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3293846.6666666665, ans=0.125 2023-11-26 07:59:42,651 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3293846.6666666665, ans=0.125 2023-11-26 07:59:45,305 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=8.91 vs. limit=15.0 2023-11-26 07:59:55,488 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=5.90 vs. limit=15.0 2023-11-26 07:59:58,065 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 494100 2023-11-26 08:00:14,047 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3294046.6666666665, ans=0.125 2023-11-26 08:00:30,392 INFO [train_asr.py:1235] (3/4) Epoch 42, batch 1150, loss[loss=0.06665, simple_loss=0.1006, pruned_loss=0.01076, audio_tagging_loss=0.005618, over 15430.00 frames. ], tot_loss[loss=0.06692, simple_loss=0.09148, pruned_loss=0.01255, audio_tagging_loss=0.00863, over 3036171.31 frames. ], batch size: 57, lr: 1.62e-03, grad_scale: 16.0 2023-11-26 08:00:33,778 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.min_positive, batch_count=3294180.0, ans=0.05 2023-11-26 08:00:36,703 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=5.12 vs. limit=10.0 2023-11-26 08:00:50,144 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3294246.6666666665, ans=0.125 2023-11-26 08:00:53,170 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 494150 2023-11-26 08:01:05,956 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 6.933e+01 8.571e+01 9.145e+01 9.893e+01 1.532e+02, threshold=1.829e+02, percent-clipped=0.0 2023-11-26 08:01:23,323 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=3294446.6666666665, ans=0.2 2023-11-26 08:01:26,212 INFO [train_asr.py:1235] (3/4) Epoch 42, batch 1200, loss[loss=0.06244, simple_loss=0.08767, pruned_loss=0.01111, audio_tagging_loss=0.007499, over 15668.00 frames. ], tot_loss[loss=0.06687, simple_loss=0.09135, pruned_loss=0.01261, audio_tagging_loss=0.00859, over 3035736.85 frames. 
], batch size: 57, lr: 1.62e-03, grad_scale: 32.0 2023-11-26 08:01:31,692 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=3294513.3333333335, ans=0.2 2023-11-26 08:01:45,143 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=3294580.0, ans=0.0 2023-11-26 08:01:49,750 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 494200 2023-11-26 08:02:08,394 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3294713.3333333335, ans=0.1 2023-11-26 08:02:18,099 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=11.02 vs. limit=15.0 2023-11-26 08:02:22,000 INFO [train_asr.py:1235] (3/4) Epoch 42, batch 1250, loss[loss=0.03764, simple_loss=0.05223, pruned_loss=0.004397, audio_tagging_loss=0.007123, over 14716.00 frames. ], tot_loss[loss=0.06628, simple_loss=0.09021, pruned_loss=0.0126, audio_tagging_loss=0.008571, over 3041547.90 frames. ], batch size: 59, lr: 1.62e-03, grad_scale: 32.0 2023-11-26 08:02:30,768 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3294846.6666666665, ans=0.125 2023-11-26 08:02:30,769 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3294846.6666666665, ans=0.1 2023-11-26 08:02:46,657 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 494250 2023-11-26 08:02:58,365 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.703e+01 8.635e+01 9.244e+01 9.927e+01 1.336e+02, threshold=1.849e+02, percent-clipped=0.0 2023-11-26 08:03:00,731 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3295046.6666666665, ans=0.125 2023-11-26 08:03:18,634 INFO [train_asr.py:1235] (3/4) Epoch 42, batch 1300, loss[loss=0.04124, simple_loss=0.04532, pruned_loss=0.006309, audio_tagging_loss=0.01227, over 14645.00 frames. ], tot_loss[loss=0.06561, simple_loss=0.08893, pruned_loss=0.0124, audio_tagging_loss=0.008741, over 3036910.92 frames. 
], batch size: 58, lr: 1.62e-03, grad_scale: 32.0 2023-11-26 08:03:19,921 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3295180.0, ans=0.125 2023-11-26 08:03:22,013 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=3295180.0, ans=0.0 2023-11-26 08:03:27,375 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=3295180.0, ans=0.125 2023-11-26 08:03:41,206 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=3295313.3333333335, ans=0.0 2023-11-26 08:03:42,032 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 494300 2023-11-26 08:04:07,139 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3295446.6666666665, ans=0.125 2023-11-26 08:04:08,651 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=3295446.6666666665, ans=0.125 2023-11-26 08:04:14,951 INFO [train_asr.py:1235] (3/4) Epoch 42, batch 1350, loss[loss=0.07042, simple_loss=0.09652, pruned_loss=0.01397, audio_tagging_loss=0.008193, over 14083.00 frames. ], tot_loss[loss=0.06632, simple_loss=0.09016, pruned_loss=0.01262, audio_tagging_loss=0.008625, over 3044136.00 frames. ], batch size: 53, lr: 1.62e-03, grad_scale: 32.0 2023-11-26 08:04:20,544 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=3295513.3333333335, ans=0.0 2023-11-26 08:04:37,919 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=3295646.6666666665, ans=0.0 2023-11-26 08:04:38,690 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 494350 2023-11-26 08:04:38,858 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3295646.6666666665, ans=0.1 2023-11-26 08:04:48,567 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=3295713.3333333335, ans=0.125 2023-11-26 08:04:52,464 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.588e+01 8.816e+01 9.406e+01 1.018e+02 1.240e+02, threshold=1.881e+02, percent-clipped=0.0 2023-11-26 08:04:52,798 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer_ff3.min_abs, batch_count=3295713.3333333335, ans=0.2 2023-11-26 08:04:55,787 WARNING [train_asr.py:1481] (3/4) Exclude cut with ID unbalanced/XdmbboqRBmQ_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. 
Number of tokens: 24 2023-11-26 08:04:59,294 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=3295780.0, ans=0.125 2023-11-26 08:05:02,368 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3295780.0, ans=0.1 2023-11-26 08:05:10,826 INFO [train_asr.py:1235] (3/4) Epoch 42, batch 1400, loss[loss=0.06387, simple_loss=0.08751, pruned_loss=0.013, audio_tagging_loss=0.007117, over 15237.00 frames. ], tot_loss[loss=0.0663, simple_loss=0.09004, pruned_loss=0.01256, audio_tagging_loss=0.008715, over 3047887.31 frames. ], batch size: 57, lr: 1.62e-03, grad_scale: 16.0 2023-11-26 08:05:27,217 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=3295913.3333333335, ans=0.125 2023-11-26 08:05:34,857 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 494400 2023-11-26 08:06:07,636 INFO [train_asr.py:1235] (3/4) Epoch 42, batch 1450, loss[loss=0.0622, simple_loss=0.08322, pruned_loss=0.0128, audio_tagging_loss=0.007786, over 14119.00 frames. ], tot_loss[loss=0.06612, simple_loss=0.0897, pruned_loss=0.01244, audio_tagging_loss=0.008831, over 3048945.50 frames. ], batch size: 54, lr: 1.62e-03, grad_scale: 16.0 2023-11-26 08:06:12,105 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3296180.0, ans=0.1 2023-11-26 08:06:19,004 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-26 08:06:25,032 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=7.27 vs. limit=15.0 2023-11-26 08:06:28,311 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=7.90 vs. limit=10.0 2023-11-26 08:06:31,026 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 494450 2023-11-26 08:06:44,217 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.307e+01 8.868e+01 9.341e+01 9.992e+01 1.188e+02, threshold=1.868e+02, percent-clipped=0.0 2023-11-26 08:07:04,096 INFO [train_asr.py:1235] (3/4) Epoch 42, batch 1500, loss[loss=0.06003, simple_loss=0.07995, pruned_loss=0.01107, audio_tagging_loss=0.008987, over 14197.00 frames. ], tot_loss[loss=0.06582, simple_loss=0.08921, pruned_loss=0.01232, audio_tagging_loss=0.008894, over 3044118.04 frames. ], batch size: 53, lr: 1.62e-03, grad_scale: 16.0 2023-11-26 08:07:05,320 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3296513.3333333335, ans=0.125 2023-11-26 08:07:07,524 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=3296513.3333333335, ans=0.0 2023-11-26 08:07:25,434 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=9.57 vs. 
limit=15.0 2023-11-26 08:07:27,058 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 494500 2023-11-26 08:07:28,221 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=3296646.6666666665, ans=0.0 2023-11-26 08:07:28,747 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=1.96 vs. limit=6.0 2023-11-26 08:07:42,218 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3296713.3333333335, ans=0.125 2023-11-26 08:07:59,526 INFO [train_asr.py:1235] (3/4) Epoch 42, batch 1550, loss[loss=0.08655, simple_loss=0.1163, pruned_loss=0.0202, audio_tagging_loss=0.008203, over 14907.00 frames. ], tot_loss[loss=0.0655, simple_loss=0.08864, pruned_loss=0.0122, audio_tagging_loss=0.008976, over 3040145.45 frames. ], batch size: 56, lr: 1.62e-03, grad_scale: 16.0 2023-11-26 08:08:05,476 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=3296846.6666666665, ans=0.125 2023-11-26 08:08:05,615 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=3296846.6666666665, ans=0.0 2023-11-26 08:08:10,794 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3296913.3333333335, ans=0.125 2023-11-26 08:08:20,048 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=7.45 vs. limit=10.0 2023-11-26 08:08:22,790 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 494550 2023-11-26 08:08:22,908 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=3296980.0, ans=0.0 2023-11-26 08:08:31,437 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3296980.0, ans=0.1 2023-11-26 08:08:31,491 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3296980.0, ans=0.1 2023-11-26 08:08:36,563 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.682e+01 8.636e+01 9.426e+01 1.014e+02 1.319e+02, threshold=1.885e+02, percent-clipped=0.0 2023-11-26 08:08:42,545 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=2.27 vs. limit=15.0 2023-11-26 08:08:48,025 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=3297113.3333333335, ans=0.0 2023-11-26 08:08:49,125 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=3297113.3333333335, ans=0.0 2023-11-26 08:08:52,152 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=3297113.3333333335, ans=0.125 2023-11-26 08:08:55,311 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=5.88 vs. 
limit=10.0 2023-11-26 08:08:55,625 INFO [train_asr.py:1235] (3/4) Epoch 42, batch 1600, loss[loss=0.06814, simple_loss=0.09697, pruned_loss=0.01138, audio_tagging_loss=0.00827, over 15799.00 frames. ], tot_loss[loss=0.06542, simple_loss=0.08855, pruned_loss=0.01211, audio_tagging_loss=0.009028, over 3041687.70 frames. ], batch size: 57, lr: 1.62e-03, grad_scale: 32.0 2023-11-26 08:09:13,314 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=3297246.6666666665, ans=0.125 2023-11-26 08:09:19,010 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 494600 2023-11-26 08:09:51,640 INFO [train_asr.py:1235] (3/4) Epoch 42, batch 1650, loss[loss=0.08044, simple_loss=0.1059, pruned_loss=0.01805, audio_tagging_loss=0.009433, over 15292.00 frames. ], tot_loss[loss=0.06614, simple_loss=0.08978, pruned_loss=0.01223, audio_tagging_loss=0.009018, over 3037735.17 frames. ], batch size: 56, lr: 1.62e-03, grad_scale: 32.0 2023-11-26 08:09:54,996 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3297513.3333333335, ans=0.1 2023-11-26 08:10:01,993 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=3297580.0, ans=0.2 2023-11-26 08:10:06,279 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3297580.0, ans=0.1 2023-11-26 08:10:09,464 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3297580.0, ans=0.1 2023-11-26 08:10:15,084 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 494650 2023-11-26 08:10:16,586 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=18.24 vs. limit=22.5 2023-11-26 08:10:28,308 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.653e+01 8.801e+01 9.353e+01 1.001e+02 1.567e+02, threshold=1.871e+02, percent-clipped=0.0 2023-11-26 08:10:29,596 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3297713.3333333335, ans=0.125 2023-11-26 08:10:31,069 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=10.35 vs. limit=15.0 2023-11-26 08:10:46,689 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3297846.6666666665, ans=0.125 2023-11-26 08:10:47,516 INFO [train_asr.py:1235] (3/4) Epoch 42, batch 1700, loss[loss=0.07475, simple_loss=0.1025, pruned_loss=0.01455, audio_tagging_loss=0.008934, over 16109.00 frames. ], tot_loss[loss=0.06638, simple_loss=0.08992, pruned_loss=0.01235, audio_tagging_loss=0.009071, over 3039140.69 frames. ], batch size: 60, lr: 1.62e-03, grad_scale: 32.0 2023-11-26 08:10:56,228 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=9.57 vs. 
limit=15.0 2023-11-26 08:10:56,732 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=3297846.6666666665, ans=0.0 2023-11-26 08:11:07,080 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=7.26 vs. limit=10.0 2023-11-26 08:11:10,996 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 494700 2023-11-26 08:11:25,717 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.whiten, num_groups=1, num_channels=512, metric=9.54 vs. limit=12.0 2023-11-26 08:11:28,334 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=20.84 vs. limit=22.5 2023-11-26 08:11:39,231 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=3298113.3333333335, ans=0.2 2023-11-26 08:11:43,219 INFO [train_asr.py:1235] (3/4) Epoch 42, batch 1750, loss[loss=0.06643, simple_loss=0.09275, pruned_loss=0.01075, audio_tagging_loss=0.009302, over 14660.00 frames. ], tot_loss[loss=0.06715, simple_loss=0.09137, pruned_loss=0.01251, audio_tagging_loss=0.008954, over 3041847.36 frames. ], batch size: 55, lr: 1.62e-03, grad_scale: 16.0 2023-11-26 08:11:48,138 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=3298180.0, ans=0.125 2023-11-26 08:11:51,590 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=10.73 vs. limit=15.0 2023-11-26 08:11:52,492 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3298180.0, ans=0.125 2023-11-26 08:12:02,370 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3298246.6666666665, ans=0.125 2023-11-26 08:12:06,564 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 494750 2023-11-26 08:12:15,752 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.min_abs, batch_count=3298380.0, ans=0.5 2023-11-26 08:12:15,834 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=3298380.0, ans=0.0 2023-11-26 08:12:21,373 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.536e+01 8.900e+01 9.496e+01 1.021e+02 1.422e+02, threshold=1.899e+02, percent-clipped=0.0 2023-11-26 08:12:23,813 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=3298380.0, ans=0.125 2023-11-26 08:12:30,829 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3298446.6666666665, ans=0.1 2023-11-26 08:12:30,859 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=3298446.6666666665, ans=0.0 2023-11-26 08:12:34,916 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3298446.6666666665, ans=0.125 2023-11-26 08:12:39,617 INFO [train_asr.py:1235] (3/4) Epoch 42, batch 1800, loss[loss=0.06516, simple_loss=0.08752, pruned_loss=0.01174, 
audio_tagging_loss=0.009659, over 15451.00 frames. ], tot_loss[loss=0.06669, simple_loss=0.09064, pruned_loss=0.01243, audio_tagging_loss=0.008938, over 3039964.60 frames. ], batch size: 57, lr: 1.62e-03, grad_scale: 16.0 2023-11-26 08:12:42,993 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3298513.3333333335, ans=0.125 2023-11-26 08:12:46,196 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=3298513.3333333335, ans=0.0 2023-11-26 08:12:47,380 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3298513.3333333335, ans=0.125 2023-11-26 08:12:56,312 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer_na.min_abs, batch_count=3298580.0, ans=0.02 2023-11-26 08:12:58,452 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=3298580.0, ans=0.125 2023-11-26 08:12:58,704 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=10.79 vs. limit=15.0 2023-11-26 08:13:01,527 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3298646.6666666665, ans=0.125 2023-11-26 08:13:03,006 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 494800 2023-11-26 08:13:29,134 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3298780.0, ans=0.125 2023-11-26 08:13:35,228 INFO [train_asr.py:1235] (3/4) Epoch 42, batch 1850, loss[loss=0.08115, simple_loss=0.1148, pruned_loss=0.01624, audio_tagging_loss=0.007507, over 16723.00 frames. ], tot_loss[loss=0.06657, simple_loss=0.09059, pruned_loss=0.01249, audio_tagging_loss=0.008786, over 3043366.91 frames. ], batch size: 59, lr: 1.62e-03, grad_scale: 8.0 2023-11-26 08:13:36,610 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=3298846.6666666665, ans=0.0 2023-11-26 08:13:41,146 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=3298846.6666666665, ans=0.125 2023-11-26 08:13:55,253 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=3298913.3333333335, ans=0.125 2023-11-26 08:13:59,437 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 494850 2023-11-26 08:14:14,831 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.613e+01 8.832e+01 9.434e+01 1.017e+02 1.223e+02, threshold=1.887e+02, percent-clipped=0.0 2023-11-26 08:14:17,487 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=7.01 vs. limit=10.0 2023-11-26 08:14:27,708 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=9.16 vs. limit=15.0 2023-11-26 08:14:31,937 INFO [train_asr.py:1235] (3/4) Epoch 42, batch 1900, loss[loss=0.07537, simple_loss=0.1074, pruned_loss=0.01407, audio_tagging_loss=0.007625, over 15459.00 frames. 
], tot_loss[loss=0.06566, simple_loss=0.08919, pruned_loss=0.01227, audio_tagging_loss=0.008799, over 3041953.58 frames. ], batch size: 58, lr: 1.62e-03, grad_scale: 8.0 2023-11-26 08:14:32,169 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=3299180.0, ans=0.0 2023-11-26 08:14:37,142 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.whiten, num_groups=1, num_channels=512, metric=9.71 vs. limit=12.0 2023-11-26 08:14:37,263 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=1.76 vs. limit=6.0 2023-11-26 08:14:38,059 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.15 vs. limit=15.0 2023-11-26 08:14:38,827 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3299180.0, ans=0.125 2023-11-26 08:14:48,150 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=9.59 vs. limit=15.0 2023-11-26 08:14:55,258 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 494900 2023-11-26 08:14:58,557 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=3299313.3333333335, ans=0.0 2023-11-26 08:15:03,829 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=3299380.0, ans=0.125 2023-11-26 08:15:27,389 INFO [train_asr.py:1235] (3/4) Epoch 42, batch 1950, loss[loss=0.07839, simple_loss=0.1047, pruned_loss=0.01741, audio_tagging_loss=0.008637, over 15448.00 frames. ], tot_loss[loss=0.06605, simple_loss=0.08986, pruned_loss=0.01239, audio_tagging_loss=0.008738, over 3045932.82 frames. ], batch size: 56, lr: 1.62e-03, grad_scale: 8.0 2023-11-26 08:15:51,032 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 494950 2023-11-26 08:15:57,487 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=7.93 vs. limit=10.0 2023-11-26 08:16:00,773 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3299713.3333333335, ans=0.125 2023-11-26 08:16:05,443 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=12.50 vs. limit=22.5 2023-11-26 08:16:06,896 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.282e+01 8.592e+01 9.452e+01 9.958e+01 1.219e+02, threshold=1.890e+02, percent-clipped=0.0 2023-11-26 08:16:09,240 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=3299713.3333333335, ans=0.125 2023-11-26 08:16:13,269 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.64 vs. limit=6.0 2023-11-26 08:16:23,457 INFO [train_asr.py:1235] (3/4) Epoch 42, batch 2000, loss[loss=0.05073, simple_loss=0.06972, pruned_loss=0.006141, audio_tagging_loss=0.009727, over 15550.00 frames. ], tot_loss[loss=0.06574, simple_loss=0.08943, pruned_loss=0.01231, audio_tagging_loss=0.008716, over 3041266.04 frames. 
], batch size: 59, lr: 1.62e-03, grad_scale: 16.0 2023-11-26 08:16:30,102 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=3299846.6666666665, ans=0.2 2023-11-26 08:16:32,316 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=3299846.6666666665, ans=0.125 2023-11-26 08:16:47,416 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 495000 2023-11-26 08:17:03,158 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=3300046.6666666665, ans=0.2 2023-11-26 08:17:19,893 INFO [train_asr.py:1235] (3/4) Epoch 42, batch 2050, loss[loss=0.05192, simple_loss=0.06752, pruned_loss=0.008156, audio_tagging_loss=0.01, over 14948.00 frames. ], tot_loss[loss=0.06559, simple_loss=0.08902, pruned_loss=0.01233, audio_tagging_loss=0.008746, over 3037101.79 frames. ], batch size: 55, lr: 1.62e-03, grad_scale: 16.0 2023-11-26 08:17:25,415 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3300180.0, ans=0.0 2023-11-26 08:17:37,462 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=13.97 vs. limit=15.0 2023-11-26 08:17:43,466 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 495050 2023-11-26 08:17:45,771 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=3300313.3333333335, ans=0.1 2023-11-26 08:17:58,313 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 6.946e+01 8.680e+01 9.276e+01 1.017e+02 1.208e+02, threshold=1.855e+02, percent-clipped=0.0 2023-11-26 08:17:58,559 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3300380.0, ans=0.125 2023-11-26 08:18:15,254 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=3300513.3333333335, ans=0.125 2023-11-26 08:18:16,043 INFO [train_asr.py:1235] (3/4) Epoch 42, batch 2100, loss[loss=0.05659, simple_loss=0.06824, pruned_loss=0.01044, audio_tagging_loss=0.01203, over 14699.00 frames. ], tot_loss[loss=0.06554, simple_loss=0.08897, pruned_loss=0.01239, audio_tagging_loss=0.008667, over 3032472.87 frames. ], batch size: 56, lr: 1.62e-03, grad_scale: 16.0 2023-11-26 08:18:17,284 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=3300513.3333333335, ans=0.125 2023-11-26 08:18:35,033 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3300580.0, ans=0.125 2023-11-26 08:18:39,012 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 495100 2023-11-26 08:19:03,686 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=3300780.0, ans=0.04949747468305833 2023-11-26 08:19:11,871 INFO [train_asr.py:1235] (3/4) Epoch 42, batch 2150, loss[loss=0.07668, simple_loss=0.1029, pruned_loss=0.0153, audio_tagging_loss=0.009922, over 14776.00 frames. ], tot_loss[loss=0.06612, simple_loss=0.0899, pruned_loss=0.01248, audio_tagging_loss=0.008692, over 3034652.17 frames. 
], batch size: 55, lr: 1.62e-03, grad_scale: 16.0 2023-11-26 08:19:35,792 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 495150 2023-11-26 08:19:44,897 WARNING [train_asr.py:1481] (3/4) Exclude cut with ID unbalanced/XkQ8YVd8u38_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-26 08:19:51,200 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.490e+01 8.871e+01 9.357e+01 1.023e+02 1.211e+02, threshold=1.871e+02, percent-clipped=0.0 2023-11-26 08:19:56,853 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=3301113.3333333335, ans=0.125 2023-11-26 08:19:57,177 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.whiten, num_groups=1, num_channels=512, metric=9.93 vs. limit=12.0 2023-11-26 08:19:59,073 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=3301113.3333333335, ans=0.125 2023-11-26 08:20:05,303 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=3301113.3333333335, ans=0.0 2023-11-26 08:20:07,193 INFO [train_asr.py:1235] (3/4) Epoch 42, batch 2200, loss[loss=0.07193, simple_loss=0.101, pruned_loss=0.01546, audio_tagging_loss=0.005977, over 15999.00 frames. ], tot_loss[loss=0.06571, simple_loss=0.08927, pruned_loss=0.01237, audio_tagging_loss=0.008709, over 3036809.42 frames. ], batch size: 56, lr: 1.62e-03, grad_scale: 16.0 2023-11-26 08:20:10,177 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=3301180.0, ans=0.2 2023-11-26 08:20:14,952 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00 2023-11-26 08:20:21,867 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=3301246.6666666665, ans=0.125 2023-11-26 08:20:25,455 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.1.whiten, num_groups=1, num_channels=192, metric=3.15 vs. limit=12.0 2023-11-26 08:20:31,803 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 495200 2023-11-26 08:20:49,445 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=3301380.0, ans=0.0 2023-11-26 08:20:54,546 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=6.31 vs. limit=15.0 2023-11-26 08:21:04,564 INFO [train_asr.py:1235] (3/4) Epoch 42, batch 2250, loss[loss=0.08147, simple_loss=0.1107, pruned_loss=0.01443, audio_tagging_loss=0.01168, over 15288.00 frames. ], tot_loss[loss=0.06558, simple_loss=0.0889, pruned_loss=0.0123, audio_tagging_loss=0.008825, over 3040223.12 frames. ], batch size: 57, lr: 1.61e-03, grad_scale: 16.0 2023-11-26 08:21:07,225 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=9.15 vs. 
limit=15.0 2023-11-26 08:21:23,525 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3301580.0, ans=0.0 2023-11-26 08:21:27,592 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 495250 2023-11-26 08:21:43,625 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.449e+01 8.823e+01 9.427e+01 1.035e+02 1.716e+02, threshold=1.885e+02, percent-clipped=0.0 2023-11-26 08:21:55,062 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=3301780.0, ans=0.125 2023-11-26 08:21:59,715 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=3.33 vs. limit=15.0 2023-11-26 08:22:00,177 INFO [train_asr.py:1235] (3/4) Epoch 42, batch 2300, loss[loss=0.06346, simple_loss=0.08907, pruned_loss=0.009809, audio_tagging_loss=0.009114, over 14951.00 frames. ], tot_loss[loss=0.06598, simple_loss=0.08943, pruned_loss=0.01239, audio_tagging_loss=0.008875, over 3034974.00 frames. ], batch size: 57, lr: 1.61e-03, grad_scale: 16.0 2023-11-26 08:22:03,586 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=3301846.6666666665, ans=0.125 2023-11-26 08:22:14,101 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=3301913.3333333335, ans=0.035 2023-11-26 08:22:20,061 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-11-26 08:22:23,225 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 495300 2023-11-26 08:22:31,905 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3301980.0, ans=0.125 2023-11-26 08:22:38,809 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=3302046.6666666665, ans=0.07 2023-11-26 08:22:48,096 WARNING [train_asr.py:1481] (3/4) Exclude cut with ID unbalanced/mx9RcUz8sr0_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-26 08:22:48,291 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3302113.3333333335, ans=0.125 2023-11-26 08:22:55,590 INFO [train_asr.py:1235] (3/4) Epoch 42, batch 2350, loss[loss=0.06256, simple_loss=0.08708, pruned_loss=0.01169, audio_tagging_loss=0.007329, over 15070.00 frames. ], tot_loss[loss=0.0666, simple_loss=0.09022, pruned_loss=0.01255, audio_tagging_loss=0.008943, over 3039293.32 frames. 
], batch size: 60, lr: 1.61e-03, grad_scale: 16.0 2023-11-26 08:23:01,799 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=3302180.0, ans=0.125 2023-11-26 08:23:20,227 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 495350 2023-11-26 08:23:20,544 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten2.whitening_limit, batch_count=3302313.3333333335, ans=15.0 2023-11-26 08:23:32,165 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=5.41 vs. limit=10.0 2023-11-26 08:23:34,832 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.436e+01 8.737e+01 9.480e+01 1.014e+02 1.457e+02, threshold=1.896e+02, percent-clipped=0.0 2023-11-26 08:23:35,106 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=3302380.0, ans=0.125 2023-11-26 08:23:39,733 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=11.50 vs. limit=15.0 2023-11-26 08:23:51,968 INFO [train_asr.py:1235] (3/4) Epoch 42, batch 2400, loss[loss=0.0882, simple_loss=0.122, pruned_loss=0.01868, audio_tagging_loss=0.008534, over 14688.00 frames. ], tot_loss[loss=0.06657, simple_loss=0.09017, pruned_loss=0.01243, audio_tagging_loss=0.009056, over 3041577.37 frames. ], batch size: 54, lr: 1.61e-03, grad_scale: 32.0 2023-11-26 08:24:01,790 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=3302513.3333333335, ans=0.125 2023-11-26 08:24:07,144 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3302580.0, ans=0.125 2023-11-26 08:24:14,916 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.whiten_keys.whitening_limit, batch_count=3302646.6666666665, ans=6.0 2023-11-26 08:24:15,471 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 495400 2023-11-26 08:24:34,195 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=3302713.3333333335, ans=0.2 2023-11-26 08:24:36,637 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=6.68 vs. limit=15.0 2023-11-26 08:24:43,072 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=3302780.0, ans=0.0 2023-11-26 08:24:48,739 INFO [train_asr.py:1235] (3/4) Epoch 42, batch 2450, loss[loss=0.05746, simple_loss=0.07833, pruned_loss=0.01012, audio_tagging_loss=0.008177, over 15702.00 frames. ], tot_loss[loss=0.06686, simple_loss=0.09057, pruned_loss=0.01256, audio_tagging_loss=0.009008, over 3041097.79 frames. ], batch size: 59, lr: 1.61e-03, grad_scale: 16.0 2023-11-26 08:24:53,743 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.1.whiten, num_groups=1, num_channels=192, metric=3.24 vs. 
limit=12.0 2023-11-26 08:25:01,659 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3302913.3333333335, ans=0.125 2023-11-26 08:25:11,602 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 495450 2023-11-26 08:25:28,951 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.831e+01 8.728e+01 9.406e+01 1.027e+02 1.574e+02, threshold=1.881e+02, percent-clipped=0.0 2023-11-26 08:25:29,303 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=3303046.6666666665, ans=0.2 2023-11-26 08:25:43,720 INFO [train_asr.py:1235] (3/4) Epoch 42, batch 2500, loss[loss=0.04653, simple_loss=0.05536, pruned_loss=0.006842, audio_tagging_loss=0.012, over 13501.00 frames. ], tot_loss[loss=0.06633, simple_loss=0.08987, pruned_loss=0.01232, audio_tagging_loss=0.009075, over 3035802.32 frames. ], batch size: 56, lr: 1.61e-03, grad_scale: 16.0 2023-11-26 08:25:53,021 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=3303180.0, ans=0.0 2023-11-26 08:26:05,264 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=3303313.3333333335, ans=0.125 2023-11-26 08:26:07,809 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 495500 2023-11-26 08:26:17,191 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=6.88 vs. limit=12.0 2023-11-26 08:26:39,656 INFO [train_asr.py:1235] (3/4) Epoch 42, batch 2550, loss[loss=0.07607, simple_loss=0.1046, pruned_loss=0.01621, audio_tagging_loss=0.007568, over 15560.00 frames. ], tot_loss[loss=0.06684, simple_loss=0.09065, pruned_loss=0.01254, audio_tagging_loss=0.008985, over 3033212.08 frames. ], batch size: 57, lr: 1.61e-03, grad_scale: 16.0 2023-11-26 08:26:40,453 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3303513.3333333335, ans=0.1 2023-11-26 08:26:48,343 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=13.24 vs. limit=22.5 2023-11-26 08:26:48,893 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=3303513.3333333335, ans=0.0 2023-11-26 08:27:03,204 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 495550 2023-11-26 08:27:19,460 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.178e+01 8.568e+01 9.109e+01 1.007e+02 1.472e+02, threshold=1.822e+02, percent-clipped=0.0 2023-11-26 08:27:35,778 INFO [train_asr.py:1235] (3/4) Epoch 42, batch 2600, loss[loss=0.06354, simple_loss=0.08682, pruned_loss=0.01037, audio_tagging_loss=0.009761, over 15256.00 frames. ], tot_loss[loss=0.06621, simple_loss=0.0899, pruned_loss=0.01241, audio_tagging_loss=0.008849, over 3038260.59 frames. ], batch size: 56, lr: 1.61e-03, grad_scale: 16.0 2023-11-26 08:27:45,649 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3303913.3333333335, ans=0.1 2023-11-26 08:27:48,025 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=10.74 vs. 
limit=15.0 2023-11-26 08:27:58,816 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 495600 2023-11-26 08:27:59,554 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=12.76 vs. limit=15.0 2023-11-26 08:28:11,541 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=3304046.6666666665, ans=0.125 2023-11-26 08:28:19,154 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=3304046.6666666665, ans=0.0 2023-11-26 08:28:23,341 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=3304113.3333333335, ans=0.0 2023-11-26 08:28:24,313 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.max_abs, batch_count=3304113.3333333335, ans=10.0 2023-11-26 08:28:28,606 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3304113.3333333335, ans=0.125 2023-11-26 08:28:29,673 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=3304113.3333333335, ans=0.0 2023-11-26 08:28:31,460 INFO [train_asr.py:1235] (3/4) Epoch 42, batch 2650, loss[loss=0.05676, simple_loss=0.07925, pruned_loss=0.008896, audio_tagging_loss=0.00824, over 15442.00 frames. ], tot_loss[loss=0.06608, simple_loss=0.08981, pruned_loss=0.01242, audio_tagging_loss=0.008759, over 3040294.49 frames. ], batch size: 58, lr: 1.61e-03, grad_scale: 16.0 2023-11-26 08:28:35,956 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=3304180.0, ans=0.125 2023-11-26 08:28:50,770 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3304246.6666666665, ans=0.125 2023-11-26 08:28:50,798 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=3304246.6666666665, ans=0.0 2023-11-26 08:28:54,936 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 495650 2023-11-26 08:28:58,794 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=3304313.3333333335, ans=0.0 2023-11-26 08:29:11,961 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 6.984e+01 8.718e+01 9.187e+01 9.929e+01 1.273e+02, threshold=1.837e+02, percent-clipped=0.0 2023-11-26 08:29:24,906 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=7.02 vs. limit=12.0 2023-11-26 08:29:27,434 INFO [train_asr.py:1235] (3/4) Epoch 42, batch 2700, loss[loss=0.07103, simple_loss=0.08761, pruned_loss=0.01708, audio_tagging_loss=0.01015, over 15287.00 frames. ], tot_loss[loss=0.06627, simple_loss=0.09021, pruned_loss=0.01249, audio_tagging_loss=0.008674, over 3039447.58 frames. ], batch size: 57, lr: 1.61e-03, grad_scale: 16.0 2023-11-26 08:29:49,179 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.97 vs. 
limit=6.0 2023-11-26 08:29:51,620 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 495700 2023-11-26 08:29:57,061 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=3304646.6666666665, ans=0.125 2023-11-26 08:29:59,044 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3304646.6666666665, ans=0.125 2023-11-26 08:30:23,787 INFO [train_asr.py:1235] (3/4) Epoch 42, batch 2750, loss[loss=0.05896, simple_loss=0.07213, pruned_loss=0.01022, audio_tagging_loss=0.01268, over 15179.00 frames. ], tot_loss[loss=0.06639, simple_loss=0.09023, pruned_loss=0.01258, audio_tagging_loss=0.008694, over 3035984.79 frames. ], batch size: 59, lr: 1.61e-03, grad_scale: 16.0 2023-11-26 08:30:43,612 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=3304913.3333333335, ans=0.0 2023-11-26 08:30:46,677 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 495750 2023-11-26 08:30:47,907 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3304980.0, ans=0.125 2023-11-26 08:31:02,537 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-26 08:31:03,383 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 6.804e+01 8.931e+01 9.557e+01 1.024e+02 1.484e+02, threshold=1.911e+02, percent-clipped=0.0 2023-11-26 08:31:10,402 WARNING [train_asr.py:1481] (3/4) Exclude cut with ID unbalanced/IMdT8_tuNp0_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-26 08:31:19,459 INFO [train_asr.py:1235] (3/4) Epoch 42, batch 2800, loss[loss=0.06086, simple_loss=0.0764, pruned_loss=0.01044, audio_tagging_loss=0.01222, over 15461.00 frames. ], tot_loss[loss=0.06568, simple_loss=0.08922, pruned_loss=0.01236, audio_tagging_loss=0.008714, over 3032977.33 frames. ], batch size: 59, lr: 1.61e-03, grad_scale: 32.0 2023-11-26 08:31:20,191 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.1.self_attn2.whiten, num_groups=1, num_channels=192, metric=9.47 vs. 
limit=22.5 2023-11-26 08:31:27,178 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=3305180.0, ans=0.0 2023-11-26 08:31:35,363 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3305246.6666666665, ans=0.125 2023-11-26 08:31:40,207 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3305246.6666666665, ans=0.1 2023-11-26 08:31:42,289 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.min_positive, batch_count=3305313.3333333335, ans=0.05 2023-11-26 08:31:43,135 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 495800 2023-11-26 08:32:02,690 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3305380.0, ans=0.1 2023-11-26 08:32:15,827 INFO [train_asr.py:1235] (3/4) Epoch 42, batch 2850, loss[loss=0.06225, simple_loss=0.0789, pruned_loss=0.01168, audio_tagging_loss=0.01112, over 16503.00 frames. ], tot_loss[loss=0.06543, simple_loss=0.08888, pruned_loss=0.01231, audio_tagging_loss=0.008674, over 3033868.27 frames. ], batch size: 64, lr: 1.61e-03, grad_scale: 32.0 2023-11-26 08:32:27,312 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3305580.0, ans=0.125 2023-11-26 08:32:39,506 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 495850 2023-11-26 08:32:53,076 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=3305713.3333333335, ans=0.125 2023-11-26 08:32:55,419 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=3.77 vs. limit=15.0 2023-11-26 08:32:55,964 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.651e+01 8.658e+01 9.306e+01 1.021e+02 1.225e+02, threshold=1.861e+02, percent-clipped=0.0 2023-11-26 08:32:56,705 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=9.43 vs. limit=15.0 2023-11-26 08:33:11,935 INFO [train_asr.py:1235] (3/4) Epoch 42, batch 2900, loss[loss=0.05792, simple_loss=0.07668, pruned_loss=0.01176, audio_tagging_loss=0.007818, over 15927.00 frames. ], tot_loss[loss=0.06575, simple_loss=0.08921, pruned_loss=0.01246, audio_tagging_loss=0.008691, over 3041414.52 frames. 
], batch size: 61, lr: 1.61e-03, grad_scale: 32.0 2023-11-26 08:33:13,196 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=3305846.6666666665, ans=0.125 2023-11-26 08:33:26,129 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=3305913.3333333335, ans=0.0 2023-11-26 08:33:28,146 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=3305913.3333333335, ans=0.07 2023-11-26 08:33:32,325 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=3305913.3333333335, ans=0.125 2023-11-26 08:33:35,380 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 495900 2023-11-26 08:33:38,721 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=3305980.0, ans=0.2 2023-11-26 08:33:48,290 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3306046.6666666665, ans=0.125 2023-11-26 08:33:56,880 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=3306113.3333333335, ans=0.125 2023-11-26 08:33:58,359 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=3306113.3333333335, ans=0.0 2023-11-26 08:33:58,575 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=9.30 vs. limit=15.0 2023-11-26 08:34:01,001 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=5.44 vs. limit=10.0 2023-11-26 08:34:07,737 INFO [train_asr.py:1235] (3/4) Epoch 42, batch 2950, loss[loss=0.06721, simple_loss=0.09319, pruned_loss=0.01177, audio_tagging_loss=0.008843, over 15617.00 frames. ], tot_loss[loss=0.06614, simple_loss=0.0897, pruned_loss=0.01253, audio_tagging_loss=0.008768, over 3043956.11 frames. ], batch size: 58, lr: 1.61e-03, grad_scale: 32.0 2023-11-26 08:34:07,936 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=3306180.0, ans=0.0 2023-11-26 08:34:19,468 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.74 vs. limit=6.0 2023-11-26 08:34:22,142 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=3306246.6666666665, ans=0.125 2023-11-26 08:34:31,570 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 495950 2023-11-26 08:34:33,144 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=8.37 vs. limit=10.0 2023-11-26 08:34:48,003 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.570e+01 8.833e+01 9.371e+01 1.025e+02 1.490e+02, threshold=1.874e+02, percent-clipped=0.0 2023-11-26 08:34:54,618 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=3.93 vs. 
limit=15.0 2023-11-26 08:35:03,567 INFO [train_asr.py:1235] (3/4) Epoch 42, batch 3000, loss[loss=0.0582, simple_loss=0.07634, pruned_loss=0.01076, audio_tagging_loss=0.009276, over 14833.00 frames. ], tot_loss[loss=0.06626, simple_loss=0.08973, pruned_loss=0.01256, audio_tagging_loss=0.008826, over 3041711.41 frames. ], batch size: 57, lr: 1.61e-03, grad_scale: 32.0 2023-11-26 08:35:03,568 INFO [train_asr.py:1258] (3/4) Computing validation loss 2023-11-26 08:35:36,311 INFO [train_asr.py:1267] (3/4) Epoch 42, validation: loss=0.05776, simple_loss=0.05062, pruned_loss=0.005203, audio_tagging_loss=0.02725, over 4681554.00 frames. 2023-11-26 08:35:36,312 INFO [train_asr.py:1268] (3/4) Maximum memory allocated so far is 24894MB 2023-11-26 08:35:44,764 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=3306513.3333333335, ans=0.125 2023-11-26 08:35:49,134 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=3306580.0, ans=0.0 2023-11-26 08:35:56,125 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=3306580.0, ans=0.0 2023-11-26 08:35:59,065 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 496000 2023-11-26 08:36:22,118 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=3306780.0, ans=0.0 2023-11-26 08:36:33,628 INFO [train_asr.py:1235] (3/4) Epoch 42, batch 3050, loss[loss=0.06079, simple_loss=0.08398, pruned_loss=0.009329, audio_tagging_loss=0.00947, over 15215.00 frames. ], tot_loss[loss=0.06644, simple_loss=0.08996, pruned_loss=0.0125, audio_tagging_loss=0.008959, over 3041148.66 frames. ], batch size: 59, lr: 1.61e-03, grad_scale: 32.0 2023-11-26 08:36:39,690 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=3306846.6666666665, ans=0.0 2023-11-26 08:36:45,015 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=3306913.3333333335, ans=0.125 2023-11-26 08:36:57,599 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 496050 2023-11-26 08:37:04,510 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=3306980.0, ans=0.2 2023-11-26 08:37:05,407 WARNING [train_asr.py:1481] (3/4) Exclude cut with ID unbalanced/h0neUGB6j_g_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-26 08:37:13,908 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.281e+01 8.651e+01 9.305e+01 1.008e+02 1.239e+02, threshold=1.861e+02, percent-clipped=0.0 2023-11-26 08:37:19,432 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=3307113.3333333335, ans=0.125 2023-11-26 08:37:29,292 INFO [train_asr.py:1235] (3/4) Epoch 42, batch 3100, loss[loss=0.07445, simple_loss=0.1038, pruned_loss=0.01271, audio_tagging_loss=0.009861, over 15215.00 frames. 
], tot_loss[loss=0.06638, simple_loss=0.08994, pruned_loss=0.01244, audio_tagging_loss=0.008971, over 3040542.27 frames. ], batch size: 58, lr: 1.61e-03, grad_scale: 32.0 2023-11-26 08:37:40,308 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=3307246.6666666665, ans=0.0 2023-11-26 08:37:45,140 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=3307246.6666666665, ans=0.0 2023-11-26 08:37:50,844 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=8.67 vs. limit=10.0 2023-11-26 08:37:53,478 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 496100 2023-11-26 08:38:16,971 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3307446.6666666665, ans=0.1 2023-11-26 08:38:18,509 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=6.39 vs. limit=10.0 2023-11-26 08:38:25,838 INFO [train_asr.py:1235] (3/4) Epoch 42, batch 3150, loss[loss=0.07328, simple_loss=0.09744, pruned_loss=0.01394, audio_tagging_loss=0.01063, over 14501.00 frames. ], tot_loss[loss=0.06697, simple_loss=0.09087, pruned_loss=0.01253, audio_tagging_loss=0.009, over 3047133.89 frames. ], batch size: 53, lr: 1.61e-03, grad_scale: 32.0 2023-11-26 08:38:48,599 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=3307646.6666666665, ans=0.0 2023-11-26 08:38:49,551 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 496150 2023-11-26 08:38:56,554 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3307646.6666666665, ans=0.125 2023-11-26 08:38:57,975 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=10.79 vs. limit=15.0 2023-11-26 08:39:06,519 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.692e+01 8.861e+01 9.326e+01 1.004e+02 1.383e+02, threshold=1.865e+02, percent-clipped=0.0 2023-11-26 08:39:17,017 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=3307780.0, ans=0.0 2023-11-26 08:39:17,063 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3307780.0, ans=0.1 2023-11-26 08:39:22,069 INFO [train_asr.py:1235] (3/4) Epoch 42, batch 3200, loss[loss=0.06641, simple_loss=0.08226, pruned_loss=0.01413, audio_tagging_loss=0.01116, over 14323.00 frames. ], tot_loss[loss=0.06745, simple_loss=0.09139, pruned_loss=0.01276, audio_tagging_loss=0.008996, over 3041028.84 frames. 
], batch size: 56, lr: 1.61e-03, grad_scale: 32.0 2023-11-26 08:39:41,531 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3307913.3333333335, ans=0.125 2023-11-26 08:39:45,630 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 496200 2023-11-26 08:39:51,105 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=3307980.0, ans=0.125 2023-11-26 08:39:54,786 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=3307980.0, ans=0.0 2023-11-26 08:39:58,013 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=3308046.6666666665, ans=0.125 2023-11-26 08:40:18,473 INFO [train_asr.py:1235] (3/4) Epoch 42, batch 3250, loss[loss=0.05811, simple_loss=0.0803, pruned_loss=0.008638, audio_tagging_loss=0.009325, over 14459.00 frames. ], tot_loss[loss=0.06727, simple_loss=0.09109, pruned_loss=0.01266, audio_tagging_loss=0.009064, over 3034353.03 frames. ], batch size: 53, lr: 1.61e-03, grad_scale: 32.0 2023-11-26 08:40:42,230 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 496250 2023-11-26 08:40:42,323 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=3308313.3333333335, ans=0.0 2023-11-26 08:40:58,073 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.019e+01 8.751e+01 9.386e+01 1.020e+02 1.370e+02, threshold=1.877e+02, percent-clipped=0.0 2023-11-26 08:40:58,393 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3308380.0, ans=0.125 2023-11-26 08:41:14,146 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=2.21 vs. limit=15.0 2023-11-26 08:41:14,548 INFO [train_asr.py:1235] (3/4) Epoch 42, batch 3300, loss[loss=0.07501, simple_loss=0.09484, pruned_loss=0.018, audio_tagging_loss=0.00959, over 15182.00 frames. ], tot_loss[loss=0.06697, simple_loss=0.09046, pruned_loss=0.01258, audio_tagging_loss=0.009153, over 3043347.71 frames. ], batch size: 57, lr: 1.61e-03, grad_scale: 32.0 2023-11-26 08:41:28,351 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=7.56 vs. limit=10.0 2023-11-26 08:41:35,516 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=3308646.6666666665, ans=0.07 2023-11-26 08:41:37,462 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 496300 2023-11-26 08:41:39,151 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=7.56 vs. 
limit=15.0 2023-11-26 08:41:53,294 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=3308713.3333333335, ans=0.0 2023-11-26 08:41:55,256 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=3308713.3333333335, ans=0.0 2023-11-26 08:42:04,458 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=3308780.0, ans=0.1 2023-11-26 08:42:10,607 INFO [train_asr.py:1235] (3/4) Epoch 42, batch 3350, loss[loss=0.04925, simple_loss=0.06242, pruned_loss=0.005481, audio_tagging_loss=0.01256, over 15012.00 frames. ], tot_loss[loss=0.06615, simple_loss=0.08933, pruned_loss=0.01242, audio_tagging_loss=0.009075, over 3040346.83 frames. ], batch size: 58, lr: 1.61e-03, grad_scale: 32.0 2023-11-26 08:42:33,885 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 496350 2023-11-26 08:42:38,759 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3308980.0, ans=0.125 2023-11-26 08:42:50,750 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.716e+01 8.810e+01 9.666e+01 1.064e+02 1.433e+02, threshold=1.933e+02, percent-clipped=0.0 2023-11-26 08:42:55,219 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=3309113.3333333335, ans=0.125 2023-11-26 08:42:58,328 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=3309113.3333333335, ans=0.125 2023-11-26 08:43:03,627 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.min_positive, batch_count=3309113.3333333335, ans=0.025 2023-11-26 08:43:05,516 INFO [train_asr.py:1235] (3/4) Epoch 42, batch 3400, loss[loss=0.06442, simple_loss=0.09576, pruned_loss=0.007942, audio_tagging_loss=0.008601, over 15994.00 frames. ], tot_loss[loss=0.06716, simple_loss=0.091, pruned_loss=0.01274, audio_tagging_loss=0.008918, over 3042011.01 frames. ], batch size: 58, lr: 1.61e-03, grad_scale: 32.0 2023-11-26 08:43:26,366 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.min_positive, batch_count=3309246.6666666665, ans=0.05 2023-11-26 08:43:29,288 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 496400 2023-11-26 08:43:29,379 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=3309313.3333333335, ans=0.0 2023-11-26 08:43:36,073 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=3309313.3333333335, ans=0.0 2023-11-26 08:43:39,434 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=3309380.0, ans=0.2 2023-11-26 08:43:48,644 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=3309446.6666666665, ans=0.125 2023-11-26 08:43:52,387 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=7.27 vs. limit=15.0 2023-11-26 08:43:53,893 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=7.92 vs. 
limit=15.0 2023-11-26 08:43:57,312 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3309446.6666666665, ans=0.125 2023-11-26 08:43:58,412 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=3309446.6666666665, ans=0.125 2023-11-26 08:44:01,845 INFO [train_asr.py:1235] (3/4) Epoch 42, batch 3450, loss[loss=0.06881, simple_loss=0.085, pruned_loss=0.01425, audio_tagging_loss=0.01206, over 14149.00 frames. ], tot_loss[loss=0.06665, simple_loss=0.0903, pruned_loss=0.01264, audio_tagging_loss=0.008855, over 3036233.09 frames. ], batch size: 53, lr: 1.61e-03, grad_scale: 32.0 2023-11-26 08:44:05,179 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3309513.3333333335, ans=0.125 2023-11-26 08:44:10,005 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=3309513.3333333335, ans=0.2 2023-11-26 08:44:23,406 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.1.conv_module2.whiten, num_groups=1, num_channels=192, metric=13.03 vs. limit=15.0 2023-11-26 08:44:23,941 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=3309646.6666666665, ans=0.125 2023-11-26 08:44:24,918 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 496450 2023-11-26 08:44:41,766 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.803e+01 8.832e+01 9.547e+01 1.006e+02 1.211e+02, threshold=1.909e+02, percent-clipped=0.0 2023-11-26 08:44:46,924 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.min_positive, batch_count=3309780.0, ans=0.05 2023-11-26 08:44:48,854 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3309780.0, ans=0.125 2023-11-26 08:44:55,810 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.min_positive, batch_count=3309780.0, ans=0.05 2023-11-26 08:44:57,775 INFO [train_asr.py:1235] (3/4) Epoch 42, batch 3500, loss[loss=0.06013, simple_loss=0.0825, pruned_loss=0.008636, audio_tagging_loss=0.01024, over 15604.00 frames. ], tot_loss[loss=0.06651, simple_loss=0.09025, pruned_loss=0.01265, audio_tagging_loss=0.008728, over 3037623.81 frames. ], batch size: 58, lr: 1.61e-03, grad_scale: 32.0 2023-11-26 08:45:05,426 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3309846.6666666665, ans=0.125 2023-11-26 08:45:10,746 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-26 08:45:20,663 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 496500 2023-11-26 08:45:25,968 WARNING [train_asr.py:1481] (3/4) Exclude cut with ID unbalanced/DdDpuDqOyrA_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. 
Number of tokens: 24 2023-11-26 08:45:36,407 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3310046.6666666665, ans=0.1 2023-11-26 08:45:53,132 INFO [train_asr.py:1235] (3/4) Epoch 42, batch 3550, loss[loss=0.07284, simple_loss=0.09538, pruned_loss=0.01632, audio_tagging_loss=0.008827, over 14855.00 frames. ], tot_loss[loss=0.06594, simple_loss=0.08957, pruned_loss=0.01245, audio_tagging_loss=0.008698, over 3044636.31 frames. ], batch size: 55, lr: 1.61e-03, grad_scale: 16.0 2023-11-26 08:45:54,702 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=7.59 vs. limit=15.0 2023-11-26 08:46:16,833 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 496550 2023-11-26 08:46:32,004 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3310380.0, ans=0.1 2023-11-26 08:46:33,836 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.065e+01 8.432e+01 9.183e+01 9.736e+01 1.809e+02, threshold=1.837e+02, percent-clipped=0.0 2023-11-26 08:46:38,865 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=12.59 vs. limit=22.5 2023-11-26 08:46:48,259 INFO [train_asr.py:1235] (3/4) Epoch 42, batch 3600, loss[loss=0.07351, simple_loss=0.09781, pruned_loss=0.01521, audio_tagging_loss=0.009394, over 14898.00 frames. ], tot_loss[loss=0.06594, simple_loss=0.08954, pruned_loss=0.01247, audio_tagging_loss=0.008693, over 3056617.78 frames. ], batch size: 55, lr: 1.61e-03, grad_scale: 32.0 2023-11-26 08:46:52,878 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.19 vs. limit=6.0 2023-11-26 08:47:07,013 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3310580.0, ans=0.125 2023-11-26 08:47:12,048 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 496600 2023-11-26 08:47:45,325 INFO [train_asr.py:1235] (3/4) Epoch 42, batch 3650, loss[loss=0.0705, simple_loss=0.0978, pruned_loss=0.01312, audio_tagging_loss=0.008481, over 15159.00 frames. ], tot_loss[loss=0.06624, simple_loss=0.09031, pruned_loss=0.01247, audio_tagging_loss=0.00862, over 3060595.08 frames. 
], batch size: 55, lr: 1.61e-03, grad_scale: 16.0 2023-11-26 08:47:47,589 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.min_positive, batch_count=3310846.6666666665, ans=0.05 2023-11-26 08:47:58,354 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-26 08:48:01,391 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=3310913.3333333335, ans=0.0 2023-11-26 08:48:08,420 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 496650 2023-11-26 08:48:08,630 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=3310980.0, ans=0.0 2023-11-26 08:48:16,466 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=3310980.0, ans=0.125 2023-11-26 08:48:19,668 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=3311046.6666666665, ans=0.04949747468305833 2023-11-26 08:48:29,082 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.052e+01 8.589e+01 9.068e+01 9.988e+01 1.098e+02, threshold=1.814e+02, percent-clipped=0.0 2023-11-26 08:48:33,619 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3311113.3333333335, ans=0.1 2023-11-26 08:48:40,913 INFO [train_asr.py:1235] (3/4) Epoch 42, batch 3700, loss[loss=0.06292, simple_loss=0.08987, pruned_loss=0.01194, audio_tagging_loss=0.00605, over 16280.00 frames. ], tot_loss[loss=0.06593, simple_loss=0.08988, pruned_loss=0.01234, audio_tagging_loss=0.00865, over 3059719.10 frames. 
], batch size: 61, lr: 1.61e-03, grad_scale: 8.0 2023-11-26 08:48:42,133 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3311180.0, ans=0.125 2023-11-26 08:48:47,472 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3311180.0, ans=0.125 2023-11-26 08:48:47,608 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3311180.0, ans=0.1 2023-11-26 08:48:56,355 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=3311246.6666666665, ans=0.2 2023-11-26 08:49:04,827 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 496700 2023-11-26 08:49:08,544 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3311313.3333333335, ans=0.0 2023-11-26 08:49:10,640 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=3311313.3333333335, ans=0.0 2023-11-26 08:49:19,158 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=3311380.0, ans=0.125 2023-11-26 08:49:24,471 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=3311446.6666666665, ans=0.125 2023-11-26 08:49:34,473 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3311446.6666666665, ans=0.125 2023-11-26 08:49:34,510 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=3311446.6666666665, ans=0.2 2023-11-26 08:49:36,479 INFO [train_asr.py:1235] (3/4) Epoch 42, batch 3750, loss[loss=0.07683, simple_loss=0.1006, pruned_loss=0.01485, audio_tagging_loss=0.01166, over 15983.00 frames. ], tot_loss[loss=0.0667, simple_loss=0.09089, pruned_loss=0.01262, audio_tagging_loss=0.008636, over 3059259.42 frames. ], batch size: 58, lr: 1.61e-03, grad_scale: 8.0 2023-11-26 08:49:44,046 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=2.77 vs. limit=15.0 2023-11-26 08:50:00,812 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 496750 2023-11-26 08:50:11,752 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=3311713.3333333335, ans=0.125 2023-11-26 08:50:13,655 WARNING [train_asr.py:1481] (3/4) Exclude cut with ID unbalanced/ZY_Bsi-RNuk_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. 
Number of tokens: 24 2023-11-26 08:50:14,959 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3311713.3333333335, ans=0.1 2023-11-26 08:50:20,473 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.645e+01 8.835e+01 9.456e+01 1.002e+02 1.375e+02, threshold=1.891e+02, percent-clipped=0.0 2023-11-26 08:50:24,436 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=3311780.0, ans=0.0 2023-11-26 08:50:33,738 INFO [train_asr.py:1235] (3/4) Epoch 42, batch 3800, loss[loss=0.06858, simple_loss=0.09625, pruned_loss=0.01153, audio_tagging_loss=0.008922, over 15346.00 frames. ], tot_loss[loss=0.06683, simple_loss=0.09119, pruned_loss=0.01261, audio_tagging_loss=0.008618, over 3059956.35 frames. ], batch size: 56, lr: 1.61e-03, grad_scale: 8.0 2023-11-26 08:50:33,886 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3311846.6666666665, ans=0.0 2023-11-26 08:50:41,516 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=3311846.6666666665, ans=0.125 2023-11-26 08:50:45,774 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=3311913.3333333335, ans=0.0 2023-11-26 08:50:56,194 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 496800 2023-11-26 08:51:06,122 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3312046.6666666665, ans=0.125 2023-11-26 08:51:29,054 INFO [train_asr.py:1235] (3/4) Epoch 42, batch 3850, loss[loss=0.07661, simple_loss=0.1095, pruned_loss=0.0122, audio_tagging_loss=0.009671, over 15171.00 frames. ], tot_loss[loss=0.06729, simple_loss=0.0916, pruned_loss=0.01281, audio_tagging_loss=0.008682, over 3054565.51 frames. ], batch size: 56, lr: 1.61e-03, grad_scale: 8.0 2023-11-26 08:51:30,402 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-26 08:51:36,796 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.min_abs, batch_count=3312180.0, ans=0.5 2023-11-26 08:51:52,584 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 496850 2023-11-26 08:51:56,310 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=3312313.3333333335, ans=0.125 2023-11-26 08:52:09,813 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3312380.0, ans=0.125 2023-11-26 08:52:10,884 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3312380.0, ans=0.125 2023-11-26 08:52:12,780 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.556e+01 8.753e+01 9.436e+01 1.032e+02 1.247e+02, threshold=1.887e+02, percent-clipped=0.0 2023-11-26 08:52:24,992 INFO [train_asr.py:1235] (3/4) Epoch 42, batch 3900, loss[loss=0.05909, simple_loss=0.08019, pruned_loss=0.0108, audio_tagging_loss=0.008194, over 15641.00 frames. ], tot_loss[loss=0.06736, simple_loss=0.09167, pruned_loss=0.0128, audio_tagging_loss=0.008723, over 3061134.80 frames. 
], batch size: 57, lr: 1.61e-03, grad_scale: 8.0 2023-11-26 08:52:32,253 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3312513.3333333335, ans=0.0 2023-11-26 08:52:32,699 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=8.35 vs. limit=15.0 2023-11-26 08:52:41,269 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3312580.0, ans=0.125 2023-11-26 08:52:44,769 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=3.11 vs. limit=15.0 2023-11-26 08:52:49,039 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 496900 2023-11-26 08:53:02,321 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3312713.3333333335, ans=0.1 2023-11-26 08:53:10,168 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.min_positive, batch_count=3312780.0, ans=0.05 2023-11-26 08:53:12,876 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.max_abs, batch_count=3312780.0, ans=10.0 2023-11-26 08:53:21,510 INFO [train_asr.py:1235] (3/4) Epoch 42, batch 3950, loss[loss=0.06336, simple_loss=0.08379, pruned_loss=0.01319, audio_tagging_loss=0.00828, over 16090.00 frames. ], tot_loss[loss=0.06746, simple_loss=0.09165, pruned_loss=0.01283, audio_tagging_loss=0.008806, over 3069201.95 frames. ], batch size: 60, lr: 1.61e-03, grad_scale: 8.0 2023-11-26 08:53:28,222 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=3312846.6666666665, ans=0.0 2023-11-26 08:53:33,091 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3312913.3333333335, ans=0.125 2023-11-26 08:53:38,364 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3312913.3333333335, ans=0.125 2023-11-26 08:53:39,460 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3312913.3333333335, ans=0.125 2023-11-26 08:53:44,603 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 496950 2023-11-26 08:54:05,139 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 6.800e+01 8.614e+01 9.558e+01 1.040e+02 1.255e+02, threshold=1.912e+02, percent-clipped=0.0 2023-11-26 08:54:12,929 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=3313113.3333333335, ans=0.0 2023-11-26 08:54:13,906 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3313113.3333333335, ans=0.125 2023-11-26 08:54:15,549 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=3313113.3333333335, ans=0.0 2023-11-26 08:54:17,368 INFO [train_asr.py:1235] (3/4) Epoch 42, batch 4000, loss[loss=0.04712, simple_loss=0.06657, pruned_loss=0.00654, audio_tagging_loss=0.007295, over 15296.00 frames. 
], tot_loss[loss=0.06742, simple_loss=0.09136, pruned_loss=0.01283, audio_tagging_loss=0.00891, over 3064139.21 frames. ], batch size: 58, lr: 1.61e-03, grad_scale: 16.0 2023-11-26 08:54:23,329 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=2.22 vs. limit=15.0 2023-11-26 08:54:41,090 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 497000 2023-11-26 08:55:04,740 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=3313446.6666666665, ans=0.0 2023-11-26 08:55:07,998 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=3313446.6666666665, ans=0.125 2023-11-26 08:55:13,073 INFO [train_asr.py:1235] (3/4) Epoch 42, batch 4050, loss[loss=0.08812, simple_loss=0.1199, pruned_loss=0.01962, audio_tagging_loss=0.008566, over 14501.00 frames. ], tot_loss[loss=0.06741, simple_loss=0.09113, pruned_loss=0.01284, audio_tagging_loss=0.009005, over 3065957.57 frames. ], batch size: 56, lr: 1.61e-03, grad_scale: 16.0 2023-11-26 08:55:14,672 WARNING [train_asr.py:1481] (3/4) Exclude cut with ID unbalanced/-7b0f9TyPFU_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-26 08:55:21,797 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3313513.3333333335, ans=0.125 2023-11-26 08:55:37,121 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 497050 2023-11-26 08:55:43,037 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3313646.6666666665, ans=0.1 2023-11-26 08:55:57,332 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.667e+01 8.799e+01 9.457e+01 1.021e+02 1.367e+02, threshold=1.891e+02, percent-clipped=0.0 2023-11-26 08:55:57,584 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=3313780.0, ans=0.125 2023-11-26 08:56:06,744 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=3313780.0, ans=0.125 2023-11-26 08:56:09,659 INFO [train_asr.py:1235] (3/4) Epoch 42, batch 4100, loss[loss=0.06736, simple_loss=0.08805, pruned_loss=0.01593, audio_tagging_loss=0.007408, over 14503.00 frames. ], tot_loss[loss=0.06744, simple_loss=0.09129, pruned_loss=0.01279, audio_tagging_loss=0.008999, over 3063943.40 frames. ], batch size: 57, lr: 1.61e-03, grad_scale: 16.0 2023-11-26 08:56:09,802 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=3313846.6666666665, ans=0.125 2023-11-26 08:56:31,694 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=18.50 vs. 
limit=22.5 2023-11-26 08:56:33,311 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 497100 2023-11-26 08:56:39,223 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=4.89 vs. limit=12.0 2023-11-26 08:56:43,329 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=3314046.6666666665, ans=0.015 2023-11-26 08:56:43,432 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.min_abs, batch_count=3314046.6666666665, ans=0.5 2023-11-26 08:56:49,277 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=3314046.6666666665, ans=0.0 2023-11-26 08:56:52,792 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=15.03 vs. limit=22.5 2023-11-26 08:57:01,661 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=3314113.3333333335, ans=0.035 2023-11-26 08:57:05,618 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.0.self_attn1.whiten, num_groups=1, num_channels=192, metric=12.03 vs. limit=22.5 2023-11-26 08:57:05,874 INFO [train_asr.py:1235] (3/4) Epoch 42, batch 4150, loss[loss=0.0786, simple_loss=0.1051, pruned_loss=0.01757, audio_tagging_loss=0.008459, over 16154.00 frames. ], tot_loss[loss=0.06716, simple_loss=0.09103, pruned_loss=0.01272, audio_tagging_loss=0.008925, over 3054250.72 frames. ], batch size: 58, lr: 1.61e-03, grad_scale: 16.0 2023-11-26 08:57:10,300 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=3314180.0, ans=0.0 2023-11-26 08:57:29,071 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=3314313.3333333335, ans=0.09899494936611666 2023-11-26 08:57:29,889 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 497150 2023-11-26 08:57:30,974 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=3314313.3333333335, ans=0.125 2023-11-26 08:57:37,945 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3314313.3333333335, ans=0.125 2023-11-26 08:57:45,171 WARNING [train_asr.py:1481] (3/4) Exclude cut with ID unbalanced/5BkClLNthIQ_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-26 08:57:49,402 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 6.533e+01 8.973e+01 9.473e+01 1.014e+02 1.383e+02, threshold=1.895e+02, percent-clipped=0.0 2023-11-26 08:57:52,123 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.whiten, num_groups=1, num_channels=512, metric=9.08 vs. limit=12.0 2023-11-26 08:58:01,789 INFO [train_asr.py:1235] (3/4) Epoch 42, batch 4200, loss[loss=0.06763, simple_loss=0.0917, pruned_loss=0.0137, audio_tagging_loss=0.008088, over 14811.00 frames. 
], tot_loss[loss=0.06717, simple_loss=0.09165, pruned_loss=0.01267, audio_tagging_loss=0.008671, over 3054377.22 frames. ], batch size: 55, lr: 1.61e-03, grad_scale: 16.0 2023-11-26 08:58:09,969 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3314513.3333333335, ans=0.125 2023-11-26 08:58:14,109 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=4.73 vs. limit=12.0 2023-11-26 08:58:16,844 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=3314580.0, ans=0.0 2023-11-26 08:58:20,363 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=3314580.0, ans=0.125 2023-11-26 08:58:25,587 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 497200 2023-11-26 08:58:36,629 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=3314713.3333333335, ans=0.0 2023-11-26 08:58:46,886 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=3314780.0, ans=0.07 2023-11-26 08:58:47,080 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=5.75 vs. limit=10.0 2023-11-26 08:58:58,285 INFO [train_asr.py:1235] (3/4) Epoch 42, batch 4250, loss[loss=0.05735, simple_loss=0.07691, pruned_loss=0.008764, audio_tagging_loss=0.01013, over 13890.00 frames. ], tot_loss[loss=0.06648, simple_loss=0.09078, pruned_loss=0.01245, audio_tagging_loss=0.008638, over 3052731.50 frames. ], batch size: 55, lr: 1.61e-03, grad_scale: 16.0 2023-11-26 08:59:10,603 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3314913.3333333335, ans=0.125 2023-11-26 08:59:11,770 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=3314913.3333333335, ans=0.125 2023-11-26 08:59:21,168 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 497250 2023-11-26 08:59:27,109 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.min_positive, batch_count=3314980.0, ans=0.05 2023-11-26 08:59:29,158 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=3314980.0, ans=0.0 2023-11-26 08:59:29,207 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3314980.0, ans=0.125 2023-11-26 08:59:37,736 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-26 08:59:41,845 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.249e+01 8.666e+01 9.281e+01 9.909e+01 1.116e+02, threshold=1.856e+02, percent-clipped=0.0 2023-11-26 08:59:54,038 INFO [train_asr.py:1235] (3/4) Epoch 42, batch 4300, loss[loss=0.06768, simple_loss=0.08953, pruned_loss=0.01233, audio_tagging_loss=0.01058, over 16010.00 frames. ], tot_loss[loss=0.06658, simple_loss=0.09088, pruned_loss=0.01247, audio_tagging_loss=0.008671, over 3046625.82 frames. 
], batch size: 62, lr: 1.61e-03, grad_scale: 16.0 2023-11-26 09:00:02,107 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=13.32 vs. limit=15.0 2023-11-26 09:00:17,468 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 497300 2023-11-26 09:00:33,605 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=3315380.0, ans=0.0 2023-11-26 09:00:38,176 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=512, metric=4.07 vs. limit=15.0 2023-11-26 09:00:41,965 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3315446.6666666665, ans=0.125 2023-11-26 09:00:49,343 INFO [train_asr.py:1235] (3/4) Epoch 42, batch 4350, loss[loss=0.06362, simple_loss=0.08028, pruned_loss=0.01334, audio_tagging_loss=0.01014, over 16458.00 frames. ], tot_loss[loss=0.06669, simple_loss=0.09113, pruned_loss=0.01254, audio_tagging_loss=0.008577, over 3052410.73 frames. ], batch size: 60, lr: 1.61e-03, grad_scale: 16.0 2023-11-26 09:01:05,040 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3315580.0, ans=0.125 2023-11-26 09:01:13,997 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 497350 2023-11-26 09:01:33,043 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.562e+01 8.733e+01 9.373e+01 9.862e+01 1.351e+02, threshold=1.875e+02, percent-clipped=0.0 2023-11-26 09:01:46,599 INFO [train_asr.py:1235] (3/4) Epoch 42, batch 4400, loss[loss=0.07135, simple_loss=0.09435, pruned_loss=0.01638, audio_tagging_loss=0.007794, over 16365.00 frames. ], tot_loss[loss=0.06704, simple_loss=0.09147, pruned_loss=0.01276, audio_tagging_loss=0.00854, over 3045402.64 frames. ], batch size: 61, lr: 1.61e-03, grad_scale: 32.0 2023-11-26 09:01:57,897 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=3315913.3333333335, ans=0.0 2023-11-26 09:02:09,308 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 497400 2023-11-26 09:02:23,650 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=3316046.6666666665, ans=0.125 2023-11-26 09:02:29,498 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=3316046.6666666665, ans=0.07 2023-11-26 09:02:36,571 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3316113.3333333335, ans=0.125 2023-11-26 09:02:42,681 INFO [train_asr.py:1235] (3/4) Epoch 42, batch 4450, loss[loss=0.06738, simple_loss=0.09505, pruned_loss=0.01227, audio_tagging_loss=0.007584, over 15254.00 frames. ], tot_loss[loss=0.06639, simple_loss=0.09062, pruned_loss=0.01261, audio_tagging_loss=0.008478, over 3036542.69 frames. ], batch size: 56, lr: 1.61e-03, grad_scale: 32.0 2023-11-26 09:02:54,617 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3316246.6666666665, ans=0.1 2023-11-26 09:02:54,813 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=5.35 vs. 
limit=15.0 2023-11-26 09:02:55,494 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=3316246.6666666665, ans=0.125 2023-11-26 09:03:06,114 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 497450 2023-11-26 09:03:08,592 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=3.50 vs. limit=15.0 2023-11-26 09:03:09,804 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=7.02 vs. limit=15.0 2023-11-26 09:03:24,255 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-11-26 09:03:26,126 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.847e+01 8.913e+01 9.793e+01 1.057e+02 1.326e+02, threshold=1.959e+02, percent-clipped=0.0 2023-11-26 09:03:26,303 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=3316446.6666666665, ans=0.0 2023-11-26 09:03:27,349 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-26 09:03:37,883 INFO [train_asr.py:1235] (3/4) Epoch 42, batch 4500, loss[loss=0.05983, simple_loss=0.08096, pruned_loss=0.01218, audio_tagging_loss=0.007173, over 14541.00 frames. ], tot_loss[loss=0.06647, simple_loss=0.09077, pruned_loss=0.01264, audio_tagging_loss=0.008452, over 3041579.40 frames. ], batch size: 54, lr: 1.61e-03, grad_scale: 16.0 2023-11-26 09:03:43,470 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=3316513.3333333335, ans=0.0 2023-11-26 09:03:51,967 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00 2023-11-26 09:04:02,417 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 497500 2023-11-26 09:04:07,865 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3316646.6666666665, ans=0.125 2023-11-26 09:04:22,855 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3316780.0, ans=0.125 2023-11-26 09:04:28,719 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3316780.0, ans=0.1 2023-11-26 09:04:34,321 INFO [train_asr.py:1235] (3/4) Epoch 42, batch 4550, loss[loss=0.05739, simple_loss=0.0764, pruned_loss=0.01072, audio_tagging_loss=0.00847, over 15552.00 frames. ], tot_loss[loss=0.06617, simple_loss=0.09038, pruned_loss=0.01249, audio_tagging_loss=0.008479, over 3044365.59 frames. ], batch size: 59, lr: 1.61e-03, grad_scale: 16.0 2023-11-26 09:04:36,634 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=18.22 vs. 
limit=22.5
2023-11-26 09:04:44,117 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=3316846.6666666665, ans=0.125
2023-11-26 09:04:53,852 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=3316913.3333333335, ans=0.2
2023-11-26 09:04:54,948 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3316913.3333333335, ans=0.0
2023-11-26 09:04:58,033 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 497550
2023-11-26 09:05:02,312 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3316980.0, ans=0.125
2023-11-26 09:05:05,908 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=10.11 vs. limit=15.0
2023-11-26 09:05:16,128 WARNING [train_asr.py:1481] (3/4) Exclude cut with ID unbalanced/_II2Klfnn4Y_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24
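
The WARNING above comes from the length filter applied to each cut before batching: with a subsampling factor of 4, this 100-frame (1 s) AudioSet clip shrinks to 23 encoder frames, fewer than the 24 BPE tokens of its dummy transcript, so the transducer loss cannot align it and the cut is dropped. A minimal sketch of such a filter, assuming a `keep_cut` helper and the usual ((T - 7) // 2 + 1) // 2 subsampling arithmetic (both are illustrative guesses, not the verbatim train_asr.py code):

import logging

def keep_cut(cut, sp) -> bool:
    # Frames the encoder sees after ~4x subsampling: 100 -> 23 here.
    T = ((cut.num_frames - 7) // 2 + 1) // 2
    tokens = sp.encode(cut.supervisions[0].text, out_type=str)
    if T < len(tokens):
        # Matches the shape of the WARNING above: the 1 s dummy-text cuts
        # tokenize to 24 BPE pieces but yield only 23 frames.
        logging.warning(f"Exclude cut with ID {cut.id} from training. ...")
        return False
    return True

# Applied to a lhotse CutSet, e.g.: train_cuts = train_cuts.filter(lambda c: keep_cut(c, sp))
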
2023-11-26 09:05:19,833 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 6.953e+01 8.805e+01 9.410e+01 9.881e+01 1.547e+02, threshold=1.882e+02, percent-clipped=0.0
2023-11-26 09:05:31,125 INFO [train_asr.py:1235] (3/4) Epoch 42, batch 4600, loss[loss=0.0652, simple_loss=0.0897, pruned_loss=0.01206, audio_tagging_loss=0.008297, over 15664.00 frames. ], tot_loss[loss=0.06624, simple_loss=0.0903, pruned_loss=0.01253, audio_tagging_loss=0.008554, over 3034706.54 frames. ], batch size: 59, lr: 1.61e-03, grad_scale: 16.0
2023-11-26 09:05:34,577 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=3317180.0, ans=0.2
2023-11-26 09:05:46,131 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3317246.6666666665, ans=0.125
2023-11-26 09:05:53,471 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 497600
2023-11-26 09:06:03,496 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.whiten, num_groups=1, num_channels=512, metric=8.71 vs. limit=12.0
2023-11-26 09:06:05,320 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2023-11-26 09:06:27,038 INFO [train_asr.py:1235] (3/4) Epoch 42, batch 4650, loss[loss=0.06711, simple_loss=0.07991, pruned_loss=0.01672, audio_tagging_loss=0.01044, over 14510.00 frames. ], tot_loss[loss=0.06642, simple_loss=0.09025, pruned_loss=0.01262, audio_tagging_loss=0.008675, over 3034141.93 frames. ], batch size: 56, lr: 1.61e-03, grad_scale: 16.0
2023-11-26 09:06:45,748 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3317580.0, ans=0.125
2023-11-26 09:06:50,796 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 497650
2023-11-26 09:07:02,296 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=3317713.3333333335, ans=0.2
2023-11-26 09:07:10,697 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=3317780.0, ans=0.0
2023-11-26 09:07:12,607 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.522e+01 8.826e+01 9.427e+01 1.038e+02 1.331e+02, threshold=1.885e+02, percent-clipped=0.0
2023-11-26 09:07:18,176 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=3317780.0, ans=0.0
2023-11-26 09:07:18,414 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=12.59 vs. limit=22.5
2023-11-26 09:07:22,786 INFO [train_asr.py:1235] (3/4) Epoch 42, batch 4700, loss[loss=0.06917, simple_loss=0.08941, pruned_loss=0.01256, audio_tagging_loss=0.01191, over 13994.00 frames. ], tot_loss[loss=0.06635, simple_loss=0.08997, pruned_loss=0.01258, audio_tagging_loss=0.00879, over 3036150.41 frames. ], batch size: 53, lr: 1.61e-03, grad_scale: 8.0
2023-11-26 09:07:25,144 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.min_abs, batch_count=3317846.6666666665, ans=0.5
2023-11-26 09:07:36,891 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=3317913.3333333335, ans=0.125
2023-11-26 09:07:46,147 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 497700
2023-11-26 09:08:13,443 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3318113.3333333335, ans=0.125
2023-11-26 09:08:18,964 INFO [train_asr.py:1235] (3/4) Epoch 42, batch 4750, loss[loss=0.06072, simple_loss=0.08567, pruned_loss=0.009348, audio_tagging_loss=0.00854, over 16215.00 frames. ], tot_loss[loss=0.06578, simple_loss=0.08899, pruned_loss=0.01226, audio_tagging_loss=0.009021, over 3033026.46 frames. ], batch size: 59, lr: 1.61e-03, grad_scale: 8.0
2023-11-26 09:08:41,268 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 497750
2023-11-26 09:08:49,693 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=7.40 vs. limit=10.0
2023-11-26 09:09:03,703 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.min_positive, batch_count=3318446.6666666665, ans=0.05
2023-11-26 09:09:04,459 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.453e+01 8.590e+01 9.271e+01 9.941e+01 1.309e+02, threshold=1.854e+02, percent-clipped=0.0
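
The [optim.py:476] lines track the distribution of recent gradient norms: the five numbers are the min/25%/median/75%/max, and in every entry of this excerpt the reported threshold is exactly Clipping_scale times the median (e.g. 2.0 x 9.271e+01 = 1.854e+02 in the entry just above), with percent-clipped counting how often a batch exceeded it. A sketch of that bookkeeping under those assumptions (the helper name `clipping_stats` is illustrative, not the ScaledAdam source):

import torch

def clipping_stats(recent_norms: torch.Tensor, clipping_scale: float = 2.0):
    # Quartiles of recent gradient norms, in the order the log prints them:
    # min, 25%, median, 75%, max.
    qs = torch.quantile(recent_norms, torch.tensor([0.0, 0.25, 0.5, 0.75, 1.0]))
    threshold = clipping_scale * qs[2]  # 2x the median norm
    percent_clipped = 100.0 * (recent_norms > threshold).float().mean()
    return qs, threshold, percent_clipped

# A gradient whose norm exceeds the threshold would be rescaled by
# threshold / norm; percent-clipped stays at 0.0 throughout this excerpt.
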
2023-11-26 09:09:06,226 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=9.78 vs. limit=15.0
2023-11-26 09:09:12,530 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=7.01 vs. limit=12.0
2023-11-26 09:09:14,028 INFO [train_asr.py:1235] (3/4) Epoch 42, batch 4800, loss[loss=0.07773, simple_loss=0.1127, pruned_loss=0.01279, audio_tagging_loss=0.008589, over 15565.00 frames. ], tot_loss[loss=0.06638, simple_loss=0.08983, pruned_loss=0.01241, audio_tagging_loss=0.009062, over 3039692.44 frames. ], batch size: 55, lr: 1.61e-03, grad_scale: 16.0
2023-11-26 09:09:18,886 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=10.84 vs. limit=15.0
2023-11-26 09:09:19,511 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=3318513.3333333335, ans=0.125
2023-11-26 09:09:24,819 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=3318580.0, ans=0.2
2023-11-26 09:09:36,795 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=10.40 vs. limit=15.0
2023-11-26 09:09:37,490 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 497800
2023-11-26 09:10:10,125 INFO [train_asr.py:1235] (3/4) Epoch 42, batch 4850, loss[loss=0.08108, simple_loss=0.1063, pruned_loss=0.01748, audio_tagging_loss=0.01045, over 15143.00 frames. ], tot_loss[loss=0.0669, simple_loss=0.09056, pruned_loss=0.01256, audio_tagging_loss=0.009057, over 3038202.40 frames. ], batch size: 57, lr: 1.61e-03, grad_scale: 16.0
2023-11-26 09:10:10,278 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3318846.6666666665, ans=0.125
2023-11-26 09:10:21,456 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2023-11-26 09:10:27,885 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=1.94 vs. limit=6.0
2023-11-26 09:10:33,035 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=3318980.0, ans=0.125
2023-11-26 09:10:33,985 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 497850
2023-11-26 09:10:38,314 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=3318980.0, ans=0.125
2023-11-26 09:10:39,411 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=3318980.0, ans=0.2
2023-11-26 09:10:45,048 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=7.77 vs. limit=15.0
2023-11-26 09:10:46,895 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=3319046.6666666665, ans=0.125
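
The ScheduledFloat entries from scaling.py:213 print the current value (ans=...) of hyperparameters that are scheduled piecewise-linearly in batch_count; by batch_count ~3.3M every schedule in this excerpt appears to have reached its final constant (prob=0.125, skip rates 0.0, scale_min 0.2, and so on). A rough sketch of such a schedule, with invented breakpoints since the log records only the current values:

class ScheduledFloatSketch:
    """Piecewise-linear hyperparameter value as a function of batch count."""
    def __init__(self, *points):
        # points: (batch_count, value) pairs, sorted by batch_count.
        self.points = points
        self.batch_count = 0.0
    def value(self) -> float:
        x, pts = self.batch_count, self.points
        if x <= pts[0][0]:
            return float(pts[0][1])
        if x >= pts[-1][0]:
            return float(pts[-1][1])  # the regime this excerpt is in
        for (x0, y0), (x1, y1) in zip(pts, pts[1:]):
            if x0 <= x <= x1:
                return y0 + (y1 - y0) * (x - x0) / (x1 - x0)

# e.g. skip_rate = ScheduledFloatSketch((0.0, 0.5), (20000.0, 0.025), (50000.0, 0.0))
# at batch_count 3318980.0 this returns 0.0, consistent with the ans=0.0 entries above.
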
2023-11-26 09:10:48,146 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=6.63 vs. limit=12.0
2023-11-26 09:10:55,693 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.411e+01 8.672e+01 9.359e+01 1.001e+02 1.200e+02, threshold=1.872e+02, percent-clipped=0.0
2023-11-26 09:11:06,513 INFO [train_asr.py:1235] (3/4) Epoch 42, batch 4900, loss[loss=0.05644, simple_loss=0.0883, pruned_loss=0.007563, audio_tagging_loss=0.004722, over 15173.00 frames. ], tot_loss[loss=0.06696, simple_loss=0.09087, pruned_loss=0.01258, audio_tagging_loss=0.008947, over 3042077.28 frames. ], batch size: 56, lr: 1.61e-03, grad_scale: 16.0
2023-11-26 09:11:28,844 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 497900
2023-11-26 09:11:40,127 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=3319380.0, ans=0.0
2023-11-26 09:11:56,612 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=3319446.6666666665, ans=0.0
2023-11-26 09:11:58,737 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=3319446.6666666665, ans=0.1
2023-11-26 09:12:01,620 INFO [train_asr.py:1235] (3/4) Epoch 42, batch 4950, loss[loss=0.07615, simple_loss=0.1039, pruned_loss=0.01669, audio_tagging_loss=0.007496, over 15612.00 frames. ], tot_loss[loss=0.06655, simple_loss=0.09033, pruned_loss=0.0125, audio_tagging_loss=0.008884, over 3036847.00 frames. ], batch size: 57, lr: 1.61e-03, grad_scale: 16.0
2023-11-26 09:12:18,359 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=3319580.0, ans=0.2
2023-11-26 09:12:24,054 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=3319646.6666666665, ans=0.0
2023-11-26 09:12:24,961 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 497950
2023-11-26 09:12:46,994 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.442e+01 8.822e+01 9.528e+01 1.003e+02 1.233e+02, threshold=1.906e+02, percent-clipped=0.0
2023-11-26 09:12:55,957 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3319846.6666666665, ans=0.1
2023-11-26 09:12:56,701 INFO [train_asr.py:1235] (3/4) Epoch 42, batch 5000, loss[loss=0.07911, simple_loss=0.112, pruned_loss=0.01613, audio_tagging_loss=0.00699, over 14781.00 frames. ], tot_loss[loss=0.06626, simple_loss=0.09023, pruned_loss=0.01243, audio_tagging_loss=0.008712, over 3038484.34 frames. ], batch size: 53, lr: 1.61e-03, grad_scale: 16.0
2023-11-26 09:13:21,447 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 498000
2023-11-26 09:13:25,212 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=11.89 vs.
limit=15.0 2023-11-26 09:13:34,741 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=3320046.6666666665, ans=0.125 2023-11-26 09:13:36,868 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=3320046.6666666665, ans=0.0 2023-11-26 09:13:43,764 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3320113.3333333335, ans=0.1 2023-11-26 09:13:52,837 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=3320180.0, ans=0.2 2023-11-26 09:13:53,748 INFO [train_asr.py:1235] (3/4) Epoch 42, batch 5050, loss[loss=0.0656, simple_loss=0.09669, pruned_loss=0.01078, audio_tagging_loss=0.006478, over 16575.00 frames. ], tot_loss[loss=0.06685, simple_loss=0.0912, pruned_loss=0.01259, audio_tagging_loss=0.008664, over 3039399.48 frames. ], batch size: 58, lr: 1.61e-03, grad_scale: 16.0 2023-11-26 09:14:05,726 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=3320246.6666666665, ans=0.2 2023-11-26 09:14:17,105 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 498050 2023-11-26 09:14:39,864 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 6.754e+01 8.596e+01 9.338e+01 9.985e+01 1.178e+02, threshold=1.868e+02, percent-clipped=0.0 2023-11-26 09:14:45,650 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=6.55 vs. limit=15.0 2023-11-26 09:14:49,961 INFO [train_asr.py:1235] (3/4) Epoch 42, batch 5100, loss[loss=0.06568, simple_loss=0.08559, pruned_loss=0.01158, audio_tagging_loss=0.01131, over 15246.00 frames. ], tot_loss[loss=0.06595, simple_loss=0.08985, pruned_loss=0.01239, audio_tagging_loss=0.008634, over 3040351.19 frames. ], batch size: 56, lr: 1.61e-03, grad_scale: 16.0 2023-11-26 09:15:03,006 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=7.12 vs. limit=12.0 2023-11-26 09:15:13,379 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 498100 2023-11-26 09:15:33,957 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=1.85 vs. limit=6.0 2023-11-26 09:15:41,280 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=3320780.0, ans=0.125 2023-11-26 09:15:44,437 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=3320846.6666666665, ans=0.035 2023-11-26 09:15:45,339 INFO [train_asr.py:1235] (3/4) Epoch 42, batch 5150, loss[loss=0.05603, simple_loss=0.07828, pruned_loss=0.008553, audio_tagging_loss=0.008337, over 15665.00 frames. ], tot_loss[loss=0.06558, simple_loss=0.08966, pruned_loss=0.01215, audio_tagging_loss=0.008594, over 3045974.04 frames. 
], batch size: 57, lr: 1.61e-03, grad_scale: 16.0 2023-11-26 09:16:00,461 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.min_positive, batch_count=3320913.3333333335, ans=0.025 2023-11-26 09:16:09,104 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=9.34 vs. limit=15.0 2023-11-26 09:16:09,432 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 498150 2023-11-26 09:16:31,746 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 6.763e+01 8.833e+01 9.269e+01 1.033e+02 1.245e+02, threshold=1.854e+02, percent-clipped=0.0 2023-11-26 09:16:35,162 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3321113.3333333335, ans=0.125 2023-11-26 09:16:41,827 INFO [train_asr.py:1235] (3/4) Epoch 42, batch 5200, loss[loss=0.0825, simple_loss=0.107, pruned_loss=0.02171, audio_tagging_loss=0.007313, over 15061.00 frames. ], tot_loss[loss=0.06521, simple_loss=0.08923, pruned_loss=0.01201, audio_tagging_loss=0.00858, over 3050252.02 frames. ], batch size: 57, lr: 1.61e-03, grad_scale: 32.0 2023-11-26 09:16:41,982 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=3321180.0, ans=0.2 2023-11-26 09:16:54,259 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=3321246.6666666665, ans=0.0 2023-11-26 09:17:02,126 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=3321246.6666666665, ans=0.0 2023-11-26 09:17:05,253 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 498200 2023-11-26 09:17:12,164 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=3321313.3333333335, ans=0.0 2023-11-26 09:17:12,527 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=19.05 vs. limit=22.5 2023-11-26 09:17:31,055 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=6.86 vs. limit=15.0 2023-11-26 09:17:33,690 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=3321446.6666666665, ans=0.125 2023-11-26 09:17:34,722 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=3321446.6666666665, ans=0.125 2023-11-26 09:17:35,911 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3321446.6666666665, ans=0.0 2023-11-26 09:17:37,760 INFO [train_asr.py:1235] (3/4) Epoch 42, batch 5250, loss[loss=0.04804, simple_loss=0.05829, pruned_loss=0.008076, audio_tagging_loss=0.01081, over 15934.00 frames. ], tot_loss[loss=0.06547, simple_loss=0.08963, pruned_loss=0.01212, audio_tagging_loss=0.00854, over 3052884.88 frames. 
], batch size: 61, lr: 1.61e-03, grad_scale: 32.0 2023-11-26 09:17:41,120 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3321513.3333333335, ans=0.125 2023-11-26 09:18:01,138 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 498250 2023-11-26 09:18:10,240 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=3321713.3333333335, ans=0.2 2023-11-26 09:18:11,834 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=3321713.3333333335, ans=0.125 2023-11-26 09:18:13,310 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=8.54 vs. limit=10.0 2023-11-26 09:18:20,437 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=3321713.3333333335, ans=0.2 2023-11-26 09:18:23,386 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 6.448e+01 8.788e+01 9.359e+01 1.015e+02 1.795e+02, threshold=1.872e+02, percent-clipped=0.0 2023-11-26 09:18:31,836 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3321780.0, ans=0.125 2023-11-26 09:18:33,634 INFO [train_asr.py:1235] (3/4) Epoch 42, batch 5300, loss[loss=0.07247, simple_loss=0.09646, pruned_loss=0.01415, audio_tagging_loss=0.01009, over 14562.00 frames. ], tot_loss[loss=0.06557, simple_loss=0.09002, pruned_loss=0.01207, audio_tagging_loss=0.00849, over 3052189.13 frames. ], batch size: 54, lr: 1.61e-03, grad_scale: 32.0 2023-11-26 09:18:33,762 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=3321846.6666666665, ans=0.0 2023-11-26 09:18:56,567 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=3321980.0, ans=0.0 2023-11-26 09:18:57,461 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 498300 2023-11-26 09:18:57,625 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=3321980.0, ans=0.0 2023-11-26 09:19:02,162 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=10.15 vs. limit=15.0 2023-11-26 09:19:02,906 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3321980.0, ans=0.125 2023-11-26 09:19:05,407 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=512, metric=3.02 vs. limit=15.0 2023-11-26 09:19:24,344 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3322113.3333333335, ans=0.125 2023-11-26 09:19:29,819 INFO [train_asr.py:1235] (3/4) Epoch 42, batch 5350, loss[loss=0.0558, simple_loss=0.0731, pruned_loss=0.006962, audio_tagging_loss=0.01228, over 16015.00 frames. ], tot_loss[loss=0.06585, simple_loss=0.09008, pruned_loss=0.01222, audio_tagging_loss=0.008591, over 3050947.10 frames. 
], batch size: 57, lr: 1.61e-03, grad_scale: 32.0 2023-11-26 09:19:30,448 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=10.29 vs. limit=15.0 2023-11-26 09:19:32,279 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=5.66 vs. limit=12.0 2023-11-26 09:19:33,210 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3322180.0, ans=0.125 2023-11-26 09:19:34,533 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=17.29 vs. limit=22.5 2023-11-26 09:19:48,495 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=3322246.6666666665, ans=0.0 2023-11-26 09:19:52,673 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 498350 2023-11-26 09:20:01,829 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.min_positive, batch_count=3322380.0, ans=0.05 2023-11-26 09:20:16,475 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.299e+01 8.636e+01 9.507e+01 1.021e+02 1.196e+02, threshold=1.901e+02, percent-clipped=0.0 2023-11-26 09:20:16,863 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3322446.6666666665, ans=0.1 2023-11-26 09:20:24,858 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3322513.3333333335, ans=0.125 2023-11-26 09:20:25,607 INFO [train_asr.py:1235] (3/4) Epoch 42, batch 5400, loss[loss=0.05317, simple_loss=0.07376, pruned_loss=0.007963, audio_tagging_loss=0.008323, over 15792.00 frames. ], tot_loss[loss=0.06551, simple_loss=0.08951, pruned_loss=0.01218, audio_tagging_loss=0.008571, over 3046643.23 frames. ], batch size: 59, lr: 1.61e-03, grad_scale: 16.0 2023-11-26 09:20:30,176 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=3322513.3333333335, ans=0.125 2023-11-26 09:20:48,972 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 498400 2023-11-26 09:20:50,130 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3322646.6666666665, ans=0.125 2023-11-26 09:20:59,401 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3322713.3333333335, ans=0.125 2023-11-26 09:21:04,259 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=3322713.3333333335, ans=0.07 2023-11-26 09:21:20,961 INFO [train_asr.py:1235] (3/4) Epoch 42, batch 5450, loss[loss=0.07453, simple_loss=0.1005, pruned_loss=0.01417, audio_tagging_loss=0.01012, over 14433.00 frames. ], tot_loss[loss=0.06629, simple_loss=0.09041, pruned_loss=0.01236, audio_tagging_loss=0.008725, over 3050696.19 frames. 
], batch size: 52, lr: 1.61e-03, grad_scale: 16.0 2023-11-26 09:21:22,774 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3322846.6666666665, ans=0.125 2023-11-26 09:21:40,273 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten.whitening_limit, batch_count=3322913.3333333335, ans=22.5 2023-11-26 09:21:45,584 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 498450 2023-11-26 09:21:53,204 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=3322980.0, ans=0.125 2023-11-26 09:22:08,290 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.192e+01 8.615e+01 9.390e+01 1.017e+02 1.312e+02, threshold=1.878e+02, percent-clipped=0.0 2023-11-26 09:22:13,230 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3323113.3333333335, ans=0.125 2023-11-26 09:22:17,208 INFO [train_asr.py:1235] (3/4) Epoch 42, batch 5500, loss[loss=0.07115, simple_loss=0.08987, pruned_loss=0.0168, audio_tagging_loss=0.00941, over 14506.00 frames. ], tot_loss[loss=0.06686, simple_loss=0.09095, pruned_loss=0.01262, audio_tagging_loss=0.008761, over 3046407.45 frames. ], batch size: 55, lr: 1.61e-03, grad_scale: 16.0 2023-11-26 09:22:40,718 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 498500 2023-11-26 09:23:02,554 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3323446.6666666665, ans=0.125 2023-11-26 09:23:04,826 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=3323446.6666666665, ans=0.2 2023-11-26 09:23:13,563 INFO [train_asr.py:1235] (3/4) Epoch 42, batch 5550, loss[loss=0.0758, simple_loss=0.111, pruned_loss=0.01238, audio_tagging_loss=0.007894, over 15554.00 frames. ], tot_loss[loss=0.06697, simple_loss=0.09113, pruned_loss=0.01256, audio_tagging_loss=0.008841, over 3055295.17 frames. 
], batch size: 58, lr: 1.61e-03, grad_scale: 16.0 2023-11-26 09:23:20,114 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=3323513.3333333335, ans=0.0 2023-11-26 09:23:31,897 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=3323580.0, ans=0.0 2023-11-26 09:23:36,483 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 498550 2023-11-26 09:23:39,286 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=3323646.6666666665, ans=0.0 2023-11-26 09:23:52,509 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3323713.3333333335, ans=0.125 2023-11-26 09:23:52,528 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3323713.3333333335, ans=0.125 2023-11-26 09:24:00,244 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.769e+01 8.665e+01 9.160e+01 9.875e+01 1.167e+02, threshold=1.832e+02, percent-clipped=0.0 2023-11-26 09:24:07,914 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.max_abs, batch_count=3323846.6666666665, ans=10.0 2023-11-26 09:24:08,702 INFO [train_asr.py:1235] (3/4) Epoch 42, batch 5600, loss[loss=0.06828, simple_loss=0.08912, pruned_loss=0.01446, audio_tagging_loss=0.009261, over 16621.00 frames. ], tot_loss[loss=0.06734, simple_loss=0.09165, pruned_loss=0.01256, audio_tagging_loss=0.008954, over 3058502.55 frames. ], batch size: 61, lr: 1.61e-03, grad_scale: 32.0 2023-11-26 09:24:32,121 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 498600 2023-11-26 09:24:47,737 WARNING [train_asr.py:1481] (3/4) Exclude cut with ID unbalanced/ze0LsBtoDm0_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-26 09:24:49,987 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3324046.6666666665, ans=0.1 2023-11-26 09:24:55,285 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=3324113.3333333335, ans=0.125 2023-11-26 09:25:04,652 INFO [train_asr.py:1235] (3/4) Epoch 42, batch 5650, loss[loss=0.05724, simple_loss=0.07971, pruned_loss=0.008369, audio_tagging_loss=0.009012, over 15408.00 frames. ], tot_loss[loss=0.06679, simple_loss=0.09077, pruned_loss=0.01237, audio_tagging_loss=0.009036, over 3061688.26 frames. ], batch size: 58, lr: 1.61e-03, grad_scale: 32.0 2023-11-26 09:25:09,558 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3324180.0, ans=0.125 2023-11-26 09:25:15,459 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=14.56 vs. 
limit=22.5 2023-11-26 09:25:21,042 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=12.84 vs. limit=15.0 2023-11-26 09:25:27,975 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 498650 2023-11-26 09:25:40,695 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=3324380.0, ans=0.0 2023-11-26 09:25:51,532 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.459e+01 8.711e+01 9.394e+01 1.016e+02 1.261e+02, threshold=1.879e+02, percent-clipped=0.0 2023-11-26 09:25:58,112 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3324446.6666666665, ans=0.1 2023-11-26 09:26:00,615 INFO [train_asr.py:1235] (3/4) Epoch 42, batch 5700, loss[loss=0.07063, simple_loss=0.1021, pruned_loss=0.01172, audio_tagging_loss=0.007845, over 15920.00 frames. ], tot_loss[loss=0.06711, simple_loss=0.0913, pruned_loss=0.01254, audio_tagging_loss=0.008925, over 3061005.70 frames. ], batch size: 58, lr: 1.61e-03, grad_scale: 32.0 2023-11-26 09:26:18,758 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=3324580.0, ans=0.0 2023-11-26 09:26:18,860 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=3324580.0, ans=0.0 2023-11-26 09:26:22,927 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 498700 2023-11-26 09:26:33,078 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3324713.3333333335, ans=0.1 2023-11-26 09:26:55,476 INFO [train_asr.py:1235] (3/4) Epoch 42, batch 5750, loss[loss=0.07587, simple_loss=0.09305, pruned_loss=0.01923, audio_tagging_loss=0.01011, over 13725.00 frames. ], tot_loss[loss=0.06706, simple_loss=0.09127, pruned_loss=0.01258, audio_tagging_loss=0.008847, over 3056174.96 frames. ], batch size: 56, lr: 1.61e-03, grad_scale: 16.0 2023-11-26 09:27:19,374 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 498750 2023-11-26 09:27:43,275 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.383e+01 8.688e+01 9.672e+01 1.037e+02 1.412e+02, threshold=1.934e+02, percent-clipped=0.0 2023-11-26 09:27:51,156 INFO [train_asr.py:1235] (3/4) Epoch 42, batch 5800, loss[loss=0.08165, simple_loss=0.117, pruned_loss=0.01544, audio_tagging_loss=0.007733, over 15582.00 frames. ], tot_loss[loss=0.06724, simple_loss=0.09166, pruned_loss=0.01266, audio_tagging_loss=0.008745, over 3046355.26 frames. ], batch size: 56, lr: 1.61e-03, grad_scale: 16.0 2023-11-26 09:27:57,121 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3325180.0, ans=0.125 2023-11-26 09:28:10,889 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=3325246.6666666665, ans=0.2 2023-11-26 09:28:11,980 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=3325246.6666666665, ans=0.0 2023-11-26 09:28:14,448 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=15.45 vs. 
limit=22.5 2023-11-26 09:28:15,021 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 498800 2023-11-26 09:28:21,648 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-26 09:28:24,925 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=3325380.0, ans=0.07 2023-11-26 09:28:46,782 INFO [train_asr.py:1235] (3/4) Epoch 42, batch 5850, loss[loss=0.05887, simple_loss=0.07544, pruned_loss=0.01035, audio_tagging_loss=0.01079, over 14753.00 frames. ], tot_loss[loss=0.06705, simple_loss=0.09121, pruned_loss=0.01275, audio_tagging_loss=0.008688, over 3048920.37 frames. ], batch size: 58, lr: 1.61e-03, grad_scale: 16.0 2023-11-26 09:28:53,251 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=2.88 vs. limit=15.0 2023-11-26 09:29:04,460 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.min_positive, batch_count=3325580.0, ans=0.05 2023-11-26 09:29:09,680 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 498850 2023-11-26 09:29:10,995 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=3325646.6666666665, ans=0.0 2023-11-26 09:29:17,113 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=512, metric=4.08 vs. limit=15.0 2023-11-26 09:29:31,026 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3325780.0, ans=0.1 2023-11-26 09:29:32,103 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3325780.0, ans=0.125 2023-11-26 09:29:35,030 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.507e+01 8.712e+01 9.460e+01 1.009e+02 5.552e+02, threshold=1.892e+02, percent-clipped=1.0 2023-11-26 09:29:42,429 INFO [train_asr.py:1235] (3/4) Epoch 42, batch 5900, loss[loss=0.079, simple_loss=0.1149, pruned_loss=0.0132, audio_tagging_loss=0.008355, over 15280.00 frames. ], tot_loss[loss=0.06738, simple_loss=0.09169, pruned_loss=0.01292, audio_tagging_loss=0.008611, over 3042049.56 frames. ], batch size: 56, lr: 1.61e-03, grad_scale: 16.0 2023-11-26 09:29:42,695 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=3325846.6666666665, ans=0.125 2023-11-26 09:29:56,999 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=3325913.3333333335, ans=0.09899494936611666 2023-11-26 09:29:59,119 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3325913.3333333335, ans=0.125 2023-11-26 09:30:00,189 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=3325913.3333333335, ans=0.125 2023-11-26 09:30:05,949 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 498900 2023-11-26 09:30:06,460 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=6.88 vs. 
limit=10.0 2023-11-26 09:30:07,085 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=3325980.0, ans=0.2 2023-11-26 09:30:08,749 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-26 09:30:09,814 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=3325980.0, ans=0.2 2023-11-26 09:30:15,651 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=3326046.6666666665, ans=0.04949747468305833 2023-11-26 09:30:23,089 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=3326046.6666666665, ans=0.0 2023-11-26 09:30:34,589 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=3326113.3333333335, ans=0.125 2023-11-26 09:30:37,539 INFO [train_asr.py:1235] (3/4) Epoch 42, batch 5950, loss[loss=0.07442, simple_loss=0.105, pruned_loss=0.01501, audio_tagging_loss=0.006922, over 16009.00 frames. ], tot_loss[loss=0.06663, simple_loss=0.09057, pruned_loss=0.01269, audio_tagging_loss=0.008656, over 3040080.53 frames. ], batch size: 62, lr: 1.61e-03, grad_scale: 16.0 2023-11-26 09:30:49,504 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3326246.6666666665, ans=0.125 2023-11-26 09:31:01,972 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 498950 2023-11-26 09:31:07,324 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=3326313.3333333335, ans=0.07 2023-11-26 09:31:08,454 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=3326313.3333333335, ans=0.125 2023-11-26 09:31:12,051 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=12.66 vs. limit=15.0 2023-11-26 09:31:12,781 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=3326380.0, ans=0.0 2023-11-26 09:31:20,312 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-11-26 09:31:25,837 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.496e+01 8.795e+01 9.324e+01 1.011e+02 1.404e+02, threshold=1.865e+02, percent-clipped=0.0 2023-11-26 09:31:33,848 INFO [train_asr.py:1235] (3/4) Epoch 42, batch 6000, loss[loss=0.05609, simple_loss=0.07269, pruned_loss=0.01134, audio_tagging_loss=0.008407, over 14936.00 frames. ], tot_loss[loss=0.06629, simple_loss=0.09011, pruned_loss=0.01261, audio_tagging_loss=0.008617, over 3038507.76 frames. 
], batch size: 58, lr: 1.61e-03, grad_scale: 32.0 2023-11-26 09:31:33,849 INFO [train_asr.py:1258] (3/4) Computing validation loss 2023-11-26 09:31:49,475 INFO [zipformer.py:1877] (3/4) name=encoder.encoders.4.encoder.layers.2.self_attn_weights, attn_weights_entropy = tensor([2.1990, 3.0960, 2.6170, 2.7780], device='cuda:3') 2023-11-26 09:32:00,692 INFO [zipformer.py:1877] (3/4) name=encoder.encoders.3.encoder.layers.1.self_attn_weights, attn_weights_entropy = tensor([1.7629, 3.0248, 2.6942, 2.6516, 3.3753, 3.3839, 3.1127, 3.5861], device='cuda:3') 2023-11-26 09:32:06,553 INFO [train_asr.py:1267] (3/4) Epoch 42, validation: loss=0.05807, simple_loss=0.05064, pruned_loss=0.005286, audio_tagging_loss=0.02746, over 4681554.00 frames. 2023-11-26 09:32:06,553 INFO [train_asr.py:1268] (3/4) Maximum memory allocated so far is 24894MB 2023-11-26 09:32:07,858 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3326513.3333333335, ans=0.125 2023-11-26 09:32:12,031 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=3326513.3333333335, ans=0.0 2023-11-26 09:32:29,924 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 499000 2023-11-26 09:32:40,575 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=3326713.3333333335, ans=0.0 2023-11-26 09:32:45,648 WARNING [train_asr.py:1481] (3/4) Exclude cut with ID unbalanced/NoNxFjwXuuc_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-26 09:33:01,923 INFO [train_asr.py:1235] (3/4) Epoch 42, batch 6050, loss[loss=0.05985, simple_loss=0.07617, pruned_loss=0.01195, audio_tagging_loss=0.00982, over 15244.00 frames. ], tot_loss[loss=0.06637, simple_loss=0.09024, pruned_loss=0.01263, audio_tagging_loss=0.008621, over 3033547.11 frames. ], batch size: 56, lr: 1.61e-03, grad_scale: 32.0 2023-11-26 09:33:04,210 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3326846.6666666665, ans=0.125 2023-11-26 09:33:12,296 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=3326913.3333333335, ans=0.0 2023-11-26 09:33:25,965 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 499050 2023-11-26 09:33:27,250 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=3326980.0, ans=0.0 2023-11-26 09:33:29,328 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3326980.0, ans=0.125 2023-11-26 09:33:31,987 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.1.self_attn2.whiten, num_groups=1, num_channels=192, metric=8.88 vs. 
limit=22.5 2023-11-26 09:33:49,741 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.079e+01 8.694e+01 9.426e+01 1.019e+02 1.507e+02, threshold=1.885e+02, percent-clipped=0.0 2023-11-26 09:33:54,787 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3327113.3333333335, ans=0.125 2023-11-26 09:33:54,805 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.max_positive, batch_count=3327113.3333333335, ans=0.95 2023-11-26 09:33:58,298 INFO [train_asr.py:1235] (3/4) Epoch 42, batch 6100, loss[loss=0.04547, simple_loss=0.05564, pruned_loss=0.007554, audio_tagging_loss=0.01009, over 15809.00 frames. ], tot_loss[loss=0.06605, simple_loss=0.08989, pruned_loss=0.01249, audio_tagging_loss=0.008616, over 3036225.23 frames. ], batch size: 61, lr: 1.61e-03, grad_scale: 32.0 2023-11-26 09:33:58,462 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3327180.0, ans=0.125 2023-11-26 09:34:11,721 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=3327246.6666666665, ans=0.0 2023-11-26 09:34:21,294 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 499100 2023-11-26 09:34:54,327 INFO [train_asr.py:1235] (3/4) Epoch 42, batch 6150, loss[loss=0.06836, simple_loss=0.1023, pruned_loss=0.01063, audio_tagging_loss=0.006568, over 15148.00 frames. ], tot_loss[loss=0.06659, simple_loss=0.09075, pruned_loss=0.01258, audio_tagging_loss=0.008634, over 3034813.54 frames. ], batch size: 55, lr: 1.61e-03, grad_scale: 32.0 2023-11-26 09:35:01,218 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=5.98 vs. limit=12.0 2023-11-26 09:35:17,632 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 499150 2023-11-26 09:35:30,862 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=8.51 vs. limit=15.0 2023-11-26 09:35:36,037 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn1.whiten.whitening_limit, batch_count=3327713.3333333335, ans=22.5 2023-11-26 09:35:40,064 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=3327780.0, ans=0.2 2023-11-26 09:35:42,932 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.401e+01 8.630e+01 9.202e+01 1.002e+02 1.257e+02, threshold=1.840e+02, percent-clipped=0.0 2023-11-26 09:35:49,834 INFO [train_asr.py:1235] (3/4) Epoch 42, batch 6200, loss[loss=0.06177, simple_loss=0.08974, pruned_loss=0.01008, audio_tagging_loss=0.006818, over 14525.00 frames. ], tot_loss[loss=0.06634, simple_loss=0.09007, pruned_loss=0.01259, audio_tagging_loss=0.00871, over 3034995.21 frames. ], batch size: 53, lr: 1.61e-03, grad_scale: 16.0 2023-11-26 09:35:53,526 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=4.53 vs. limit=15.0 2023-11-26 09:36:00,665 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=9.97 vs. 
limit=15.0 2023-11-26 09:36:13,304 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 499200 2023-11-26 09:36:46,678 INFO [train_asr.py:1235] (3/4) Epoch 42, batch 6250, loss[loss=0.03937, simple_loss=0.04249, pruned_loss=0.007829, audio_tagging_loss=0.01029, over 14860.00 frames. ], tot_loss[loss=0.0656, simple_loss=0.08893, pruned_loss=0.01236, audio_tagging_loss=0.008779, over 3037380.41 frames. ], batch size: 57, lr: 1.61e-03, grad_scale: 16.0 2023-11-26 09:36:52,140 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=3328180.0, ans=0.125 2023-11-26 09:36:55,449 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=3328180.0, ans=0.2 2023-11-26 09:37:00,039 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3328246.6666666665, ans=0.0 2023-11-26 09:37:00,083 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=3328246.6666666665, ans=0.0 2023-11-26 09:37:09,557 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 499250 2023-11-26 09:37:13,970 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3328313.3333333335, ans=0.125 2023-11-26 09:37:23,797 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn2.whiten.whitening_limit, batch_count=3328380.0, ans=22.5 2023-11-26 09:37:29,450 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3328380.0, ans=0.125 2023-11-26 09:37:36,549 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.201e+01 8.808e+01 9.356e+01 1.010e+02 1.714e+02, threshold=1.871e+02, percent-clipped=0.0 2023-11-26 09:37:40,108 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=3328446.6666666665, ans=0.0 2023-11-26 09:37:42,472 INFO [train_asr.py:1235] (3/4) Epoch 42, batch 6300, loss[loss=0.06439, simple_loss=0.08559, pruned_loss=0.01148, audio_tagging_loss=0.01011, over 14754.00 frames. ], tot_loss[loss=0.06582, simple_loss=0.08927, pruned_loss=0.01233, audio_tagging_loss=0.008858, over 3038791.97 frames. ], batch size: 56, lr: 1.61e-03, grad_scale: 8.0 2023-11-26 09:37:56,441 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=3328580.0, ans=0.125 2023-11-26 09:38:05,825 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 499300 2023-11-26 09:38:18,458 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=6.07 vs. limit=15.0 2023-11-26 09:38:27,200 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=10.27 vs. limit=15.0 2023-11-26 09:38:33,731 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=7.83 vs. 
limit=10.0 2023-11-26 09:38:36,491 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=3328846.6666666665, ans=0.0 2023-11-26 09:38:37,417 INFO [train_asr.py:1235] (3/4) Epoch 42, batch 6350, loss[loss=0.08341, simple_loss=0.1135, pruned_loss=0.01855, audio_tagging_loss=0.00813, over 15744.00 frames. ], tot_loss[loss=0.06614, simple_loss=0.08935, pruned_loss=0.01253, audio_tagging_loss=0.008932, over 3039526.71 frames. ], batch size: 56, lr: 1.61e-03, grad_scale: 8.0 2023-11-26 09:38:41,366 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=3328846.6666666665, ans=0.0 2023-11-26 09:38:49,421 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=7.31 vs. limit=15.0 2023-11-26 09:39:01,257 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 499350 2023-11-26 09:39:12,987 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=512, metric=4.32 vs. limit=15.0 2023-11-26 09:39:23,856 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3329113.3333333335, ans=0.1 2023-11-26 09:39:27,766 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.655e+01 8.815e+01 9.462e+01 1.012e+02 1.581e+02, threshold=1.892e+02, percent-clipped=0.0 2023-11-26 09:39:33,589 INFO [train_asr.py:1235] (3/4) Epoch 42, batch 6400, loss[loss=0.06335, simple_loss=0.08812, pruned_loss=0.008195, audio_tagging_loss=0.01109, over 14840.00 frames. ], tot_loss[loss=0.06603, simple_loss=0.08906, pruned_loss=0.01243, audio_tagging_loss=0.009069, over 3034918.11 frames. ], batch size: 56, lr: 1.61e-03, grad_scale: 16.0 2023-11-26 09:39:40,741 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=3329180.0, ans=0.125 2023-11-26 09:39:50,057 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=3329246.6666666665, ans=0.0 2023-11-26 09:39:56,838 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 499400 2023-11-26 09:40:06,243 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3329380.0, ans=0.125 2023-11-26 09:40:13,045 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=3329380.0, ans=0.125 2023-11-26 09:40:24,102 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=3329446.6666666665, ans=0.125 2023-11-26 09:40:29,305 INFO [train_asr.py:1235] (3/4) Epoch 42, batch 6450, loss[loss=0.05751, simple_loss=0.0786, pruned_loss=0.01043, audio_tagging_loss=0.007789, over 16200.00 frames. ], tot_loss[loss=0.0663, simple_loss=0.08949, pruned_loss=0.01245, audio_tagging_loss=0.009105, over 3046194.64 frames. ], batch size: 61, lr: 1.61e-03, grad_scale: 16.0 2023-11-26 09:40:33,037 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=10.01 vs. limit=15.0 2023-11-26 09:40:40,006 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=8.38 vs. 
limit=15.0 2023-11-26 09:40:52,746 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 499450 2023-11-26 09:41:19,119 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.267e+01 8.730e+01 9.364e+01 9.937e+01 1.364e+02, threshold=1.873e+02, percent-clipped=0.0 2023-11-26 09:41:25,101 INFO [train_asr.py:1235] (3/4) Epoch 42, batch 6500, loss[loss=0.05801, simple_loss=0.08004, pruned_loss=0.00848, audio_tagging_loss=0.009509, over 14970.00 frames. ], tot_loss[loss=0.06598, simple_loss=0.0891, pruned_loss=0.01237, audio_tagging_loss=0.00906, over 3052124.25 frames. ], batch size: 57, lr: 1.61e-03, grad_scale: 16.0 2023-11-26 09:41:33,281 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=3329846.6666666665, ans=0.0 2023-11-26 09:41:34,167 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=3329846.6666666665, ans=0.0 2023-11-26 09:41:46,265 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=11.38 vs. limit=15.0 2023-11-26 09:41:47,943 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3329980.0, ans=0.125 2023-11-26 09:41:48,833 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 499500 2023-11-26 09:41:58,339 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=3330046.6666666665, ans=0.2 2023-11-26 09:41:59,701 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module1.whiten, num_groups=1, num_channels=512, metric=3.83 vs. limit=15.0 2023-11-26 09:42:06,278 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3330046.6666666665, ans=0.125 2023-11-26 09:42:14,363 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=3330113.3333333335, ans=0.2 2023-11-26 09:42:15,549 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=7.90 vs. limit=12.0 2023-11-26 09:42:20,848 INFO [train_asr.py:1235] (3/4) Epoch 42, batch 6550, loss[loss=0.06592, simple_loss=0.0907, pruned_loss=0.01175, audio_tagging_loss=0.008823, over 15184.00 frames. ], tot_loss[loss=0.06543, simple_loss=0.08864, pruned_loss=0.0122, audio_tagging_loss=0.008905, over 3046870.36 frames. 
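Each batch entry above prints an instantaneous loss[... over ~15k frames] next to a running tot_loss[... over ~3.0e6 frames]. The roughly constant ~3M-frame horizon is consistent with a geometrically decaying, frame-weighted average: 15,000-frame batches with decay 0.995 reach an equilibrium of 15000 / (1 - 0.995) = 3,000,000 frames. The sketch below illustrates that accumulation under those assumptions; it is not the actual icefall tracker, and all names are illustrative.

```python
# Illustrative frame-weighted running loss tracker, in the spirit of the
# "loss[...] / tot_loss[... over N frames]" lines above. The decay value and
# the class itself are assumptions, not the recipe's MetricsTracker.
class RunningLoss:
    def __init__(self, decay: float = 0.995):
        self.decay = decay   # geometric forgetting of older batches
        self.frames = 0.0    # effective number of frames currently covered
        self.sums = {}       # per-component (loss * frames) running sums

    def update(self, per_frame_losses: dict, num_frames: int) -> None:
        self.frames = self.decay * self.frames + num_frames
        for name, value in per_frame_losses.items():
            prev = self.decay * self.sums.get(name, 0.0)
            self.sums[name] = prev + value * num_frames

    def averages(self) -> dict:
        # Per-frame averages, which is what the log prints.
        return {k: v / self.frames for k, v in self.sums.items()}

tracker = RunningLoss()
tracker.update({"simple_loss": 0.08864, "pruned_loss": 0.0122}, num_frames=15184)
print(tracker.averages(), f"over {tracker.frames:.2f} frames")
```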
], batch size: 57, lr: 1.61e-03, grad_scale: 16.0 2023-11-26 09:42:29,034 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=3330180.0, ans=0.2 2023-11-26 09:42:33,236 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3330246.6666666665, ans=0.1 2023-11-26 09:42:43,922 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 499550 2023-11-26 09:42:54,728 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3330380.0, ans=0.125 2023-11-26 09:43:04,644 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3330446.6666666665, ans=0.1 2023-11-26 09:43:11,015 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=2.23 vs. limit=15.0 2023-11-26 09:43:11,435 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.554e+01 8.534e+01 9.204e+01 9.909e+01 1.481e+02, threshold=1.841e+02, percent-clipped=0.0 2023-11-26 09:43:15,790 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=3330513.3333333335, ans=0.2 2023-11-26 09:43:16,670 INFO [train_asr.py:1235] (3/4) Epoch 42, batch 6600, loss[loss=0.07202, simple_loss=0.0981, pruned_loss=0.01398, audio_tagging_loss=0.008993, over 15601.00 frames. ], tot_loss[loss=0.06543, simple_loss=0.08895, pruned_loss=0.01218, audio_tagging_loss=0.008778, over 3053176.76 frames. ], batch size: 59, lr: 1.61e-03, grad_scale: 16.0 2023-11-26 09:43:22,085 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.min_abs, batch_count=3330513.3333333335, ans=0.5 2023-11-26 09:43:26,364 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=3330580.0, ans=0.0 2023-11-26 09:43:29,442 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=3330580.0, ans=0.0 2023-11-26 09:43:30,833 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=8.66 vs. limit=10.0 2023-11-26 09:43:37,891 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3330646.6666666665, ans=0.125 2023-11-26 09:43:39,872 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 499600 2023-11-26 09:43:42,381 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3330646.6666666665, ans=0.0 2023-11-26 09:43:52,104 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=13.42 vs. limit=22.5 2023-11-26 09:44:11,840 INFO [train_asr.py:1235] (3/4) Epoch 42, batch 6650, loss[loss=0.07244, simple_loss=0.1004, pruned_loss=0.01359, audio_tagging_loss=0.008663, over 15533.00 frames. ], tot_loss[loss=0.06521, simple_loss=0.08872, pruned_loss=0.01208, audio_tagging_loss=0.008773, over 3050881.47 frames. 
], batch size: 56, lr: 1.61e-03, grad_scale: 16.0 2023-11-26 09:44:36,717 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 499650 2023-11-26 09:44:50,831 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=7.26 vs. limit=15.0 2023-11-26 09:44:59,022 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=3331113.3333333335, ans=0.035 2023-11-26 09:45:02,560 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.147e+01 8.692e+01 9.274e+01 1.004e+02 1.194e+02, threshold=1.855e+02, percent-clipped=0.0 2023-11-26 09:45:07,954 INFO [train_asr.py:1235] (3/4) Epoch 42, batch 6700, loss[loss=0.06604, simple_loss=0.08242, pruned_loss=0.01721, audio_tagging_loss=0.007619, over 14203.00 frames. ], tot_loss[loss=0.06526, simple_loss=0.08882, pruned_loss=0.01213, audio_tagging_loss=0.008716, over 3047724.60 frames. ], batch size: 56, lr: 1.61e-03, grad_scale: 16.0 2023-11-26 09:45:15,631 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3331180.0, ans=0.125 2023-11-26 09:45:28,165 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3331246.6666666665, ans=0.125 2023-11-26 09:45:31,205 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 499700 2023-11-26 09:45:32,287 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=3331313.3333333335, ans=0.015 2023-11-26 09:45:50,466 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=3331380.0, ans=0.125 2023-11-26 09:45:53,211 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=3331446.6666666665, ans=0.125 2023-11-26 09:46:04,041 INFO [train_asr.py:1235] (3/4) Epoch 42, batch 6750, loss[loss=0.04703, simple_loss=0.05027, pruned_loss=0.008922, audio_tagging_loss=0.01297, over 14834.00 frames. ], tot_loss[loss=0.06542, simple_loss=0.08878, pruned_loss=0.01227, audio_tagging_loss=0.008762, over 3044201.45 frames. ], batch size: 56, lr: 1.61e-03, grad_scale: 16.0 2023-11-26 09:46:18,037 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=3331580.0, ans=0.2 2023-11-26 09:46:19,098 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3331580.0, ans=0.125 2023-11-26 09:46:26,957 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 499750 2023-11-26 09:46:37,779 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=3331713.3333333335, ans=0.035 2023-11-26 09:46:42,706 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3331713.3333333335, ans=0.0 2023-11-26 09:46:46,315 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.0.whiten, num_groups=1, num_channels=192, metric=3.97 vs. 
limit=12.0 2023-11-26 09:46:51,083 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=3331780.0, ans=0.0 2023-11-26 09:46:51,149 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3331780.0, ans=0.125 2023-11-26 09:46:53,938 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.511e+01 8.795e+01 9.376e+01 1.027e+02 1.751e+02, threshold=1.875e+02, percent-clipped=0.0 2023-11-26 09:46:59,205 INFO [train_asr.py:1235] (3/4) Epoch 42, batch 6800, loss[loss=0.06489, simple_loss=0.08208, pruned_loss=0.0128, audio_tagging_loss=0.01105, over 14391.00 frames. ], tot_loss[loss=0.06566, simple_loss=0.08898, pruned_loss=0.01241, audio_tagging_loss=0.008756, over 3052223.61 frames. ], batch size: 56, lr: 1.61e-03, grad_scale: 32.0 2023-11-26 09:47:15,182 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=3331913.3333333335, ans=0.0 2023-11-26 09:47:15,480 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=9.90 vs. limit=15.0 2023-11-26 09:47:23,618 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 499800 2023-11-26 09:47:42,310 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=6.18 vs. limit=15.0 2023-11-26 09:47:43,087 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=3332113.3333333335, ans=0.0 2023-11-26 09:47:44,156 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=3332113.3333333335, ans=0.2 2023-11-26 09:47:55,094 INFO [train_asr.py:1235] (3/4) Epoch 42, batch 6850, loss[loss=0.06876, simple_loss=0.08934, pruned_loss=0.01291, audio_tagging_loss=0.01119, over 14191.00 frames. ], tot_loss[loss=0.06552, simple_loss=0.08921, pruned_loss=0.01226, audio_tagging_loss=0.008663, over 3046040.29 frames. ], batch size: 55, lr: 1.61e-03, grad_scale: 32.0 2023-11-26 09:48:02,729 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=3332180.0, ans=0.125 2023-11-26 09:48:08,476 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=3332246.6666666665, ans=0.0 2023-11-26 09:48:14,727 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3332246.6666666665, ans=0.0 2023-11-26 09:48:18,876 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 499850 2023-11-26 09:48:23,238 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=3332313.3333333335, ans=0.0 2023-11-26 09:48:26,806 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=15.09 vs. 
limit=22.5 2023-11-26 09:48:27,522 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=3332380.0, ans=0.07 2023-11-26 09:48:34,333 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3332380.0, ans=0.1 2023-11-26 09:48:40,175 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=8.80 vs. limit=10.0 2023-11-26 09:48:45,308 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.499e+01 8.735e+01 9.366e+01 1.004e+02 1.286e+02, threshold=1.873e+02, percent-clipped=0.0 2023-11-26 09:48:49,591 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=7.18 vs. limit=12.0 2023-11-26 09:48:51,079 INFO [train_asr.py:1235] (3/4) Epoch 42, batch 6900, loss[loss=0.05563, simple_loss=0.07604, pruned_loss=0.009839, audio_tagging_loss=0.007766, over 14747.00 frames. ], tot_loss[loss=0.06558, simple_loss=0.0892, pruned_loss=0.01229, audio_tagging_loss=0.00869, over 3045280.29 frames. ], batch size: 58, lr: 1.61e-03, grad_scale: 32.0 2023-11-26 09:48:54,470 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=3332513.3333333335, ans=0.0 2023-11-26 09:49:08,278 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3332580.0, ans=0.125 2023-11-26 09:49:13,453 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 499900 2023-11-26 09:49:27,611 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=13.54 vs. limit=15.0 2023-11-26 09:49:32,218 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=3332713.3333333335, ans=0.125 2023-11-26 09:49:33,066 WARNING [train_asr.py:1481] (3/4) Exclude cut with ID unbalanced/Xez1ffAcb0w_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-26 09:49:45,789 INFO [train_asr.py:1235] (3/4) Epoch 42, batch 6950, loss[loss=0.05494, simple_loss=0.07165, pruned_loss=0.008438, audio_tagging_loss=0.01067, over 14244.00 frames. ], tot_loss[loss=0.06591, simple_loss=0.08941, pruned_loss=0.01242, audio_tagging_loss=0.008784, over 3042289.59 frames. ], batch size: 57, lr: 1.61e-03, grad_scale: 32.0 2023-11-26 09:49:50,848 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=12.84 vs. 
limit=15.0 2023-11-26 09:50:01,378 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=3332913.3333333335, ans=0.125 2023-11-26 09:50:09,631 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 499950 2023-11-26 09:50:24,178 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3333046.6666666665, ans=0.125 2023-11-26 09:50:31,678 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3333113.3333333335, ans=0.125 2023-11-26 09:50:35,562 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.592e+01 8.609e+01 9.217e+01 1.000e+02 1.228e+02, threshold=1.843e+02, percent-clipped=0.0 2023-11-26 09:50:38,983 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=3333113.3333333335, ans=0.2 2023-11-26 09:50:41,423 INFO [train_asr.py:1235] (3/4) Epoch 42, batch 7000, loss[loss=0.05938, simple_loss=0.07748, pruned_loss=0.01111, audio_tagging_loss=0.009533, over 14667.00 frames. ], tot_loss[loss=0.06578, simple_loss=0.08918, pruned_loss=0.01233, audio_tagging_loss=0.008856, over 3039299.15 frames. ], batch size: 56, lr: 1.61e-03, grad_scale: 32.0 2023-11-26 09:50:44,912 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=3333180.0, ans=0.2 2023-11-26 09:50:51,715 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.min_positive, batch_count=3333246.6666666665, ans=0.05 2023-11-26 09:50:53,422 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=3333246.6666666665, ans=0.125 2023-11-26 09:51:05,441 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 500000 2023-11-26 09:51:22,583 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3333380.0, ans=0.1 2023-11-26 09:51:29,263 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=3333446.6666666665, ans=0.0 2023-11-26 09:51:40,246 INFO [train_asr.py:1235] (3/4) Epoch 42, batch 7050, loss[loss=0.06847, simple_loss=0.09418, pruned_loss=0.01362, audio_tagging_loss=0.007765, over 15081.00 frames. ], tot_loss[loss=0.06593, simple_loss=0.08944, pruned_loss=0.01233, audio_tagging_loss=0.008882, over 3040097.55 frames. 
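The printed totals are internally consistent with a weighted sum of the three components: in the batch 7000 line above, 0.5 x 0.08918 + 0.01233 + 0.008856 = 0.06578, exactly the reported tot_loss. A hedged sketch of that combination follows; the function and scale names are placeholders, not a quote of the recipe's code.

```python
import torch

def combine_losses(
    simple_loss: torch.Tensor,          # full-sum loss on the simple joiner
    pruned_loss: torch.Tensor,          # pruned RNN-T loss on the full joiner
    audio_tagging_loss: torch.Tensor,   # auxiliary audio-tagging (KD) loss
    simple_loss_scale: float,
    audio_tagging_loss_scale: float,
) -> torch.Tensor:
    # Multi-task objective implied by the per-batch loss lines; the 0.5 / 1.0
    # weights used below are read off the logged arithmetic, not the source.
    return (simple_loss_scale * simple_loss
            + pruned_loss
            + audio_tagging_loss_scale * audio_tagging_loss)

# Reproduce the batch 7000 tot_loss: 0.5*0.08918 + 0.01233 + 1.0*0.008856
total = combine_losses(torch.tensor(0.08918), torch.tensor(0.01233),
                       torch.tensor(0.008856), 0.5, 1.0)
print(total)  # ~0.06578, matching the printed tot_loss
```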
], batch size: 56, lr: 1.61e-03, grad_scale: 16.0 2023-11-26 09:52:00,572 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3333646.6666666665, ans=0.125 2023-11-26 09:52:02,593 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 500050 2023-11-26 09:52:10,776 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.min_positive, batch_count=3333646.6666666665, ans=0.025 2023-11-26 09:52:12,845 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=3333713.3333333335, ans=0.2 2023-11-26 09:52:14,927 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=3333713.3333333335, ans=0.0 2023-11-26 09:52:15,972 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=3333713.3333333335, ans=0.125 2023-11-26 09:52:18,539 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=3333713.3333333335, ans=0.0 2023-11-26 09:52:22,924 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.min_positive, batch_count=3333713.3333333335, ans=0.05 2023-11-26 09:52:26,080 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=3333780.0, ans=0.0 2023-11-26 09:52:31,145 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.330e+01 8.700e+01 9.418e+01 1.001e+02 1.210e+02, threshold=1.884e+02, percent-clipped=0.0 2023-11-26 09:52:34,749 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer_ff3.min_abs, batch_count=3333846.6666666665, ans=0.2 2023-11-26 09:52:35,498 INFO [train_asr.py:1235] (3/4) Epoch 42, batch 7100, loss[loss=0.06424, simple_loss=0.0854, pruned_loss=0.01269, audio_tagging_loss=0.008854, over 15825.00 frames. ], tot_loss[loss=0.06563, simple_loss=0.08882, pruned_loss=0.01226, audio_tagging_loss=0.008963, over 3045562.69 frames. ], batch size: 60, lr: 1.61e-03, grad_scale: 16.0 2023-11-26 09:52:58,305 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 500100 2023-11-26 09:53:18,606 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3334113.3333333335, ans=0.0 2023-11-26 09:53:21,939 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3334113.3333333335, ans=0.125 2023-11-26 09:53:30,084 INFO [train_asr.py:1235] (3/4) Epoch 42, batch 7150, loss[loss=0.07939, simple_loss=0.1088, pruned_loss=0.01756, audio_tagging_loss=0.007447, over 15140.00 frames. ], tot_loss[loss=0.06588, simple_loss=0.08925, pruned_loss=0.01228, audio_tagging_loss=0.008973, over 3043874.70 frames. ], batch size: 54, lr: 1.61e-03, grad_scale: 16.0 2023-11-26 09:53:30,596 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=8.86 vs. 
limit=15.0 2023-11-26 09:53:37,422 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=3334180.0, ans=0.07 2023-11-26 09:53:52,519 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=3334313.3333333335, ans=0.0 2023-11-26 09:53:52,623 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=3334313.3333333335, ans=0.125 2023-11-26 09:53:54,574 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 500150 2023-11-26 09:53:55,327 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.0.conv_module2.whiten, num_groups=1, num_channels=192, metric=12.07 vs. limit=15.0 2023-11-26 09:54:02,247 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3334313.3333333335, ans=0.125 2023-11-26 09:54:02,669 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=7.51 vs. limit=15.0 2023-11-26 09:54:14,313 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=3334446.6666666665, ans=0.0 2023-11-26 09:54:15,337 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3334446.6666666665, ans=0.1 2023-11-26 09:54:20,581 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3334446.6666666665, ans=0.125 2023-11-26 09:54:21,459 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.118e+01 8.683e+01 9.180e+01 1.005e+02 1.351e+02, threshold=1.836e+02, percent-clipped=0.0 2023-11-26 09:54:26,313 INFO [train_asr.py:1235] (3/4) Epoch 42, batch 7200, loss[loss=0.07348, simple_loss=0.1055, pruned_loss=0.01324, audio_tagging_loss=0.007497, over 14095.00 frames. ], tot_loss[loss=0.06601, simple_loss=0.08958, pruned_loss=0.01228, audio_tagging_loss=0.008937, over 3047715.02 frames. ], batch size: 56, lr: 1.61e-03, grad_scale: 32.0 2023-11-26 09:54:30,326 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=3334513.3333333335, ans=0.04949747468305833 2023-11-26 09:54:31,733 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=3334513.3333333335, ans=0.04949747468305833 2023-11-26 09:54:34,479 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.0.self_attn2.whiten, num_groups=1, num_channels=192, metric=10.40 vs. 
limit=22.5 2023-11-26 09:54:35,972 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=3334513.3333333335, ans=0.04949747468305833 2023-11-26 09:54:43,511 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3334580.0, ans=0.125 2023-11-26 09:54:45,472 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=3334580.0, ans=0.0 2023-11-26 09:54:47,566 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3334646.6666666665, ans=0.125 2023-11-26 09:54:49,692 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 500200 2023-11-26 09:55:22,853 INFO [train_asr.py:1235] (3/4) Epoch 42, batch 7250, loss[loss=0.05571, simple_loss=0.07388, pruned_loss=0.009934, audio_tagging_loss=0.008839, over 14653.00 frames. ], tot_loss[loss=0.06585, simple_loss=0.08922, pruned_loss=0.01226, audio_tagging_loss=0.008981, over 3046631.21 frames. ], batch size: 55, lr: 1.61e-03, grad_scale: 32.0 2023-11-26 09:55:34,791 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3334913.3333333335, ans=0.125 2023-11-26 09:55:38,046 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3334913.3333333335, ans=0.1 2023-11-26 09:55:45,987 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 500250 2023-11-26 09:55:47,152 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=3334980.0, ans=0.2 2023-11-26 09:56:05,896 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=3335046.6666666665, ans=0.2 2023-11-26 09:56:13,265 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=3335113.3333333335, ans=0.0 2023-11-26 09:56:15,149 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 6.837e+01 8.852e+01 9.361e+01 1.013e+02 1.372e+02, threshold=1.872e+02, percent-clipped=0.0 2023-11-26 09:56:18,398 INFO [train_asr.py:1235] (3/4) Epoch 42, batch 7300, loss[loss=0.07339, simple_loss=0.1044, pruned_loss=0.0142, audio_tagging_loss=0.006989, over 15568.00 frames. ], tot_loss[loss=0.06604, simple_loss=0.08988, pruned_loss=0.01222, audio_tagging_loss=0.008884, over 3048549.58 frames. ], batch size: 56, lr: 1.61e-03, grad_scale: 16.0 2023-11-26 09:56:20,227 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.0.self_attn1.whiten, num_groups=1, num_channels=192, metric=14.50 vs. limit=22.5 2023-11-26 09:56:23,438 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=3335180.0, ans=0.125 2023-11-26 09:56:23,737 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=5.20 vs. 
limit=12.0 2023-11-26 09:56:27,641 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=3335180.0, ans=0.0 2023-11-26 09:56:32,383 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.max_positive, batch_count=3335246.6666666665, ans=0.95 2023-11-26 09:56:33,337 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3335246.6666666665, ans=0.125 2023-11-26 09:56:38,215 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3335246.6666666665, ans=0.0 2023-11-26 09:56:42,133 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 500300 2023-11-26 09:57:14,282 INFO [train_asr.py:1235] (3/4) Epoch 42, batch 7350, loss[loss=0.06443, simple_loss=0.09939, pruned_loss=0.009463, audio_tagging_loss=0.005271, over 15297.00 frames. ], tot_loss[loss=0.06626, simple_loss=0.0901, pruned_loss=0.01248, audio_tagging_loss=0.008733, over 3052965.24 frames. ], batch size: 57, lr: 1.61e-03, grad_scale: 16.0 2023-11-26 09:57:21,404 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=3335513.3333333335, ans=0.125 2023-11-26 09:57:26,622 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=3335580.0, ans=0.0 2023-11-26 09:57:34,674 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=3335580.0, ans=0.0 2023-11-26 09:57:37,763 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 500350 2023-11-26 09:57:51,007 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3335713.3333333335, ans=0.125 2023-11-26 09:58:06,802 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.208e+01 8.616e+01 9.240e+01 1.003e+02 1.313e+02, threshold=1.848e+02, percent-clipped=0.0 2023-11-26 09:58:09,962 INFO [train_asr.py:1235] (3/4) Epoch 42, batch 7400, loss[loss=0.07582, simple_loss=0.108, pruned_loss=0.0166, audio_tagging_loss=0.005223, over 15376.00 frames. ], tot_loss[loss=0.06613, simple_loss=0.09009, pruned_loss=0.01244, audio_tagging_loss=0.008644, over 3048282.40 frames. ], batch size: 55, lr: 1.61e-03, grad_scale: 16.0 2023-11-26 09:58:29,242 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=11.09 vs. limit=15.0 2023-11-26 09:58:31,320 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=3335980.0, ans=0.125 2023-11-26 09:58:33,309 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 500400 2023-11-26 09:59:05,895 INFO [train_asr.py:1235] (3/4) Epoch 42, batch 7450, loss[loss=0.05148, simple_loss=0.07693, pruned_loss=0.007692, audio_tagging_loss=0.005323, over 14384.00 frames. ], tot_loss[loss=0.06563, simple_loss=0.08924, pruned_loss=0.0124, audio_tagging_loss=0.008601, over 3048649.39 frames. 
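In the optim.py:476 lines, the printed threshold equals Clipping_scale times the median of the grad-norm quartiles (2.0 x 9.240e+01 = 1.848e+02 just above), which suggests the clipping threshold tracks a running median of recent gradient norms. A minimal sketch of that statistic; the buffering and exact quantile scheme are assumptions.

```python
import torch

def clipping_stats(recent_norms: torch.Tensor, clipping_scale: float = 2.0):
    # Quartiles (min, 25%, median, 75%, max) over a window of recent gradient
    # norms, as printed in the "grad-norm quartiles" lines. The threshold is
    # clipping_scale times the median; batches whose norm exceeds it would be
    # scaled down and counted in "percent-clipped".
    q = torch.quantile(recent_norms,
                       torch.tensor([0.0, 0.25, 0.5, 0.75, 1.0]))
    threshold = clipping_scale * q[2]
    percent_clipped = 100.0 * (recent_norms > threshold).float().mean()
    return q, threshold, percent_clipped

norms = torch.tensor([72.08, 86.16, 92.40, 100.3, 131.3])
q, thr, pct = clipping_stats(norms)
print(q, thr, pct)  # thr = 2 * median = 184.8, pct = 0.0, as in the log
```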
], batch size: 54, lr: 1.61e-03, grad_scale: 16.0 2023-11-26 09:59:29,772 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 500450 2023-11-26 09:59:58,272 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.569e+01 8.877e+01 9.404e+01 1.015e+02 1.370e+02, threshold=1.881e+02, percent-clipped=0.0 2023-11-26 10:00:01,515 INFO [train_asr.py:1235] (3/4) Epoch 42, batch 7500, loss[loss=0.06338, simple_loss=0.07863, pruned_loss=0.01208, audio_tagging_loss=0.01198, over 15991.00 frames. ], tot_loss[loss=0.06554, simple_loss=0.08927, pruned_loss=0.01229, audio_tagging_loss=0.008605, over 3043831.20 frames. ], batch size: 60, lr: 1.61e-03, grad_scale: 16.0 2023-11-26 10:00:03,455 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3336513.3333333335, ans=0.125 2023-11-26 10:00:07,470 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=3336513.3333333335, ans=0.0 2023-11-26 10:00:25,308 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 500500 2023-11-26 10:00:25,394 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=3336646.6666666665, ans=0.125 2023-11-26 10:00:29,711 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=3336646.6666666665, ans=0.125 2023-11-26 10:00:33,227 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=6.59 vs. limit=12.0 2023-11-26 10:00:48,668 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=3336780.0, ans=0.015 2023-11-26 10:00:52,511 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=3336780.0, ans=0.0 2023-11-26 10:00:57,572 INFO [train_asr.py:1235] (3/4) Epoch 42, batch 7550, loss[loss=0.07997, simple_loss=0.113, pruned_loss=0.01505, audio_tagging_loss=0.008438, over 15504.00 frames. ], tot_loss[loss=0.06523, simple_loss=0.08902, pruned_loss=0.01214, audio_tagging_loss=0.008579, over 3046330.60 frames. ], batch size: 55, lr: 1.61e-03, grad_scale: 16.0 2023-11-26 10:01:01,039 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-26 10:01:05,195 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.min_abs, batch_count=3336846.6666666665, ans=0.5 2023-11-26 10:01:10,279 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=4.16 vs. 
limit=6.0 2023-11-26 10:01:21,052 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 500550 2023-11-26 10:01:22,269 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3336980.0, ans=0.125 2023-11-26 10:01:44,659 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3337113.3333333335, ans=0.1 2023-11-26 10:01:45,729 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3337113.3333333335, ans=0.1 2023-11-26 10:01:49,615 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.075e+01 8.478e+01 8.999e+01 9.554e+01 1.187e+02, threshold=1.800e+02, percent-clipped=0.0 2023-11-26 10:01:53,388 INFO [train_asr.py:1235] (3/4) Epoch 42, batch 7600, loss[loss=0.06237, simple_loss=0.08026, pruned_loss=0.01309, audio_tagging_loss=0.009152, over 16160.00 frames. ], tot_loss[loss=0.06517, simple_loss=0.08888, pruned_loss=0.01214, audio_tagging_loss=0.008582, over 3053490.90 frames. ], batch size: 60, lr: 1.61e-03, grad_scale: 32.0 2023-11-26 10:02:01,435 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.1.self_attn2.whiten, num_groups=1, num_channels=192, metric=9.40 vs. limit=22.5 2023-11-26 10:02:03,542 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3337246.6666666665, ans=0.0 2023-11-26 10:02:14,571 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=3337313.3333333335, ans=0.125 2023-11-26 10:02:17,239 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 500600 2023-11-26 10:02:17,480 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=3337313.3333333335, ans=0.125 2023-11-26 10:02:25,499 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=18.37 vs. limit=22.5 2023-11-26 10:02:35,367 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.43 vs. limit=6.0 2023-11-26 10:02:36,259 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=3337380.0, ans=0.2 2023-11-26 10:02:49,122 INFO [train_asr.py:1235] (3/4) Epoch 42, batch 7650, loss[loss=0.07795, simple_loss=0.1132, pruned_loss=0.01211, audio_tagging_loss=0.009265, over 16171.00 frames. ], tot_loss[loss=0.06468, simple_loss=0.08828, pruned_loss=0.01193, audio_tagging_loss=0.008609, over 3046843.36 frames. 
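The scaling.py:213 ScheduledFloat entries record module hyper-parameters (skip rates, dropout probabilities, balancer probs, bypass scale minima) that are functions of batch_count rather than constants; by this stage of training most skip rates have annealed to 0.0. Below is a sketch of such a piecewise-linear schedule, assuming a simple breakpoint API; the real ScheduledFloat class differs in detail.

```python
import bisect

class ScheduledFloatSketch:
    """Piecewise-linear schedule over batch_count, held flat beyond the
    outermost breakpoints. Illustrative only."""
    def __init__(self, *points: tuple):
        # points: (batch_count, value) pairs, assumed sorted by batch_count
        self.xs = [p[0] for p in points]
        self.ys = [p[1] for p in points]

    def value(self, batch_count: float) -> float:
        if batch_count <= self.xs[0]:
            return self.ys[0]
        if batch_count >= self.xs[-1]:
            return self.ys[-1]
        i = bisect.bisect_right(self.xs, batch_count)
        x0, x1 = self.xs[i - 1], self.xs[i]
        y0, y1 = self.ys[i - 1], self.ys[i]
        t = (batch_count - x0) / (x1 - x0)
        return y0 + t * (y1 - y0)

# e.g. a conv_skip_rate that anneals from 0.5 to 0.0 over the first 20k
# batches, consistent with the many "ans=0.0" entries this late in training.
skip_rate = ScheduledFloatSketch((0.0, 0.5), (20000.0, 0.0))
print(skip_rate.value(3336980.0))  # -> 0.0
```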
], batch size: 60, lr: 1.61e-03, grad_scale: 32.0 2023-11-26 10:02:55,207 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=3337513.3333333335, ans=0.125 2023-11-26 10:03:01,600 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn2.whiten.whitening_limit, batch_count=3337580.0, ans=22.5 2023-11-26 10:03:09,705 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=3337580.0, ans=0.0 2023-11-26 10:03:12,804 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 500650 2023-11-26 10:03:13,537 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.0.conv_module1.whiten, num_groups=1, num_channels=192, metric=13.92 vs. limit=15.0 2023-11-26 10:03:13,994 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3337646.6666666665, ans=0.125 2023-11-26 10:03:19,849 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3337646.6666666665, ans=0.1 2023-11-26 10:03:32,200 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3337713.3333333335, ans=0.0 2023-11-26 10:03:41,954 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 6.661e+01 8.670e+01 9.156e+01 1.010e+02 1.245e+02, threshold=1.831e+02, percent-clipped=0.0 2023-11-26 10:03:45,679 INFO [train_asr.py:1235] (3/4) Epoch 42, batch 7700, loss[loss=0.06499, simple_loss=0.07924, pruned_loss=0.01625, audio_tagging_loss=0.009114, over 15028.00 frames. ], tot_loss[loss=0.06516, simple_loss=0.08902, pruned_loss=0.01204, audio_tagging_loss=0.008608, over 3043675.70 frames. ], batch size: 55, lr: 1.61e-03, grad_scale: 32.0 2023-11-26 10:04:08,444 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 500700 2023-11-26 10:04:12,184 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=3337980.0, ans=0.0 2023-11-26 10:04:12,651 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=8.40 vs. limit=15.0 2023-11-26 10:04:15,397 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=3337980.0, ans=0.0 2023-11-26 10:04:21,480 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=12.81 vs. limit=15.0 2023-11-26 10:04:40,558 INFO [train_asr.py:1235] (3/4) Epoch 42, batch 7750, loss[loss=0.06028, simple_loss=0.07895, pruned_loss=0.01148, audio_tagging_loss=0.009329, over 15842.00 frames. ], tot_loss[loss=0.06578, simple_loss=0.0899, pruned_loss=0.01219, audio_tagging_loss=0.008631, over 3048484.60 frames. ], batch size: 59, lr: 1.61e-03, grad_scale: 32.0 2023-11-26 10:05:04,473 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 500750 2023-11-26 10:05:04,812 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.01 vs. 
limit=6.0 2023-11-26 10:05:07,785 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=3338313.3333333335, ans=0.125 2023-11-26 10:05:20,100 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.min_positive, batch_count=3338380.0, ans=0.025 2023-11-26 10:05:32,992 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.118e+01 9.045e+01 9.530e+01 1.049e+02 1.522e+02, threshold=1.906e+02, percent-clipped=0.0 2023-11-26 10:05:36,762 INFO [train_asr.py:1235] (3/4) Epoch 42, batch 7800, loss[loss=0.05788, simple_loss=0.07567, pruned_loss=0.01044, audio_tagging_loss=0.009603, over 14946.00 frames. ], tot_loss[loss=0.06573, simple_loss=0.0896, pruned_loss=0.01213, audio_tagging_loss=0.008795, over 3040530.83 frames. ], batch size: 57, lr: 1.61e-03, grad_scale: 32.0 2023-11-26 10:06:00,207 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 500800 2023-11-26 10:06:00,424 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=3338646.6666666665, ans=0.0 2023-11-26 10:06:05,050 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=3338646.6666666665, ans=0.0 2023-11-26 10:06:17,227 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=3338713.3333333335, ans=0.0 2023-11-26 10:06:23,096 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=3338780.0, ans=0.2 2023-11-26 10:06:32,784 INFO [train_asr.py:1235] (3/4) Epoch 42, batch 7850, loss[loss=0.07426, simple_loss=0.08893, pruned_loss=0.01845, audio_tagging_loss=0.01134, over 14273.00 frames. ], tot_loss[loss=0.06616, simple_loss=0.09041, pruned_loss=0.01222, audio_tagging_loss=0.008734, over 3046591.69 frames. ], batch size: 54, lr: 1.61e-03, grad_scale: 32.0 2023-11-26 10:06:42,321 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=6.07 vs. limit=10.0 2023-11-26 10:06:50,479 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=3338913.3333333335, ans=0.125 2023-11-26 10:06:55,623 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 500850 2023-11-26 10:07:06,490 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=3339046.6666666665, ans=0.125 2023-11-26 10:07:17,965 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=3339113.3333333335, ans=0.0 2023-11-26 10:07:25,272 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.232e+01 8.991e+01 9.385e+01 9.905e+01 1.223e+02, threshold=1.877e+02, percent-clipped=0.0 2023-11-26 10:07:27,589 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=3339180.0, ans=0.125 2023-11-26 10:07:28,426 INFO [train_asr.py:1235] (3/4) Epoch 42, batch 7900, loss[loss=0.07636, simple_loss=0.102, pruned_loss=0.01446, audio_tagging_loss=0.0109, over 15107.00 frames. ], tot_loss[loss=0.06644, simple_loss=0.09061, pruned_loss=0.01235, audio_tagging_loss=0.008786, over 3043803.36 frames. 
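The Whitening lines compare a per-module statistic against a limit (e.g. metric=6.07 vs. limit=10.0 above): a measure of how far grouped activations are from having an identity-like covariance, where 1.0 means perfectly white and larger values mean correlated channels or uneven variances. One plausible formulation is sketched below; it mirrors the spirit of scaling.py's Whiten module, not necessarily its exact code.

```python
import torch

def whitening_metric(x: torch.Tensor, num_groups: int) -> torch.Tensor:
    # x: (num_frames, num_channels); channels are split into groups, and the
    # metric is sum(cov^2) normalized so that cov = c*I gives exactly 1.0.
    n, c = x.shape
    g = c // num_groups
    x = x.reshape(n, num_groups, g).permute(1, 2, 0)      # (groups, g, frames)
    x = x - x.mean(dim=2, keepdim=True)
    cov = torch.matmul(x, x.transpose(1, 2)) / n          # (groups, g, g)
    diag_mean = cov.diagonal(dim1=1, dim2=2).mean(dim=1)  # mean variance
    metric = (cov ** 2).sum(dim=(1, 2)) / (g * diag_mean ** 2)
    return metric.mean()

x = torch.randn(1000, 256)
print(whitening_metric(x, num_groups=1))  # ~1.0 for white noise
```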
], batch size: 57, lr: 1.61e-03, grad_scale: 32.0 2023-11-26 10:07:52,008 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 500900 2023-11-26 10:08:20,090 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3339446.6666666665, ans=0.125 2023-11-26 10:08:23,021 INFO [train_asr.py:1235] (3/4) Epoch 42, batch 7950, loss[loss=0.08278, simple_loss=0.1149, pruned_loss=0.01578, audio_tagging_loss=0.009569, over 14951.00 frames. ], tot_loss[loss=0.06622, simple_loss=0.09009, pruned_loss=0.01231, audio_tagging_loss=0.008867, over 3039783.97 frames. ], batch size: 54, lr: 1.61e-03, grad_scale: 32.0 2023-11-26 10:08:27,487 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=3339513.3333333335, ans=0.125 2023-11-26 10:08:29,928 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=7.61 vs. limit=15.0 2023-11-26 10:08:29,980 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.17 vs. limit=6.0 2023-11-26 10:08:38,283 WARNING [train_asr.py:1481] (3/4) Exclude cut with ID unbalanced/uQjH4tNUZ_g_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-26 10:08:42,127 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-26 10:08:46,433 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=3339646.6666666665, ans=0.0 2023-11-26 10:08:46,759 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=16.61 vs. limit=22.5 2023-11-26 10:08:47,342 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 500950 2023-11-26 10:09:08,642 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=9.09 vs. limit=15.0 2023-11-26 10:09:10,809 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=3339780.0, ans=0.0 2023-11-26 10:09:15,821 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.552e+01 8.923e+01 9.464e+01 1.020e+02 1.321e+02, threshold=1.893e+02, percent-clipped=0.0 2023-11-26 10:09:16,730 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3339780.0, ans=0.0 2023-11-26 10:09:19,575 INFO [train_asr.py:1235] (3/4) Epoch 42, batch 8000, loss[loss=0.04184, simple_loss=0.05575, pruned_loss=0.003405, audio_tagging_loss=0.01056, over 15302.00 frames. ], tot_loss[loss=0.06542, simple_loss=0.0887, pruned_loss=0.01201, audio_tagging_loss=0.009059, over 3033420.20 frames. ], batch size: 60, lr: 1.61e-03, grad_scale: 32.0 2023-11-26 10:09:29,283 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.1.conv_module2.whiten, num_groups=1, num_channels=192, metric=13.31 vs. 
limit=15.0 2023-11-26 10:09:29,903 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.max_positive, batch_count=3339913.3333333335, ans=0.95 2023-11-26 10:09:36,244 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=3339913.3333333335, ans=0.0 2023-11-26 10:09:37,224 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=3339913.3333333335, ans=0.125 2023-11-26 10:09:42,425 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 501000 2023-11-26 10:09:45,959 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.min_abs, batch_count=3339980.0, ans=0.5 2023-11-26 10:09:46,954 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=3339980.0, ans=0.04949747468305833 2023-11-26 10:09:55,462 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3340046.6666666665, ans=0.125 2023-11-26 10:10:15,404 INFO [train_asr.py:1235] (3/4) Epoch 42, batch 8050, loss[loss=0.07463, simple_loss=0.1001, pruned_loss=0.01484, audio_tagging_loss=0.009751, over 15393.00 frames. ], tot_loss[loss=0.06584, simple_loss=0.08896, pruned_loss=0.01224, audio_tagging_loss=0.009119, over 3033390.79 frames. ], batch size: 55, lr: 1.61e-03, grad_scale: 32.0 2023-11-26 10:10:15,612 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=3340180.0, ans=0.2 2023-11-26 10:10:15,956 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=4.76 vs. limit=6.0 2023-11-26 10:10:16,560 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=3340180.0, ans=0.2 2023-11-26 10:10:25,117 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=3340246.6666666665, ans=0.04949747468305833 2023-11-26 10:10:38,625 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 501050 2023-11-26 10:11:06,535 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3340446.6666666665, ans=0.1 2023-11-26 10:11:07,297 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 6.954e+01 8.742e+01 9.437e+01 9.965e+01 1.262e+02, threshold=1.887e+02, percent-clipped=0.0 2023-11-26 10:11:10,521 INFO [train_asr.py:1235] (3/4) Epoch 42, batch 8100, loss[loss=0.0557, simple_loss=0.07509, pruned_loss=0.00979, audio_tagging_loss=0.008371, over 15463.00 frames. ], tot_loss[loss=0.06617, simple_loss=0.08965, pruned_loss=0.01232, audio_tagging_loss=0.009028, over 3044146.60 frames. ], batch size: 62, lr: 1.61e-03, grad_scale: 32.0 2023-11-26 10:11:10,799 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=3340513.3333333335, ans=0.0 2023-11-26 10:11:11,031 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module1.whiten, num_groups=1, num_channels=512, metric=4.64 vs. 
limit=15.0 2023-11-26 10:11:35,114 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 501100 2023-11-26 10:11:43,742 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3340713.3333333335, ans=0.1 2023-11-26 10:11:48,281 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.whiten.whitening_limit, batch_count=3340713.3333333335, ans=12.0 2023-11-26 10:11:54,227 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=3340780.0, ans=0.0 2023-11-26 10:11:54,345 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3340780.0, ans=0.125 2023-11-26 10:11:58,999 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3340780.0, ans=0.1 2023-11-26 10:12:06,554 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.min_positive, batch_count=3340846.6666666665, ans=0.025 2023-11-26 10:12:07,296 INFO [train_asr.py:1235] (3/4) Epoch 42, batch 8150, loss[loss=0.07627, simple_loss=0.103, pruned_loss=0.01485, audio_tagging_loss=0.009941, over 14127.00 frames. ], tot_loss[loss=0.06652, simple_loss=0.09028, pruned_loss=0.01245, audio_tagging_loss=0.00893, over 3041461.75 frames. ], batch size: 52, lr: 1.61e-03, grad_scale: 32.0 2023-11-26 10:12:14,286 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=3340846.6666666665, ans=0.0 2023-11-26 10:12:17,543 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3340913.3333333335, ans=0.0 2023-11-26 10:12:18,562 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-26 10:12:20,743 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=3340913.3333333335, ans=0.0 2023-11-26 10:12:27,495 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.76 vs. limit=6.0 2023-11-26 10:12:30,326 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 501150 2023-11-26 10:12:39,551 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.min_positive, batch_count=3341046.6666666665, ans=0.05 2023-11-26 10:12:49,758 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.whiten_keys.whitening_limit, batch_count=3341046.6666666665, ans=6.0 2023-11-26 10:12:55,880 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=3341113.3333333335, ans=0.0 2023-11-26 10:13:00,382 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.443e+01 8.798e+01 9.302e+01 1.005e+02 1.230e+02, threshold=1.860e+02, percent-clipped=0.0 2023-11-26 10:13:02,550 INFO [train_asr.py:1235] (3/4) Epoch 42, batch 8200, loss[loss=0.06932, simple_loss=0.09721, pruned_loss=0.01518, audio_tagging_loss=0.005529, over 16112.00 frames. ], tot_loss[loss=0.06614, simple_loss=0.08989, pruned_loss=0.01236, audio_tagging_loss=0.008837, over 3042287.05 frames. 
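The WARNING [train_asr.py:1481] entries (09:49:33 and 10:08:38 above; another follows at 10:13:03) drop AudioSet dummy-transcript cuts whose subsampled frame count (23) is smaller than their token count (24), since a transducer alignment needs at least one frame per output token. A sketch of that filter follows; the exact subsampling arithmetic is inferred from the single 100 -> 23 data point in the warnings and should be treated as an assumption.

```python
def frames_after_subsampling(num_frames: int) -> int:
    # Consistent with the WARNING lines (100 frames -> 23) under an overall
    # subsampling factor of 4 with convolutional edge effects.
    return ((num_frames - 7) // 2) // 2

def keep_cut(num_frames: int, num_tokens: int) -> bool:
    # Exclude cuts that cannot be aligned: fewer subsampled frames than tokens.
    return frames_after_subsampling(num_frames) >= num_tokens

print(frames_after_subsampling(100))  # 23, as in the log
print(keep_cut(100, 24))              # False -> "Exclude cut ..." warning
```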
], batch size: 58, lr: 1.61e-03, grad_scale: 16.0 2023-11-26 10:13:03,684 WARNING [train_asr.py:1481] (3/4) Exclude cut with ID unbalanced/8C7biyx9TQ4_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-26 10:13:23,293 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3341313.3333333335, ans=0.125 2023-11-26 10:13:25,290 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 501200 2023-11-26 10:13:28,867 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-26 10:13:32,636 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=3341313.3333333335, ans=0.0 2023-11-26 10:13:32,687 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=3341313.3333333335, ans=0.125 2023-11-26 10:13:57,319 INFO [train_asr.py:1235] (3/4) Epoch 42, batch 8250, loss[loss=0.06668, simple_loss=0.08587, pruned_loss=0.0114, audio_tagging_loss=0.01234, over 14843.00 frames. ], tot_loss[loss=0.06618, simple_loss=0.09003, pruned_loss=0.01241, audio_tagging_loss=0.00875, over 3041213.15 frames. ], batch size: 57, lr: 1.61e-03, grad_scale: 16.0 2023-11-26 10:14:06,823 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module1.whiten, num_groups=1, num_channels=512, metric=4.21 vs. limit=15.0 2023-11-26 10:14:16,157 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=3341580.0, ans=0.125 2023-11-26 10:14:17,214 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3341580.0, ans=0.1 2023-11-26 10:14:21,699 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 501250 2023-11-26 10:14:22,796 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3341646.6666666665, ans=0.125 2023-11-26 10:14:23,163 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=8.82 vs. limit=15.0 2023-11-26 10:14:37,616 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=3341713.3333333335, ans=0.125 2023-11-26 10:14:37,745 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=3341713.3333333335, ans=0.07 2023-11-26 10:14:45,075 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3341780.0, ans=0.125 2023-11-26 10:14:50,645 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.634e+01 8.840e+01 9.471e+01 1.008e+02 1.505e+02, threshold=1.894e+02, percent-clipped=0.0 2023-11-26 10:14:52,772 INFO [train_asr.py:1235] (3/4) Epoch 42, batch 8300, loss[loss=0.07125, simple_loss=0.1013, pruned_loss=0.01233, audio_tagging_loss=0.008265, over 15427.00 frames. 
], tot_loss[loss=0.06634, simple_loss=0.09022, pruned_loss=0.01243, audio_tagging_loss=0.008805, over 3040133.78 frames. ], batch size: 55, lr: 1.61e-03, grad_scale: 16.0 2023-11-26 10:15:16,625 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 501300 2023-11-26 10:15:49,213 INFO [train_asr.py:1235] (3/4) Epoch 42, batch 8350, loss[loss=0.04543, simple_loss=0.05522, pruned_loss=0.006539, audio_tagging_loss=0.01128, over 15147.00 frames. ], tot_loss[loss=0.06592, simple_loss=0.08958, pruned_loss=0.0123, audio_tagging_loss=0.008827, over 3038243.63 frames. ], batch size: 57, lr: 1.61e-03, grad_scale: 16.0 2023-11-26 10:15:51,608 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3342180.0, ans=0.125 2023-11-26 10:15:53,029 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=7.36 vs. limit=15.0 2023-11-26 10:15:59,897 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3342246.6666666665, ans=0.125 2023-11-26 10:16:00,995 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=3342246.6666666665, ans=0.0 2023-11-26 10:16:03,044 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3342246.6666666665, ans=0.0 2023-11-26 10:16:11,930 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 501350 2023-11-26 10:16:40,253 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3342446.6666666665, ans=0.1 2023-11-26 10:16:42,095 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.412e+01 8.918e+01 9.503e+01 1.018e+02 1.589e+02, threshold=1.901e+02, percent-clipped=0.0 2023-11-26 10:16:44,280 INFO [train_asr.py:1235] (3/4) Epoch 42, batch 8400, loss[loss=0.05039, simple_loss=0.06834, pruned_loss=0.008883, audio_tagging_loss=0.007338, over 15091.00 frames. ], tot_loss[loss=0.06637, simple_loss=0.09035, pruned_loss=0.01244, audio_tagging_loss=0.00876, over 3040492.91 frames. ], batch size: 57, lr: 1.61e-03, grad_scale: 32.0 2023-11-26 10:17:08,414 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 501400 2023-11-26 10:17:17,766 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=3342713.3333333335, ans=0.125 2023-11-26 10:17:18,954 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=3342713.3333333335, ans=0.0 2023-11-26 10:17:22,126 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3342713.3333333335, ans=0.125 2023-11-26 10:17:25,280 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=3342713.3333333335, ans=0.125 2023-11-26 10:17:25,762 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=18.45 vs. 
limit=22.5 2023-11-26 10:17:27,414 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=3342713.3333333335, ans=0.0 2023-11-26 10:17:40,359 INFO [train_asr.py:1235] (3/4) Epoch 42, batch 8450, loss[loss=0.0722, simple_loss=0.1036, pruned_loss=0.01272, audio_tagging_loss=0.007682, over 15102.00 frames. ], tot_loss[loss=0.06712, simple_loss=0.09151, pruned_loss=0.0127, audio_tagging_loss=0.00866, over 3045973.39 frames. ], batch size: 56, lr: 1.60e-03, grad_scale: 32.0 2023-11-26 10:17:58,071 INFO [scaling.py:1022] (3/4) Whitening: name=encoder_embed.out_whiten, num_groups=1, num_channels=192, metric=6.48 vs. limit=8.0 2023-11-26 10:18:01,613 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=3342980.0, ans=0.0 2023-11-26 10:18:01,803 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=13.44 vs. limit=15.0 2023-11-26 10:18:03,599 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 501450 2023-11-26 10:18:33,661 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.695e+01 8.791e+01 9.317e+01 9.949e+01 1.409e+02, threshold=1.863e+02, percent-clipped=0.0 2023-11-26 10:18:34,215 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=18.90 vs. limit=22.5 2023-11-26 10:18:36,382 INFO [train_asr.py:1235] (3/4) Epoch 42, batch 8500, loss[loss=0.08903, simple_loss=0.1265, pruned_loss=0.02062, audio_tagging_loss=0.005171, over 14820.00 frames. ], tot_loss[loss=0.06742, simple_loss=0.09192, pruned_loss=0.01283, audio_tagging_loss=0.008631, over 3046497.72 frames. ], batch size: 53, lr: 1.60e-03, grad_scale: 32.0 2023-11-26 10:18:59,121 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 501500 2023-11-26 10:19:07,602 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=3343380.0, ans=0.125 2023-11-26 10:19:11,687 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=9.40 vs. limit=15.0 2023-11-26 10:19:24,128 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=3343446.6666666665, ans=0.0 2023-11-26 10:19:29,530 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3343446.6666666665, ans=0.125 2023-11-26 10:19:31,398 INFO [train_asr.py:1235] (3/4) Epoch 42, batch 8550, loss[loss=0.0474, simple_loss=0.0573, pruned_loss=0.008539, audio_tagging_loss=0.01021, over 16072.00 frames. ], tot_loss[loss=0.06697, simple_loss=0.09141, pruned_loss=0.01265, audio_tagging_loss=0.008612, over 3046681.31 frames. 
], batch size: 61, lr: 1.60e-03, grad_scale: 32.0 2023-11-26 10:19:38,058 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=3343513.3333333335, ans=0.125 2023-11-26 10:19:40,142 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=3343513.3333333335, ans=0.0 2023-11-26 10:19:41,062 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=3343580.0, ans=0.0 2023-11-26 10:19:54,773 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 501550 2023-11-26 10:20:04,521 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3343713.3333333335, ans=0.125 2023-11-26 10:20:22,441 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3343780.0, ans=0.125 2023-11-26 10:20:24,837 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.108e+01 8.767e+01 9.434e+01 1.006e+02 1.411e+02, threshold=1.887e+02, percent-clipped=0.0 2023-11-26 10:20:26,932 INFO [train_asr.py:1235] (3/4) Epoch 42, batch 8600, loss[loss=0.05787, simple_loss=0.07749, pruned_loss=0.00814, audio_tagging_loss=0.01099, over 16200.00 frames. ], tot_loss[loss=0.06662, simple_loss=0.09062, pruned_loss=0.01265, audio_tagging_loss=0.008665, over 3045946.72 frames. ], batch size: 61, lr: 1.60e-03, grad_scale: 32.0 2023-11-26 10:20:29,949 INFO [scaling.py:1022] (3/4) Whitening: name=encoder_embed.convnext.out_whiten, num_groups=1, num_channels=128, metric=4.18 vs. limit=5.0 2023-11-26 10:20:31,506 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=3343846.6666666665, ans=0.125 2023-11-26 10:20:38,359 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=3343913.3333333335, ans=0.125 2023-11-26 10:20:41,832 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=3343913.3333333335, ans=0.015 2023-11-26 10:20:50,921 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 501600 2023-11-26 10:20:59,805 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=3344046.6666666665, ans=0.0 2023-11-26 10:21:20,183 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=18.11 vs. limit=22.5 2023-11-26 10:21:23,363 INFO [train_asr.py:1235] (3/4) Epoch 42, batch 8650, loss[loss=0.08144, simple_loss=0.1107, pruned_loss=0.01798, audio_tagging_loss=0.008137, over 14307.00 frames. ], tot_loss[loss=0.06682, simple_loss=0.09084, pruned_loss=0.01271, audio_tagging_loss=0.008682, over 3052872.96 frames. ], batch size: 55, lr: 1.60e-03, grad_scale: 16.0 2023-11-26 10:21:42,116 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=3344246.6666666665, ans=0.0 2023-11-26 10:21:46,211 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 501650 2023-11-26 10:22:05,103 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.68 vs. 
limit=6.0 2023-11-26 10:22:10,272 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=3344446.6666666665, ans=0.125 2023-11-26 10:22:18,152 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.517e+01 8.843e+01 9.565e+01 1.046e+02 1.310e+02, threshold=1.913e+02, percent-clipped=0.0 2023-11-26 10:22:19,251 INFO [train_asr.py:1235] (3/4) Epoch 42, batch 8700, loss[loss=0.07103, simple_loss=0.09251, pruned_loss=0.01583, audio_tagging_loss=0.008948, over 15369.00 frames. ], tot_loss[loss=0.06703, simple_loss=0.09116, pruned_loss=0.01268, audio_tagging_loss=0.00877, over 3056406.82 frames. ], batch size: 57, lr: 1.60e-03, grad_scale: 16.0 2023-11-26 10:22:20,810 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=7.15 vs. limit=15.0 2023-11-26 10:22:24,058 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=512, metric=16.79 vs. limit=22.5 2023-11-26 10:22:26,958 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=3344513.3333333335, ans=0.0 2023-11-26 10:22:42,591 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 501700 2023-11-26 10:22:46,985 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=3344646.6666666665, ans=0.07 2023-11-26 10:22:50,686 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3344646.6666666665, ans=0.0 2023-11-26 10:23:00,252 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=10.25 vs. limit=15.0 2023-11-26 10:23:05,065 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=3344780.0, ans=0.0 2023-11-26 10:23:10,453 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=3344780.0, ans=0.0 2023-11-26 10:23:14,991 INFO [train_asr.py:1235] (3/4) Epoch 42, batch 8750, loss[loss=0.05052, simple_loss=0.06268, pruned_loss=0.008174, audio_tagging_loss=0.011, over 14220.00 frames. ], tot_loss[loss=0.06691, simple_loss=0.09105, pruned_loss=0.01254, audio_tagging_loss=0.008847, over 3046183.14 frames. 
], batch size: 55, lr: 1.60e-03, grad_scale: 16.0 2023-11-26 10:23:26,282 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3344913.3333333335, ans=0.1 2023-11-26 10:23:32,269 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=3344913.3333333335, ans=0.125 2023-11-26 10:23:37,652 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3344980.0, ans=0.1 2023-11-26 10:23:38,502 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 501750 2023-11-26 10:23:49,580 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3345046.6666666665, ans=0.125 2023-11-26 10:23:52,715 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00 2023-11-26 10:24:09,411 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.663e+01 8.969e+01 9.428e+01 9.946e+01 1.483e+02, threshold=1.886e+02, percent-clipped=0.0 2023-11-26 10:24:10,486 INFO [train_asr.py:1235] (3/4) Epoch 42, batch 8800, loss[loss=0.06685, simple_loss=0.09541, pruned_loss=0.01146, audio_tagging_loss=0.007683, over 15744.00 frames. ], tot_loss[loss=0.06716, simple_loss=0.09137, pruned_loss=0.01256, audio_tagging_loss=0.00891, over 3046955.19 frames. ], batch size: 60, lr: 1.60e-03, grad_scale: 32.0 2023-11-26 10:24:19,688 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=3345180.0, ans=0.09899494936611666 2023-11-26 10:24:31,467 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.1.conv_module2.whiten, num_groups=1, num_channels=192, metric=13.87 vs. limit=15.0 2023-11-26 10:24:34,040 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 501800 2023-11-26 10:24:41,298 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=3345313.3333333335, ans=0.0 2023-11-26 10:24:52,503 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=3345380.0, ans=0.125 2023-11-26 10:25:06,492 INFO [train_asr.py:1235] (3/4) Epoch 42, batch 8850, loss[loss=0.08771, simple_loss=0.1271, pruned_loss=0.01773, audio_tagging_loss=0.006422, over 15293.00 frames. ], tot_loss[loss=0.06728, simple_loss=0.09146, pruned_loss=0.01265, audio_tagging_loss=0.0089, over 3047042.58 frames. ], batch size: 54, lr: 1.60e-03, grad_scale: 16.0 2023-11-26 10:25:18,738 WARNING [train_asr.py:1481] (3/4) Exclude cut with ID unbalanced/1Dq7QH61iXQ_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. 
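
The recurring WARNING above (its token count is printed just below) follows a fixed pattern: a 1-second AudioSet clip (100 fbank frames at a 10 ms shift) is excluded because the encoder front-end subsamples it to 23 frames, fewer than the 24 sentencepiece tokens of its placeholder transcript, and the pruned transducer loss needs at least one encoder frame per token. AudioSet cuts carry the dummy text only as filler; they supply the audio-tagging target, not ASR supervision. A minimal sketch of the kind of filter that emits this warning follows. The helper names are illustrative; only the ((n - 7) // 2 + 1) // 2 subsampling arithmetic is taken from the logged frame counts, which it reproduces (100 -> 23).

```python
import logging

def frames_after_subsampling(num_frames: int) -> int:
    # Conv front-end plus 2x downsampling, matching the logged counts:
    # ((100 - 7) // 2 + 1) // 2 == 23.
    return ((num_frames - 7) // 2 + 1) // 2

def keep_cut(cut, sp) -> bool:
    """Drop cuts the pruned transducer loss cannot train on.

    `cut` is a lhotse Cut with precomputed fbank features; `sp` is the
    run's sentencepiece processor. Both parameter names are illustrative.
    """
    T = frames_after_subsampling(cut.num_frames)
    text = cut.supervisions[0].text
    tokens = sp.encode(text, out_type=str)
    if T < len(tokens):  # need at least one encoder frame per token
        logging.warning(
            f"Exclude cut with ID {cut.id} from training. "
            f"Number of frames (before subsampling): {cut.num_frames}. "
            f"Number of frames (after subsampling): {T}. "
            f"Text: {text}. Tokens: {tokens}. Number of tokens: {len(tokens)}"
        )
        return False
    return True
```
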
Number of tokens: 24 2023-11-26 10:25:19,843 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=3345580.0, ans=0.015 2023-11-26 10:25:29,803 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 501850 2023-11-26 10:25:38,258 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3345646.6666666665, ans=0.1 2023-11-26 10:25:45,675 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=3345713.3333333335, ans=0.125 2023-11-26 10:25:46,722 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3345713.3333333335, ans=0.125 2023-11-26 10:25:49,145 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=512, metric=18.78 vs. limit=22.5 2023-11-26 10:26:01,684 INFO [train_asr.py:1235] (3/4) Epoch 42, batch 8900, loss[loss=0.06793, simple_loss=0.09564, pruned_loss=0.01332, audio_tagging_loss=0.006791, over 15217.00 frames. ], tot_loss[loss=0.06718, simple_loss=0.0913, pruned_loss=0.01275, audio_tagging_loss=0.008786, over 3048226.61 frames. ], batch size: 58, lr: 1.60e-03, grad_scale: 8.0 2023-11-26 10:26:02,721 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.355e+01 8.662e+01 9.297e+01 1.004e+02 1.286e+02, threshold=1.859e+02, percent-clipped=0.0 2023-11-26 10:26:12,606 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=3345913.3333333335, ans=0.2 2023-11-26 10:26:14,137 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=7.52 vs. limit=15.0 2023-11-26 10:26:25,478 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 501900 2023-11-26 10:26:38,891 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=3346046.6666666665, ans=0.0 2023-11-26 10:26:47,027 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=3346113.3333333335, ans=0.125 2023-11-26 10:26:52,248 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=3346113.3333333335, ans=0.2 2023-11-26 10:26:55,956 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=3346113.3333333335, ans=0.0 2023-11-26 10:26:56,921 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=3346180.0, ans=0.125 2023-11-26 10:26:57,802 INFO [train_asr.py:1235] (3/4) Epoch 42, batch 8950, loss[loss=0.05487, simple_loss=0.08361, pruned_loss=0.006447, audio_tagging_loss=0.006615, over 16052.00 frames. ], tot_loss[loss=0.06683, simple_loss=0.09108, pruned_loss=0.01261, audio_tagging_loss=0.008684, over 3049009.58 frames. 
], batch size: 57, lr: 1.60e-03, grad_scale: 8.0 2023-11-26 10:27:00,160 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3346180.0, ans=0.1 2023-11-26 10:27:20,544 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 501950 2023-11-26 10:27:41,612 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=3346446.6666666665, ans=0.0 2023-11-26 10:27:46,323 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=3346446.6666666665, ans=0.0 2023-11-26 10:27:53,378 INFO [train_asr.py:1235] (3/4) Epoch 42, batch 9000, loss[loss=0.05947, simple_loss=0.08071, pruned_loss=0.01143, audio_tagging_loss=0.007681, over 15241.00 frames. ], tot_loss[loss=0.06697, simple_loss=0.09129, pruned_loss=0.01273, audio_tagging_loss=0.008595, over 3041432.54 frames. ], batch size: 57, lr: 1.60e-03, grad_scale: 8.0 2023-11-26 10:27:53,379 INFO [train_asr.py:1258] (3/4) Computing validation loss 2023-11-26 10:28:16,961 INFO [zipformer.py:1877] (3/4) name=encoder.encoders.3.encoder.layers.0.self_attn_weights, attn_weights_entropy = tensor([3.8973, 1.6441, 3.4676, 3.0258, 2.8673, 3.0797, 3.0425, 3.1982], device='cuda:3') 2023-11-26 10:28:26,042 INFO [train_asr.py:1267] (3/4) Epoch 42, validation: loss=0.05901, simple_loss=0.0506, pruned_loss=0.005264, audio_tagging_loss=0.02845, over 4681554.00 frames. 2023-11-26 10:28:26,043 INFO [train_asr.py:1268] (3/4) Maximum memory allocated so far is 24894MB 2023-11-26 10:28:27,064 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.648e+01 8.869e+01 9.478e+01 9.908e+01 1.192e+02, threshold=1.896e+02, percent-clipped=0.0 2023-11-26 10:28:39,237 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=3346580.0, ans=0.125 2023-11-26 10:28:48,521 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=14.55 vs. limit=15.0 2023-11-26 10:28:49,210 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 502000 2023-11-26 10:29:11,811 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3346780.0, ans=0.125 2023-11-26 10:29:16,558 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=3346780.0, ans=0.0 2023-11-26 10:29:21,683 INFO [train_asr.py:1235] (3/4) Epoch 42, batch 9050, loss[loss=0.06418, simple_loss=0.08249, pruned_loss=0.01066, audio_tagging_loss=0.01227, over 15322.00 frames. ], tot_loss[loss=0.06732, simple_loss=0.09188, pruned_loss=0.01273, audio_tagging_loss=0.008645, over 3039360.38 frames. 
], batch size: 58, lr: 1.60e-03, grad_scale: 8.0 2023-11-26 10:29:26,518 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3346846.6666666665, ans=0.125 2023-11-26 10:29:36,259 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3346913.3333333335, ans=0.1 2023-11-26 10:29:44,473 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 502050 2023-11-26 10:29:53,122 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=3346980.0, ans=0.2 2023-11-26 10:30:00,027 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=3347046.6666666665, ans=0.0 2023-11-26 10:30:11,472 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=11.87 vs. limit=15.0 2023-11-26 10:30:12,254 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3347113.3333333335, ans=0.125 2023-11-26 10:30:14,470 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3347113.3333333335, ans=0.125 2023-11-26 10:30:17,415 INFO [train_asr.py:1235] (3/4) Epoch 42, batch 9100, loss[loss=0.05294, simple_loss=0.06881, pruned_loss=0.008421, audio_tagging_loss=0.01011, over 14456.00 frames. ], tot_loss[loss=0.06611, simple_loss=0.09027, pruned_loss=0.01238, audio_tagging_loss=0.008588, over 3038240.13 frames. ], batch size: 56, lr: 1.60e-03, grad_scale: 8.0 2023-11-26 10:30:18,471 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 6.744e+01 8.587e+01 9.359e+01 1.002e+02 1.217e+02, threshold=1.872e+02, percent-clipped=0.0 2023-11-26 10:30:33,224 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3347246.6666666665, ans=0.0 2023-11-26 10:30:37,126 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=3347246.6666666665, ans=0.0 2023-11-26 10:30:41,769 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 502100 2023-11-26 10:30:59,543 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=3347380.0, ans=0.2 2023-11-26 10:31:12,309 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-26 10:31:13,170 INFO [train_asr.py:1235] (3/4) Epoch 42, batch 9150, loss[loss=0.0653, simple_loss=0.09618, pruned_loss=0.01117, audio_tagging_loss=0.00604, over 15499.00 frames. ], tot_loss[loss=0.06606, simple_loss=0.09004, pruned_loss=0.0124, audio_tagging_loss=0.008645, over 3034890.15 frames. 
], batch size: 58, lr: 1.60e-03, grad_scale: 8.0 2023-11-26 10:31:18,717 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3347513.3333333335, ans=0.125 2023-11-26 10:31:21,797 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=3347513.3333333335, ans=0.0 2023-11-26 10:31:34,457 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3347580.0, ans=0.125 2023-11-26 10:31:37,539 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 502150 2023-11-26 10:31:38,837 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3347646.6666666665, ans=0.125 2023-11-26 10:31:38,963 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=6.64 vs. limit=15.0 2023-11-26 10:32:10,074 INFO [train_asr.py:1235] (3/4) Epoch 42, batch 9200, loss[loss=0.06197, simple_loss=0.07937, pruned_loss=0.01318, audio_tagging_loss=0.009106, over 14748.00 frames. ], tot_loss[loss=0.06586, simple_loss=0.09003, pruned_loss=0.01224, audio_tagging_loss=0.0086, over 3044090.37 frames. ], batch size: 57, lr: 1.60e-03, grad_scale: 16.0 2023-11-26 10:32:11,101 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.386e+01 8.621e+01 9.324e+01 1.045e+02 1.374e+02, threshold=1.865e+02, percent-clipped=0.0 2023-11-26 10:32:12,528 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3347846.6666666665, ans=0.125 2023-11-26 10:32:20,390 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=3347913.3333333335, ans=0.2 2023-11-26 10:32:20,546 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3347913.3333333335, ans=0.125 2023-11-26 10:32:33,158 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 502200 2023-11-26 10:32:33,367 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=3347980.0, ans=0.125 2023-11-26 10:33:00,488 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=3348113.3333333335, ans=0.0 2023-11-26 10:33:06,575 INFO [train_asr.py:1235] (3/4) Epoch 42, batch 9250, loss[loss=0.06596, simple_loss=0.09843, pruned_loss=0.009407, audio_tagging_loss=0.007339, over 14959.00 frames. ], tot_loss[loss=0.06627, simple_loss=0.09085, pruned_loss=0.0123, audio_tagging_loss=0.008552, over 3048541.69 frames. ], batch size: 56, lr: 1.60e-03, grad_scale: 16.0 2023-11-26 10:33:09,498 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.1.whiten, num_groups=1, num_channels=192, metric=3.59 vs. limit=12.0 2023-11-26 10:33:15,386 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=3348180.0, ans=0.125 2023-11-26 10:33:27,632 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=4.68 vs. 
limit=12.0 2023-11-26 10:33:30,100 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 502250 2023-11-26 10:33:42,169 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=11.21 vs. limit=15.0 2023-11-26 10:33:47,209 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=3348380.0, ans=0.0 2023-11-26 10:33:48,264 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.min_positive, batch_count=3348380.0, ans=0.025 2023-11-26 10:33:51,576 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3348446.6666666665, ans=0.125 2023-11-26 10:34:02,052 INFO [train_asr.py:1235] (3/4) Epoch 42, batch 9300, loss[loss=0.08018, simple_loss=0.1169, pruned_loss=0.01424, audio_tagging_loss=0.007467, over 15498.00 frames. ], tot_loss[loss=0.06607, simple_loss=0.09041, pruned_loss=0.01231, audio_tagging_loss=0.008553, over 3049802.42 frames. ], batch size: 58, lr: 1.60e-03, grad_scale: 16.0 2023-11-26 10:34:03,056 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.207e+01 8.725e+01 9.420e+01 1.023e+02 1.550e+02, threshold=1.884e+02, percent-clipped=0.0 2023-11-26 10:34:12,331 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3348580.0, ans=0.125 2023-11-26 10:34:15,018 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=3348580.0, ans=0.0 2023-11-26 10:34:26,043 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 502300 2023-11-26 10:34:26,621 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=18.97 vs. limit=22.5 2023-11-26 10:34:37,914 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=3348713.3333333335, ans=0.125 2023-11-26 10:34:43,472 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=12.17 vs. limit=15.0 2023-11-26 10:34:53,395 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3348780.0, ans=0.1 2023-11-26 10:34:53,816 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=8.21 vs. limit=15.0 2023-11-26 10:34:54,355 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=3348780.0, ans=0.0 2023-11-26 10:34:57,937 INFO [train_asr.py:1235] (3/4) Epoch 42, batch 9350, loss[loss=0.03653, simple_loss=0.03769, pruned_loss=0.004535, audio_tagging_loss=0.01315, over 15063.00 frames. ], tot_loss[loss=0.06615, simple_loss=0.09032, pruned_loss=0.01231, audio_tagging_loss=0.008678, over 3049430.51 frames. 
], batch size: 60, lr: 1.60e-03, grad_scale: 16.0 2023-11-26 10:35:04,575 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=3348846.6666666665, ans=0.125 2023-11-26 10:35:17,295 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3348913.3333333335, ans=0.0 2023-11-26 10:35:21,329 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 502350 2023-11-26 10:35:54,258 INFO [train_asr.py:1235] (3/4) Epoch 42, batch 9400, loss[loss=0.05582, simple_loss=0.06867, pruned_loss=0.008635, audio_tagging_loss=0.01285, over 15390.00 frames. ], tot_loss[loss=0.06649, simple_loss=0.09041, pruned_loss=0.01248, audio_tagging_loss=0.008799, over 3045824.38 frames. ], batch size: 61, lr: 1.60e-03, grad_scale: 16.0 2023-11-26 10:35:55,296 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.423e+01 8.779e+01 9.527e+01 1.025e+02 1.453e+02, threshold=1.905e+02, percent-clipped=0.0 2023-11-26 10:36:00,886 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=3349180.0, ans=0.0 2023-11-26 10:36:07,327 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=3349246.6666666665, ans=0.0 2023-11-26 10:36:17,348 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 502400 2023-11-26 10:36:21,545 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3349313.3333333335, ans=0.1 2023-11-26 10:36:28,703 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=14.85 vs. limit=22.5 2023-11-26 10:36:28,817 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=512, metric=2.82 vs. limit=15.0 2023-11-26 10:36:41,945 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=3349446.6666666665, ans=0.2 2023-11-26 10:36:50,163 INFO [train_asr.py:1235] (3/4) Epoch 42, batch 9450, loss[loss=0.05547, simple_loss=0.0692, pruned_loss=0.009715, audio_tagging_loss=0.01116, over 15135.00 frames. ], tot_loss[loss=0.06636, simple_loss=0.09017, pruned_loss=0.01244, audio_tagging_loss=0.008839, over 3046427.71 frames. ], batch size: 58, lr: 1.60e-03, grad_scale: 16.0 2023-11-26 10:36:50,214 WARNING [train_asr.py:1481] (3/4) Exclude cut with ID unbalanced/jmSuJWEIizA_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. 
Number of tokens: 24 2023-11-26 10:37:00,528 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=3349580.0, ans=0.2 2023-11-26 10:37:06,468 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=3349580.0, ans=0.0 2023-11-26 10:37:14,688 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 502450 2023-11-26 10:37:25,383 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=3349713.3333333335, ans=0.04949747468305833 2023-11-26 10:37:28,673 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=3349713.3333333335, ans=0.125 2023-11-26 10:37:33,926 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3349780.0, ans=0.125 2023-11-26 10:37:36,373 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=5.35 vs. limit=15.0 2023-11-26 10:37:46,024 INFO [train_asr.py:1235] (3/4) Epoch 42, batch 9500, loss[loss=0.07594, simple_loss=0.1055, pruned_loss=0.01267, audio_tagging_loss=0.01053, over 14812.00 frames. ], tot_loss[loss=0.06643, simple_loss=0.09021, pruned_loss=0.01245, audio_tagging_loss=0.008866, over 3053202.84 frames. ], batch size: 55, lr: 1.60e-03, grad_scale: 16.0 2023-11-26 10:37:47,601 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.536e+01 8.787e+01 9.530e+01 1.013e+02 1.442e+02, threshold=1.906e+02, percent-clipped=0.0 2023-11-26 10:37:48,968 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=3349846.6666666665, ans=0.125 2023-11-26 10:37:52,102 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=3349846.6666666665, ans=0.0 2023-11-26 10:37:54,700 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=3349846.6666666665, ans=0.2 2023-11-26 10:38:09,470 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 502500 2023-11-26 10:38:20,688 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=3350046.6666666665, ans=0.125 2023-11-26 10:38:37,810 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=3350113.3333333335, ans=0.0 2023-11-26 10:38:42,497 INFO [train_asr.py:1235] (3/4) Epoch 42, batch 9550, loss[loss=0.07223, simple_loss=0.1038, pruned_loss=0.01106, audio_tagging_loss=0.009257, over 14286.00 frames. ], tot_loss[loss=0.06633, simple_loss=0.09019, pruned_loss=0.01239, audio_tagging_loss=0.008839, over 3052765.68 frames. 
], batch size: 53, lr: 1.60e-03, grad_scale: 16.0 2023-11-26 10:38:47,099 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3350180.0, ans=0.125 2023-11-26 10:38:48,086 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=3350180.0, ans=0.0 2023-11-26 10:39:05,363 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 502550 2023-11-26 10:39:14,563 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=3350380.0, ans=0.2 2023-11-26 10:39:37,564 INFO [train_asr.py:1235] (3/4) Epoch 42, batch 9600, loss[loss=0.06232, simple_loss=0.08401, pruned_loss=0.01127, audio_tagging_loss=0.009043, over 14795.00 frames. ], tot_loss[loss=0.06658, simple_loss=0.09029, pruned_loss=0.0125, audio_tagging_loss=0.00893, over 3049434.49 frames. ], batch size: 56, lr: 1.60e-03, grad_scale: 32.0 2023-11-26 10:39:38,620 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.737e+01 8.684e+01 9.310e+01 1.004e+02 1.298e+02, threshold=1.862e+02, percent-clipped=0.0 2023-11-26 10:39:39,194 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=9.47 vs. limit=15.0 2023-11-26 10:39:41,908 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3350513.3333333335, ans=0.125 2023-11-26 10:39:45,704 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=3350513.3333333335, ans=0.0 2023-11-26 10:39:49,823 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3350580.0, ans=0.125 2023-11-26 10:39:55,927 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3350580.0, ans=0.125 2023-11-26 10:39:57,846 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=3350580.0, ans=0.0 2023-11-26 10:39:57,866 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3350580.0, ans=0.1 2023-11-26 10:40:01,561 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 502600 2023-11-26 10:40:04,172 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=3350646.6666666665, ans=0.125 2023-11-26 10:40:26,353 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=6.64 vs. limit=15.0 2023-11-26 10:40:30,851 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3350780.0, ans=0.125 2023-11-26 10:40:33,883 INFO [train_asr.py:1235] (3/4) Epoch 42, batch 9650, loss[loss=0.07035, simple_loss=0.09185, pruned_loss=0.01615, audio_tagging_loss=0.008279, over 15913.00 frames. ], tot_loss[loss=0.06677, simple_loss=0.09051, pruned_loss=0.01256, audio_tagging_loss=0.00896, over 3051679.09 frames. 
], batch size: 60, lr: 1.60e-03, grad_scale: 32.0 2023-11-26 10:40:36,233 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=3350846.6666666665, ans=0.0 2023-11-26 10:40:42,228 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3350846.6666666665, ans=0.1 2023-11-26 10:40:53,824 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.max_positive, batch_count=3350913.3333333335, ans=0.95 2023-11-26 10:40:54,777 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3350913.3333333335, ans=0.125 2023-11-26 10:40:57,895 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 502650 2023-11-26 10:41:30,419 INFO [train_asr.py:1235] (3/4) Epoch 42, batch 9700, loss[loss=0.08096, simple_loss=0.1173, pruned_loss=0.01506, audio_tagging_loss=0.007264, over 16535.00 frames. ], tot_loss[loss=0.06656, simple_loss=0.09056, pruned_loss=0.0125, audio_tagging_loss=0.008776, over 3052382.18 frames. ], batch size: 59, lr: 1.60e-03, grad_scale: 32.0 2023-11-26 10:41:31,448 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.274e+01 8.780e+01 9.294e+01 1.006e+02 1.332e+02, threshold=1.859e+02, percent-clipped=0.0 2023-11-26 10:41:36,512 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3351180.0, ans=0.125 2023-11-26 10:41:42,208 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=7.43 vs. limit=15.0 2023-11-26 10:41:53,991 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 502700 2023-11-26 10:42:06,419 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=3351380.0, ans=0.125 2023-11-26 10:42:08,587 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=3351380.0, ans=0.0 2023-11-26 10:42:18,804 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3351446.6666666665, ans=0.1 2023-11-26 10:42:26,556 INFO [train_asr.py:1235] (3/4) Epoch 42, batch 9750, loss[loss=0.06694, simple_loss=0.0895, pruned_loss=0.01448, audio_tagging_loss=0.007713, over 15492.00 frames. ], tot_loss[loss=0.06641, simple_loss=0.09052, pruned_loss=0.01246, audio_tagging_loss=0.008685, over 3052194.27 frames. 
], batch size: 59, lr: 1.60e-03, grad_scale: 32.0 2023-11-26 10:42:28,850 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3351513.3333333335, ans=0.125 2023-11-26 10:42:28,951 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3351513.3333333335, ans=0.125 2023-11-26 10:42:36,777 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=3351580.0, ans=0.2 2023-11-26 10:42:37,800 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=3351580.0, ans=0.0 2023-11-26 10:42:49,943 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 502750 2023-11-26 10:43:15,640 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.min_positive, batch_count=3351780.0, ans=0.025 2023-11-26 10:43:21,295 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=3351846.6666666665, ans=0.015 2023-11-26 10:43:22,286 INFO [train_asr.py:1235] (3/4) Epoch 42, batch 9800, loss[loss=0.04089, simple_loss=0.05367, pruned_loss=0.005439, audio_tagging_loss=0.008613, over 15624.00 frames. ], tot_loss[loss=0.06642, simple_loss=0.09096, pruned_loss=0.01241, audio_tagging_loss=0.008528, over 3053868.22 frames. ], batch size: 59, lr: 1.60e-03, grad_scale: 32.0 2023-11-26 10:43:23,299 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.375e+01 9.016e+01 9.407e+01 1.014e+02 1.286e+02, threshold=1.881e+02, percent-clipped=0.0 2023-11-26 10:43:26,887 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=12.91 vs. limit=22.5 2023-11-26 10:43:41,488 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=3351913.3333333335, ans=0.125 2023-11-26 10:43:43,525 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-11-26 10:43:45,562 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 502800 2023-11-26 10:43:52,970 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=3351980.0, ans=0.0 2023-11-26 10:43:54,187 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=11.35 vs. limit=22.5 2023-11-26 10:44:06,158 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3352113.3333333335, ans=0.1 2023-11-26 10:44:14,072 WARNING [train_asr.py:1481] (3/4) Exclude cut with ID unbalanced/Bo4LcZjitzU_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. 
Number of tokens: 24 2023-11-26 10:44:15,464 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=3352113.3333333335, ans=0.0 2023-11-26 10:44:18,870 INFO [train_asr.py:1235] (3/4) Epoch 42, batch 9850, loss[loss=0.05219, simple_loss=0.07566, pruned_loss=0.007726, audio_tagging_loss=0.006633, over 13564.00 frames. ], tot_loss[loss=0.06613, simple_loss=0.09045, pruned_loss=0.0123, audio_tagging_loss=0.008602, over 3058324.06 frames. ], batch size: 52, lr: 1.60e-03, grad_scale: 16.0 2023-11-26 10:44:24,431 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=3352180.0, ans=0.0 2023-11-26 10:44:28,678 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-11-26 10:44:41,060 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=3352313.3333333335, ans=0.2 2023-11-26 10:44:41,938 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 502850 2023-11-26 10:44:48,888 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3352313.3333333335, ans=0.1 2023-11-26 10:44:52,140 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.max_abs, batch_count=3352380.0, ans=10.0 2023-11-26 10:44:55,841 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=3352380.0, ans=0.0 2023-11-26 10:45:09,136 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3352446.6666666665, ans=0.1 2023-11-26 10:45:12,534 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=5.13 vs. limit=10.0 2023-11-26 10:45:13,855 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=13.19 vs. limit=22.5 2023-11-26 10:45:14,243 INFO [train_asr.py:1235] (3/4) Epoch 42, batch 9900, loss[loss=0.05573, simple_loss=0.08584, pruned_loss=0.005705, audio_tagging_loss=0.007103, over 15146.00 frames. ], tot_loss[loss=0.0659, simple_loss=0.09002, pruned_loss=0.0123, audio_tagging_loss=0.008596, over 3056356.31 frames. ], batch size: 57, lr: 1.60e-03, grad_scale: 16.0 2023-11-26 10:45:16,978 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.238e+01 8.825e+01 9.361e+01 1.007e+02 1.352e+02, threshold=1.872e+02, percent-clipped=0.0 2023-11-26 10:45:24,520 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=3352580.0, ans=0.125 2023-11-26 10:45:26,869 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.1.conv_module1.whiten, num_groups=1, num_channels=192, metric=13.60 vs. limit=15.0 2023-11-26 10:45:36,689 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3352646.6666666665, ans=0.1 2023-11-26 10:45:38,018 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=5.59 vs. 
limit=15.0 2023-11-26 10:45:38,573 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 502900 2023-11-26 10:45:47,861 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=3352713.3333333335, ans=0.125 2023-11-26 10:45:52,124 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=3352713.3333333335, ans=0.125 2023-11-26 10:46:01,143 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=3352780.0, ans=0.2 2023-11-26 10:46:11,023 INFO [train_asr.py:1235] (3/4) Epoch 42, batch 9950, loss[loss=0.05292, simple_loss=0.07128, pruned_loss=0.008964, audio_tagging_loss=0.008322, over 13875.00 frames. ], tot_loss[loss=0.06603, simple_loss=0.08989, pruned_loss=0.01239, audio_tagging_loss=0.008691, over 3058373.96 frames. ], batch size: 55, lr: 1.60e-03, grad_scale: 16.0 2023-11-26 10:46:15,872 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten1.whitening_limit, batch_count=3352846.6666666665, ans=10.0 2023-11-26 10:46:18,261 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3352846.6666666665, ans=0.125 2023-11-26 10:46:18,926 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.87 vs. limit=6.0 2023-11-26 10:46:34,415 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 502950 2023-11-26 10:46:38,791 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=3352980.0, ans=0.0 2023-11-26 10:47:05,093 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=3353113.3333333335, ans=0.2 2023-11-26 10:47:06,955 INFO [train_asr.py:1235] (3/4) Epoch 42, batch 10000, loss[loss=0.06309, simple_loss=0.09168, pruned_loss=0.008936, audio_tagging_loss=0.008312, over 15297.00 frames. ], tot_loss[loss=0.06586, simple_loss=0.08966, pruned_loss=0.01236, audio_tagging_loss=0.008674, over 3054960.73 frames. ], batch size: 56, lr: 1.60e-03, grad_scale: 32.0 2023-11-26 10:47:09,649 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.184e+01 8.822e+01 9.378e+01 1.009e+02 1.316e+02, threshold=1.876e+02, percent-clipped=0.0 2023-11-26 10:47:15,222 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-26 10:47:18,416 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3353246.6666666665, ans=0.125 2023-11-26 10:47:30,545 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 503000 2023-11-26 10:47:44,895 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=3353380.0, ans=0.0 2023-11-26 10:48:03,175 INFO [train_asr.py:1235] (3/4) Epoch 42, batch 10050, loss[loss=0.08947, simple_loss=0.1284, pruned_loss=0.02028, audio_tagging_loss=0.004978, over 15004.00 frames. ], tot_loss[loss=0.06598, simple_loss=0.0898, pruned_loss=0.0124, audio_tagging_loss=0.008682, over 3052805.86 frames. 
], batch size: 54, lr: 1.60e-03, grad_scale: 32.0 2023-11-26 10:48:04,529 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=3353513.3333333335, ans=0.04949747468305833 2023-11-26 10:48:26,998 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 503050 2023-11-26 10:48:54,367 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3353780.0, ans=0.125 2023-11-26 10:48:54,410 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=3353780.0, ans=0.0 2023-11-26 10:48:59,505 INFO [train_asr.py:1235] (3/4) Epoch 42, batch 10100, loss[loss=0.05072, simple_loss=0.06444, pruned_loss=0.006515, audio_tagging_loss=0.01199, over 15065.00 frames. ], tot_loss[loss=0.06616, simple_loss=0.08995, pruned_loss=0.01242, audio_tagging_loss=0.008763, over 3057586.55 frames. ], batch size: 58, lr: 1.60e-03, grad_scale: 16.0 2023-11-26 10:49:00,826 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=3353846.6666666665, ans=0.0 2023-11-26 10:49:02,638 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.273e+01 8.507e+01 9.128e+01 1.020e+02 1.362e+02, threshold=1.826e+02, percent-clipped=0.0 2023-11-26 10:49:09,902 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=3353913.3333333335, ans=0.0 2023-11-26 10:49:20,158 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=3.79 vs. limit=10.0 2023-11-26 10:49:22,973 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 503100 2023-11-26 10:49:25,374 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=3353980.0, ans=0.125 2023-11-26 10:49:28,795 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=6.59 vs. limit=15.0 2023-11-26 10:49:30,009 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_whiten.whitening_limit, batch_count=3353980.0, ans=15.0 2023-11-26 10:49:39,034 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3354046.6666666665, ans=0.125 2023-11-26 10:49:44,331 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=3354113.3333333335, ans=0.0 2023-11-26 10:49:45,233 WARNING [train_asr.py:1481] (3/4) Exclude cut with ID unbalanced/_eq1Ry0UZGU_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. 
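
The bulk of the scaling.py:213 entries record module hyperparameters -- dropout_p, *_skip_rate, balancer probabilities, bypass scale_min, whitening limits -- whose values are scheduled against the batch count rather than fixed; each line prints the schedule's current value as ans=... at the given batch_count. A minimal piecewise-linear stand-in for such a ScheduledFloat is sketched below; the breakpoints (0.3 at batch 0 decaying to 0.1 by batch 20000) are illustrative, not values read from this run.

```python
class ScheduledFloatSketch:
    """Piecewise-linear schedule over batch count: a minimal stand-in for the
    ScheduledFloat values printed in the log."""

    def __init__(self, *points: tuple[float, float]):
        self.points = sorted(points)  # (batch_count, value) breakpoints

    def __call__(self, batch_count: float) -> float:
        pts = self.points
        if batch_count <= pts[0][0]:
            return pts[0][1]
        if batch_count >= pts[-1][0]:
            return pts[-1][1]
        for (x0, y0), (x1, y1) in zip(pts, pts[1:]):
            if x0 <= batch_count <= x1:
                # Linear interpolation between neighbouring breakpoints.
                return y0 + (y1 - y0) * (batch_count - x0) / (x1 - x0)

dropout_p = ScheduledFloatSketch((0.0, 0.3), (20000.0, 0.1))
print(dropout_p(3_351_913))  # -> 0.1: deep into training the schedule is flat
```

This also explains why the same names keep logging identical ans values at batch_count around 3.3e6: every schedule here is long past its final breakpoint.
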
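The scaling.py:1022 Whitening entries are the companion diagnostic: each compares a measured covariance-anisotropy metric for a module's activations against a ceiling (metric=3.79 vs. limit=10.0 above), and the whitening constraint is meant to bite only when the metric exceeds the limit. The sketch below is an assumed reconstruction from the logged behaviour, not icefall's code: a statistic that is 1.0 when the feature covariance is a multiple of the identity and grows as the covariance becomes more lopsided.

```python
import torch

def whitening_metric_sketch(x: torch.Tensor) -> float:
    """Anisotropy of the feature covariance of x (frames x channels):
    1.0 for covariance proportional to the identity, larger otherwise.
    An assumed reconstruction of the logged 'metric', not icefall's code."""
    x = x - x.mean(dim=0)              # centre the features
    cov = (x.T @ x) / x.shape[0]       # channels x channels covariance
    eigs = torch.linalg.eigvalsh(cov)  # real eigenvalues, ascending
    return (eigs.numel() * (eigs ** 2).sum() / eigs.sum() ** 2).item()

print(whitening_metric_sketch(torch.randn(1000, 256)))  # ~1.25: near-white
```
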
Number of tokens: 24 2023-11-26 10:49:52,486 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3354113.3333333335, ans=0.125 2023-11-26 10:49:53,409 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=3354113.3333333335, ans=0.0 2023-11-26 10:49:55,342 INFO [train_asr.py:1235] (3/4) Epoch 42, batch 10150, loss[loss=0.05618, simple_loss=0.07914, pruned_loss=0.009355, audio_tagging_loss=0.007258, over 15505.00 frames. ], tot_loss[loss=0.06553, simple_loss=0.08943, pruned_loss=0.01209, audio_tagging_loss=0.008728, over 3059507.73 frames. ], batch size: 59, lr: 1.60e-03, grad_scale: 16.0 2023-11-26 10:50:00,590 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.whiten, num_groups=1, num_channels=512, metric=9.47 vs. limit=12.0 2023-11-26 10:50:04,489 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=3354180.0, ans=0.2 2023-11-26 10:50:18,272 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 503150 2023-11-26 10:50:22,584 WARNING [train_asr.py:1481] (3/4) Exclude cut with ID unbalanced/cw-21cbk02A_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-26 10:50:37,855 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=3354380.0, ans=0.125 2023-11-26 10:50:50,984 INFO [train_asr.py:1235] (3/4) Epoch 42, batch 10200, loss[loss=0.04813, simple_loss=0.05758, pruned_loss=0.00691, audio_tagging_loss=0.01243, over 15189.00 frames. ], tot_loss[loss=0.0653, simple_loss=0.08881, pruned_loss=0.01201, audio_tagging_loss=0.008883, over 3054707.54 frames. ], batch size: 60, lr: 1.60e-03, grad_scale: 16.0 2023-11-26 10:50:54,052 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.380e+01 8.608e+01 9.223e+01 1.008e+02 1.287e+02, threshold=1.845e+02, percent-clipped=0.0 2023-11-26 10:51:13,282 WARNING [train_asr.py:1481] (3/4) Exclude cut with ID unbalanced/hOT6Yokob90_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-26 10:51:14,386 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 503200 2023-11-26 10:51:37,996 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3354780.0, ans=0.125 2023-11-26 10:51:43,361 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=3354780.0, ans=0.125 2023-11-26 10:51:46,352 INFO [train_asr.py:1235] (3/4) Epoch 42, batch 10250, loss[loss=0.06454, simple_loss=0.09215, pruned_loss=0.0105, audio_tagging_loss=0.007959, over 14403.00 frames. 
], tot_loss[loss=0.06577, simple_loss=0.08942, pruned_loss=0.01209, audio_tagging_loss=0.008972, over 3057331.25 frames. ], batch size: 54, lr: 1.60e-03, grad_scale: 16.0 2023-11-26 10:51:55,091 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3354846.6666666665, ans=0.125 2023-11-26 10:51:57,270 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=3354913.3333333335, ans=0.2 2023-11-26 10:52:05,415 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.min_abs, batch_count=3354913.3333333335, ans=0.5 2023-11-26 10:52:10,950 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 503250 2023-11-26 10:52:20,654 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3355046.6666666665, ans=0.125 2023-11-26 10:52:23,059 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=5.70 vs. limit=15.0 2023-11-26 10:52:23,893 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3355046.6666666665, ans=0.0 2023-11-26 10:52:33,881 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=3355113.3333333335, ans=0.125 2023-11-26 10:52:37,983 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=18.11 vs. limit=22.5 2023-11-26 10:52:43,315 INFO [train_asr.py:1235] (3/4) Epoch 42, batch 10300, loss[loss=0.06372, simple_loss=0.08191, pruned_loss=0.01216, audio_tagging_loss=0.01061, over 15375.00 frames. ], tot_loss[loss=0.06605, simple_loss=0.08973, pruned_loss=0.01221, audio_tagging_loss=0.008977, over 3057502.95 frames. ], batch size: 58, lr: 1.60e-03, grad_scale: 16.0 2023-11-26 10:52:46,394 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.459e+01 8.732e+01 9.378e+01 1.015e+02 1.295e+02, threshold=1.876e+02, percent-clipped=0.0 2023-11-26 10:53:01,144 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3355246.6666666665, ans=0.125 2023-11-26 10:53:06,351 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 503300 2023-11-26 10:53:32,165 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3355446.6666666665, ans=0.1 2023-11-26 10:53:39,419 INFO [train_asr.py:1235] (3/4) Epoch 42, batch 10350, loss[loss=0.06685, simple_loss=0.08882, pruned_loss=0.0125, audio_tagging_loss=0.009936, over 14414.00 frames. ], tot_loss[loss=0.06658, simple_loss=0.09047, pruned_loss=0.01235, audio_tagging_loss=0.008997, over 3058758.19 frames. 
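Every loss[...] and tot_loss[...] record in this span decomposes the same way: the printed total matches 0.5 x simple_loss + pruned_loss + audio_tagging_loss, i.e. the simple (linear-boundary) transducer loss is down-weighted by half against the pruned loss and the audio-tagging distillation term. The 0.5 weight is read off the logged numbers, not taken from the recipe source; a quick check against the batch 10250 tot_loss record just above:

```python
# tot_loss[loss=0.06577, simple_loss=0.08942, pruned_loss=0.01209,
#          audio_tagging_loss=0.008972, ...]
simple_loss, pruned_loss, audio_tagging_loss = 0.08942, 0.01209, 0.008972
loss = 0.5 * simple_loss + pruned_loss + audio_tagging_loss
assert abs(loss - 0.06577) < 5e-5, loss
```

The same identity holds for the per-batch loss[...] records, so the weights appear fixed over this span.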
], batch size: 55, lr: 1.60e-03, grad_scale: 16.0 2023-11-26 10:53:41,578 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=3355513.3333333335, ans=0.015 2023-11-26 10:53:42,704 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3355513.3333333335, ans=0.125 2023-11-26 10:53:50,552 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=9.39 vs. limit=15.0 2023-11-26 10:54:02,157 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 503350 2023-11-26 10:54:15,690 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=3355713.3333333335, ans=0.0 2023-11-26 10:54:25,533 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=512, metric=4.01 vs. limit=15.0 2023-11-26 10:54:29,487 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer_ff3.min_abs, batch_count=3355780.0, ans=0.2 2023-11-26 10:54:29,837 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.whiten, num_groups=1, num_channels=512, metric=8.55 vs. limit=12.0 2023-11-26 10:54:34,670 INFO [train_asr.py:1235] (3/4) Epoch 42, batch 10400, loss[loss=0.07572, simple_loss=0.1147, pruned_loss=0.01034, audio_tagging_loss=0.008036, over 13660.00 frames. ], tot_loss[loss=0.06632, simple_loss=0.09007, pruned_loss=0.01222, audio_tagging_loss=0.009071, over 3055206.51 frames. ], batch size: 53, lr: 1.60e-03, grad_scale: 32.0 2023-11-26 10:54:37,763 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.157e+01 8.834e+01 9.411e+01 9.985e+01 1.468e+02, threshold=1.882e+02, percent-clipped=0.0 2023-11-26 10:54:42,773 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=3355846.6666666665, ans=0.0 2023-11-26 10:54:43,207 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=9.48 vs. limit=15.0 2023-11-26 10:54:58,698 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 503400 2023-11-26 10:55:08,081 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=6.35 vs. limit=15.0 2023-11-26 10:55:27,356 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3356113.3333333335, ans=0.125 2023-11-26 10:55:30,879 INFO [train_asr.py:1235] (3/4) Epoch 42, batch 10450, loss[loss=0.06637, simple_loss=0.09276, pruned_loss=0.01, audio_tagging_loss=0.009985, over 15141.00 frames. ], tot_loss[loss=0.06635, simple_loss=0.09014, pruned_loss=0.01229, audio_tagging_loss=0.008986, over 3053639.92 frames. ], batch size: 55, lr: 1.60e-03, grad_scale: 32.0 2023-11-26 10:55:34,890 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-26 10:55:39,863 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=10.47 vs. 
limit=15.0 2023-11-26 10:55:54,527 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 503450 2023-11-26 10:55:56,847 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3356313.3333333335, ans=0.0 2023-11-26 10:55:58,079 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=10.52 vs. limit=15.0 2023-11-26 10:56:05,287 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=8.51 vs. limit=10.0 2023-11-26 10:56:11,175 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-26 10:56:13,984 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=4.55 vs. limit=15.0 2023-11-26 10:56:16,439 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3356446.6666666665, ans=0.125 2023-11-26 10:56:27,222 INFO [train_asr.py:1235] (3/4) Epoch 42, batch 10500, loss[loss=0.05118, simple_loss=0.07121, pruned_loss=0.00758, audio_tagging_loss=0.007993, over 14624.00 frames. ], tot_loss[loss=0.06579, simple_loss=0.08929, pruned_loss=0.01226, audio_tagging_loss=0.008876, over 3050155.13 frames. ], batch size: 56, lr: 1.60e-03, grad_scale: 32.0 2023-11-26 10:56:28,548 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=3356513.3333333335, ans=0.125 2023-11-26 10:56:30,345 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.498e+01 8.592e+01 9.300e+01 9.951e+01 1.449e+02, threshold=1.860e+02, percent-clipped=0.0 2023-11-26 10:56:39,615 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.56 vs. limit=15.0 2023-11-26 10:56:44,917 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=6.23 vs. limit=15.0 2023-11-26 10:56:48,294 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=3356646.6666666665, ans=0.0 2023-11-26 10:56:50,275 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 503500 2023-11-26 10:57:07,892 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=3356713.3333333335, ans=0.0 2023-11-26 10:57:14,358 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3356780.0, ans=0.125 2023-11-26 10:57:22,553 INFO [train_asr.py:1235] (3/4) Epoch 42, batch 10550, loss[loss=0.08011, simple_loss=0.105, pruned_loss=0.017, audio_tagging_loss=0.01063, over 15991.00 frames. ], tot_loss[loss=0.06574, simple_loss=0.08939, pruned_loss=0.01223, audio_tagging_loss=0.008811, over 3050085.24 frames. 
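The scaling.py:1118 "WithLoss" entries attach a diagnostic to the self-attention weights and report an auxiliary loss sum, which is 0.000e+00 throughout this span, i.e. the penalty never fires here. The actual penalty is not visible from the log; the sketch below only illustrates the general mechanism of a pass-through tensor that injects an extra gradient term in backward, with an assumed out-of-bounds penalty standing in for whatever the recipe really uses.

```python
# Illustration of a "WithLoss"-style hook: the tensor passes through
# unchanged in forward, but an auxiliary penalty contributes to its
# gradient in backward. loss-sum=0.000e+00 would mean the penalty is
# inactive. The penalty form (|x| > limit) is an assumption.
import torch

class WithAuxLoss(torch.autograd.Function):
    @staticmethod
    def forward(ctx, x: torch.Tensor, limit: float, scale: float):
        ctx.save_for_backward(x)
        ctx.limit, ctx.scale = limit, scale
        aux = ((x.abs() - limit).clamp(min=0.0) ** 2).sum()
        print(f"WithLoss: loss-sum={aux.item():.3e}")
        return x

    @staticmethod
    def backward(ctx, grad_out):
        (x,) = ctx.saved_tensors
        excess = (x.abs() - ctx.limit).clamp(min=0.0)
        aux_grad = 2.0 * ctx.scale * excess * x.sign()
        return grad_out + aux_grad, None, None

x = torch.randn(4, 8, requires_grad=True)
y = WithAuxLoss.apply(x, 10.0, 0.01)   # values well within the limit
y.sum().backward()                     # -> prints loss-sum=0.000e+00
```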
], batch size: 61, lr: 1.60e-03, grad_scale: 32.0 2023-11-26 10:57:24,908 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=3356846.6666666665, ans=0.2 2023-11-26 10:57:31,833 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-26 10:57:36,043 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=3356913.3333333335, ans=0.2 2023-11-26 10:57:47,197 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 503550 2023-11-26 10:57:47,411 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=3356980.0, ans=0.2 2023-11-26 10:57:51,656 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=3356980.0, ans=0.0 2023-11-26 10:57:53,777 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3356980.0, ans=0.125 2023-11-26 10:58:03,481 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=3357046.6666666665, ans=0.0 2023-11-26 10:58:14,453 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=3357113.3333333335, ans=0.04949747468305833 2023-11-26 10:58:16,954 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=512, metric=7.54 vs. limit=15.0 2023-11-26 10:58:18,546 INFO [train_asr.py:1235] (3/4) Epoch 42, batch 10600, loss[loss=0.07376, simple_loss=0.1045, pruned_loss=0.01212, audio_tagging_loss=0.009397, over 15417.00 frames. ], tot_loss[loss=0.06611, simple_loss=0.09006, pruned_loss=0.01239, audio_tagging_loss=0.008695, over 3048754.81 frames. ], batch size: 55, lr: 1.60e-03, grad_scale: 16.0 2023-11-26 10:58:23,334 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 6.764e+01 8.911e+01 9.725e+01 1.038e+02 1.409e+02, threshold=1.945e+02, percent-clipped=0.0 2023-11-26 10:58:26,642 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=3357180.0, ans=0.125 2023-11-26 10:58:28,644 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=16.34 vs. 
limit=22.5 2023-11-26 10:58:32,060 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=3357246.6666666665, ans=0.125 2023-11-26 10:58:33,166 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3357246.6666666665, ans=0.125 2023-11-26 10:58:33,197 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=3357246.6666666665, ans=0.2 2023-11-26 10:58:38,592 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3357246.6666666665, ans=0.125 2023-11-26 10:58:42,570 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 503600 2023-11-26 10:58:57,250 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3357380.0, ans=0.1 2023-11-26 10:59:15,783 INFO [train_asr.py:1235] (3/4) Epoch 42, batch 10650, loss[loss=0.06094, simple_loss=0.09057, pruned_loss=0.007297, audio_tagging_loss=0.008355, over 14620.00 frames. ], tot_loss[loss=0.06568, simple_loss=0.08964, pruned_loss=0.01227, audio_tagging_loss=0.008593, over 3054865.11 frames. ], batch size: 54, lr: 1.60e-03, grad_scale: 16.0 2023-11-26 10:59:21,344 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=3357513.3333333335, ans=0.125 2023-11-26 10:59:23,517 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=3357513.3333333335, ans=0.07 2023-11-26 10:59:38,860 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 503650 2023-11-26 10:59:43,550 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=12.51 vs. limit=15.0 2023-11-26 10:59:44,102 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=3357646.6666666665, ans=0.125 2023-11-26 10:59:46,326 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=1.77 vs. limit=6.0 2023-11-26 10:59:52,170 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3357713.3333333335, ans=0.1 2023-11-26 11:00:01,407 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3357780.0, ans=0.125 2023-11-26 11:00:06,588 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=3357780.0, ans=0.0 2023-11-26 11:00:10,590 INFO [train_asr.py:1235] (3/4) Epoch 42, batch 10700, loss[loss=0.07028, simple_loss=0.103, pruned_loss=0.01114, audio_tagging_loss=0.007654, over 15515.00 frames. ], tot_loss[loss=0.06604, simple_loss=0.09025, pruned_loss=0.01231, audio_tagging_loss=0.008608, over 3044257.05 frames. 
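The optim.py:476 lines print quartiles over a window of recent gradient norms together with the active clipping threshold, and in every record here the threshold is almost exactly twice the logged median (e.g. 2 x 9.725e+01 = 1.945e+02 in the batch 10600 record above), consistent with threshold = Clipping_scale x median. A sketch under that assumption; the window size and bookkeeping details are guesses.

```python
# Median-based gradient clipping consistent with the optim.py records:
# threshold = clipping_scale x median of recent global grad norms.
from collections import deque

import torch

class MedianGradClipper:
    def __init__(self, clipping_scale: float = 2.0, window: int = 128):
        self.scale = clipping_scale
        self.norms: deque = deque(maxlen=window)
        self.steps = 0
        self.clipped = 0

    def clip_(self, params) -> None:
        grads = [p.grad for p in params if p.grad is not None]
        if not grads:
            return
        norm = torch.sqrt(sum((g ** 2).sum() for g in grads)).item()
        self.norms.append(norm)
        self.steps += 1
        hist = torch.tensor(list(self.norms))
        q = torch.quantile(hist, torch.tensor([0.0, 0.25, 0.5, 0.75, 1.0]))
        threshold = self.scale * q[2].item()
        if norm > threshold:
            self.clipped += 1
            for g in grads:
                g.mul_(threshold / norm)
        print(f"Clipping_scale={self.scale}, grad-norm quartiles "
              + " ".join(f"{v:.3e}" for v in q.tolist())
              + f", threshold={threshold:.3e}, "
              + f"percent-clipped={100 * self.clipped / self.steps:.1f}")

model = torch.nn.Linear(10, 10)
model(torch.randn(8, 10)).sum().backward()
MedianGradClipper().clip_(model.parameters())
```

This also explains the percent-clipped field: it only rises above zero when the window's maximum exceeds the threshold, as in the batch 10700 record below.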
], batch size: 59, lr: 1.60e-03, grad_scale: 16.0 2023-11-26 11:00:14,917 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.195e+01 8.978e+01 9.499e+01 1.034e+02 2.026e+02, threshold=1.900e+02, percent-clipped=1.0 2023-11-26 11:00:34,235 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 503700 2023-11-26 11:00:46,120 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=3358046.6666666665, ans=0.125 2023-11-26 11:00:47,006 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=3358046.6666666665, ans=0.035 2023-11-26 11:00:55,561 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3358113.3333333335, ans=0.125 2023-11-26 11:01:06,433 INFO [train_asr.py:1235] (3/4) Epoch 42, batch 10750, loss[loss=0.06808, simple_loss=0.1002, pruned_loss=0.009779, audio_tagging_loss=0.008222, over 14895.00 frames. ], tot_loss[loss=0.06629, simple_loss=0.09068, pruned_loss=0.01236, audio_tagging_loss=0.008593, over 3053580.40 frames. ], batch size: 57, lr: 1.60e-03, grad_scale: 16.0 2023-11-26 11:01:12,642 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3358180.0, ans=0.125 2023-11-26 11:01:17,773 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=3358246.6666666665, ans=0.0 2023-11-26 11:01:30,443 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 503750 2023-11-26 11:01:57,166 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3358446.6666666665, ans=0.125 2023-11-26 11:02:02,705 INFO [train_asr.py:1235] (3/4) Epoch 42, batch 10800, loss[loss=0.07352, simple_loss=0.1068, pruned_loss=0.01242, audio_tagging_loss=0.007724, over 15767.00 frames. ], tot_loss[loss=0.06657, simple_loss=0.09103, pruned_loss=0.01248, audio_tagging_loss=0.008581, over 3054749.29 frames. ], batch size: 56, lr: 1.60e-03, grad_scale: 32.0 2023-11-26 11:02:02,904 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=3358513.3333333335, ans=0.0 2023-11-26 11:02:06,945 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=10.46 vs. 
limit=15.0 2023-11-26 11:02:07,438 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.700e+01 8.688e+01 9.211e+01 9.904e+01 2.001e+02, threshold=1.842e+02, percent-clipped=1.0 2023-11-26 11:02:08,740 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=3358513.3333333335, ans=0.0 2023-11-26 11:02:11,931 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3358513.3333333335, ans=0.125 2023-11-26 11:02:25,506 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 503800 2023-11-26 11:02:36,550 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=3358713.3333333335, ans=0.2 2023-11-26 11:02:38,648 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3358713.3333333335, ans=0.125 2023-11-26 11:02:48,513 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3358780.0, ans=0.0 2023-11-26 11:02:50,365 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=3358780.0, ans=0.07 2023-11-26 11:02:58,737 INFO [train_asr.py:1235] (3/4) Epoch 42, batch 10850, loss[loss=0.05948, simple_loss=0.07981, pruned_loss=0.009049, audio_tagging_loss=0.01053, over 15423.00 frames. ], tot_loss[loss=0.06621, simple_loss=0.09017, pruned_loss=0.0124, audio_tagging_loss=0.008726, over 3055855.81 frames. ], batch size: 58, lr: 1.60e-03, grad_scale: 32.0 2023-11-26 11:03:13,410 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=3358913.3333333335, ans=0.125 2023-11-26 11:03:14,615 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3358913.3333333335, ans=0.125 2023-11-26 11:03:22,345 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 503850 2023-11-26 11:03:50,063 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3359113.3333333335, ans=0.125 2023-11-26 11:03:51,965 WARNING [train_asr.py:1481] (3/4) Exclude cut with ID unbalanced/XMxq2pgttuY_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-26 11:03:54,700 INFO [train_asr.py:1235] (3/4) Epoch 42, batch 10900, loss[loss=0.07879, simple_loss=0.1054, pruned_loss=0.01825, audio_tagging_loss=0.007831, over 14720.00 frames. ], tot_loss[loss=0.06607, simple_loss=0.08994, pruned_loss=0.01236, audio_tagging_loss=0.008745, over 3050936.36 frames. ], batch size: 55, lr: 1.60e-03, grad_scale: 32.0 2023-11-26 11:03:58,887 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.407e+01 8.941e+01 9.584e+01 1.034e+02 1.250e+02, threshold=1.917e+02, percent-clipped=0.0 2023-11-26 11:04:04,435 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=18.80 vs. 
limit=22.5 2023-11-26 11:04:06,182 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3359246.6666666665, ans=0.125 2023-11-26 11:04:18,863 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 503900 2023-11-26 11:04:50,466 INFO [train_asr.py:1235] (3/4) Epoch 42, batch 10950, loss[loss=0.04626, simple_loss=0.05444, pruned_loss=0.00894, audio_tagging_loss=0.01011, over 15182.00 frames. ], tot_loss[loss=0.06612, simple_loss=0.09008, pruned_loss=0.01231, audio_tagging_loss=0.008767, over 3043835.47 frames. ], batch size: 57, lr: 1.60e-03, grad_scale: 32.0 2023-11-26 11:04:59,598 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3359513.3333333335, ans=0.0 2023-11-26 11:05:06,721 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=3359580.0, ans=0.0 2023-11-26 11:05:06,740 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=3359580.0, ans=0.125 2023-11-26 11:05:13,783 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 503950 2023-11-26 11:05:41,463 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=7.18 vs. limit=15.0 2023-11-26 11:05:42,219 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.min_abs, batch_count=3359780.0, ans=0.5 2023-11-26 11:05:46,840 INFO [train_asr.py:1235] (3/4) Epoch 42, batch 11000, loss[loss=0.0694, simple_loss=0.1002, pruned_loss=0.01243, audio_tagging_loss=0.00687, over 15328.00 frames. ], tot_loss[loss=0.06599, simple_loss=0.08981, pruned_loss=0.01229, audio_tagging_loss=0.008791, over 3045879.30 frames. ], batch size: 58, lr: 1.60e-03, grad_scale: 16.0 2023-11-26 11:05:52,182 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.199e+01 8.584e+01 9.485e+01 1.002e+02 1.136e+02, threshold=1.897e+02, percent-clipped=0.0 2023-11-26 11:05:52,374 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=3359846.6666666665, ans=0.0 2023-11-26 11:05:52,462 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=3359846.6666666665, ans=0.125 2023-11-26 11:05:56,472 WARNING [train_asr.py:1481] (3/4) Exclude cut with ID unbalanced/h6R5rMXN6pY_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-26 11:06:05,609 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=3359913.3333333335, ans=0.0 2023-11-26 11:06:10,143 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 504000 2023-11-26 11:06:10,667 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=18.27 vs. 
limit=22.5 2023-11-26 11:06:16,766 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3359980.0, ans=0.1 2023-11-26 11:06:36,606 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=1.77 vs. limit=6.0 2023-11-26 11:06:44,392 INFO [train_asr.py:1235] (3/4) Epoch 42, batch 11050, loss[loss=0.07234, simple_loss=0.0963, pruned_loss=0.01462, audio_tagging_loss=0.009562, over 15353.00 frames. ], tot_loss[loss=0.06667, simple_loss=0.0907, pruned_loss=0.01237, audio_tagging_loss=0.008944, over 3053369.69 frames. ], batch size: 57, lr: 1.60e-03, grad_scale: 16.0 2023-11-26 11:06:46,187 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=3360180.0, ans=0.0 2023-11-26 11:06:53,157 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=7.43 vs. limit=15.0 2023-11-26 11:06:59,841 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=7.94 vs. limit=10.0 2023-11-26 11:07:02,242 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.min_abs, batch_count=3360246.6666666665, ans=0.5 2023-11-26 11:07:03,806 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.0.self_attn1.whiten, num_groups=1, num_channels=192, metric=14.27 vs. limit=22.5 2023-11-26 11:07:08,397 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 504050 2023-11-26 11:07:21,785 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=3360380.0, ans=0.125 2023-11-26 11:07:40,684 INFO [train_asr.py:1235] (3/4) Epoch 42, batch 11100, loss[loss=0.1045, simple_loss=0.1435, pruned_loss=0.0238, audio_tagging_loss=0.008908, over 16227.00 frames. ], tot_loss[loss=0.06619, simple_loss=0.08977, pruned_loss=0.01228, audio_tagging_loss=0.00903, over 3061415.60 frames. ], batch size: 55, lr: 1.60e-03, grad_scale: 16.0 2023-11-26 11:07:46,437 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.379e+01 8.699e+01 9.322e+01 9.971e+01 1.375e+02, threshold=1.864e+02, percent-clipped=0.0 2023-11-26 11:08:04,219 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 504100 2023-11-26 11:08:04,305 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3360646.6666666665, ans=0.125 2023-11-26 11:08:05,345 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3360646.6666666665, ans=0.1 2023-11-26 11:08:18,592 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=3360713.3333333335, ans=0.125 2023-11-26 11:08:25,735 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=3360780.0, ans=0.125 2023-11-26 11:08:36,170 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=6.02 vs. 
limit=15.0 2023-11-26 11:08:36,570 INFO [train_asr.py:1235] (3/4) Epoch 42, batch 11150, loss[loss=0.06633, simple_loss=0.09152, pruned_loss=0.01276, audio_tagging_loss=0.007812, over 14512.00 frames. ], tot_loss[loss=0.0661, simple_loss=0.0893, pruned_loss=0.01225, audio_tagging_loss=0.009204, over 3053222.77 frames. ], batch size: 54, lr: 1.60e-03, grad_scale: 8.0 2023-11-26 11:08:47,972 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3360913.3333333335, ans=0.0 2023-11-26 11:08:53,944 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=3360913.3333333335, ans=0.0 2023-11-26 11:09:00,116 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 504150 2023-11-26 11:09:04,133 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.max_abs, batch_count=3360980.0, ans=10.0 2023-11-26 11:09:32,154 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=7.85 vs. limit=15.0 2023-11-26 11:09:32,599 INFO [train_asr.py:1235] (3/4) Epoch 42, batch 11200, loss[loss=0.06102, simple_loss=0.07819, pruned_loss=0.01465, audio_tagging_loss=0.007281, over 14309.00 frames. ], tot_loss[loss=0.06594, simple_loss=0.08886, pruned_loss=0.01224, audio_tagging_loss=0.009276, over 3055340.83 frames. ], batch size: 56, lr: 1.60e-03, grad_scale: 16.0 2023-11-26 11:09:39,459 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.816e+01 8.765e+01 9.322e+01 9.953e+01 1.270e+02, threshold=1.864e+02, percent-clipped=0.0 2023-11-26 11:09:49,078 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=9.40 vs. limit=10.0 2023-11-26 11:09:50,728 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=3361246.6666666665, ans=0.125 2023-11-26 11:09:52,548 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3361246.6666666665, ans=0.1 2023-11-26 11:09:56,426 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 504200 2023-11-26 11:10:28,731 INFO [train_asr.py:1235] (3/4) Epoch 42, batch 11250, loss[loss=0.07738, simple_loss=0.09914, pruned_loss=0.01577, audio_tagging_loss=0.01203, over 15868.00 frames. ], tot_loss[loss=0.0656, simple_loss=0.08832, pruned_loss=0.01215, audio_tagging_loss=0.009294, over 3052982.64 frames. ], batch size: 58, lr: 1.60e-03, grad_scale: 16.0 2023-11-26 11:10:40,168 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=3361580.0, ans=0.125 2023-11-26 11:10:46,649 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=3361580.0, ans=0.125 2023-11-26 11:10:48,997 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=12.55 vs. 
limit=15.0 2023-11-26 11:10:51,749 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 504250 2023-11-26 11:10:57,685 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-26 11:11:03,813 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=15.22 vs. limit=22.5 2023-11-26 11:11:10,271 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=3361713.3333333335, ans=0.0 2023-11-26 11:11:24,513 INFO [train_asr.py:1235] (3/4) Epoch 42, batch 11300, loss[loss=0.06809, simple_loss=0.1015, pruned_loss=0.0101, audio_tagging_loss=0.007256, over 15258.00 frames. ], tot_loss[loss=0.06567, simple_loss=0.08884, pruned_loss=0.0122, audio_tagging_loss=0.009043, over 3048336.90 frames. ], batch size: 55, lr: 1.60e-03, grad_scale: 16.0 2023-11-26 11:11:30,896 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.077e+01 8.741e+01 9.355e+01 1.007e+02 1.157e+02, threshold=1.871e+02, percent-clipped=0.0 2023-11-26 11:11:48,059 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 504300 2023-11-26 11:12:01,871 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=3362046.6666666665, ans=0.125 2023-11-26 11:12:06,858 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=11.43 vs. limit=15.0 2023-11-26 11:12:19,982 INFO [train_asr.py:1235] (3/4) Epoch 42, batch 11350, loss[loss=0.07337, simple_loss=0.1051, pruned_loss=0.01236, audio_tagging_loss=0.008453, over 16058.00 frames. ], tot_loss[loss=0.06541, simple_loss=0.08871, pruned_loss=0.01219, audio_tagging_loss=0.008867, over 3046734.61 frames. ], batch size: 60, lr: 1.60e-03, grad_scale: 16.0 2023-11-26 11:12:25,320 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=11.70 vs. limit=15.0 2023-11-26 11:12:44,470 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 504350 2023-11-26 11:12:54,039 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=3362380.0, ans=0.0 2023-11-26 11:12:56,311 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3362380.0, ans=0.0 2023-11-26 11:13:15,817 INFO [train_asr.py:1235] (3/4) Epoch 42, batch 11400, loss[loss=0.05075, simple_loss=0.0662, pruned_loss=0.01112, audio_tagging_loss=0.006526, over 16490.00 frames. ], tot_loss[loss=0.06506, simple_loss=0.08837, pruned_loss=0.01204, audio_tagging_loss=0.008834, over 3052282.23 frames. ], batch size: 67, lr: 1.60e-03, grad_scale: 16.0 2023-11-26 11:13:23,572 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.034e+01 8.698e+01 9.567e+01 1.043e+02 1.467e+02, threshold=1.913e+02, percent-clipped=0.0 2023-11-26 11:13:39,019 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 504400 2023-11-26 11:13:46,009 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=10.62 vs. 
limit=15.0 2023-11-26 11:13:54,340 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3362713.3333333335, ans=0.125 2023-11-26 11:14:05,666 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3362780.0, ans=0.125 2023-11-26 11:14:12,481 INFO [train_asr.py:1235] (3/4) Epoch 42, batch 11450, loss[loss=0.05372, simple_loss=0.07577, pruned_loss=0.008014, audio_tagging_loss=0.007823, over 15681.00 frames. ], tot_loss[loss=0.06542, simple_loss=0.0887, pruned_loss=0.01221, audio_tagging_loss=0.008864, over 3052092.84 frames. ], batch size: 56, lr: 1.60e-03, grad_scale: 8.0 2023-11-26 11:14:20,299 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=8.38 vs. limit=15.0 2023-11-26 11:14:34,480 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=3362980.0, ans=0.0 2023-11-26 11:14:34,501 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-11-26 11:14:35,879 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 504450 2023-11-26 11:14:37,068 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3362980.0, ans=0.1 2023-11-26 11:14:39,273 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=3362980.0, ans=0.125 2023-11-26 11:14:53,017 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=3363046.6666666665, ans=0.2 2023-11-26 11:14:57,274 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3363113.3333333335, ans=0.125 2023-11-26 11:15:07,648 INFO [train_asr.py:1235] (3/4) Epoch 42, batch 11500, loss[loss=0.05143, simple_loss=0.07688, pruned_loss=0.004893, audio_tagging_loss=0.008098, over 16038.00 frames. ], tot_loss[loss=0.06568, simple_loss=0.08915, pruned_loss=0.01228, audio_tagging_loss=0.008826, over 3046562.88 frames. ], batch size: 63, lr: 1.60e-03, grad_scale: 8.0 2023-11-26 11:15:15,052 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 6.382e+01 8.706e+01 9.345e+01 1.007e+02 1.360e+02, threshold=1.869e+02, percent-clipped=0.0 2023-11-26 11:15:16,851 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3363180.0, ans=0.125 2023-11-26 11:15:23,035 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=12.61 vs. limit=22.5 2023-11-26 11:15:31,726 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 504500 2023-11-26 11:16:03,942 INFO [train_asr.py:1235] (3/4) Epoch 42, batch 11550, loss[loss=0.05571, simple_loss=0.0759, pruned_loss=0.007849, audio_tagging_loss=0.009915, over 15221.00 frames. ], tot_loss[loss=0.06607, simple_loss=0.08991, pruned_loss=0.01237, audio_tagging_loss=0.008747, over 3052588.75 frames. 
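The scaling.py:1022 "Whitening" lines compare a per-module whitening metric against a limit (e.g. metric=8.38 vs. limit=15.0 above). The metric is 1.0 when a module's channels are decorrelated with equal variance and grows as channels become correlated; exceeding the limit is what triggers a corrective penalty. Below is one plausible form of such a metric, not necessarily the recipe's exact one.

```python
# A whitening diagnostic that equals 1.0 for perfectly white activations
# and grows with channel correlation. Whether this is the exact metric
# printed by scaling.py is an assumption.
import torch

def whitening_metric(x: torch.Tensor, num_groups: int = 1) -> float:
    """x: (num_frames, num_channels); channels split into num_groups."""
    n, c = x.shape
    x = x.reshape(n, num_groups, c // num_groups).transpose(0, 1)
    x = x - x.mean(dim=1, keepdim=True)
    cov = x.transpose(1, 2) @ x / n                  # (num_groups, d, d)
    d = cov.shape[-1]
    per_group = d * (cov ** 2).mean(dim=(1, 2)) \
        / cov.diagonal(dim1=1, dim2=2).mean(dim=1) ** 2
    return per_group.mean().item()

white = torch.randn(10000, 192)              # near-white: metric ~ 1
correlated = white @ torch.randn(192, 192)   # mixed channels: metric >> 1
print(whitening_metric(white), whitening_metric(correlated))
```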
], batch size: 56, lr: 1.60e-03, grad_scale: 8.0 2023-11-26 11:16:10,052 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3363513.3333333335, ans=0.125 2023-11-26 11:16:13,608 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=6.72 vs. limit=15.0 2023-11-26 11:16:23,335 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=3363580.0, ans=0.125 2023-11-26 11:16:27,559 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 504550 2023-11-26 11:16:34,020 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3363646.6666666665, ans=0.125 2023-11-26 11:16:36,140 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3363713.3333333335, ans=0.125 2023-11-26 11:16:38,071 WARNING [train_asr.py:1481] (3/4) Exclude cut with ID unbalanced/NeYOsnhOi4k_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-26 11:17:00,350 INFO [train_asr.py:1235] (3/4) Epoch 42, batch 11600, loss[loss=0.05887, simple_loss=0.08501, pruned_loss=0.009814, audio_tagging_loss=0.006546, over 14906.00 frames. ], tot_loss[loss=0.06647, simple_loss=0.09058, pruned_loss=0.01243, audio_tagging_loss=0.008756, over 3057754.53 frames. ], batch size: 56, lr: 1.60e-03, grad_scale: 16.0 2023-11-26 11:17:05,693 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3363846.6666666665, ans=0.125 2023-11-26 11:17:08,666 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 6.954e+01 8.748e+01 9.551e+01 1.048e+02 1.358e+02, threshold=1.910e+02, percent-clipped=0.0 2023-11-26 11:17:12,184 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=3363913.3333333335, ans=0.0 2023-11-26 11:17:22,568 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 504600 2023-11-26 11:17:30,981 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=3363980.0, ans=0.2 2023-11-26 11:17:42,700 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3364046.6666666665, ans=0.1 2023-11-26 11:17:45,869 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3364113.3333333335, ans=0.1 2023-11-26 11:17:46,944 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=3364113.3333333335, ans=0.125 2023-11-26 11:17:54,526 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=3364180.0, ans=0.2 2023-11-26 11:17:55,415 INFO [train_asr.py:1235] (3/4) Epoch 42, batch 11650, loss[loss=0.07111, simple_loss=0.09624, pruned_loss=0.01386, audio_tagging_loss=0.009127, over 14734.00 frames. 
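Most scaling.py:213 entries print the current value of a ScheduledFloat: a scalar hyperparameter (a dropout p, skip rate, balancer bound, and so on) interpolated piecewise-linearly against the global batch count, which is why each record carries a batch_count. A minimal sketch of the idea; the breakpoints in the example are illustrative, not the recipe's.

```python
# Piecewise-linear scheduled scalar keyed on batch count, in the spirit
# of the ScheduledFloat values printed above.

class ScheduledFloat:
    def __init__(self, *points: tuple[float, float]):
        # points: (batch_count, value) pairs.
        self.points = sorted(points)

    def value(self, batch_count: float) -> float:
        pts = self.points
        if batch_count <= pts[0][0]:
            return pts[0][1]
        if batch_count >= pts[-1][0]:
            return pts[-1][1]
        for (x0, y0), (x1, y1) in zip(pts, pts[1:]):
            if x0 <= batch_count <= x1:
                t = (batch_count - x0) / (x1 - x0)
                return y0 + t * (y1 - y0)
        raise AssertionError("unreachable for sorted points")

# e.g. a skip rate decaying from 0.5 to 0.0 over the first 4000 batches:
# by batch_count ~3.36e6 such a schedule would print ans=0.0.
skip_rate = ScheduledFloat((0.0, 0.5), (4000.0, 0.0))
print(skip_rate.value(3363513.33))  # 0.0
```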
], tot_loss[loss=0.06686, simple_loss=0.09094, pruned_loss=0.0126, audio_tagging_loss=0.008787, over 3052913.40 frames. ], batch size: 56, lr: 1.60e-03, grad_scale: 8.0 2023-11-26 11:18:01,130 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3364180.0, ans=0.1 2023-11-26 11:18:04,153 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=3364180.0, ans=0.0 2023-11-26 11:18:19,670 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 504650 2023-11-26 11:18:21,927 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=3364313.3333333335, ans=0.0 2023-11-26 11:18:27,753 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=3364313.3333333335, ans=0.0 2023-11-26 11:18:36,451 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3364380.0, ans=0.125 2023-11-26 11:18:36,626 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=6.15 vs. limit=15.0 2023-11-26 11:18:51,434 INFO [train_asr.py:1235] (3/4) Epoch 42, batch 11700, loss[loss=0.06986, simple_loss=0.08463, pruned_loss=0.01556, audio_tagging_loss=0.01199, over 15993.00 frames. ], tot_loss[loss=0.06609, simple_loss=0.08985, pruned_loss=0.01238, audio_tagging_loss=0.008789, over 3046656.58 frames. ], batch size: 60, lr: 1.60e-03, grad_scale: 8.0 2023-11-26 11:18:51,716 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3364513.3333333335, ans=0.1 2023-11-26 11:19:00,939 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.138e+01 8.724e+01 9.353e+01 9.996e+01 1.834e+02, threshold=1.871e+02, percent-clipped=0.0 2023-11-26 11:19:09,487 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=12.58 vs. limit=15.0 2023-11-26 11:19:12,201 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3364580.0, ans=0.1 2023-11-26 11:19:15,247 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 504700 2023-11-26 11:19:29,051 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=3364713.3333333335, ans=0.2 2023-11-26 11:19:47,709 INFO [train_asr.py:1235] (3/4) Epoch 42, batch 11750, loss[loss=0.09528, simple_loss=0.1313, pruned_loss=0.02233, audio_tagging_loss=0.007329, over 14387.00 frames. ], tot_loss[loss=0.06607, simple_loss=0.08955, pruned_loss=0.01245, audio_tagging_loss=0.008848, over 3039139.62 frames. ], batch size: 52, lr: 1.60e-03, grad_scale: 8.0 2023-11-26 11:19:47,939 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer_na.min_abs, batch_count=3364846.6666666665, ans=0.02 2023-11-26 11:20:10,608 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 504750 2023-11-26 11:20:25,730 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=11.74 vs. 
limit=15.0 2023-11-26 11:20:34,981 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3365113.3333333335, ans=0.0 2023-11-26 11:20:40,314 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=3365113.3333333335, ans=0.125 2023-11-26 11:20:43,312 INFO [train_asr.py:1235] (3/4) Epoch 42, batch 11800, loss[loss=0.07698, simple_loss=0.1, pruned_loss=0.01651, audio_tagging_loss=0.01048, over 14215.00 frames. ], tot_loss[loss=0.06575, simple_loss=0.08911, pruned_loss=0.01233, audio_tagging_loss=0.008865, over 3037967.48 frames. ], batch size: 55, lr: 1.60e-03, grad_scale: 8.0 2023-11-26 11:20:51,616 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.646e+01 8.955e+01 9.712e+01 1.042e+02 1.352e+02, threshold=1.942e+02, percent-clipped=0.0 2023-11-26 11:20:59,306 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.0.whiten, num_groups=1, num_channels=192, metric=4.08 vs. limit=12.0 2023-11-26 11:21:01,970 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.min_positive, batch_count=3365246.6666666665, ans=0.025 2023-11-26 11:21:06,730 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 504800 2023-11-26 11:21:20,862 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3365380.0, ans=0.1 2023-11-26 11:21:23,440 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.57 vs. limit=6.0 2023-11-26 11:21:39,228 INFO [train_asr.py:1235] (3/4) Epoch 42, batch 11850, loss[loss=0.06009, simple_loss=0.0755, pruned_loss=0.01367, audio_tagging_loss=0.008671, over 15309.00 frames. ], tot_loss[loss=0.06553, simple_loss=0.08885, pruned_loss=0.01222, audio_tagging_loss=0.008888, over 3051625.54 frames. ], batch size: 58, lr: 1.60e-03, grad_scale: 8.0 2023-11-26 11:21:41,011 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=12.95 vs. limit=22.5 2023-11-26 11:22:02,241 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=3365646.6666666665, ans=0.2 2023-11-26 11:22:03,161 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 504850 2023-11-26 11:22:03,278 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=3365646.6666666665, ans=0.0 2023-11-26 11:22:05,661 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=4.25 vs. limit=15.0 2023-11-26 11:22:17,384 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=12.70 vs. limit=15.0 2023-11-26 11:22:31,587 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3365780.0, ans=0.1 2023-11-26 11:22:34,567 INFO [train_asr.py:1235] (3/4) Epoch 42, batch 11900, loss[loss=0.06131, simple_loss=0.07474, pruned_loss=0.01495, audio_tagging_loss=0.008992, over 15188.00 frames. 
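The fractional frame counts in the tot_loss[...] records (e.g. "over 3051625.54 frames" above) suggest a decayed running average over per-batch (loss x frames, frames) pairs rather than an exact sum. With roughly 15k frames per batch, a decay near 0.995 would plateau around the ~3.0M frames seen throughout epoch 42; both constants are guesses used only to show the bookkeeping.

```python
# Decayed running average consistent with the fractional "over N frames"
# counts in tot_loss[...]. The decay constant is an assumption.

class RunningLoss:
    def __init__(self, decay: float = 0.995):
        self.decay = decay
        self.loss_sum = 0.0
        self.frames = 0.0

    def update(self, loss: float, num_frames: float) -> None:
        self.loss_sum = self.decay * self.loss_sum + loss * num_frames
        self.frames = self.decay * self.frames + num_frames

    @property
    def avg(self) -> float:
        return self.loss_sum / max(self.frames, 1.0)

tracker = RunningLoss()
for _ in range(1000):
    tracker.update(loss=0.066, num_frames=15000)
print(f"tot_loss={tracker.avg:.5f}, over {tracker.frames:.2f} frames")
# frames plateaus near 15000 / (1 - 0.995) = 3.0e6
```

The restart of the counter at the top of epoch 43 fits the same picture: about 0.68M frames after 50 batches, 1.2M after 100, growing toward the plateau.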
], tot_loss[loss=0.06506, simple_loss=0.08795, pruned_loss=0.0121, audio_tagging_loss=0.00899, over 3048305.89 frames. ], batch size: 60, lr: 1.60e-03, grad_scale: 8.0 2023-11-26 11:22:40,566 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=6.94 vs. limit=15.0 2023-11-26 11:22:44,164 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.301e+01 8.882e+01 9.383e+01 1.003e+02 1.296e+02, threshold=1.877e+02, percent-clipped=0.0 2023-11-26 11:22:49,795 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=3365913.3333333335, ans=0.125 2023-11-26 11:22:58,070 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 504900 2023-11-26 11:23:09,342 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=3366046.6666666665, ans=0.0 2023-11-26 11:23:14,269 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=3366046.6666666665, ans=0.0 2023-11-26 11:23:20,450 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.max_abs, batch_count=3366113.3333333335, ans=10.0 2023-11-26 11:23:25,288 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=3366113.3333333335, ans=0.2 2023-11-26 11:23:30,284 INFO [train_asr.py:1235] (3/4) Epoch 42, batch 11950, loss[loss=0.06784, simple_loss=0.09622, pruned_loss=0.0118, audio_tagging_loss=0.007929, over 16176.00 frames. ], tot_loss[loss=0.06575, simple_loss=0.08881, pruned_loss=0.01226, audio_tagging_loss=0.009084, over 3043054.95 frames. ], batch size: 58, lr: 1.60e-03, grad_scale: 8.0 2023-11-26 11:23:53,606 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 504950 2023-11-26 11:23:58,079 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3366313.3333333335, ans=0.125 2023-11-26 11:24:00,924 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3366313.3333333335, ans=0.125 2023-11-26 11:24:03,137 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=3366380.0, ans=0.2 2023-11-26 11:24:05,128 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=3366380.0, ans=0.07 2023-11-26 11:24:13,387 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3366446.6666666665, ans=0.125 2023-11-26 11:24:17,556 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=3366446.6666666665, ans=0.125 2023-11-26 11:24:24,562 INFO [train_asr.py:1235] (3/4) Epoch 42, batch 12000, loss[loss=0.04851, simple_loss=0.06172, pruned_loss=0.004543, audio_tagging_loss=0.0131, over 14990.00 frames. ], tot_loss[loss=0.06575, simple_loss=0.08856, pruned_loss=0.01226, audio_tagging_loss=0.009208, over 3044203.38 frames. 
], batch size: 56, lr: 1.60e-03, grad_scale: 16.0 2023-11-26 11:24:24,563 INFO [train_asr.py:1258] (3/4) Computing validation loss 2023-11-26 11:24:37,073 INFO [zipformer.py:1877] (3/4) name=encoder.encoders.2.encoder.layers.1.self_attn_weights, attn_weights_entropy = tensor([5.3212, 5.0729, 4.6553, 4.8584], device='cuda:3') 2023-11-26 11:24:41,668 INFO [zipformer.py:1877] (3/4) name=encoder.encoders.3.encoder.layers.2.self_attn_weights, attn_weights_entropy = tensor([2.3746, 3.0477, 3.2054, 3.0100, 3.6889, 3.7527, 3.2787, 3.2418], device='cuda:3') 2023-11-26 11:24:50,939 INFO [zipformer.py:1877] (3/4) name=encoder.encoders.2.encoder.layers.2.self_attn_weights, attn_weights_entropy = tensor([4.4571, 3.7796, 4.3892, 3.4956], device='cuda:3') 2023-11-26 11:24:57,252 INFO [train_asr.py:1267] (3/4) Epoch 42, validation: loss=0.05796, simple_loss=0.05063, pruned_loss=0.005274, audio_tagging_loss=0.02738, over 4681554.00 frames. 2023-11-26 11:24:57,252 INFO [train_asr.py:1268] (3/4) Maximum memory allocated so far is 24894MB 2023-11-26 11:24:59,382 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=3366513.3333333335, ans=0.0 2023-11-26 11:25:03,780 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=12.20 vs. limit=15.0 2023-11-26 11:25:05,475 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.022e+01 8.933e+01 9.493e+01 1.025e+02 1.345e+02, threshold=1.899e+02, percent-clipped=0.0 2023-11-26 11:25:06,759 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-26 11:25:19,327 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 505000 2023-11-26 11:25:50,630 INFO [train_asr.py:1235] (3/4) Epoch 43, batch 0, loss[loss=0.08723, simple_loss=0.1035, pruned_loss=0.017, audio_tagging_loss=0.01847, over 16043.00 frames. ], tot_loss[loss=0.08723, simple_loss=0.1035, pruned_loss=0.017, audio_tagging_loss=0.01847, over 16043.00 frames. ], batch size: 57, lr: 1.58e-03, grad_scale: 32.0 2023-11-26 11:25:50,631 INFO [train_asr.py:1258] (3/4) Computing validation loss 2023-11-26 11:26:02,830 INFO [zipformer.py:1877] (3/4) name=encoder.encoders.0.layers.0.self_attn_weights, attn_weights_entropy = tensor([6.4836, 6.2045, 6.0206, 6.0255], device='cuda:3') 2023-11-26 11:26:21,924 INFO [train_asr.py:1267] (3/4) Epoch 43, validation: loss=0.05779, simple_loss=0.05063, pruned_loss=0.005275, audio_tagging_loss=0.0272, over 4681554.00 frames. 2023-11-26 11:26:21,924 INFO [train_asr.py:1268] (3/4) Maximum memory allocated so far is 24894MB 2023-11-26 11:26:23,711 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=11.18 vs. limit=15.0 2023-11-26 11:26:51,256 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=3366806.6666666665, ans=0.0 2023-11-26 11:27:14,022 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 505050 2023-11-26 11:27:17,144 INFO [train_asr.py:1235] (3/4) Epoch 43, batch 50, loss[loss=0.07325, simple_loss=0.0862, pruned_loss=0.01158, audio_tagging_loss=0.01857, over 14665.00 frames. ], tot_loss[loss=0.07311, simple_loss=0.08931, pruned_loss=0.01169, audio_tagging_loss=0.01677, over 680109.46 frames. 
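At validation points (batch 12000 above, and again at the start of epoch 43) the loop reports per-component validation losses, dumps the entropy of selected self-attention weight distributions as a sanity check on the attention heads (the zipformer.py:1877 tensors, one value per head), and logs peak CUDA memory. A sketch of those two diagnostics; shapes and the choice of modules are illustrative.

```python
# Validation-time diagnostics in the spirit of the records above:
# per-head attention entropy and peak CUDA memory.
import torch

def attn_weights_entropy(attn: torch.Tensor) -> torch.Tensor:
    """attn: (num_heads, query_len, key_len), rows summing to 1.
    Returns one entropy value per head, averaged over queries."""
    ent = -(attn * (attn + 1e-20).log()).sum(dim=-1)  # (heads, queries)
    return ent.mean(dim=-1)

attn = torch.softmax(torch.randn(4, 100, 100), dim=-1)
print("attn_weights_entropy =", attn_weights_entropy(attn))

if torch.cuda.is_available():
    mb = torch.cuda.max_memory_allocated() // (1024 * 1024)
    print(f"Maximum memory allocated so far is {mb}MB")
```

Collapsed entropies (values near 0) would flag heads attending to a single position; the logged values in the 2-6 range are typical of healthy heads.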
], batch size: 54, lr: 1.58e-03, grad_scale: 16.0 2023-11-26 11:27:29,546 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=3367073.3333333335, ans=0.2 2023-11-26 11:27:31,946 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=16.78 vs. limit=22.5 2023-11-26 11:27:33,702 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.min_abs, batch_count=3367073.3333333335, ans=0.5 2023-11-26 11:27:50,327 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=3367206.6666666665, ans=0.125 2023-11-26 11:27:53,676 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3367206.6666666665, ans=0.125 2023-11-26 11:27:54,031 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=7.69 vs. limit=15.0 2023-11-26 11:27:56,528 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 8.204e+01 9.484e+01 1.020e+02 1.096e+02 2.411e+02, threshold=2.041e+02, percent-clipped=1.0 2023-11-26 11:28:06,762 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=8.68 vs. limit=10.0 2023-11-26 11:28:09,879 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 505100 2023-11-26 11:28:13,116 INFO [train_asr.py:1235] (3/4) Epoch 43, batch 100, loss[loss=0.0601, simple_loss=0.07048, pruned_loss=0.009205, audio_tagging_loss=0.01565, over 15608.00 frames. ], tot_loss[loss=0.073, simple_loss=0.09019, pruned_loss=0.01186, audio_tagging_loss=0.01604, over 1204415.81 frames. ], batch size: 58, lr: 1.58e-03, grad_scale: 16.0 2023-11-26 11:28:31,897 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=3367406.6666666665, ans=0.2 2023-11-26 11:28:51,601 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3367540.0, ans=0.0 2023-11-26 11:29:01,797 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=3367606.6666666665, ans=0.125 2023-11-26 11:29:06,285 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 505150 2023-11-26 11:29:08,023 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=12.63 vs. limit=22.5 2023-11-26 11:29:09,529 INFO [train_asr.py:1235] (3/4) Epoch 43, batch 150, loss[loss=0.07486, simple_loss=0.1085, pruned_loss=0.01088, audio_tagging_loss=0.009739, over 15834.00 frames. ], tot_loss[loss=0.07183, simple_loss=0.09065, pruned_loss=0.01207, audio_tagging_loss=0.01443, over 1610321.16 frames. 
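The learning rate holds at 1.60e-03 through the tail of epoch 42 and steps to 1.58e-03 at "Epoch 43, batch 0": that step is the epoch-dependent factor of icefall's Eden schedule ticking over, while the batch-dependent factor is essentially flat at ~500k steps. A sketch of the Eden formula; the base_lr / lr_batches / lr_epochs settings and the off-by-one epoch argument below are illustrative placeholders that happen to reproduce the logged values, not settings read from this run.

```python
# Eden schedule: lr = base_lr * ((batch/lr_batches)^2 + 1)^-0.25
#                            * ((epoch/lr_epochs)^2 + 1)^-0.25
# All constants here are placeholder assumptions.
def eden_lr(base_lr: float, batch: int, epoch: int,
            lr_batches: float = 7500.0, lr_epochs: float = 3.5) -> float:
    batch_factor = ((batch / lr_batches) ** 2 + 1) ** -0.25
    epoch_factor = ((epoch / lr_epochs) ** 2 + 1) ** -0.25
    return base_lr * batch_factor * epoch_factor

print(f"{eden_lr(0.045, batch=504000, epoch=41):.2e}")  # ~1.60e-03
print(f"{eden_lr(0.045, batch=505000, epoch=42):.2e}")  # ~1.58e-03
```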
], batch size: 57, lr: 1.58e-03, grad_scale: 8.0 2023-11-26 11:29:25,131 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=3367740.0, ans=0.125 2023-11-26 11:29:45,399 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=3367873.3333333335, ans=0.0 2023-11-26 11:29:47,331 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.min_positive, batch_count=3367873.3333333335, ans=0.025 2023-11-26 11:29:47,430 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=3367873.3333333335, ans=0.1 2023-11-26 11:29:49,993 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.756e+01 9.187e+01 9.762e+01 1.032e+02 1.254e+02, threshold=1.952e+02, percent-clipped=0.0 2023-11-26 11:29:53,383 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3367940.0, ans=0.125 2023-11-26 11:29:54,374 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=3367940.0, ans=0.2 2023-11-26 11:29:55,414 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.min_abs, batch_count=3367940.0, ans=0.5 2023-11-26 11:29:55,475 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3367940.0, ans=0.1 2023-11-26 11:29:58,302 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=3367940.0, ans=0.0 2023-11-26 11:30:02,313 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 505200 2023-11-26 11:30:05,740 INFO [train_asr.py:1235] (3/4) Epoch 43, batch 200, loss[loss=0.0721, simple_loss=0.09559, pruned_loss=0.009126, audio_tagging_loss=0.01518, over 15794.00 frames. ], tot_loss[loss=0.071, simple_loss=0.09125, pruned_loss=0.01242, audio_tagging_loss=0.01295, over 1930012.25 frames. ], batch size: 57, lr: 1.58e-03, grad_scale: 8.0 2023-11-26 11:30:20,438 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3368073.3333333335, ans=0.1 2023-11-26 11:30:24,713 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=3368073.3333333335, ans=0.0 2023-11-26 11:30:32,376 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=3368140.0, ans=0.0 2023-11-26 11:30:43,351 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3368206.6666666665, ans=0.125 2023-11-26 11:30:54,246 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=3368273.3333333335, ans=0.2 2023-11-26 11:30:58,370 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 505250 2023-11-26 11:31:01,191 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3368340.0, ans=0.125 2023-11-26 11:31:02,030 INFO [train_asr.py:1235] (3/4) Epoch 43, batch 250, loss[loss=0.07419, simple_loss=0.1016, pruned_loss=0.01409, audio_tagging_loss=0.009313, over 15469.00 frames. 
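
The flood of `scaling.py:213` entries records `ScheduledFloat` values: regularization knobs (skip rates, dropout probabilities, balancer bounds, bypass scale floors) that are functions of the global batch count rather than constants. By batch_count ~3.37e6 most have presumably long since decayed to their final values, hence the repetitive `ans=0.0` and `ans=0.125` readings. A minimal sketch of such a schedule, with the breakpoints below purely illustrative:

```python
class ScheduledFloat:
    """Piecewise-linear value of batch_count, flat beyond the endpoints.
    A sketch of the concept, not the icefall class."""

    def __init__(self, *points):  # (batch_count, value) pairs, ascending
        self.points = sorted(points)

    def value(self, batch_count: float) -> float:
        pts = self.points
        if batch_count <= pts[0][0]:
            return pts[0][1]
        if batch_count >= pts[-1][0]:
            return pts[-1][1]
        for (x0, y0), (x1, y1) in zip(pts, pts[1:]):
            if x0 <= batch_count <= x1:
                t = (batch_count - x0) / (x1 - x0)
                return y0 + t * (y1 - y0)

# An attention_skip_rate that decays early in training is already pinned
# at its final value by the batch counts seen in this log:
skip_rate = ScheduledFloat((0.0, 0.2), (4000.0, 0.05), (16000.0, 0.0))
print(skip_rate.value(3368073.33))  # -> 0.0, matching the logged ans=0.0
```
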
], tot_loss[loss=0.07083, simple_loss=0.09284, pruned_loss=0.01276, audio_tagging_loss=0.01165, over 2183872.24 frames. ], batch size: 54, lr: 1.58e-03, grad_scale: 8.0 2023-11-26 11:31:04,797 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=10.55 vs. limit=15.0 2023-11-26 11:31:10,134 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3368340.0, ans=0.125 2023-11-26 11:31:23,555 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3368473.3333333335, ans=0.125 2023-11-26 11:31:24,005 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=20.55 vs. limit=22.5 2023-11-26 11:31:27,262 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=3368473.3333333335, ans=0.0 2023-11-26 11:31:34,736 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=3368540.0, ans=0.125 2023-11-26 11:31:36,839 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3368540.0, ans=0.125 2023-11-26 11:31:39,019 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3368540.0, ans=0.125 2023-11-26 11:31:43,455 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.725e+01 9.006e+01 9.717e+01 1.058e+02 1.490e+02, threshold=1.943e+02, percent-clipped=0.0 2023-11-26 11:31:54,692 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 505300 2023-11-26 11:31:58,296 INFO [train_asr.py:1235] (3/4) Epoch 43, batch 300, loss[loss=0.06529, simple_loss=0.09153, pruned_loss=0.01025, audio_tagging_loss=0.00928, over 15169.00 frames. ], tot_loss[loss=0.07021, simple_loss=0.09315, pruned_loss=0.01293, audio_tagging_loss=0.0107, over 2379149.51 frames. ], batch size: 57, lr: 1.58e-03, grad_scale: 8.0 2023-11-26 11:32:01,618 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=3368673.3333333335, ans=0.0 2023-11-26 11:32:18,031 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=3368740.0, ans=0.0 2023-11-26 11:32:31,442 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3368873.3333333335, ans=0.125 2023-11-26 11:32:44,532 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=11.65 vs. limit=15.0 2023-11-26 11:32:50,424 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 505350 2023-11-26 11:32:51,638 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-26 11:32:54,041 INFO [train_asr.py:1235] (3/4) Epoch 43, batch 350, loss[loss=0.05842, simple_loss=0.07539, pruned_loss=0.009378, audio_tagging_loss=0.01134, over 15644.00 frames. ], tot_loss[loss=0.06941, simple_loss=0.09276, pruned_loss=0.01289, audio_tagging_loss=0.01014, over 2530295.44 frames. 
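
The `scaling.py:1022` "Whitening" entries compare a whiteness statistic of a module's output against a (scheduled) limit; the Whiten modules only push activations back toward an isotropic covariance once the metric exceeds the limit, which is why lines like `metric=10.55 vs. limit=15.0` are informational rather than corrective. The exact statistic is internal to icefall's scaling.py; one plausible proxy with the right behavior (1.0 for perfectly white features, rising toward the channel count as the variance collapses onto a few directions) is sketched below:

```python
import torch

def whiteness_metric(x: torch.Tensor) -> float:
    """Assumed proxy, not necessarily icefall's exact formula.
    x: (num_frames, num_channels)."""
    x = x - x.mean(dim=0, keepdim=True)
    cov = (x.t() @ x) / x.shape[0]        # (C, C) feature covariance
    d = cov.shape[0]
    # d * ||cov||_F^2 / trace(cov)^2 equals 1.0 exactly when cov is a
    # multiple of the identity, and d when all variance is rank-1.
    return (d * (cov ** 2).sum() / cov.trace() ** 2).item()

print(whiteness_metric(torch.randn(10000, 256)))                      # ~1
print(whiteness_metric(torch.randn(10000, 1) * torch.randn(1, 256)))  # ~256
```
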
], batch size: 58, lr: 1.58e-03, grad_scale: 8.0 2023-11-26 11:32:58,658 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=3369006.6666666665, ans=0.2 2023-11-26 11:33:01,765 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=3369006.6666666665, ans=0.0 2023-11-26 11:33:25,281 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.1.whiten, num_groups=1, num_channels=192, metric=3.44 vs. limit=12.0 2023-11-26 11:33:34,396 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3369206.6666666665, ans=0.125 2023-11-26 11:33:35,781 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.517e+01 8.731e+01 9.201e+01 9.958e+01 1.413e+02, threshold=1.840e+02, percent-clipped=0.0 2023-11-26 11:33:41,660 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=14.01 vs. limit=15.0 2023-11-26 11:33:46,626 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 505400 2023-11-26 11:33:50,620 INFO [train_asr.py:1235] (3/4) Epoch 43, batch 400, loss[loss=0.07764, simple_loss=0.1128, pruned_loss=0.01425, audio_tagging_loss=0.006989, over 15961.00 frames. ], tot_loss[loss=0.06822, simple_loss=0.09149, pruned_loss=0.01266, audio_tagging_loss=0.009805, over 2645683.70 frames. ], batch size: 57, lr: 1.58e-03, grad_scale: 16.0 2023-11-26 11:34:04,699 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3369406.6666666665, ans=0.1 2023-11-26 11:34:06,787 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=3369406.6666666665, ans=0.125 2023-11-26 11:34:31,718 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=3369540.0, ans=0.125 2023-11-26 11:34:33,316 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=3369540.0, ans=0.0 2023-11-26 11:34:43,226 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 505450 2023-11-26 11:34:46,993 INFO [train_asr.py:1235] (3/4) Epoch 43, batch 450, loss[loss=0.0659, simple_loss=0.08826, pruned_loss=0.01416, audio_tagging_loss=0.007614, over 13965.00 frames. ], tot_loss[loss=0.06785, simple_loss=0.09132, pruned_loss=0.01262, audio_tagging_loss=0.009568, over 2728498.72 frames. ], batch size: 52, lr: 1.58e-03, grad_scale: 16.0 2023-11-26 11:34:51,479 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3369673.3333333335, ans=0.125 2023-11-26 11:35:27,888 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.612e+01 8.871e+01 9.451e+01 1.026e+02 1.404e+02, threshold=1.890e+02, percent-clipped=0.0 2023-11-26 11:35:32,938 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=3369940.0, ans=0.125 2023-11-26 11:35:39,161 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 505500 2023-11-26 11:35:42,269 INFO [train_asr.py:1235] (3/4) Epoch 43, batch 500, loss[loss=0.0587, simple_loss=0.08246, pruned_loss=0.007702, audio_tagging_loss=0.009769, over 15331.00 frames. 
], tot_loss[loss=0.06704, simple_loss=0.09047, pruned_loss=0.01237, audio_tagging_loss=0.009438, over 2796830.76 frames. ], batch size: 58, lr: 1.58e-03, grad_scale: 16.0 2023-11-26 11:35:56,942 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=3370073.3333333335, ans=0.125 2023-11-26 11:36:04,967 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=3370140.0, ans=0.1 2023-11-26 11:36:08,644 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3370140.0, ans=0.125 2023-11-26 11:36:35,160 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 505550 2023-11-26 11:36:38,284 INFO [train_asr.py:1235] (3/4) Epoch 43, batch 550, loss[loss=0.05223, simple_loss=0.06129, pruned_loss=0.01098, audio_tagging_loss=0.01061, over 14136.00 frames. ], tot_loss[loss=0.0669, simple_loss=0.09045, pruned_loss=0.01236, audio_tagging_loss=0.009314, over 2842836.46 frames. ], batch size: 53, lr: 1.58e-03, grad_scale: 16.0 2023-11-26 11:36:47,703 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3370340.0, ans=0.125 2023-11-26 11:36:48,621 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=3370406.6666666665, ans=0.0 2023-11-26 11:36:49,669 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3370406.6666666665, ans=0.1 2023-11-26 11:36:51,722 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.65 vs. limit=6.0 2023-11-26 11:36:52,455 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=3370406.6666666665, ans=0.125 2023-11-26 11:37:08,871 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=3370473.3333333335, ans=0.2 2023-11-26 11:37:09,997 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=3370473.3333333335, ans=0.125 2023-11-26 11:37:19,743 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.474e+01 8.880e+01 9.459e+01 1.018e+02 1.226e+02, threshold=1.892e+02, percent-clipped=0.0 2023-11-26 11:37:24,914 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.whiten_keys.whitening_limit, batch_count=3370606.6666666665, ans=6.0 2023-11-26 11:37:26,758 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=3370606.6666666665, ans=0.2 2023-11-26 11:37:31,039 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 505600 2023-11-26 11:37:34,124 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=6.10 vs. limit=15.0 2023-11-26 11:37:34,737 INFO [train_asr.py:1235] (3/4) Epoch 43, batch 600, loss[loss=0.08915, simple_loss=0.1225, pruned_loss=0.01965, audio_tagging_loss=0.00823, over 16267.00 frames. ], tot_loss[loss=0.06687, simple_loss=0.09047, pruned_loss=0.01243, audio_tagging_loss=0.009204, over 2884751.41 frames. 
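
Each `loss[...]` record factorizes the objective of this multi-task recipe: `simple_loss` and `pruned_loss` are the two stages of the pruned RNN-T transducer loss, and `audio_tagging_loss` comes from the audio-tagging head trained alongside ASR. The printed totals in this section are consistent with a fixed linear combination, `loss = 0.5 * simple_loss + pruned_loss + audio_tagging_loss`: at batch 600 above, 0.5 * 0.1225 + 0.01965 + 0.00823 = 0.0891, matching the logged loss=0.08915 up to rounding. A sketch with the scales treated as inferred from the log, not authoritative:

```python
def combine_losses(simple_loss: float, pruned_loss: float,
                   audio_tagging_loss: float,
                   simple_loss_scale: float = 0.5,
                   audio_tagging_loss_scale: float = 1.0) -> float:
    # Inferred weighting; the real recipe may also warm the pruned term
    # up over the first training steps, long past by this epoch.
    return (simple_loss_scale * simple_loss
            + pruned_loss
            + audio_tagging_loss_scale * audio_tagging_loss)

print(combine_losses(0.1225, 0.01965, 0.00823))  # 0.08913 vs. logged 0.08915
```
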
], batch size: 57, lr: 1.58e-03, grad_scale: 16.0 2023-11-26 11:37:48,663 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=7.13 vs. limit=15.0 2023-11-26 11:37:52,699 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=4.63 vs. limit=15.0 2023-11-26 11:37:54,552 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3370740.0, ans=0.125 2023-11-26 11:38:02,694 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=3370806.6666666665, ans=0.125 2023-11-26 11:38:07,324 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=3370873.3333333335, ans=0.2 2023-11-26 11:38:11,583 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3370873.3333333335, ans=0.1 2023-11-26 11:38:23,419 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3370940.0, ans=0.125 2023-11-26 11:38:27,378 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 505650 2023-11-26 11:38:30,505 INFO [train_asr.py:1235] (3/4) Epoch 43, batch 650, loss[loss=0.06377, simple_loss=0.09162, pruned_loss=0.008312, audio_tagging_loss=0.009652, over 15739.00 frames. ], tot_loss[loss=0.06678, simple_loss=0.09051, pruned_loss=0.01246, audio_tagging_loss=0.009057, over 2925383.05 frames. ], batch size: 58, lr: 1.58e-03, grad_scale: 16.0 2023-11-26 11:39:12,095 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 6.982e+01 8.620e+01 9.335e+01 1.001e+02 1.278e+02, threshold=1.867e+02, percent-clipped=0.0 2023-11-26 11:39:22,913 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 505700 2023-11-26 11:39:25,290 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=3371340.0, ans=0.125 2023-11-26 11:39:26,079 INFO [train_asr.py:1235] (3/4) Epoch 43, batch 700, loss[loss=0.07537, simple_loss=0.1084, pruned_loss=0.01321, audio_tagging_loss=0.007967, over 15263.00 frames. ], tot_loss[loss=0.06624, simple_loss=0.08988, pruned_loss=0.01229, audio_tagging_loss=0.00901, over 2964426.05 frames. ], batch size: 56, lr: 1.58e-03, grad_scale: 16.0 2023-11-26 11:39:33,042 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=10.10 vs. 
limit=15.0 2023-11-26 11:39:35,725 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=3371340.0, ans=0.125 2023-11-26 11:39:39,045 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=3371406.6666666665, ans=0.0 2023-11-26 11:40:05,534 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=3371540.0, ans=0.2 2023-11-26 11:40:19,243 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 505750 2023-11-26 11:40:19,498 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=3371606.6666666665, ans=0.125 2023-11-26 11:40:22,312 INFO [train_asr.py:1235] (3/4) Epoch 43, batch 750, loss[loss=0.03383, simple_loss=0.03286, pruned_loss=0.004532, audio_tagging_loss=0.01287, over 14548.00 frames. ], tot_loss[loss=0.06577, simple_loss=0.08949, pruned_loss=0.01206, audio_tagging_loss=0.008962, over 2985999.42 frames. ], batch size: 57, lr: 1.58e-03, grad_scale: 16.0 2023-11-26 11:40:51,234 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=3371806.6666666665, ans=0.125 2023-11-26 11:40:51,598 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=10.35 vs. limit=15.0 2023-11-26 11:40:55,978 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=3371873.3333333335, ans=0.2 2023-11-26 11:41:03,807 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.272e+01 8.569e+01 9.106e+01 9.803e+01 1.361e+02, threshold=1.821e+02, percent-clipped=0.0 2023-11-26 11:41:12,674 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.max_abs, batch_count=3371940.0, ans=10.0 2023-11-26 11:41:15,685 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 505800 2023-11-26 11:41:19,106 INFO [train_asr.py:1235] (3/4) Epoch 43, batch 800, loss[loss=0.06009, simple_loss=0.09035, pruned_loss=0.005603, audio_tagging_loss=0.009311, over 15927.00 frames. ], tot_loss[loss=0.06614, simple_loss=0.08997, pruned_loss=0.01214, audio_tagging_loss=0.009014, over 3002449.32 frames. ], batch size: 58, lr: 1.58e-03, grad_scale: 32.0 2023-11-26 11:41:20,355 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=3372006.6666666665, ans=0.125 2023-11-26 11:41:29,214 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.whiten.whitening_limit, batch_count=3372073.3333333335, ans=15.0 2023-11-26 11:41:32,323 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=3372073.3333333335, ans=0.125 2023-11-26 11:41:40,282 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3372140.0, ans=0.125 2023-11-26 11:41:59,254 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.1.self_attn2.whiten, num_groups=1, num_channels=192, metric=8.94 vs. 
limit=22.5 2023-11-26 11:42:09,511 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.min_positive, batch_count=3372273.3333333335, ans=0.025 2023-11-26 11:42:11,419 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 505850 2023-11-26 11:42:14,552 INFO [train_asr.py:1235] (3/4) Epoch 43, batch 850, loss[loss=0.06695, simple_loss=0.0977, pruned_loss=0.009637, audio_tagging_loss=0.008465, over 14726.00 frames. ], tot_loss[loss=0.06635, simple_loss=0.09016, pruned_loss=0.01216, audio_tagging_loss=0.00911, over 3011303.05 frames. ], batch size: 55, lr: 1.58e-03, grad_scale: 32.0 2023-11-26 11:42:30,221 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=1.92 vs. limit=6.0 2023-11-26 11:42:33,412 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer_ff2.min_abs, batch_count=3372406.6666666665, ans=0.1 2023-11-26 11:42:40,467 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=3372473.3333333335, ans=0.07 2023-11-26 11:42:49,867 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=3372540.0, ans=0.125 2023-11-26 11:42:55,514 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=512, metric=3.04 vs. limit=15.0 2023-11-26 11:42:55,992 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.462e+01 8.767e+01 9.422e+01 1.006e+02 1.207e+02, threshold=1.884e+02, percent-clipped=0.0 2023-11-26 11:42:57,369 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3372540.0, ans=0.0 2023-11-26 11:42:58,748 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=9.85 vs. limit=15.0 2023-11-26 11:43:07,282 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 505900 2023-11-26 11:43:11,000 INFO [train_asr.py:1235] (3/4) Epoch 43, batch 900, loss[loss=0.06475, simple_loss=0.07849, pruned_loss=0.01436, audio_tagging_loss=0.01114, over 14894.00 frames. ], tot_loss[loss=0.06706, simple_loss=0.09088, pruned_loss=0.01243, audio_tagging_loss=0.009189, over 3017970.27 frames. ], batch size: 57, lr: 1.58e-03, grad_scale: 32.0 2023-11-26 11:43:12,432 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3372673.3333333335, ans=0.1 2023-11-26 11:43:42,768 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=10.04 vs. 
limit=22.5 2023-11-26 11:43:43,542 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3372873.3333333335, ans=0.125 2023-11-26 11:44:04,589 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 505950 2023-11-26 11:44:05,843 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3372940.0, ans=0.125 2023-11-26 11:44:06,883 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=3373006.6666666665, ans=0.0 2023-11-26 11:44:07,743 INFO [train_asr.py:1235] (3/4) Epoch 43, batch 950, loss[loss=0.05985, simple_loss=0.07642, pruned_loss=0.009574, audio_tagging_loss=0.01206, over 14307.00 frames. ], tot_loss[loss=0.06719, simple_loss=0.09122, pruned_loss=0.01245, audio_tagging_loss=0.009124, over 3025442.07 frames. ], batch size: 57, lr: 1.58e-03, grad_scale: 32.0 2023-11-26 11:44:24,917 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=3373073.3333333335, ans=0.04949747468305833 2023-11-26 11:44:28,312 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=16.12 vs. limit=22.5 2023-11-26 11:44:33,857 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.max_abs, batch_count=3373140.0, ans=10.0 2023-11-26 11:44:40,254 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-26 11:44:48,916 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.455e+01 8.709e+01 9.307e+01 9.774e+01 1.284e+02, threshold=1.861e+02, percent-clipped=0.0 2023-11-26 11:44:59,659 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 506000 2023-11-26 11:45:03,182 INFO [train_asr.py:1235] (3/4) Epoch 43, batch 1000, loss[loss=0.06292, simple_loss=0.0873, pruned_loss=0.01207, audio_tagging_loss=0.007202, over 14918.00 frames. ], tot_loss[loss=0.06693, simple_loss=0.09073, pruned_loss=0.01257, audio_tagging_loss=0.008994, over 3027964.42 frames. ], batch size: 58, lr: 1.58e-03, grad_scale: 32.0 2023-11-26 11:45:22,823 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=3373406.6666666665, ans=0.0 2023-11-26 11:45:27,322 WARNING [train_asr.py:1481] (3/4) Exclude cut with ID unbalanced/5Y6u9AlD9S0_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-26 11:45:29,028 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=17.32 vs. 
limit=22.5 2023-11-26 11:45:30,824 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=3373473.3333333335, ans=0.2 2023-11-26 11:45:36,988 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3373540.0, ans=0.1 2023-11-26 11:45:55,510 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 506050 2023-11-26 11:45:58,642 INFO [train_asr.py:1235] (3/4) Epoch 43, batch 1050, loss[loss=0.07391, simple_loss=0.1004, pruned_loss=0.01687, audio_tagging_loss=0.006834, over 16490.00 frames. ], tot_loss[loss=0.06607, simple_loss=0.0897, pruned_loss=0.01241, audio_tagging_loss=0.008812, over 3028290.99 frames. ], batch size: 60, lr: 1.58e-03, grad_scale: 32.0 2023-11-26 11:46:10,595 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=11.83 vs. limit=15.0 2023-11-26 11:46:19,757 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3373740.0, ans=0.1 2023-11-26 11:46:32,468 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3373873.3333333335, ans=0.125 2023-11-26 11:46:40,309 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.136e+01 8.798e+01 9.171e+01 9.764e+01 1.249e+02, threshold=1.834e+02, percent-clipped=0.0 2023-11-26 11:46:43,165 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=12.62 vs. limit=15.0 2023-11-26 11:46:49,606 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=3373940.0, ans=0.0 2023-11-26 11:46:51,275 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3373940.0, ans=0.125 2023-11-26 11:46:52,190 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 506100 2023-11-26 11:46:55,360 INFO [train_asr.py:1235] (3/4) Epoch 43, batch 1100, loss[loss=0.05611, simple_loss=0.07516, pruned_loss=0.009244, audio_tagging_loss=0.009283, over 14093.00 frames. ], tot_loss[loss=0.06584, simple_loss=0.08964, pruned_loss=0.01233, audio_tagging_loss=0.008693, over 3029940.51 frames. ], batch size: 53, lr: 1.58e-03, grad_scale: 32.0 2023-11-26 11:46:57,480 WARNING [train_asr.py:1481] (3/4) Exclude cut with ID unbalanced/AWHnJAqurec_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. 
Number of tokens: 24 2023-11-26 11:47:24,900 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3374140.0, ans=0.125 2023-11-26 11:47:30,686 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3374206.6666666665, ans=0.1 2023-11-26 11:47:47,598 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 506150 2023-11-26 11:47:50,745 INFO [train_asr.py:1235] (3/4) Epoch 43, batch 1150, loss[loss=0.05809, simple_loss=0.07615, pruned_loss=0.009368, audio_tagging_loss=0.01064, over 13920.00 frames. ], tot_loss[loss=0.06506, simple_loss=0.08875, pruned_loss=0.01201, audio_tagging_loss=0.008673, over 3026948.75 frames. ], batch size: 53, lr: 1.58e-03, grad_scale: 16.0 2023-11-26 11:47:59,525 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3374340.0, ans=0.125 2023-11-26 11:48:01,537 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=9.14 vs. limit=15.0 2023-11-26 11:48:09,668 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=3374406.6666666665, ans=0.0 2023-11-26 11:48:28,560 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=3374540.0, ans=0.0 2023-11-26 11:48:29,893 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=3374540.0, ans=0.0 2023-11-26 11:48:33,724 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.518e+01 8.776e+01 9.671e+01 1.060e+02 1.290e+02, threshold=1.934e+02, percent-clipped=0.0 2023-11-26 11:48:43,888 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 506200 2023-11-26 11:48:47,300 INFO [train_asr.py:1235] (3/4) Epoch 43, batch 1200, loss[loss=0.06147, simple_loss=0.08351, pruned_loss=0.01104, audio_tagging_loss=0.008674, over 14280.00 frames. ], tot_loss[loss=0.06534, simple_loss=0.089, pruned_loss=0.01222, audio_tagging_loss=0.008624, over 3027619.38 frames. ], batch size: 54, lr: 1.58e-03, grad_scale: 32.0 2023-11-26 11:48:51,543 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=3374673.3333333335, ans=0.1 2023-11-26 11:48:54,405 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=3374673.3333333335, ans=0.2 2023-11-26 11:48:55,615 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=13.84 vs. limit=15.0 2023-11-26 11:48:56,551 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=3374673.3333333335, ans=0.125 2023-11-26 11:49:04,151 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=7.46 vs. limit=15.0 2023-11-26 11:49:20,599 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=9.41 vs. 
limit=15.0 2023-11-26 11:49:40,146 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 506250 2023-11-26 11:49:43,030 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3375006.6666666665, ans=0.125 2023-11-26 11:49:43,816 INFO [train_asr.py:1235] (3/4) Epoch 43, batch 1250, loss[loss=0.0534, simple_loss=0.06874, pruned_loss=0.008269, audio_tagging_loss=0.01076, over 14643.00 frames. ], tot_loss[loss=0.06535, simple_loss=0.08891, pruned_loss=0.01228, audio_tagging_loss=0.008616, over 3040985.25 frames. ], batch size: 56, lr: 1.58e-03, grad_scale: 32.0 2023-11-26 11:49:53,242 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3375006.6666666665, ans=0.125 2023-11-26 11:49:55,305 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=3375073.3333333335, ans=0.2 2023-11-26 11:49:56,357 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3375073.3333333335, ans=0.125 2023-11-26 11:50:11,058 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=8.97 vs. limit=15.0 2023-11-26 11:50:13,904 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=3375140.0, ans=0.2 2023-11-26 11:50:26,477 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.041e+01 8.620e+01 9.220e+01 9.934e+01 1.296e+02, threshold=1.844e+02, percent-clipped=0.0 2023-11-26 11:50:36,712 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 506300 2023-11-26 11:50:39,855 INFO [train_asr.py:1235] (3/4) Epoch 43, batch 1300, loss[loss=0.06275, simple_loss=0.08481, pruned_loss=0.01242, audio_tagging_loss=0.007919, over 15955.00 frames. ], tot_loss[loss=0.06501, simple_loss=0.08847, pruned_loss=0.01214, audio_tagging_loss=0.008635, over 3033316.27 frames. 
], batch size: 60, lr: 1.58e-03, grad_scale: 32.0 2023-11-26 11:50:42,316 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=3375340.0, ans=0.125 2023-11-26 11:50:52,509 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3375406.6666666665, ans=0.1 2023-11-26 11:50:59,945 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=3375406.6666666665, ans=0.0 2023-11-26 11:51:08,570 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3375473.3333333335, ans=0.125 2023-11-26 11:51:09,666 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=3375473.3333333335, ans=0.2 2023-11-26 11:51:16,583 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3375540.0, ans=0.0 2023-11-26 11:51:20,718 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=3375540.0, ans=0.0 2023-11-26 11:51:27,054 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=3375606.6666666665, ans=0.0 2023-11-26 11:51:32,247 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 506350 2023-11-26 11:51:35,213 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=3375673.3333333335, ans=0.0 2023-11-26 11:51:36,021 INFO [train_asr.py:1235] (3/4) Epoch 43, batch 1350, loss[loss=0.06166, simple_loss=0.07997, pruned_loss=0.01275, audio_tagging_loss=0.008931, over 15595.00 frames. ], tot_loss[loss=0.06543, simple_loss=0.08898, pruned_loss=0.01231, audio_tagging_loss=0.008624, over 3038671.42 frames. ], batch size: 58, lr: 1.58e-03, grad_scale: 32.0 2023-11-26 11:51:39,454 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=3375673.3333333335, ans=0.0 2023-11-26 11:51:41,552 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=3375673.3333333335, ans=0.0 2023-11-26 11:52:15,831 WARNING [train_asr.py:1481] (3/4) Exclude cut with ID unbalanced/XdmbboqRBmQ_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-26 11:52:19,581 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.306e+01 8.796e+01 9.351e+01 9.969e+01 1.266e+02, threshold=1.870e+02, percent-clipped=0.0 2023-11-26 11:52:19,703 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=3375940.0, ans=0.125 2023-11-26 11:52:28,746 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 506400 2023-11-26 11:52:32,107 INFO [train_asr.py:1235] (3/4) Epoch 43, batch 1400, loss[loss=0.07362, simple_loss=0.08982, pruned_loss=0.01441, audio_tagging_loss=0.01431, over 15217.00 frames. 
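
The `train_asr.py:1481` WARNING lines document a data-hygiene filter: these AudioSet clips carry only a dummy transcript, and a 1-second cut (100 feature frames) shrinks to 23 frames after the convolutional front-end, fewer than its 24 BPE tokens, which is too short for the pruned transducer loss to be well behaved, so the cut is excluded. The arithmetic in the warning matches the usual icefall subsampling estimate, sketched here:

```python
def keep_cut(num_frames: int, num_tokens: int) -> bool:
    """Exclusion rule suggested by the warnings: keep a cut only if the
    encoder will emit at least one frame per output token."""
    # Frame count after the ~4x convolutional subsampling; this
    # reproduces the logged 100 -> 23.
    t = ((num_frames - 7) // 2 + 1) // 2
    return t >= num_tokens

print(keep_cut(100, 24))  # False: the dummy-text AudioSet cuts are dropped
```
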
], tot_loss[loss=0.0654, simple_loss=0.08888, pruned_loss=0.01219, audio_tagging_loss=0.008773, over 3041022.74 frames. ], batch size: 59, lr: 1.58e-03, grad_scale: 16.0 2023-11-26 11:52:35,002 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3376006.6666666665, ans=0.125 2023-11-26 11:52:41,706 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=4.61 vs. limit=15.0 2023-11-26 11:52:54,896 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=3376140.0, ans=0.0 2023-11-26 11:52:55,969 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.min_positive, batch_count=3376140.0, ans=0.05 2023-11-26 11:53:09,493 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=512, metric=16.74 vs. limit=22.5 2023-11-26 11:53:16,603 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-26 11:53:25,004 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 506450 2023-11-26 11:53:28,663 INFO [train_asr.py:1235] (3/4) Epoch 43, batch 1450, loss[loss=0.07807, simple_loss=0.1006, pruned_loss=0.01846, audio_tagging_loss=0.009292, over 14885.00 frames. ], tot_loss[loss=0.06574, simple_loss=0.08914, pruned_loss=0.01235, audio_tagging_loss=0.00882, over 3039161.33 frames. ], batch size: 56, lr: 1.58e-03, grad_scale: 16.0 2023-11-26 11:53:38,321 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-26 11:53:39,750 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=7.48 vs. limit=15.0 2023-11-26 11:53:46,438 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=8.39 vs. limit=12.0 2023-11-26 11:53:47,531 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=6.67 vs. limit=10.0 2023-11-26 11:54:07,952 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=7.85 vs. limit=15.0 2023-11-26 11:54:08,536 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=3376540.0, ans=0.0 2023-11-26 11:54:12,032 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.549e+01 8.794e+01 9.262e+01 1.013e+02 1.743e+02, threshold=1.852e+02, percent-clipped=0.0 2023-11-26 11:54:13,396 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=3376606.6666666665, ans=0.0 2023-11-26 11:54:20,677 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 506500 2023-11-26 11:54:23,740 INFO [train_asr.py:1235] (3/4) Epoch 43, batch 1500, loss[loss=0.0545, simple_loss=0.0652, pruned_loss=0.01042, audio_tagging_loss=0.01148, over 14989.00 frames. ], tot_loss[loss=0.06668, simple_loss=0.09012, pruned_loss=0.01266, audio_tagging_loss=0.008958, over 3038438.66 frames. 
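
Alongside the per-batch `loss[...]`, each record prints `tot_loss[...]`, a frame-weighted running average. Its frame counter climbs through the epoch (680109 at batch 50, ~1.2M at batch 100) and then plateaus near 3.0M, which at roughly 15k frames per batch suggests an exponentially decayed window of about 200 batches rather than a plain epoch-wide mean. A sketch of bookkeeping with that behavior (the window size is inferred from the plateau, not taken from the code):

```python
class DecayedLossTracker:
    """Exponentially decayed (loss * frames, frames) sums, so the
    reported average reflects roughly the last `window` batches."""

    def __init__(self, window: int = 200):
        self.decay = 1.0 - 1.0 / window
        self.loss_sum = 0.0
        self.frames = 0.0

    def update(self, batch_loss: float, batch_frames: float) -> None:
        self.loss_sum = self.decay * self.loss_sum + batch_loss * batch_frames
        self.frames = self.decay * self.frames + batch_frames

    @property
    def avg(self) -> float:
        return self.loss_sum / self.frames

tracker = DecayedLossTracker()
for _ in range(50):
    tracker.update(0.073, 15400)
print(round(tracker.frames))  # ~683k, the scale of the batch-50 record above
```
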
], batch size: 59, lr: 1.58e-03, grad_scale: 16.0 2023-11-26 11:54:25,122 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3376673.3333333335, ans=0.125 2023-11-26 11:54:29,182 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=13.01 vs. limit=22.5 2023-11-26 11:54:49,878 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=3376806.6666666665, ans=0.2 2023-11-26 11:54:56,907 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=3376873.3333333335, ans=0.0 2023-11-26 11:55:01,242 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-26 11:55:09,571 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=10.45 vs. limit=15.0 2023-11-26 11:55:16,596 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 506550 2023-11-26 11:55:19,943 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=9.62 vs. limit=15.0 2023-11-26 11:55:20,281 INFO [train_asr.py:1235] (3/4) Epoch 43, batch 1550, loss[loss=0.063, simple_loss=0.09261, pruned_loss=0.008581, audio_tagging_loss=0.008115, over 15080.00 frames. ], tot_loss[loss=0.06683, simple_loss=0.09049, pruned_loss=0.01258, audio_tagging_loss=0.00901, over 3035975.22 frames. ], batch size: 57, lr: 1.58e-03, grad_scale: 4.0 2023-11-26 11:55:42,933 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.min_positive, batch_count=3377140.0, ans=0.05 2023-11-26 11:55:44,883 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=3377140.0, ans=0.125 2023-11-26 11:55:55,081 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3377206.6666666665, ans=0.1 2023-11-26 11:56:06,472 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.131e+01 8.914e+01 9.617e+01 1.050e+02 1.576e+02, threshold=1.923e+02, percent-clipped=0.0 2023-11-26 11:56:13,028 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 506600 2023-11-26 11:56:15,675 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=3377340.0, ans=0.125 2023-11-26 11:56:16,423 INFO [train_asr.py:1235] (3/4) Epoch 43, batch 1600, loss[loss=0.06564, simple_loss=0.09142, pruned_loss=0.01129, audio_tagging_loss=0.008642, over 14299.00 frames. ], tot_loss[loss=0.06678, simple_loss=0.09039, pruned_loss=0.01254, audio_tagging_loss=0.009048, over 3046127.55 frames. 
], batch size: 53, lr: 1.58e-03, grad_scale: 8.0 2023-11-26 11:56:20,975 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3377340.0, ans=0.125 2023-11-26 11:56:22,889 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=3377340.0, ans=0.125 2023-11-26 11:57:09,293 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 506650 2023-11-26 11:57:12,415 INFO [train_asr.py:1235] (3/4) Epoch 43, batch 1650, loss[loss=0.08098, simple_loss=0.1085, pruned_loss=0.01582, audio_tagging_loss=0.01093, over 14367.00 frames. ], tot_loss[loss=0.06717, simple_loss=0.09093, pruned_loss=0.01275, audio_tagging_loss=0.008957, over 3046525.23 frames. ], batch size: 56, lr: 1.58e-03, grad_scale: 8.0 2023-11-26 11:57:17,816 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=10.61 vs. limit=15.0 2023-11-26 11:57:22,792 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-26 11:57:23,800 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3377740.0, ans=0.1 2023-11-26 11:57:42,168 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3377806.6666666665, ans=0.1 2023-11-26 11:57:46,088 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=9.04 vs. limit=15.0 2023-11-26 11:57:58,098 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=3377940.0, ans=0.0 2023-11-26 11:57:58,839 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.445e+01 8.844e+01 9.530e+01 1.032e+02 1.539e+02, threshold=1.906e+02, percent-clipped=0.0 2023-11-26 11:58:05,851 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 506700 2023-11-26 11:58:09,014 INFO [train_asr.py:1235] (3/4) Epoch 43, batch 1700, loss[loss=0.08658, simple_loss=0.118, pruned_loss=0.02128, audio_tagging_loss=0.006294, over 14745.00 frames. ], tot_loss[loss=0.06681, simple_loss=0.09021, pruned_loss=0.01262, audio_tagging_loss=0.009077, over 3051260.25 frames. ], batch size: 55, lr: 1.58e-03, grad_scale: 8.0 2023-11-26 11:58:09,902 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=3378006.6666666665, ans=0.0 2023-11-26 11:58:23,125 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=3378073.3333333335, ans=0.125 2023-11-26 11:58:25,293 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=3378073.3333333335, ans=0.0 2023-11-26 11:58:36,358 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=3378140.0, ans=0.2 2023-11-26 11:58:48,582 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=3378206.6666666665, ans=0.125 2023-11-26 11:58:55,609 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=8.07 vs. 
limit=15.0 2023-11-26 11:59:02,198 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 506750 2023-11-26 11:59:03,456 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3378273.3333333335, ans=0.125 2023-11-26 11:59:03,754 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=9.15 vs. limit=15.0 2023-11-26 11:59:05,384 INFO [train_asr.py:1235] (3/4) Epoch 43, batch 1750, loss[loss=0.07832, simple_loss=0.1088, pruned_loss=0.014, audio_tagging_loss=0.009917, over 14410.00 frames. ], tot_loss[loss=0.06668, simple_loss=0.09017, pruned_loss=0.01252, audio_tagging_loss=0.009078, over 3050189.95 frames. ], batch size: 56, lr: 1.58e-03, grad_scale: 8.0 2023-11-26 11:59:06,710 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3378340.0, ans=0.1 2023-11-26 11:59:12,296 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=11.13 vs. limit=15.0 2023-11-26 11:59:35,216 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=10.23 vs. limit=15.0 2023-11-26 11:59:47,691 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=512, metric=2.68 vs. limit=15.0 2023-11-26 11:59:51,255 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.129e+01 8.598e+01 9.201e+01 1.005e+02 1.270e+02, threshold=1.840e+02, percent-clipped=0.0 2023-11-26 11:59:57,756 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 506800 2023-11-26 12:00:01,108 INFO [train_asr.py:1235] (3/4) Epoch 43, batch 1800, loss[loss=0.05814, simple_loss=0.07336, pruned_loss=0.01177, audio_tagging_loss=0.009698, over 14805.00 frames. ], tot_loss[loss=0.06636, simple_loss=0.08972, pruned_loss=0.01248, audio_tagging_loss=0.009028, over 3055123.54 frames. ], batch size: 59, lr: 1.58e-03, grad_scale: 8.0 2023-11-26 12:00:07,299 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=9.51 vs. limit=15.0 2023-11-26 12:00:17,975 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=14.29 vs. limit=22.5 2023-11-26 12:00:18,746 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3378740.0, ans=0.125 2023-11-26 12:00:20,678 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=3378740.0, ans=0.2 2023-11-26 12:00:20,799 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=3378740.0, ans=0.0 2023-11-26 12:00:54,397 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 506850 2023-11-26 12:00:57,541 INFO [train_asr.py:1235] (3/4) Epoch 43, batch 1850, loss[loss=0.06513, simple_loss=0.08827, pruned_loss=0.01384, audio_tagging_loss=0.007156, over 14420.00 frames. ], tot_loss[loss=0.06619, simple_loss=0.08974, pruned_loss=0.01244, audio_tagging_loss=0.008879, over 3049443.24 frames. 
], batch size: 57, lr: 1.58e-03, grad_scale: 8.0 2023-11-26 12:00:58,967 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=14.86 vs. limit=22.5 2023-11-26 12:01:04,563 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3379006.6666666665, ans=0.0 2023-11-26 12:01:06,324 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=3379006.6666666665, ans=0.2 2023-11-26 12:01:07,216 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3379006.6666666665, ans=0.125 2023-11-26 12:01:18,863 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3379140.0, ans=0.125 2023-11-26 12:01:40,549 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.63 vs. limit=6.0 2023-11-26 12:01:43,739 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.338e+01 8.799e+01 9.499e+01 1.025e+02 1.305e+02, threshold=1.900e+02, percent-clipped=0.0 2023-11-26 12:01:48,837 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3379273.3333333335, ans=0.125 2023-11-26 12:01:50,862 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 506900 2023-11-26 12:01:53,967 INFO [train_asr.py:1235] (3/4) Epoch 43, batch 1900, loss[loss=0.07467, simple_loss=0.1081, pruned_loss=0.01263, audio_tagging_loss=0.007976, over 15149.00 frames. ], tot_loss[loss=0.06639, simple_loss=0.09019, pruned_loss=0.01244, audio_tagging_loss=0.008852, over 3050947.79 frames. ], batch size: 56, lr: 1.58e-03, grad_scale: 8.0 2023-11-26 12:02:01,673 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=3379340.0, ans=0.0 2023-11-26 12:02:09,179 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=3379406.6666666665, ans=0.2 2023-11-26 12:02:44,021 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=3379606.6666666665, ans=0.2 2023-11-26 12:02:46,146 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 506950 2023-11-26 12:02:48,489 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=3379673.3333333335, ans=0.07 2023-11-26 12:02:49,349 INFO [train_asr.py:1235] (3/4) Epoch 43, batch 1950, loss[loss=0.07181, simple_loss=0.08929, pruned_loss=0.01481, audio_tagging_loss=0.01235, over 15248.00 frames. ], tot_loss[loss=0.06641, simple_loss=0.09005, pruned_loss=0.01253, audio_tagging_loss=0.008847, over 3047101.38 frames. 
], batch size: 57, lr: 1.58e-03, grad_scale: 8.0 2023-11-26 12:02:55,928 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3379673.3333333335, ans=0.125 2023-11-26 12:03:20,616 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-26 12:03:35,544 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.070e+01 8.710e+01 9.475e+01 1.012e+02 2.962e+02, threshold=1.895e+02, percent-clipped=1.0 2023-11-26 12:03:42,042 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=3.05 vs. limit=15.0 2023-11-26 12:03:42,557 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 507000 2023-11-26 12:03:45,962 INFO [train_asr.py:1235] (3/4) Epoch 43, batch 2000, loss[loss=0.07206, simple_loss=0.09968, pruned_loss=0.01574, audio_tagging_loss=0.006489, over 14759.00 frames. ], tot_loss[loss=0.06562, simple_loss=0.0888, pruned_loss=0.01241, audio_tagging_loss=0.008807, over 3048365.42 frames. ], batch size: 55, lr: 1.58e-03, grad_scale: 16.0 2023-11-26 12:03:53,571 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3380006.6666666665, ans=0.125 2023-11-26 12:04:18,526 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=1.88 vs. limit=6.0 2023-11-26 12:04:38,177 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=3380273.3333333335, ans=0.0 2023-11-26 12:04:39,505 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 507050 2023-11-26 12:04:42,696 INFO [train_asr.py:1235] (3/4) Epoch 43, batch 2050, loss[loss=0.06131, simple_loss=0.0796, pruned_loss=0.01098, audio_tagging_loss=0.01053, over 15854.00 frames. ], tot_loss[loss=0.06573, simple_loss=0.08901, pruned_loss=0.01244, audio_tagging_loss=0.008785, over 3043433.59 frames. ], batch size: 59, lr: 1.58e-03, grad_scale: 16.0 2023-11-26 12:04:47,376 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3380340.0, ans=0.0 2023-11-26 12:05:06,352 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3380473.3333333335, ans=0.1 2023-11-26 12:05:14,276 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-26 12:05:20,618 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3380540.0, ans=0.125 2023-11-26 12:05:26,840 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=8.72 vs. limit=15.0 2023-11-26 12:05:28,416 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.082e+01 8.859e+01 9.273e+01 1.013e+02 1.302e+02, threshold=1.855e+02, percent-clipped=0.0 2023-11-26 12:05:34,984 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 507100 2023-11-26 12:05:38,117 INFO [train_asr.py:1235] (3/4) Epoch 43, batch 2100, loss[loss=0.07244, simple_loss=0.09548, pruned_loss=0.01568, audio_tagging_loss=0.00902, over 15692.00 frames. 
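
The `grad_scale` field is the fp16 loss-scaling factor. Over this stretch it falls 16 -> 8 -> 4 (each halving signals a batch whose scaled gradients overflowed and whose update was skipped) and then climbs back through 8 to 16 once updates stay finite, the same backoff/growth idea as PyTorch's AMP scaler (icefall layers its own growth schedule on top rather than using it verbatim). A minimal loop showing where that number comes from; requires a CUDA device:

```python
import torch

model = torch.nn.Linear(80, 500).cuda()
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
scaler = torch.cuda.amp.GradScaler()  # halves on inf/nan grads, grows back
                                      # after a long enough run of clean steps
for step in range(3):
    x = torch.randn(55, 80, device="cuda")
    with torch.cuda.amp.autocast():
        loss = model(x).square().mean()
    optimizer.zero_grad()
    scaler.scale(loss).backward()     # backprop through the scaled loss
    scaler.step(optimizer)            # unscales; skips the step on overflow
    scaler.update()                   # adjusts the scale factor
    print(scaler.get_scale())         # the quantity logged as grad_scale
```
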
], tot_loss[loss=0.06586, simple_loss=0.08947, pruned_loss=0.01241, audio_tagging_loss=0.008718, over 3052891.72 frames. ], batch size: 58, lr: 1.58e-03, grad_scale: 16.0 2023-11-26 12:05:40,393 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=3380673.3333333335, ans=0.0 2023-11-26 12:05:46,781 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3380673.3333333335, ans=0.125 2023-11-26 12:05:48,959 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=3380740.0, ans=0.2 2023-11-26 12:05:50,531 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=3380740.0, ans=0.125 2023-11-26 12:06:02,987 INFO [scaling.py:1022] (3/4) Whitening: name=encoder_embed.out_whiten, num_groups=1, num_channels=192, metric=6.75 vs. limit=8.0 2023-11-26 12:06:03,871 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=12.86 vs. limit=22.5 2023-11-26 12:06:05,933 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=3380806.6666666665, ans=0.0 2023-11-26 12:06:24,053 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3380940.0, ans=0.125 2023-11-26 12:06:25,514 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=16.54 vs. limit=22.5 2023-11-26 12:06:26,251 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=3380940.0, ans=0.09899494936611666 2023-11-26 12:06:30,288 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 507150 2023-11-26 12:06:33,900 INFO [train_asr.py:1235] (3/4) Epoch 43, batch 2150, loss[loss=0.0632, simple_loss=0.08878, pruned_loss=0.01023, audio_tagging_loss=0.008581, over 16050.00 frames. ], tot_loss[loss=0.06546, simple_loss=0.08904, pruned_loss=0.01212, audio_tagging_loss=0.008815, over 3057906.87 frames. ], batch size: 58, lr: 1.58e-03, grad_scale: 16.0 2023-11-26 12:06:51,860 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3381073.3333333335, ans=0.125 2023-11-26 12:06:52,802 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3381073.3333333335, ans=0.125 2023-11-26 12:06:53,980 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3381073.3333333335, ans=0.125 2023-11-26 12:07:07,569 WARNING [train_asr.py:1481] (3/4) Exclude cut with ID unbalanced/XkQ8YVd8u38_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. 
Number of tokens: 24 2023-11-26 12:07:19,711 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 6.860e+01 8.634e+01 9.242e+01 1.004e+02 1.355e+02, threshold=1.848e+02, percent-clipped=0.0 2023-11-26 12:07:26,696 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 507200 2023-11-26 12:07:30,658 INFO [train_asr.py:1235] (3/4) Epoch 43, batch 2200, loss[loss=0.07163, simple_loss=0.09311, pruned_loss=0.01673, audio_tagging_loss=0.008349, over 15217.00 frames. ], tot_loss[loss=0.06578, simple_loss=0.08941, pruned_loss=0.01227, audio_tagging_loss=0.0088, over 3054920.06 frames. ], batch size: 58, lr: 1.58e-03, grad_scale: 16.0 2023-11-26 12:07:34,583 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.11 vs. limit=6.0 2023-11-26 12:07:44,103 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=5.39 vs. limit=15.0 2023-11-26 12:07:46,767 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3381406.6666666665, ans=0.1 2023-11-26 12:07:47,937 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=3381406.6666666665, ans=0.125 2023-11-26 12:07:51,184 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=3381473.3333333335, ans=0.0 2023-11-26 12:08:11,357 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=3381540.0, ans=0.125 2023-11-26 12:08:23,032 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 507250 2023-11-26 12:08:25,843 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=7.77 vs. limit=15.0 2023-11-26 12:08:26,133 INFO [train_asr.py:1235] (3/4) Epoch 43, batch 2250, loss[loss=0.06851, simple_loss=0.09405, pruned_loss=0.01234, audio_tagging_loss=0.00914, over 15556.00 frames. ], tot_loss[loss=0.0661, simple_loss=0.08987, pruned_loss=0.01236, audio_tagging_loss=0.008811, over 3042613.61 frames. ], batch size: 54, lr: 1.58e-03, grad_scale: 16.0 2023-11-26 12:08:33,822 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=3381673.3333333335, ans=0.07 2023-11-26 12:08:42,694 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=3381740.0, ans=0.04949747468305833 2023-11-26 12:08:53,108 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=11.36 vs. limit=15.0 2023-11-26 12:09:00,021 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=3381873.3333333335, ans=0.125 2023-11-26 12:09:04,642 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.31 vs. 
limit=6.0 2023-11-26 12:09:11,481 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.618e+01 8.710e+01 9.301e+01 1.010e+02 1.448e+02, threshold=1.860e+02, percent-clipped=0.0 2023-11-26 12:09:17,968 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 507300 2023-11-26 12:09:21,642 INFO [train_asr.py:1235] (3/4) Epoch 43, batch 2300, loss[loss=0.06566, simple_loss=0.08018, pruned_loss=0.01364, audio_tagging_loss=0.01192, over 15326.00 frames. ], tot_loss[loss=0.06583, simple_loss=0.08921, pruned_loss=0.01238, audio_tagging_loss=0.008841, over 3041545.11 frames. ], batch size: 56, lr: 1.58e-03, grad_scale: 16.0 2023-11-26 12:09:22,933 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=3382006.6666666665, ans=0.125 2023-11-26 12:09:40,480 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.1.self_attn2.whiten, num_groups=1, num_channels=192, metric=8.27 vs. limit=22.5 2023-11-26 12:10:10,778 WARNING [train_asr.py:1481] (3/4) Exclude cut with ID unbalanced/mx9RcUz8sr0_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-26 12:10:13,077 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=3382273.3333333335, ans=0.125 2023-11-26 12:10:14,598 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 507350 2023-11-26 12:10:17,689 INFO [train_asr.py:1235] (3/4) Epoch 43, batch 2350, loss[loss=0.05047, simple_loss=0.06641, pruned_loss=0.006773, audio_tagging_loss=0.01049, over 16246.00 frames. ], tot_loss[loss=0.06613, simple_loss=0.08958, pruned_loss=0.01247, audio_tagging_loss=0.008876, over 3046498.87 frames. ], batch size: 62, lr: 1.58e-03, grad_scale: 16.0 2023-11-26 12:10:24,881 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=3382340.0, ans=0.125 2023-11-26 12:10:55,783 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=3382540.0, ans=0.0 2023-11-26 12:11:03,362 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=3382606.6666666665, ans=0.0 2023-11-26 12:11:04,119 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.454e+01 8.890e+01 9.561e+01 1.021e+02 1.290e+02, threshold=1.912e+02, percent-clipped=0.0 2023-11-26 12:11:11,230 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 507400 2023-11-26 12:11:14,681 INFO [train_asr.py:1235] (3/4) Epoch 43, batch 2400, loss[loss=0.0796, simple_loss=0.111, pruned_loss=0.01579, audio_tagging_loss=0.008327, over 16174.00 frames. ], tot_loss[loss=0.06625, simple_loss=0.08994, pruned_loss=0.01237, audio_tagging_loss=0.008905, over 3048808.05 frames. 
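The optim.py:476 lines track the distribution of recent total gradient norms. The reported threshold is consistently Clipping_scale times the logged median: 2.0 x 9.499e+01 ≈ 1.900e+02 and 2.0 x 9.475e+01 = 1.895e+02 in the entries above, and percent-clipped only ticks to 1.0 when an outlier norm (2.962e+02) exceeds it. A sketch of that bookkeeping, assuming the threshold is the clipping scale times a running median over a window of recent norms:

```python
from collections import deque
import torch

class QuartileClipper:
    """Clip gradients against scale * median of recent grad norms (sketch).

    Reports the [0, 25, 50, 75, 100]-percent quartiles of recent norms,
    the derived threshold, and the fraction of recent batches clipped,
    mirroring the optim.py log lines.
    """

    def __init__(self, params, clipping_scale=2.0, window=100):
        self.params = list(params)
        self.scale = clipping_scale
        self.norms = deque(maxlen=window)
        self.clipped = deque(maxlen=window)

    def step(self):
        grads = [p.grad for p in self.params if p.grad is not None]
        norm = torch.linalg.vector_norm(
            torch.stack([torch.linalg.vector_norm(g) for g in grads])
        ).item()
        self.norms.append(norm)
        t = torch.tensor(sorted(self.norms))
        quartiles = [t[int(q * (len(t) - 1))].item() for q in (0, .25, .5, .75, 1)]
        threshold = self.scale * quartiles[2]          # 2.0 * median
        clip = len(self.norms) >= 10 and norm > threshold
        self.clipped.append(clip)
        if clip:
            for g in grads:
                g.mul_(threshold / norm)
        pct = 100.0 * sum(self.clipped) / len(self.clipped)
        print(f"grad-norm quartiles {quartiles}, threshold={threshold:.3e}, "
              f"percent-clipped={pct:.1f}")

# usage sketch:
p = torch.nn.Parameter(torch.randn(10))
p.grad = torch.randn(10)
QuartileClipper([p]).step()
```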
], batch size: 56, lr: 1.58e-03, grad_scale: 32.0 2023-11-26 12:11:26,629 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=3382740.0, ans=0.2 2023-11-26 12:11:58,281 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3382940.0, ans=0.125 2023-11-26 12:12:01,392 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3382940.0, ans=0.125 2023-11-26 12:12:06,706 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 507450 2023-11-26 12:12:09,769 INFO [train_asr.py:1235] (3/4) Epoch 43, batch 2450, loss[loss=0.07102, simple_loss=0.1024, pruned_loss=0.01109, audio_tagging_loss=0.008738, over 14864.00 frames. ], tot_loss[loss=0.0667, simple_loss=0.09067, pruned_loss=0.0125, audio_tagging_loss=0.008863, over 3041563.09 frames. ], batch size: 56, lr: 1.58e-03, grad_scale: 16.0 2023-11-26 12:12:18,007 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=3383006.6666666665, ans=0.125 2023-11-26 12:12:45,148 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3383206.6666666665, ans=0.0 2023-11-26 12:12:55,319 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=3383273.3333333335, ans=0.125 2023-11-26 12:12:57,203 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.415e+01 8.728e+01 9.307e+01 9.934e+01 1.225e+02, threshold=1.861e+02, percent-clipped=0.0 2023-11-26 12:13:02,386 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 507500 2023-11-26 12:13:06,031 INFO [train_asr.py:1235] (3/4) Epoch 43, batch 2500, loss[loss=0.04221, simple_loss=0.05378, pruned_loss=0.006658, audio_tagging_loss=0.008668, over 14620.00 frames. ], tot_loss[loss=0.06675, simple_loss=0.09077, pruned_loss=0.01252, audio_tagging_loss=0.00884, over 3042852.63 frames. ], batch size: 55, lr: 1.58e-03, grad_scale: 16.0 2023-11-26 12:13:09,427 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=3383340.0, ans=0.0 2023-11-26 12:13:43,322 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=4.36 vs. limit=15.0 2023-11-26 12:13:58,565 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=8.82 vs. limit=15.0 2023-11-26 12:13:59,021 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 507550 2023-11-26 12:14:00,199 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3383606.6666666665, ans=0.125 2023-11-26 12:14:02,170 INFO [train_asr.py:1235] (3/4) Epoch 43, batch 2550, loss[loss=0.08236, simple_loss=0.1156, pruned_loss=0.01644, audio_tagging_loss=0.008138, over 13767.00 frames. ], tot_loss[loss=0.06679, simple_loss=0.09088, pruned_loss=0.01258, audio_tagging_loss=0.00877, over 3036280.30 frames. 
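The grad_scale in the batch lines is the fp16 dynamic loss scale: it doubles periodically (8.0 to 16.0 at batch 2000, 16.0 to 32.0 at batch 2400) and is halved whenever a step produces inf/nan gradients, which is the likely reason it is back to 16.0 by batch 2450. A sketch of the standard PyTorch pattern; growth_interval=400 below is chosen to match the roughly 400-batch doubling cadence seen here and is not taken from the recipe:

```python
import torch

model = torch.nn.Linear(80, 500).cuda()
opt = torch.optim.AdamW(model.parameters(), lr=1.58e-3)
# growth_factor/growth_interval govern the periodic doubling in the log;
# backoff_factor halves the scale after an overflowing step.
scaler = torch.cuda.amp.GradScaler(
    init_scale=8.0, growth_factor=2.0, backoff_factor=0.5, growth_interval=400
)

for _ in range(3):
    x = torch.randn(16, 80, device="cuda")
    with torch.cuda.amp.autocast(dtype=torch.float16):
        loss = model(x).square().mean()
    opt.zero_grad()
    scaler.scale(loss).backward()   # backward on the scaled loss
    scaler.step(opt)                # skips the step if grads overflowed
    scaler.update()                 # grows or backs off the scale
    print("grad_scale:", scaler.get_scale())
```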
], batch size: 53, lr: 1.58e-03, grad_scale: 16.0 2023-11-26 12:14:18,082 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=3383740.0, ans=0.2 2023-11-26 12:14:26,041 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=3383806.6666666665, ans=0.2 2023-11-26 12:14:34,353 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=7.98 vs. limit=15.0 2023-11-26 12:14:43,019 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3383873.3333333335, ans=0.1 2023-11-26 12:14:49,150 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.468e+01 8.661e+01 9.276e+01 1.004e+02 1.739e+02, threshold=1.855e+02, percent-clipped=0.0 2023-11-26 12:14:51,027 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3383940.0, ans=0.125 2023-11-26 12:14:52,077 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=3383940.0, ans=0.0 2023-11-26 12:14:54,981 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 507600 2023-11-26 12:14:58,339 INFO [train_asr.py:1235] (3/4) Epoch 43, batch 2600, loss[loss=0.06346, simple_loss=0.0828, pruned_loss=0.01321, audio_tagging_loss=0.00885, over 15653.00 frames. ], tot_loss[loss=0.06678, simple_loss=0.09113, pruned_loss=0.01258, audio_tagging_loss=0.008629, over 3031531.53 frames. ], batch size: 57, lr: 1.58e-03, grad_scale: 16.0 2023-11-26 12:15:03,326 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3384006.6666666665, ans=0.1 2023-11-26 12:15:17,772 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=3384073.3333333335, ans=0.0 2023-11-26 12:15:27,782 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=3384140.0, ans=0.035 2023-11-26 12:15:44,736 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=15.18 vs. limit=22.5 2023-11-26 12:15:48,124 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3384273.3333333335, ans=0.1 2023-11-26 12:15:51,095 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 507650 2023-11-26 12:15:54,229 INFO [train_asr.py:1235] (3/4) Epoch 43, batch 2650, loss[loss=0.06611, simple_loss=0.09232, pruned_loss=0.01124, audio_tagging_loss=0.008714, over 17119.00 frames. ], tot_loss[loss=0.06645, simple_loss=0.0907, pruned_loss=0.01256, audio_tagging_loss=0.008537, over 3036735.52 frames. ], batch size: 66, lr: 1.58e-03, grad_scale: 8.0 2023-11-26 12:16:14,608 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=3384406.6666666665, ans=0.2 2023-11-26 12:16:19,573 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module1.whiten, num_groups=1, num_channels=512, metric=4.34 vs. 
limit=15.0 2023-11-26 12:16:42,421 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.481e+01 8.705e+01 9.342e+01 1.013e+02 1.276e+02, threshold=1.868e+02, percent-clipped=0.0 2023-11-26 12:16:47,309 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 507700 2023-11-26 12:16:50,471 INFO [train_asr.py:1235] (3/4) Epoch 43, batch 2700, loss[loss=0.04568, simple_loss=0.04832, pruned_loss=0.007194, audio_tagging_loss=0.01432, over 14127.00 frames. ], tot_loss[loss=0.06626, simple_loss=0.09023, pruned_loss=0.01254, audio_tagging_loss=0.008607, over 3038798.30 frames. ], batch size: 56, lr: 1.58e-03, grad_scale: 8.0 2023-11-26 12:17:00,401 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=3384740.0, ans=0.125 2023-11-26 12:17:02,546 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3384740.0, ans=0.1 2023-11-26 12:17:22,004 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3384806.6666666665, ans=0.125 2023-11-26 12:17:26,306 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=3384873.3333333335, ans=0.035 2023-11-26 12:17:30,055 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=3384873.3333333335, ans=0.0 2023-11-26 12:17:32,220 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=3384873.3333333335, ans=0.2 2023-11-26 12:17:38,632 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=3384940.0, ans=0.2 2023-11-26 12:17:42,628 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 507750 2023-11-26 12:17:45,890 INFO [train_asr.py:1235] (3/4) Epoch 43, batch 2750, loss[loss=0.05564, simple_loss=0.07711, pruned_loss=0.00878, audio_tagging_loss=0.008305, over 15494.00 frames. ], tot_loss[loss=0.06561, simple_loss=0.08917, pruned_loss=0.01236, audio_tagging_loss=0.008667, over 3039915.58 frames. ], batch size: 59, lr: 1.58e-03, grad_scale: 8.0 2023-11-26 12:17:46,192 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3385006.6666666665, ans=0.1 2023-11-26 12:17:46,199 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3385006.6666666665, ans=0.125 2023-11-26 12:18:16,957 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=3385140.0, ans=0.0 2023-11-26 12:18:20,147 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.max_abs, batch_count=3385206.6666666665, ans=10.0 2023-11-26 12:18:22,148 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=3385206.6666666665, ans=0.0 2023-11-26 12:18:34,400 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.630e+01 8.797e+01 9.310e+01 1.006e+02 1.204e+02, threshold=1.862e+02, percent-clipped=0.0 2023-11-26 12:18:36,009 WARNING [train_asr.py:1481] (3/4) Exclude cut with ID unbalanced/IMdT8_tuNp0_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. 
Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-26 12:18:39,236 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 507800 2023-11-26 12:18:40,417 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=3385273.3333333335, ans=0.125 2023-11-26 12:18:42,661 INFO [train_asr.py:1235] (3/4) Epoch 43, batch 2800, loss[loss=0.0572, simple_loss=0.08159, pruned_loss=0.008696, audio_tagging_loss=0.007712, over 14377.00 frames. ], tot_loss[loss=0.06519, simple_loss=0.0886, pruned_loss=0.01221, audio_tagging_loss=0.008679, over 3040755.77 frames. ], batch size: 54, lr: 1.58e-03, grad_scale: 16.0 2023-11-26 12:18:48,112 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=512, metric=3.95 vs. limit=15.0 2023-11-26 12:19:00,058 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=3385406.6666666665, ans=0.0 2023-11-26 12:19:32,937 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=5.66 vs. limit=15.0 2023-11-26 12:19:36,160 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 507850 2023-11-26 12:19:38,445 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=3385673.3333333335, ans=0.035 2023-11-26 12:19:39,345 INFO [train_asr.py:1235] (3/4) Epoch 43, batch 2850, loss[loss=0.06906, simple_loss=0.0974, pruned_loss=0.01507, audio_tagging_loss=0.005295, over 14958.00 frames. ], tot_loss[loss=0.06521, simple_loss=0.08858, pruned_loss=0.01225, audio_tagging_loss=0.008672, over 3042554.20 frames. ], batch size: 57, lr: 1.58e-03, grad_scale: 16.0 2023-11-26 12:19:39,494 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=3385673.3333333335, ans=0.0 2023-11-26 12:19:41,710 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=3385673.3333333335, ans=0.125 2023-11-26 12:19:56,632 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=3385740.0, ans=0.0 2023-11-26 12:20:01,984 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=3385806.6666666665, ans=0.0 2023-11-26 12:20:08,505 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=3385806.6666666665, ans=0.0 2023-11-26 12:20:12,659 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=8.00 vs. 
limit=15.0 2023-11-26 12:20:13,303 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3385873.3333333335, ans=0.0 2023-11-26 12:20:24,404 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=3385940.0, ans=0.2 2023-11-26 12:20:28,302 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.236e+01 8.713e+01 9.303e+01 9.917e+01 1.324e+02, threshold=1.861e+02, percent-clipped=0.0 2023-11-26 12:20:31,670 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 507900 2023-11-26 12:20:34,806 INFO [train_asr.py:1235] (3/4) Epoch 43, batch 2900, loss[loss=0.04894, simple_loss=0.06251, pruned_loss=0.005872, audio_tagging_loss=0.01181, over 15493.00 frames. ], tot_loss[loss=0.06501, simple_loss=0.08822, pruned_loss=0.01213, audio_tagging_loss=0.008769, over 3043728.68 frames. ], batch size: 60, lr: 1.58e-03, grad_scale: 8.0 2023-11-26 12:20:41,431 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=3386006.6666666665, ans=0.0 2023-11-26 12:20:41,497 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3386006.6666666665, ans=0.1 2023-11-26 12:20:41,876 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=6.02 vs. limit=15.0 2023-11-26 12:20:46,819 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3386073.3333333335, ans=0.125 2023-11-26 12:21:04,165 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3386140.0, ans=0.125 2023-11-26 12:21:05,021 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=3386140.0, ans=0.125 2023-11-26 12:21:15,690 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=3386206.6666666665, ans=0.0 2023-11-26 12:21:27,873 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 507950 2023-11-26 12:21:31,523 INFO [train_asr.py:1235] (3/4) Epoch 43, batch 2950, loss[loss=0.07461, simple_loss=0.1013, pruned_loss=0.0129, audio_tagging_loss=0.01107, over 15340.00 frames. ], tot_loss[loss=0.0654, simple_loss=0.08866, pruned_loss=0.01226, audio_tagging_loss=0.008801, over 3042116.11 frames. ], batch size: 57, lr: 1.58e-03, grad_scale: 8.0 2023-11-26 12:21:34,234 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=4.50 vs. limit=12.0 2023-11-26 12:21:42,191 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=10.29 vs. 
limit=22.5 2023-11-26 12:21:51,825 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=3386406.6666666665, ans=0.0 2023-11-26 12:21:59,097 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3386473.3333333335, ans=0.125 2023-11-26 12:22:06,696 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=3386540.0, ans=0.0 2023-11-26 12:22:20,226 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3386606.6666666665, ans=0.1 2023-11-26 12:22:20,949 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.213e+01 8.672e+01 9.532e+01 9.988e+01 1.402e+02, threshold=1.906e+02, percent-clipped=0.0 2023-11-26 12:22:24,241 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 508000 2023-11-26 12:22:30,161 INFO [train_asr.py:1235] (3/4) Epoch 43, batch 3000, loss[loss=0.07113, simple_loss=0.09649, pruned_loss=0.01433, audio_tagging_loss=0.008552, over 15287.00 frames. ], tot_loss[loss=0.06537, simple_loss=0.08862, pruned_loss=0.0122, audio_tagging_loss=0.008859, over 3048245.56 frames. ], batch size: 56, lr: 1.58e-03, grad_scale: 8.0 2023-11-26 12:22:30,162 INFO [train_asr.py:1258] (3/4) Computing validation loss 2023-11-26 12:23:02,711 INFO [train_asr.py:1267] (3/4) Epoch 43, validation: loss=0.05754, simple_loss=0.05056, pruned_loss=0.00524, audio_tagging_loss=0.02702, over 4681554.00 frames. 2023-11-26 12:23:02,712 INFO [train_asr.py:1268] (3/4) Maximum memory allocated so far is 24894MB 2023-11-26 12:23:21,352 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=3386740.0, ans=0.125 2023-11-26 12:23:30,409 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=3386806.6666666665, ans=0.0 2023-11-26 12:23:52,141 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=3386940.0, ans=0.0 2023-11-26 12:23:53,128 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=3386940.0, ans=0.125 2023-11-26 12:23:54,621 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.73 vs. limit=6.0 2023-11-26 12:23:55,282 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 508050 2023-11-26 12:23:58,999 INFO [train_asr.py:1235] (3/4) Epoch 43, batch 3050, loss[loss=0.0765, simple_loss=0.1024, pruned_loss=0.01644, audio_tagging_loss=0.00885, over 15153.00 frames. ], tot_loss[loss=0.06561, simple_loss=0.08918, pruned_loss=0.01219, audio_tagging_loss=0.008822, over 3047169.46 frames. ], batch size: 56, lr: 1.58e-03, grad_scale: 8.0 2023-11-26 12:24:15,390 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3387073.3333333335, ans=0.0 2023-11-26 12:24:30,610 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=10.38 vs. limit=15.0 2023-11-26 12:24:32,791 WARNING [train_asr.py:1481] (3/4) Exclude cut with ID unbalanced/h0neUGB6j_g_0.000_1.000.wav from training. Number of frames (before subsampling): 100. 
Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-26 12:24:48,667 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.074e+01 8.532e+01 9.331e+01 1.001e+02 1.251e+02, threshold=1.866e+02, percent-clipped=0.0 2023-11-26 12:24:48,851 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=3387273.3333333335, ans=0.125 2023-11-26 12:24:52,559 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 508100 2023-11-26 12:24:55,663 INFO [train_asr.py:1235] (3/4) Epoch 43, batch 3100, loss[loss=0.08231, simple_loss=0.1243, pruned_loss=0.01478, audio_tagging_loss=0.005374, over 14438.00 frames. ], tot_loss[loss=0.06608, simple_loss=0.08967, pruned_loss=0.01237, audio_tagging_loss=0.008872, over 3051903.25 frames. ], batch size: 52, lr: 1.58e-03, grad_scale: 8.0 2023-11-26 12:25:41,268 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3387606.6666666665, ans=0.0 2023-11-26 12:25:47,453 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 508150 2023-11-26 12:25:47,458 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=3387606.6666666665, ans=0.1 2023-11-26 12:25:50,634 INFO [train_asr.py:1235] (3/4) Epoch 43, batch 3150, loss[loss=0.07464, simple_loss=0.1051, pruned_loss=0.01496, audio_tagging_loss=0.007112, over 15846.00 frames. ], tot_loss[loss=0.06657, simple_loss=0.09033, pruned_loss=0.01243, audio_tagging_loss=0.008977, over 3053990.55 frames. ], batch size: 57, lr: 1.58e-03, grad_scale: 8.0 2023-11-26 12:25:55,136 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.min_positive, batch_count=3387673.3333333335, ans=0.05 2023-11-26 12:26:02,108 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=3387740.0, ans=0.0 2023-11-26 12:26:10,649 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=3387740.0, ans=0.1 2023-11-26 12:26:14,215 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=8.76 vs. 
limit=15.0 2023-11-26 12:26:24,159 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=3387873.3333333335, ans=0.125 2023-11-26 12:26:24,202 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3387873.3333333335, ans=0.1 2023-11-26 12:26:27,373 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3387873.3333333335, ans=0.125 2023-11-26 12:26:34,675 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=3387940.0, ans=0.0 2023-11-26 12:26:39,857 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.518e+01 8.956e+01 9.437e+01 1.017e+02 1.314e+02, threshold=1.887e+02, percent-clipped=0.0 2023-11-26 12:26:43,689 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 508200 2023-11-26 12:26:43,800 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=3387940.0, ans=0.0 2023-11-26 12:26:47,021 INFO [train_asr.py:1235] (3/4) Epoch 43, batch 3200, loss[loss=0.06935, simple_loss=0.09674, pruned_loss=0.0138, audio_tagging_loss=0.007181, over 14305.00 frames. ], tot_loss[loss=0.06679, simple_loss=0.09055, pruned_loss=0.01249, audio_tagging_loss=0.009028, over 3055480.28 frames. ], batch size: 54, lr: 1.58e-03, grad_scale: 16.0 2023-11-26 12:26:53,582 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=8.36 vs. limit=15.0 2023-11-26 12:27:06,041 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3388073.3333333335, ans=0.125 2023-11-26 12:27:33,181 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=3388273.3333333335, ans=0.07 2023-11-26 12:27:40,476 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 508250 2023-11-26 12:27:44,167 INFO [train_asr.py:1235] (3/4) Epoch 43, batch 3250, loss[loss=0.05673, simple_loss=0.07226, pruned_loss=0.01116, audio_tagging_loss=0.009444, over 14772.00 frames. ], tot_loss[loss=0.06629, simple_loss=0.08962, pruned_loss=0.01234, audio_tagging_loss=0.009148, over 3053029.94 frames. 
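The periodic validation pass logged above at batch 3000 ("Computing validation loss", then a dev-set loss over 4681554.00 frames and the peak-memory line) amounts to a frame-weighted average of each loss component over the whole dev loader. A sketch under assumed batch keys (the `features`/`num_frames` layout and the stand-in criterion below are assumptions, not the recipe's actual interfaces):

```python
import torch

@torch.no_grad()
def compute_validation_loss(model, valid_loader, device):
    """Average the loss over the dev set, weighted by frame counts, then
    report peak GPU memory as in 'Maximum memory allocated so far is ...MB'."""
    model.eval()
    loss_sum, frame_sum = 0.0, 0.0
    for batch in valid_loader:
        feats = batch["features"].to(device)       # (N, T, 80), assumed layout
        frames = batch["num_frames"].sum().item()  # assumed key
        loss = model(feats)                        # stand-in for the real criterion
        loss_sum += loss.item() * frames
        frame_sum += frames
    model.train()
    mb = torch.cuda.max_memory_allocated(device) // (1024 * 1024)
    print(f"validation: loss={loss_sum / frame_sum:.4g}, over {frame_sum:.2f} frames")
    print(f"Maximum memory allocated so far is {mb}MB")
```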
], batch size: 56, lr: 1.58e-03, grad_scale: 16.0 2023-11-26 12:27:47,648 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=3388340.0, ans=0.0 2023-11-26 12:27:49,893 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=3388340.0, ans=0.0 2023-11-26 12:28:04,779 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=3388473.3333333335, ans=0.0 2023-11-26 12:28:16,153 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=3388540.0, ans=0.0 2023-11-26 12:28:19,827 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=3388540.0, ans=0.2 2023-11-26 12:28:25,301 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=3388540.0, ans=0.0 2023-11-26 12:28:27,405 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3388540.0, ans=0.125 2023-11-26 12:28:31,783 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3388606.6666666665, ans=0.125 2023-11-26 12:28:33,659 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.209e+01 8.911e+01 9.477e+01 1.008e+02 1.285e+02, threshold=1.895e+02, percent-clipped=0.0 2023-11-26 12:28:36,986 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 508300 2023-11-26 12:28:40,128 INFO [train_asr.py:1235] (3/4) Epoch 43, batch 3300, loss[loss=0.05826, simple_loss=0.07217, pruned_loss=0.0109, audio_tagging_loss=0.01128, over 15543.00 frames. ], tot_loss[loss=0.06631, simple_loss=0.08954, pruned_loss=0.01236, audio_tagging_loss=0.009183, over 3051718.58 frames. ], batch size: 57, lr: 1.58e-03, grad_scale: 16.0 2023-11-26 12:28:44,749 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=3388673.3333333335, ans=0.0 2023-11-26 12:28:54,793 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3388740.0, ans=0.1 2023-11-26 12:29:14,245 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=5.80 vs. limit=15.0 2023-11-26 12:29:23,644 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3388940.0, ans=0.0 2023-11-26 12:29:32,640 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 508350 2023-11-26 12:29:32,815 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=3388940.0, ans=0.09899494936611666 2023-11-26 12:29:35,786 INFO [train_asr.py:1235] (3/4) Epoch 43, batch 3350, loss[loss=0.07669, simple_loss=0.1078, pruned_loss=0.01286, audio_tagging_loss=0.009918, over 13970.00 frames. ], tot_loss[loss=0.0662, simple_loss=0.08933, pruned_loss=0.01241, audio_tagging_loss=0.009128, over 3050824.68 frames. 
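At this stage of training the per-batch loss decomposes exactly as loss = 0.5 * simple_loss + pruned_loss + audio_tagging_loss: for batch 3350 above, 0.5 * 0.1078 + 0.01286 + 0.009918 ≈ 0.0767, matching the logged loss=0.07669 up to rounding. The scales are inferred from this arithmetic; how they warm up earlier in training is not visible from this excerpt.

```python
def combine_losses(simple_loss, pruned_loss, audio_tagging_loss,
                   simple_loss_scale=0.5, audio_tagging_loss_scale=1.0):
    """Combination implied by the logged numbers; all values are
    per-frame averages (sketch, scales inferred from the log)."""
    return (simple_loss_scale * simple_loss
            + pruned_loss
            + audio_tagging_loss_scale * audio_tagging_loss)

# batch 3350 from the log: reproduces the logged loss=0.07669 up to rounding
print(combine_losses(0.1078, 0.01286, 0.009918))  # -> 0.076678
```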
], batch size: 50, lr: 1.58e-03, grad_scale: 16.0 2023-11-26 12:29:46,231 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3389073.3333333335, ans=0.125 2023-11-26 12:30:20,043 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.max_positive, batch_count=3389273.3333333335, ans=0.95 2023-11-26 12:30:25,724 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.370e+01 8.755e+01 9.551e+01 1.033e+02 1.237e+02, threshold=1.910e+02, percent-clipped=0.0 2023-11-26 12:30:29,040 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 508400 2023-11-26 12:30:32,999 INFO [train_asr.py:1235] (3/4) Epoch 43, batch 3400, loss[loss=0.05295, simple_loss=0.06766, pruned_loss=0.009806, audio_tagging_loss=0.009311, over 16049.00 frames. ], tot_loss[loss=0.06633, simple_loss=0.08988, pruned_loss=0.01242, audio_tagging_loss=0.008974, over 3047323.90 frames. ], batch size: 60, lr: 1.57e-03, grad_scale: 16.0 2023-11-26 12:30:37,178 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3389340.0, ans=0.1 2023-11-26 12:30:38,155 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=3389340.0, ans=0.2 2023-11-26 12:30:43,373 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3389406.6666666665, ans=0.125 2023-11-26 12:30:48,852 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3389406.6666666665, ans=0.125 2023-11-26 12:31:19,114 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3389606.6666666665, ans=0.125 2023-11-26 12:31:25,709 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 508450 2023-11-26 12:31:28,822 INFO [train_asr.py:1235] (3/4) Epoch 43, batch 3450, loss[loss=0.07681, simple_loss=0.09942, pruned_loss=0.0147, audio_tagging_loss=0.0124, over 15469.00 frames. ], tot_loss[loss=0.0666, simple_loss=0.09052, pruned_loss=0.01243, audio_tagging_loss=0.008909, over 3051600.66 frames. ], batch size: 59, lr: 1.57e-03, grad_scale: 16.0 2023-11-26 12:31:50,923 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=10.85 vs. limit=15.0 2023-11-26 12:31:54,737 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=3389806.6666666665, ans=0.125 2023-11-26 12:32:03,244 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=4.87 vs. 
limit=12.0 2023-11-26 12:32:06,541 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-26 12:32:18,082 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.080e+01 8.804e+01 9.534e+01 1.051e+02 1.288e+02, threshold=1.907e+02, percent-clipped=0.0 2023-11-26 12:32:20,470 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3389940.0, ans=0.125 2023-11-26 12:32:21,348 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 508500 2023-11-26 12:32:25,072 INFO [train_asr.py:1235] (3/4) Epoch 43, batch 3500, loss[loss=0.06727, simple_loss=0.0861, pruned_loss=0.01538, audio_tagging_loss=0.008845, over 14666.00 frames. ], tot_loss[loss=0.0666, simple_loss=0.09055, pruned_loss=0.01251, audio_tagging_loss=0.008819, over 3048411.82 frames. ], batch size: 54, lr: 1.57e-03, grad_scale: 16.0 2023-11-26 12:32:25,261 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3390006.6666666665, ans=0.125 2023-11-26 12:32:30,658 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=3390006.6666666665, ans=0.0 2023-11-26 12:32:32,744 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3390006.6666666665, ans=0.125 2023-11-26 12:32:53,628 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3390140.0, ans=0.125 2023-11-26 12:32:54,926 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=4.24 vs. limit=15.0 2023-11-26 12:32:55,557 WARNING [train_asr.py:1481] (3/4) Exclude cut with ID unbalanced/DdDpuDqOyrA_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-26 12:33:02,529 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=7.74 vs. limit=15.0 2023-11-26 12:33:09,069 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3390273.3333333335, ans=0.125 2023-11-26 12:33:15,296 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3390273.3333333335, ans=0.125 2023-11-26 12:33:17,919 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 508550 2023-11-26 12:33:21,104 INFO [train_asr.py:1235] (3/4) Epoch 43, batch 3550, loss[loss=0.08694, simple_loss=0.1236, pruned_loss=0.01816, audio_tagging_loss=0.00697, over 16067.00 frames. ], tot_loss[loss=0.06675, simple_loss=0.09087, pruned_loss=0.01252, audio_tagging_loss=0.008797, over 3048946.12 frames. 
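The recurring WARNINGs (train_asr.py:1481) drop the 1-second AudioSet placeholder cuts: 100 input frames survive subsampling as only 23, which is fewer than the 24 BPE tokens of the dummy transcript, so no alignment can exist and the cut is excluded. A sketch of such a filter; the subsampling formula is an assumption chosen to reproduce the logged 100 -> 23 for a factor-4 frontend:

```python
def frames_after_subsampling(t: int) -> int:
    # Assumed factor-4 convolutional frontend: matches the logged 100 -> 23.
    return ((t - 7) // 2) // 2

def keep_cut(num_frames: int, tokens: list) -> bool:
    """Drop cuts whose subsampled length is shorter than the token
    sequence, since an alignment then cannot exist."""
    t = frames_after_subsampling(num_frames)
    if t < len(tokens):
        print(f"Exclude cut. Frames (before subsampling): {num_frames}. "
              f"Frames (after subsampling): {t}. Number of tokens: {len(tokens)}")
        return False
    return True

dummy_tokens = ["▁D", "ummy"] + ["▁"] * 22   # 24 tokens, like the log
print(keep_cut(100, dummy_tokens))           # -> False (23 < 24)
```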
], batch size: 59, lr: 1.57e-03, grad_scale: 16.0 2023-11-26 12:33:45,035 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=3390473.3333333335, ans=0.2 2023-11-26 12:34:03,150 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3390540.0, ans=0.125 2023-11-26 12:34:10,932 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 6.836e+01 8.734e+01 9.266e+01 9.991e+01 1.201e+02, threshold=1.853e+02, percent-clipped=0.0 2023-11-26 12:34:14,222 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 508600 2023-11-26 12:34:17,839 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=10.69 vs. limit=15.0 2023-11-26 12:34:18,205 INFO [train_asr.py:1235] (3/4) Epoch 43, batch 3600, loss[loss=0.06131, simple_loss=0.08523, pruned_loss=0.01147, audio_tagging_loss=0.007233, over 15834.00 frames. ], tot_loss[loss=0.06626, simple_loss=0.09045, pruned_loss=0.0123, audio_tagging_loss=0.00873, over 3043272.87 frames. ], batch size: 60, lr: 1.57e-03, grad_scale: 32.0 2023-11-26 12:34:55,070 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=3390873.3333333335, ans=0.125 2023-11-26 12:35:10,367 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 508650 2023-11-26 12:35:13,511 INFO [train_asr.py:1235] (3/4) Epoch 43, batch 3650, loss[loss=0.07693, simple_loss=0.103, pruned_loss=0.01778, audio_tagging_loss=0.007647, over 15607.00 frames. ], tot_loss[loss=0.06665, simple_loss=0.09111, pruned_loss=0.01243, audio_tagging_loss=0.008667, over 3047287.94 frames. ], batch size: 59, lr: 1.57e-03, grad_scale: 32.0 2023-11-26 12:35:21,680 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3391006.6666666665, ans=0.1 2023-11-26 12:35:28,849 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=3391073.3333333335, ans=0.125 2023-11-26 12:35:38,844 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=3391140.0, ans=0.07 2023-11-26 12:35:40,866 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.min_abs, batch_count=3391140.0, ans=0.5 2023-11-26 12:35:45,044 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=7.25 vs. limit=15.0 2023-11-26 12:36:03,217 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 6.823e+01 8.759e+01 9.340e+01 1.006e+02 1.350e+02, threshold=1.868e+02, percent-clipped=0.0 2023-11-26 12:36:06,539 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 508700 2023-11-26 12:36:08,320 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3391273.3333333335, ans=0.125 2023-11-26 12:36:10,109 INFO [train_asr.py:1235] (3/4) Epoch 43, batch 3700, loss[loss=0.06319, simple_loss=0.08783, pruned_loss=0.01223, audio_tagging_loss=0.007045, over 13719.00 frames. ], tot_loss[loss=0.06705, simple_loss=0.09185, pruned_loss=0.01258, audio_tagging_loss=0.008543, over 3053651.31 frames. 
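tot_loss is not a plain epoch average: the fractional frame counts (e.g. "over 3053651.31 frames") indicate an exponentially decayed accumulator. At roughly 15k frames per batch, a decay of 1 - 1/200 settles the frame sum near 200 x 15k ≈ 3.05M, which is exactly the scale reported. A sketch, with the decay constant an assumption:

```python
class DecayedLossTracker:
    """tot_loss-style tracker (sketch): batch sums decay by
    (1 - 1/reset_interval) each step, so the reported average reflects
    roughly the last `reset_interval` batches."""

    def __init__(self, reset_interval: int = 200):
        self.decay = 1.0 - 1.0 / reset_interval
        self.loss_sum = 0.0
        self.frames = 0.0

    def update(self, loss_per_frame: float, batch_frames: float):
        self.loss_sum = self.loss_sum * self.decay + loss_per_frame * batch_frames
        self.frames = self.frames * self.decay + batch_frames
        return self.loss_sum / self.frames, self.frames

tracker = DecayedLossTracker()
for _ in range(2000):
    avg, frames = tracker.update(0.066, 15250)
print(f"tot_loss={avg:.4g}, over {frames:.2f} frames")  # frames -> ~3.05e6
```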
], batch size: 53, lr: 1.57e-03, grad_scale: 32.0 2023-11-26 12:36:34,758 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3391473.3333333335, ans=0.1 2023-11-26 12:36:41,822 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=3391473.3333333335, ans=0.0 2023-11-26 12:36:59,313 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=11.65 vs. limit=15.0 2023-11-26 12:37:02,951 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 508750 2023-11-26 12:37:06,113 INFO [train_asr.py:1235] (3/4) Epoch 43, batch 3750, loss[loss=0.05981, simple_loss=0.0796, pruned_loss=0.01454, audio_tagging_loss=0.005462, over 16190.00 frames. ], tot_loss[loss=0.06688, simple_loss=0.09137, pruned_loss=0.01258, audio_tagging_loss=0.008615, over 3052231.14 frames. ], batch size: 62, lr: 1.57e-03, grad_scale: 32.0 2023-11-26 12:37:31,968 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3391806.6666666665, ans=0.125 2023-11-26 12:37:38,798 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=3391873.3333333335, ans=0.0 2023-11-26 12:37:42,383 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=3391873.3333333335, ans=0.0 2023-11-26 12:37:46,460 WARNING [train_asr.py:1481] (3/4) Exclude cut with ID unbalanced/ZY_Bsi-RNuk_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-26 12:37:47,023 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.26 vs. limit=6.0 2023-11-26 12:37:50,845 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=3391940.0, ans=0.0 2023-11-26 12:37:54,804 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.186e+01 8.936e+01 9.506e+01 1.051e+02 1.254e+02, threshold=1.901e+02, percent-clipped=0.0 2023-11-26 12:37:56,640 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3391940.0, ans=0.1 2023-11-26 12:37:57,673 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3391940.0, ans=0.125 2023-11-26 12:37:58,565 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 508800 2023-11-26 12:37:58,666 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3391940.0, ans=0.1 2023-11-26 12:38:01,940 INFO [train_asr.py:1235] (3/4) Epoch 43, batch 3800, loss[loss=0.04616, simple_loss=0.05249, pruned_loss=0.006575, audio_tagging_loss=0.01334, over 15859.00 frames. ], tot_loss[loss=0.06691, simple_loss=0.09096, pruned_loss=0.01265, audio_tagging_loss=0.008772, over 3051882.25 frames. 
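The learning rate ticks down from 1.58e-03 to 1.57e-03 around global batch 508,400 in epoch 43, roughly consistent with an Eden-style schedule that decays in both the batch and epoch dimensions. A sketch only: the formula and constants below are assumptions (base_lr=0.045, lr_batches=7500, lr_epochs=3.5), and they land near but not exactly on the logged value, so the recipe's actual schedule may include extra factors such as reference-duration scaling.

```python
def eden_lr(base_lr: float, batch: int, epoch: float,
            lr_batches: float = 7500.0, lr_epochs: float = 3.5) -> float:
    """Eden-style decay in both batch and epoch (assumed formula)."""
    batch_term = ((batch ** 2 + lr_batches ** 2) / lr_batches ** 2) ** -0.25
    epoch_term = ((epoch ** 2 + lr_epochs ** 2) / lr_epochs ** 2) ** -0.25
    return base_lr * batch_term * epoch_term

# Around this excerpt: global batch ~508,400, epoch ~43.3
print(f"{eden_lr(0.045, 508_400, 43.3):.3e}")  # ~1.55e-03 vs. logged 1.57e-03
```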
], batch size: 62, lr: 1.57e-03, grad_scale: 32.0 2023-11-26 12:38:54,736 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 508850 2023-11-26 12:38:57,871 INFO [train_asr.py:1235] (3/4) Epoch 43, batch 3850, loss[loss=0.0653, simple_loss=0.08866, pruned_loss=0.01185, audio_tagging_loss=0.009127, over 15593.00 frames. ], tot_loss[loss=0.0663, simple_loss=0.08986, pruned_loss=0.01253, audio_tagging_loss=0.008834, over 3047942.10 frames. ], batch size: 58, lr: 1.57e-03, grad_scale: 16.0 2023-11-26 12:38:59,697 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3392340.0, ans=0.125 2023-11-26 12:39:08,660 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3392406.6666666665, ans=0.1 2023-11-26 12:39:08,910 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.01 vs. limit=6.0 2023-11-26 12:39:15,017 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=3392406.6666666665, ans=0.0 2023-11-26 12:39:18,265 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=3392406.6666666665, ans=0.2 2023-11-26 12:39:26,120 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=3392473.3333333335, ans=0.125 2023-11-26 12:39:34,699 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=8.17 vs. limit=15.0 2023-11-26 12:39:41,394 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=3392540.0, ans=0.2 2023-11-26 12:39:49,121 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.401e+01 8.737e+01 9.345e+01 1.016e+02 1.351e+02, threshold=1.869e+02, percent-clipped=0.0 2023-11-26 12:39:51,321 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 508900 2023-11-26 12:39:54,435 INFO [train_asr.py:1235] (3/4) Epoch 43, batch 3900, loss[loss=0.05859, simple_loss=0.07869, pruned_loss=0.008568, audio_tagging_loss=0.01068, over 15515.00 frames. ], tot_loss[loss=0.06636, simple_loss=0.08997, pruned_loss=0.01252, audio_tagging_loss=0.008848, over 3040360.03 frames. ], batch size: 58, lr: 1.57e-03, grad_scale: 16.0 2023-11-26 12:39:55,806 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=3392673.3333333335, ans=0.2 2023-11-26 12:40:04,137 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=3392740.0, ans=0.0 2023-11-26 12:40:28,578 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.11 vs. 
limit=6.0 2023-11-26 12:40:36,376 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3392873.3333333335, ans=0.1 2023-11-26 12:40:40,640 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3392940.0, ans=0.125 2023-11-26 12:40:43,843 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=3392940.0, ans=0.0 2023-11-26 12:40:44,768 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-26 12:40:45,917 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=3392940.0, ans=0.0 2023-11-26 12:40:46,852 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 508950 2023-11-26 12:40:50,007 INFO [train_asr.py:1235] (3/4) Epoch 43, batch 3950, loss[loss=0.04793, simple_loss=0.06304, pruned_loss=0.008176, audio_tagging_loss=0.008233, over 14052.00 frames. ], tot_loss[loss=0.06647, simple_loss=0.09021, pruned_loss=0.01244, audio_tagging_loss=0.008922, over 3038438.34 frames. ], batch size: 55, lr: 1.57e-03, grad_scale: 16.0 2023-11-26 12:40:58,944 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=8.70 vs. limit=15.0 2023-11-26 12:41:17,195 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=9.42 vs. limit=15.0 2023-11-26 12:41:18,996 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=3393140.0, ans=0.125 2023-11-26 12:41:21,110 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=3393140.0, ans=0.125 2023-11-26 12:41:23,563 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=7.05 vs. limit=15.0 2023-11-26 12:41:28,356 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=3393206.6666666665, ans=0.125 2023-11-26 12:41:39,100 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=10.88 vs. limit=15.0 2023-11-26 12:41:40,954 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.644e+01 8.928e+01 9.625e+01 1.027e+02 1.240e+02, threshold=1.925e+02, percent-clipped=0.0 2023-11-26 12:41:43,135 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 509000 2023-11-26 12:41:45,660 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3393340.0, ans=0.125 2023-11-26 12:41:46,551 INFO [train_asr.py:1235] (3/4) Epoch 43, batch 4000, loss[loss=0.06166, simple_loss=0.08626, pruned_loss=0.01197, audio_tagging_loss=0.006554, over 15062.00 frames. ], tot_loss[loss=0.06657, simple_loss=0.09036, pruned_loss=0.01243, audio_tagging_loss=0.008956, over 3039506.37 frames. 
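The WithLoss lines (scaling.py:1118) appear to attach a diagnostic loss to the attention weights without changing the forward output; loss-sum=0.000e+00 throughout this excerpt suggests the attached penalty is currently inactive. A generic way to implement such an identity-with-attached-metric, offered only as a sketch (the threshold and metric below are invented for illustration):

```python
import torch

class IdentityWithLoss(torch.autograd.Function):
    """Pass x through unchanged while recording a scalar alongside it,
    so the training loop can log 'name=..., loss-sum=...' diagnostics."""

    @staticmethod
    def forward(ctx, x, log, name):
        # Illustrative metric: penalize only values above a threshold
        # (zero for softmax attention weights, matching loss-sum=0.000e+00).
        penalty = torch.relu(x.abs() - 10.0).sum()
        log.append((name, penalty.item()))
        return x

    @staticmethod
    def backward(ctx, grad_out):
        return grad_out, None, None   # exact identity for gradients

log = []
attn = torch.softmax(torch.randn(2, 4, 8, 8), dim=-1).requires_grad_()
out = IdentityWithLoss.apply(attn, log, "self_attn_weights")
for name, s in log:
    print(f"WithLoss: name={name}, loss-sum={s:.3e}")   # 0.000e+00
```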
2023-11-26 12:41:53,631 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3393340.0, ans=0.125
2023-11-26 12:42:01,695 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=3393406.6666666665, ans=0.0
2023-11-26 12:42:04,915 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3393406.6666666665, ans=0.125
2023-11-26 12:42:09,138 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=3393473.3333333335, ans=0.0
2023-11-26 12:42:39,942 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 509050
2023-11-26 12:42:43,137 INFO [train_asr.py:1235] (3/4) Epoch 43, batch 4050, loss[loss=0.08979, simple_loss=0.1251, pruned_loss=0.01756, audio_tagging_loss=0.009669, over 15483.00 frames. ], tot_loss[loss=0.0669, simple_loss=0.09085, pruned_loss=0.01248, audio_tagging_loss=0.008993, over 3045324.86 frames. ], batch size: 57, lr: 1.57e-03, grad_scale: 32.0
2023-11-26 12:42:47,337 WARNING [train_asr.py:1481] (3/4) Exclude cut with ID unbalanced/-7b0f9TyPFU_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24
2023-11-26 12:42:55,033 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00
2023-11-26 12:42:59,331 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3393740.0, ans=0.1
2023-11-26 12:43:01,625 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=3393740.0, ans=0.0
2023-11-26 12:43:05,485 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=3393806.6666666665, ans=0.0
2023-11-26 12:43:16,531 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=12.62 vs. limit=15.0
2023-11-26 12:43:22,981 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=3393873.3333333335, ans=0.0
2023-11-26 12:43:33,515 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.514e+01 8.904e+01 9.389e+01 9.930e+01 1.705e+02, threshold=1.878e+02, percent-clipped=0.0
2023-11-26 12:43:34,810 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=3393940.0, ans=0.2
2023-11-26 12:43:35,700 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 509100
2023-11-26 12:43:38,788 INFO [train_asr.py:1235] (3/4) Epoch 43, batch 4100, loss[loss=0.07591, simple_loss=0.11, pruned_loss=0.0123, audio_tagging_loss=0.008593, over 16215.00 frames. ], tot_loss[loss=0.06687, simple_loss=0.09071, pruned_loss=0.01254, audio_tagging_loss=0.008972, over 3048518.94 frames. ], batch size: 59, lr: 1.57e-03, grad_scale: 32.0
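The WARNING [train_asr.py:1481] entries drop AudioSet placeholder cuts whose encoder output would be shorter than their token sequence, which makes the transducer loss unusable for them. A sketch of the length check, reconstructed from the numbers in the warning above (the exact icefall code may differ slightly):

    # With a frame-subsampling factor of 4, the encoder output length for a
    # cut is roughly ((num_frames - 7) // 2 + 1) // 2; a cut cannot be used
    # for the transducer loss when that length is below its token count.
    def encoder_frames(num_frames: int) -> int:
        return ((num_frames - 7) // 2 + 1) // 2

    def keep_cut(num_frames: int, num_tokens: int) -> bool:
        return encoder_frames(num_frames) >= num_tokens

    # The excluded cut: 100 input frames -> 23 encoder frames < 24 tokens.
    assert encoder_frames(100) == 23
    assert not keep_cut(100, 24)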
2023-11-26 12:43:38,932 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00
2023-11-26 12:43:45,415 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3394006.6666666665, ans=0.1
2023-11-26 12:43:56,484 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=3394073.3333333335, ans=0.2
2023-11-26 12:44:12,054 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.0.self_attn1.whiten, num_groups=1, num_channels=192, metric=12.09 vs. limit=22.5
2023-11-26 12:44:30,926 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 509150
2023-11-26 12:44:34,589 INFO [train_asr.py:1235] (3/4) Epoch 43, batch 4150, loss[loss=0.04834, simple_loss=0.06595, pruned_loss=0.006184, audio_tagging_loss=0.009181, over 15027.00 frames. ], tot_loss[loss=0.06648, simple_loss=0.09063, pruned_loss=0.01232, audio_tagging_loss=0.008848, over 3051446.08 frames. ], batch size: 57, lr: 1.57e-03, grad_scale: 32.0
2023-11-26 12:44:38,977 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=3394340.0, ans=0.2
2023-11-26 12:44:58,464 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=3394473.3333333335, ans=0.04949747468305833
2023-11-26 12:45:08,039 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=3394540.0, ans=0.0
2023-11-26 12:45:15,058 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=3394540.0, ans=0.125
2023-11-26 12:45:17,397 WARNING [train_asr.py:1481] (3/4) Exclude cut with ID unbalanced/5BkClLNthIQ_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24
2023-11-26 12:45:25,393 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.166e+01 8.865e+01 9.444e+01 1.016e+02 1.308e+02, threshold=1.889e+02, percent-clipped=0.0
2023-11-26 12:45:27,618 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 509200
2023-11-26 12:45:31,531 INFO [train_asr.py:1235] (3/4) Epoch 43, batch 4200, loss[loss=0.06809, simple_loss=0.09153, pruned_loss=0.01576, audio_tagging_loss=0.006565, over 14936.00 frames. ], tot_loss[loss=0.06626, simple_loss=0.0901, pruned_loss=0.01238, audio_tagging_loss=0.008829, over 3051613.86 frames. ], batch size: 59, lr: 1.57e-03, grad_scale: 32.0
2023-11-26 12:45:41,692 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=3394740.0, ans=0.125
2023-11-26 12:45:42,935 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=5.42 vs. limit=10.0
2023-11-26 12:45:58,186 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=3394806.6666666665, ans=0.1
2023-11-26 12:46:02,359 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=8.21 vs. limit=10.0
2023-11-26 12:46:04,996 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=3394873.3333333335, ans=0.0
2023-11-26 12:46:23,997 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 509250
2023-11-26 12:46:27,119 INFO [train_asr.py:1235] (3/4) Epoch 43, batch 4250, loss[loss=0.06733, simple_loss=0.09014, pruned_loss=0.01438, audio_tagging_loss=0.007873, over 14814.00 frames. ], tot_loss[loss=0.06688, simple_loss=0.09144, pruned_loss=0.0125, audio_tagging_loss=0.008657, over 3052238.09 frames. ], batch size: 56, lr: 1.57e-03, grad_scale: 32.0
2023-11-26 12:46:33,566 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3395006.6666666665, ans=0.125
2023-11-26 12:46:33,687 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=3395006.6666666665, ans=0.125
2023-11-26 12:46:40,523 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3395073.3333333335, ans=0.125
2023-11-26 12:46:47,361 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3395073.3333333335, ans=0.125
2023-11-26 12:47:11,894 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=3395273.3333333335, ans=0.125
2023-11-26 12:47:18,028 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.812e+01 8.820e+01 9.502e+01 1.020e+02 1.301e+02, threshold=1.900e+02, percent-clipped=0.0
2023-11-26 12:47:19,183 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 509300
2023-11-26 12:47:19,686 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=7.71 vs. limit=15.0
2023-11-26 12:47:22,848 INFO [train_asr.py:1235] (3/4) Epoch 43, batch 4300, loss[loss=0.06349, simple_loss=0.08207, pruned_loss=0.0131, audio_tagging_loss=0.009358, over 15351.00 frames. ], tot_loss[loss=0.06722, simple_loss=0.09203, pruned_loss=0.01263, audio_tagging_loss=0.00857, over 3051506.73 frames. ], batch size: 60, lr: 1.57e-03, grad_scale: 16.0
2023-11-26 12:47:22,987 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3395340.0, ans=0.125
2023-11-26 12:47:24,056 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3395340.0, ans=0.125
2023-11-26 12:47:26,247 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=3395340.0, ans=0.125
2023-11-26 12:47:27,488 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=10.57 vs. limit=15.0
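The [optim.py:476] entries summarize the gradient norms the ScaledAdam optimizer has seen recently as five quantiles (min, 25%, median, 75%, max); the clipping threshold tracks the median scaled by Clipping_scale=2.0, e.g. 2 x 9.502e+01 = 1.900e+02 in the entry above. An illustrative reconstruction of that bookkeeping (the optimizer's actual windowing is more involved; treat the details as an assumption):

    import torch

    def clip_stats(grad_norms: torch.Tensor, clipping_scale: float = 2.0):
        # Five quantiles of the recent per-step gradient norms.
        qs = torch.tensor([0.0, 0.25, 0.5, 0.75, 1.0], dtype=grad_norms.dtype)
        q = torch.quantile(grad_norms, qs)
        threshold = clipping_scale * q[2]          # scale the median
        percent_clipped = 100.0 * (grad_norms > threshold).float().mean()
        return q, threshold, percent_clipped

    # Reproducing the entry above: the five printed values are the quantiles,
    # and the threshold is twice the median.
    norms = torch.tensor([78.12, 88.20, 95.02, 102.0, 130.1])
    q, thresh, pct = clip_stats(norms)
    assert abs(thresh - 190.04) < 1e-3 and pct == 0.0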
2023-11-26 12:47:33,816 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=3395406.6666666665, ans=0.0
2023-11-26 12:47:33,894 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3395406.6666666665, ans=0.125
2023-11-26 12:47:34,230 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=12.13 vs. limit=15.0
2023-11-26 12:47:41,952 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3395406.6666666665, ans=0.125
2023-11-26 12:48:16,101 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 509350
2023-11-26 12:48:19,150 INFO [train_asr.py:1235] (3/4) Epoch 43, batch 4350, loss[loss=0.06824, simple_loss=0.09717, pruned_loss=0.01041, audio_tagging_loss=0.009241, over 15069.00 frames. ], tot_loss[loss=0.06706, simple_loss=0.09198, pruned_loss=0.01252, audio_tagging_loss=0.008549, over 3049306.09 frames. ], batch size: 56, lr: 1.57e-03, grad_scale: 16.0
2023-11-26 12:48:23,116 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=3395673.3333333335, ans=0.0
2023-11-26 12:48:36,921 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3395740.0, ans=0.125
2023-11-26 12:48:42,661 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=18.26 vs. limit=22.5
2023-11-26 12:48:54,550 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=3395873.3333333335, ans=0.2
2023-11-26 12:49:04,600 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2023-11-26 12:49:06,825 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3395940.0, ans=0.125
2023-11-26 12:49:10,770 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.277e+01 9.017e+01 9.685e+01 1.042e+02 1.339e+02, threshold=1.937e+02, percent-clipped=0.0
2023-11-26 12:49:11,961 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 509400
2023-11-26 12:49:13,177 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=3395940.0, ans=0.125
2023-11-26 12:49:15,348 INFO [train_asr.py:1235] (3/4) Epoch 43, batch 4400, loss[loss=0.05901, simple_loss=0.08793, pruned_loss=0.006933, audio_tagging_loss=0.008115, over 15430.00 frames. ], tot_loss[loss=0.06733, simple_loss=0.09252, pruned_loss=0.01253, audio_tagging_loss=0.008538, over 3047822.15 frames. ], batch size: 58, lr: 1.57e-03, grad_scale: 32.0
2023-11-26 12:49:19,782 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2023-11-26 12:49:30,821 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3396073.3333333335, ans=0.125
2023-11-26 12:49:53,078 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=7.37 vs. limit=12.0
2023-11-26 12:50:07,541 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 509450
2023-11-26 12:50:10,719 INFO [train_asr.py:1235] (3/4) Epoch 43, batch 4450, loss[loss=0.07746, simple_loss=0.1036, pruned_loss=0.01944, audio_tagging_loss=0.006217, over 14728.00 frames. ], tot_loss[loss=0.06765, simple_loss=0.09318, pruned_loss=0.01261, audio_tagging_loss=0.008455, over 3053543.74 frames. ], batch size: 55, lr: 1.57e-03, grad_scale: 16.0
2023-11-26 12:50:14,980 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=8.58 vs. limit=15.0
2023-11-26 12:50:15,689 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=3396340.0, ans=0.125
2023-11-26 12:50:35,372 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3396473.3333333335, ans=0.1
2023-11-26 12:51:03,423 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.338e+01 8.973e+01 9.425e+01 1.012e+02 1.226e+02, threshold=1.885e+02, percent-clipped=0.0
2023-11-26 12:51:03,525 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 509500
2023-11-26 12:51:07,259 INFO [train_asr.py:1235] (3/4) Epoch 43, batch 4500, loss[loss=0.07588, simple_loss=0.1065, pruned_loss=0.01512, audio_tagging_loss=0.007511, over 15235.00 frames. ], tot_loss[loss=0.06735, simple_loss=0.09249, pruned_loss=0.01263, audio_tagging_loss=0.008471, over 3059102.03 frames. ], batch size: 56, lr: 1.57e-03, grad_scale: 16.0
2023-11-26 12:51:12,419 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3396673.3333333335, ans=0.125
2023-11-26 12:51:13,462 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3396673.3333333335, ans=0.125
2023-11-26 12:51:13,554 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=3396673.3333333335, ans=0.2
2023-11-26 12:51:19,890 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.max_positive, batch_count=3396740.0, ans=0.95
2023-11-26 12:51:40,670 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=17.51 vs. limit=22.5
2023-11-26 12:51:48,464 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2023-11-26 12:51:59,865 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 509550
2023-11-26 12:52:03,033 INFO [train_asr.py:1235] (3/4) Epoch 43, batch 4550, loss[loss=0.06414, simple_loss=0.08384, pruned_loss=0.01381, audio_tagging_loss=0.0084, over 14575.00 frames. ], tot_loss[loss=0.06665, simple_loss=0.09144, pruned_loss=0.01234, audio_tagging_loss=0.00859, over 3057155.67 frames. ], batch size: 55, lr: 1.57e-03, grad_scale: 16.0
2023-11-26 12:52:10,025 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=8.37 vs. limit=15.0
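Each [scaling.py:213] line prints the current value (ans) of a ScheduledFloat, a scalar hyperparameter (dropout rates, skip rates, balancer probabilities, bypass scales) that changes as a function of batch_count. A minimal sketch, assuming piecewise-linear interpolation between breakpoints with the endpoint values held constant outside the schedule; the breakpoints below are made up for illustration:

    # Not icefall's class; an assumed-equivalent schedule for illustration.
    class ScheduledFloatSketch:
        def __init__(self, *points):
            # points: (batch_count, value) pairs with increasing batch_count
            self.points = sorted(points)

        def value(self, batch_count: float) -> float:
            pts = self.points
            if batch_count <= pts[0][0]:
                return pts[0][1]
            for (x0, y0), (x1, y1) in zip(pts, pts[1:]):
                if batch_count <= x1:
                    t = (batch_count - x0) / (x1 - x0)
                    return y0 + t * (y1 - y0)
            return pts[-1][1]

    # e.g. a dropout that decays from 0.3 to 0.1 over the first 20k updates
    # and then stays flat, as the entries above (ans=0.1) suggest:
    drop = ScheduledFloatSketch((0.0, 0.3), (20000.0, 0.1))
    assert drop.value(3396473.33) == 0.1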
2023-11-26 12:52:12,894 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=3397073.3333333335, ans=0.0
2023-11-26 12:52:32,309 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=13.81 vs. limit=15.0
2023-11-26 12:52:41,812 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3397206.6666666665, ans=0.125
2023-11-26 12:52:45,938 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=3397206.6666666665, ans=0.125
2023-11-26 12:52:47,781 WARNING [train_asr.py:1481] (3/4) Exclude cut with ID unbalanced/_II2Klfnn4Y_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24
2023-11-26 12:52:55,263 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 509600
2023-11-26 12:52:56,223 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.419e+01 8.693e+01 9.289e+01 1.006e+02 1.287e+02, threshold=1.858e+02, percent-clipped=0.0
2023-11-26 12:52:58,668 INFO [train_asr.py:1235] (3/4) Epoch 43, batch 4600, loss[loss=0.05908, simple_loss=0.08031, pruned_loss=0.01153, audio_tagging_loss=0.0074, over 14233.00 frames. ], tot_loss[loss=0.06593, simple_loss=0.09013, pruned_loss=0.0122, audio_tagging_loss=0.008667, over 3058333.85 frames. ], batch size: 54, lr: 1.57e-03, grad_scale: 8.0
2023-11-26 12:53:08,178 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=10.48 vs. limit=15.0
2023-11-26 12:53:12,721 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=3397406.6666666665, ans=0.0
2023-11-26 12:53:27,499 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3397473.3333333335, ans=0.1
2023-11-26 12:53:27,743 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=10.88 vs. limit=15.0
2023-11-26 12:53:36,287 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=12.07 vs. limit=15.0
2023-11-26 12:53:38,219 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=3397540.0, ans=0.0
2023-11-26 12:53:39,264 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=3397540.0, ans=0.2
2023-11-26 12:53:51,231 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 509650
2023-11-26 12:53:54,950 INFO [train_asr.py:1235] (3/4) Epoch 43, batch 4650, loss[loss=0.07492, simple_loss=0.1067, pruned_loss=0.01234, audio_tagging_loss=0.009223, over 15669.00 frames. ], tot_loss[loss=0.06584, simple_loss=0.0897, pruned_loss=0.01225, audio_tagging_loss=0.008739, over 3060671.38 frames. ], batch size: 55, lr: 1.57e-03, grad_scale: 8.0
2023-11-26 12:53:59,886 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=7.60 vs. limit=15.0
2023-11-26 12:54:00,627 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=3397673.3333333335, ans=0.0
2023-11-26 12:54:16,554 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3397806.6666666665, ans=0.125
2023-11-26 12:54:27,910 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=3397873.3333333335, ans=0.125
2023-11-26 12:54:30,121 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2023-11-26 12:54:36,282 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.whiten.whitening_limit, batch_count=3397873.3333333335, ans=15.0
2023-11-26 12:54:48,085 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 509700
2023-11-26 12:54:49,071 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.259e+01 8.687e+01 9.484e+01 1.034e+02 1.399e+02, threshold=1.897e+02, percent-clipped=0.0
2023-11-26 12:54:51,720 INFO [train_asr.py:1235] (3/4) Epoch 43, batch 4700, loss[loss=0.08323, simple_loss=0.1109, pruned_loss=0.01978, audio_tagging_loss=0.008015, over 15214.00 frames. ], tot_loss[loss=0.06606, simple_loss=0.09019, pruned_loss=0.01219, audio_tagging_loss=0.008782, over 3063123.80 frames. ], batch size: 56, lr: 1.57e-03, grad_scale: 8.0
2023-11-26 12:54:51,979 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=3398006.6666666665, ans=0.125
2023-11-26 12:55:25,369 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=19.75 vs. limit=22.5
2023-11-26 12:55:31,804 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2023-11-26 12:55:44,084 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 509750
2023-11-26 12:55:47,200 INFO [train_asr.py:1235] (3/4) Epoch 43, batch 4750, loss[loss=0.05027, simple_loss=0.06682, pruned_loss=0.00835, audio_tagging_loss=0.00851, over 14479.00 frames. ], tot_loss[loss=0.06575, simple_loss=0.08945, pruned_loss=0.01213, audio_tagging_loss=0.008893, over 3064429.76 frames. ], batch size: 57, lr: 1.57e-03, grad_scale: 8.0
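The [scaling.py:1118] WithLoss entries track auxiliary penalties that are attached to activations without altering the forward pass; loss-sum=0.000e+00 means the penalty is currently inactive for that module. One way to implement the attach-a-loss trick with a custom autograd function (our reconstruction, not necessarily icefall's exact class):

    import torch

    class WithAuxLoss(torch.autograd.Function):
        """Return x unchanged, but treat `aux` as if it had been added to
        the final objective: backward hands `aux` a gradient of one."""

        @staticmethod
        def forward(ctx, x, aux):
            ctx.aux_shape = aux.shape
            return x

        @staticmethod
        def backward(ctx, grad_out):
            aux_grad = torch.ones(ctx.aux_shape, dtype=grad_out.dtype,
                                  device=grad_out.device)
            return grad_out, aux_grad

    # Usage sketch: the penalty's gradient flows to its inputs even though
    # the activations y are numerically identical to x.
    #   y = WithAuxLoss.apply(x, penalty)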
2023-11-26 12:55:51,671 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3398340.0, ans=0.125
2023-11-26 12:56:10,405 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=3398473.3333333335, ans=0.0
2023-11-26 12:56:38,040 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3398606.6666666665, ans=0.1
2023-11-26 12:56:39,967 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 509800
2023-11-26 12:56:40,894 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.582e+01 8.623e+01 9.231e+01 9.879e+01 9.064e+02, threshold=1.846e+02, percent-clipped=2.0
2023-11-26 12:56:43,264 INFO [train_asr.py:1235] (3/4) Epoch 43, batch 4800, loss[loss=0.08413, simple_loss=0.1123, pruned_loss=0.01966, audio_tagging_loss=0.008312, over 15077.00 frames. ], tot_loss[loss=0.06547, simple_loss=0.08879, pruned_loss=0.01206, audio_tagging_loss=0.00902, over 3061918.31 frames. ], batch size: 57, lr: 1.57e-03, grad_scale: 16.0
2023-11-26 12:56:49,390 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=3398673.3333333335, ans=0.125
2023-11-26 12:57:06,117 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3398806.6666666665, ans=0.125
2023-11-26 12:57:11,930 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=17.64 vs. limit=22.5
2023-11-26 12:57:29,729 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=3398940.0, ans=0.0
2023-11-26 12:57:35,484 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=3398940.0, ans=0.2
2023-11-26 12:57:36,383 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 509850
2023-11-26 12:57:39,520 INFO [train_asr.py:1235] (3/4) Epoch 43, batch 4850, loss[loss=0.07694, simple_loss=0.1036, pruned_loss=0.01551, audio_tagging_loss=0.009649, over 15079.00 frames. ], tot_loss[loss=0.06537, simple_loss=0.08836, pruned_loss=0.01205, audio_tagging_loss=0.009145, over 3049670.58 frames. ], batch size: 56, lr: 1.57e-03, grad_scale: 16.0
2023-11-26 12:57:48,887 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.out_whiten.whitening_limit, batch_count=3399006.6666666665, ans=15.0
2023-11-26 12:57:54,873 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=3399073.3333333335, ans=0.0
2023-11-26 12:58:00,513 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3399140.0, ans=0.125
2023-11-26 12:58:06,749 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=10.70 vs. limit=15.0
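The [scaling.py:1022] Whitening entries monitor how decorrelated a module's features are. A natural reading of the metric, consistent with its behavior here (1.0 would be perfectly white, and a penalty engages only when the printed limit is exceeded), is the ratio E[lam^2] / E[lam]^2 over the eigenvalues lam of the per-group feature covariance; a sketch under that assumption:

    import torch

    def whitening_metric(x: torch.Tensor, num_groups: int) -> float:
        """x: (num_frames, num_channels). Average E[lam^2]/E[lam]^2 over
        channel groups, where lam are eigenvalues of each group's feature
        covariance; 1.0 means perfectly white features."""
        n, c = x.shape
        cpg = c // num_groups
        metrics = []
        for g in range(num_groups):
            xg = x[:, g * cpg:(g + 1) * cpg]
            cov = (xg.T @ xg) / n
            lam = torch.linalg.eigvalsh(cov)
            metrics.append((lam ** 2).mean() / lam.mean() ** 2)
        return torch.stack(metrics).mean().item()

    # White noise scores close to 1.0, far below limits like 6.0 or 15.0:
    x = torch.randn(10000, 256)
    assert whitening_metric(x, num_groups=8) < 1.2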
2023-11-26 12:58:31,573 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 509900
2023-11-26 12:58:33,028 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.451e+01 8.785e+01 9.504e+01 1.038e+02 1.484e+02, threshold=1.901e+02, percent-clipped=0.0
2023-11-26 12:58:35,206 INFO [train_asr.py:1235] (3/4) Epoch 43, batch 4900, loss[loss=0.07901, simple_loss=0.1098, pruned_loss=0.01855, audio_tagging_loss=0.005563, over 14233.00 frames. ], tot_loss[loss=0.066, simple_loss=0.08934, pruned_loss=0.01226, audio_tagging_loss=0.009061, over 3049758.34 frames. ], batch size: 52, lr: 1.57e-03, grad_scale: 16.0
2023-11-26 12:58:38,628 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3399340.0, ans=0.125
2023-11-26 12:58:40,963 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.out_whiten.whitening_limit, batch_count=3399340.0, ans=15.0
2023-11-26 12:58:53,562 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=3399406.6666666665, ans=0.0
2023-11-26 12:59:17,845 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=3399540.0, ans=0.0
2023-11-26 12:59:18,055 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=2.86 vs. limit=15.0
2023-11-26 12:59:21,867 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=3399606.6666666665, ans=0.0
2023-11-26 12:59:25,655 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.1.conv_module2.whiten, num_groups=1, num_channels=192, metric=12.33 vs. limit=15.0
2023-11-26 12:59:27,654 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 509950
2023-11-26 12:59:30,723 INFO [train_asr.py:1235] (3/4) Epoch 43, batch 4950, loss[loss=0.06424, simple_loss=0.08619, pruned_loss=0.01241, audio_tagging_loss=0.008731, over 14899.00 frames. ], tot_loss[loss=0.06599, simple_loss=0.08972, pruned_loss=0.01223, audio_tagging_loss=0.008905, over 3046973.37 frames. ], batch size: 57, lr: 1.57e-03, grad_scale: 16.0
2023-11-26 12:59:42,479 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3399740.0, ans=0.0
2023-11-26 12:59:55,314 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=3399806.6666666665, ans=0.125
2023-11-26 12:59:58,786 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=6.95 vs. limit=15.0
2023-11-26 13:00:02,191 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3399806.6666666665, ans=0.125
2023-11-26 13:00:04,200 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3399873.3333333335, ans=0.125
2023-11-26 13:00:05,271 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3399873.3333333335, ans=0.1
2023-11-26 13:00:12,217 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3399873.3333333335, ans=0.125
2023-11-26 13:00:23,369 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 510000
2023-11-26 13:00:24,319 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.267e+01 8.558e+01 9.135e+01 1.006e+02 1.501e+02, threshold=1.827e+02, percent-clipped=0.0
2023-11-26 13:00:27,001 INFO [train_asr.py:1235] (3/4) Epoch 43, batch 5000, loss[loss=0.05392, simple_loss=0.07058, pruned_loss=0.009822, audio_tagging_loss=0.008812, over 14581.00 frames. ], tot_loss[loss=0.0661, simple_loss=0.08983, pruned_loss=0.01233, audio_tagging_loss=0.008853, over 3038196.52 frames. ], batch size: 56, lr: 1.57e-03, grad_scale: 16.0
2023-11-26 13:00:30,327 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=3400006.6666666665, ans=0.0
2023-11-26 13:00:42,146 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=5.72 vs. limit=10.0
2023-11-26 13:00:46,685 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=3400073.3333333335, ans=0.2
2023-11-26 13:01:14,715 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3400273.3333333335, ans=0.1
2023-11-26 13:01:18,670 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 510050
2023-11-26 13:01:21,752 INFO [train_asr.py:1235] (3/4) Epoch 43, batch 5050, loss[loss=0.06059, simple_loss=0.08823, pruned_loss=0.01002, audio_tagging_loss=0.006454, over 14989.00 frames. ], tot_loss[loss=0.06588, simple_loss=0.08958, pruned_loss=0.01231, audio_tagging_loss=0.008779, over 3039776.90 frames. ], batch size: 56, lr: 1.57e-03, grad_scale: 16.0
2023-11-26 13:01:51,457 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=3400473.3333333335, ans=0.125
2023-11-26 13:01:59,494 INFO [scaling.py:1022] (3/4) Whitening: name=encoder_embed.convnext.out_whiten, num_groups=1, num_channels=128, metric=4.48 vs. limit=5.0
2023-11-26 13:02:11,520 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=3400606.6666666665, ans=0.0
2023-11-26 13:02:14,009 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 510100
2023-11-26 13:02:14,970 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.216e+01 8.485e+01 9.179e+01 9.722e+01 1.214e+02, threshold=1.836e+02, percent-clipped=0.0
2023-11-26 13:02:17,649 INFO [train_asr.py:1235] (3/4) Epoch 43, batch 5100, loss[loss=0.09229, simple_loss=0.1168, pruned_loss=0.02275, audio_tagging_loss=0.01116, over 14933.00 frames. ], tot_loss[loss=0.06548, simple_loss=0.08912, pruned_loss=0.01219, audio_tagging_loss=0.008738, over 3043927.00 frames. ], batch size: 52, lr: 1.57e-03, grad_scale: 16.0
2023-11-26 13:02:18,853 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=3400673.3333333335, ans=0.0
2023-11-26 13:02:33,400 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=6.83 vs. limit=10.0
2023-11-26 13:02:46,578 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=3400806.6666666665, ans=0.0
2023-11-26 13:02:55,736 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3400873.3333333335, ans=0.1
2023-11-26 13:03:02,786 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=12.21 vs. limit=15.0
2023-11-26 13:03:10,330 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 510150
2023-11-26 13:03:11,471 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3400940.0, ans=0.125
2023-11-26 13:03:13,912 INFO [train_asr.py:1235] (3/4) Epoch 43, batch 5150, loss[loss=0.08147, simple_loss=0.1199, pruned_loss=0.0149, audio_tagging_loss=0.006616, over 14508.00 frames. ], tot_loss[loss=0.06553, simple_loss=0.08911, pruned_loss=0.01219, audio_tagging_loss=0.008792, over 3042668.35 frames. ], batch size: 53, lr: 1.57e-03, grad_scale: 16.0
2023-11-26 13:03:31,066 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=3401073.3333333335, ans=0.2
2023-11-26 13:03:34,351 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=3401140.0, ans=0.125
2023-11-26 13:03:44,193 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=15.52 vs. limit=22.5
2023-11-26 13:03:59,973 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=3401273.3333333335, ans=0.2
2023-11-26 13:04:06,043 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 510200
2023-11-26 13:04:06,966 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.574e+01 8.866e+01 9.575e+01 1.024e+02 1.389e+02, threshold=1.915e+02, percent-clipped=0.0
2023-11-26 13:04:09,401 INFO [train_asr.py:1235] (3/4) Epoch 43, batch 5200, loss[loss=0.04963, simple_loss=0.06718, pruned_loss=0.008892, audio_tagging_loss=0.007148, over 14909.00 frames. ], tot_loss[loss=0.06579, simple_loss=0.08952, pruned_loss=0.0123, audio_tagging_loss=0.008729, over 3041945.69 frames. ], batch size: 61, lr: 1.57e-03, grad_scale: 32.0
2023-11-26 13:04:11,870 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3401340.0, ans=0.125
2023-11-26 13:04:16,061 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3401340.0, ans=0.125
2023-11-26 13:04:29,448 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=3401406.6666666665, ans=0.2
2023-11-26 13:04:32,119 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=3401473.3333333335, ans=0.2
2023-11-26 13:04:35,249 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3401473.3333333335, ans=0.125
2023-11-26 13:04:36,597 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=8.36 vs. limit=15.0
2023-11-26 13:04:43,433 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=3401540.0, ans=0.125
2023-11-26 13:05:01,179 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 510250
2023-11-26 13:05:04,286 INFO [train_asr.py:1235] (3/4) Epoch 43, batch 5250, loss[loss=0.07231, simple_loss=0.1072, pruned_loss=0.01163, audio_tagging_loss=0.007092, over 14450.00 frames. ], tot_loss[loss=0.06577, simple_loss=0.08974, pruned_loss=0.01227, audio_tagging_loss=0.008632, over 3039962.82 frames. ], batch size: 53, lr: 1.57e-03, grad_scale: 16.0
2023-11-26 13:05:23,960 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=10.37 vs. limit=15.0
2023-11-26 13:05:24,094 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=512, metric=3.96 vs. limit=15.0
2023-11-26 13:05:45,047 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=3401873.3333333335, ans=0.0
2023-11-26 13:05:45,457 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.out_whiten.whitening_limit, batch_count=3401873.3333333335, ans=15.0
2023-11-26 13:05:58,266 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 510300
2023-11-26 13:06:01,316 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.575e+01 8.828e+01 9.543e+01 1.025e+02 1.295e+02, threshold=1.909e+02, percent-clipped=0.0
2023-11-26 13:06:01,343 INFO [train_asr.py:1235] (3/4) Epoch 43, batch 5300, loss[loss=0.04557, simple_loss=0.05552, pruned_loss=0.006911, audio_tagging_loss=0.0109, over 13950.00 frames. ], tot_loss[loss=0.06632, simple_loss=0.09036, pruned_loss=0.01241, audio_tagging_loss=0.008736, over 3040415.77 frames. ], batch size: 55, lr: 1.57e-03, grad_scale: 8.0
2023-11-26 13:06:19,429 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.whiten, num_groups=1, num_channels=512, metric=8.36 vs. limit=12.0
2023-11-26 13:06:21,296 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=3402073.3333333335, ans=0.0
2023-11-26 13:06:25,537 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=3402140.0, ans=0.0
2023-11-26 13:06:28,685 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3402140.0, ans=0.1
2023-11-26 13:06:28,794 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=3402140.0, ans=0.07
2023-11-26 13:06:39,996 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=5.53 vs. limit=15.0
2023-11-26 13:06:47,171 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=3402273.3333333335, ans=0.125
2023-11-26 13:06:54,124 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 510350
2023-11-26 13:06:57,228 INFO [train_asr.py:1235] (3/4) Epoch 43, batch 5350, loss[loss=0.05852, simple_loss=0.07976, pruned_loss=0.01088, audio_tagging_loss=0.007761, over 14804.00 frames. ], tot_loss[loss=0.06638, simple_loss=0.0905, pruned_loss=0.01244, audio_tagging_loss=0.00869, over 3038019.85 frames. ], batch size: 58, lr: 1.57e-03, grad_scale: 8.0
2023-11-26 13:06:58,490 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2023-11-26 13:07:11,772 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=10.33 vs. limit=15.0
2023-11-26 13:07:37,689 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=3402540.0, ans=0.2
2023-11-26 13:07:40,206 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.62 vs. limit=6.0
2023-11-26 13:07:49,197 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 510400
2023-11-26 13:07:52,616 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.116e+01 8.825e+01 9.466e+01 1.015e+02 1.457e+02, threshold=1.893e+02, percent-clipped=0.0
2023-11-26 13:07:52,642 INFO [train_asr.py:1235] (3/4) Epoch 43, batch 5400, loss[loss=0.06611, simple_loss=0.09192, pruned_loss=0.01491, audio_tagging_loss=0.005242, over 14189.00 frames. ], tot_loss[loss=0.06669, simple_loss=0.09075, pruned_loss=0.01258, audio_tagging_loss=0.008735, over 3041869.45 frames. ], batch size: 53, lr: 1.57e-03, grad_scale: 8.0
2023-11-26 13:08:02,191 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=512, metric=19.35 vs. limit=22.5
2023-11-26 13:08:10,432 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=3402740.0, ans=0.0
2023-11-26 13:08:15,233 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3402806.6666666665, ans=0.1
2023-11-26 13:08:17,376 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=3402806.6666666665, ans=0.125
2023-11-26 13:08:26,902 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3402873.3333333335, ans=0.1
2023-11-26 13:08:34,540 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=3402873.3333333335, ans=0.0
2023-11-26 13:08:45,469 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 510450
2023-11-26 13:08:49,164 INFO [train_asr.py:1235] (3/4) Epoch 43, batch 5450, loss[loss=0.04142, simple_loss=0.04735, pruned_loss=0.008161, audio_tagging_loss=0.009581, over 14410.00 frames. ], tot_loss[loss=0.06699, simple_loss=0.09117, pruned_loss=0.01273, audio_tagging_loss=0.008679, over 3041639.90 frames. ], batch size: 57, lr: 1.57e-03, grad_scale: 8.0
2023-11-26 13:09:07,174 INFO [scaling.py:1022] (3/4) Whitening: name=encoder_embed.convnext.out_whiten, num_groups=1, num_channels=128, metric=4.17 vs. limit=5.0
2023-11-26 13:09:14,935 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3403140.0, ans=0.0
2023-11-26 13:09:36,951 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=3403273.3333333335, ans=0.0
2023-11-26 13:09:41,530 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 510500
2023-11-26 13:09:44,642 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 8.143e+01 8.724e+01 9.189e+01 1.004e+02 1.414e+02, threshold=1.838e+02, percent-clipped=0.0
2023-11-26 13:09:44,670 INFO [train_asr.py:1235] (3/4) Epoch 43, batch 5500, loss[loss=0.06605, simple_loss=0.09205, pruned_loss=0.01103, audio_tagging_loss=0.009003, over 15460.00 frames. ], tot_loss[loss=0.06644, simple_loss=0.09002, pruned_loss=0.01255, audio_tagging_loss=0.008882, over 3034770.29 frames. ], batch size: 57, lr: 1.57e-03, grad_scale: 8.0
2023-11-26 13:09:52,475 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=3403340.0, ans=0.125
2023-11-26 13:09:58,007 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3403406.6666666665, ans=0.125
2023-11-26 13:10:03,443 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=5.49 vs. limit=12.0
2023-11-26 13:10:11,975 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=3403473.3333333335, ans=0.2
2023-11-26 13:10:26,783 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=3403540.0, ans=0.2
2023-11-26 13:10:30,442 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=8.04 vs. limit=15.0
2023-11-26 13:10:37,157 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 510550
2023-11-26 13:10:40,237 INFO [train_asr.py:1235] (3/4) Epoch 43, batch 5550, loss[loss=0.0526, simple_loss=0.065, pruned_loss=0.008215, audio_tagging_loss=0.01189, over 15825.00 frames. ], tot_loss[loss=0.06673, simple_loss=0.09041, pruned_loss=0.01258, audio_tagging_loss=0.008944, over 3034399.37 frames. ], batch size: 62, lr: 1.57e-03, grad_scale: 8.0
2023-11-26 13:10:48,473 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=3403673.3333333335, ans=0.0
2023-11-26 13:10:56,430 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3403740.0, ans=0.125
2023-11-26 13:11:10,341 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=3403806.6666666665, ans=0.2
2023-11-26 13:11:17,910 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=3403873.3333333335, ans=0.0
2023-11-26 13:11:24,224 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2023-11-26 13:11:25,291 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3403940.0, ans=0.1
2023-11-26 13:11:33,009 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 510600
2023-11-26 13:11:35,535 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=3404006.6666666665, ans=0.035
2023-11-26 13:11:36,393 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.069e+01 8.941e+01 9.582e+01 1.033e+02 2.288e+02, threshold=1.916e+02, percent-clipped=1.0
2023-11-26 13:11:36,424 INFO [train_asr.py:1235] (3/4) Epoch 43, batch 5600, loss[loss=0.06214, simple_loss=0.08863, pruned_loss=0.009408, audio_tagging_loss=0.008417, over 15456.00 frames. ], tot_loss[loss=0.0664, simple_loss=0.0899, pruned_loss=0.01238, audio_tagging_loss=0.009072, over 3040472.99 frames. ], batch size: 60, lr: 1.57e-03, grad_scale: 16.0
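The constant lr: 1.57e-03 in the loss entries is the Eden schedule evaluated at this point in training: it decays with both the global batch index and the (fractional) epoch. A sketch using the standard Eden formula (treat the exact form as an assumption); plugging in this run's base_lr=0.045, lr_batches=7500 and lr_epochs=3.5 near batch index 510000 reproduces the printed value:

    # Eden schedule, as we recall it from icefall's optim.py:
    #   lr = base_lr * ((step/lr_batches)^2 + 1)^-0.25
    #                * ((epoch/lr_epochs)^2 + 1)^-0.25
    def eden_lr(step, epoch, base_lr=0.045, lr_batches=7500.0, lr_epochs=3.5):
        return (base_lr
                * ((step / lr_batches) ** 2 + 1) ** -0.25
                * ((epoch / lr_epochs) ** 2 + 1) ** -0.25)

    # With 42 completed epochs and ~510000 global batches this gives
    # ~1.57e-03, matching the log entries above:
    assert abs(eden_lr(510000, 42) - 1.57e-3) < 2e-5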
], batch size: 60, lr: 1.57e-03, grad_scale: 16.0 2023-11-26 13:11:39,334 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3404006.6666666665, ans=0.125 2023-11-26 13:11:44,170 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=3404006.6666666665, ans=0.125 2023-11-26 13:11:44,216 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=3404006.6666666665, ans=0.125 2023-11-26 13:11:47,974 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=3404073.3333333335, ans=0.2 2023-11-26 13:11:56,430 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=3404073.3333333335, ans=0.0 2023-11-26 13:12:03,932 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.max_abs, batch_count=3404140.0, ans=10.0 2023-11-26 13:12:17,097 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=3404206.6666666665, ans=0.035 2023-11-26 13:12:17,506 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=20.55 vs. limit=22.5 2023-11-26 13:12:18,030 WARNING [train_asr.py:1481] (3/4) Exclude cut with ID unbalanced/ze0LsBtoDm0_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-26 13:12:30,144 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 510650 2023-11-26 13:12:33,244 INFO [train_asr.py:1235] (3/4) Epoch 43, batch 5650, loss[loss=0.05733, simple_loss=0.07112, pruned_loss=0.008317, audio_tagging_loss=0.01345, over 15202.00 frames. ], tot_loss[loss=0.06676, simple_loss=0.09036, pruned_loss=0.01251, audio_tagging_loss=0.009069, over 3047822.82 frames. ], batch size: 57, lr: 1.57e-03, grad_scale: 16.0 2023-11-26 13:12:41,002 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=3404340.0, ans=0.125 2023-11-26 13:12:44,211 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=3404406.6666666665, ans=0.0 2023-11-26 13:12:45,124 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3404406.6666666665, ans=0.0 2023-11-26 13:12:59,499 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3404473.3333333335, ans=0.1 2023-11-26 13:13:02,372 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=3404473.3333333335, ans=0.0 2023-11-26 13:13:03,582 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=20.47 vs. 
limit=22.5 2023-11-26 13:13:16,086 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=3404540.0, ans=0.04949747468305833 2023-11-26 13:13:17,096 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=3404606.6666666665, ans=0.09899494936611666 2023-11-26 13:13:24,485 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=3404606.6666666665, ans=0.0 2023-11-26 13:13:25,397 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 510700 2023-11-26 13:13:28,512 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.526e+01 8.672e+01 9.212e+01 9.928e+01 1.414e+02, threshold=1.842e+02, percent-clipped=0.0 2023-11-26 13:13:28,552 INFO [train_asr.py:1235] (3/4) Epoch 43, batch 5700, loss[loss=0.07411, simple_loss=0.1064, pruned_loss=0.01305, audio_tagging_loss=0.007877, over 16112.00 frames. ], tot_loss[loss=0.06613, simple_loss=0.08951, pruned_loss=0.01226, audio_tagging_loss=0.009114, over 3052756.46 frames. ], batch size: 57, lr: 1.57e-03, grad_scale: 16.0 2023-11-26 13:13:35,200 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3404673.3333333335, ans=0.1 2023-11-26 13:13:42,080 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3404740.0, ans=0.125 2023-11-26 13:13:44,136 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=3404740.0, ans=0.0 2023-11-26 13:13:47,024 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.max_positive, batch_count=3404740.0, ans=0.95 2023-11-26 13:13:51,543 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=3.93 vs. limit=6.0 2023-11-26 13:14:05,797 INFO [scaling.py:1022] (3/4) Whitening: name=encoder_embed.convnext.out_whiten, num_groups=1, num_channels=128, metric=4.60 vs. limit=5.0 2023-11-26 13:14:17,728 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=3404940.0, ans=0.0 2023-11-26 13:14:17,771 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=3404940.0, ans=0.125 2023-11-26 13:14:21,359 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 510750 2023-11-26 13:14:24,488 INFO [train_asr.py:1235] (3/4) Epoch 43, batch 5750, loss[loss=0.07707, simple_loss=0.106, pruned_loss=0.01776, audio_tagging_loss=0.006297, over 15676.00 frames. ], tot_loss[loss=0.06572, simple_loss=0.08904, pruned_loss=0.0122, audio_tagging_loss=0.008994, over 3048282.76 frames. ], batch size: 58, lr: 1.57e-03, grad_scale: 16.0 2023-11-26 13:14:26,217 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=13.23 vs. 
limit=22.5 2023-11-26 13:14:27,534 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3405006.6666666665, ans=0.125 2023-11-26 13:14:37,449 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3405073.3333333335, ans=0.125 2023-11-26 13:14:41,637 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=7.84 vs. limit=10.0 2023-11-26 13:14:43,380 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=3405073.3333333335, ans=0.0 2023-11-26 13:15:02,299 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=4.84 vs. limit=10.0 2023-11-26 13:15:06,278 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3405206.6666666665, ans=0.125 2023-11-26 13:15:17,195 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 510800 2023-11-26 13:15:20,820 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.344e+01 8.469e+01 9.302e+01 1.019e+02 1.569e+02, threshold=1.860e+02, percent-clipped=0.0 2023-11-26 13:15:20,848 INFO [train_asr.py:1235] (3/4) Epoch 43, batch 5800, loss[loss=0.06672, simple_loss=0.09758, pruned_loss=0.01109, audio_tagging_loss=0.006839, over 14671.00 frames. ], tot_loss[loss=0.06587, simple_loss=0.08948, pruned_loss=0.01232, audio_tagging_loss=0.008812, over 3047948.38 frames. ], batch size: 53, lr: 1.57e-03, grad_scale: 16.0 2023-11-26 13:15:35,304 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-26 13:15:56,465 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=3405540.0, ans=0.0 2023-11-26 13:16:10,269 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3405606.6666666665, ans=0.1 2023-11-26 13:16:13,308 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 510850 2023-11-26 13:16:16,492 INFO [train_asr.py:1235] (3/4) Epoch 43, batch 5850, loss[loss=0.0657, simple_loss=0.08386, pruned_loss=0.01296, audio_tagging_loss=0.01082, over 14703.00 frames. ], tot_loss[loss=0.06588, simple_loss=0.08973, pruned_loss=0.01235, audio_tagging_loss=0.008668, over 3043626.96 frames. ], batch size: 55, lr: 1.57e-03, grad_scale: 16.0 2023-11-26 13:16:27,960 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=3.94 vs. limit=15.0 2023-11-26 13:16:49,057 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=12.17 vs. 
limit=15.0 2023-11-26 13:17:00,881 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3405940.0, ans=0.1 2023-11-26 13:17:08,189 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 510900 2023-11-26 13:17:11,767 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.196e+01 8.774e+01 9.383e+01 1.009e+02 2.236e+02, threshold=1.877e+02, percent-clipped=1.0 2023-11-26 13:17:11,796 INFO [train_asr.py:1235] (3/4) Epoch 43, batch 5900, loss[loss=0.07314, simple_loss=0.1025, pruned_loss=0.01342, audio_tagging_loss=0.008456, over 15716.00 frames. ], tot_loss[loss=0.06566, simple_loss=0.08948, pruned_loss=0.01227, audio_tagging_loss=0.008655, over 3048050.46 frames. ], batch size: 57, lr: 1.57e-03, grad_scale: 16.0 2023-11-26 13:17:20,892 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3406006.6666666665, ans=0.0 2023-11-26 13:17:25,205 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=3406073.3333333335, ans=0.0 2023-11-26 13:18:02,173 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=3406273.3333333335, ans=0.2 2023-11-26 13:18:04,172 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 510950 2023-11-26 13:18:07,246 INFO [train_asr.py:1235] (3/4) Epoch 43, batch 5950, loss[loss=0.06492, simple_loss=0.08985, pruned_loss=0.01185, audio_tagging_loss=0.00815, over 15473.00 frames. ], tot_loss[loss=0.06546, simple_loss=0.08935, pruned_loss=0.01217, audio_tagging_loss=0.008618, over 3051001.03 frames. ], batch size: 59, lr: 1.57e-03, grad_scale: 16.0 2023-11-26 13:18:15,525 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3406340.0, ans=0.1 2023-11-26 13:18:43,813 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=11.73 vs. limit=15.0 2023-11-26 13:18:59,055 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=5.39 vs. limit=15.0 2023-11-26 13:18:59,789 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 511000 2023-11-26 13:19:03,704 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.880e+01 8.762e+01 9.206e+01 9.891e+01 1.298e+02, threshold=1.841e+02, percent-clipped=0.0 2023-11-26 13:19:03,730 INFO [train_asr.py:1235] (3/4) Epoch 43, batch 6000, loss[loss=0.06179, simple_loss=0.08035, pruned_loss=0.01224, audio_tagging_loss=0.009381, over 14988.00 frames. ], tot_loss[loss=0.06564, simple_loss=0.08947, pruned_loss=0.01231, audio_tagging_loss=0.008596, over 3052406.34 frames. ], batch size: 59, lr: 1.57e-03, grad_scale: 32.0 2023-11-26 13:19:03,730 INFO [train_asr.py:1258] (3/4) Computing validation loss 2023-11-26 13:19:26,140 INFO [zipformer.py:1877] (3/4) name=encoder.encoders.2.encoder.layers.0.self_attn_weights, attn_weights_entropy = tensor([4.3381, 4.3018, 4.4981, 4.4820], device='cuda:3') 2023-11-26 13:19:36,328 INFO [train_asr.py:1267] (3/4) Epoch 43, validation: loss=0.05784, simple_loss=0.05057, pruned_loss=0.005191, audio_tagging_loss=0.02736, over 4681554.00 frames. 
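[Editor's aside: the per-batch and validation entries above report a composite objective: `loss` alongside `simple_loss`, `pruned_loss`, and `audio_tagging_loss`. The logged numbers are consistent with a weighted sum in which the simple (linear-interpolation) transducer loss is down-weighted and the pruned and audio-tagging terms enter at full weight. The "Clipping_scale=2.0, grad-norm quartiles ... threshold=..." lines are likewise consistent with a threshold derived from the median of recent gradient norms times the clipping scale. The sketch below is a minimal reconstruction under those assumptions; the function names and the scale constants are illustrative guesses checked against the log, not icefall's actual implementation.]

# Minimal sketch (assumptions): how the logged quantities could be related.
# combine_losses mirrors the tot_loss[...] / validation entries; clip_threshold
# mirrors the "Clipping_scale=2.0, grad-norm quartiles ... threshold=..." lines.

def combine_losses(simple_loss: float,
                   pruned_loss: float,
                   audio_tagging_loss: float,
                   simple_loss_scale: float = 0.5,       # assumed weight
                   audio_tagging_scale: float = 1.0) -> float:  # assumed weight
    """Weighted sum of the per-batch loss components (assumed weighting)."""
    return (simple_loss_scale * simple_loss
            + pruned_loss
            + audio_tagging_scale * audio_tagging_loss)

def clip_threshold(grad_norm_median: float, clipping_scale: float = 2.0) -> float:
    """Assumed rule: threshold = clipping_scale * median of recent grad norms."""
    return clipping_scale * grad_norm_median

if __name__ == "__main__":
    # Check against the "Epoch 43, validation" entry immediately above:
    # loss=0.05784, simple_loss=0.05057, pruned_loss=0.005191,
    # audio_tagging_loss=0.02736
    print(combine_losses(0.05057, 0.005191, 0.02736))  # -> ~0.05784

    # Check against "grad-norm quartiles 7.526e+01 8.672e+01 9.212e+01
    # 9.928e+01 1.414e+02, threshold=1.842e+02": the middle of the five
    # listed values (the median) is 9.212e+01.
    print(clip_threshold(92.12))  # -> 184.24 ~ 1.842e+02

Both checks reproduce the logged values to the printed precision, which supports (but does not prove) the assumed weighting and median-based thresholding.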
2023-11-26 13:19:36,328 INFO [train_asr.py:1268] (3/4) Maximum memory allocated so far is 24894MB 2023-11-26 13:19:43,481 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=3406673.3333333335, ans=0.2 2023-11-26 13:19:46,607 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3406740.0, ans=0.125 2023-11-26 13:19:56,690 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3406740.0, ans=0.1 2023-11-26 13:20:06,755 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=3406806.6666666665, ans=0.0 2023-11-26 13:20:11,108 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.whiten, num_groups=1, num_channels=512, metric=9.15 vs. limit=12.0 2023-11-26 13:20:17,566 WARNING [train_asr.py:1481] (3/4) Exclude cut with ID unbalanced/NoNxFjwXuuc_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-26 13:20:17,835 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=3406873.3333333335, ans=0.04949747468305833 2023-11-26 13:20:24,659 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=3406940.0, ans=0.0 2023-11-26 13:20:28,756 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 511050 2023-11-26 13:20:30,540 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3406940.0, ans=0.125 2023-11-26 13:20:32,336 INFO [train_asr.py:1235] (3/4) Epoch 43, batch 6050, loss[loss=0.06668, simple_loss=0.08949, pruned_loss=0.01281, audio_tagging_loss=0.00913, over 14460.00 frames. ], tot_loss[loss=0.06588, simple_loss=0.08958, pruned_loss=0.01244, audio_tagging_loss=0.008644, over 3050346.45 frames. ], batch size: 53, lr: 1.57e-03, grad_scale: 16.0 2023-11-26 13:20:35,724 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=3407006.6666666665, ans=0.2 2023-11-26 13:20:36,866 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=3407006.6666666665, ans=0.125 2023-11-26 13:21:01,409 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3407140.0, ans=0.125 2023-11-26 13:21:23,762 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 511100 2023-11-26 13:21:23,876 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=3407273.3333333335, ans=0.2 2023-11-26 13:21:27,482 INFO [train_asr.py:1235] (3/4) Epoch 43, batch 6100, loss[loss=0.082, simple_loss=0.1145, pruned_loss=0.01689, audio_tagging_loss=0.007879, over 15882.00 frames. ], tot_loss[loss=0.06636, simple_loss=0.09046, pruned_loss=0.01261, audio_tagging_loss=0.008522, over 3059768.22 frames. 
], batch size: 59, lr: 1.57e-03, grad_scale: 16.0 2023-11-26 13:21:28,499 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.895e+01 8.831e+01 9.526e+01 1.012e+02 1.265e+02, threshold=1.905e+02, percent-clipped=0.0 2023-11-26 13:21:28,747 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3407340.0, ans=0.1 2023-11-26 13:21:29,044 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module1.whiten, num_groups=1, num_channels=512, metric=4.63 vs. limit=15.0 2023-11-26 13:21:35,019 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=3407340.0, ans=0.2 2023-11-26 13:22:19,333 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 511150 2023-11-26 13:22:23,046 INFO [train_asr.py:1235] (3/4) Epoch 43, batch 6150, loss[loss=0.07713, simple_loss=0.1108, pruned_loss=0.01409, audio_tagging_loss=0.007637, over 15558.00 frames. ], tot_loss[loss=0.06566, simple_loss=0.08948, pruned_loss=0.01239, audio_tagging_loss=0.008532, over 3054525.66 frames. ], batch size: 59, lr: 1.57e-03, grad_scale: 16.0 2023-11-26 13:22:24,797 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.31 vs. limit=6.0 2023-11-26 13:22:36,946 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3407740.0, ans=0.1 2023-11-26 13:22:40,212 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3407740.0, ans=0.0 2023-11-26 13:22:48,582 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3407806.6666666665, ans=0.125 2023-11-26 13:23:01,801 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3407873.3333333335, ans=0.1 2023-11-26 13:23:15,419 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 511200 2023-11-26 13:23:17,979 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.min_positive, batch_count=3408006.6666666665, ans=0.05 2023-11-26 13:23:18,031 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3408006.6666666665, ans=0.125 2023-11-26 13:23:19,317 INFO [train_asr.py:1235] (3/4) Epoch 43, batch 6200, loss[loss=0.05778, simple_loss=0.07702, pruned_loss=0.009267, audio_tagging_loss=0.009996, over 17400.00 frames. ], tot_loss[loss=0.06547, simple_loss=0.0888, pruned_loss=0.01232, audio_tagging_loss=0.008757, over 3053645.74 frames. 
], batch size: 66, lr: 1.57e-03, grad_scale: 16.0 2023-11-26 13:23:19,467 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=3408006.6666666665, ans=0.2 2023-11-26 13:23:20,343 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.166e+01 8.535e+01 9.196e+01 1.004e+02 1.259e+02, threshold=1.839e+02, percent-clipped=0.0 2023-11-26 13:24:06,411 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3408273.3333333335, ans=0.1 2023-11-26 13:24:08,591 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=3408273.3333333335, ans=0.1 2023-11-26 13:24:11,504 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 511250 2023-11-26 13:24:14,670 INFO [train_asr.py:1235] (3/4) Epoch 43, batch 6250, loss[loss=0.07168, simple_loss=0.1014, pruned_loss=0.01383, audio_tagging_loss=0.007176, over 14406.00 frames. ], tot_loss[loss=0.06536, simple_loss=0.08872, pruned_loss=0.01213, audio_tagging_loss=0.008871, over 3048187.79 frames. ], batch size: 54, lr: 1.57e-03, grad_scale: 16.0 2023-11-26 13:24:14,920 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=3408340.0, ans=0.125 2023-11-26 13:24:15,990 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3408340.0, ans=0.1 2023-11-26 13:24:18,187 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=3408340.0, ans=0.125 2023-11-26 13:24:24,167 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3408340.0, ans=0.1 2023-11-26 13:24:31,693 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.0.whiten, num_groups=1, num_channels=192, metric=4.01 vs. limit=12.0 2023-11-26 13:24:42,104 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=3408473.3333333335, ans=0.1 2023-11-26 13:25:02,851 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=3408606.6666666665, ans=0.0 2023-11-26 13:25:07,720 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 511300 2023-11-26 13:25:10,842 INFO [train_asr.py:1235] (3/4) Epoch 43, batch 6300, loss[loss=0.06308, simple_loss=0.08304, pruned_loss=0.01224, audio_tagging_loss=0.009323, over 14388.00 frames. ], tot_loss[loss=0.06553, simple_loss=0.08859, pruned_loss=0.01219, audio_tagging_loss=0.009046, over 3056551.82 frames. ], batch size: 54, lr: 1.57e-03, grad_scale: 16.0 2023-11-26 13:25:12,444 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.768e+01 8.818e+01 9.508e+01 1.037e+02 1.214e+02, threshold=1.902e+02, percent-clipped=0.0 2023-11-26 13:25:12,662 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3408673.3333333335, ans=0.125 2023-11-26 13:25:27,452 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=8.05 vs. 
limit=10.0 2023-11-26 13:25:30,766 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=3408740.0, ans=0.125 2023-11-26 13:25:46,740 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=3408873.3333333335, ans=0.0 2023-11-26 13:25:50,726 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.whiten, num_groups=1, num_channels=512, metric=8.29 vs. limit=12.0 2023-11-26 13:26:04,127 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 511350 2023-11-26 13:26:05,352 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=3408940.0, ans=0.2 2023-11-26 13:26:07,269 INFO [train_asr.py:1235] (3/4) Epoch 43, batch 6350, loss[loss=0.07366, simple_loss=0.1013, pruned_loss=0.01546, audio_tagging_loss=0.007569, over 15754.00 frames. ], tot_loss[loss=0.06567, simple_loss=0.08881, pruned_loss=0.01221, audio_tagging_loss=0.009057, over 3047722.23 frames. ], batch size: 57, lr: 1.57e-03, grad_scale: 16.0 2023-11-26 13:26:09,674 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3409006.6666666665, ans=0.125 2023-11-26 13:26:25,062 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=3409073.3333333335, ans=0.0 2023-11-26 13:26:41,395 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=7.99 vs. limit=10.0 2023-11-26 13:26:55,867 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=3409273.3333333335, ans=0.09899494936611666 2023-11-26 13:26:59,927 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 511400 2023-11-26 13:27:03,272 INFO [train_asr.py:1235] (3/4) Epoch 43, batch 6400, loss[loss=0.07372, simple_loss=0.1024, pruned_loss=0.01556, audio_tagging_loss=0.006947, over 15850.00 frames. ], tot_loss[loss=0.0654, simple_loss=0.08818, pruned_loss=0.01217, audio_tagging_loss=0.009134, over 3045239.98 frames. ], batch size: 57, lr: 1.57e-03, grad_scale: 32.0 2023-11-26 13:27:04,273 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 6.869e+01 8.943e+01 9.499e+01 1.031e+02 1.393e+02, threshold=1.900e+02, percent-clipped=0.0 2023-11-26 13:27:16,220 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=3409406.6666666665, ans=0.2 2023-11-26 13:27:33,171 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=7.19 vs. limit=12.0 2023-11-26 13:27:52,400 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=3409606.6666666665, ans=0.125 2023-11-26 13:27:55,513 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 511450 2023-11-26 13:27:58,092 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=6.24 vs. limit=12.0 2023-11-26 13:27:58,636 INFO [train_asr.py:1235] (3/4) Epoch 43, batch 6450, loss[loss=0.05682, simple_loss=0.07671, pruned_loss=0.01073, audio_tagging_loss=0.007731, over 16353.00 frames. 
], tot_loss[loss=0.06543, simple_loss=0.08809, pruned_loss=0.01219, audio_tagging_loss=0.009188, over 3034073.01 frames. ], batch size: 66, lr: 1.57e-03, grad_scale: 32.0 2023-11-26 13:28:00,455 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.1.self_attn1.whiten, num_groups=1, num_channels=192, metric=10.09 vs. limit=22.5 2023-11-26 13:28:51,668 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 511500 2023-11-26 13:28:54,734 INFO [train_asr.py:1235] (3/4) Epoch 43, batch 6500, loss[loss=0.07418, simple_loss=0.1085, pruned_loss=0.01256, audio_tagging_loss=0.007354, over 16094.00 frames. ], tot_loss[loss=0.06546, simple_loss=0.08827, pruned_loss=0.01217, audio_tagging_loss=0.009155, over 3041591.19 frames. ], batch size: 65, lr: 1.57e-03, grad_scale: 32.0 2023-11-26 13:28:55,798 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.162e+01 8.812e+01 9.257e+01 9.962e+01 1.246e+02, threshold=1.851e+02, percent-clipped=0.0 2023-11-26 13:29:03,035 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.min_abs, batch_count=3410006.6666666665, ans=0.5 2023-11-26 13:29:29,719 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=3410206.6666666665, ans=0.0 2023-11-26 13:29:47,562 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 511550 2023-11-26 13:29:50,641 INFO [train_asr.py:1235] (3/4) Epoch 43, batch 6550, loss[loss=0.06164, simple_loss=0.08049, pruned_loss=0.01366, audio_tagging_loss=0.007735, over 15105.00 frames. ], tot_loss[loss=0.06597, simple_loss=0.08921, pruned_loss=0.01236, audio_tagging_loss=0.009003, over 3041885.37 frames. ], batch size: 58, lr: 1.57e-03, grad_scale: 32.0 2023-11-26 13:29:52,016 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3410340.0, ans=0.1 2023-11-26 13:29:59,420 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3410340.0, ans=0.125 2023-11-26 13:30:02,615 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=3410406.6666666665, ans=0.2 2023-11-26 13:30:05,739 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3410406.6666666665, ans=0.125 2023-11-26 13:30:09,819 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=512, metric=19.46 vs. limit=22.5 2023-11-26 13:30:23,392 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.min_positive, batch_count=3410540.0, ans=0.05 2023-11-26 13:30:38,846 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=3410606.6666666665, ans=0.2 2023-11-26 13:30:38,854 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=3410606.6666666665, ans=0.2 2023-11-26 13:30:42,992 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 511600 2023-11-26 13:30:46,410 INFO [train_asr.py:1235] (3/4) Epoch 43, batch 6600, loss[loss=0.06516, simple_loss=0.087, pruned_loss=0.012, audio_tagging_loss=0.009667, over 14821.00 frames. 
], tot_loss[loss=0.06592, simple_loss=0.08932, pruned_loss=0.0124, audio_tagging_loss=0.008862, over 3045970.66 frames. ], batch size: 55, lr: 1.57e-03, grad_scale: 32.0 2023-11-26 13:30:47,485 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.564e+01 8.677e+01 9.398e+01 1.026e+02 1.405e+02, threshold=1.880e+02, percent-clipped=0.0 2023-11-26 13:31:00,034 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=3410740.0, ans=0.2 2023-11-26 13:31:10,144 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.max_positive, batch_count=3410806.6666666665, ans=0.95 2023-11-26 13:31:10,169 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=3410806.6666666665, ans=0.0 2023-11-26 13:31:33,372 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3410940.0, ans=0.125 2023-11-26 13:31:39,613 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 511650 2023-11-26 13:31:42,721 INFO [train_asr.py:1235] (3/4) Epoch 43, batch 6650, loss[loss=0.0514, simple_loss=0.07, pruned_loss=0.007895, audio_tagging_loss=0.008505, over 14903.00 frames. ], tot_loss[loss=0.06582, simple_loss=0.08949, pruned_loss=0.01232, audio_tagging_loss=0.00876, over 3045133.50 frames. ], batch size: 58, lr: 1.57e-03, grad_scale: 16.0 2023-11-26 13:31:45,014 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3411006.6666666665, ans=0.1 2023-11-26 13:31:45,114 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=3411006.6666666665, ans=0.125 2023-11-26 13:32:00,502 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3411073.3333333335, ans=0.0 2023-11-26 13:32:06,982 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=3411140.0, ans=0.125 2023-11-26 13:32:18,164 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=3411206.6666666665, ans=0.0 2023-11-26 13:32:26,660 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-26 13:32:35,537 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 511700 2023-11-26 13:32:38,679 INFO [train_asr.py:1235] (3/4) Epoch 43, batch 6700, loss[loss=0.04597, simple_loss=0.05522, pruned_loss=0.008854, audio_tagging_loss=0.009503, over 15202.00 frames. ], tot_loss[loss=0.06532, simple_loss=0.08921, pruned_loss=0.01209, audio_tagging_loss=0.008627, over 3046886.22 frames. ], batch size: 59, lr: 1.57e-03, grad_scale: 16.0 2023-11-26 13:32:40,756 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 6.817e+01 8.754e+01 9.381e+01 1.004e+02 1.497e+02, threshold=1.876e+02, percent-clipped=0.0 2023-11-26 13:32:41,051 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=3411340.0, ans=0.2 2023-11-26 13:32:51,188 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=11.87 vs. 
limit=15.0 2023-11-26 13:32:56,119 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3411406.6666666665, ans=0.125 2023-11-26 13:33:03,173 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=3411473.3333333335, ans=0.125 2023-11-26 13:33:08,837 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=3411473.3333333335, ans=0.125 2023-11-26 13:33:20,843 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3411540.0, ans=0.1 2023-11-26 13:33:31,088 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 511750 2023-11-26 13:33:34,181 INFO [train_asr.py:1235] (3/4) Epoch 43, batch 6750, loss[loss=0.06839, simple_loss=0.09566, pruned_loss=0.01196, audio_tagging_loss=0.008592, over 16073.00 frames. ], tot_loss[loss=0.06528, simple_loss=0.08896, pruned_loss=0.01207, audio_tagging_loss=0.008737, over 3041407.63 frames. ], batch size: 60, lr: 1.57e-03, grad_scale: 16.0 2023-11-26 13:33:35,447 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=3411673.3333333335, ans=0.125 2023-11-26 13:33:50,632 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=2.22 vs. limit=15.0 2023-11-26 13:34:15,923 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.68 vs. limit=6.0 2023-11-26 13:34:26,692 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 511800 2023-11-26 13:34:27,861 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=3411940.0, ans=0.0 2023-11-26 13:34:30,027 INFO [train_asr.py:1235] (3/4) Epoch 43, batch 6800, loss[loss=0.04878, simple_loss=0.05935, pruned_loss=0.006832, audio_tagging_loss=0.01227, over 15663.00 frames. ], tot_loss[loss=0.06456, simple_loss=0.08758, pruned_loss=0.01202, audio_tagging_loss=0.008746, over 3033798.50 frames. ], batch size: 64, lr: 1.57e-03, grad_scale: 32.0 2023-11-26 13:34:32,716 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.165e+01 8.841e+01 9.365e+01 1.006e+02 1.409e+02, threshold=1.873e+02, percent-clipped=0.0 2023-11-26 13:34:34,141 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3412006.6666666665, ans=0.125 2023-11-26 13:34:43,693 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3412073.3333333335, ans=0.125 2023-11-26 13:34:46,831 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=3412073.3333333335, ans=0.0 2023-11-26 13:34:50,317 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=12.70 vs. limit=22.5 2023-11-26 13:34:58,938 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=4.93 vs. 
limit=12.0 2023-11-26 13:35:00,862 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3412140.0, ans=0.125 2023-11-26 13:35:08,873 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=3412206.6666666665, ans=0.125 2023-11-26 13:35:12,001 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=3412206.6666666665, ans=0.0 2023-11-26 13:35:24,234 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 511850 2023-11-26 13:35:27,373 INFO [train_asr.py:1235] (3/4) Epoch 43, batch 6850, loss[loss=0.06728, simple_loss=0.08601, pruned_loss=0.01408, audio_tagging_loss=0.0102, over 15446.00 frames. ], tot_loss[loss=0.0644, simple_loss=0.08731, pruned_loss=0.01196, audio_tagging_loss=0.00879, over 3039166.98 frames. ], batch size: 59, lr: 1.57e-03, grad_scale: 32.0 2023-11-26 13:35:42,412 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3412406.6666666665, ans=0.125 2023-11-26 13:35:43,371 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3412406.6666666665, ans=0.125 2023-11-26 13:35:54,632 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3412473.3333333335, ans=0.125 2023-11-26 13:36:05,689 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=6.05 vs. limit=15.0 2023-11-26 13:36:19,388 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 511900 2023-11-26 13:36:22,523 INFO [train_asr.py:1235] (3/4) Epoch 43, batch 6900, loss[loss=0.05639, simple_loss=0.08073, pruned_loss=0.007707, audio_tagging_loss=0.008313, over 15386.00 frames. ], tot_loss[loss=0.06455, simple_loss=0.08787, pruned_loss=0.01189, audio_tagging_loss=0.008723, over 3034044.42 frames. ], batch size: 56, lr: 1.57e-03, grad_scale: 32.0 2023-11-26 13:36:24,649 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.426e+01 8.611e+01 9.198e+01 9.954e+01 1.491e+02, threshold=1.840e+02, percent-clipped=0.0 2023-11-26 13:36:32,937 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=3412740.0, ans=0.125 2023-11-26 13:36:36,207 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3412740.0, ans=0.125 2023-11-26 13:36:40,061 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=3412740.0, ans=0.0 2023-11-26 13:36:43,276 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=3412740.0, ans=0.0 2023-11-26 13:36:56,430 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=19.59 vs. limit=22.5 2023-11-26 13:37:08,634 WARNING [train_asr.py:1481] (3/4) Exclude cut with ID unbalanced/Xez1ffAcb0w_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. 
Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-26 13:37:15,715 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 511950 2023-11-26 13:37:18,932 INFO [train_asr.py:1235] (3/4) Epoch 43, batch 6950, loss[loss=0.07593, simple_loss=0.1085, pruned_loss=0.01449, audio_tagging_loss=0.007193, over 14753.00 frames. ], tot_loss[loss=0.06543, simple_loss=0.08927, pruned_loss=0.01218, audio_tagging_loss=0.008617, over 3037124.28 frames. ], batch size: 56, lr: 1.57e-03, grad_scale: 32.0 2023-11-26 13:37:29,529 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=3413073.3333333335, ans=0.0 2023-11-26 13:37:30,448 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3413073.3333333335, ans=0.1 2023-11-26 13:37:30,889 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=8.99 vs. limit=10.0 2023-11-26 13:37:42,257 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3413140.0, ans=0.125 2023-11-26 13:37:44,351 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3413140.0, ans=0.125 2023-11-26 13:37:48,482 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=3413140.0, ans=0.125 2023-11-26 13:37:48,712 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3413140.0, ans=0.1 2023-11-26 13:38:02,133 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=8.03 vs. limit=15.0 2023-11-26 13:38:11,802 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 512000 2023-11-26 13:38:17,718 INFO [train_asr.py:1235] (3/4) Epoch 43, batch 7000, loss[loss=0.0602, simple_loss=0.07752, pruned_loss=0.01181, audio_tagging_loss=0.009622, over 14432.00 frames. ], tot_loss[loss=0.06543, simple_loss=0.08909, pruned_loss=0.01221, audio_tagging_loss=0.008681, over 3039893.66 frames. 
], batch size: 57, lr: 1.57e-03, grad_scale: 32.0 2023-11-26 13:38:20,426 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.849e+01 8.732e+01 9.495e+01 1.005e+02 2.082e+02, threshold=1.899e+02, percent-clipped=1.0 2023-11-26 13:38:26,004 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=3413340.0, ans=0.09899494936611666 2023-11-26 13:38:35,601 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=3413406.6666666665, ans=0.125 2023-11-26 13:38:44,548 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3413473.3333333335, ans=0.125 2023-11-26 13:38:47,770 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3413473.3333333335, ans=0.125 2023-11-26 13:39:09,719 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=3413606.6666666665, ans=0.0 2023-11-26 13:39:10,702 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 512050 2023-11-26 13:39:10,882 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=3413606.6666666665, ans=0.0 2023-11-26 13:39:11,890 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3413606.6666666665, ans=0.125 2023-11-26 13:39:13,843 INFO [train_asr.py:1235] (3/4) Epoch 43, batch 7050, loss[loss=0.08177, simple_loss=0.1174, pruned_loss=0.01561, audio_tagging_loss=0.007471, over 15588.00 frames. ], tot_loss[loss=0.06538, simple_loss=0.08868, pruned_loss=0.01218, audio_tagging_loss=0.008855, over 3037811.36 frames. ], batch size: 56, lr: 1.57e-03, grad_scale: 16.0 2023-11-26 13:39:18,772 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=7.52 vs. limit=10.0 2023-11-26 13:39:21,508 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3413673.3333333335, ans=0.1 2023-11-26 13:39:43,455 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3413806.6666666665, ans=0.125 2023-11-26 13:39:54,676 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=3413873.3333333335, ans=0.125 2023-11-26 13:40:06,121 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 512100 2023-11-26 13:40:06,353 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3413940.0, ans=0.125 2023-11-26 13:40:09,751 INFO [train_asr.py:1235] (3/4) Epoch 43, batch 7100, loss[loss=0.073, simple_loss=0.1015, pruned_loss=0.01498, audio_tagging_loss=0.007288, over 15882.00 frames. ], tot_loss[loss=0.06552, simple_loss=0.0891, pruned_loss=0.01209, audio_tagging_loss=0.00889, over 3047169.22 frames. 
], batch size: 57, lr: 1.57e-03, grad_scale: 16.0 2023-11-26 13:40:12,859 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.221e+01 8.711e+01 9.572e+01 1.021e+02 1.655e+02, threshold=1.914e+02, percent-clipped=0.0 2023-11-26 13:40:15,005 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=20.40 vs. limit=22.5 2023-11-26 13:40:16,827 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=3414006.6666666665, ans=0.2 2023-11-26 13:40:30,231 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=3414073.3333333335, ans=0.0 2023-11-26 13:40:38,260 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=3414140.0, ans=0.04949747468305833 2023-11-26 13:41:02,408 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 512150 2023-11-26 13:41:05,580 INFO [train_asr.py:1235] (3/4) Epoch 43, batch 7150, loss[loss=0.07598, simple_loss=0.09813, pruned_loss=0.01774, audio_tagging_loss=0.009176, over 14772.00 frames. ], tot_loss[loss=0.06591, simple_loss=0.08962, pruned_loss=0.0122, audio_tagging_loss=0.008895, over 3044479.51 frames. ], batch size: 56, lr: 1.57e-03, grad_scale: 16.0 2023-11-26 13:41:22,992 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3414406.6666666665, ans=0.125 2023-11-26 13:41:40,836 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=3414540.0, ans=0.0 2023-11-26 13:41:43,718 INFO [scaling.py:1022] (3/4) Whitening: name=encoder_embed.out_whiten, num_groups=1, num_channels=192, metric=6.92 vs. limit=8.0 2023-11-26 13:41:52,264 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=13.35 vs. limit=15.0 2023-11-26 13:41:58,314 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 512200 2023-11-26 13:42:02,392 INFO [train_asr.py:1235] (3/4) Epoch 43, batch 7200, loss[loss=0.06837, simple_loss=0.08806, pruned_loss=0.01034, audio_tagging_loss=0.01399, over 15287.00 frames. ], tot_loss[loss=0.06621, simple_loss=0.09031, pruned_loss=0.01221, audio_tagging_loss=0.008849, over 3044186.46 frames. 
], batch size: 56, lr: 1.57e-03, grad_scale: 32.0 2023-11-26 13:42:05,619 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 6.870e+01 8.947e+01 9.542e+01 1.037e+02 1.437e+02, threshold=1.908e+02, percent-clipped=0.0 2023-11-26 13:42:11,157 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=3414673.3333333335, ans=0.125 2023-11-26 13:42:14,369 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=3414740.0, ans=0.0 2023-11-26 13:42:30,462 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3414806.6666666665, ans=0.125 2023-11-26 13:42:34,125 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.min_abs, batch_count=3414806.6666666665, ans=0.5 2023-11-26 13:42:47,493 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=3414940.0, ans=0.0 2023-11-26 13:42:54,845 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 512250 2023-11-26 13:42:57,948 INFO [train_asr.py:1235] (3/4) Epoch 43, batch 7250, loss[loss=0.05365, simple_loss=0.07695, pruned_loss=0.007202, audio_tagging_loss=0.00798, over 15122.00 frames. ], tot_loss[loss=0.06669, simple_loss=0.09098, pruned_loss=0.01239, audio_tagging_loss=0.008814, over 3042651.69 frames. ], batch size: 55, lr: 1.57e-03, grad_scale: 32.0 2023-11-26 13:43:02,893 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=3415006.6666666665, ans=0.0 2023-11-26 13:43:14,196 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3415073.3333333335, ans=0.125 2023-11-26 13:43:25,100 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3415140.0, ans=0.125 2023-11-26 13:43:33,152 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=3415206.6666666665, ans=0.125 2023-11-26 13:43:36,603 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=19.17 vs. limit=22.5 2023-11-26 13:43:43,128 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.max_abs, batch_count=3415273.3333333335, ans=10.0 2023-11-26 13:43:51,088 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 512300 2023-11-26 13:43:51,231 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=3415273.3333333335, ans=0.125 2023-11-26 13:43:52,363 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.min_positive, batch_count=3415273.3333333335, ans=0.05 2023-11-26 13:43:54,218 INFO [train_asr.py:1235] (3/4) Epoch 43, batch 7300, loss[loss=0.058, simple_loss=0.0756, pruned_loss=0.01024, audio_tagging_loss=0.009961, over 16338.00 frames. ], tot_loss[loss=0.06626, simple_loss=0.09036, pruned_loss=0.01227, audio_tagging_loss=0.008808, over 3044735.02 frames. 
], batch size: 63, lr: 1.57e-03, grad_scale: 16.0 2023-11-26 13:43:59,019 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.665e+01 8.745e+01 9.385e+01 1.003e+02 1.402e+02, threshold=1.877e+02, percent-clipped=0.0 2023-11-26 13:44:07,682 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=3415406.6666666665, ans=0.0 2023-11-26 13:44:10,945 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=3415406.6666666665, ans=0.0 2023-11-26 13:44:29,769 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3415540.0, ans=0.0 2023-11-26 13:44:30,835 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=3415540.0, ans=0.2 2023-11-26 13:44:32,817 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=3415540.0, ans=0.0 2023-11-26 13:44:33,490 INFO [scaling.py:1022] (3/4) Whitening: name=encoder_embed.convnext.out_whiten, num_groups=1, num_channels=128, metric=4.65 vs. limit=5.0 2023-11-26 13:44:39,297 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3415606.6666666665, ans=0.125 2023-11-26 13:44:47,372 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 512350 2023-11-26 13:44:50,500 INFO [train_asr.py:1235] (3/4) Epoch 43, batch 7350, loss[loss=0.06411, simple_loss=0.09448, pruned_loss=0.01021, audio_tagging_loss=0.00665, over 15272.00 frames. ], tot_loss[loss=0.06656, simple_loss=0.09069, pruned_loss=0.0125, audio_tagging_loss=0.008722, over 3041235.86 frames. ], batch size: 56, lr: 1.57e-03, grad_scale: 16.0 2023-11-26 13:44:52,913 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=3415673.3333333335, ans=0.125 2023-11-26 13:44:54,004 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-26 13:45:09,976 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.max_abs, batch_count=3415740.0, ans=10.0 2023-11-26 13:45:41,428 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=3415940.0, ans=0.2 2023-11-26 13:45:43,362 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 512400 2023-11-26 13:45:46,739 INFO [train_asr.py:1235] (3/4) Epoch 43, batch 7400, loss[loss=0.05757, simple_loss=0.07946, pruned_loss=0.009249, audio_tagging_loss=0.008586, over 14657.00 frames. ], tot_loss[loss=0.06595, simple_loss=0.09008, pruned_loss=0.01232, audio_tagging_loss=0.008593, over 3039412.48 frames. ], batch size: 54, lr: 1.57e-03, grad_scale: 16.0 2023-11-26 13:45:50,935 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.547e+01 8.921e+01 9.521e+01 1.008e+02 1.264e+02, threshold=1.904e+02, percent-clipped=0.0 2023-11-26 13:45:57,628 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3416073.3333333335, ans=0.1 2023-11-26 13:46:07,951 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=10.26 vs. 
limit=15.0 2023-11-26 13:46:23,768 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=3416206.6666666665, ans=0.0 2023-11-26 13:46:40,234 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 512450 2023-11-26 13:46:43,237 INFO [train_asr.py:1235] (3/4) Epoch 43, batch 7450, loss[loss=0.07312, simple_loss=0.1063, pruned_loss=0.01523, audio_tagging_loss=0.004746, over 16201.00 frames. ], tot_loss[loss=0.06601, simple_loss=0.09054, pruned_loss=0.01228, audio_tagging_loss=0.008459, over 3039478.56 frames. ], batch size: 59, lr: 1.57e-03, grad_scale: 16.0 2023-11-26 13:46:43,466 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=3416340.0, ans=0.125 2023-11-26 13:46:59,191 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=11.51 vs. limit=15.0 2023-11-26 13:47:06,105 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=3416473.3333333335, ans=0.0 2023-11-26 13:47:13,591 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.06 vs. limit=15.0 2023-11-26 13:47:35,961 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 512500 2023-11-26 13:47:39,111 INFO [train_asr.py:1235] (3/4) Epoch 43, batch 7500, loss[loss=0.0874, simple_loss=0.1265, pruned_loss=0.01902, audio_tagging_loss=0.005128, over 15411.00 frames. ], tot_loss[loss=0.06555, simple_loss=0.0899, pruned_loss=0.01215, audio_tagging_loss=0.008451, over 3040272.63 frames. ], batch size: 57, lr: 1.57e-03, grad_scale: 16.0 2023-11-26 13:47:43,322 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.533e+01 8.713e+01 9.201e+01 9.904e+01 1.159e+02, threshold=1.840e+02, percent-clipped=0.0 2023-11-26 13:47:46,717 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3416673.3333333335, ans=0.125 2023-11-26 13:47:47,047 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=7.25 vs. limit=10.0 2023-11-26 13:47:57,160 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=3.08 vs. 
limit=15.0 2023-11-26 13:47:58,019 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=3416740.0, ans=0.2 2023-11-26 13:48:00,616 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=3416806.6666666665, ans=0.125 2023-11-26 13:48:01,618 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=3416806.6666666665, ans=0.125 2023-11-26 13:48:01,699 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3416806.6666666665, ans=0.125 2023-11-26 13:48:02,686 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.min_abs, batch_count=3416806.6666666665, ans=0.5 2023-11-26 13:48:31,290 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 512550 2023-11-26 13:48:34,365 INFO [train_asr.py:1235] (3/4) Epoch 43, batch 7550, loss[loss=0.04631, simple_loss=0.05952, pruned_loss=0.005239, audio_tagging_loss=0.01131, over 15371.00 frames. ], tot_loss[loss=0.06559, simple_loss=0.08988, pruned_loss=0.0122, audio_tagging_loss=0.008448, over 3043863.92 frames. ], batch size: 60, lr: 1.57e-03, grad_scale: 16.0 2023-11-26 13:48:44,190 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3417006.6666666665, ans=0.125 2023-11-26 13:48:53,335 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3417073.3333333335, ans=0.125 2023-11-26 13:49:10,988 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3417206.6666666665, ans=0.1 2023-11-26 13:49:22,203 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3417273.3333333335, ans=0.125 2023-11-26 13:49:27,947 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 512600 2023-11-26 13:49:31,663 INFO [train_asr.py:1235] (3/4) Epoch 43, batch 7600, loss[loss=0.05832, simple_loss=0.07789, pruned_loss=0.008629, audio_tagging_loss=0.01075, over 14097.00 frames. ], tot_loss[loss=0.06499, simple_loss=0.08865, pruned_loss=0.01216, audio_tagging_loss=0.008508, over 3042086.12 frames. ], batch size: 54, lr: 1.57e-03, grad_scale: 32.0 2023-11-26 13:49:34,135 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=11.33 vs. 
limit=15.0 2023-11-26 13:49:35,805 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.220e+01 8.691e+01 9.310e+01 9.815e+01 1.310e+02, threshold=1.862e+02, percent-clipped=0.0 2023-11-26 13:49:38,709 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=3417340.0, ans=0.125 2023-11-26 13:49:47,301 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3417406.6666666665, ans=0.125 2023-11-26 13:50:06,964 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3417540.0, ans=0.125 2023-11-26 13:50:24,317 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 512650 2023-11-26 13:50:27,356 INFO [train_asr.py:1235] (3/4) Epoch 43, batch 7650, loss[loss=0.06403, simple_loss=0.0848, pruned_loss=0.01259, audio_tagging_loss=0.009048, over 14833.00 frames. ], tot_loss[loss=0.06554, simple_loss=0.08956, pruned_loss=0.01228, audio_tagging_loss=0.008489, over 3044320.39 frames. ], batch size: 57, lr: 1.57e-03, grad_scale: 32.0 2023-11-26 13:50:48,322 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=3417806.6666666665, ans=0.2 2023-11-26 13:51:04,937 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=3417873.3333333335, ans=0.125 2023-11-26 13:51:19,496 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 512700 2023-11-26 13:51:20,598 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=3417940.0, ans=0.0 2023-11-26 13:51:22,602 INFO [train_asr.py:1235] (3/4) Epoch 43, batch 7700, loss[loss=0.09997, simple_loss=0.132, pruned_loss=0.02637, audio_tagging_loss=0.007628, over 15454.00 frames. ], tot_loss[loss=0.06615, simple_loss=0.09033, pruned_loss=0.01244, audio_tagging_loss=0.008548, over 3045666.97 frames. ], batch size: 55, lr: 1.57e-03, grad_scale: 32.0 2023-11-26 13:51:22,776 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=3418006.6666666665, ans=0.04949747468305833 2023-11-26 13:51:26,881 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.358e+01 8.781e+01 9.620e+01 1.045e+02 1.417e+02, threshold=1.924e+02, percent-clipped=0.0 2023-11-26 13:51:52,118 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=512, metric=3.97 vs. limit=15.0 2023-11-26 13:52:10,079 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=3.64 vs. limit=15.0 2023-11-26 13:52:15,480 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 512750 2023-11-26 13:52:18,706 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=9.55 vs. limit=15.0 2023-11-26 13:52:19,055 INFO [train_asr.py:1235] (3/4) Epoch 43, batch 7750, loss[loss=0.0573, simple_loss=0.07299, pruned_loss=0.01015, audio_tagging_loss=0.01066, over 16582.00 frames. ], tot_loss[loss=0.06626, simple_loss=0.09051, pruned_loss=0.01248, audio_tagging_loss=0.008528, over 3041812.33 frames. 
], batch size: 64, lr: 1.57e-03, grad_scale: 32.0 2023-11-26 13:52:31,986 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3418406.6666666665, ans=0.1 2023-11-26 13:52:39,670 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=6.91 vs. limit=10.0 2023-11-26 13:53:06,806 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=6.76 vs. limit=15.0 2023-11-26 13:53:11,191 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=3418606.6666666665, ans=0.04949747468305833 2023-11-26 13:53:12,074 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 512800 2023-11-26 13:53:15,467 INFO [train_asr.py:1235] (3/4) Epoch 43, batch 7800, loss[loss=0.05675, simple_loss=0.0707, pruned_loss=0.009819, audio_tagging_loss=0.01158, over 14032.00 frames. ], tot_loss[loss=0.0666, simple_loss=0.09084, pruned_loss=0.01258, audio_tagging_loss=0.008606, over 3039152.73 frames. ], batch size: 54, lr: 1.57e-03, grad_scale: 32.0 2023-11-26 13:53:15,713 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.max_abs, batch_count=3418673.3333333335, ans=10.0 2023-11-26 13:53:19,688 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.359e+01 9.103e+01 9.758e+01 1.031e+02 1.342e+02, threshold=1.952e+02, percent-clipped=0.0 2023-11-26 13:53:25,187 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=3418740.0, ans=0.125 2023-11-26 13:53:36,953 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=3418806.6666666665, ans=0.0 2023-11-26 13:53:59,581 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=4.07 vs. limit=15.0 2023-11-26 13:54:07,468 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 512850 2023-11-26 13:54:08,756 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3418940.0, ans=0.1 2023-11-26 13:54:10,518 INFO [train_asr.py:1235] (3/4) Epoch 43, batch 7850, loss[loss=0.06371, simple_loss=0.09413, pruned_loss=0.009051, audio_tagging_loss=0.007596, over 15174.00 frames. ], tot_loss[loss=0.06636, simple_loss=0.09052, pruned_loss=0.01244, audio_tagging_loss=0.008659, over 3038172.10 frames. ], batch size: 56, lr: 1.57e-03, grad_scale: 32.0 2023-11-26 13:54:19,532 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=14.65 vs. 
2023-11-26 13:54:25,189 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=3419073.3333333335, ans=0.0
2023-11-26 13:54:29,484 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=3419073.3333333335, ans=0.0
2023-11-26 13:54:44,954 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3419206.6666666665, ans=0.125
2023-11-26 13:55:02,780 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 512900
2023-11-26 13:55:06,487 INFO [train_asr.py:1235] (3/4) Epoch 43, batch 7900, loss[loss=0.07562, simple_loss=0.1148, pruned_loss=0.01231, audio_tagging_loss=0.005895, over 14983.00 frames. ], tot_loss[loss=0.06627, simple_loss=0.09018, pruned_loss=0.01239, audio_tagging_loss=0.0088, over 3037488.64 frames. ], batch size: 53, lr: 1.57e-03, grad_scale: 16.0
2023-11-26 13:55:12,348 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.382e+01 9.008e+01 9.612e+01 1.015e+02 1.376e+02, threshold=1.922e+02, percent-clipped=0.0
2023-11-26 13:55:18,369 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3419406.6666666665, ans=0.0
2023-11-26 13:55:21,625 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=3419406.6666666665, ans=0.125
2023-11-26 13:55:38,835 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=3419540.0, ans=0.0
2023-11-26 13:55:59,425 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 512950
2023-11-26 13:56:03,131 INFO [train_asr.py:1235] (3/4) Epoch 43, batch 7950, loss[loss=0.07335, simple_loss=0.1025, pruned_loss=0.01371, audio_tagging_loss=0.00837, over 17169.00 frames. ], tot_loss[loss=0.06601, simple_loss=0.08983, pruned_loss=0.01219, audio_tagging_loss=0.008906, over 3044658.77 frames. ], batch size: 64, lr: 1.57e-03, grad_scale: 16.0
2023-11-26 13:56:08,719 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2023-11-26 13:56:17,931 WARNING [train_asr.py:1481] (3/4) Exclude cut with ID unbalanced/uQjH4tNUZ_g_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24
2023-11-26 13:56:23,729 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=15.39 vs. limit=22.5
2023-11-26 13:56:55,004 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 513000
2023-11-26 13:56:55,220 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=3419940.0, ans=0.0
2023-11-26 13:56:58,392 INFO [train_asr.py:1235] (3/4) Epoch 43, batch 8000, loss[loss=0.06109, simple_loss=0.07133, pruned_loss=0.01352, audio_tagging_loss=0.01191, over 14351.00 frames. ], tot_loss[loss=0.06504, simple_loss=0.08826, pruned_loss=0.01187, audio_tagging_loss=0.009041, over 3044826.24 frames. ], batch size: 55, lr: 1.57e-03, grad_scale: 32.0
2023-11-26 13:57:03,150 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=20.76 vs. limit=22.5
2023-11-26 13:57:03,736 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.322e+01 8.642e+01 9.203e+01 9.908e+01 1.245e+02, threshold=1.841e+02, percent-clipped=0.0
2023-11-26 13:57:04,945 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=3420006.6666666665, ans=0.125
2023-11-26 13:57:15,222 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=3420073.3333333335, ans=0.0
2023-11-26 13:57:18,982 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.max_abs, batch_count=3420073.3333333335, ans=10.0
2023-11-26 13:57:30,096 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=512, metric=17.88 vs. limit=22.5
2023-11-26 13:57:50,848 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 513050
2023-11-26 13:57:54,565 INFO [train_asr.py:1235] (3/4) Epoch 43, batch 8050, loss[loss=0.06203, simple_loss=0.08073, pruned_loss=0.01374, audio_tagging_loss=0.007927, over 14142.00 frames. ], tot_loss[loss=0.06553, simple_loss=0.08865, pruned_loss=0.01211, audio_tagging_loss=0.009087, over 3047217.96 frames. ], batch size: 56, lr: 1.57e-03, grad_scale: 32.0
2023-11-26 13:58:43,805 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=13.43 vs. limit=15.0
2023-11-26 13:58:46,627 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 513100
2023-11-26 13:58:46,813 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=3420606.6666666665, ans=0.1
2023-11-26 13:58:50,317 INFO [train_asr.py:1235] (3/4) Epoch 43, batch 8100, loss[loss=0.0723, simple_loss=0.08047, pruned_loss=0.01512, audio_tagging_loss=0.01694, over 13930.00 frames. ], tot_loss[loss=0.06566, simple_loss=0.08886, pruned_loss=0.0122, audio_tagging_loss=0.009025, over 3047672.07 frames. ], batch size: 55, lr: 1.57e-03, grad_scale: 32.0
2023-11-26 13:58:52,763 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.0.conv_module2.whiten, num_groups=1, num_channels=192, metric=14.20 vs. limit=15.0
2023-11-26 13:58:56,152 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.455e+01 8.587e+01 9.236e+01 9.771e+01 1.279e+02, threshold=1.847e+02, percent-clipped=0.0
2023-11-26 13:58:56,330 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=3420673.3333333335, ans=0.125
2023-11-26 13:59:03,858 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=3420740.0, ans=0.125
2023-11-26 13:59:08,360 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=16.35 vs. limit=22.5
2023-11-26 13:59:13,282 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=3420806.6666666665, ans=0.125
2023-11-26 13:59:13,426 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2023-11-26 13:59:13,666 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module2.whiten, num_groups=1, num_channels=512, metric=8.97 vs. limit=15.0
2023-11-26 13:59:26,680 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.out_whiten.whitening_limit, batch_count=3420873.3333333335, ans=15.0
2023-11-26 13:59:34,426 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.10 vs. limit=6.0
2023-11-26 13:59:43,034 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 513150
2023-11-26 13:59:46,085 INFO [train_asr.py:1235] (3/4) Epoch 43, batch 8150, loss[loss=0.07575, simple_loss=0.09743, pruned_loss=0.01618, audio_tagging_loss=0.01086, over 14522.00 frames. ], tot_loss[loss=0.06643, simple_loss=0.09041, pruned_loss=0.01248, audio_tagging_loss=0.008745, over 3051507.34 frames. ], batch size: 55, lr: 1.57e-03, grad_scale: 32.0
2023-11-26 13:59:54,671 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=3421006.6666666665, ans=0.2
2023-11-26 14:00:11,436 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=3421140.0, ans=0.035
2023-11-26 14:00:32,057 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=12.51 vs. limit=15.0
2023-11-26 14:00:36,134 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=4.07 vs. limit=15.0
2023-11-26 14:00:37,938 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 513200
2023-11-26 14:00:41,344 INFO [train_asr.py:1235] (3/4) Epoch 43, batch 8200, loss[loss=0.0601, simple_loss=0.08825, pruned_loss=0.009851, audio_tagging_loss=0.006124, over 15344.00 frames. ], tot_loss[loss=0.06654, simple_loss=0.09107, pruned_loss=0.01245, audio_tagging_loss=0.008553, over 3049665.84 frames. ], batch size: 58, lr: 1.57e-03, grad_scale: 32.0
2023-11-26 14:00:43,498 WARNING [train_asr.py:1481] (3/4) Exclude cut with ID unbalanced/8C7biyx9TQ4_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24
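
The WARNING entries all drop the same kind of AudioSet placeholder cut: 100 input frames survive as only 23 frames after the roughly 4x subsampling of the encoder front-end, fewer than the 24 BPE tokens, so no transducer alignment exists. A sketch of the kind of check behind this warning, assuming the usual icefall Conv2dSubsampling arithmetic (which is consistent with 100 -> 23 here; the real filter lives in train_asr.py):

    def keep_cut(c, sp):
        """Return False to exclude a cut whose token count exceeds the frames
        remaining after subsampling (a transducer alignment needs T >= U)."""
        T = c.num_frames                  # frames before subsampling, e.g. 100
        T_sub = ((T - 7) // 2 + 1) // 2   # assumed formula: ((100 - 7) // 2 + 1) // 2 == 23
        tokens = sp.encode(c.supervisions[0].text, out_type=str)
        return T_sub >= len(tokens)       # 23 >= 24 is False -> cut excluded
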
2023-11-26 14:00:47,119 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.432e+01 8.881e+01 9.462e+01 1.023e+02 1.490e+02, threshold=1.892e+02, percent-clipped=0.0
2023-11-26 14:00:52,202 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=3421406.6666666665, ans=0.0
2023-11-26 14:00:54,269 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=3421406.6666666665, ans=0.2
2023-11-26 14:01:05,115 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3421473.3333333335, ans=0.125
2023-11-26 14:01:05,156 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3421473.3333333335, ans=0.125
2023-11-26 14:01:23,172 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=3421540.0, ans=0.09899494936611666
2023-11-26 14:01:34,706 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 513250
2023-11-26 14:01:37,851 INFO [train_asr.py:1235] (3/4) Epoch 43, batch 8250, loss[loss=0.08058, simple_loss=0.09726, pruned_loss=0.02079, audio_tagging_loss=0.01116, over 13970.00 frames. ], tot_loss[loss=0.0657, simple_loss=0.08954, pruned_loss=0.01229, audio_tagging_loss=0.008633, over 3046651.71 frames. ], batch size: 55, lr: 1.57e-03, grad_scale: 32.0
2023-11-26 14:01:38,055 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3421673.3333333335, ans=0.125
2023-11-26 14:02:16,534 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=3421873.3333333335, ans=0.125
2023-11-26 14:02:19,735 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=3421873.3333333335, ans=0.125
2023-11-26 14:02:24,440 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=3421940.0, ans=0.0
2023-11-26 14:02:30,603 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 513300
2023-11-26 14:02:34,278 INFO [train_asr.py:1235] (3/4) Epoch 43, batch 8300, loss[loss=0.06253, simple_loss=0.08558, pruned_loss=0.009933, audio_tagging_loss=0.009807, over 15126.00 frames. ], tot_loss[loss=0.06601, simple_loss=0.0898, pruned_loss=0.01235, audio_tagging_loss=0.008761, over 3045209.82 frames. ], batch size: 56, lr: 1.57e-03, grad_scale: 16.0
2023-11-26 14:02:40,662 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.251e+01 8.839e+01 9.487e+01 1.004e+02 1.588e+02, threshold=1.897e+02, percent-clipped=0.0
2023-11-26 14:03:07,969 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=3422206.6666666665, ans=0.125
2023-11-26 14:03:12,236 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=3422206.6666666665, ans=0.0
2023-11-26 14:03:12,259 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3422206.6666666665, ans=0.125
2023-11-26 14:03:13,270 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=3422206.6666666665, ans=0.1
2023-11-26 14:03:25,803 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=6.16 vs. limit=15.0
2023-11-26 14:03:26,305 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 513350
2023-11-26 14:03:28,666 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=3422340.0, ans=0.125
2023-11-26 14:03:29,410 INFO [train_asr.py:1235] (3/4) Epoch 43, batch 8350, loss[loss=0.04864, simple_loss=0.07657, pruned_loss=0.005253, audio_tagging_loss=0.005102, over 13878.00 frames. ], tot_loss[loss=0.0659, simple_loss=0.08986, pruned_loss=0.0123, audio_tagging_loss=0.008669, over 3039476.51 frames. ], batch size: 54, lr: 1.57e-03, grad_scale: 8.0
2023-11-26 14:03:40,203 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=3422406.6666666665, ans=0.125
2023-11-26 14:03:50,064 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3422406.6666666665, ans=0.125
2023-11-26 14:03:53,489 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3422473.3333333335, ans=0.1
2023-11-26 14:03:56,613 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=3422473.3333333335, ans=0.125
2023-11-26 14:04:21,266 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=3422606.6666666665, ans=0.2
2023-11-26 14:04:22,055 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 513400
2023-11-26 14:04:25,982 INFO [train_asr.py:1235] (3/4) Epoch 43, batch 8400, loss[loss=0.06942, simple_loss=0.1021, pruned_loss=0.01173, audio_tagging_loss=0.006649, over 15782.00 frames. ], tot_loss[loss=0.06547, simple_loss=0.08918, pruned_loss=0.01223, audio_tagging_loss=0.008653, over 3034287.55 frames. ], batch size: 57, lr: 1.57e-03, grad_scale: 16.0
2023-11-26 14:04:33,771 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.236e+01 8.557e+01 9.224e+01 9.865e+01 1.202e+02, threshold=1.845e+02, percent-clipped=0.0
2023-11-26 14:04:55,499 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3422806.6666666665, ans=0.1
2023-11-26 14:04:58,841 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=3422873.3333333335, ans=0.0
2023-11-26 14:05:14,128 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer_ff3.min_abs, batch_count=3422940.0, ans=0.2
2023-11-26 14:05:18,111 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 513450
2023-11-26 14:05:21,189 INFO [train_asr.py:1235] (3/4) Epoch 43, batch 8450, loss[loss=0.06896, simple_loss=0.09371, pruned_loss=0.01252, audio_tagging_loss=0.009583, over 14983.00 frames. ], tot_loss[loss=0.06569, simple_loss=0.08956, pruned_loss=0.01221, audio_tagging_loss=0.008697, over 3043145.26 frames. ], batch size: 58, lr: 1.57e-03, grad_scale: 16.0
2023-11-26 14:05:37,798 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=3423073.3333333335, ans=0.0
2023-11-26 14:06:13,882 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 513500
2023-11-26 14:06:16,121 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=3423340.0, ans=0.0
2023-11-26 14:06:16,996 INFO [train_asr.py:1235] (3/4) Epoch 43, batch 8500, loss[loss=0.05567, simple_loss=0.07961, pruned_loss=0.008204, audio_tagging_loss=0.007658, over 16722.00 frames. ], tot_loss[loss=0.06566, simple_loss=0.08949, pruned_loss=0.01221, audio_tagging_loss=0.008707, over 3050300.24 frames. ], batch size: 62, lr: 1.57e-03, grad_scale: 16.0
2023-11-26 14:06:22,362 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=3423340.0, ans=0.0
2023-11-26 14:06:24,288 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.353e+01 8.916e+01 9.646e+01 1.037e+02 1.510e+02, threshold=1.929e+02, percent-clipped=0.0
2023-11-26 14:06:45,269 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=3423473.3333333335, ans=0.015
2023-11-26 14:07:09,549 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 513550
2023-11-26 14:07:10,698 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3423606.6666666665, ans=0.125
2023-11-26 14:07:12,639 INFO [train_asr.py:1235] (3/4) Epoch 43, batch 8550, loss[loss=0.06877, simple_loss=0.09898, pruned_loss=0.01202, audio_tagging_loss=0.007256, over 14548.00 frames. ], tot_loss[loss=0.06609, simple_loss=0.09015, pruned_loss=0.01228, audio_tagging_loss=0.008746, over 3051576.11 frames. ], batch size: 53, lr: 1.57e-03, grad_scale: 16.0
2023-11-26 14:07:15,571 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=3423673.3333333335, ans=0.125
2023-11-26 14:07:32,166 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=6.27 vs. limit=12.0
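
The grad_scale field in the loss lines is the dynamic loss scale of mixed-precision training; in this stretch it drops 32 -> 16 -> 8 as overflows are hit and then climbs back once steps succeed. That is standard PyTorch AMP behaviour, roughly (a self-contained toy loop, not the train_asr.py code; requires a GPU):

    import torch

    model = torch.nn.Linear(10, 1).cuda()
    optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
    scaler = torch.cuda.amp.GradScaler()   # maintains the dynamic loss scale

    for _ in range(4):
        x = torch.randn(8, 10, device="cuda")
        optimizer.zero_grad()
        with torch.cuda.amp.autocast():
            loss = model(x).pow(2).mean()
        scaler.scale(loss).backward()      # backward through the scaled loss
        scaler.step(optimizer)             # step is skipped if grads contain inf/nan
        scaler.update()                    # halves the scale on overflow, grows it otherwise
        print(scaler.get_scale())          # the value the log prints as grad_scale
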
2023-11-26 14:07:41,014 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=3423806.6666666665, ans=0.0
2023-11-26 14:08:05,903 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 513600
2023-11-26 14:08:09,286 INFO [train_asr.py:1235] (3/4) Epoch 43, batch 8600, loss[loss=0.07862, simple_loss=0.1059, pruned_loss=0.01707, audio_tagging_loss=0.00859, over 14634.00 frames. ], tot_loss[loss=0.06681, simple_loss=0.0912, pruned_loss=0.01249, audio_tagging_loss=0.008713, over 3052535.77 frames. ], batch size: 57, lr: 1.57e-03, grad_scale: 16.0
2023-11-26 14:08:12,703 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=3424006.6666666665, ans=0.125
2023-11-26 14:08:16,722 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.106e+01 8.755e+01 9.267e+01 1.010e+02 1.487e+02, threshold=1.853e+02, percent-clipped=0.0
2023-11-26 14:08:30,671 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=7.85 vs. limit=15.0
2023-11-26 14:08:33,935 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=3424140.0, ans=0.2
2023-11-26 14:09:01,412 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 513650
2023-11-26 14:09:05,085 INFO [train_asr.py:1235] (3/4) Epoch 43, batch 8650, loss[loss=0.05802, simple_loss=0.07538, pruned_loss=0.009897, audio_tagging_loss=0.01043, over 14576.00 frames. ], tot_loss[loss=0.06727, simple_loss=0.09191, pruned_loss=0.0126, audio_tagging_loss=0.008718, over 3054556.48 frames. ], batch size: 57, lr: 1.57e-03, grad_scale: 16.0
2023-11-26 14:09:13,715 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3424340.0, ans=0.1
2023-11-26 14:09:22,268 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=8.96 vs. limit=15.0
2023-11-26 14:09:23,060 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=3424406.6666666665, ans=0.0
2023-11-26 14:09:32,169 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.min_positive, batch_count=3424473.3333333335, ans=0.025
2023-11-26 14:09:33,119 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.max_abs, batch_count=3424473.3333333335, ans=10.0
2023-11-26 14:09:37,465 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=3424540.0, ans=0.125
2023-11-26 14:09:51,711 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=3424606.6666666665, ans=0.125
2023-11-26 14:09:56,944 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 513700
2023-11-26 14:10:00,532 INFO [train_asr.py:1235] (3/4) Epoch 43, batch 8700, loss[loss=0.07399, simple_loss=0.09431, pruned_loss=0.01536, audio_tagging_loss=0.01148, over 14946.00 frames. ], tot_loss[loss=0.0672, simple_loss=0.09169, pruned_loss=0.01258, audio_tagging_loss=0.008778, over 3059869.05 frames. ], batch size: 57, lr: 1.57e-03, grad_scale: 16.0
2023-11-26 14:10:08,468 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 8.016e+01 8.732e+01 9.410e+01 1.013e+02 1.633e+02, threshold=1.882e+02, percent-clipped=0.0
2023-11-26 14:10:23,168 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=3424806.6666666665, ans=0.125
2023-11-26 14:10:43,357 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3424873.3333333335, ans=0.125
2023-11-26 14:10:49,658 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=3.74 vs. limit=15.0
2023-11-26 14:10:53,888 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 513750
2023-11-26 14:10:54,093 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3424940.0, ans=0.125
2023-11-26 14:10:57,012 INFO [train_asr.py:1235] (3/4) Epoch 43, batch 8750, loss[loss=0.06578, simple_loss=0.08154, pruned_loss=0.0138, audio_tagging_loss=0.01122, over 16426.00 frames. ], tot_loss[loss=0.06792, simple_loss=0.09237, pruned_loss=0.01287, audio_tagging_loss=0.008864, over 3054198.08 frames. ], batch size: 62, lr: 1.57e-03, grad_scale: 16.0
2023-11-26 14:11:03,440 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=3425006.6666666665, ans=0.125
2023-11-26 14:11:03,530 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3425006.6666666665, ans=0.1
2023-11-26 14:11:06,658 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3425073.3333333335, ans=0.1
2023-11-26 14:11:13,168 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=3425073.3333333335, ans=0.125
2023-11-26 14:11:35,406 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=3425206.6666666665, ans=0.035
2023-11-26 14:11:36,615 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3425206.6666666665, ans=0.0
2023-11-26 14:11:49,051 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 513800
2023-11-26 14:11:52,395 INFO [train_asr.py:1235] (3/4) Epoch 43, batch 8800, loss[loss=0.06131, simple_loss=0.09202, pruned_loss=0.009632, audio_tagging_loss=0.005661, over 14817.00 frames. ], tot_loss[loss=0.06762, simple_loss=0.09204, pruned_loss=0.01264, audio_tagging_loss=0.008958, over 3051759.28 frames. ], batch size: 53, lr: 1.57e-03, grad_scale: 32.0
2023-11-26 14:12:00,262 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.281e+01 8.965e+01 9.351e+01 9.840e+01 1.391e+02, threshold=1.870e+02, percent-clipped=0.0
2023-11-26 14:12:01,642 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=3425340.0, ans=0.125
2023-11-26 14:12:13,365 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=8.53 vs. limit=15.0
2023-11-26 14:12:20,109 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=3425473.3333333335, ans=0.0
2023-11-26 14:12:25,297 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3425540.0, ans=0.0
2023-11-26 14:12:26,496 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3425540.0, ans=0.125
2023-11-26 14:12:26,855 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=3.67 vs. limit=15.0
2023-11-26 14:12:28,642 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3425540.0, ans=0.1
2023-11-26 14:12:36,900 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3425606.6666666665, ans=0.125
2023-11-26 14:12:37,898 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=3425606.6666666665, ans=0.125
2023-11-26 14:12:39,072 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2023-11-26 14:12:44,787 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 513850
2023-11-26 14:12:48,465 INFO [train_asr.py:1235] (3/4) Epoch 43, batch 8850, loss[loss=0.06953, simple_loss=0.09118, pruned_loss=0.01019, audio_tagging_loss=0.01376, over 15138.00 frames. ], tot_loss[loss=0.06726, simple_loss=0.09154, pruned_loss=0.01257, audio_tagging_loss=0.008919, over 3054339.35 frames. ], batch size: 55, lr: 1.57e-03, grad_scale: 32.0
2023-11-26 14:12:51,886 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=3425673.3333333335, ans=0.0
2023-11-26 14:12:56,686 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3425673.3333333335, ans=0.125
2023-11-26 14:12:57,938 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten2.whitening_limit, batch_count=3425673.3333333335, ans=15.0
2023-11-26 14:13:01,196 WARNING [train_asr.py:1481] (3/4) Exclude cut with ID unbalanced/1Dq7QH61iXQ_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24
2023-11-26 14:13:08,748 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3425740.0, ans=0.1
2023-11-26 14:13:25,600 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2023-11-26 14:13:33,937 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=3425940.0, ans=0.0
2023-11-26 14:13:40,750 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 513900
2023-11-26 14:13:44,392 INFO [train_asr.py:1235] (3/4) Epoch 43, batch 8900, loss[loss=0.04115, simple_loss=0.05451, pruned_loss=0.003458, audio_tagging_loss=0.01043, over 15080.00 frames. ], tot_loss[loss=0.06688, simple_loss=0.09127, pruned_loss=0.01239, audio_tagging_loss=0.008855, over 3061109.12 frames. ], batch size: 59, lr: 1.57e-03, grad_scale: 16.0
2023-11-26 14:13:52,704 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.145e+01 8.670e+01 9.413e+01 1.057e+02 1.382e+02, threshold=1.883e+02, percent-clipped=0.0
2023-11-26 14:14:36,339 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 513950
2023-11-26 14:14:39,371 INFO [train_asr.py:1235] (3/4) Epoch 43, batch 8950, loss[loss=0.06299, simple_loss=0.08599, pruned_loss=0.01181, audio_tagging_loss=0.008182, over 15085.00 frames. ], tot_loss[loss=0.0669, simple_loss=0.09142, pruned_loss=0.01249, audio_tagging_loss=0.008695, over 3053626.27 frames. ], batch size: 58, lr: 1.57e-03, grad_scale: 16.0
2023-11-26 14:14:40,710 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=3426340.0, ans=0.09899494936611666
2023-11-26 14:15:08,005 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=6.62 vs. limit=10.0
2023-11-26 14:15:10,945 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=3426473.3333333335, ans=0.07
2023-11-26 14:15:19,270 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=3426540.0, ans=0.2
2023-11-26 14:15:30,700 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 514000
2023-11-26 14:15:31,084 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.14 vs. limit=6.0
2023-11-26 14:15:34,076 INFO [train_asr.py:1235] (3/4) Epoch 43, batch 9000, loss[loss=0.0638, simple_loss=0.0782, pruned_loss=0.01505, audio_tagging_loss=0.009652, over 13867.00 frames. ], tot_loss[loss=0.06649, simple_loss=0.09072, pruned_loss=0.01242, audio_tagging_loss=0.008713, over 3050168.68 frames. ], batch size: 52, lr: 1.57e-03, grad_scale: 16.0
2023-11-26 14:15:34,076 INFO [train_asr.py:1258] (3/4) Computing validation loss
2023-11-26 14:16:06,626 INFO [train_asr.py:1267] (3/4) Epoch 43, validation: loss=0.05882, simple_loss=0.0506, pruned_loss=0.005335, audio_tagging_loss=0.02819, over 4681554.00 frames.
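
The tot_loss[...] fields are running aggregates weighted by frame count rather than per-batch means, which is why they move smoothly while the per-batch loss[...] values jump around; the validation line is the same reduction over the fixed 4681554-frame dev set. A minimal sketch of such a tracker (illustrative only; icefall's actual MetricsTracker also decays old batches, which is omitted here):

    class FrameWeightedLoss:
        """Accumulate frame-weighted losses and report their weighted means."""
        def __init__(self):
            self.sums = {}      # name -> sum of (loss value * frames)
            self.frames = 0.0   # total frames seen

        def update(self, losses, num_frames):
            self.frames += num_frames
            for name, value in losses.items():
                self.sums[name] = self.sums.get(name, 0.0) + value * num_frames

        def averages(self):
            return {name: s / self.frames for name, s in self.sums.items()}

    tracker = FrameWeightedLoss()
    tracker.update({"loss": 0.0638, "simple_loss": 0.0782}, num_frames=13867)
    tracker.update({"loss": 0.0600, "simple_loss": 0.0750}, num_frames=15000)
    print(tracker.averages())   # frame-weighted, like the tot_loss[...] fields
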
2023-11-26 14:16:06,627 INFO [train_asr.py:1268] (3/4) Maximum memory allocated so far is 24894MB
2023-11-26 14:16:15,120 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 8.126e+01 8.986e+01 9.503e+01 1.043e+02 1.217e+02, threshold=1.901e+02, percent-clipped=0.0
2023-11-26 14:16:33,951 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3426806.6666666665, ans=0.0
2023-11-26 14:16:50,450 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=3426940.0, ans=0.125
2023-11-26 14:16:53,760 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3426940.0, ans=0.125
2023-11-26 14:16:57,859 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=3426940.0, ans=0.035
2023-11-26 14:16:58,885 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 514050
2023-11-26 14:17:00,305 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_whiten.whitening_limit, batch_count=3426940.0, ans=15.0
2023-11-26 14:17:01,970 INFO [train_asr.py:1235] (3/4) Epoch 43, batch 9050, loss[loss=0.09455, simple_loss=0.1231, pruned_loss=0.02501, audio_tagging_loss=0.007978, over 15000.00 frames. ], tot_loss[loss=0.06674, simple_loss=0.09109, pruned_loss=0.01249, audio_tagging_loss=0.008707, over 3046887.60 frames. ], batch size: 54, lr: 1.57e-03, grad_scale: 16.0
2023-11-26 14:17:04,353 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3427006.6666666665, ans=0.125
2023-11-26 14:17:08,526 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.max_abs, batch_count=3427006.6666666665, ans=10.0
2023-11-26 14:17:09,608 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=3427006.6666666665, ans=0.0
2023-11-26 14:17:28,962 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=3.94 vs. limit=15.0
2023-11-26 14:17:54,116 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 514100
2023-11-26 14:17:57,790 INFO [train_asr.py:1235] (3/4) Epoch 43, batch 9100, loss[loss=0.07303, simple_loss=0.0939, pruned_loss=0.01601, audio_tagging_loss=0.01007, over 16126.00 frames. ], tot_loss[loss=0.06679, simple_loss=0.09135, pruned_loss=0.01245, audio_tagging_loss=0.008671, over 3058370.30 frames. ], batch size: 60, lr: 1.57e-03, grad_scale: 16.0
2023-11-26 14:18:04,654 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=7.34 vs. limit=10.0
2023-11-26 14:18:07,457 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.029e+01 8.883e+01 9.542e+01 1.028e+02 1.451e+02, threshold=1.908e+02, percent-clipped=0.0
2023-11-26 14:18:12,537 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=3427406.6666666665, ans=0.0
2023-11-26 14:18:13,573 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.max_abs, batch_count=3427406.6666666665, ans=10.0
2023-11-26 14:18:16,698 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=3427406.6666666665, ans=0.2
2023-11-26 14:18:20,026 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=3427473.3333333335, ans=0.125
2023-11-26 14:18:20,063 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=3427473.3333333335, ans=0.0
2023-11-26 14:18:21,413 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=11.61 vs. limit=15.0
2023-11-26 14:18:36,116 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=3427540.0, ans=0.125
2023-11-26 14:18:43,025 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00
2023-11-26 14:18:47,433 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=4.73 vs. limit=15.0
2023-11-26 14:18:51,495 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 514150
2023-11-26 14:18:55,202 INFO [train_asr.py:1235] (3/4) Epoch 43, batch 9150, loss[loss=0.07428, simple_loss=0.0995, pruned_loss=0.01546, audio_tagging_loss=0.009068, over 16410.00 frames. ], tot_loss[loss=0.06719, simple_loss=0.09196, pruned_loss=0.01263, audio_tagging_loss=0.008582, over 3060740.82 frames. ], batch size: 61, lr: 1.57e-03, grad_scale: 16.0
2023-11-26 14:19:01,806 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3427673.3333333335, ans=0.0
2023-11-26 14:19:31,828 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=14.08 vs. limit=22.5
2023-11-26 14:19:32,539 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=3427873.3333333335, ans=0.2
2023-11-26 14:19:33,590 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=3427873.3333333335, ans=0.09899494936611666
2023-11-26 14:19:46,764 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 514200
2023-11-26 14:19:50,129 INFO [train_asr.py:1235] (3/4) Epoch 43, batch 9200, loss[loss=0.06021, simple_loss=0.08147, pruned_loss=0.009911, audio_tagging_loss=0.009564, over 15888.00 frames. ], tot_loss[loss=0.06735, simple_loss=0.09222, pruned_loss=0.01272, audio_tagging_loss=0.008521, over 3057922.31 frames. ], batch size: 61, lr: 1.57e-03, grad_scale: 32.0
2023-11-26 14:19:57,952 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3428006.6666666665, ans=0.1
2023-11-26 14:19:58,735 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.474e+01 8.729e+01 9.387e+01 1.004e+02 1.309e+02, threshold=1.877e+02, percent-clipped=0.0
2023-11-26 14:20:27,705 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.min_abs, batch_count=3428206.6666666665, ans=0.5
2023-11-26 14:20:42,619 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 514250
2023-11-26 14:20:44,842 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3428340.0, ans=0.1
2023-11-26 14:20:45,752 INFO [train_asr.py:1235] (3/4) Epoch 43, batch 9250, loss[loss=0.0528, simple_loss=0.08053, pruned_loss=0.004291, audio_tagging_loss=0.008241, over 14577.00 frames. ], tot_loss[loss=0.06709, simple_loss=0.09198, pruned_loss=0.01267, audio_tagging_loss=0.008431, over 3063372.36 frames. ], batch size: 54, lr: 1.57e-03, grad_scale: 32.0
2023-11-26 14:20:51,806 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=3428340.0, ans=0.0
2023-11-26 14:21:05,245 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=3428406.6666666665, ans=0.0
2023-11-26 14:21:38,504 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 514300
2023-11-26 14:21:42,682 INFO [train_asr.py:1235] (3/4) Epoch 43, batch 9300, loss[loss=0.07289, simple_loss=0.09938, pruned_loss=0.01601, audio_tagging_loss=0.007194, over 13901.00 frames. ], tot_loss[loss=0.06665, simple_loss=0.09104, pruned_loss=0.01258, audio_tagging_loss=0.008545, over 3053183.99 frames. ], batch size: 52, lr: 1.57e-03, grad_scale: 32.0
2023-11-26 14:21:46,227 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3428673.3333333335, ans=0.125
2023-11-26 14:21:51,813 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.291e+01 8.838e+01 9.271e+01 1.020e+02 1.264e+02, threshold=1.854e+02, percent-clipped=0.0
2023-11-26 14:22:01,732 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=12.88 vs. limit=15.0
2023-11-26 14:22:35,728 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 514350
2023-11-26 14:22:36,943 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=3428940.0, ans=0.125
2023-11-26 14:22:38,795 INFO [train_asr.py:1235] (3/4) Epoch 43, batch 9350, loss[loss=0.06657, simple_loss=0.0933, pruned_loss=0.01314, audio_tagging_loss=0.006784, over 15989.00 frames. ], tot_loss[loss=0.06609, simple_loss=0.08997, pruned_loss=0.01244, audio_tagging_loss=0.008665, over 3046262.22 frames. ], batch size: 59, lr: 1.57e-03, grad_scale: 32.0
2023-11-26 14:22:42,205 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=3429006.6666666665, ans=0.125
2023-11-26 14:22:44,745 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=9.89 vs. limit=15.0
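
The scaling.py:213 entries sample module hyper-parameters (dropout rates, balancer probabilities, skip rates) that are scheduled on batch_count; by batch_count around 3.4e6 nearly all of them sit at their final values (ans=0.125, ans=0.0, and so on). A toy analogue of such a batch-count-driven, piecewise-linear schedule (names illustrative; the real ScheduledFloat lives in icefall's scaling.py and differs in detail):

    class ScheduledFloat:
        """Piecewise-linear schedule over batch_count, e.g. (0, 0.3) -> (20000, 0.125)."""
        def __init__(self, *points):
            self.points = sorted(points)   # (batch_count, value) breakpoints

        def value(self, batch_count):
            pts = self.points
            if batch_count <= pts[0][0]:
                return pts[0][1]
            if batch_count >= pts[-1][0]:
                return pts[-1][1]
            for (x0, y0), (x1, y1) in zip(pts, pts[1:]):
                if x0 <= batch_count <= x1:
                    t = (batch_count - x0) / (x1 - x0)
                    return y0 + t * (y1 - y0)

    prob = ScheduledFloat((0.0, 0.3), (20000.0, 0.125))
    print(prob.value(3417340.0))   # 0.125, matching the logged ans=0.125
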
2023-11-26 14:22:46,649 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=4.04 vs. limit=15.0
2023-11-26 14:22:46,686 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=4.25 vs. limit=15.0
2023-11-26 14:22:48,537 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2023-11-26 14:22:48,554 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3429073.3333333335, ans=0.125
2023-11-26 14:23:02,961 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=3429140.0, ans=0.1
2023-11-26 14:23:25,720 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.min_positive, batch_count=3429273.3333333335, ans=0.05
2023-11-26 14:23:25,793 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.max_abs, batch_count=3429273.3333333335, ans=10.0
2023-11-26 14:23:30,923 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 514400
2023-11-26 14:23:34,340 INFO [train_asr.py:1235] (3/4) Epoch 43, batch 9400, loss[loss=0.0683, simple_loss=0.09126, pruned_loss=0.0128, audio_tagging_loss=0.009876, over 14283.00 frames. ], tot_loss[loss=0.06659, simple_loss=0.09058, pruned_loss=0.01255, audio_tagging_loss=0.008749, over 3052357.83 frames. ], batch size: 56, lr: 1.57e-03, grad_scale: 16.0
2023-11-26 14:23:35,092 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.1.whiten, num_groups=1, num_channels=192, metric=3.65 vs. limit=12.0
2023-11-26 14:23:44,423 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.222e+01 8.705e+01 9.718e+01 1.044e+02 1.326e+02, threshold=1.944e+02, percent-clipped=0.0
2023-11-26 14:23:52,715 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.whiten.whitening_limit, batch_count=3429406.6666666665, ans=15.0
2023-11-26 14:24:09,776 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=3429540.0, ans=0.0
2023-11-26 14:24:12,287 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=3.78 vs. limit=15.0
2023-11-26 14:24:15,192 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3429540.0, ans=0.1
2023-11-26 14:24:19,740 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=10.05 vs. limit=15.0
2023-11-26 14:24:27,198 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 514450
2023-11-26 14:24:30,860 INFO [train_asr.py:1235] (3/4) Epoch 43, batch 9450, loss[loss=0.05825, simple_loss=0.07983, pruned_loss=0.009846, audio_tagging_loss=0.008491, over 14112.00 frames. ], tot_loss[loss=0.0665, simple_loss=0.09033, pruned_loss=0.01251, audio_tagging_loss=0.008827, over 3049925.74 frames. ], batch size: 55, lr: 1.57e-03, grad_scale: 16.0
2023-11-26 14:24:30,904 WARNING [train_asr.py:1481] (3/4) Exclude cut with ID unbalanced/jmSuJWEIizA_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24
2023-11-26 14:24:50,193 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=3429740.0, ans=0.125
2023-11-26 14:24:52,405 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=3429806.6666666665, ans=0.2
2023-11-26 14:24:57,783 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2023-11-26 14:25:13,289 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=18.68 vs. limit=22.5
2023-11-26 14:25:23,937 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 514500
2023-11-26 14:25:24,071 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=3429940.0, ans=0.125
2023-11-26 14:25:27,576 INFO [train_asr.py:1235] (3/4) Epoch 43, batch 9500, loss[loss=0.07398, simple_loss=0.1074, pruned_loss=0.0121, audio_tagging_loss=0.00821, over 16317.00 frames. ], tot_loss[loss=0.06661, simple_loss=0.09031, pruned_loss=0.01255, audio_tagging_loss=0.008906, over 3055840.12 frames. ], batch size: 58, lr: 1.57e-03, grad_scale: 16.0
2023-11-26 14:25:31,134 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=3430006.6666666665, ans=0.125
2023-11-26 14:25:37,195 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.279e+01 8.962e+01 9.623e+01 1.023e+02 1.293e+02, threshold=1.925e+02, percent-clipped=0.0
2023-11-26 14:26:13,482 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=3430273.3333333335, ans=0.125
2023-11-26 14:26:19,715 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 514550
2023-11-26 14:26:22,821 INFO [train_asr.py:1235] (3/4) Epoch 43, batch 9550, loss[loss=0.06932, simple_loss=0.09377, pruned_loss=0.01392, audio_tagging_loss=0.008518, over 14368.00 frames. ], tot_loss[loss=0.06677, simple_loss=0.09031, pruned_loss=0.0126, audio_tagging_loss=0.009007, over 3054658.74 frames. ], batch size: 53, lr: 1.57e-03, grad_scale: 16.0
2023-11-26 14:26:22,963 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=3430340.0, ans=0.2
2023-11-26 14:26:27,292 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3430340.0, ans=0.125
2023-11-26 14:26:45,038 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer_na.min_abs, batch_count=3430473.3333333335, ans=0.02
2023-11-26 14:26:50,343 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=3430473.3333333335, ans=0.125
2023-11-26 14:26:52,453 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=3430473.3333333335, ans=0.0
2023-11-26 14:26:56,197 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3430540.0, ans=0.1
2023-11-26 14:26:58,420 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=13.27 vs. limit=15.0
2023-11-26 14:27:02,475 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=3430540.0, ans=0.0
2023-11-26 14:27:06,816 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3430606.6666666665, ans=0.125
2023-11-26 14:27:07,759 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3430606.6666666665, ans=0.1
2023-11-26 14:27:15,632 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 514600
2023-11-26 14:27:16,800 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3430606.6666666665, ans=0.125
2023-11-26 14:27:19,025 INFO [train_asr.py:1235] (3/4) Epoch 43, batch 9600, loss[loss=0.05456, simple_loss=0.07328, pruned_loss=0.01056, audio_tagging_loss=0.007356, over 16109.00 frames. ], tot_loss[loss=0.06647, simple_loss=0.0899, pruned_loss=0.01252, audio_tagging_loss=0.008997, over 3053536.14 frames. ], batch size: 60, lr: 1.57e-03, grad_scale: 32.0
2023-11-26 14:27:28,378 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=7.93 vs. limit=15.0
2023-11-26 14:27:29,641 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.491e+01 8.862e+01 9.478e+01 1.011e+02 2.091e+02, threshold=1.896e+02, percent-clipped=1.0
2023-11-26 14:27:45,279 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00
2023-11-26 14:27:45,299 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3430806.6666666665, ans=0.125
2023-11-26 14:27:55,866 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=7.45 vs. limit=10.0
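
The scaling.py:1022 Whitening lines compare a per-module covariance statistic against a limit; when the metric exceeds the limit, icefall's Whiten module nudges the activations back toward a whiter (more isotropic) covariance. One simple statistic with the same flavour, offered purely as an illustration and not necessarily the exact formula used: the ratio of the mean squared eigenvalue of the channel covariance to the squared mean eigenvalue, which equals 1.0 for perfectly white features:

    import torch

    def whitening_metric(x, num_groups=1):
        """x: (frames, channels). Returns E[lam^2] / E[lam]^2 over the channel
        covariance eigenvalues, per group; 1.0 means perfectly white."""
        metrics = []
        for g in x.chunk(num_groups, dim=1):
            g = g - g.mean(dim=0)
            cov = (g.T @ g) / g.shape[0]
            lam = torch.linalg.eigvalsh(cov)   # real, non-negative eigenvalues
            metrics.append((lam * lam).mean() / lam.mean().clamp(min=1e-20) ** 2)
        return torch.stack(metrics).mean()

    x = torch.randn(1000, 384)                 # near-white random features
    print(float(whitening_metric(x)))          # close to 1.0, far below limit=15.0
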
2023-11-26 14:28:11,775 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=3430940.0, ans=0.125
2023-11-26 14:28:12,614 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 514650
2023-11-26 14:28:15,363 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn2.whiten.whitening_limit, batch_count=3431006.6666666665, ans=22.5
2023-11-26 14:28:15,753 INFO [train_asr.py:1235] (3/4) Epoch 43, batch 9650, loss[loss=0.07792, simple_loss=0.1124, pruned_loss=0.01623, audio_tagging_loss=0.005489, over 15223.00 frames. ], tot_loss[loss=0.06663, simple_loss=0.09015, pruned_loss=0.01257, audio_tagging_loss=0.008984, over 3054071.92 frames. ], batch size: 57, lr: 1.57e-03, grad_scale: 32.0
2023-11-26 14:28:20,563 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=512, metric=2.84 vs. limit=15.0
2023-11-26 14:28:28,265 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3431073.3333333335, ans=0.1
2023-11-26 14:28:36,136 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=19.88 vs. limit=22.5
2023-11-26 14:28:43,838 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=2.63 vs. limit=15.0
2023-11-26 14:28:55,217 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=3431206.6666666665, ans=0.04949747468305833
2023-11-26 14:29:05,225 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=5.97 vs. limit=15.0
2023-11-26 14:29:08,402 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 514700
2023-11-26 14:29:11,532 INFO [train_asr.py:1235] (3/4) Epoch 43, batch 9700, loss[loss=0.07867, simple_loss=0.102, pruned_loss=0.01571, audio_tagging_loss=0.01196, over 15434.00 frames. ], tot_loss[loss=0.06668, simple_loss=0.09055, pruned_loss=0.01257, audio_tagging_loss=0.008836, over 3054531.16 frames. ], batch size: 58, lr: 1.57e-03, grad_scale: 32.0
2023-11-26 14:29:14,934 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=3431340.0, ans=0.125
2023-11-26 14:29:21,813 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.600e+01 8.878e+01 9.480e+01 1.018e+02 1.289e+02, threshold=1.896e+02, percent-clipped=0.0
2023-11-26 14:29:26,675 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=7.81 vs. limit=10.0
2023-11-26 14:29:33,099 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=8.65 vs. limit=10.0
2023-11-26 14:29:39,204 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=3431473.3333333335, ans=0.0
2023-11-26 14:29:46,554 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=3431540.0, ans=0.2
2023-11-26 14:29:47,684 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2023-11-26 14:29:51,292 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=3431540.0, ans=0.0
2023-11-26 14:29:54,571 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3431540.0, ans=0.125
2023-11-26 14:29:54,741 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=13.00 vs. limit=22.5
2023-11-26 14:30:04,725 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 514750
2023-11-26 14:30:04,941 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=3431606.6666666665, ans=0.0
2023-11-26 14:30:07,874 INFO [train_asr.py:1235] (3/4) Epoch 43, batch 9750, loss[loss=0.04662, simple_loss=0.05673, pruned_loss=0.008617, audio_tagging_loss=0.009639, over 15995.00 frames. ], tot_loss[loss=0.06625, simple_loss=0.08994, pruned_loss=0.01255, audio_tagging_loss=0.00873, over 3050393.01 frames. ], batch size: 61, lr: 1.57e-03, grad_scale: 32.0
2023-11-26 14:30:11,261 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=3431673.3333333335, ans=0.0
2023-11-26 14:30:15,709 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2023-11-26 14:30:20,044 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=3431740.0, ans=0.0
2023-11-26 14:30:36,721 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=13.02 vs. limit=22.5
2023-11-26 14:30:42,342 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=9.89 vs. limit=15.0
2023-11-26 14:30:56,522 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3431940.0, ans=0.125
2023-11-26 14:30:59,270 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=3431940.0, ans=0.09899494936611666
2023-11-26 14:31:01,271 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 514800
2023-11-26 14:31:03,775 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=3432006.6666666665, ans=0.125
2023-11-26 14:31:04,621 INFO [train_asr.py:1235] (3/4) Epoch 43, batch 9800, loss[loss=0.05686, simple_loss=0.07191, pruned_loss=0.01218, audio_tagging_loss=0.008718, over 15633.00 frames. ], tot_loss[loss=0.06637, simple_loss=0.09034, pruned_loss=0.01264, audio_tagging_loss=0.008556, over 3050474.33 frames. ], batch size: 57, lr: 1.57e-03, grad_scale: 32.0
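
The per-batch components combine into the headline loss as roughly 0.5 * simple_loss + pruned_loss + audio_tagging_loss; the weighting is inferred from the logged numbers themselves, and checks out against the batch 9800 entry above:

    # Batch 9800 values from the log, under the inferred weighting:
    simple_loss, pruned_loss, audio_tagging_loss = 0.07191, 0.01218, 0.008718
    loss = 0.5 * simple_loss + pruned_loss + audio_tagging_loss
    print(round(loss, 5))   # 0.05685, matching the logged loss=0.05686 up to rounding
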
], batch size: 57, lr: 1.57e-03, grad_scale: 32.0 2023-11-26 14:31:10,180 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.min_positive, batch_count=3432006.6666666665, ans=0.025 2023-11-26 14:31:14,168 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.491e+01 8.979e+01 9.504e+01 1.025e+02 1.204e+02, threshold=1.901e+02, percent-clipped=0.0 2023-11-26 14:31:15,791 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=9.75 vs. limit=15.0 2023-11-26 14:31:17,740 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=3432073.3333333335, ans=0.2 2023-11-26 14:31:32,091 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=6.41 vs. limit=10.0 2023-11-26 14:31:56,023 WARNING [train_asr.py:1481] (3/4) Exclude cut with ID unbalanced/Bo4LcZjitzU_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-26 14:31:57,134 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 514850 2023-11-26 14:32:00,295 INFO [train_asr.py:1235] (3/4) Epoch 43, batch 9850, loss[loss=0.07969, simple_loss=0.1056, pruned_loss=0.01919, audio_tagging_loss=0.007693, over 15327.00 frames. ], tot_loss[loss=0.06599, simple_loss=0.0898, pruned_loss=0.01256, audio_tagging_loss=0.008526, over 3045273.30 frames. 
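Aside: the bracketed loss terms in these records are consistent, within display rounding, with a fixed weighted sum loss = 0.5 * simple_loss + pruned_loss + 1.0 * audio_tagging_loss; e.g. for batch 9850 above, 0.5 * 0.1056 + 0.01919 + 0.007693 ≈ 0.07969. A minimal sketch of that decomposition, with the 0.5 and 1.0 weights inferred from the logged numbers rather than read out of train_asr.py:

```python
# Sketch: the loss decomposition implied by this log. The 0.5 weight on
# simple_loss and the 1.0 weight on audio_tagging_loss are inferred from
# the logged values, not taken from train_asr.py itself.
def combined_loss(simple_loss: float, pruned_loss: float,
                  audio_tagging_loss: float,
                  simple_scale: float = 0.5,
                  tagging_scale: float = 1.0) -> float:
    return (simple_scale * simple_loss
            + pruned_loss
            + tagging_scale * audio_tagging_loss)

# Batch 9850 record from this log:
assert abs(combined_loss(0.1056, 0.01919, 0.007693) - 0.07969) < 5e-5
```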
], batch size: 58, lr: 1.57e-03, grad_scale: 32.0 2023-11-26 14:32:13,305 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3432406.6666666665, ans=0.1 2023-11-26 14:32:14,470 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3432406.6666666665, ans=0.125 2023-11-26 14:32:19,685 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3432406.6666666665, ans=0.125 2023-11-26 14:32:22,470 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=3432473.3333333335, ans=0.0 2023-11-26 14:32:22,507 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3432473.3333333335, ans=0.1 2023-11-26 14:32:23,905 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=3432473.3333333335, ans=0.2 2023-11-26 14:32:31,370 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=3432473.3333333335, ans=0.125 2023-11-26 14:32:37,560 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3432540.0, ans=0.1 2023-11-26 14:32:37,735 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=3432540.0, ans=0.2 2023-11-26 14:32:41,239 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=10.22 vs. limit=15.0 2023-11-26 14:32:42,980 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=3432540.0, ans=0.125 2023-11-26 14:32:53,113 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 514900 2023-11-26 14:32:54,860 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=3432606.6666666665, ans=0.125 2023-11-26 14:32:54,981 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3432606.6666666665, ans=0.125 2023-11-26 14:32:56,784 INFO [train_asr.py:1235] (3/4) Epoch 43, batch 9900, loss[loss=0.05, simple_loss=0.06938, pruned_loss=0.006059, audio_tagging_loss=0.009253, over 14562.00 frames. ], tot_loss[loss=0.0659, simple_loss=0.08988, pruned_loss=0.01242, audio_tagging_loss=0.008543, over 3045848.24 frames. ], batch size: 55, lr: 1.57e-03, grad_scale: 32.0 2023-11-26 14:33:07,502 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 6.992e+01 8.539e+01 9.208e+01 1.007e+02 1.176e+02, threshold=1.842e+02, percent-clipped=0.0 2023-11-26 14:33:33,160 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=11.34 vs. limit=15.0 2023-11-26 14:33:50,429 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 514950 2023-11-26 14:33:53,510 INFO [train_asr.py:1235] (3/4) Epoch 43, batch 9950, loss[loss=0.04378, simple_loss=0.05797, pruned_loss=0.00797, audio_tagging_loss=0.006831, over 14003.00 frames. ], tot_loss[loss=0.06591, simple_loss=0.09003, pruned_loss=0.01238, audio_tagging_loss=0.008508, over 3045243.24 frames. 
], batch size: 53, lr: 1.56e-03, grad_scale: 32.0 2023-11-26 14:33:58,144 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=12.40 vs. limit=15.0 2023-11-26 14:34:00,382 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=10.25 vs. limit=15.0 2023-11-26 14:34:18,582 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3433140.0, ans=0.1 2023-11-26 14:34:18,667 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=3433140.0, ans=0.07 2023-11-26 14:34:34,386 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=5.76 vs. limit=12.0 2023-11-26 14:34:45,656 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 515000 2023-11-26 14:34:49,098 INFO [train_asr.py:1235] (3/4) Epoch 43, batch 10000, loss[loss=0.05721, simple_loss=0.07223, pruned_loss=0.01238, audio_tagging_loss=0.00871, over 14819.00 frames. ], tot_loss[loss=0.06583, simple_loss=0.08989, pruned_loss=0.01237, audio_tagging_loss=0.008517, over 3045246.35 frames. ], batch size: 58, lr: 1.56e-03, grad_scale: 32.0 2023-11-26 14:34:50,372 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=3433340.0, ans=0.0 2023-11-26 14:34:59,094 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.754e+01 8.778e+01 9.390e+01 1.020e+02 2.265e+02, threshold=1.878e+02, percent-clipped=1.0 2023-11-26 14:34:59,430 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3433406.6666666665, ans=0.0 2023-11-26 14:35:03,034 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3433406.6666666665, ans=0.125 2023-11-26 14:35:05,199 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=3433406.6666666665, ans=0.0 2023-11-26 14:35:23,227 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=3433540.0, ans=0.0 2023-11-26 14:35:24,319 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3433540.0, ans=0.125 2023-11-26 14:35:41,085 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 515050 2023-11-26 14:35:45,415 INFO [train_asr.py:1235] (3/4) Epoch 43, batch 10050, loss[loss=0.0446, simple_loss=0.05402, pruned_loss=0.007732, audio_tagging_loss=0.009858, over 15632.00 frames. ], tot_loss[loss=0.06553, simple_loss=0.08953, pruned_loss=0.01228, audio_tagging_loss=0.008487, over 3041832.97 frames. ], batch size: 65, lr: 1.56e-03, grad_scale: 16.0 2023-11-26 14:35:49,065 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=512, metric=6.72 vs. 
limit=15.0 2023-11-26 14:35:50,877 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=3433673.3333333335, ans=0.125 2023-11-26 14:36:18,395 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=3433873.3333333335, ans=0.125 2023-11-26 14:36:18,794 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=7.26 vs. limit=15.0 2023-11-26 14:36:22,623 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=3433873.3333333335, ans=0.2 2023-11-26 14:36:22,655 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=3433873.3333333335, ans=0.1 2023-11-26 14:36:37,428 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 515100 2023-11-26 14:36:41,169 INFO [train_asr.py:1235] (3/4) Epoch 43, batch 10100, loss[loss=0.0694, simple_loss=0.09091, pruned_loss=0.01443, audio_tagging_loss=0.009509, over 15348.00 frames. ], tot_loss[loss=0.06629, simple_loss=0.09045, pruned_loss=0.01251, audio_tagging_loss=0.008558, over 3046866.33 frames. ], batch size: 56, lr: 1.56e-03, grad_scale: 16.0 2023-11-26 14:36:51,887 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.484e+01 8.526e+01 9.238e+01 9.912e+01 1.286e+02, threshold=1.848e+02, percent-clipped=0.0 2023-11-26 14:36:54,183 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3434073.3333333335, ans=0.125 2023-11-26 14:37:00,910 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=5.74 vs. limit=10.0 2023-11-26 14:37:27,304 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3434273.3333333335, ans=0.0 2023-11-26 14:37:28,163 WARNING [train_asr.py:1481] (3/4) Exclude cut with ID unbalanced/_eq1Ry0UZGU_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-26 14:37:33,541 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 515150 2023-11-26 14:37:36,673 INFO [train_asr.py:1235] (3/4) Epoch 43, batch 10150, loss[loss=0.06707, simple_loss=0.08711, pruned_loss=0.0136, audio_tagging_loss=0.009915, over 16656.00 frames. ], tot_loss[loss=0.06636, simple_loss=0.09062, pruned_loss=0.01254, audio_tagging_loss=0.008514, over 3053899.87 frames. 
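The WARNING above records why these 1-second AudioSet placeholder cuts are dropped: 100 fbank frames shrink to 23 under the encoder's 4x subsampling, and a transducer cannot align 23 output frames to 24 BPE tokens (it needs at least one frame per token). A sketch of that filter, assuming the usual zipformer-style length formula ((T - 7) // 2 + 1) // 2, which reproduces the logged 100 -> 23; the exact predicate at train_asr.py:1481 may differ:

```python
def frames_after_subsampling(num_frames: int) -> int:
    # Assumed zipformer-style 4x subsampling; reproduces 100 -> 23 as logged.
    return ((num_frames - 7) // 2 + 1) // 2

def keep_cut(num_frames: int, num_tokens: int) -> bool:
    # Transducer alignment needs at least one encoder frame per token.
    return frames_after_subsampling(num_frames) >= num_tokens

assert frames_after_subsampling(100) == 23
assert not keep_cut(100, 24)  # matches the excluded cut above (23 < 24)
```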
], batch size: 63, lr: 1.56e-03, grad_scale: 16.0 2023-11-26 14:37:42,220 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3434340.0, ans=0.125 2023-11-26 14:37:42,246 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3434340.0, ans=0.1 2023-11-26 14:37:49,632 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=3434406.6666666665, ans=0.125 2023-11-26 14:38:05,442 WARNING [train_asr.py:1481] (3/4) Exclude cut with ID unbalanced/cw-21cbk02A_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-26 14:38:19,445 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=3434540.0, ans=0.0 2023-11-26 14:38:24,647 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=3434606.6666666665, ans=0.125 2023-11-26 14:38:28,852 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 515200 2023-11-26 14:38:32,191 INFO [train_asr.py:1235] (3/4) Epoch 43, batch 10200, loss[loss=0.07289, simple_loss=0.1013, pruned_loss=0.01372, audio_tagging_loss=0.008527, over 16430.00 frames. ], tot_loss[loss=0.06613, simple_loss=0.09011, pruned_loss=0.01247, audio_tagging_loss=0.008609, over 3051123.50 frames. ], batch size: 60, lr: 1.56e-03, grad_scale: 16.0 2023-11-26 14:38:43,918 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=3434740.0, ans=0.125 2023-11-26 14:38:44,745 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.947e+01 8.939e+01 9.563e+01 1.037e+02 1.347e+02, threshold=1.913e+02, percent-clipped=0.0 2023-11-26 14:38:45,963 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3434740.0, ans=0.1 2023-11-26 14:38:55,895 WARNING [train_asr.py:1481] (3/4) Exclude cut with ID unbalanced/hOT6Yokob90_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-26 14:38:58,133 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3434806.6666666665, ans=0.0 2023-11-26 14:39:10,042 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.min_positive, batch_count=3434873.3333333335, ans=0.05 2023-11-26 14:39:12,567 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.1.self_attn2.whiten, num_groups=1, num_channels=192, metric=9.02 vs. 
limit=22.5 2023-11-26 14:39:13,362 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.54 vs. limit=6.0 2023-11-26 14:39:14,359 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=5.40 vs. limit=12.0 2023-11-26 14:39:17,936 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=3434940.0, ans=0.07 2023-11-26 14:39:21,187 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-26 14:39:26,337 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 515250 2023-11-26 14:39:29,463 INFO [train_asr.py:1235] (3/4) Epoch 43, batch 10250, loss[loss=0.05351, simple_loss=0.07051, pruned_loss=0.008932, audio_tagging_loss=0.009321, over 14542.00 frames. ], tot_loss[loss=0.0664, simple_loss=0.09043, pruned_loss=0.01258, audio_tagging_loss=0.008614, over 3059153.75 frames. ], batch size: 56, lr: 1.56e-03, grad_scale: 16.0 2023-11-26 14:39:36,602 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=3435006.6666666665, ans=0.125 2023-11-26 14:39:59,300 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=12.57 vs. limit=22.5 2023-11-26 14:40:18,407 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-26 14:40:22,403 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 515300 2023-11-26 14:40:25,492 INFO [train_asr.py:1235] (3/4) Epoch 43, batch 10300, loss[loss=0.07218, simple_loss=0.1016, pruned_loss=0.01158, audio_tagging_loss=0.00982, over 16246.00 frames. ], tot_loss[loss=0.0662, simple_loss=0.08973, pruned_loss=0.01257, audio_tagging_loss=0.008759, over 3054883.02 frames. ], batch size: 58, lr: 1.56e-03, grad_scale: 16.0 2023-11-26 14:40:25,789 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=3435340.0, ans=0.125 2023-11-26 14:40:36,200 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.928e+01 8.849e+01 9.518e+01 1.017e+02 1.480e+02, threshold=1.904e+02, percent-clipped=0.0 2023-11-26 14:40:57,077 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=8.45 vs. limit=15.0 2023-11-26 14:41:13,826 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=3435606.6666666665, ans=0.125 2023-11-26 14:41:15,075 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=3435606.6666666665, ans=0.07 2023-11-26 14:41:18,049 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 515350 2023-11-26 14:41:21,159 INFO [train_asr.py:1235] (3/4) Epoch 43, batch 10350, loss[loss=0.09339, simple_loss=0.1339, pruned_loss=0.0192, audio_tagging_loss=0.007227, over 15311.00 frames. ], tot_loss[loss=0.06653, simple_loss=0.09006, pruned_loss=0.01267, audio_tagging_loss=0.008822, over 3051069.25 frames. 
], batch size: 54, lr: 1.56e-03, grad_scale: 16.0 2023-11-26 14:41:28,276 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=3435673.3333333335, ans=0.125 2023-11-26 14:41:33,473 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=3435740.0, ans=0.0 2023-11-26 14:41:39,982 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3435740.0, ans=0.125 2023-11-26 14:41:42,185 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=3435740.0, ans=0.2 2023-11-26 14:41:49,782 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=10.02 vs. limit=15.0 2023-11-26 14:42:13,253 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 515400 2023-11-26 14:42:17,226 INFO [train_asr.py:1235] (3/4) Epoch 43, batch 10400, loss[loss=0.06722, simple_loss=0.08882, pruned_loss=0.01156, audio_tagging_loss=0.01126, over 15966.00 frames. ], tot_loss[loss=0.06642, simple_loss=0.08979, pruned_loss=0.01254, audio_tagging_loss=0.008988, over 3047755.21 frames. ], batch size: 63, lr: 1.56e-03, grad_scale: 32.0 2023-11-26 14:42:28,184 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3436073.3333333335, ans=0.125 2023-11-26 14:42:29,076 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.250e+01 8.791e+01 9.464e+01 1.006e+02 1.363e+02, threshold=1.893e+02, percent-clipped=0.0 2023-11-26 14:42:37,938 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=3436073.3333333335, ans=0.0 2023-11-26 14:42:39,933 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3436140.0, ans=0.125 2023-11-26 14:42:44,152 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.min_positive, batch_count=3436140.0, ans=0.025 2023-11-26 14:42:46,377 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=3436140.0, ans=0.125 2023-11-26 14:42:47,539 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3436140.0, ans=0.1 2023-11-26 14:42:48,592 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=3436140.0, ans=0.0 2023-11-26 14:42:59,239 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3436206.6666666665, ans=0.125 2023-11-26 14:43:10,156 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 515450 2023-11-26 14:43:13,301 INFO [train_asr.py:1235] (3/4) Epoch 43, batch 10450, loss[loss=0.09498, simple_loss=0.1329, pruned_loss=0.02146, audio_tagging_loss=0.007047, over 14896.00 frames. ], tot_loss[loss=0.06638, simple_loss=0.08997, pruned_loss=0.01248, audio_tagging_loss=0.008908, over 3048321.84 frames. 
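The ScheduledFloat lines track hyperparameters (dropout probabilities, skip rates, balancer probs) that are functions of batch_count rather than constants; `ans` is the value in effect when the line is printed. A hypothetical minimal version of such a schedule, piecewise-linear in batch_count (icefall's scaling.ScheduledFloat has a richer API than this sketch):

```python
def scheduled_float(batch_count: float, points) -> float:
    """points: [(batch_count, value), ...] sorted by batch_count.
    Holds the end values outside the range, interpolates linearly inside."""
    if batch_count <= points[0][0]:
        return points[0][1]
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        if batch_count <= x1:
            t = (batch_count - x0) / (x1 - x0)
            return y0 + t * (y1 - y0)
    return points[-1][1]

# E.g. a dropout that decays from 0.3 to 0.1 over the first 20k batches
# and then stays flat -- purely illustrative breakpoints:
dropout_p = scheduled_float(3431540.0, [(0.0, 0.3), (20000.0, 0.1)])
assert dropout_p == 0.1
```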
], batch size: 52, lr: 1.56e-03, grad_scale: 32.0 2023-11-26 14:43:21,357 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=8.79 vs. limit=10.0 2023-11-26 14:43:23,125 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3436406.6666666665, ans=0.1 2023-11-26 14:43:26,145 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=3436406.6666666665, ans=0.0 2023-11-26 14:43:27,356 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.max_abs, batch_count=3436406.6666666665, ans=10.0 2023-11-26 14:43:47,504 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3436540.0, ans=0.125 2023-11-26 14:43:49,627 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=3436540.0, ans=0.0 2023-11-26 14:44:05,408 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 515500 2023-11-26 14:44:08,651 INFO [train_asr.py:1235] (3/4) Epoch 43, batch 10500, loss[loss=0.0584, simple_loss=0.07132, pruned_loss=0.0118, audio_tagging_loss=0.01095, over 15176.00 frames. ], tot_loss[loss=0.06597, simple_loss=0.08967, pruned_loss=0.01229, audio_tagging_loss=0.008851, over 3050319.29 frames. ], batch size: 56, lr: 1.56e-03, grad_scale: 16.0 2023-11-26 14:44:08,923 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=3436673.3333333335, ans=0.2 2023-11-26 14:44:12,115 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3436673.3333333335, ans=0.1 2023-11-26 14:44:20,951 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.233e+01 8.625e+01 9.527e+01 1.023e+02 1.211e+02, threshold=1.905e+02, percent-clipped=0.0 2023-11-26 14:44:28,860 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=3436740.0, ans=0.125 2023-11-26 14:44:35,705 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3436806.6666666665, ans=0.125 2023-11-26 14:44:51,704 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=3436873.3333333335, ans=0.2 2023-11-26 14:44:52,130 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=9.91 vs. limit=15.0 2023-11-26 14:44:59,654 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=3436940.0, ans=0.0 2023-11-26 14:45:01,592 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 515550 2023-11-26 14:45:04,717 INFO [train_asr.py:1235] (3/4) Epoch 43, batch 10550, loss[loss=0.08138, simple_loss=0.1086, pruned_loss=0.01908, audio_tagging_loss=0.007981, over 15749.00 frames. ], tot_loss[loss=0.06624, simple_loss=0.09039, pruned_loss=0.01233, audio_tagging_loss=0.008718, over 3051999.28 frames. ], batch size: 57, lr: 1.56e-03, grad_scale: 16.0 2023-11-26 14:45:05,565 INFO [scaling.py:1022] (3/4) Whitening: name=encoder_embed.convnext.out_whiten, num_groups=1, num_channels=128, metric=4.67 vs. 
limit=5.0 2023-11-26 14:45:29,615 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=10.80 vs. limit=15.0 2023-11-26 14:45:47,844 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer_ff3.min_abs, batch_count=3437206.6666666665, ans=0.2 2023-11-26 14:45:56,545 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=10.96 vs. limit=15.0 2023-11-26 14:45:58,747 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 515600 2023-11-26 14:45:58,985 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-11-26 14:46:02,167 INFO [train_asr.py:1235] (3/4) Epoch 43, batch 10600, loss[loss=0.04934, simple_loss=0.07428, pruned_loss=0.005653, audio_tagging_loss=0.006552, over 17150.00 frames. ], tot_loss[loss=0.06686, simple_loss=0.09148, pruned_loss=0.01251, audio_tagging_loss=0.008613, over 3062231.35 frames. ], batch size: 64, lr: 1.56e-03, grad_scale: 8.0 2023-11-26 14:46:06,732 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=3437340.0, ans=0.0 2023-11-26 14:46:14,991 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.016e+01 9.031e+01 9.613e+01 1.032e+02 1.237e+02, threshold=1.923e+02, percent-clipped=0.0 2023-11-26 14:46:18,353 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3437406.6666666665, ans=0.1 2023-11-26 14:46:28,547 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=3437473.3333333335, ans=0.125 2023-11-26 14:46:29,622 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=3437473.3333333335, ans=0.0 2023-11-26 14:46:36,065 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=3437540.0, ans=0.2 2023-11-26 14:46:54,451 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 515650 2023-11-26 14:46:55,972 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=11.34 vs. limit=15.0 2023-11-26 14:46:57,556 INFO [train_asr.py:1235] (3/4) Epoch 43, batch 10650, loss[loss=0.06178, simple_loss=0.09053, pruned_loss=0.00944, audio_tagging_loss=0.00708, over 14928.00 frames. ], tot_loss[loss=0.06642, simple_loss=0.09043, pruned_loss=0.01255, audio_tagging_loss=0.00865, over 3058858.63 frames. 
], batch size: 56, lr: 1.56e-03, grad_scale: 8.0 2023-11-26 14:47:01,044 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=3437673.3333333335, ans=0.0 2023-11-26 14:47:35,193 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3437873.3333333335, ans=0.1 2023-11-26 14:47:38,340 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3437873.3333333335, ans=0.1 2023-11-26 14:47:42,457 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3437940.0, ans=0.125 2023-11-26 14:47:50,322 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 515700 2023-11-26 14:47:53,415 INFO [train_asr.py:1235] (3/4) Epoch 43, batch 10700, loss[loss=0.06774, simple_loss=0.08873, pruned_loss=0.01362, audio_tagging_loss=0.009759, over 15557.00 frames. ], tot_loss[loss=0.06622, simple_loss=0.09025, pruned_loss=0.01249, audio_tagging_loss=0.008607, over 3054129.77 frames. ], batch size: 59, lr: 1.56e-03, grad_scale: 8.0 2023-11-26 14:48:01,165 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3438006.6666666665, ans=0.125 2023-11-26 14:48:05,599 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=9.89 vs. limit=15.0 2023-11-26 14:48:07,267 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 6.864e+01 8.817e+01 9.509e+01 1.036e+02 1.497e+02, threshold=1.902e+02, percent-clipped=0.0 2023-11-26 14:48:17,731 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3438140.0, ans=0.125 2023-11-26 14:48:39,517 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=3438273.3333333335, ans=0.125 2023-11-26 14:48:46,725 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 515750 2023-11-26 14:48:49,892 INFO [train_asr.py:1235] (3/4) Epoch 43, batch 10750, loss[loss=0.0476, simple_loss=0.06109, pruned_loss=0.006848, audio_tagging_loss=0.0102, over 14990.00 frames. ], tot_loss[loss=0.06603, simple_loss=0.09, pruned_loss=0.01243, audio_tagging_loss=0.008608, over 3050849.84 frames. ], batch size: 60, lr: 1.56e-03, grad_scale: 8.0 2023-11-26 14:49:11,866 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.max_abs, batch_count=3438473.3333333335, ans=10.0 2023-11-26 14:49:12,982 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=3438473.3333333335, ans=0.0 2023-11-26 14:49:15,667 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=3438473.3333333335, ans=0.125 2023-11-26 14:49:24,819 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=3438540.0, ans=0.0 2023-11-26 14:49:24,938 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=3438540.0, ans=0.0 2023-11-26 14:49:31,145 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.0.conv_module2.whiten, num_groups=1, num_channels=192, metric=13.70 vs. 
limit=15.0 2023-11-26 14:49:37,113 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=3438606.6666666665, ans=0.125 2023-11-26 14:49:42,678 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 515800 2023-11-26 14:49:46,057 INFO [train_asr.py:1235] (3/4) Epoch 43, batch 10800, loss[loss=0.06707, simple_loss=0.08789, pruned_loss=0.01464, audio_tagging_loss=0.008483, over 14668.00 frames. ], tot_loss[loss=0.06614, simple_loss=0.09024, pruned_loss=0.01244, audio_tagging_loss=0.008584, over 3053490.77 frames. ], batch size: 54, lr: 1.56e-03, grad_scale: 16.0 2023-11-26 14:49:48,364 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=3438673.3333333335, ans=0.125 2023-11-26 14:49:50,478 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=3438673.3333333335, ans=0.125 2023-11-26 14:49:59,614 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.532e+01 8.847e+01 9.608e+01 1.038e+02 1.531e+02, threshold=1.922e+02, percent-clipped=0.0 2023-11-26 14:50:33,131 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.04 vs. limit=6.0 2023-11-26 14:50:38,876 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 515850 2023-11-26 14:50:39,003 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3438940.0, ans=0.125 2023-11-26 14:50:42,498 INFO [train_asr.py:1235] (3/4) Epoch 43, batch 10850, loss[loss=0.07128, simple_loss=0.09792, pruned_loss=0.01518, audio_tagging_loss=0.007143, over 15392.00 frames. ], tot_loss[loss=0.06662, simple_loss=0.09113, pruned_loss=0.0125, audio_tagging_loss=0.008559, over 3057689.09 frames. ], batch size: 58, lr: 1.56e-03, grad_scale: 16.0 2023-11-26 14:50:48,606 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=3439006.6666666665, ans=0.0 2023-11-26 14:50:50,608 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=3439006.6666666665, ans=0.125 2023-11-26 14:50:55,888 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=11.64 vs. limit=15.0 2023-11-26 14:50:57,531 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=3439073.3333333335, ans=0.125 2023-11-26 14:51:35,721 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 515900 2023-11-26 14:51:35,973 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=3439273.3333333335, ans=0.0 2023-11-26 14:51:36,716 WARNING [train_asr.py:1481] (3/4) Exclude cut with ID unbalanced/XMxq2pgttuY_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. 
Number of tokens: 24 2023-11-26 14:51:36,904 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=3439273.3333333335, ans=0.0 2023-11-26 14:51:38,897 INFO [train_asr.py:1235] (3/4) Epoch 43, batch 10900, loss[loss=0.05789, simple_loss=0.08269, pruned_loss=0.009252, audio_tagging_loss=0.007296, over 15997.00 frames. ], tot_loss[loss=0.06663, simple_loss=0.09103, pruned_loss=0.01247, audio_tagging_loss=0.00864, over 3058948.06 frames. ], batch size: 58, lr: 1.56e-03, grad_scale: 16.0 2023-11-26 14:51:39,488 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=512, metric=3.11 vs. limit=15.0 2023-11-26 14:51:52,181 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.363e+01 9.005e+01 9.638e+01 1.044e+02 1.421e+02, threshold=1.928e+02, percent-clipped=0.0 2023-11-26 14:51:53,653 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3439406.6666666665, ans=0.125 2023-11-26 14:52:09,988 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=3439473.3333333335, ans=0.0 2023-11-26 14:52:11,059 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.min_positive, batch_count=3439540.0, ans=0.05 2023-11-26 14:52:18,770 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3439540.0, ans=0.125 2023-11-26 14:52:18,778 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=3439540.0, ans=0.0 2023-11-26 14:52:26,035 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3439606.6666666665, ans=0.125 2023-11-26 14:52:31,167 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 515950 2023-11-26 14:52:34,997 INFO [train_asr.py:1235] (3/4) Epoch 43, batch 10950, loss[loss=0.07374, simple_loss=0.1071, pruned_loss=0.01283, audio_tagging_loss=0.007349, over 15170.00 frames. ], tot_loss[loss=0.066, simple_loss=0.08985, pruned_loss=0.01239, audio_tagging_loss=0.008688, over 3053066.86 frames. ], batch size: 54, lr: 1.56e-03, grad_scale: 16.0 2023-11-26 14:52:46,774 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=3439740.0, ans=0.015 2023-11-26 14:52:50,663 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.min_abs, batch_count=3439740.0, ans=0.5 2023-11-26 14:52:51,253 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.41 vs. 
limit=6.0 2023-11-26 14:53:00,496 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3439806.6666666665, ans=0.1 2023-11-26 14:53:07,867 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=3439873.3333333335, ans=0.125 2023-11-26 14:53:09,997 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3439873.3333333335, ans=0.1 2023-11-26 14:53:20,755 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.0.self_attn1.whiten, num_groups=1, num_channels=192, metric=13.87 vs. limit=22.5 2023-11-26 14:53:24,424 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=3439940.0, ans=0.0 2023-11-26 14:53:27,475 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 516000 2023-11-26 14:53:28,592 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=3439940.0, ans=0.125 2023-11-26 14:53:32,848 INFO [train_asr.py:1235] (3/4) Epoch 43, batch 11000, loss[loss=0.06806, simple_loss=0.08811, pruned_loss=0.01537, audio_tagging_loss=0.008634, over 16378.00 frames. ], tot_loss[loss=0.06561, simple_loss=0.08921, pruned_loss=0.0122, audio_tagging_loss=0.008804, over 3045928.11 frames. ], batch size: 60, lr: 1.56e-03, grad_scale: 16.0 2023-11-26 14:53:43,518 WARNING [train_asr.py:1481] (3/4) Exclude cut with ID unbalanced/h6R5rMXN6pY_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-26 14:53:47,227 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 6.917e+01 8.791e+01 9.278e+01 1.005e+02 1.404e+02, threshold=1.856e+02, percent-clipped=0.0 2023-11-26 14:54:21,178 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3440273.3333333335, ans=0.1 2023-11-26 14:54:24,884 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=3440273.3333333335, ans=0.0 2023-11-26 14:54:25,781 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 516050 2023-11-26 14:54:29,438 INFO [train_asr.py:1235] (3/4) Epoch 43, batch 11050, loss[loss=0.07349, simple_loss=0.09327, pruned_loss=0.01571, audio_tagging_loss=0.01115, over 15585.00 frames. ], tot_loss[loss=0.06634, simple_loss=0.09012, pruned_loss=0.01243, audio_tagging_loss=0.008845, over 3050722.20 frames. ], batch size: 59, lr: 1.56e-03, grad_scale: 16.0 2023-11-26 14:54:36,413 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=7.62 vs. 
limit=15.0 2023-11-26 14:54:39,339 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-26 14:54:47,682 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.whiten.whitening_limit, batch_count=3440406.6666666665, ans=12.0 2023-11-26 14:55:07,257 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=3440540.0, ans=0.035 2023-11-26 14:55:07,415 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=3440540.0, ans=0.125 2023-11-26 14:55:18,610 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=3440606.6666666665, ans=0.0 2023-11-26 14:55:20,672 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3440606.6666666665, ans=0.125 2023-11-26 14:55:21,583 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 516100 2023-11-26 14:55:24,785 INFO [train_asr.py:1235] (3/4) Epoch 43, batch 11100, loss[loss=0.06876, simple_loss=0.08676, pruned_loss=0.01504, audio_tagging_loss=0.01034, over 15168.00 frames. ], tot_loss[loss=0.06658, simple_loss=0.09024, pruned_loss=0.01248, audio_tagging_loss=0.008976, over 3051375.58 frames. ], batch size: 57, lr: 1.56e-03, grad_scale: 16.0 2023-11-26 14:55:24,962 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=3440673.3333333335, ans=0.2 2023-11-26 14:55:27,006 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=3440673.3333333335, ans=0.125 2023-11-26 14:55:27,123 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=3440673.3333333335, ans=0.125 2023-11-26 14:55:38,068 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.080e+01 8.786e+01 9.291e+01 1.014e+02 1.274e+02, threshold=1.858e+02, percent-clipped=0.0 2023-11-26 14:56:17,658 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 516150 2023-11-26 14:56:20,720 INFO [train_asr.py:1235] (3/4) Epoch 43, batch 11150, loss[loss=0.0792, simple_loss=0.1048, pruned_loss=0.01678, audio_tagging_loss=0.01002, over 14282.00 frames. ], tot_loss[loss=0.06659, simple_loss=0.09014, pruned_loss=0.01253, audio_tagging_loss=0.008981, over 3057801.66 frames. ], batch size: 54, lr: 1.56e-03, grad_scale: 16.0 2023-11-26 14:56:35,388 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3441073.3333333335, ans=0.1 2023-11-26 14:56:37,444 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3441073.3333333335, ans=0.1 2023-11-26 14:56:39,072 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=10.34 vs. limit=22.5 2023-11-26 14:56:45,125 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=5.48 vs. 
limit=12.0 2023-11-26 14:56:46,043 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=3441140.0, ans=0.0 2023-11-26 14:56:54,638 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=3441206.6666666665, ans=0.2 2023-11-26 14:56:59,423 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=3441206.6666666665, ans=0.07 2023-11-26 14:57:13,681 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 516200 2023-11-26 14:57:18,227 INFO [train_asr.py:1235] (3/4) Epoch 43, batch 11200, loss[loss=0.07712, simple_loss=0.1037, pruned_loss=0.01322, audio_tagging_loss=0.01204, over 14772.00 frames. ], tot_loss[loss=0.06641, simple_loss=0.08983, pruned_loss=0.01241, audio_tagging_loss=0.009084, over 3050463.43 frames. ], batch size: 56, lr: 1.56e-03, grad_scale: 32.0 2023-11-26 14:57:30,983 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.675e+01 8.740e+01 9.384e+01 1.028e+02 1.331e+02, threshold=1.877e+02, percent-clipped=0.0 2023-11-26 14:57:47,658 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=3441473.3333333335, ans=0.0 2023-11-26 14:57:54,584 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=3441540.0, ans=0.2 2023-11-26 14:58:09,499 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.max_abs, batch_count=3441606.6666666665, ans=10.0 2023-11-26 14:58:10,338 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 516250 2023-11-26 14:58:12,691 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=3441673.3333333335, ans=0.1 2023-11-26 14:58:13,522 INFO [train_asr.py:1235] (3/4) Epoch 43, batch 11250, loss[loss=0.05543, simple_loss=0.07136, pruned_loss=0.007853, audio_tagging_loss=0.0119, over 14698.00 frames. ], tot_loss[loss=0.06616, simple_loss=0.08959, pruned_loss=0.01232, audio_tagging_loss=0.009043, over 3055352.55 frames. ], batch size: 56, lr: 1.56e-03, grad_scale: 16.0 2023-11-26 14:58:32,608 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3441740.0, ans=0.125 2023-11-26 14:58:48,143 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3441873.3333333335, ans=0.125 2023-11-26 14:58:58,657 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=3441940.0, ans=0.125 2023-11-26 14:58:59,061 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=6.58 vs. limit=15.0 2023-11-26 14:59:05,954 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 516300 2023-11-26 14:59:09,214 INFO [train_asr.py:1235] (3/4) Epoch 43, batch 11300, loss[loss=0.06689, simple_loss=0.09992, pruned_loss=0.01046, audio_tagging_loss=0.00648, over 14843.00 frames. ], tot_loss[loss=0.06612, simple_loss=0.08996, pruned_loss=0.01235, audio_tagging_loss=0.008794, over 3054030.44 frames. 
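The Whitening lines (e.g. the metric=5.48 vs. limit=12.0 entry just above) come from modules that push each activation group toward a white, identity-like covariance, logging the metric when it is notable relative to its whitening_limit. One plausible way to compute such a metric is the anisotropy of the feature covariance, which equals 1.0 for perfectly white features and grows as the eigenvalue spectrum spreads; this is a sketch of the idea, not necessarily scaling.py's exact formula:

```python
import torch

def whitening_metric(x: torch.Tensor) -> torch.Tensor:
    # x: (num_frames, num_channels), one whitening group.
    x = x - x.mean(dim=0, keepdim=True)
    cov = (x.t() @ x) / x.shape[0]
    eigs = torch.linalg.eigvalsh(cov)
    # Mean squared eigenvalue over squared mean eigenvalue:
    # 1.0 iff all eigenvalues are equal (covariance ~ c * I).
    return (eigs ** 2).mean() / (eigs.mean() ** 2 + 1e-20)

x = torch.randn(1000, 512)   # near-white features: metric close to 1.0
print(whitening_metric(x))   # compare against a limit like the logged 15.0
```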
], batch size: 59, lr: 1.56e-03, grad_scale: 16.0 2023-11-26 14:59:17,322 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=2.32 vs. limit=15.0 2023-11-26 14:59:24,718 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.246e+01 8.684e+01 9.357e+01 1.017e+02 1.284e+02, threshold=1.871e+02, percent-clipped=0.0 2023-11-26 15:00:02,085 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 516350 2023-11-26 15:00:04,871 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=3442340.0, ans=0.2 2023-11-26 15:00:05,789 INFO [train_asr.py:1235] (3/4) Epoch 43, batch 11350, loss[loss=0.06326, simple_loss=0.09146, pruned_loss=0.01115, audio_tagging_loss=0.006383, over 14996.00 frames. ], tot_loss[loss=0.06541, simple_loss=0.08919, pruned_loss=0.01215, audio_tagging_loss=0.008668, over 3050155.75 frames. ], batch size: 58, lr: 1.56e-03, grad_scale: 16.0 2023-11-26 15:00:06,046 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=3442340.0, ans=0.125 2023-11-26 15:00:54,305 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=3442606.6666666665, ans=0.2 2023-11-26 15:00:58,406 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 516400 2023-11-26 15:01:01,799 INFO [train_asr.py:1235] (3/4) Epoch 43, batch 11400, loss[loss=0.07298, simple_loss=0.09885, pruned_loss=0.01574, audio_tagging_loss=0.007823, over 16506.00 frames. ], tot_loss[loss=0.06536, simple_loss=0.08914, pruned_loss=0.01217, audio_tagging_loss=0.008618, over 3046849.97 frames. ], batch size: 64, lr: 1.56e-03, grad_scale: 16.0 2023-11-26 15:01:15,666 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.526e+01 8.777e+01 9.213e+01 1.005e+02 1.277e+02, threshold=1.843e+02, percent-clipped=0.0 2023-11-26 15:01:29,452 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3442806.6666666665, ans=0.125 2023-11-26 15:01:29,819 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=1.85 vs. limit=6.0 2023-11-26 15:01:35,088 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=3442873.3333333335, ans=0.0 2023-11-26 15:01:53,934 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 516450 2023-11-26 15:01:57,081 INFO [train_asr.py:1235] (3/4) Epoch 43, batch 11450, loss[loss=0.0826, simple_loss=0.1166, pruned_loss=0.01718, audio_tagging_loss=0.00712, over 15332.00 frames. ], tot_loss[loss=0.06542, simple_loss=0.08917, pruned_loss=0.01221, audio_tagging_loss=0.008633, over 3043902.23 frames. ], batch size: 57, lr: 1.56e-03, grad_scale: 16.0 2023-11-26 15:02:09,161 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=3443073.3333333335, ans=0.125 2023-11-26 15:02:34,625 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3443206.6666666665, ans=0.1 2023-11-26 15:02:35,073 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=7.98 vs. 
limit=10.0 2023-11-26 15:02:35,901 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=13.17 vs. limit=15.0 2023-11-26 15:02:43,057 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=3443273.3333333335, ans=0.125 2023-11-26 15:02:49,878 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 516500 2023-11-26 15:02:52,648 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3443340.0, ans=0.125 2023-11-26 15:02:53,542 INFO [train_asr.py:1235] (3/4) Epoch 43, batch 11500, loss[loss=0.06599, simple_loss=0.0944, pruned_loss=0.01294, audio_tagging_loss=0.005851, over 16112.00 frames. ], tot_loss[loss=0.06543, simple_loss=0.08908, pruned_loss=0.01223, audio_tagging_loss=0.008663, over 3044864.39 frames. ], batch size: 62, lr: 1.56e-03, grad_scale: 16.0 2023-11-26 15:02:55,286 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=3443340.0, ans=0.2 2023-11-26 15:03:05,467 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=3443406.6666666665, ans=0.125 2023-11-26 15:03:08,392 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.464e+01 8.894e+01 9.338e+01 1.016e+02 1.234e+02, threshold=1.868e+02, percent-clipped=0.0 2023-11-26 15:03:15,349 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=9.83 vs. limit=15.0 2023-11-26 15:03:26,705 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=3443540.0, ans=0.1 2023-11-26 15:03:40,849 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=3443606.6666666665, ans=0.1 2023-11-26 15:03:46,778 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 516550 2023-11-26 15:03:49,914 INFO [train_asr.py:1235] (3/4) Epoch 43, batch 11550, loss[loss=0.06427, simple_loss=0.09464, pruned_loss=0.009737, audio_tagging_loss=0.007209, over 15200.00 frames. ], tot_loss[loss=0.06528, simple_loss=0.08925, pruned_loss=0.012, audio_tagging_loss=0.008651, over 3049982.06 frames. ], batch size: 55, lr: 1.56e-03, grad_scale: 16.0 2023-11-26 15:04:07,165 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3443740.0, ans=0.125 2023-11-26 15:04:16,273 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=3443806.6666666665, ans=0.2 2023-11-26 15:04:25,131 WARNING [train_asr.py:1481] (3/4) Exclude cut with ID unbalanced/NeYOsnhOi4k_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. 
Number of tokens: 24 2023-11-26 15:04:32,794 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.max_abs, batch_count=3443873.3333333335, ans=10.0 2023-11-26 15:04:40,104 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=3443940.0, ans=0.0 2023-11-26 15:04:42,161 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 516600 2023-11-26 15:04:44,765 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=3444006.6666666665, ans=0.125 2023-11-26 15:04:45,629 INFO [train_asr.py:1235] (3/4) Epoch 43, batch 11600, loss[loss=0.07018, simple_loss=0.1028, pruned_loss=0.012, audio_tagging_loss=0.006756, over 15186.00 frames. ], tot_loss[loss=0.06533, simple_loss=0.08934, pruned_loss=0.01202, audio_tagging_loss=0.008642, over 3046930.06 frames. ], batch size: 56, lr: 1.56e-03, grad_scale: 32.0 2023-11-26 15:04:53,742 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=6.55 vs. limit=15.0 2023-11-26 15:05:00,073 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.205e+01 8.905e+01 9.507e+01 1.006e+02 1.398e+02, threshold=1.901e+02, percent-clipped=0.0 2023-11-26 15:05:11,321 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3444140.0, ans=0.125 2023-11-26 15:05:15,662 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=3444140.0, ans=0.0 2023-11-26 15:05:22,029 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=3444206.6666666665, ans=0.0 2023-11-26 15:05:24,087 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3444206.6666666665, ans=0.1 2023-11-26 15:05:25,726 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=10.88 vs. limit=15.0 2023-11-26 15:05:37,053 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=3444273.3333333335, ans=0.07 2023-11-26 15:05:37,842 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 516650 2023-11-26 15:05:41,438 INFO [train_asr.py:1235] (3/4) Epoch 43, batch 11650, loss[loss=0.07192, simple_loss=0.103, pruned_loss=0.01247, audio_tagging_loss=0.007945, over 14816.00 frames. ], tot_loss[loss=0.06567, simple_loss=0.08963, pruned_loss=0.01216, audio_tagging_loss=0.008695, over 3042529.18 frames. 
2023-11-26 15:04:32,794 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.max_abs, batch_count=3443873.3333333335, ans=10.0
2023-11-26 15:04:40,104 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=3443940.0, ans=0.0
2023-11-26 15:04:42,161 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 516600
2023-11-26 15:04:44,765 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=3444006.6666666665, ans=0.125
2023-11-26 15:04:45,629 INFO [train_asr.py:1235] (3/4) Epoch 43, batch 11600, loss[loss=0.07018, simple_loss=0.1028, pruned_loss=0.012, audio_tagging_loss=0.006756, over 15186.00 frames. ], tot_loss[loss=0.06533, simple_loss=0.08934, pruned_loss=0.01202, audio_tagging_loss=0.008642, over 3046930.06 frames. ], batch size: 56, lr: 1.56e-03, grad_scale: 32.0
2023-11-26 15:04:53,742 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=6.55 vs. limit=15.0
2023-11-26 15:05:00,073 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.205e+01 8.905e+01 9.507e+01 1.006e+02 1.398e+02, threshold=1.901e+02, percent-clipped=0.0
2023-11-26 15:05:11,321 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3444140.0, ans=0.125
2023-11-26 15:05:15,662 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=3444140.0, ans=0.0
2023-11-26 15:05:22,029 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=3444206.6666666665, ans=0.0
2023-11-26 15:05:24,087 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3444206.6666666665, ans=0.1
2023-11-26 15:05:25,726 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=10.88 vs. limit=15.0
2023-11-26 15:05:37,053 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=3444273.3333333335, ans=0.07
2023-11-26 15:05:37,842 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 516650
2023-11-26 15:05:41,438 INFO [train_asr.py:1235] (3/4) Epoch 43, batch 11650, loss[loss=0.07192, simple_loss=0.103, pruned_loss=0.01247, audio_tagging_loss=0.007945, over 14816.00 frames. ], tot_loss[loss=0.06567, simple_loss=0.08963, pruned_loss=0.01216, audio_tagging_loss=0.008695, over 3042529.18 frames. ], batch size: 57, lr: 1.56e-03, grad_scale: 16.0
2023-11-26 15:06:15,692 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3444540.0, ans=0.1
2023-11-26 15:06:28,544 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=3444606.6666666665, ans=0.0
2023-11-26 15:06:31,291 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3444606.6666666665, ans=0.0
2023-11-26 15:06:33,903 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3444606.6666666665, ans=0.1
2023-11-26 15:06:34,856 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 516700
2023-11-26 15:06:37,991 INFO [train_asr.py:1235] (3/4) Epoch 43, batch 11700, loss[loss=0.07343, simple_loss=0.09965, pruned_loss=0.01332, audio_tagging_loss=0.01028, over 15652.00 frames. ], tot_loss[loss=0.06527, simple_loss=0.08888, pruned_loss=0.01208, audio_tagging_loss=0.008745, over 3036681.63 frames. ], batch size: 56, lr: 1.56e-03, grad_scale: 16.0
2023-11-26 15:06:52,889 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.614e+01 8.754e+01 9.292e+01 9.879e+01 2.063e+02, threshold=1.858e+02, percent-clipped=1.0
2023-11-26 15:06:56,167 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=3444740.0, ans=0.035
2023-11-26 15:07:29,500 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 516750
2023-11-26 15:07:32,640 INFO [train_asr.py:1235] (3/4) Epoch 43, batch 11750, loss[loss=0.06261, simple_loss=0.09288, pruned_loss=0.00932, audio_tagging_loss=0.006846, over 15228.00 frames. ], tot_loss[loss=0.06524, simple_loss=0.08863, pruned_loss=0.01214, audio_tagging_loss=0.008793, over 3046088.21 frames. ], batch size: 56, lr: 1.56e-03, grad_scale: 16.0
2023-11-26 15:07:43,452 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3445073.3333333335, ans=0.0
2023-11-26 15:07:48,381 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=3445073.3333333335, ans=0.0
2023-11-26 15:07:55,272 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3445140.0, ans=0.1
2023-11-26 15:07:56,948 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2023-11-26 15:08:07,014 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3445206.6666666665, ans=0.125
2023-11-26 15:08:15,649 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3445206.6666666665, ans=0.1
2023-11-26 15:08:24,951 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 516800
2023-11-26 15:08:28,238 INFO [train_asr.py:1235] (3/4) Epoch 43, batch 11800, loss[loss=0.05433, simple_loss=0.06757, pruned_loss=0.01235, audio_tagging_loss=0.0082, over 14760.00 frames. ], tot_loss[loss=0.06505, simple_loss=0.08817, pruned_loss=0.01211, audio_tagging_loss=0.008865, over 3049233.00 frames. ], batch size: 55, lr: 1.56e-03, grad_scale: 16.0
2023-11-26 15:08:32,051 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=3445340.0, ans=0.1
2023-11-26 15:08:33,263 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3445340.0, ans=0.1
2023-11-26 15:08:36,481 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=10.93 vs. limit=15.0
2023-11-26 15:08:45,370 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.466e+01 8.777e+01 9.316e+01 1.001e+02 1.366e+02, threshold=1.863e+02, percent-clipped=0.0
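
Each optim.py:476 line summarizes the recent distribution of gradient norms (min, 25%, median, 75%, max) together with the clipping threshold in force; the logged threshold is consistently Clipping_scale times the logged median (e.g. 2.0 x 9.316e+01 = 1.863e+02 just above), and percent-clipped reports how often the threshold was hit. A sketch of clipping against such an adaptive threshold; the window length and bookkeeping are assumptions.

import torch

def clip_gradients(params, norm_history, clipping_scale=2.0, window=200):
    grads = [p.grad for p in params if p.grad is not None]
    total_norm = torch.norm(torch.stack([g.detach().norm() for g in grads]))
    norm_history.append(float(total_norm))
    recent = sorted(norm_history[-window:])
    quartiles = [recent[int(q * (len(recent) - 1))]
                 for q in (0.0, 0.25, 0.5, 0.75, 1.0)]
    threshold = clipping_scale * quartiles[2]  # scale times the median norm
    if total_norm > threshold:
        for g in grads:
            g.mul_(threshold / total_norm)     # rescale in place
    return quartiles, threshold
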
2023-11-26 15:08:46,695 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=3445406.6666666665, ans=0.2
2023-11-26 15:08:51,939 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3445473.3333333335, ans=0.1
2023-11-26 15:09:08,951 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3445540.0, ans=0.1
2023-11-26 15:09:19,684 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3445606.6666666665, ans=0.125
2023-11-26 15:09:22,276 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 516850
2023-11-26 15:09:25,456 INFO [train_asr.py:1235] (3/4) Epoch 43, batch 11850, loss[loss=0.05412, simple_loss=0.06691, pruned_loss=0.01134, audio_tagging_loss=0.009327, over 15738.00 frames. ], tot_loss[loss=0.0654, simple_loss=0.08858, pruned_loss=0.01223, audio_tagging_loss=0.00888, over 3045250.38 frames. ], batch size: 62, lr: 1.56e-03, grad_scale: 16.0
2023-11-26 15:09:42,146 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=3445740.0, ans=0.2
2023-11-26 15:10:03,587 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module2.whiten, num_groups=1, num_channels=512, metric=9.23 vs. limit=15.0
2023-11-26 15:10:08,398 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=10.34 vs. limit=15.0
2023-11-26 15:10:09,486 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=14.00 vs. limit=15.0
2023-11-26 15:10:17,986 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 516900
2023-11-26 15:10:21,061 INFO [train_asr.py:1235] (3/4) Epoch 43, batch 11900, loss[loss=0.06913, simple_loss=0.09309, pruned_loss=0.01191, audio_tagging_loss=0.01068, over 16239.00 frames. ], tot_loss[loss=0.06577, simple_loss=0.0891, pruned_loss=0.01225, audio_tagging_loss=0.008962, over 3043952.38 frames. ], batch size: 60, lr: 1.56e-03, grad_scale: 16.0
2023-11-26 15:10:35,765 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.403e+01 8.570e+01 9.365e+01 9.968e+01 1.257e+02, threshold=1.873e+02, percent-clipped=0.0
2023-11-26 15:10:58,253 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3446206.6666666665, ans=0.125
2023-11-26 15:10:59,813 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=5.89 vs. limit=10.0
2023-11-26 15:11:01,543 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=3446206.6666666665, ans=0.07
2023-11-26 15:11:06,839 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3446273.3333333335, ans=0.1
2023-11-26 15:11:13,056 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 516950
2023-11-26 15:11:16,148 INFO [train_asr.py:1235] (3/4) Epoch 43, batch 11950, loss[loss=0.06902, simple_loss=0.1015, pruned_loss=0.01171, audio_tagging_loss=0.006574, over 16338.00 frames. ], tot_loss[loss=0.06577, simple_loss=0.08922, pruned_loss=0.01225, audio_tagging_loss=0.008918, over 3049340.74 frames. ], batch size: 58, lr: 1.56e-03, grad_scale: 16.0
2023-11-26 15:11:26,292 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=7.31 vs. limit=15.0
2023-11-26 15:11:42,077 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=3446473.3333333335, ans=0.2
2023-11-26 15:11:42,120 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3446473.3333333335, ans=0.1
2023-11-26 15:11:50,469 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=3446540.0, ans=0.125
2023-11-26 15:11:50,561 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3446540.0, ans=0.125
2023-11-26 15:11:52,414 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00
2023-11-26 15:12:04,107 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=8.98 vs. limit=10.0
2023-11-26 15:12:05,909 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3446606.6666666665, ans=0.125
2023-11-26 15:12:06,911 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3446606.6666666665, ans=0.125
2023-11-26 15:12:07,686 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 517000
2023-11-26 15:12:07,811 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2023-11-26 15:12:11,028 INFO [train_asr.py:1235] (3/4) Epoch 43, batch 12000, loss[loss=0.05437, simple_loss=0.07119, pruned_loss=0.007663, audio_tagging_loss=0.01111, over 15487.00 frames. ], tot_loss[loss=0.06623, simple_loss=0.08987, pruned_loss=0.01226, audio_tagging_loss=0.009034, over 3048876.77 frames. ], batch size: 60, lr: 1.56e-03, grad_scale: 32.0
2023-11-26 15:12:11,029 INFO [train_asr.py:1258] (3/4) Computing validation loss
2023-11-26 15:12:43,906 INFO [train_asr.py:1267] (3/4) Epoch 43, validation: loss=0.05829, simple_loss=0.05056, pruned_loss=0.00528, audio_tagging_loss=0.02773, over 4681554.00 frames.
2023-11-26 15:12:43,907 INFO [train_asr.py:1268] (3/4) Maximum memory allocated so far is 24894MB
2023-11-26 15:12:44,119 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00
2023-11-26 15:12:58,687 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 8.129e+01 8.906e+01 9.562e+01 1.016e+02 1.213e+02, threshold=1.912e+02, percent-clipped=0.0
2023-11-26 15:13:06,373 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=12.68 vs. limit=22.5
2023-11-26 15:13:38,090 INFO [train_asr.py:1235] (3/4) Epoch 44, batch 0, loss[loss=0.07007, simple_loss=0.0801, pruned_loss=0.01047, audio_tagging_loss=0.01955, over 14977.00 frames. ], tot_loss[loss=0.07007, simple_loss=0.0801, pruned_loss=0.01047, audio_tagging_loss=0.01955, over 14977.00 frames. ], batch size: 56, lr: 1.54e-03, grad_scale: 32.0
2023-11-26 15:13:38,091 INFO [train_asr.py:1258] (3/4) Computing validation loss
2023-11-26 15:13:59,023 INFO [zipformer.py:1877] (3/4) name=encoder.encoders.4.encoder.layers.1.self_attn_weights, attn_weights_entropy = tensor([4.4840, 3.8744, 3.1108, 3.7829], device='cuda:3')
2023-11-26 15:14:09,398 INFO [train_asr.py:1267] (3/4) Epoch 44, validation: loss=0.05821, simple_loss=0.05063, pruned_loss=0.005319, audio_tagging_loss=0.02758, over 4681554.00 frames.
2023-11-26 15:14:09,399 INFO [train_asr.py:1268] (3/4) Maximum memory allocated so far is 24894MB
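
During each validation pass the script also prints diagnostics such as the zipformer.py:1877 lines above, which report one attention-entropy value per head for selected self_attn_weights modules: entropy near zero means sharply peaked attention, entropy near log(key_len) means nearly uniform attention. One plausible way to compute such a statistic (the averaging choices are assumptions):

import torch

def attn_weights_entropy(attn: torch.Tensor, eps: float = 1e-20) -> torch.Tensor:
    # attn: (num_heads, query_len, key_len); each row sums to 1.
    entropy = -(attn * (attn + eps).log()).sum(dim=-1)  # (num_heads, query_len)
    return entropy.mean(dim=-1)  # one scalar per head, as printed in the log
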
2023-11-26 15:14:14,787 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=3446840.0, ans=0.0
2023-11-26 15:14:34,464 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 517050
2023-11-26 15:14:34,656 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=3446973.3333333335, ans=0.1
2023-11-26 15:14:51,973 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=3447040.0, ans=0.0
2023-11-26 15:14:58,478 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=3447106.6666666665, ans=0.125
2023-11-26 15:15:05,080 INFO [train_asr.py:1235] (3/4) Epoch 44, batch 50, loss[loss=0.06566, simple_loss=0.08112, pruned_loss=0.01119, audio_tagging_loss=0.0139, over 15079.00 frames. ], tot_loss[loss=0.07372, simple_loss=0.08945, pruned_loss=0.01231, audio_tagging_loss=0.01669, over 686710.86 frames. ], batch size: 58, lr: 1.54e-03, grad_scale: 32.0
2023-11-26 15:15:29,240 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3447306.6666666665, ans=0.1
2023-11-26 15:15:30,266 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 517100
2023-11-26 15:15:32,516 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3447306.6666666665, ans=0.1
2023-11-26 15:15:41,637 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.1.self_attn2.whiten, num_groups=1, num_channels=192, metric=9.77 vs. limit=22.5
2023-11-26 15:15:48,797 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 8.397e+01 9.647e+01 1.037e+02 1.149e+02 1.439e+02, threshold=2.073e+02, percent-clipped=0.0
2023-11-26 15:15:57,594 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3447440.0, ans=0.1
2023-11-26 15:16:01,676 INFO [train_asr.py:1235] (3/4) Epoch 44, batch 100, loss[loss=0.08269, simple_loss=0.1128, pruned_loss=0.01241, audio_tagging_loss=0.01387, over 15498.00 frames. ], tot_loss[loss=0.07258, simple_loss=0.08859, pruned_loss=0.01209, audio_tagging_loss=0.01619, over 1204782.84 frames. ], batch size: 54, lr: 1.54e-03, grad_scale: 16.0
2023-11-26 15:16:04,645 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=8.29 vs. limit=15.0
2023-11-26 15:16:12,140 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3447573.3333333335, ans=0.1
2023-11-26 15:16:21,703 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3447573.3333333335, ans=0.125
2023-11-26 15:16:21,712 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.min_positive, batch_count=3447573.3333333335, ans=0.05
2023-11-26 15:16:23,890 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=3447640.0, ans=0.125
2023-11-26 15:16:26,512 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 517150
2023-11-26 15:16:31,119 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3447640.0, ans=0.1
2023-11-26 15:16:33,219 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3447640.0, ans=0.1
2023-11-26 15:16:35,393 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3447706.6666666665, ans=0.1
2023-11-26 15:16:38,191 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3447706.6666666665, ans=0.125
2023-11-26 15:16:40,865 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=3447706.6666666665, ans=0.125
2023-11-26 15:16:45,150 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3447706.6666666665, ans=0.0
2023-11-26 15:16:58,402 INFO [train_asr.py:1235] (3/4) Epoch 44, batch 150, loss[loss=0.06561, simple_loss=0.09254, pruned_loss=0.007959, audio_tagging_loss=0.01138, over 15772.00 frames. ], tot_loss[loss=0.07195, simple_loss=0.09018, pruned_loss=0.01231, audio_tagging_loss=0.01455, over 1613496.35 frames. ], batch size: 60, lr: 1.54e-03, grad_scale: 16.0
2023-11-26 15:16:58,599 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3447840.0, ans=0.125
2023-11-26 15:17:10,979 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3447906.6666666665, ans=0.1
2023-11-26 15:17:10,992 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3447906.6666666665, ans=0.125
2023-11-26 15:17:11,937 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=3447906.6666666665, ans=0.0
2023-11-26 15:17:23,556 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 517200
2023-11-26 15:17:38,376 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=3448040.0, ans=0.2
2023-11-26 15:17:43,503 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.508e+01 9.130e+01 9.675e+01 1.049e+02 1.216e+02, threshold=1.935e+02, percent-clipped=0.0
2023-11-26 15:17:48,037 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=3448106.6666666665, ans=0.125
2023-11-26 15:17:50,310 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.62 vs. limit=6.0
2023-11-26 15:17:54,481 INFO [train_asr.py:1235] (3/4) Epoch 44, batch 200, loss[loss=0.0777, simple_loss=0.1077, pruned_loss=0.01379, audio_tagging_loss=0.01006, over 15794.00 frames. ], tot_loss[loss=0.07043, simple_loss=0.09063, pruned_loss=0.01234, audio_tagging_loss=0.01278, over 1927014.07 frames. ], batch size: 59, lr: 1.54e-03, grad_scale: 16.0
2023-11-26 15:18:00,882 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=3448173.3333333335, ans=0.015
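
The scaling.py:213 lines track ScheduledFloat hyperparameters: dropout rates, balancer probabilities, bypass scales and skip rates that are annealed as a piecewise-linear function of batch_count, with ans showing the value currently in force (here most schedules have long since reached their final values). A sketch of that behaviour; the real class also supports defaults and clamping, which is omitted.

def scheduled_float(batch_count: float, points) -> float:
    # points: [(batch_count, value), ...] in increasing batch_count order,
    # e.g. [(0.0, 0.3), (20000.0, 0.1)] for a dropout rate annealed to 0.1.
    x0, y0 = points[0]
    if batch_count <= x0:
        return y0
    for x1, y1 in points[1:]:
        if batch_count <= x1:
            return y0 + (y1 - y0) * (batch_count - x0) / (x1 - x0)
        x0, y0 = x1, y1
    return y0  # past the last breakpoint, hold the final value
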
2023-11-26 15:18:19,032 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 517250
2023-11-26 15:18:25,040 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=3448306.6666666665, ans=0.2
2023-11-26 15:18:31,460 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=3448373.3333333335, ans=0.125
2023-11-26 15:18:37,537 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3448373.3333333335, ans=0.125
2023-11-26 15:18:51,359 INFO [train_asr.py:1235] (3/4) Epoch 44, batch 250, loss[loss=0.05378, simple_loss=0.07491, pruned_loss=0.008058, audio_tagging_loss=0.008261, over 14893.00 frames. ], tot_loss[loss=0.06921, simple_loss=0.0906, pruned_loss=0.01225, audio_tagging_loss=0.01166, over 2171412.26 frames. ], batch size: 56, lr: 1.54e-03, grad_scale: 16.0
2023-11-26 15:19:07,390 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=9.86 vs. limit=15.0
2023-11-26 15:19:14,855 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=6.31 vs. limit=15.0
2023-11-26 15:19:15,442 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 517300
2023-11-26 15:19:21,400 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.min_positive, batch_count=3448640.0, ans=0.05
2023-11-26 15:19:36,415 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.235e+01 8.917e+01 9.750e+01 1.047e+02 1.492e+02, threshold=1.950e+02, percent-clipped=0.0
2023-11-26 15:19:46,943 INFO [train_asr.py:1235] (3/4) Epoch 44, batch 300, loss[loss=0.06435, simple_loss=0.08503, pruned_loss=0.01403, audio_tagging_loss=0.007808, over 15248.00 frames. ], tot_loss[loss=0.06857, simple_loss=0.09111, pruned_loss=0.01233, audio_tagging_loss=0.01069, over 2370012.11 frames. ], batch size: 56, lr: 1.54e-03, grad_scale: 16.0
2023-11-26 15:19:49,398 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=3448840.0, ans=0.2
2023-11-26 15:19:53,093 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=3448840.0, ans=0.0
2023-11-26 15:20:12,215 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 517350
2023-11-26 15:20:43,526 INFO [train_asr.py:1235] (3/4) Epoch 44, batch 350, loss[loss=0.06293, simple_loss=0.08493, pruned_loss=0.01206, audio_tagging_loss=0.008416, over 14317.00 frames. ], tot_loss[loss=0.06722, simple_loss=0.08995, pruned_loss=0.01213, audio_tagging_loss=0.01011, over 2515409.87 frames. ], batch size: 56, lr: 1.54e-03, grad_scale: 16.0
2023-11-26 15:20:51,107 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3449173.3333333335, ans=0.1
2023-11-26 15:20:56,436 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=3449240.0, ans=0.125
2023-11-26 15:20:57,567 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=3449240.0, ans=0.0
2023-11-26 15:21:00,442 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=3449240.0, ans=0.125
2023-11-26 15:21:07,963 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 517400
2023-11-26 15:21:28,360 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.480e+01 9.037e+01 9.510e+01 1.047e+02 1.188e+02, threshold=1.902e+02, percent-clipped=0.0
2023-11-26 15:21:33,503 INFO [scaling.py:1022] (3/4) Whitening: name=encoder_embed.convnext.out_whiten, num_groups=1, num_channels=128, metric=4.94 vs. limit=5.0
2023-11-26 15:21:39,350 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3449506.6666666665, ans=0.125
2023-11-26 15:21:40,187 INFO [train_asr.py:1235] (3/4) Epoch 44, batch 400, loss[loss=0.0741, simple_loss=0.1055, pruned_loss=0.01323, audio_tagging_loss=0.008142, over 14362.00 frames. ], tot_loss[loss=0.06609, simple_loss=0.08875, pruned_loss=0.01198, audio_tagging_loss=0.009731, over 2629110.60 frames. ], batch size: 53, lr: 1.54e-03, grad_scale: 32.0
2023-11-26 15:22:03,996 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 517450
2023-11-26 15:22:24,945 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=3449773.3333333335, ans=0.0
2023-11-26 15:22:27,064 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=3449773.3333333335, ans=0.0
2023-11-26 15:22:35,244 INFO [train_asr.py:1235] (3/4) Epoch 44, batch 450, loss[loss=0.05279, simple_loss=0.07277, pruned_loss=0.006904, audio_tagging_loss=0.009499, over 14937.00 frames. ], tot_loss[loss=0.06654, simple_loss=0.08968, pruned_loss=0.01218, audio_tagging_loss=0.009516, over 2715069.07 frames. ], batch size: 58, lr: 1.54e-03, grad_scale: 32.0
2023-11-26 15:22:41,142 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=5.98 vs. limit=12.0
2023-11-26 15:22:47,963 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=3449906.6666666665, ans=0.0
2023-11-26 15:23:00,152 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 517500
2023-11-26 15:23:08,205 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=2.55 vs. limit=15.0
2023-11-26 15:23:20,513 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.561e+01 8.956e+01 9.480e+01 1.000e+02 1.239e+02, threshold=1.896e+02, percent-clipped=0.0
2023-11-26 15:23:31,846 INFO [train_asr.py:1235] (3/4) Epoch 44, batch 500, loss[loss=0.08514, simple_loss=0.1207, pruned_loss=0.01753, audio_tagging_loss=0.00726, over 16284.00 frames. ], tot_loss[loss=0.06603, simple_loss=0.08949, pruned_loss=0.01201, audio_tagging_loss=0.009274, over 2784748.16 frames. ], batch size: 58, lr: 1.54e-03, grad_scale: 16.0
2023-11-26 15:23:32,077 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3450173.3333333335, ans=0.1
2023-11-26 15:23:35,810 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3450173.3333333335, ans=0.1
2023-11-26 15:23:57,011 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 517550
2023-11-26 15:24:05,678 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3450373.3333333335, ans=0.1
2023-11-26 15:24:07,761 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3450373.3333333335, ans=0.0
2023-11-26 15:24:09,861 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=3450373.3333333335, ans=0.0
2023-11-26 15:24:28,951 INFO [train_asr.py:1235] (3/4) Epoch 44, batch 550, loss[loss=0.07479, simple_loss=0.1042, pruned_loss=0.01431, audio_tagging_loss=0.008405, over 14469.00 frames. ], tot_loss[loss=0.06608, simple_loss=0.08956, pruned_loss=0.01206, audio_tagging_loss=0.009237, over 2839451.39 frames. ], batch size: 54, lr: 1.54e-03, grad_scale: 16.0
2023-11-26 15:24:29,241 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=3450506.6666666665, ans=0.2
2023-11-26 15:24:52,325 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 517600
2023-11-26 15:25:14,504 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.620e+01 8.780e+01 9.554e+01 1.038e+02 1.321e+02, threshold=1.911e+02, percent-clipped=0.0
2023-11-26 15:25:24,060 INFO [train_asr.py:1235] (3/4) Epoch 44, batch 600, loss[loss=0.05207, simple_loss=0.07266, pruned_loss=0.005735, audio_tagging_loss=0.01001, over 13594.00 frames. ], tot_loss[loss=0.06638, simple_loss=0.09004, pruned_loss=0.01224, audio_tagging_loss=0.009126, over 2886151.73 frames. ], batch size: 53, lr: 1.54e-03, grad_scale: 16.0
2023-11-26 15:25:48,499 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 517650
2023-11-26 15:25:57,278 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=3451040.0, ans=0.0
2023-11-26 15:26:15,203 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3451106.6666666665, ans=0.1
2023-11-26 15:26:19,361 INFO [train_asr.py:1235] (3/4) Epoch 44, batch 650, loss[loss=0.06958, simple_loss=0.0971, pruned_loss=0.01418, audio_tagging_loss=0.006851, over 15402.00 frames. ], tot_loss[loss=0.06621, simple_loss=0.0898, pruned_loss=0.01222, audio_tagging_loss=0.009089, over 2920581.64 frames. ], batch size: 58, lr: 1.54e-03, grad_scale: 16.0
2023-11-26 15:26:22,670 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3451173.3333333335, ans=0.125
2023-11-26 15:26:40,002 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=3451240.0, ans=0.125
2023-11-26 15:26:45,081 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 517700
2023-11-26 15:27:06,894 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.760e+01 8.698e+01 9.351e+01 9.946e+01 1.223e+02, threshold=1.870e+02, percent-clipped=0.0
2023-11-26 15:27:15,788 INFO [train_asr.py:1235] (3/4) Epoch 44, batch 700, loss[loss=0.05265, simple_loss=0.06747, pruned_loss=0.006193, audio_tagging_loss=0.01272, over 15542.00 frames. ], tot_loss[loss=0.06637, simple_loss=0.09039, pruned_loss=0.01227, audio_tagging_loss=0.008898, over 2952809.01 frames. ], batch size: 59, lr: 1.54e-03, grad_scale: 8.0
2023-11-26 15:27:25,588 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=3451506.6666666665, ans=0.04949747468305833
2023-11-26 15:27:40,477 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 517750
2023-11-26 15:27:46,029 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=3451640.0, ans=0.0
2023-11-26 15:27:59,438 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=3451706.6666666665, ans=0.125
2023-11-26 15:28:12,465 INFO [train_asr.py:1235] (3/4) Epoch 44, batch 750, loss[loss=0.07798, simple_loss=0.1105, pruned_loss=0.01402, audio_tagging_loss=0.008677, over 15328.00 frames. ], tot_loss[loss=0.06617, simple_loss=0.08997, pruned_loss=0.01225, audio_tagging_loss=0.008936, over 2976883.41 frames. ], batch size: 57, lr: 1.54e-03, grad_scale: 8.0
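
The logged learning rate steps down from 1.56e-03 in epoch 43 to 1.54e-03 at the start of epoch 44 while barely moving within an epoch, which is consistent with an Eden-style schedule that decays smoothly in both the batch index and the epoch. A sketch follows; base_lr, lr_batches and lr_epochs are assumed values, and with 0-based epochs this reproduces the logged 1.56e-03 to 1.54e-03 step at batch index around 5.2e5.

def eden_lr(base_lr: float, batch: int, epoch: float,
            lr_batches: float = 7500.0, lr_epochs: float = 3.5) -> float:
    # Power-law decay in both batch and epoch; constants are assumptions.
    batch_factor = ((batch ** 2 + lr_batches ** 2) / lr_batches ** 2) ** -0.25
    epoch_factor = ((epoch ** 2 + lr_epochs ** 2) / lr_epochs ** 2) ** -0.25
    return base_lr * batch_factor * epoch_factor

# eden_lr(0.045, 517000, 42) ~= 1.56e-03; eden_lr(0.045, 517000, 43) ~= 1.54e-03
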
2023-11-26 15:28:26,566 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3451906.6666666665, ans=0.125
2023-11-26 15:28:32,654 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=4.58 vs. limit=15.0
2023-11-26 15:28:36,631 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 517800
2023-11-26 15:28:45,249 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3451973.3333333335, ans=0.1
2023-11-26 15:28:59,977 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.252e+01 8.944e+01 9.681e+01 1.076e+02 1.736e+02, threshold=1.936e+02, percent-clipped=0.0
2023-11-26 15:29:06,539 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=3452106.6666666665, ans=0.0
2023-11-26 15:29:08,442 INFO [train_asr.py:1235] (3/4) Epoch 44, batch 800, loss[loss=0.0468, simple_loss=0.06212, pruned_loss=0.006571, audio_tagging_loss=0.009175, over 14563.00 frames. ], tot_loss[loss=0.06684, simple_loss=0.09095, pruned_loss=0.01238, audio_tagging_loss=0.008986, over 2992904.26 frames. ], batch size: 54, lr: 1.54e-03, grad_scale: 16.0
2023-11-26 15:29:09,161 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=6.09 vs. limit=10.0
2023-11-26 15:29:17,814 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=3452173.3333333335, ans=0.2
2023-11-26 15:29:19,055 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=6.24 vs. limit=15.0
2023-11-26 15:29:19,894 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=3452240.0, ans=0.0
2023-11-26 15:29:20,940 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=3452240.0, ans=0.0
2023-11-26 15:29:34,023 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 517850
2023-11-26 15:29:48,935 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=3452373.3333333335, ans=0.125
2023-11-26 15:30:01,018 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3452440.0, ans=0.0
2023-11-26 15:30:01,057 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3452440.0, ans=0.1
2023-11-26 15:30:04,070 INFO [train_asr.py:1235] (3/4) Epoch 44, batch 850, loss[loss=0.08037, simple_loss=0.1168, pruned_loss=0.01424, audio_tagging_loss=0.007715, over 15348.00 frames. ], tot_loss[loss=0.06643, simple_loss=0.09037, pruned_loss=0.01221, audio_tagging_loss=0.009037, over 3007034.70 frames. ], batch size: 57, lr: 1.54e-03, grad_scale: 16.0
2023-11-26 15:30:04,260 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=3452506.6666666665, ans=0.2
2023-11-26 15:30:05,740 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=3452506.6666666665, ans=0.1
2023-11-26 15:30:19,833 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=3452573.3333333335, ans=0.125
2023-11-26 15:30:26,266 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=3452640.0, ans=0.2
2023-11-26 15:30:29,235 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 517900
2023-11-26 15:30:52,037 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.628e+01 8.827e+01 9.589e+01 1.017e+02 1.364e+02, threshold=1.918e+02, percent-clipped=0.0
2023-11-26 15:31:00,599 INFO [train_asr.py:1235] (3/4) Epoch 44, batch 900, loss[loss=0.05518, simple_loss=0.07774, pruned_loss=0.008516, audio_tagging_loss=0.007799, over 15847.00 frames. ], tot_loss[loss=0.0665, simple_loss=0.09038, pruned_loss=0.01224, audio_tagging_loss=0.00907, over 3009289.16 frames. ], batch size: 59, lr: 1.54e-03, grad_scale: 16.0
2023-11-26 15:31:17,762 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3452906.6666666665, ans=0.125
2023-11-26 15:31:19,097 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=7.23 vs. limit=15.0
2023-11-26 15:31:24,585 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 517950
2023-11-26 15:31:42,236 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=3453040.0, ans=0.2
2023-11-26 15:31:43,326 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=3453106.6666666665, ans=0.07
2023-11-26 15:31:54,755 INFO [train_asr.py:1235] (3/4) Epoch 44, batch 950, loss[loss=0.07051, simple_loss=0.096, pruned_loss=0.0115, audio_tagging_loss=0.01101, over 14349.00 frames. ], tot_loss[loss=0.06675, simple_loss=0.09091, pruned_loss=0.01233, audio_tagging_loss=0.008971, over 3016088.59 frames. ], batch size: 54, lr: 1.54e-03, grad_scale: 16.0
2023-11-26 15:32:03,997 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3453173.3333333335, ans=0.1
2023-11-26 15:32:08,308 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3453240.0, ans=0.1
2023-11-26 15:32:11,489 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3453240.0, ans=0.1
2023-11-26 15:32:20,086 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 518000
2023-11-26 15:32:31,303 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.min_abs, batch_count=3453373.3333333335, ans=0.5
2023-11-26 15:32:32,240 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=3453373.3333333335, ans=0.125
2023-11-26 15:32:32,321 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=3453373.3333333335, ans=0.125
2023-11-26 15:32:41,772 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.555e+01 8.765e+01 9.471e+01 9.957e+01 1.208e+02, threshold=1.894e+02, percent-clipped=0.0
2023-11-26 15:32:44,185 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=3453440.0, ans=0.125
2023-11-26 15:32:50,866 INFO [train_asr.py:1235] (3/4) Epoch 44, batch 1000, loss[loss=0.07292, simple_loss=0.09985, pruned_loss=0.01683, audio_tagging_loss=0.006165, over 14564.00 frames. ], tot_loss[loss=0.06568, simple_loss=0.08947, pruned_loss=0.01212, audio_tagging_loss=0.008833, over 3015917.54 frames. ], batch size: 55, lr: 1.54e-03, grad_scale: 16.0
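
Each train_asr.py:1235 record pairs the current batch's losses (loss[...]) with tot_loss[...], a frame-weighted running aggregate over roughly three million frames. A sketch of that bookkeeping; the real tracker may also decay old batches, a detail assumed away here.

class RunningLoss:
    def __init__(self):
        self.sums = {}
        self.frames = 0.0

    def update(self, losses: dict, num_frames: float) -> None:
        # Weight each batch's losses by its number of frames.
        for name, value in losses.items():
            self.sums[name] = self.sums.get(name, 0.0) + value * num_frames
        self.frames += num_frames

    def averages(self) -> dict:
        return {name: s / self.frames for name, s in self.sums.items()}

# e.g. tracker.update({"loss": 0.07292, "simple_loss": 0.09985,
#                      "pruned_loss": 0.01683,
#                      "audio_tagging_loss": 0.006165}, 14564.0)
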
2023-11-26 15:33:14,241 WARNING [train_asr.py:1481] (3/4) Exclude cut with ID unbalanced/5Y6u9AlD9S0_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24
2023-11-26 15:33:14,488 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=3453640.0, ans=0.07
2023-11-26 15:33:15,314 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 518050
2023-11-26 15:33:35,349 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=3453773.3333333335, ans=0.0
2023-11-26 15:33:36,524 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=3453773.3333333335, ans=0.125
2023-11-26 15:33:39,339 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=3453773.3333333335, ans=0.125
2023-11-26 15:33:43,465 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=3453773.3333333335, ans=0.0
2023-11-26 15:33:46,940 INFO [train_asr.py:1235] (3/4) Epoch 44, batch 1050, loss[loss=0.05948, simple_loss=0.07234, pruned_loss=0.01057, audio_tagging_loss=0.01274, over 15143.00 frames. ], tot_loss[loss=0.06594, simple_loss=0.08976, pruned_loss=0.01232, audio_tagging_loss=0.00873, over 3014550.57 frames. ], batch size: 57, lr: 1.54e-03, grad_scale: 16.0
2023-11-26 15:33:52,543 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3453840.0, ans=0.1
2023-11-26 15:33:56,022 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=9.98 vs. limit=15.0
2023-11-26 15:34:00,928 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3453906.6666666665, ans=0.125
2023-11-26 15:34:01,195 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=11.86 vs. limit=22.5
2023-11-26 15:34:05,390 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=3453906.6666666665, ans=0.0
2023-11-26 15:34:11,135 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 518100
2023-11-26 15:34:32,608 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer_ff3.min_abs, batch_count=3454106.6666666665, ans=0.2
2023-11-26 15:34:33,432 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.459e+01 8.861e+01 9.465e+01 1.011e+02 1.415e+02, threshold=1.893e+02, percent-clipped=0.0
2023-11-26 15:34:37,226 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=11.83 vs. limit=15.0
2023-11-26 15:34:42,002 INFO [train_asr.py:1235] (3/4) Epoch 44, batch 1100, loss[loss=0.05278, simple_loss=0.06859, pruned_loss=0.01111, audio_tagging_loss=0.007381, over 15123.00 frames. ], tot_loss[loss=0.06602, simple_loss=0.09002, pruned_loss=0.01237, audio_tagging_loss=0.008637, over 3019182.25 frames. ], batch size: 58, lr: 1.54e-03, grad_scale: 16.0
2023-11-26 15:34:45,234 WARNING [train_asr.py:1481] (3/4) Exclude cut with ID unbalanced/AWHnJAqurec_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24
2023-11-26 15:34:51,851 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3454240.0, ans=0.1
2023-11-26 15:34:54,654 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=3454240.0, ans=0.125
2023-11-26 15:35:05,648 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=3454306.6666666665, ans=0.0
2023-11-26 15:35:06,599 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 518150
2023-11-26 15:35:13,040 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=10.64 vs. limit=15.0
2023-11-26 15:35:15,700 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.min_positive, batch_count=3454373.3333333335, ans=0.05
2023-11-26 15:35:32,700 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3454440.0, ans=0.1
2023-11-26 15:35:33,791 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3454440.0, ans=0.1
2023-11-26 15:35:37,333 INFO [train_asr.py:1235] (3/4) Epoch 44, batch 1150, loss[loss=0.05185, simple_loss=0.07272, pruned_loss=0.007378, audio_tagging_loss=0.008112, over 15842.00 frames. ], tot_loss[loss=0.06634, simple_loss=0.09058, pruned_loss=0.0124, audio_tagging_loss=0.008649, over 3025538.93 frames. ], batch size: 61, lr: 1.54e-03, grad_scale: 8.0
2023-11-26 15:35:50,856 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=3454573.3333333335, ans=0.2
2023-11-26 15:35:51,886 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00
2023-11-26 15:36:01,834 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 518200
2023-11-26 15:36:05,518 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3454640.0, ans=0.125
2023-11-26 15:36:13,152 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=1.88 vs. limit=6.0
2023-11-26 15:36:14,088 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3454706.6666666665, ans=0.125
2023-11-26 15:36:21,017 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=3454773.3333333335, ans=0.0
2023-11-26 15:36:24,926 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.587e+01 8.766e+01 9.295e+01 9.999e+01 1.209e+02, threshold=1.859e+02, percent-clipped=0.0
2023-11-26 15:36:32,879 INFO [train_asr.py:1235] (3/4) Epoch 44, batch 1200, loss[loss=0.06628, simple_loss=0.09024, pruned_loss=0.01256, audio_tagging_loss=0.008598, over 14981.00 frames. ], tot_loss[loss=0.06579, simple_loss=0.08979, pruned_loss=0.01231, audio_tagging_loss=0.008588, over 3031255.97 frames. ], batch size: 56, lr: 1.54e-03, grad_scale: 16.0
2023-11-26 15:36:33,016 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=3454840.0, ans=0.125
2023-11-26 15:36:33,112 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=3454840.0, ans=0.125
2023-11-26 15:36:41,212 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=3454840.0, ans=0.125
2023-11-26 15:36:46,992 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=7.10 vs. limit=15.0
2023-11-26 15:36:57,047 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 518250
2023-11-26 15:37:23,569 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=12.50 vs. limit=15.0
2023-11-26 15:37:28,281 INFO [train_asr.py:1235] (3/4) Epoch 44, batch 1250, loss[loss=0.05815, simple_loss=0.07426, pruned_loss=0.01263, audio_tagging_loss=0.008384, over 14518.00 frames. ], tot_loss[loss=0.06546, simple_loss=0.08907, pruned_loss=0.01237, audio_tagging_loss=0.008558, over 3027156.24 frames. ], batch size: 56, lr: 1.54e-03, grad_scale: 8.0
2023-11-26 15:37:33,792 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3455173.3333333335, ans=0.1
2023-11-26 15:37:52,875 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 518300
2023-11-26 15:38:14,442 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=12.18 vs. limit=22.5
2023-11-26 15:38:16,818 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.326e+01 8.858e+01 9.436e+01 1.015e+02 1.276e+02, threshold=1.887e+02, percent-clipped=0.0
2023-11-26 15:38:23,784 INFO [train_asr.py:1235] (3/4) Epoch 44, batch 1300, loss[loss=0.06271, simple_loss=0.08958, pruned_loss=0.01106, audio_tagging_loss=0.006855, over 15254.00 frames. ], tot_loss[loss=0.06558, simple_loss=0.08947, pruned_loss=0.01231, audio_tagging_loss=0.008542, over 3028354.81 frames. ], batch size: 55, lr: 1.54e-03, grad_scale: 8.0
2023-11-26 15:38:25,119 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=3455506.6666666665, ans=0.1
2023-11-26 15:38:26,500 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=512, metric=17.76 vs. limit=22.5
2023-11-26 15:38:29,802 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=6.24 vs. limit=10.0
2023-11-26 15:38:37,304 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3455573.3333333335, ans=0.1
2023-11-26 15:38:41,477 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3455573.3333333335, ans=0.1
2023-11-26 15:38:44,598 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=3455640.0, ans=0.2
2023-11-26 15:38:48,245 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 518350
2023-11-26 15:39:06,219 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=3455706.6666666665, ans=0.125
2023-11-26 15:39:19,313 INFO [train_asr.py:1235] (3/4) Epoch 44, batch 1350, loss[loss=0.08059, simple_loss=0.1107, pruned_loss=0.0183, audio_tagging_loss=0.006941, over 15225.00 frames. ], tot_loss[loss=0.06595, simple_loss=0.08985, pruned_loss=0.01242, audio_tagging_loss=0.0086, over 3033386.40 frames. ], batch size: 54, lr: 1.54e-03, grad_scale: 8.0
2023-11-26 15:39:43,426 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 518400
2023-11-26 15:39:43,495 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=3455973.3333333335, ans=0.125
2023-11-26 15:39:43,983 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=7.61 vs. limit=12.0
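
The Whitening lines compare a per-module statistic of the activation covariance against a limit (5.0 to 22.5 in the records above); when the metric exceeds its limit, the module applies a corrective gradient that pushes activations back toward a whiter, more isotropic covariance. One metric with the right behaviour, equal to 1.0 for a perfectly white covariance and growing as variance concentrates in few directions, is the ratio of the mean squared eigenvalue to the squared mean eigenvalue; whether scaling.py uses exactly this form is an assumption.

import torch

def whitening_metric(x: torch.Tensor, num_groups: int = 1) -> torch.Tensor:
    # x: (num_frames, num_channels); channels split into num_groups groups.
    num_frames, num_channels = x.shape
    x = x.reshape(num_frames, num_groups, num_channels // num_groups).transpose(0, 1)
    x = x - x.mean(dim=1, keepdim=True)
    cov = x.transpose(1, 2) @ x / num_frames        # per-group covariance
    eigs = torch.linalg.eigvalsh(cov)               # eigenvalues per group
    return (eigs ** 2).mean() / (eigs.mean() ** 2)  # 1.0 iff isotropic
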
2023-11-26 15:40:01,173 WARNING [train_asr.py:1481] (3/4) Exclude cut with ID unbalanced/XdmbboqRBmQ_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24
2023-11-26 15:40:04,694 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3456106.6666666665, ans=0.125
2023-11-26 15:40:08,634 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.705e+01 8.757e+01 9.290e+01 1.016e+02 1.312e+02, threshold=1.858e+02, percent-clipped=0.0
2023-11-26 15:40:15,037 INFO [train_asr.py:1235] (3/4) Epoch 44, batch 1400, loss[loss=0.0656, simple_loss=0.08436, pruned_loss=0.01403, audio_tagging_loss=0.009384, over 15432.00 frames. ], tot_loss[loss=0.0655, simple_loss=0.08918, pruned_loss=0.01219, audio_tagging_loss=0.008719, over 3036709.68 frames. ], batch size: 57, lr: 1.54e-03, grad_scale: 8.0
2023-11-26 15:40:18,076 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3456173.3333333335, ans=0.125
2023-11-26 15:40:26,631 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3456240.0, ans=0.1
2023-11-26 15:40:39,869 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 518450
2023-11-26 15:40:52,588 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=3456373.3333333335, ans=0.1
2023-11-26 15:41:04,773 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3456440.0, ans=0.1
2023-11-26 15:41:04,943 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3456440.0, ans=0.1
2023-11-26 15:41:10,961 INFO [train_asr.py:1235] (3/4) Epoch 44, batch 1450, loss[loss=0.07509, simple_loss=0.1041, pruned_loss=0.01595, audio_tagging_loss=0.007105, over 15162.00 frames. ], tot_loss[loss=0.06553, simple_loss=0.08924, pruned_loss=0.01217, audio_tagging_loss=0.008741, over 3043219.76 frames. ], batch size: 56, lr: 1.54e-03, grad_scale: 8.0
2023-11-26 15:41:15,056 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.min_positive, batch_count=3456506.6666666665, ans=0.025
2023-11-26 15:41:17,068 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.min_abs, batch_count=3456506.6666666665, ans=0.5
2023-11-26 15:41:18,438 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=512, metric=21.32 vs. limit=22.5
2023-11-26 15:41:27,867 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3456573.3333333335, ans=0.0
2023-11-26 15:41:27,933 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=3456573.3333333335, ans=0.0
2023-11-26 15:41:36,259 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 518500
2023-11-26 15:41:45,595 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3456706.6666666665, ans=0.1
2023-11-26 15:41:55,281 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=3456773.3333333335, ans=0.125
2023-11-26 15:42:00,970 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.759e+01 9.010e+01 9.704e+01 1.035e+02 1.675e+02, threshold=1.941e+02, percent-clipped=0.0
2023-11-26 15:42:08,024 INFO [train_asr.py:1235] (3/4) Epoch 44, batch 1500, loss[loss=0.05339, simple_loss=0.07898, pruned_loss=0.00845, audio_tagging_loss=0.005453, over 14860.00 frames. ], tot_loss[loss=0.06575, simple_loss=0.08956, pruned_loss=0.01224, audio_tagging_loss=0.008735, over 3051650.18 frames. ], batch size: 55, lr: 1.54e-03, grad_scale: 8.0
2023-11-26 15:42:09,906 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3456840.0, ans=0.125
2023-11-26 15:42:12,052 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3456840.0, ans=0.125
2023-11-26 15:42:18,327 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=3456906.6666666665, ans=0.2
2023-11-26 15:42:18,536 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=10.32 vs. limit=15.0
2023-11-26 15:42:24,679 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=3456906.6666666665, ans=0.125
2023-11-26 15:42:30,742 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=3456973.3333333335, ans=0.0
2023-11-26 15:42:32,709 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 518550
2023-11-26 15:42:53,094 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3457106.6666666665, ans=0.125
2023-11-26 15:42:57,184 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3457106.6666666665, ans=0.125
2023-11-26 15:43:03,424 INFO [train_asr.py:1235] (3/4) Epoch 44, batch 1550, loss[loss=0.07797, simple_loss=0.1073, pruned_loss=0.01452, audio_tagging_loss=0.009783, over 15448.00 frames. ], tot_loss[loss=0.066, simple_loss=0.08999, pruned_loss=0.01219, audio_tagging_loss=0.008819, over 3047519.72 frames. ], batch size: 59, lr: 1.54e-03, grad_scale: 8.0
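
The loss components in these records combine into the headline loss with fixed weights: across batches the numbers satisfy loss = 0.5 * simple_loss + pruned_loss + audio_tagging_loss, e.g. 0.5 * 0.1073 + 0.01452 + 0.009783 = 0.07795, matching batch 1550's logged loss=0.07797 to rounding. A sketch with the scales written out; the weights are inferred from the logged arithmetic, not read from the recipe.

def combine_losses(simple_loss: float, pruned_loss: float,
                   audio_tagging_loss: float,
                   simple_scale: float = 0.5,
                   audio_tagging_scale: float = 1.0) -> float:
    # Weighted sum consistent with the loss[...] fields in the log.
    return (simple_scale * simple_loss
            + pruned_loss
            + audio_tagging_scale * audio_tagging_loss)
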
2023-11-26 15:43:07,986 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3457173.3333333335, ans=0.1
2023-11-26 15:43:11,121 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=3457173.3333333335, ans=0.0
2023-11-26 15:43:27,895 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 518600
2023-11-26 15:43:34,708 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=3457306.6666666665, ans=0.2
2023-11-26 15:43:37,876 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3457373.3333333335, ans=0.1
2023-11-26 15:43:40,149 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3457373.3333333335, ans=0.125
2023-11-26 15:43:52,542 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.520e+01 8.861e+01 9.494e+01 1.024e+02 1.186e+02, threshold=1.899e+02, percent-clipped=0.0
2023-11-26 15:43:58,686 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=3457506.6666666665, ans=0.0
2023-11-26 15:43:59,620 INFO [train_asr.py:1235] (3/4) Epoch 44, batch 1600, loss[loss=0.05497, simple_loss=0.07674, pruned_loss=0.008882, audio_tagging_loss=0.00772, over 14404.00 frames. ], tot_loss[loss=0.06608, simple_loss=0.09006, pruned_loss=0.01223, audio_tagging_loss=0.008819, over 3038899.09 frames. ], batch size: 53, lr: 1.54e-03, grad_scale: 16.0
2023-11-26 15:43:59,896 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=3457506.6666666665, ans=0.125
2023-11-26 15:44:03,000 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3457506.6666666665, ans=0.125
2023-11-26 15:44:05,835 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=3457506.6666666665, ans=0.125
2023-11-26 15:44:08,844 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=3457506.6666666665, ans=0.2
2023-11-26 15:44:16,251 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=3457573.3333333335, ans=0.2
2023-11-26 15:44:24,891 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 518650
2023-11-26 15:44:48,642 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer_ff2.min_abs, batch_count=3457773.3333333335, ans=0.1
2023-11-26 15:44:55,865 INFO [train_asr.py:1235] (3/4) Epoch 44, batch 1650, loss[loss=0.07289, simple_loss=0.09929, pruned_loss=0.01441, audio_tagging_loss=0.008831, over 16092.00 frames. ], tot_loss[loss=0.06566, simple_loss=0.0893, pruned_loss=0.01211, audio_tagging_loss=0.008903, over 3040550.24 frames. ], batch size: 61, lr: 1.54e-03, grad_scale: 16.0
2023-11-26 15:45:04,694 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.min_abs, batch_count=3457840.0, ans=0.5
2023-11-26 15:45:20,417 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 518700
2023-11-26 15:45:31,236 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=3458040.0, ans=0.0
2023-11-26 15:45:34,766 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=9.43 vs. limit=15.0
2023-11-26 15:45:42,867 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=3458106.6666666665, ans=0.09899494936611666
2023-11-26 15:45:43,919 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3458106.6666666665, ans=0.125
2023-11-26 15:45:45,762 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.390e+01 9.023e+01 9.377e+01 1.009e+02 1.256e+02, threshold=1.875e+02, percent-clipped=0.0
2023-11-26 15:45:48,207 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3458106.6666666665, ans=0.125
2023-11-26 15:45:49,299 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3458106.6666666665, ans=0.1
2023-11-26 15:45:52,308 INFO [train_asr.py:1235] (3/4) Epoch 44, batch 1700, loss[loss=0.06902, simple_loss=0.09655, pruned_loss=0.01086, audio_tagging_loss=0.009885, over 15682.00 frames. ], tot_loss[loss=0.06551, simple_loss=0.08906, pruned_loss=0.01198, audio_tagging_loss=0.008995, over 3050850.36 frames. ], batch size: 57, lr: 1.54e-03, grad_scale: 16.0
2023-11-26 15:45:53,629 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2023-11-26 15:46:05,274 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3458240.0, ans=0.125
2023-11-26 15:46:16,689 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 518750
2023-11-26 15:46:39,465 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=3458440.0, ans=0.0
2023-11-26 15:46:41,719 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=7.03 vs. limit=15.0
2023-11-26 15:46:47,672 INFO [train_asr.py:1235] (3/4) Epoch 44, batch 1750, loss[loss=0.05491, simple_loss=0.07718, pruned_loss=0.01053, audio_tagging_loss=0.005782, over 14048.00 frames. ], tot_loss[loss=0.0658, simple_loss=0.08959, pruned_loss=0.01213, audio_tagging_loss=0.008879, over 3049483.69 frames. ], batch size: 53, lr: 1.54e-03, grad_scale: 16.0
2023-11-26 15:47:01,644 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3458573.3333333335, ans=0.1
2023-11-26 15:47:08,006 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3458573.3333333335, ans=0.1
2023-11-26 15:47:13,186 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 518800
2023-11-26 15:47:24,469 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=3458706.6666666665, ans=0.04949747468305833
2023-11-26 15:47:26,414 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00
2023-11-26 15:47:32,073 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=12.49 vs. limit=22.5
2023-11-26 15:47:36,691 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=3458773.3333333335, ans=0.0
2023-11-26 15:47:38,072 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.678e+01 8.660e+01 9.290e+01 1.019e+02 1.190e+02, threshold=1.858e+02, percent-clipped=0.0
2023-11-26 15:47:44,455 INFO [train_asr.py:1235] (3/4) Epoch 44, batch 1800, loss[loss=0.06906, simple_loss=0.1011, pruned_loss=0.01386, audio_tagging_loss=0.004672, over 15139.00 frames. ], tot_loss[loss=0.06501, simple_loss=0.08845, pruned_loss=0.01198, audio_tagging_loss=0.008799, over 3047157.28 frames. ], batch size: 55, lr: 1.54e-03, grad_scale: 16.0
2023-11-26 15:48:08,965 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 518850
2023-11-26 15:48:09,092 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2023-11-26 15:48:21,384 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=3459040.0, ans=0.125
2023-11-26 15:48:40,879 INFO [train_asr.py:1235] (3/4) Epoch 44, batch 1850, loss[loss=0.05908, simple_loss=0.08882, pruned_loss=0.007016, audio_tagging_loss=0.007659, over 14818.00 frames. ], tot_loss[loss=0.0657, simple_loss=0.08961, pruned_loss=0.01219, audio_tagging_loss=0.0087, over 3048452.33 frames. ], batch size: 53, lr: 1.54e-03, grad_scale: 16.0
2023-11-26 15:48:47,984 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=9.28 vs.
limit=15.0 2023-11-26 15:48:49,866 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3459173.3333333335, ans=0.125 2023-11-26 15:48:49,925 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3459173.3333333335, ans=0.125 2023-11-26 15:49:04,369 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=3459306.6666666665, ans=0.2 2023-11-26 15:49:05,178 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 518900 2023-11-26 15:49:05,247 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=3459306.6666666665, ans=0.125 2023-11-26 15:49:14,498 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=3.87 vs. limit=15.0 2023-11-26 15:49:30,368 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 8.036e+01 8.755e+01 9.422e+01 1.025e+02 1.230e+02, threshold=1.884e+02, percent-clipped=0.0 2023-11-26 15:49:32,665 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.min_positive, batch_count=3459440.0, ans=0.05 2023-11-26 15:49:36,705 INFO [train_asr.py:1235] (3/4) Epoch 44, batch 1900, loss[loss=0.06302, simple_loss=0.08555, pruned_loss=0.0126, audio_tagging_loss=0.007644, over 13721.00 frames. ], tot_loss[loss=0.06491, simple_loss=0.08867, pruned_loss=0.01193, audio_tagging_loss=0.008639, over 3049991.24 frames. ], batch size: 53, lr: 1.54e-03, grad_scale: 16.0 2023-11-26 15:49:42,283 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3459506.6666666665, ans=0.1 2023-11-26 15:50:02,395 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 518950 2023-11-26 15:50:05,907 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=3459640.0, ans=0.0 2023-11-26 15:50:09,261 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=10.19 vs. limit=15.0 2023-11-26 15:50:12,149 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3459706.6666666665, ans=0.125 2023-11-26 15:50:12,239 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3459706.6666666665, ans=0.125 2023-11-26 15:50:18,707 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=3459706.6666666665, ans=0.2 2023-11-26 15:50:30,486 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=12.29 vs. limit=15.0 2023-11-26 15:50:33,064 INFO [train_asr.py:1235] (3/4) Epoch 44, batch 1950, loss[loss=0.06177, simple_loss=0.08884, pruned_loss=0.01034, audio_tagging_loss=0.007007, over 15146.00 frames. ], tot_loss[loss=0.06435, simple_loss=0.08762, pruned_loss=0.01178, audio_tagging_loss=0.008761, over 3045838.94 frames. 
], batch size: 57, lr: 1.54e-03, grad_scale: 16.0 2023-11-26 15:50:37,510 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.44 vs. limit=6.0 2023-11-26 15:50:51,066 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=3459906.6666666665, ans=0.0 2023-11-26 15:50:58,245 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 519000 2023-11-26 15:51:00,947 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=12.63 vs. limit=15.0 2023-11-26 15:51:13,975 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=3460040.0, ans=0.0 2023-11-26 15:51:23,437 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.206e+01 8.668e+01 9.341e+01 1.000e+02 1.329e+02, threshold=1.868e+02, percent-clipped=0.0 2023-11-26 15:51:26,264 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=3460106.6666666665, ans=0.07 2023-11-26 15:51:30,362 INFO [train_asr.py:1235] (3/4) Epoch 44, batch 2000, loss[loss=0.07294, simple_loss=0.09683, pruned_loss=0.01425, audio_tagging_loss=0.01028, over 16508.00 frames. ], tot_loss[loss=0.06416, simple_loss=0.08711, pruned_loss=0.0118, audio_tagging_loss=0.008799, over 3049012.32 frames. ], batch size: 61, lr: 1.54e-03, grad_scale: 32.0 2023-11-26 15:51:35,953 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=3460173.3333333335, ans=0.0 2023-11-26 15:51:38,197 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3460173.3333333335, ans=0.1 2023-11-26 15:51:53,324 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.min_positive, batch_count=3460306.6666666665, ans=0.025 2023-11-26 15:51:54,331 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 519050 2023-11-26 15:51:55,616 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3460306.6666666665, ans=0.1 2023-11-26 15:52:26,023 INFO [train_asr.py:1235] (3/4) Epoch 44, batch 2050, loss[loss=0.06607, simple_loss=0.09884, pruned_loss=0.009338, audio_tagging_loss=0.007312, over 15824.00 frames. ], tot_loss[loss=0.06465, simple_loss=0.08801, pruned_loss=0.01193, audio_tagging_loss=0.008711, over 3040983.05 frames. 
], batch size: 58, lr: 1.54e-03, grad_scale: 32.0 2023-11-26 15:52:26,160 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=3460506.6666666665, ans=0.125 2023-11-26 15:52:27,376 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.min_positive, batch_count=3460506.6666666665, ans=0.05 2023-11-26 15:52:30,581 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3460506.6666666665, ans=0.1 2023-11-26 15:52:51,673 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 519100 2023-11-26 15:52:58,567 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=10.02 vs. limit=15.0 2023-11-26 15:53:07,855 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=3460706.6666666665, ans=0.125 2023-11-26 15:53:11,182 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=3460773.3333333335, ans=0.0 2023-11-26 15:53:15,086 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 6.943e+01 8.697e+01 9.387e+01 1.014e+02 2.680e+02, threshold=1.877e+02, percent-clipped=1.0 2023-11-26 15:53:20,291 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=17.06 vs. limit=22.5 2023-11-26 15:53:21,944 INFO [train_asr.py:1235] (3/4) Epoch 44, batch 2100, loss[loss=0.06375, simple_loss=0.09275, pruned_loss=0.01143, audio_tagging_loss=0.005939, over 14206.00 frames. ], tot_loss[loss=0.06412, simple_loss=0.08735, pruned_loss=0.01178, audio_tagging_loss=0.008659, over 3036332.23 frames. ], batch size: 54, lr: 1.54e-03, grad_scale: 32.0 2023-11-26 15:53:28,031 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.48 vs. limit=6.0 2023-11-26 15:53:44,599 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=3460973.3333333335, ans=0.0 2023-11-26 15:53:46,621 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 519150 2023-11-26 15:54:18,649 INFO [train_asr.py:1235] (3/4) Epoch 44, batch 2150, loss[loss=0.0837, simple_loss=0.118, pruned_loss=0.01524, audio_tagging_loss=0.009437, over 15210.00 frames. ], tot_loss[loss=0.06512, simple_loss=0.08888, pruned_loss=0.01212, audio_tagging_loss=0.00856, over 3036068.18 frames. 
], batch size: 54, lr: 1.54e-03, grad_scale: 32.0 2023-11-26 15:54:19,922 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3461173.3333333335, ans=0.125 2023-11-26 15:54:22,214 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=3461173.3333333335, ans=0.0 2023-11-26 15:54:32,815 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=3461240.0, ans=0.125 2023-11-26 15:54:36,071 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=3461240.0, ans=0.0 2023-11-26 15:54:42,979 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 519200 2023-11-26 15:54:47,679 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=3461306.6666666665, ans=0.125 2023-11-26 15:54:50,841 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3461373.3333333335, ans=0.125 2023-11-26 15:54:52,772 WARNING [train_asr.py:1481] (3/4) Exclude cut with ID unbalanced/XkQ8YVd8u38_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-26 15:54:53,048 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=3461373.3333333335, ans=0.0 2023-11-26 15:55:07,732 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.539e+01 9.113e+01 9.715e+01 1.044e+02 1.389e+02, threshold=1.943e+02, percent-clipped=0.0 2023-11-26 15:55:14,124 INFO [train_asr.py:1235] (3/4) Epoch 44, batch 2200, loss[loss=0.07708, simple_loss=0.1126, pruned_loss=0.01444, audio_tagging_loss=0.006337, over 15338.00 frames. ], tot_loss[loss=0.06527, simple_loss=0.08888, pruned_loss=0.01222, audio_tagging_loss=0.008612, over 3039859.67 frames. ], batch size: 56, lr: 1.54e-03, grad_scale: 32.0 2023-11-26 15:55:28,879 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3461573.3333333335, ans=0.125 2023-11-26 15:55:39,016 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 519250 2023-11-26 15:55:39,231 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=3461640.0, ans=0.125 2023-11-26 15:56:00,698 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=3461773.3333333335, ans=0.125 2023-11-26 15:56:02,731 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3461773.3333333335, ans=0.125 2023-11-26 15:56:10,446 INFO [train_asr.py:1235] (3/4) Epoch 44, batch 2250, loss[loss=0.06601, simple_loss=0.08732, pruned_loss=0.0115, audio_tagging_loss=0.01085, over 15891.00 frames. ], tot_loss[loss=0.06531, simple_loss=0.08898, pruned_loss=0.01215, audio_tagging_loss=0.008668, over 3041937.73 frames. 
], batch size: 58, lr: 1.54e-03, grad_scale: 16.0 2023-11-26 15:56:11,752 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3461840.0, ans=0.1 2023-11-26 15:56:21,162 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=7.11 vs. limit=15.0 2023-11-26 15:56:26,575 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.min_positive, batch_count=3461906.6666666665, ans=0.05 2023-11-26 15:56:35,495 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 519300 2023-11-26 15:56:39,891 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=3461973.3333333335, ans=0.0 2023-11-26 15:56:41,987 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=3461973.3333333335, ans=0.04949747468305833 2023-11-26 15:56:44,216 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=3462040.0, ans=0.125 2023-11-26 15:56:46,357 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3462040.0, ans=0.125 2023-11-26 15:56:51,148 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3462040.0, ans=0.125 2023-11-26 15:57:00,477 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.672e+01 8.882e+01 9.360e+01 1.008e+02 1.473e+02, threshold=1.872e+02, percent-clipped=0.0 2023-11-26 15:57:06,966 INFO [train_asr.py:1235] (3/4) Epoch 44, batch 2300, loss[loss=0.07427, simple_loss=0.1038, pruned_loss=0.01316, audio_tagging_loss=0.009193, over 15337.00 frames. ], tot_loss[loss=0.06579, simple_loss=0.08977, pruned_loss=0.01224, audio_tagging_loss=0.008666, over 3040275.22 frames. ], batch size: 56, lr: 1.54e-03, grad_scale: 16.0 2023-11-26 15:57:30,735 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 519350 2023-11-26 15:57:39,985 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=3462373.3333333335, ans=0.125 2023-11-26 15:57:40,100 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=3462373.3333333335, ans=0.125 2023-11-26 15:57:42,096 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3462373.3333333335, ans=0.125 2023-11-26 15:57:44,406 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=3462373.3333333335, ans=0.2 2023-11-26 15:57:53,297 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=5.03 vs. limit=15.0 2023-11-26 15:57:56,360 WARNING [train_asr.py:1481] (3/4) Exclude cut with ID unbalanced/mx9RcUz8sr0_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. 
Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-26 15:57:58,891 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=9.63 vs. limit=15.0 2023-11-26 15:57:59,777 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=3462440.0, ans=0.125 2023-11-26 15:57:59,902 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=3.77 vs. limit=10.0 2023-11-26 15:58:02,744 INFO [train_asr.py:1235] (3/4) Epoch 44, batch 2350, loss[loss=0.05613, simple_loss=0.05983, pruned_loss=0.01328, audio_tagging_loss=0.01293, over 15036.00 frames. ], tot_loss[loss=0.06534, simple_loss=0.0889, pruned_loss=0.01205, audio_tagging_loss=0.00884, over 3046617.51 frames. ], batch size: 59, lr: 1.54e-03, grad_scale: 16.0 2023-11-26 15:58:09,396 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=3462506.6666666665, ans=0.125 2023-11-26 15:58:19,590 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3462573.3333333335, ans=0.125 2023-11-26 15:58:26,781 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 519400 2023-11-26 15:58:34,436 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=3462640.0, ans=0.0 2023-11-26 15:58:35,554 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=3462706.6666666665, ans=0.2 2023-11-26 15:58:52,978 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.279e+01 8.835e+01 9.579e+01 1.049e+02 1.967e+02, threshold=1.916e+02, percent-clipped=1.0 2023-11-26 15:58:58,917 INFO [train_asr.py:1235] (3/4) Epoch 44, batch 2400, loss[loss=0.08945, simple_loss=0.1244, pruned_loss=0.02001, audio_tagging_loss=0.007243, over 14171.00 frames. ], tot_loss[loss=0.0663, simple_loss=0.0903, pruned_loss=0.01233, audio_tagging_loss=0.008822, over 3040583.12 frames. 
], batch size: 55, lr: 1.54e-03, grad_scale: 32.0 2023-11-26 15:59:24,245 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 519450 2023-11-26 15:59:27,605 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3462973.3333333335, ans=0.125 2023-11-26 15:59:32,977 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.min_abs, batch_count=3463040.0, ans=0.5 2023-11-26 15:59:40,476 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3463040.0, ans=0.125 2023-11-26 15:59:46,328 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3463106.6666666665, ans=0.1 2023-11-26 15:59:47,475 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-26 15:59:50,565 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3463106.6666666665, ans=0.0 2023-11-26 15:59:54,843 INFO [train_asr.py:1235] (3/4) Epoch 44, batch 2450, loss[loss=0.06755, simple_loss=0.09745, pruned_loss=0.01245, audio_tagging_loss=0.00638, over 15065.00 frames. ], tot_loss[loss=0.06617, simple_loss=0.09011, pruned_loss=0.01227, audio_tagging_loss=0.008848, over 3036772.65 frames. ], batch size: 55, lr: 1.54e-03, grad_scale: 32.0 2023-11-26 16:00:03,771 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=3463173.3333333335, ans=0.125 2023-11-26 16:00:20,191 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 519500 2023-11-26 16:00:22,781 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=13.38 vs. limit=15.0 2023-11-26 16:00:46,233 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.797e+01 8.970e+01 9.604e+01 1.012e+02 1.518e+02, threshold=1.921e+02, percent-clipped=0.0 2023-11-26 16:00:46,563 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=3463440.0, ans=0.0 2023-11-26 16:00:52,115 INFO [train_asr.py:1235] (3/4) Epoch 44, batch 2500, loss[loss=0.05035, simple_loss=0.06598, pruned_loss=0.008104, audio_tagging_loss=0.009258, over 14804.00 frames. ], tot_loss[loss=0.06587, simple_loss=0.08958, pruned_loss=0.01212, audio_tagging_loss=0.00896, over 3031600.78 frames. ], batch size: 55, lr: 1.54e-03, grad_scale: 32.0 2023-11-26 16:00:52,247 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3463506.6666666665, ans=0.1 2023-11-26 16:00:52,640 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=8.46 vs. 
limit=15.0 2023-11-26 16:01:10,100 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=3463573.3333333335, ans=0.125 2023-11-26 16:01:12,230 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.max_positive, batch_count=3463573.3333333335, ans=0.95 2023-11-26 16:01:16,333 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 519550 2023-11-26 16:01:21,227 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3463640.0, ans=0.125 2023-11-26 16:01:47,785 INFO [train_asr.py:1235] (3/4) Epoch 44, batch 2550, loss[loss=0.07091, simple_loss=0.09628, pruned_loss=0.01433, audio_tagging_loss=0.008445, over 15926.00 frames. ], tot_loss[loss=0.06565, simple_loss=0.08925, pruned_loss=0.01207, audio_tagging_loss=0.008951, over 3035282.44 frames. ], batch size: 58, lr: 1.54e-03, grad_scale: 16.0 2023-11-26 16:02:12,979 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 519600 2023-11-26 16:02:16,671 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=3463973.3333333335, ans=0.125 2023-11-26 16:02:20,454 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=3463973.3333333335, ans=0.0 2023-11-26 16:02:22,635 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3464040.0, ans=0.125 2023-11-26 16:02:36,019 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=3464106.6666666665, ans=0.0 2023-11-26 16:02:40,008 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.369e+01 8.652e+01 9.307e+01 9.985e+01 1.166e+02, threshold=1.861e+02, percent-clipped=0.0 2023-11-26 16:02:44,316 INFO [train_asr.py:1235] (3/4) Epoch 44, batch 2600, loss[loss=0.06545, simple_loss=0.09369, pruned_loss=0.01032, audio_tagging_loss=0.008285, over 15404.00 frames. ], tot_loss[loss=0.06516, simple_loss=0.08892, pruned_loss=0.01187, audio_tagging_loss=0.008831, over 3037086.08 frames. ], batch size: 55, lr: 1.54e-03, grad_scale: 16.0 2023-11-26 16:02:55,276 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=3464240.0, ans=0.0 2023-11-26 16:03:02,839 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3464240.0, ans=0.0 2023-11-26 16:03:09,104 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=9.29 vs. limit=10.0 2023-11-26 16:03:09,626 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 519650 2023-11-26 16:03:22,242 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=10.35 vs. limit=15.0 2023-11-26 16:03:39,569 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.1.self_attn2.whiten, num_groups=1, num_channels=192, metric=8.20 vs. 
limit=22.5 2023-11-26 16:03:40,169 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=3464506.6666666665, ans=0.125 2023-11-26 16:03:40,919 INFO [train_asr.py:1235] (3/4) Epoch 44, batch 2650, loss[loss=0.05726, simple_loss=0.08116, pruned_loss=0.008921, audio_tagging_loss=0.007757, over 14780.00 frames. ], tot_loss[loss=0.06497, simple_loss=0.08855, pruned_loss=0.01192, audio_tagging_loss=0.008779, over 3031600.80 frames. ], batch size: 55, lr: 1.54e-03, grad_scale: 16.0 2023-11-26 16:03:55,829 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=11.01 vs. limit=15.0 2023-11-26 16:03:57,674 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=10.74 vs. limit=15.0 2023-11-26 16:04:05,224 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 519700 2023-11-26 16:04:06,451 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3464640.0, ans=0.125 2023-11-26 16:04:08,666 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=3464640.0, ans=0.125 2023-11-26 16:04:12,967 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=8.86 vs. limit=10.0 2023-11-26 16:04:18,334 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3464706.6666666665, ans=0.125 2023-11-26 16:04:22,469 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.min_abs, batch_count=3464706.6666666665, ans=0.5 2023-11-26 16:04:32,547 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.592e+01 8.790e+01 9.468e+01 1.013e+02 1.366e+02, threshold=1.894e+02, percent-clipped=0.0 2023-11-26 16:04:36,901 INFO [train_asr.py:1235] (3/4) Epoch 44, batch 2700, loss[loss=0.08561, simple_loss=0.1267, pruned_loss=0.01651, audio_tagging_loss=0.00577, over 16138.00 frames. ], tot_loss[loss=0.06514, simple_loss=0.08912, pruned_loss=0.01195, audio_tagging_loss=0.008627, over 3034505.53 frames. ], batch size: 56, lr: 1.54e-03, grad_scale: 16.0 2023-11-26 16:04:52,508 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=3464906.6666666665, ans=0.1 2023-11-26 16:04:52,727 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-11-26 16:05:00,292 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.min_abs, batch_count=3464973.3333333335, ans=0.5 2023-11-26 16:05:00,731 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.1.conv_module1.whiten, num_groups=1, num_channels=192, metric=12.82 vs. 
limit=15.0 2023-11-26 16:05:02,151 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 519750 2023-11-26 16:05:03,378 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-26 16:05:04,576 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=3464973.3333333335, ans=0.2 2023-11-26 16:05:12,870 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=3465040.0, ans=0.2 2023-11-26 16:05:18,803 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=3465040.0, ans=0.0 2023-11-26 16:05:32,319 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3465173.3333333335, ans=0.125 2023-11-26 16:05:33,066 INFO [train_asr.py:1235] (3/4) Epoch 44, batch 2750, loss[loss=0.06399, simple_loss=0.08511, pruned_loss=0.01007, audio_tagging_loss=0.01137, over 15359.00 frames. ], tot_loss[loss=0.06489, simple_loss=0.08842, pruned_loss=0.01199, audio_tagging_loss=0.008693, over 3036035.14 frames. ], batch size: 57, lr: 1.54e-03, grad_scale: 16.0 2023-11-26 16:05:37,192 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=3465173.3333333335, ans=0.2 2023-11-26 16:05:48,147 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=3465240.0, ans=0.0 2023-11-26 16:05:57,563 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 519800 2023-11-26 16:06:08,684 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=3465373.3333333335, ans=0.2 2023-11-26 16:06:11,755 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3465373.3333333335, ans=0.0 2023-11-26 16:06:18,196 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3465440.0, ans=0.125 2023-11-26 16:06:18,221 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3465440.0, ans=0.125 2023-11-26 16:06:22,656 WARNING [train_asr.py:1481] (3/4) Exclude cut with ID unbalanced/IMdT8_tuNp0_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-26 16:06:25,290 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.739e+01 8.811e+01 9.283e+01 1.006e+02 1.287e+02, threshold=1.857e+02, percent-clipped=0.0 2023-11-26 16:06:28,786 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=3465506.6666666665, ans=0.125 2023-11-26 16:06:29,588 INFO [train_asr.py:1235] (3/4) Epoch 44, batch 2800, loss[loss=0.04791, simple_loss=0.06443, pruned_loss=0.007176, audio_tagging_loss=0.008518, over 15152.00 frames. 
], tot_loss[loss=0.06464, simple_loss=0.08797, pruned_loss=0.01199, audio_tagging_loss=0.008665, over 3037472.67 frames. ], batch size: 58, lr: 1.54e-03, grad_scale: 32.0 2023-11-26 16:06:39,436 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=3465573.3333333335, ans=0.125 2023-11-26 16:06:45,409 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=12.16 vs. limit=15.0 2023-11-26 16:06:54,056 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 519850 2023-11-26 16:06:54,299 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=3465640.0, ans=0.04949747468305833 2023-11-26 16:07:08,175 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=3465706.6666666665, ans=0.0 2023-11-26 16:07:14,568 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3465773.3333333335, ans=0.1 2023-11-26 16:07:24,880 INFO [train_asr.py:1235] (3/4) Epoch 44, batch 2850, loss[loss=0.05665, simple_loss=0.07976, pruned_loss=0.006628, audio_tagging_loss=0.01014, over 14415.00 frames. ], tot_loss[loss=0.06466, simple_loss=0.08819, pruned_loss=0.012, audio_tagging_loss=0.008571, over 3041896.09 frames. ], batch size: 54, lr: 1.54e-03, grad_scale: 16.0 2023-11-26 16:07:26,244 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3465840.0, ans=0.125 2023-11-26 16:07:50,756 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 519900 2023-11-26 16:08:06,925 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.max_positive, batch_count=3466040.0, ans=0.95 2023-11-26 16:08:15,309 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.whiten, num_groups=1, num_channels=512, metric=7.99 vs. limit=12.0 2023-11-26 16:08:18,472 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.391e+01 8.838e+01 9.404e+01 1.047e+02 1.303e+02, threshold=1.881e+02, percent-clipped=0.0 2023-11-26 16:08:21,802 INFO [train_asr.py:1235] (3/4) Epoch 44, batch 2900, loss[loss=0.06215, simple_loss=0.08733, pruned_loss=0.009389, audio_tagging_loss=0.009095, over 15281.00 frames. ], tot_loss[loss=0.06504, simple_loss=0.08877, pruned_loss=0.01206, audio_tagging_loss=0.008599, over 3043089.68 frames. ], batch size: 58, lr: 1.54e-03, grad_scale: 16.0 2023-11-26 16:08:46,480 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 519950 2023-11-26 16:09:07,074 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=3466440.0, ans=0.125 2023-11-26 16:09:18,646 INFO [train_asr.py:1235] (3/4) Epoch 44, batch 2950, loss[loss=0.07166, simple_loss=0.1051, pruned_loss=0.01274, audio_tagging_loss=0.006365, over 16108.00 frames. ], tot_loss[loss=0.06538, simple_loss=0.08919, pruned_loss=0.01215, audio_tagging_loss=0.008638, over 3040681.36 frames. 
], batch size: 59, lr: 1.54e-03, grad_scale: 16.0 2023-11-26 16:09:43,091 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 520000 2023-11-26 16:10:03,814 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=2.86 vs. limit=15.0 2023-11-26 16:10:08,856 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=3466773.3333333335, ans=0.125 2023-11-26 16:10:12,942 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.519e+01 8.851e+01 9.554e+01 1.014e+02 1.213e+02, threshold=1.911e+02, percent-clipped=0.0 2023-11-26 16:10:16,215 INFO [train_asr.py:1235] (3/4) Epoch 44, batch 3000, loss[loss=0.05442, simple_loss=0.06966, pruned_loss=0.0111, audio_tagging_loss=0.008489, over 15404.00 frames. ], tot_loss[loss=0.06574, simple_loss=0.08976, pruned_loss=0.01215, audio_tagging_loss=0.008712, over 3049300.28 frames. ], batch size: 58, lr: 1.54e-03, grad_scale: 16.0 2023-11-26 16:10:16,216 INFO [train_asr.py:1258] (3/4) Computing validation loss 2023-11-26 16:10:45,111 INFO [zipformer.py:1877] (3/4) name=encoder.encoders.4.encoder.layers.2.self_attn_weights, attn_weights_entropy = tensor([3.1945, 3.9725, 3.7691, 3.3715], device='cuda:3') 2023-11-26 16:10:45,297 INFO [zipformer.py:1877] (3/4) name=encoder.encoders.0.layers.0.self_attn_weights, attn_weights_entropy = tensor([5.9934, 5.8676, 5.6588, 5.5824], device='cuda:3') 2023-11-26 16:10:45,849 INFO [zipformer.py:1877] (3/4) name=encoder.encoders.4.encoder.layers.0.self_attn_weights, attn_weights_entropy = tensor([4.5156, 3.3622, 3.7526, 3.6029], device='cuda:3') 2023-11-26 16:10:48,829 INFO [train_asr.py:1267] (3/4) Epoch 44, validation: loss=0.05748, simple_loss=0.05058, pruned_loss=0.005287, audio_tagging_loss=0.02691, over 4681554.00 frames. 2023-11-26 16:10:48,830 INFO [train_asr.py:1268] (3/4) Maximum memory allocated so far is 24894MB 2023-11-26 16:11:13,571 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 520050 2023-11-26 16:11:27,063 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=3467040.0, ans=0.125 2023-11-26 16:11:29,629 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=3467040.0, ans=0.0 2023-11-26 16:11:45,576 INFO [train_asr.py:1235] (3/4) Epoch 44, batch 3050, loss[loss=0.07531, simple_loss=0.1025, pruned_loss=0.01467, audio_tagging_loss=0.009362, over 14749.00 frames. ], tot_loss[loss=0.06562, simple_loss=0.08939, pruned_loss=0.01206, audio_tagging_loss=0.008873, over 3044966.65 frames. ], batch size: 55, lr: 1.54e-03, grad_scale: 16.0 2023-11-26 16:11:46,862 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=3467173.3333333335, ans=0.1 2023-11-26 16:11:48,090 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=3467173.3333333335, ans=0.125 2023-11-26 16:11:59,605 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=3467240.0, ans=0.0 2023-11-26 16:12:09,707 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 520100 2023-11-26 16:12:11,736 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=2.84 vs. 
limit=15.0 2023-11-26 16:12:19,135 WARNING [train_asr.py:1481] (3/4) Exclude cut with ID unbalanced/h0neUGB6j_g_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-26 16:12:24,242 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3467373.3333333335, ans=0.1 2023-11-26 16:12:37,802 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.611e+01 8.992e+01 9.720e+01 1.054e+02 1.278e+02, threshold=1.944e+02, percent-clipped=0.0 2023-11-26 16:12:41,152 INFO [train_asr.py:1235] (3/4) Epoch 44, batch 3100, loss[loss=0.05206, simple_loss=0.06811, pruned_loss=0.008596, audio_tagging_loss=0.009411, over 14691.00 frames. ], tot_loss[loss=0.06636, simple_loss=0.0905, pruned_loss=0.01229, audio_tagging_loss=0.008815, over 3045929.59 frames. ], batch size: 55, lr: 1.54e-03, grad_scale: 16.0 2023-11-26 16:13:06,285 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 520150 2023-11-26 16:13:16,085 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=3467706.6666666665, ans=0.125 2023-11-26 16:13:17,032 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=3467706.6666666665, ans=0.0 2023-11-26 16:13:36,585 INFO [train_asr.py:1235] (3/4) Epoch 44, batch 3150, loss[loss=0.05974, simple_loss=0.0776, pruned_loss=0.01028, audio_tagging_loss=0.01066, over 14889.00 frames. ], tot_loss[loss=0.06636, simple_loss=0.0905, pruned_loss=0.01229, audio_tagging_loss=0.008823, over 3036895.57 frames. ], batch size: 57, lr: 1.54e-03, grad_scale: 8.0 2023-11-26 16:13:36,880 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3467840.0, ans=0.125 2023-11-26 16:14:01,387 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 520200 2023-11-26 16:14:02,960 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=10.31 vs. limit=15.0 2023-11-26 16:14:20,385 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=3468106.6666666665, ans=0.125 2023-11-26 16:14:31,090 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.410e+01 8.852e+01 9.512e+01 1.032e+02 1.320e+02, threshold=1.902e+02, percent-clipped=0.0 2023-11-26 16:14:33,236 INFO [train_asr.py:1235] (3/4) Epoch 44, batch 3200, loss[loss=0.06775, simple_loss=0.09302, pruned_loss=0.01266, audio_tagging_loss=0.008572, over 15202.00 frames. ], tot_loss[loss=0.06656, simple_loss=0.09049, pruned_loss=0.01239, audio_tagging_loss=0.008917, over 3043906.13 frames. 
], batch size: 58, lr: 1.54e-03, grad_scale: 16.0 2023-11-26 16:14:52,723 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-11-26 16:14:56,923 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 520250 2023-11-26 16:15:14,922 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=3468373.3333333335, ans=0.0 2023-11-26 16:15:20,038 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3468440.0, ans=0.1 2023-11-26 16:15:28,512 INFO [train_asr.py:1235] (3/4) Epoch 44, batch 3250, loss[loss=0.07963, simple_loss=0.1131, pruned_loss=0.0166, audio_tagging_loss=0.006496, over 14896.00 frames. ], tot_loss[loss=0.06711, simple_loss=0.09137, pruned_loss=0.01251, audio_tagging_loss=0.008917, over 3047861.79 frames. ], batch size: 53, lr: 1.54e-03, grad_scale: 16.0 2023-11-26 16:15:42,518 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=3468573.3333333335, ans=0.0 2023-11-26 16:15:45,070 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=3468573.3333333335, ans=0.2 2023-11-26 16:15:54,098 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 520300 2023-11-26 16:16:04,288 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=7.66 vs. limit=10.0 2023-11-26 16:16:21,811 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.327e+01 8.992e+01 9.345e+01 1.022e+02 1.465e+02, threshold=1.869e+02, percent-clipped=0.0 2023-11-26 16:16:23,922 INFO [train_asr.py:1235] (3/4) Epoch 44, batch 3300, loss[loss=0.06079, simple_loss=0.08321, pruned_loss=0.01081, audio_tagging_loss=0.008374, over 15344.00 frames. ], tot_loss[loss=0.06647, simple_loss=0.09021, pruned_loss=0.01234, audio_tagging_loss=0.009024, over 3054839.04 frames. ], batch size: 57, lr: 1.54e-03, grad_scale: 16.0 2023-11-26 16:16:33,130 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=3468840.0, ans=0.125 2023-11-26 16:16:36,803 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3468906.6666666665, ans=0.125 2023-11-26 16:16:49,503 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 520350 2023-11-26 16:16:55,681 INFO [scaling.py:1022] (3/4) Whitening: name=encoder_embed.convnext.out_whiten, num_groups=1, num_channels=128, metric=4.70 vs. limit=5.0 2023-11-26 16:17:06,868 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3469040.0, ans=0.125 2023-11-26 16:17:15,931 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=14.06 vs. limit=22.5 2023-11-26 16:17:21,075 INFO [train_asr.py:1235] (3/4) Epoch 44, batch 3350, loss[loss=0.05968, simple_loss=0.08933, pruned_loss=0.008118, audio_tagging_loss=0.006893, over 15838.00 frames. ], tot_loss[loss=0.06646, simple_loss=0.09047, pruned_loss=0.01234, audio_tagging_loss=0.00889, over 3051990.59 frames. 
], batch size: 60, lr: 1.54e-03, grad_scale: 16.0 2023-11-26 16:17:23,300 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3469173.3333333335, ans=0.0 2023-11-26 16:17:41,341 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=3469306.6666666665, ans=0.125 2023-11-26 16:17:44,295 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 520400 2023-11-26 16:17:47,841 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=3469306.6666666665, ans=0.125 2023-11-26 16:17:50,008 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3469306.6666666665, ans=0.1 2023-11-26 16:17:59,994 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3469373.3333333335, ans=0.125 2023-11-26 16:18:02,146 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=3469373.3333333335, ans=0.0 2023-11-26 16:18:04,436 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3469440.0, ans=0.125 2023-11-26 16:18:10,742 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=3469440.0, ans=0.125 2023-11-26 16:18:13,708 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.397e+01 8.575e+01 9.293e+01 1.025e+02 1.253e+02, threshold=1.859e+02, percent-clipped=0.0 2023-11-26 16:18:14,913 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3469506.6666666665, ans=0.1 2023-11-26 16:18:15,813 INFO [train_asr.py:1235] (3/4) Epoch 44, batch 3400, loss[loss=0.07478, simple_loss=0.1004, pruned_loss=0.01624, audio_tagging_loss=0.008357, over 15914.00 frames. ], tot_loss[loss=0.06599, simple_loss=0.09005, pruned_loss=0.01215, audio_tagging_loss=0.008816, over 3051599.07 frames. ], batch size: 59, lr: 1.54e-03, grad_scale: 16.0 2023-11-26 16:18:16,442 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=8.30 vs. limit=15.0 2023-11-26 16:18:38,028 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-11-26 16:18:40,615 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 520450 2023-11-26 16:18:48,961 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=3469706.6666666665, ans=0.125 2023-11-26 16:18:54,257 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=3469706.6666666665, ans=0.125 2023-11-26 16:19:03,782 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3469773.3333333335, ans=0.1 2023-11-26 16:19:09,529 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=7.24 vs. 
limit=15.0 2023-11-26 16:19:10,989 INFO [train_asr.py:1235] (3/4) Epoch 44, batch 3450, loss[loss=0.06331, simple_loss=0.08983, pruned_loss=0.008954, audio_tagging_loss=0.009443, over 16857.00 frames. ], tot_loss[loss=0.06643, simple_loss=0.0908, pruned_loss=0.01231, audio_tagging_loss=0.008716, over 3055428.90 frames. ], batch size: 62, lr: 1.54e-03, grad_scale: 16.0 2023-11-26 16:19:15,828 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.16 vs. limit=15.0 2023-11-26 16:19:34,562 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=5.95 vs. limit=10.0 2023-11-26 16:19:36,326 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 520500 2023-11-26 16:19:51,793 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3470040.0, ans=0.1 2023-11-26 16:20:05,288 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.719e+01 8.921e+01 9.582e+01 1.025e+02 1.197e+02, threshold=1.916e+02, percent-clipped=0.0 2023-11-26 16:20:07,465 INFO [train_asr.py:1235] (3/4) Epoch 44, batch 3500, loss[loss=0.03408, simple_loss=0.04088, pruned_loss=0.004921, audio_tagging_loss=0.008716, over 14254.00 frames. ], tot_loss[loss=0.06579, simple_loss=0.08985, pruned_loss=0.01223, audio_tagging_loss=0.008643, over 3045977.17 frames. ], batch size: 55, lr: 1.54e-03, grad_scale: 16.0 2023-11-26 16:20:31,638 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 520550 2023-11-26 16:20:31,779 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.min_positive, batch_count=3470306.6666666665, ans=0.05 2023-11-26 16:20:35,905 WARNING [train_asr.py:1481] (3/4) Exclude cut with ID unbalanced/DdDpuDqOyrA_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-26 16:20:37,288 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=3470306.6666666665, ans=0.0 2023-11-26 16:20:45,153 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=3470373.3333333335, ans=0.125 2023-11-26 16:20:52,069 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=3470440.0, ans=0.0 2023-11-26 16:20:55,297 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.1.conv_module2.whiten, num_groups=1, num_channels=192, metric=12.42 vs. limit=15.0 2023-11-26 16:21:02,236 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=3470506.6666666665, ans=0.125 2023-11-26 16:21:03,046 INFO [train_asr.py:1235] (3/4) Epoch 44, batch 3550, loss[loss=0.06294, simple_loss=0.08314, pruned_loss=0.01241, audio_tagging_loss=0.008962, over 14016.00 frames. ], tot_loss[loss=0.06581, simple_loss=0.09019, pruned_loss=0.01219, audio_tagging_loss=0.008524, over 3048343.53 frames. 
], batch size: 53, lr: 1.54e-03, grad_scale: 16.0 2023-11-26 16:21:10,850 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=3470506.6666666665, ans=0.1 2023-11-26 16:21:15,178 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.max_positive, batch_count=3470573.3333333335, ans=0.95 2023-11-26 16:21:27,169 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 520600 2023-11-26 16:21:34,594 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=8.08 vs. limit=10.0 2023-11-26 16:21:43,363 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3470706.6666666665, ans=0.125 2023-11-26 16:21:49,155 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.0.conv_module2.whiten, num_groups=1, num_channels=192, metric=12.36 vs. limit=15.0 2023-11-26 16:21:55,834 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.188e+01 8.906e+01 9.475e+01 1.015e+02 1.360e+02, threshold=1.895e+02, percent-clipped=0.0 2023-11-26 16:21:57,993 INFO [train_asr.py:1235] (3/4) Epoch 44, batch 3600, loss[loss=0.07631, simple_loss=0.1024, pruned_loss=0.0192, audio_tagging_loss=0.005887, over 14045.00 frames. ], tot_loss[loss=0.06513, simple_loss=0.08908, pruned_loss=0.01203, audio_tagging_loss=0.008561, over 3046780.35 frames. ], batch size: 53, lr: 1.54e-03, grad_scale: 32.0 2023-11-26 16:22:02,656 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-11-26 16:22:05,193 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=3470840.0, ans=0.04949747468305833 2023-11-26 16:22:23,262 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 520650 2023-11-26 16:22:26,635 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=3470973.3333333335, ans=0.09899494936611666 2023-11-26 16:22:54,297 INFO [train_asr.py:1235] (3/4) Epoch 44, batch 3650, loss[loss=0.0524, simple_loss=0.06419, pruned_loss=0.01006, audio_tagging_loss=0.01024, over 15486.00 frames. ], tot_loss[loss=0.06529, simple_loss=0.08926, pruned_loss=0.01216, audio_tagging_loss=0.008502, over 3044525.46 frames. ], batch size: 57, lr: 1.54e-03, grad_scale: 32.0 2023-11-26 16:23:05,756 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=3471240.0, ans=0.0 2023-11-26 16:23:18,320 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 520700 2023-11-26 16:23:19,852 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=9.70 vs. limit=15.0 2023-11-26 16:23:24,883 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=3471306.6666666665, ans=0.2 2023-11-26 16:23:30,105 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=4.25 vs. 
limit=15.0 2023-11-26 16:23:41,916 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=3471440.0, ans=0.0 2023-11-26 16:23:47,022 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.977e+01 8.866e+01 9.412e+01 1.015e+02 1.534e+02, threshold=1.882e+02, percent-clipped=0.0 2023-11-26 16:23:47,294 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=3471440.0, ans=0.2 2023-11-26 16:23:49,725 INFO [train_asr.py:1235] (3/4) Epoch 44, batch 3700, loss[loss=0.0663, simple_loss=0.08763, pruned_loss=0.01116, audio_tagging_loss=0.01132, over 15305.00 frames. ], tot_loss[loss=0.06522, simple_loss=0.0892, pruned_loss=0.01212, audio_tagging_loss=0.008505, over 3048883.59 frames. ], batch size: 56, lr: 1.54e-03, grad_scale: 32.0 2023-11-26 16:24:05,271 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.73 vs. limit=6.0 2023-11-26 16:24:09,480 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=7.78 vs. limit=15.0 2023-11-26 16:24:13,874 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 520750 2023-11-26 16:24:23,989 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3471706.6666666665, ans=0.1 2023-11-26 16:24:33,549 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=8.48 vs. limit=15.0 2023-11-26 16:24:37,495 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=3471773.3333333335, ans=0.125 2023-11-26 16:24:37,792 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=7.46 vs. limit=15.0 2023-11-26 16:24:43,230 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=8.19 vs. limit=15.0 2023-11-26 16:24:44,723 INFO [train_asr.py:1235] (3/4) Epoch 44, batch 3750, loss[loss=0.06878, simple_loss=0.08329, pruned_loss=0.01417, audio_tagging_loss=0.01297, over 14640.00 frames. ], tot_loss[loss=0.06531, simple_loss=0.08931, pruned_loss=0.01208, audio_tagging_loss=0.008571, over 3048509.23 frames. ], batch size: 55, lr: 1.54e-03, grad_scale: 16.0 2023-11-26 16:25:09,408 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 520800 2023-11-26 16:25:10,522 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=3471973.3333333335, ans=0.125 2023-11-26 16:25:13,214 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=13.18 vs. limit=15.0 2023-11-26 16:25:14,053 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3471973.3333333335, ans=0.125 2023-11-26 16:25:24,029 WARNING [train_asr.py:1481] (3/4) Exclude cut with ID unbalanced/ZY_Bsi-RNuk_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. 
Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-26 16:25:24,172 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3472040.0, ans=0.125 2023-11-26 16:25:24,290 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=3472040.0, ans=0.125 2023-11-26 16:25:38,792 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=3472106.6666666665, ans=0.125 2023-11-26 16:25:39,675 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.862e+01 9.098e+01 9.695e+01 1.024e+02 1.279e+02, threshold=1.939e+02, percent-clipped=0.0 2023-11-26 16:25:40,794 INFO [train_asr.py:1235] (3/4) Epoch 44, batch 3800, loss[loss=0.06639, simple_loss=0.09264, pruned_loss=0.01118, audio_tagging_loss=0.008895, over 14987.00 frames. ], tot_loss[loss=0.06556, simple_loss=0.08956, pruned_loss=0.0121, audio_tagging_loss=0.008683, over 3042583.81 frames. ], batch size: 55, lr: 1.54e-03, grad_scale: 16.0 2023-11-26 16:25:51,501 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=3472240.0, ans=0.125 2023-11-26 16:25:57,941 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3472240.0, ans=0.125 2023-11-26 16:26:05,845 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 520850 2023-11-26 16:26:23,562 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.min_positive, batch_count=3472373.3333333335, ans=0.025 2023-11-26 16:26:25,406 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=4.74 vs. limit=15.0 2023-11-26 16:26:29,371 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=3472440.0, ans=0.0 2023-11-26 16:26:30,426 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3472440.0, ans=0.125 2023-11-26 16:26:30,464 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=3472440.0, ans=0.0 2023-11-26 16:26:36,565 INFO [train_asr.py:1235] (3/4) Epoch 44, batch 3850, loss[loss=0.05577, simple_loss=0.07671, pruned_loss=0.00855, audio_tagging_loss=0.008862, over 14754.00 frames. ], tot_loss[loss=0.06565, simple_loss=0.08967, pruned_loss=0.01213, audio_tagging_loss=0.008684, over 3042878.52 frames. 
], batch size: 58, lr: 1.54e-03, grad_scale: 16.0 2023-11-26 16:26:39,951 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=3472506.6666666665, ans=0.125 2023-11-26 16:26:52,270 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=3472573.3333333335, ans=0.0 2023-11-26 16:27:01,208 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 520900 2023-11-26 16:27:15,678 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3472706.6666666665, ans=0.1 2023-11-26 16:27:17,857 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-11-26 16:27:30,926 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.762e+01 8.861e+01 9.516e+01 1.005e+02 1.326e+02, threshold=1.903e+02, percent-clipped=0.0 2023-11-26 16:27:32,022 INFO [train_asr.py:1235] (3/4) Epoch 44, batch 3900, loss[loss=0.07355, simple_loss=0.1011, pruned_loss=0.01527, audio_tagging_loss=0.007751, over 13773.00 frames. ], tot_loss[loss=0.06598, simple_loss=0.09013, pruned_loss=0.01221, audio_tagging_loss=0.008713, over 3041952.20 frames. ], batch size: 52, lr: 1.54e-03, grad_scale: 16.0 2023-11-26 16:27:45,653 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.min_positive, batch_count=3472906.6666666665, ans=0.05 2023-11-26 16:27:50,518 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=3472906.6666666665, ans=0.125 2023-11-26 16:27:57,183 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 520950 2023-11-26 16:28:08,029 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3473040.0, ans=0.0 2023-11-26 16:28:12,025 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.02 vs. limit=6.0 2023-11-26 16:28:28,140 INFO [train_asr.py:1235] (3/4) Epoch 44, batch 3950, loss[loss=0.04824, simple_loss=0.05814, pruned_loss=0.007828, audio_tagging_loss=0.01135, over 14935.00 frames. ], tot_loss[loss=0.0659, simple_loss=0.08994, pruned_loss=0.01216, audio_tagging_loss=0.00877, over 3040458.42 frames. 
], batch size: 60, lr: 1.54e-03, grad_scale: 16.0 2023-11-26 16:28:30,430 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=3473173.3333333335, ans=0.125 2023-11-26 16:28:30,495 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3473173.3333333335, ans=0.1 2023-11-26 16:28:35,204 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3473173.3333333335, ans=0.1 2023-11-26 16:28:40,385 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=3473240.0, ans=0.0 2023-11-26 16:28:42,506 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=3473240.0, ans=0.0 2023-11-26 16:28:47,833 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3473240.0, ans=0.125 2023-11-26 16:28:48,127 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=4.99 vs. limit=10.0 2023-11-26 16:28:48,901 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=3473306.6666666665, ans=0.07 2023-11-26 16:28:51,995 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 521000 2023-11-26 16:28:56,356 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3473306.6666666665, ans=0.125 2023-11-26 16:29:11,869 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.min_positive, batch_count=3473440.0, ans=0.025 2023-11-26 16:29:22,817 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.631e+01 9.021e+01 9.497e+01 1.017e+02 1.308e+02, threshold=1.899e+02, percent-clipped=0.0 2023-11-26 16:29:23,983 INFO [train_asr.py:1235] (3/4) Epoch 44, batch 4000, loss[loss=0.07357, simple_loss=0.1004, pruned_loss=0.01397, audio_tagging_loss=0.009415, over 15353.00 frames. ], tot_loss[loss=0.06631, simple_loss=0.09039, pruned_loss=0.01231, audio_tagging_loss=0.00881, over 3036612.36 frames. ], batch size: 57, lr: 1.54e-03, grad_scale: 32.0 2023-11-26 16:29:36,110 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=3473573.3333333335, ans=0.0 2023-11-26 16:29:37,080 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=3473573.3333333335, ans=0.0 2023-11-26 16:29:38,838 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=3473573.3333333335, ans=0.2 2023-11-26 16:29:48,270 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 521050 2023-11-26 16:30:02,159 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=512, metric=19.51 vs. limit=22.5 2023-11-26 16:30:19,801 INFO [train_asr.py:1235] (3/4) Epoch 44, batch 4050, loss[loss=0.05254, simple_loss=0.06723, pruned_loss=0.00795, audio_tagging_loss=0.01098, over 14149.00 frames. ], tot_loss[loss=0.06632, simple_loss=0.09018, pruned_loss=0.01237, audio_tagging_loss=0.008857, over 3030440.68 frames. 
], batch size: 58, lr: 1.54e-03, grad_scale: 32.0 2023-11-26 16:30:19,936 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=3473840.0, ans=0.125 2023-11-26 16:30:23,560 WARNING [train_asr.py:1481] (3/4) Exclude cut with ID unbalanced/-7b0f9TyPFU_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-26 16:30:29,507 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=512, metric=3.94 vs. limit=15.0 2023-11-26 16:30:32,199 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=3473906.6666666665, ans=0.015 2023-11-26 16:30:37,071 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3473906.6666666665, ans=0.125 2023-11-26 16:30:44,946 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 521100 2023-11-26 16:31:01,272 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=3474040.0, ans=0.125 2023-11-26 16:31:07,119 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=3474106.6666666665, ans=0.2 2023-11-26 16:31:13,496 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3474106.6666666665, ans=0.125 2023-11-26 16:31:14,311 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.312e+01 8.820e+01 9.465e+01 1.007e+02 1.196e+02, threshold=1.893e+02, percent-clipped=0.0 2023-11-26 16:31:15,951 INFO [train_asr.py:1235] (3/4) Epoch 44, batch 4100, loss[loss=0.06677, simple_loss=0.08929, pruned_loss=0.01199, audio_tagging_loss=0.01014, over 15673.00 frames. ], tot_loss[loss=0.06662, simple_loss=0.09063, pruned_loss=0.01243, audio_tagging_loss=0.008872, over 3038152.56 frames. ], batch size: 58, lr: 1.54e-03, grad_scale: 32.0 2023-11-26 16:31:18,352 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=3474173.3333333335, ans=0.2 2023-11-26 16:31:28,984 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=3474240.0, ans=0.125 2023-11-26 16:31:40,672 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 521150 2023-11-26 16:32:02,106 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=3474440.0, ans=0.2 2023-11-26 16:32:10,794 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3474440.0, ans=0.1 2023-11-26 16:32:12,711 INFO [train_asr.py:1235] (3/4) Epoch 44, batch 4150, loss[loss=0.06484, simple_loss=0.09282, pruned_loss=0.00929, audio_tagging_loss=0.009135, over 14799.00 frames. ], tot_loss[loss=0.06649, simple_loss=0.09061, pruned_loss=0.01239, audio_tagging_loss=0.008793, over 3035065.07 frames. 
], batch size: 56, lr: 1.54e-03, grad_scale: 32.0 2023-11-26 16:32:26,921 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3474573.3333333335, ans=0.125 2023-11-26 16:32:27,810 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3474573.3333333335, ans=0.125 2023-11-26 16:32:33,877 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=3474640.0, ans=0.0 2023-11-26 16:32:36,827 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 521200 2023-11-26 16:32:46,445 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3474706.6666666665, ans=0.125 2023-11-26 16:32:54,692 WARNING [train_asr.py:1481] (3/4) Exclude cut with ID unbalanced/5BkClLNthIQ_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-26 16:33:07,364 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.415e+01 9.007e+01 9.363e+01 1.013e+02 1.321e+02, threshold=1.873e+02, percent-clipped=0.0 2023-11-26 16:33:08,497 INFO [train_asr.py:1235] (3/4) Epoch 44, batch 4200, loss[loss=0.07492, simple_loss=0.1071, pruned_loss=0.01203, audio_tagging_loss=0.00933, over 15643.00 frames. ], tot_loss[loss=0.0665, simple_loss=0.09061, pruned_loss=0.01248, audio_tagging_loss=0.008708, over 3038795.27 frames. ], batch size: 58, lr: 1.54e-03, grad_scale: 32.0 2023-11-26 16:33:13,087 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3474840.0, ans=0.125 2023-11-26 16:33:18,926 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3474906.6666666665, ans=0.1 2023-11-26 16:33:33,747 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 521250 2023-11-26 16:33:53,140 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=3475106.6666666665, ans=0.0 2023-11-26 16:34:04,096 INFO [train_asr.py:1235] (3/4) Epoch 44, batch 4250, loss[loss=0.06428, simple_loss=0.08843, pruned_loss=0.01316, audio_tagging_loss=0.006911, over 14683.00 frames. ], tot_loss[loss=0.0665, simple_loss=0.09093, pruned_loss=0.01244, audio_tagging_loss=0.008586, over 3044211.03 frames. ], batch size: 55, lr: 1.54e-03, grad_scale: 32.0 2023-11-26 16:34:05,334 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3475173.3333333335, ans=0.0 2023-11-26 16:34:25,981 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=7.28 vs. 
limit=12.0 2023-11-26 16:34:28,588 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 521300 2023-11-26 16:34:32,907 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=3475306.6666666665, ans=0.2 2023-11-26 16:34:51,335 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=3475440.0, ans=0.0 2023-11-26 16:34:53,944 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=7.39 vs. limit=15.0 2023-11-26 16:34:59,010 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.776e+01 8.954e+01 9.475e+01 1.016e+02 1.438e+02, threshold=1.895e+02, percent-clipped=0.0 2023-11-26 16:35:00,219 INFO [train_asr.py:1235] (3/4) Epoch 44, batch 4300, loss[loss=0.05859, simple_loss=0.08143, pruned_loss=0.0101, audio_tagging_loss=0.007784, over 15370.00 frames. ], tot_loss[loss=0.06664, simple_loss=0.0911, pruned_loss=0.01251, audio_tagging_loss=0.008573, over 3045033.15 frames. ], batch size: 57, lr: 1.54e-03, grad_scale: 32.0 2023-11-26 16:35:17,289 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3475573.3333333335, ans=0.1 2023-11-26 16:35:23,613 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 521350 2023-11-26 16:35:36,852 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=5.00 vs. limit=12.0 2023-11-26 16:35:53,272 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=3475773.3333333335, ans=0.0 2023-11-26 16:35:55,108 INFO [train_asr.py:1235] (3/4) Epoch 44, batch 4350, loss[loss=0.07648, simple_loss=0.105, pruned_loss=0.01639, audio_tagging_loss=0.00759, over 14432.00 frames. ], tot_loss[loss=0.0672, simple_loss=0.09208, pruned_loss=0.01267, audio_tagging_loss=0.008494, over 3042333.26 frames. ], batch size: 55, lr: 1.54e-03, grad_scale: 32.0 2023-11-26 16:36:02,771 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=3475840.0, ans=0.125 2023-11-26 16:36:19,545 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 521400 2023-11-26 16:36:29,471 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=6.48 vs. limit=15.0 2023-11-26 16:36:33,449 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-26 16:36:38,757 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=3476106.6666666665, ans=0.0 2023-11-26 16:36:42,001 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=3476106.6666666665, ans=0.0 2023-11-26 16:36:49,101 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.562e+01 8.904e+01 9.369e+01 1.014e+02 1.389e+02, threshold=1.874e+02, percent-clipped=0.0 2023-11-26 16:36:50,193 INFO [train_asr.py:1235] (3/4) Epoch 44, batch 4400, loss[loss=0.05034, simple_loss=0.06746, pruned_loss=0.007944, audio_tagging_loss=0.008664, over 13999.00 frames. 
], tot_loss[loss=0.06662, simple_loss=0.09139, pruned_loss=0.01236, audio_tagging_loss=0.008558, over 3051029.37 frames. ], batch size: 53, lr: 1.54e-03, grad_scale: 32.0 2023-11-26 16:37:14,878 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 521450 2023-11-26 16:37:27,065 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=9.05 vs. limit=15.0 2023-11-26 16:37:34,229 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3476440.0, ans=0.125 2023-11-26 16:37:35,245 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=3476440.0, ans=0.2 2023-11-26 16:37:46,787 INFO [train_asr.py:1235] (3/4) Epoch 44, batch 4450, loss[loss=0.06675, simple_loss=0.09187, pruned_loss=0.01228, audio_tagging_loss=0.008538, over 15246.00 frames. ], tot_loss[loss=0.06617, simple_loss=0.09059, pruned_loss=0.01226, audio_tagging_loss=0.008616, over 3052987.49 frames. ], batch size: 58, lr: 1.54e-03, grad_scale: 32.0 2023-11-26 16:37:56,606 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=3476573.3333333335, ans=0.07 2023-11-26 16:38:01,963 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3476573.3333333335, ans=0.125 2023-11-26 16:38:10,253 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 521500 2023-11-26 16:38:18,199 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.whiten, num_groups=1, num_channels=512, metric=7.21 vs. limit=12.0 2023-11-26 16:38:21,633 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=3476706.6666666665, ans=0.0 2023-11-26 16:38:35,431 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=3476773.3333333335, ans=0.0 2023-11-26 16:38:40,521 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.377e+01 8.912e+01 9.426e+01 1.014e+02 1.545e+02, threshold=1.885e+02, percent-clipped=0.0 2023-11-26 16:38:41,580 INFO [train_asr.py:1235] (3/4) Epoch 44, batch 4500, loss[loss=0.06567, simple_loss=0.08468, pruned_loss=0.01242, audio_tagging_loss=0.01092, over 15546.00 frames. ], tot_loss[loss=0.06597, simple_loss=0.09027, pruned_loss=0.01227, audio_tagging_loss=0.00856, over 3060160.59 frames. ], batch size: 59, lr: 1.54e-03, grad_scale: 32.0 2023-11-26 16:38:41,823 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3476840.0, ans=0.1 2023-11-26 16:38:53,421 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=3476906.6666666665, ans=0.2 2023-11-26 16:39:05,307 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 521550 2023-11-26 16:39:19,597 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=18.86 vs. 
limit=22.5 2023-11-26 16:39:26,682 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=3477106.6666666665, ans=0.125 2023-11-26 16:39:36,118 INFO [train_asr.py:1235] (3/4) Epoch 44, batch 4550, loss[loss=0.09665, simple_loss=0.136, pruned_loss=0.02208, audio_tagging_loss=0.006585, over 14827.00 frames. ], tot_loss[loss=0.06632, simple_loss=0.09052, pruned_loss=0.01245, audio_tagging_loss=0.008616, over 3048372.46 frames. ], batch size: 54, lr: 1.54e-03, grad_scale: 32.0 2023-11-26 16:39:37,394 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=3477173.3333333335, ans=0.0 2023-11-26 16:39:45,346 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=3477173.3333333335, ans=0.0 2023-11-26 16:40:01,313 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 521600 2023-11-26 16:40:12,439 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=3477373.3333333335, ans=0.2 2023-11-26 16:40:12,546 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=3477373.3333333335, ans=0.125 2023-11-26 16:40:20,754 WARNING [train_asr.py:1481] (3/4) Exclude cut with ID unbalanced/_II2Klfnn4Y_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-26 16:40:23,031 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=3477440.0, ans=0.2 2023-11-26 16:40:23,566 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=4.90 vs. limit=12.0 2023-11-26 16:40:28,885 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=3477440.0, ans=0.035 2023-11-26 16:40:30,851 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 6.935e+01 8.644e+01 9.356e+01 1.024e+02 1.228e+02, threshold=1.871e+02, percent-clipped=0.0 2023-11-26 16:40:31,925 INFO [train_asr.py:1235] (3/4) Epoch 44, batch 4600, loss[loss=0.06511, simple_loss=0.0909, pruned_loss=0.01098, audio_tagging_loss=0.008677, over 15030.00 frames. ], tot_loss[loss=0.0662, simple_loss=0.09027, pruned_loss=0.01237, audio_tagging_loss=0.008695, over 3040664.67 frames. 
], batch size: 55, lr: 1.54e-03, grad_scale: 32.0 2023-11-26 16:40:39,118 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=3477506.6666666665, ans=0.125 2023-11-26 16:40:47,698 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=3477573.3333333335, ans=0.125 2023-11-26 16:40:52,898 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=3477573.3333333335, ans=0.035 2023-11-26 16:40:57,093 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 521650 2023-11-26 16:40:57,462 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=12.89 vs. limit=15.0 2023-11-26 16:41:09,233 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=3477706.6666666665, ans=0.125 2023-11-26 16:41:11,471 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=3477706.6666666665, ans=0.125 2023-11-26 16:41:22,578 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=3477773.3333333335, ans=0.0 2023-11-26 16:41:28,633 INFO [train_asr.py:1235] (3/4) Epoch 44, batch 4650, loss[loss=0.05726, simple_loss=0.07266, pruned_loss=0.0112, audio_tagging_loss=0.009724, over 14362.00 frames. ], tot_loss[loss=0.06572, simple_loss=0.08953, pruned_loss=0.01219, audio_tagging_loss=0.008763, over 3043937.33 frames. ], batch size: 55, lr: 1.54e-03, grad_scale: 16.0 2023-11-26 16:41:34,355 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.min_positive, batch_count=3477840.0, ans=0.05 2023-11-26 16:41:40,524 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=3477906.6666666665, ans=0.125 2023-11-26 16:41:46,477 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=8.12 vs. limit=12.0 2023-11-26 16:41:47,109 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=3477906.6666666665, ans=0.0 2023-11-26 16:41:52,942 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 521700 2023-11-26 16:42:06,112 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=3478040.0, ans=0.07 2023-11-26 16:42:11,287 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3478040.0, ans=0.125 2023-11-26 16:42:14,455 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=3478106.6666666665, ans=0.0 2023-11-26 16:42:20,394 INFO [scaling.py:1022] (3/4) Whitening: name=encoder_embed.out_whiten, num_groups=1, num_channels=192, metric=6.51 vs. 
limit=8.0 2023-11-26 16:42:20,825 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=3478106.6666666665, ans=0.0 2023-11-26 16:42:23,738 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.106e+01 8.839e+01 9.612e+01 1.022e+02 1.375e+02, threshold=1.922e+02, percent-clipped=0.0 2023-11-26 16:42:23,768 INFO [train_asr.py:1235] (3/4) Epoch 44, batch 4700, loss[loss=0.06883, simple_loss=0.09174, pruned_loss=0.01598, audio_tagging_loss=0.006976, over 16119.00 frames. ], tot_loss[loss=0.06599, simple_loss=0.08989, pruned_loss=0.01215, audio_tagging_loss=0.008896, over 3044446.23 frames. ], batch size: 59, lr: 1.54e-03, grad_scale: 16.0 2023-11-26 16:42:41,537 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=7.81 vs. limit=10.0 2023-11-26 16:42:46,012 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.22 vs. limit=6.0 2023-11-26 16:42:49,022 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 521750 2023-11-26 16:42:52,319 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3478306.6666666665, ans=0.125 2023-11-26 16:42:52,345 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3478306.6666666665, ans=0.125 2023-11-26 16:43:17,215 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.min_positive, batch_count=3478440.0, ans=0.05 2023-11-26 16:43:19,239 INFO [train_asr.py:1235] (3/4) Epoch 44, batch 4750, loss[loss=0.0649, simple_loss=0.08212, pruned_loss=0.01545, audio_tagging_loss=0.008385, over 15566.00 frames. ], tot_loss[loss=0.06567, simple_loss=0.08929, pruned_loss=0.01207, audio_tagging_loss=0.008961, over 3044364.54 frames. ], batch size: 57, lr: 1.54e-03, grad_scale: 16.0 2023-11-26 16:43:33,302 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=3478573.3333333335, ans=0.125 2023-11-26 16:43:38,841 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=512, metric=2.58 vs. limit=15.0 2023-11-26 16:43:43,693 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 521800 2023-11-26 16:43:44,867 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3478640.0, ans=0.125 2023-11-26 16:43:50,532 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=3478640.0, ans=0.07 2023-11-26 16:43:56,706 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=3478706.6666666665, ans=0.0 2023-11-26 16:43:59,626 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=5.67 vs. limit=15.0 2023-11-26 16:44:15,710 INFO [train_asr.py:1235] (3/4) Epoch 44, batch 4800, loss[loss=0.07609, simple_loss=0.107, pruned_loss=0.01371, audio_tagging_loss=0.008864, over 14182.00 frames. ], tot_loss[loss=0.06556, simple_loss=0.0889, pruned_loss=0.01203, audio_tagging_loss=0.009086, over 3045384.10 frames. 
], batch size: 55, lr: 1.54e-03, grad_scale: 16.0 2023-11-26 16:44:16,765 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.694e+01 8.936e+01 9.415e+01 1.023e+02 1.286e+02, threshold=1.883e+02, percent-clipped=0.0 2023-11-26 16:44:26,722 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=3478906.6666666665, ans=0.0 2023-11-26 16:44:28,930 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=3478906.6666666665, ans=0.125 2023-11-26 16:44:39,431 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 521850 2023-11-26 16:44:52,988 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=3479040.0, ans=0.0 2023-11-26 16:44:53,410 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=512, metric=18.72 vs. limit=22.5 2023-11-26 16:44:58,357 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3479040.0, ans=0.125 2023-11-26 16:45:08,442 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=3479106.6666666665, ans=0.0 2023-11-26 16:45:11,466 INFO [train_asr.py:1235] (3/4) Epoch 44, batch 4850, loss[loss=0.07334, simple_loss=0.1058, pruned_loss=0.01375, audio_tagging_loss=0.006704, over 15656.00 frames. ], tot_loss[loss=0.0655, simple_loss=0.08875, pruned_loss=0.012, audio_tagging_loss=0.009129, over 3042343.97 frames. ], batch size: 57, lr: 1.54e-03, grad_scale: 16.0 2023-11-26 16:45:24,964 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=3479240.0, ans=0.2 2023-11-26 16:45:36,702 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 521900 2023-11-26 16:46:06,194 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=3479506.6666666665, ans=0.125 2023-11-26 16:46:06,249 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=3479506.6666666665, ans=0.125 2023-11-26 16:46:07,658 INFO [train_asr.py:1235] (3/4) Epoch 44, batch 4900, loss[loss=0.07188, simple_loss=0.1099, pruned_loss=0.01059, audio_tagging_loss=0.006366, over 15685.00 frames. ], tot_loss[loss=0.06588, simple_loss=0.08934, pruned_loss=0.01214, audio_tagging_loss=0.00907, over 3037224.26 frames. ], batch size: 56, lr: 1.54e-03, grad_scale: 16.0 2023-11-26 16:46:08,679 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.702e+01 8.681e+01 9.501e+01 1.005e+02 1.327e+02, threshold=1.900e+02, percent-clipped=0.0 2023-11-26 16:46:10,545 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.0.conv_module2.whiten, num_groups=1, num_channels=192, metric=13.99 vs. 
limit=15.0 2023-11-26 16:46:22,772 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3479573.3333333335, ans=0.1 2023-11-26 16:46:32,678 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 521950 2023-11-26 16:46:34,994 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3479640.0, ans=0.125 2023-11-26 16:46:42,436 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=3479706.6666666665, ans=0.125 2023-11-26 16:46:50,300 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3479706.6666666665, ans=0.1 2023-11-26 16:47:03,985 INFO [train_asr.py:1235] (3/4) Epoch 44, batch 4950, loss[loss=0.05884, simple_loss=0.0855, pruned_loss=0.01025, audio_tagging_loss=0.005839, over 16436.00 frames. ], tot_loss[loss=0.06623, simple_loss=0.08995, pruned_loss=0.01239, audio_tagging_loss=0.008872, over 3037354.46 frames. ], batch size: 61, lr: 1.54e-03, grad_scale: 16.0 2023-11-26 16:47:27,914 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 522000 2023-11-26 16:47:52,825 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.0.self_attn1.whiten, num_groups=1, num_channels=192, metric=12.66 vs. limit=22.5 2023-11-26 16:48:00,016 INFO [train_asr.py:1235] (3/4) Epoch 44, batch 5000, loss[loss=0.07357, simple_loss=0.09956, pruned_loss=0.0165, audio_tagging_loss=0.007286, over 14175.00 frames. ], tot_loss[loss=0.06603, simple_loss=0.08983, pruned_loss=0.01239, audio_tagging_loss=0.008723, over 3037302.03 frames. ], batch size: 53, lr: 1.54e-03, grad_scale: 16.0 2023-11-26 16:48:01,110 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.774e+01 8.862e+01 9.666e+01 1.035e+02 1.226e+02, threshold=1.933e+02, percent-clipped=0.0 2023-11-26 16:48:25,372 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 522050 2023-11-26 16:48:28,800 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=3480306.6666666665, ans=0.0 2023-11-26 16:48:28,961 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=13.76 vs. limit=15.0 2023-11-26 16:48:34,108 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=3480373.3333333335, ans=0.0 2023-11-26 16:48:49,014 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=6.67 vs. limit=15.0 2023-11-26 16:48:49,651 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=3480440.0, ans=0.125 2023-11-26 16:48:50,549 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=3480440.0, ans=0.2 2023-11-26 16:48:55,735 INFO [train_asr.py:1235] (3/4) Epoch 44, batch 5050, loss[loss=0.05234, simple_loss=0.06298, pruned_loss=0.01073, audio_tagging_loss=0.01011, over 14770.00 frames. ], tot_loss[loss=0.06565, simple_loss=0.08952, pruned_loss=0.01228, audio_tagging_loss=0.008601, over 3041019.53 frames. 
], batch size: 56, lr: 1.54e-03, grad_scale: 16.0 2023-11-26 16:49:11,587 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3480573.3333333335, ans=0.125 2023-11-26 16:49:17,932 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=3480640.0, ans=0.2 2023-11-26 16:49:21,051 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 522100 2023-11-26 16:49:23,422 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=3480640.0, ans=0.2 2023-11-26 16:49:35,479 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=3480706.6666666665, ans=0.0 2023-11-26 16:49:51,250 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=3480773.3333333335, ans=0.0 2023-11-26 16:49:53,094 INFO [train_asr.py:1235] (3/4) Epoch 44, batch 5100, loss[loss=0.08151, simple_loss=0.1217, pruned_loss=0.01491, audio_tagging_loss=0.005758, over 15801.00 frames. ], tot_loss[loss=0.06635, simple_loss=0.09077, pruned_loss=0.01242, audio_tagging_loss=0.008541, over 3042571.90 frames. ], batch size: 56, lr: 1.54e-03, grad_scale: 16.0 2023-11-26 16:49:54,127 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.064e+01 8.678e+01 9.277e+01 1.001e+02 1.240e+02, threshold=1.855e+02, percent-clipped=0.0 2023-11-26 16:50:13,031 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=3480906.6666666665, ans=0.125 2023-11-26 16:50:13,348 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=512, metric=2.63 vs. limit=15.0 2023-11-26 16:50:17,114 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 522150 2023-11-26 16:50:22,757 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=3480973.3333333335, ans=0.125 2023-11-26 16:50:42,789 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=7.31 vs. limit=12.0 2023-11-26 16:50:43,403 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3481106.6666666665, ans=0.1 2023-11-26 16:50:44,548 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=3481106.6666666665, ans=0.0 2023-11-26 16:50:46,945 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=5.09 vs. limit=15.0 2023-11-26 16:50:48,533 INFO [train_asr.py:1235] (3/4) Epoch 44, batch 5150, loss[loss=0.07228, simple_loss=0.08873, pruned_loss=0.01514, audio_tagging_loss=0.01278, over 14445.00 frames. ], tot_loss[loss=0.06642, simple_loss=0.0908, pruned_loss=0.01245, audio_tagging_loss=0.008572, over 3041225.10 frames. ], batch size: 55, lr: 1.54e-03, grad_scale: 16.0 2023-11-26 16:50:54,961 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=12.90 vs. 
limit=15.0 2023-11-26 16:51:03,059 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=3481240.0, ans=0.125 2023-11-26 16:51:14,147 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 522200 2023-11-26 16:51:15,628 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=17.48 vs. limit=22.5 2023-11-26 16:51:21,077 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=3481306.6666666665, ans=0.125 2023-11-26 16:51:36,977 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=14.04 vs. limit=15.0 2023-11-26 16:51:42,808 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=3481440.0, ans=0.0 2023-11-26 16:51:44,720 INFO [train_asr.py:1235] (3/4) Epoch 44, batch 5200, loss[loss=0.06137, simple_loss=0.0784, pruned_loss=0.01273, audio_tagging_loss=0.009439, over 15479.00 frames. ], tot_loss[loss=0.06636, simple_loss=0.09069, pruned_loss=0.01241, audio_tagging_loss=0.008607, over 3038877.45 frames. ], batch size: 58, lr: 1.54e-03, grad_scale: 32.0 2023-11-26 16:51:45,187 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=4.12 vs. limit=15.0 2023-11-26 16:51:45,725 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.218e+01 8.774e+01 9.486e+01 1.034e+02 1.875e+02, threshold=1.897e+02, percent-clipped=1.0 2023-11-26 16:52:09,795 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 522250 2023-11-26 16:52:38,097 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=3481773.3333333335, ans=0.125 2023-11-26 16:52:41,983 INFO [train_asr.py:1235] (3/4) Epoch 44, batch 5250, loss[loss=0.06734, simple_loss=0.08269, pruned_loss=0.01373, audio_tagging_loss=0.01227, over 15250.00 frames. ], tot_loss[loss=0.06607, simple_loss=0.09012, pruned_loss=0.01235, audio_tagging_loss=0.008659, over 3033894.65 frames. ], batch size: 57, lr: 1.54e-03, grad_scale: 32.0 2023-11-26 16:52:43,269 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3481840.0, ans=0.125 2023-11-26 16:52:45,408 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=3481840.0, ans=0.125 2023-11-26 16:52:45,855 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=6.37 vs. limit=10.0 2023-11-26 16:53:05,974 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 522300 2023-11-26 16:53:15,158 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3482040.0, ans=0.1 2023-11-26 16:53:15,589 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=17.76 vs. 
limit=22.5 2023-11-26 16:53:21,643 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3482040.0, ans=0.125 2023-11-26 16:53:36,556 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3482173.3333333335, ans=0.125 2023-11-26 16:53:37,439 INFO [train_asr.py:1235] (3/4) Epoch 44, batch 5300, loss[loss=0.07552, simple_loss=0.1053, pruned_loss=0.01589, audio_tagging_loss=0.007003, over 15082.00 frames. ], tot_loss[loss=0.06613, simple_loss=0.09002, pruned_loss=0.01245, audio_tagging_loss=0.008677, over 3038031.31 frames. ], batch size: 55, lr: 1.54e-03, grad_scale: 16.0 2023-11-26 16:53:39,544 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.063e+01 8.820e+01 9.463e+01 1.024e+02 1.274e+02, threshold=1.893e+02, percent-clipped=0.0 2023-11-26 16:54:03,238 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 522350 2023-11-26 16:54:05,470 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=3482306.6666666665, ans=0.125 2023-11-26 16:54:08,834 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=7.45 vs. limit=12.0 2023-11-26 16:54:08,921 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=512, metric=2.88 vs. limit=15.0 2023-11-26 16:54:09,939 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=512, metric=20.30 vs. limit=22.5 2023-11-26 16:54:19,503 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=5.33 vs. limit=10.0 2023-11-26 16:54:26,872 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=6.14 vs. limit=15.0 2023-11-26 16:54:28,017 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=18.58 vs. limit=22.5 2023-11-26 16:54:30,166 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=5.65 vs. limit=12.0 2023-11-26 16:54:33,244 INFO [train_asr.py:1235] (3/4) Epoch 44, batch 5350, loss[loss=0.05921, simple_loss=0.08188, pruned_loss=0.008894, audio_tagging_loss=0.009378, over 15178.00 frames. ], tot_loss[loss=0.06611, simple_loss=0.08997, pruned_loss=0.01245, audio_tagging_loss=0.008683, over 3036692.26 frames. 
], batch size: 56, lr: 1.54e-03, grad_scale: 16.0 2023-11-26 16:54:35,527 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3482506.6666666665, ans=0.125 2023-11-26 16:54:43,611 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=3482573.3333333335, ans=0.125 2023-11-26 16:54:58,401 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 522400 2023-11-26 16:55:12,671 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3482706.6666666665, ans=0.125 2023-11-26 16:55:20,083 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=6.70 vs. limit=10.0 2023-11-26 16:55:30,686 INFO [train_asr.py:1235] (3/4) Epoch 44, batch 5400, loss[loss=0.06661, simple_loss=0.09087, pruned_loss=0.01238, audio_tagging_loss=0.008796, over 16972.00 frames. ], tot_loss[loss=0.06611, simple_loss=0.08975, pruned_loss=0.01247, audio_tagging_loss=0.008766, over 3035361.20 frames. ], batch size: 61, lr: 1.54e-03, grad_scale: 16.0 2023-11-26 16:55:32,793 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.955e+01 9.129e+01 9.512e+01 1.019e+02 1.244e+02, threshold=1.902e+02, percent-clipped=0.0 2023-11-26 16:55:34,539 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=5.43 vs. limit=15.0 2023-11-26 16:55:36,368 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.min_positive, batch_count=3482840.0, ans=0.05 2023-11-26 16:55:46,825 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=3482906.6666666665, ans=0.2 2023-11-26 16:55:54,158 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 522450 2023-11-26 16:56:03,220 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.min_positive, batch_count=3483040.0, ans=0.025 2023-11-26 16:56:12,484 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=3483040.0, ans=0.0 2023-11-26 16:56:12,499 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=3483040.0, ans=0.125 2023-11-26 16:56:15,625 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=3483106.6666666665, ans=0.0 2023-11-26 16:56:26,054 INFO [train_asr.py:1235] (3/4) Epoch 44, batch 5450, loss[loss=0.06992, simple_loss=0.08792, pruned_loss=0.01618, audio_tagging_loss=0.009785, over 14906.00 frames. ], tot_loss[loss=0.06593, simple_loss=0.08936, pruned_loss=0.01238, audio_tagging_loss=0.008864, over 3037311.26 frames. ], batch size: 56, lr: 1.54e-03, grad_scale: 16.0 2023-11-26 16:56:32,475 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=3483173.3333333335, ans=0.125 2023-11-26 16:56:38,561 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.0.conv_module1.whiten, num_groups=1, num_channels=192, metric=12.67 vs. 
2023-11-26 16:56:38,561 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.0.conv_module1.whiten, num_groups=1, num_channels=192, metric=12.67 vs. limit=15.0
2023-11-26 16:56:50,437 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 522500
2023-11-26 16:57:13,036 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=3483440.0, ans=0.125
2023-11-26 16:57:17,319 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=3483440.0, ans=0.125
2023-11-26 16:57:20,512 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3483506.6666666665, ans=0.125
2023-11-26 16:57:21,256 INFO [train_asr.py:1235] (3/4) Epoch 44, batch 5500, loss[loss=0.07551, simple_loss=0.09869, pruned_loss=0.01653, audio_tagging_loss=0.009634, over 15356.00 frames. ], tot_loss[loss=0.06546, simple_loss=0.08894, pruned_loss=0.01219, audio_tagging_loss=0.008803, over 3037394.34 frames. ], batch size: 54, lr: 1.54e-03, grad_scale: 16.0
2023-11-26 16:57:23,310 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.125e+01 8.753e+01 9.597e+01 1.033e+02 1.583e+02, threshold=1.919e+02, percent-clipped=0.0
2023-11-26 16:57:35,977 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=3483573.3333333335, ans=0.125
2023-11-26 16:57:45,879 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3483640.0, ans=0.1
2023-11-26 16:57:46,832 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 522550
2023-11-26 16:57:59,324 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.0.self_attn2.whiten, num_groups=1, num_channels=192, metric=10.64 vs. limit=22.5
2023-11-26 16:58:15,250 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=6.96 vs. limit=15.0
2023-11-26 16:58:17,005 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=13.48 vs. limit=15.0
2023-11-26 16:58:18,144 INFO [train_asr.py:1235] (3/4) Epoch 44, batch 5550, loss[loss=0.05765, simple_loss=0.09102, pruned_loss=0.004602, audio_tagging_loss=0.007534, over 15843.00 frames. ], tot_loss[loss=0.06616, simple_loss=0.09011, pruned_loss=0.01226, audio_tagging_loss=0.008839, over 3042252.54 frames. ], batch size: 59, lr: 1.54e-03, grad_scale: 16.0
2023-11-26 16:58:26,176 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=3483840.0, ans=0.125
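In the optim.py:476 lines, the five grad-norm quartile values appear to be (min, 25%, median, 75%, max), and the reported threshold matches Clipping_scale times the median: for the 16:57:23,310 entry above, 2.0 * 9.597e+01 ≈ 1.919e+02. A one-line check of this reading, which is an inference from the logged numbers rather than a statement about the optimizer's actual code:

    # Inference from the log, not the optimizer's documented behaviour:
    # threshold ≈ clipping_scale * median(grad_norm).
    quartiles = [7.125e+01, 8.753e+01, 9.597e+01, 1.033e+02, 1.583e+02]
    clipping_scale = 2.0
    print(clipping_scale * quartiles[2])  # 191.94, vs. logged threshold=1.919e+02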
2023-11-26 16:58:35,004 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=4.09 vs. limit=15.0
2023-11-26 16:58:41,033 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2023-11-26 16:58:41,950 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 522600
2023-11-26 16:59:03,552 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3484106.6666666665, ans=0.1
2023-11-26 16:59:07,872 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=3484106.6666666665, ans=0.0
2023-11-26 16:59:13,970 INFO [train_asr.py:1235] (3/4) Epoch 44, batch 5600, loss[loss=0.07872, simple_loss=0.1149, pruned_loss=0.01371, audio_tagging_loss=0.007548, over 16141.00 frames. ], tot_loss[loss=0.06612, simple_loss=0.09006, pruned_loss=0.01213, audio_tagging_loss=0.008955, over 3049193.80 frames. ], batch size: 58, lr: 1.54e-03, grad_scale: 32.0
2023-11-26 16:59:16,060 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.541e+01 8.836e+01 9.428e+01 1.004e+02 1.214e+02, threshold=1.886e+02, percent-clipped=0.0
2023-11-26 16:59:26,954 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3484240.0, ans=0.125
2023-11-26 16:59:35,375 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=3484306.6666666665, ans=0.2
2023-11-26 16:59:38,569 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 522650
2023-11-26 16:59:40,764 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=3484306.6666666665, ans=0.125
2023-11-26 16:59:55,798 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=3484373.3333333335, ans=0.0
2023-11-26 16:59:56,682 WARNING [train_asr.py:1481] (3/4) Exclude cut with ID unbalanced/ze0LsBtoDm0_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24
2023-11-26 17:00:03,528 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=8.38 vs. limit=15.0
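The WARNING above drops an AudioSet placeholder cut because its 100 input frames shrink to 23 after subsampling, fewer than the 24 tokens that would need to be aligned. The 100 -> 23 reduction is consistent with a convolutional front end computing T' = ((T - 7) // 2 + 1) // 2; a hedged sketch of such a length filter, where the formula is inferred from the logged frame counts and the exact criterion used by the recipe may differ:

    # Hedged sketch of a length filter like the one behind the WARNING above.
    def frames_after_subsampling(num_frames: int) -> int:
        # Inferred from the logged 100 -> 23 reduction; not taken from the recipe.
        return ((num_frames - 7) // 2 + 1) // 2

    def keep_cut(num_frames: int, num_tokens: int) -> bool:
        # A pruned-transducer loss needs at least as many frames as tokens.
        return frames_after_subsampling(num_frames) >= num_tokens

    assert frames_after_subsampling(100) == 23
    assert not keep_cut(100, 24)  # the excluded placeholder cut above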
2023-11-26 17:00:09,439 INFO [train_asr.py:1235] (3/4) Epoch 44, batch 5650, loss[loss=0.05069, simple_loss=0.06826, pruned_loss=0.007679, audio_tagging_loss=0.008877, over 15128.00 frames. ], tot_loss[loss=0.06558, simple_loss=0.08913, pruned_loss=0.01197, audio_tagging_loss=0.009042, over 3049132.31 frames. ], batch size: 56, lr: 1.54e-03, grad_scale: 32.0
2023-11-26 17:00:20,834 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=3484573.3333333335, ans=0.0
2023-11-26 17:00:34,592 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 522700
2023-11-26 17:00:56,334 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn1.whiten.whitening_limit, batch_count=3484773.3333333335, ans=22.5
2023-11-26 17:00:57,201 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=7.51 vs. limit=15.0
2023-11-26 17:01:05,370 INFO [train_asr.py:1235] (3/4) Epoch 44, batch 5700, loss[loss=0.06392, simple_loss=0.09691, pruned_loss=0.00993, audio_tagging_loss=0.005542, over 15350.00 frames. ], tot_loss[loss=0.0654, simple_loss=0.08871, pruned_loss=0.01204, audio_tagging_loss=0.009005, over 3050986.98 frames. ], batch size: 56, lr: 1.54e-03, grad_scale: 32.0
2023-11-26 17:01:08,015 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.017e+01 9.014e+01 9.489e+01 1.005e+02 1.284e+02, threshold=1.898e+02, percent-clipped=0.0
2023-11-26 17:01:13,126 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2023-11-26 17:01:14,183 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=3484840.0, ans=0.09899494936611666
2023-11-26 17:01:19,853 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=4.17 vs. limit=15.0
2023-11-26 17:01:22,631 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=3484906.6666666665, ans=0.0
2023-11-26 17:01:29,924 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 522750
2023-11-26 17:01:29,984 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3484973.3333333335, ans=0.0
2023-11-26 17:01:30,078 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3484973.3333333335, ans=0.0
2023-11-26 17:01:34,335 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=3484973.3333333335, ans=0.07
2023-11-26 17:01:35,413 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.min_positive, batch_count=3484973.3333333335, ans=0.05
2023-11-26 17:01:35,427 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3484973.3333333335, ans=0.1
2023-11-26 17:01:42,825 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=3485040.0, ans=0.125
2023-11-26 17:02:01,804 INFO [train_asr.py:1235] (3/4) Epoch 44, batch 5750, loss[loss=0.08227, simple_loss=0.1147, pruned_loss=0.01749, audio_tagging_loss=0.007442, over 15597.00 frames. ], tot_loss[loss=0.06506, simple_loss=0.08814, pruned_loss=0.01204, audio_tagging_loss=0.008947, over 3045481.56 frames. ], batch size: 56, lr: 1.54e-03, grad_scale: 32.0
2023-11-26 17:02:07,361 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=3485173.3333333335, ans=0.0
2023-11-26 17:02:10,456 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3485173.3333333335, ans=0.125
2023-11-26 17:02:25,403 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 522800
2023-11-26 17:02:34,699 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=3485373.3333333335, ans=0.0
2023-11-26 17:02:48,011 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=3485440.0, ans=0.125
2023-11-26 17:02:57,294 INFO [train_asr.py:1235] (3/4) Epoch 44, batch 5800, loss[loss=0.07454, simple_loss=0.1035, pruned_loss=0.01611, audio_tagging_loss=0.006697, over 15039.00 frames. ], tot_loss[loss=0.06535, simple_loss=0.0886, pruned_loss=0.01222, audio_tagging_loss=0.008829, over 3044237.68 frames. ], batch size: 57, lr: 1.54e-03, grad_scale: 32.0
2023-11-26 17:02:59,435 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 6.838e+01 8.825e+01 9.413e+01 1.036e+02 1.628e+02, threshold=1.883e+02, percent-clipped=0.0
2023-11-26 17:03:23,248 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 522850
2023-11-26 17:03:27,648 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3485640.0, ans=0.125
2023-11-26 17:03:49,839 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=3485773.3333333335, ans=0.2
2023-11-26 17:03:53,387 INFO [train_asr.py:1235] (3/4) Epoch 44, batch 5850, loss[loss=0.06289, simple_loss=0.09163, pruned_loss=0.00921, audio_tagging_loss=0.007866, over 14803.00 frames. ], tot_loss[loss=0.06446, simple_loss=0.08721, pruned_loss=0.01205, audio_tagging_loss=0.008801, over 3031040.48 frames. ], batch size: 58, lr: 1.53e-03, grad_scale: 32.0
2023-11-26 17:03:53,655 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3485840.0, ans=0.125
2023-11-26 17:03:58,410 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.min_abs, batch_count=3485840.0, ans=0.5
2023-11-26 17:03:58,514 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00
2023-11-26 17:04:04,027 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=3485840.0, ans=0.0
2023-11-26 17:04:18,738 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 522900
2023-11-26 17:04:40,089 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=3486106.6666666665, ans=0.0
2023-11-26 17:04:50,429 INFO [train_asr.py:1235] (3/4) Epoch 44, batch 5900, loss[loss=0.06982, simple_loss=0.09216, pruned_loss=0.01684, audio_tagging_loss=0.006907, over 15032.00 frames. ], tot_loss[loss=0.06476, simple_loss=0.08779, pruned_loss=0.01206, audio_tagging_loss=0.008799, over 3028217.20 frames. ], batch size: 56, lr: 1.53e-03, grad_scale: 32.0
2023-11-26 17:04:52,510 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.394e+01 8.809e+01 9.343e+01 1.010e+02 1.341e+02, threshold=1.869e+02, percent-clipped=0.0
2023-11-26 17:05:11,853 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3486306.6666666665, ans=0.125
2023-11-26 17:05:13,765 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 522950
2023-11-26 17:05:14,386 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=7.72 vs. limit=15.0
2023-11-26 17:05:27,992 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3486373.3333333335, ans=0.0
2023-11-26 17:05:29,100 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=3486373.3333333335, ans=0.125
2023-11-26 17:05:45,315 INFO [train_asr.py:1235] (3/4) Epoch 44, batch 5950, loss[loss=0.05723, simple_loss=0.08122, pruned_loss=0.009322, audio_tagging_loss=0.0073, over 15589.00 frames. ], tot_loss[loss=0.06524, simple_loss=0.08895, pruned_loss=0.01212, audio_tagging_loss=0.008636, over 3032872.37 frames. ], batch size: 58, lr: 1.53e-03, grad_scale: 16.0
2023-11-26 17:05:46,661 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer_ff2.min_abs, batch_count=3486506.6666666665, ans=0.1
2023-11-26 17:06:10,259 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 523000
2023-11-26 17:06:21,923 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=3486706.6666666665, ans=0.0
2023-11-26 17:06:40,781 INFO [train_asr.py:1235] (3/4) Epoch 44, batch 6000, loss[loss=0.06671, simple_loss=0.08323, pruned_loss=0.01443, audio_tagging_loss=0.01067, over 15233.00 frames. ], tot_loss[loss=0.06491, simple_loss=0.0885, pruned_loss=0.01197, audio_tagging_loss=0.008687, over 3036557.18 frames. ], batch size: 55, lr: 1.53e-03, grad_scale: 32.0
2023-11-26 17:06:40,782 INFO [train_asr.py:1258] (3/4) Computing validation loss
2023-11-26 17:07:10,110 INFO [zipformer.py:1877] (3/4) name=encoder.encoders.4.encoder.layers.2.self_attn_weights, attn_weights_entropy = tensor([3.2121, 4.0048, 3.7180, 3.3455], device='cuda:3')
2023-11-26 17:07:13,752 INFO [train_asr.py:1267] (3/4) Epoch 44, validation: loss=0.05792, simple_loss=0.05061, pruned_loss=0.005328, audio_tagging_loss=0.02728, over 4681554.00 frames.
2023-11-26 17:07:13,753 INFO [train_asr.py:1268] (3/4) Maximum memory allocated so far is 24894MB
2023-11-26 17:07:16,874 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.326e+01 8.949e+01 9.418e+01 1.019e+02 1.469e+02, threshold=1.884e+02, percent-clipped=0.0
2023-11-26 17:07:37,161 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 523050
2023-11-26 17:07:54,817 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=3487040.0, ans=0.0
2023-11-26 17:07:56,232 WARNING [train_asr.py:1481] (3/4) Exclude cut with ID unbalanced/NoNxFjwXuuc_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24
2023-11-26 17:08:08,761 INFO [train_asr.py:1235] (3/4) Epoch 44, batch 6050, loss[loss=0.06347, simple_loss=0.08545, pruned_loss=0.00948, audio_tagging_loss=0.01126, over 15483.00 frames. ], tot_loss[loss=0.0653, simple_loss=0.08907, pruned_loss=0.01211, audio_tagging_loss=0.00866, over 3038703.16 frames. ], batch size: 59, lr: 1.53e-03, grad_scale: 32.0
2023-11-26 17:08:13,470 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.whiten, num_groups=1, num_channels=512, metric=9.47 vs. limit=12.0
2023-11-26 17:08:14,206 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=3487173.3333333335, ans=0.125
2023-11-26 17:08:23,793 INFO [scaling.py:1022] (3/4) Whitening: name=encoder_embed.out_whiten, num_groups=1, num_channels=192, metric=6.29 vs. limit=8.0
2023-11-26 17:08:32,851 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3487306.6666666665, ans=0.125
2023-11-26 17:08:32,852 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3487306.6666666665, ans=0.1
2023-11-26 17:08:33,718 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 523100
2023-11-26 17:08:42,047 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=3487373.3333333335, ans=0.125
2023-11-26 17:09:04,416 INFO [train_asr.py:1235] (3/4) Epoch 44, batch 6100, loss[loss=0.0723, simple_loss=0.1045, pruned_loss=0.01215, audio_tagging_loss=0.007921, over 14536.00 frames. ], tot_loss[loss=0.06497, simple_loss=0.08865, pruned_loss=0.01204, audio_tagging_loss=0.008604, over 3038763.76 frames. ], batch size: 53, lr: 1.53e-03, grad_scale: 32.0
2023-11-26 17:09:08,050 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.524e+01 8.667e+01 9.163e+01 9.920e+01 1.251e+02, threshold=1.833e+02, percent-clipped=0.0
2023-11-26 17:09:11,445 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3487506.6666666665, ans=0.1
2023-11-26 17:09:13,053 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3487506.6666666665, ans=0.125
2023-11-26 17:09:13,440 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=512, metric=2.94 vs. limit=15.0
2023-11-26 17:09:29,471 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 523150
2023-11-26 17:09:34,205 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=20.20 vs. limit=22.5
2023-11-26 17:09:42,368 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3487706.6666666665, ans=0.1
2023-11-26 17:09:43,405 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3487706.6666666665, ans=0.125
2023-11-26 17:09:48,219 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3487773.3333333335, ans=0.125
2023-11-26 17:09:54,816 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=9.84 vs. limit=15.0
2023-11-26 17:10:00,750 INFO [train_asr.py:1235] (3/4) Epoch 44, batch 6150, loss[loss=0.07803, simple_loss=0.1135, pruned_loss=0.01731, audio_tagging_loss=0.003971, over 15787.00 frames. ], tot_loss[loss=0.06547, simple_loss=0.08961, pruned_loss=0.01212, audio_tagging_loss=0.008543, over 3042813.04 frames. ], batch size: 58, lr: 1.53e-03, grad_scale: 32.0
2023-11-26 17:10:12,179 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3487906.6666666665, ans=0.125
2023-11-26 17:10:12,201 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=3487906.6666666665, ans=0.2
2023-11-26 17:10:14,176 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=3487906.6666666665, ans=0.0
2023-11-26 17:10:14,305 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3487906.6666666665, ans=0.1
2023-11-26 17:10:15,281 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3487906.6666666665, ans=0.125
2023-11-26 17:10:24,727 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 523200
2023-11-26 17:10:27,381 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=3487973.3333333335, ans=0.0
2023-11-26 17:10:32,570 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3487973.3333333335, ans=0.125
2023-11-26 17:10:37,528 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=3488040.0, ans=0.2
2023-11-26 17:10:43,097 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=7.25 vs. limit=12.0
2023-11-26 17:10:45,853 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=3488106.6666666665, ans=0.2
2023-11-26 17:10:48,980 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=3488106.6666666665, ans=0.0
2023-11-26 17:10:56,661 INFO [train_asr.py:1235] (3/4) Epoch 44, batch 6200, loss[loss=0.06078, simple_loss=0.08368, pruned_loss=0.007958, audio_tagging_loss=0.01098, over 15712.00 frames. ], tot_loss[loss=0.06487, simple_loss=0.08851, pruned_loss=0.01192, audio_tagging_loss=0.008697, over 3039682.49 frames. ], batch size: 58, lr: 1.53e-03, grad_scale: 32.0
2023-11-26 17:10:59,879 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.581e+01 8.701e+01 9.346e+01 1.022e+02 1.320e+02, threshold=1.869e+02, percent-clipped=0.0
2023-11-26 17:11:07,738 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3488240.0, ans=0.1
2023-11-26 17:11:16,283 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3488240.0, ans=0.125
2023-11-26 17:11:22,148 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 523250
2023-11-26 17:11:23,315 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=3488306.6666666665, ans=0.2
2023-11-26 17:11:25,326 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=3488306.6666666665, ans=0.2
2023-11-26 17:11:29,536 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=3488373.3333333335, ans=0.0
2023-11-26 17:11:37,564 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3488373.3333333335, ans=0.125
2023-11-26 17:11:46,156 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=3488440.0, ans=0.0
2023-11-26 17:11:50,343 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=3488440.0, ans=0.0
2023-11-26 17:11:50,397 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3488440.0, ans=0.125
2023-11-26 17:11:52,841 INFO [train_asr.py:1235] (3/4) Epoch 44, batch 6250, loss[loss=0.04277, simple_loss=0.05129, pruned_loss=0.007464, audio_tagging_loss=0.009666, over 15681.00 frames. ], tot_loss[loss=0.06538, simple_loss=0.08917, pruned_loss=0.01206, audio_tagging_loss=0.008736, over 3041521.07 frames. ], batch size: 59, lr: 1.53e-03, grad_scale: 32.0
2023-11-26 17:11:58,985 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=3488506.6666666665, ans=0.0
2023-11-26 17:12:03,841 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=3488573.3333333335, ans=0.125
2023-11-26 17:12:13,300 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=3488573.3333333335, ans=0.125
2023-11-26 17:12:17,948 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 523300
2023-11-26 17:12:22,404 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=3488640.0, ans=0.0
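The grad_scale field in the batch records toggles between 16.0 and 32.0 over this stretch, the usual signature of dynamic loss scaling under fp16. A generic sketch of that mechanism with torch.cuda.amp, using illustrative hyperparameters rather than the recipe's:

    # Generic dynamic loss scaling under fp16 (illustrative, not the recipe's code).
    import torch

    scaler = torch.cuda.amp.GradScaler(init_scale=16.0, growth_interval=2000)
    model = torch.nn.Linear(8, 8).cuda()
    opt = torch.optim.SGD(model.parameters(), lr=0.1)

    x = torch.randn(4, 8, device="cuda")
    with torch.cuda.amp.autocast():
        loss = model(x).square().mean()
    scaler.scale(loss).backward()  # backward on the scaled loss
    scaler.step(opt)               # unscales grads; skips the step on inf/nan
    scaler.update()                # grows/shrinks the scale -> the logged grad_scale
    print(scaler.get_scale())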
2023-11-26 17:12:26,858 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=18.17 vs. limit=22.5
2023-11-26 17:12:27,582 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=3488706.6666666665, ans=0.0
2023-11-26 17:12:40,344 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=3488773.3333333335, ans=0.2
2023-11-26 17:12:49,229 INFO [train_asr.py:1235] (3/4) Epoch 44, batch 6300, loss[loss=0.06125, simple_loss=0.08377, pruned_loss=0.01095, audio_tagging_loss=0.008418, over 16165.00 frames. ], tot_loss[loss=0.06518, simple_loss=0.08883, pruned_loss=0.01197, audio_tagging_loss=0.008796, over 3045250.85 frames. ], batch size: 61, lr: 1.53e-03, grad_scale: 32.0
2023-11-26 17:12:52,401 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.161e+01 8.680e+01 9.348e+01 1.004e+02 1.184e+02, threshold=1.870e+02, percent-clipped=0.0
2023-11-26 17:12:53,734 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=3488840.0, ans=0.125
2023-11-26 17:12:58,904 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3488906.6666666665, ans=0.1
2023-11-26 17:13:04,756 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=3488906.6666666665, ans=0.0
2023-11-26 17:13:13,216 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 523350
2023-11-26 17:13:26,279 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=3489040.0, ans=0.2
2023-11-26 17:13:30,423 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.1.whiten, num_groups=1, num_channels=192, metric=3.77 vs. limit=12.0
2023-11-26 17:13:44,470 INFO [train_asr.py:1235] (3/4) Epoch 44, batch 6350, loss[loss=0.05685, simple_loss=0.07246, pruned_loss=0.01304, audio_tagging_loss=0.007575, over 15217.00 frames. ], tot_loss[loss=0.06466, simple_loss=0.08774, pruned_loss=0.01174, audio_tagging_loss=0.009054, over 3038950.87 frames. ], batch size: 59, lr: 1.53e-03, grad_scale: 32.0
2023-11-26 17:13:44,772 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3489173.3333333335, ans=0.125
2023-11-26 17:14:04,772 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3489240.0, ans=0.1
2023-11-26 17:14:09,535 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 523400
2023-11-26 17:14:19,257 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=13.05 vs. limit=15.0
2023-11-26 17:14:22,582 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=8.36 vs. limit=15.0
2023-11-26 17:14:35,645 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=12.34 vs. limit=15.0
2023-11-26 17:14:40,444 INFO [train_asr.py:1235] (3/4) Epoch 44, batch 6400, loss[loss=0.06941, simple_loss=0.09738, pruned_loss=0.01219, audio_tagging_loss=0.008529, over 14924.00 frames. ], tot_loss[loss=0.06462, simple_loss=0.08744, pruned_loss=0.01175, audio_tagging_loss=0.009154, over 3038767.31 frames. ], batch size: 53, lr: 1.53e-03, grad_scale: 32.0
2023-11-26 17:14:45,175 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.093e+01 8.723e+01 9.338e+01 1.021e+02 1.186e+02, threshold=1.868e+02, percent-clipped=0.0
2023-11-26 17:15:05,625 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 523450
2023-11-26 17:15:14,817 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=3489706.6666666665, ans=0.0
2023-11-26 17:15:15,928 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3489706.6666666665, ans=0.125
2023-11-26 17:15:16,980 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=3489706.6666666665, ans=0.125
2023-11-26 17:15:17,007 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3489706.6666666665, ans=0.125
2023-11-26 17:15:21,767 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.0.whiten, num_groups=1, num_channels=192, metric=4.53 vs. limit=12.0
2023-11-26 17:15:33,992 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3489773.3333333335, ans=0.1
2023-11-26 17:15:37,384 INFO [train_asr.py:1235] (3/4) Epoch 44, batch 6450, loss[loss=0.0571, simple_loss=0.07751, pruned_loss=0.009159, audio_tagging_loss=0.009189, over 15652.00 frames. ], tot_loss[loss=0.06501, simple_loss=0.08767, pruned_loss=0.01196, audio_tagging_loss=0.009212, over 3038255.07 frames. ], batch size: 58, lr: 1.53e-03, grad_scale: 32.0
2023-11-26 17:15:50,197 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=3489906.6666666665, ans=0.2
2023-11-26 17:15:58,314 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=3489973.3333333335, ans=0.125
2023-11-26 17:16:01,355 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 523500
2023-11-26 17:16:03,608 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=3489973.3333333335, ans=0.0
2023-11-26 17:16:04,558 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=3489973.3333333335, ans=0.125
2023-11-26 17:16:08,423 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.0.conv_module2.whiten, num_groups=1, num_channels=192, metric=12.35 vs. limit=15.0
2023-11-26 17:16:10,607 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00
2023-11-26 17:16:15,231 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=3490040.0, ans=0.0
2023-11-26 17:16:18,984 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=3490040.0, ans=0.0
2023-11-26 17:16:30,750 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.min_positive, batch_count=3490106.6666666665, ans=0.05
2023-11-26 17:16:32,037 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.whiten, num_groups=1, num_channels=512, metric=9.41 vs. limit=12.0
2023-11-26 17:16:32,620 INFO [train_asr.py:1235] (3/4) Epoch 44, batch 6500, loss[loss=0.0617, simple_loss=0.07992, pruned_loss=0.01357, audio_tagging_loss=0.008172, over 14648.00 frames. ], tot_loss[loss=0.06512, simple_loss=0.08794, pruned_loss=0.01196, audio_tagging_loss=0.009197, over 3041384.44 frames. ], batch size: 56, lr: 1.53e-03, grad_scale: 16.0
2023-11-26 17:16:32,808 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3490173.3333333335, ans=0.1
2023-11-26 17:16:33,157 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=512, metric=10.07 vs. limit=15.0
2023-11-26 17:16:35,035 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=3490173.3333333335, ans=0.125
2023-11-26 17:16:37,908 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.378e+01 8.749e+01 9.498e+01 1.005e+02 1.590e+02, threshold=1.900e+02, percent-clipped=0.0
2023-11-26 17:16:41,862 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=3490173.3333333335, ans=0.125
2023-11-26 17:16:43,921 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=3490240.0, ans=0.0
2023-11-26 17:16:47,233 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3490240.0, ans=0.125
2023-11-26 17:16:51,848 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=3490240.0, ans=0.0
2023-11-26 17:16:57,456 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 523550
2023-11-26 17:17:28,338 INFO [train_asr.py:1235] (3/4) Epoch 44, batch 6550, loss[loss=0.0474, simple_loss=0.05392, pruned_loss=0.009537, audio_tagging_loss=0.0109, over 13759.00 frames. ], tot_loss[loss=0.06502, simple_loss=0.08806, pruned_loss=0.01195, audio_tagging_loss=0.009043, over 3040635.64 frames. ], batch size: 54, lr: 1.53e-03, grad_scale: 16.0
2023-11-26 17:17:53,225 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 523600
2023-11-26 17:18:24,779 INFO [train_asr.py:1235] (3/4) Epoch 44, batch 6600, loss[loss=0.05883, simple_loss=0.07968, pruned_loss=0.01141, audio_tagging_loss=0.007577, over 16772.00 frames. ], tot_loss[loss=0.06581, simple_loss=0.08945, pruned_loss=0.01216, audio_tagging_loss=0.008927, over 3042693.22 frames. ], batch size: 62, lr: 1.53e-03, grad_scale: 16.0
2023-11-26 17:18:25,574 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00
2023-11-26 17:18:30,615 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.701e+01 8.733e+01 9.478e+01 1.039e+02 1.396e+02, threshold=1.896e+02, percent-clipped=0.0
2023-11-26 17:18:31,103 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=4.48 vs. limit=15.0
2023-11-26 17:18:46,314 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.0.self_attn1.whiten, num_groups=1, num_channels=192, metric=14.14 vs. limit=22.5
2023-11-26 17:18:46,834 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3490973.3333333335, ans=0.125
2023-11-26 17:18:48,737 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 523650
2023-11-26 17:18:48,811 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3490973.3333333335, ans=0.125
2023-11-26 17:18:54,622 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=3490973.3333333335, ans=0.1
2023-11-26 17:19:20,544 INFO [train_asr.py:1235] (3/4) Epoch 44, batch 6650, loss[loss=0.09348, simple_loss=0.1258, pruned_loss=0.02572, audio_tagging_loss=0.004851, over 15974.00 frames. ], tot_loss[loss=0.06594, simple_loss=0.09007, pruned_loss=0.01217, audio_tagging_loss=0.008732, over 3045675.26 frames. ], batch size: 58, lr: 1.53e-03, grad_scale: 16.0
2023-11-26 17:19:23,850 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00
2023-11-26 17:19:37,334 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3491240.0, ans=0.1
2023-11-26 17:19:45,631 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 523700
2023-11-26 17:19:49,344 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=7.29 vs. limit=10.0
2023-11-26 17:19:54,866 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3491373.3333333335, ans=0.125
2023-11-26 17:20:04,407 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3491440.0, ans=0.1
2023-11-26 17:20:06,535 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.max_abs, batch_count=3491440.0, ans=10.0
2023-11-26 17:20:15,090 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3491506.6666666665, ans=0.1
2023-11-26 17:20:15,836 INFO [train_asr.py:1235] (3/4) Epoch 44, batch 6700, loss[loss=0.05984, simple_loss=0.08046, pruned_loss=0.01048, audio_tagging_loss=0.00913, over 15829.00 frames. ], tot_loss[loss=0.06624, simple_loss=0.09048, pruned_loss=0.01231, audio_tagging_loss=0.008694, over 3041250.85 frames. ], batch size: 57, lr: 1.53e-03, grad_scale: 16.0
2023-11-26 17:20:21,688 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.319e+01 8.660e+01 9.419e+01 1.025e+02 1.437e+02, threshold=1.884e+02, percent-clipped=0.0
2023-11-26 17:20:22,382 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=9.74 vs. limit=15.0
2023-11-26 17:20:37,634 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=7.41 vs. limit=12.0
2023-11-26 17:20:41,455 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 523750
2023-11-26 17:20:41,897 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=15.44 vs. limit=15.0
2023-11-26 17:20:44,682 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3491640.0, ans=0.1
2023-11-26 17:20:59,602 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=3491773.3333333335, ans=0.125
2023-11-26 17:21:00,097 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.1.self_attn1.whiten, num_groups=1, num_channels=192, metric=12.09 vs. limit=22.5
2023-11-26 17:21:00,898 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=8.97 vs. limit=10.0
2023-11-26 17:21:02,755 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3491773.3333333335, ans=0.0
2023-11-26 17:21:12,563 INFO [train_asr.py:1235] (3/4) Epoch 44, batch 6750, loss[loss=0.08624, simple_loss=0.124, pruned_loss=0.01465, audio_tagging_loss=0.009605, over 15070.00 frames. ], tot_loss[loss=0.06573, simple_loss=0.08974, pruned_loss=0.01218, audio_tagging_loss=0.008675, over 3035357.97 frames. ], batch size: 56, lr: 1.53e-03, grad_scale: 16.0
2023-11-26 17:21:35,804 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=3491973.3333333335, ans=0.2
2023-11-26 17:21:36,698 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 523800
2023-11-26 17:21:56,626 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3492106.6666666665, ans=0.1
2023-11-26 17:22:06,132 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=512, metric=19.02 vs. limit=22.5
2023-11-26 17:22:08,688 INFO [train_asr.py:1235] (3/4) Epoch 44, batch 6800, loss[loss=0.08449, simple_loss=0.1087, pruned_loss=0.02361, audio_tagging_loss=0.006512, over 15430.00 frames. ], tot_loss[loss=0.06599, simple_loss=0.08977, pruned_loss=0.01249, audio_tagging_loss=0.008613, over 3040750.70 frames. ], batch size: 56, lr: 1.53e-03, grad_scale: 32.0
2023-11-26 17:22:13,936 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.758e+01 8.963e+01 9.408e+01 1.006e+02 1.345e+02, threshold=1.882e+02, percent-clipped=0.0
2023-11-26 17:22:19,407 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=3492240.0, ans=0.0
2023-11-26 17:22:33,156 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 523850
2023-11-26 17:22:46,031 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2023-11-26 17:22:46,359 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.49 vs. limit=6.0
2023-11-26 17:22:49,161 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3492373.3333333335, ans=0.125
2023-11-26 17:22:58,378 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=3492440.0, ans=0.125
2023-11-26 17:23:00,511 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=3492440.0, ans=0.125
2023-11-26 17:23:02,726 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=3492506.6666666665, ans=0.0
2023-11-26 17:23:03,564 INFO [train_asr.py:1235] (3/4) Epoch 44, batch 6850, loss[loss=0.07998, simple_loss=0.1171, pruned_loss=0.01395, audio_tagging_loss=0.007477, over 16691.00 frames. ], tot_loss[loss=0.06635, simple_loss=0.09052, pruned_loss=0.01251, audio_tagging_loss=0.008578, over 3043414.19 frames. ], batch size: 62, lr: 1.53e-03, grad_scale: 16.0
2023-11-26 17:23:08,186 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module2.whiten, num_groups=1, num_channels=512, metric=8.43 vs. limit=15.0
2023-11-26 17:23:24,906 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3492573.3333333335, ans=0.1
2023-11-26 17:23:29,034 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 523900
2023-11-26 17:23:33,570 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=3492640.0, ans=0.125
2023-11-26 17:23:36,703 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=3492706.6666666665, ans=0.2
2023-11-26 17:23:49,291 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=3492773.3333333335, ans=0.035
2023-11-26 17:23:51,486 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3492773.3333333335, ans=0.0
2023-11-26 17:23:59,753 INFO [train_asr.py:1235] (3/4) Epoch 44, batch 6900, loss[loss=0.0824, simple_loss=0.1151, pruned_loss=0.01772, audio_tagging_loss=0.007153, over 16749.00 frames. ], tot_loss[loss=0.06639, simple_loss=0.09073, pruned_loss=0.01247, audio_tagging_loss=0.008557, over 3045693.06 frames. ], batch size: 61, lr: 1.53e-03, grad_scale: 16.0
2023-11-26 17:24:06,503 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=3492840.0, ans=0.0
2023-11-26 17:24:07,190 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 6.973e+01 8.659e+01 9.332e+01 1.017e+02 1.232e+02, threshold=1.866e+02, percent-clipped=0.0
2023-11-26 17:24:24,253 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 523950
2023-11-26 17:24:43,467 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3493106.6666666665, ans=0.125
2023-11-26 17:24:44,351 WARNING [train_asr.py:1481] (3/4) Exclude cut with ID unbalanced/Xez1ffAcb0w_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24
2023-11-26 17:24:56,080 INFO [train_asr.py:1235] (3/4) Epoch 44, batch 6950, loss[loss=0.06586, simple_loss=0.07956, pruned_loss=0.01775, audio_tagging_loss=0.008331, over 15104.00 frames. ], tot_loss[loss=0.06589, simple_loss=0.0901, pruned_loss=0.01228, audio_tagging_loss=0.008566, over 3043272.88 frames. ], batch size: 57, lr: 1.53e-03, grad_scale: 16.0
2023-11-26 17:24:57,498 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3493173.3333333335, ans=0.1
2023-11-26 17:24:58,474 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=3493173.3333333335, ans=0.125
2023-11-26 17:25:10,247 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=13.50 vs. limit=15.0
2023-11-26 17:25:19,476 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 524000
2023-11-26 17:25:26,132 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3493306.6666666665, ans=0.1
2023-11-26 17:25:42,799 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=11.84 vs. limit=15.0
2023-11-26 17:25:47,685 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=3493440.0, ans=0.125
2023-11-26 17:25:48,953 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=4.43 vs. limit=6.0
2023-11-26 17:25:50,821 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=3493440.0, ans=0.0
2023-11-26 17:25:50,852 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.min_abs, batch_count=3493440.0, ans=0.5
2023-11-26 17:25:53,710 INFO [train_asr.py:1235] (3/4) Epoch 44, batch 7000, loss[loss=0.0624, simple_loss=0.08462, pruned_loss=0.01163, audio_tagging_loss=0.008463, over 15545.00 frames. ], tot_loss[loss=0.06595, simple_loss=0.0901, pruned_loss=0.01227, audio_tagging_loss=0.008631, over 3042793.79 frames. ], batch size: 61, lr: 1.53e-03, grad_scale: 16.0
2023-11-26 17:25:53,929 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3493506.6666666665, ans=0.1
2023-11-26 17:26:00,102 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.638e+01 8.888e+01 9.554e+01 1.025e+02 1.624e+02, threshold=1.911e+02, percent-clipped=0.0
2023-11-26 17:26:04,615 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=3493573.3333333335, ans=0.07
2023-11-26 17:26:14,020 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=14.03 vs. limit=15.0
2023-11-26 17:26:19,209 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 524050
2023-11-26 17:26:37,292 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=3493773.3333333335, ans=0.0
2023-11-26 17:26:49,084 INFO [train_asr.py:1235] (3/4) Epoch 44, batch 7050, loss[loss=0.07065, simple_loss=0.09472, pruned_loss=0.01532, audio_tagging_loss=0.007973, over 15102.00 frames. ], tot_loss[loss=0.06616, simple_loss=0.08991, pruned_loss=0.01249, audio_tagging_loss=0.008713, over 3035692.77 frames. ], batch size: 56, lr: 1.53e-03, grad_scale: 16.0
2023-11-26 17:26:50,888 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3493840.0, ans=0.125
2023-11-26 17:27:01,912 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=7.34 vs. limit=12.0
2023-11-26 17:27:14,139 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 524100
2023-11-26 17:27:22,931 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3494040.0, ans=0.1
2023-11-26 17:27:25,001 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=3494040.0, ans=0.0
2023-11-26 17:27:26,103 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=3494040.0, ans=0.0
2023-11-26 17:27:31,241 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.00 vs. limit=15.0
2023-11-26 17:27:35,594 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3494106.6666666665, ans=0.1
2023-11-26 17:27:46,084 INFO [train_asr.py:1235] (3/4) Epoch 44, batch 7100, loss[loss=0.0522, simple_loss=0.0698, pruned_loss=0.004752, audio_tagging_loss=0.01254, over 15060.00 frames. ], tot_loss[loss=0.0664, simple_loss=0.09022, pruned_loss=0.01255, audio_tagging_loss=0.008737, over 3046705.90 frames. ], batch size: 55, lr: 1.53e-03, grad_scale: 16.0
2023-11-26 17:27:52,380 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.497e+01 8.801e+01 9.665e+01 1.022e+02 1.314e+02, threshold=1.933e+02, percent-clipped=0.0
2023-11-26 17:27:57,974 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3494240.0, ans=0.125
2023-11-26 17:28:05,503 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=7.94 vs. limit=15.0
2023-11-26 17:28:09,370 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 524150
2023-11-26 17:28:18,024 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2023-11-26 17:28:25,906 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=3494373.3333333335, ans=0.125
2023-11-26 17:28:40,570 INFO [train_asr.py:1235] (3/4) Epoch 44, batch 7150, loss[loss=0.08633, simple_loss=0.1207, pruned_loss=0.01775, audio_tagging_loss=0.008205, over 15497.00 frames. ], tot_loss[loss=0.06609, simple_loss=0.08988, pruned_loss=0.01234, audio_tagging_loss=0.008802, over 3046005.46 frames. ], batch size: 57, lr: 1.53e-03, grad_scale: 16.0
2023-11-26 17:28:40,819 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=3494506.6666666665, ans=0.1
2023-11-26 17:28:41,272 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=11.60 vs. limit=15.0
2023-11-26 17:28:53,443 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=3494573.3333333335, ans=0.04949747468305833
2023-11-26 17:29:05,539 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 524200
2023-11-26 17:29:31,001 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.min_positive, batch_count=3494773.3333333335, ans=0.025
2023-11-26 17:29:33,105 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.min_positive, batch_count=3494773.3333333335, ans=0.05
2023-11-26 17:29:35,229 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3494840.0, ans=0.125
2023-11-26 17:29:36,127 INFO [train_asr.py:1235] (3/4) Epoch 44, batch 7200, loss[loss=0.06271, simple_loss=0.07998, pruned_loss=0.01079, audio_tagging_loss=0.01193, over 14705.00 frames. ], tot_loss[loss=0.06597, simple_loss=0.08959, pruned_loss=0.01223, audio_tagging_loss=0.008941, over 3035736.99 frames. ], batch size: 57, lr: 1.53e-03, grad_scale: 32.0
2023-11-26 17:29:36,905 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=10.59 vs. limit=15.0
2023-11-26 17:29:43,587 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.106e+01 8.953e+01 9.579e+01 1.038e+02 1.531e+02, threshold=1.916e+02, percent-clipped=0.0
2023-11-26 17:29:51,020 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=9.70 vs. limit=15.0
2023-11-26 17:29:56,579 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3494906.6666666665, ans=0.125
2023-11-26 17:30:01,763 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 524250
2023-11-26 17:30:04,072 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=3494973.3333333335, ans=0.0
2023-11-26 17:30:08,232 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=3494973.3333333335, ans=0.125
2023-11-26 17:30:32,731 INFO [train_asr.py:1235] (3/4) Epoch 44, batch 7250, loss[loss=0.05519, simple_loss=0.07086, pruned_loss=0.008484, audio_tagging_loss=0.01128, over 13217.00 frames. ], tot_loss[loss=0.06559, simple_loss=0.08914, pruned_loss=0.012, audio_tagging_loss=0.009016, over 3031264.42 frames. ], batch size: 54, lr: 1.53e-03, grad_scale: 16.0
2023-11-26 17:30:56,610 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 524300
2023-11-26 17:30:57,695 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=3495306.6666666665, ans=0.125
2023-11-26 17:31:06,171 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3495373.3333333335, ans=0.125
2023-11-26 17:31:07,421 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3495373.3333333335, ans=0.125
2023-11-26 17:31:28,172 INFO [train_asr.py:1235] (3/4) Epoch 44, batch 7300, loss[loss=0.08126, simple_loss=0.1112, pruned_loss=0.0191, audio_tagging_loss=0.006573, over 15550.00 frames. ], tot_loss[loss=0.06523, simple_loss=0.08877, pruned_loss=0.01196, audio_tagging_loss=0.008882, over 3028448.27 frames. ], batch size: 56, lr: 1.53e-03, grad_scale: 16.0
2023-11-26 17:31:31,665 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.max_positive, batch_count=3495506.6666666665, ans=0.95
2023-11-26 17:31:35,612 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.194e+01 8.863e+01 9.398e+01 1.006e+02 1.192e+02, threshold=1.880e+02, percent-clipped=0.0
2023-11-26 17:31:52,479 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 524350
2023-11-26 17:32:13,013 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00
2023-11-26 17:32:23,232 INFO [train_asr.py:1235] (3/4) Epoch 44, batch 7350, loss[loss=0.06341, simple_loss=0.08883, pruned_loss=0.008894, audio_tagging_loss=0.0101, over 16256.00 frames. ], tot_loss[loss=0.06553, simple_loss=0.08958, pruned_loss=0.01206, audio_tagging_loss=0.008681, over 3038528.41 frames. ], batch size: 61, lr: 1.53e-03, grad_scale: 16.0
2023-11-26 17:32:33,966 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=3495906.6666666665, ans=0.125
2023-11-26 17:32:46,156 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3495973.3333333335, ans=0.125
2023-11-26 17:32:46,163 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3495973.3333333335, ans=0.125
2023-11-26 17:32:48,579 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 524400
2023-11-26 17:33:20,278 INFO [train_asr.py:1235] (3/4) Epoch 44, batch 7400, loss[loss=0.06821, simple_loss=0.08538, pruned_loss=0.01567, audio_tagging_loss=0.009853, over 14644.00 frames. ], tot_loss[loss=0.06519, simple_loss=0.08919, pruned_loss=0.01202, audio_tagging_loss=0.008568, over 3038545.92 frames. ], batch size: 55, lr: 1.53e-03, grad_scale: 16.0
2023-11-26 17:33:27,564 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.789e+01 8.991e+01 9.600e+01 1.026e+02 1.969e+02, threshold=1.920e+02, percent-clipped=0.0
2023-11-26 17:33:40,069 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.min_positive, batch_count=3496240.0, ans=0.025
limit=15.0 2023-11-26 17:33:44,221 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 524450 2023-11-26 17:33:45,364 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=3496306.6666666665, ans=0.125 2023-11-26 17:33:47,066 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.0.self_attn2.whiten, num_groups=1, num_channels=192, metric=10.83 vs. limit=22.5 2023-11-26 17:33:50,924 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=11.30 vs. limit=15.0 2023-11-26 17:33:51,725 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-26 17:34:05,132 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=16.54 vs. limit=22.5 2023-11-26 17:34:05,903 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.min_positive, batch_count=3496440.0, ans=0.05 2023-11-26 17:34:08,066 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=3496440.0, ans=0.2 2023-11-26 17:34:10,566 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=6.81 vs. limit=15.0 2023-11-26 17:34:15,323 INFO [train_asr.py:1235] (3/4) Epoch 44, batch 7450, loss[loss=0.06186, simple_loss=0.08709, pruned_loss=0.01078, audio_tagging_loss=0.007536, over 14673.00 frames. ], tot_loss[loss=0.06516, simple_loss=0.08897, pruned_loss=0.01212, audio_tagging_loss=0.008553, over 3038766.91 frames. ], batch size: 56, lr: 1.53e-03, grad_scale: 16.0 2023-11-26 17:34:40,336 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 524500 2023-11-26 17:35:07,036 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3496773.3333333335, ans=0.125 2023-11-26 17:35:11,143 INFO [train_asr.py:1235] (3/4) Epoch 44, batch 7500, loss[loss=0.05453, simple_loss=0.07837, pruned_loss=0.006256, audio_tagging_loss=0.009091, over 16072.00 frames. ], tot_loss[loss=0.06496, simple_loss=0.08892, pruned_loss=0.01202, audio_tagging_loss=0.00848, over 3038760.47 frames. ], batch size: 59, lr: 1.53e-03, grad_scale: 16.0 2023-11-26 17:35:13,489 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=3496840.0, ans=0.0 2023-11-26 17:35:15,913 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=11.52 vs. limit=15.0 2023-11-26 17:35:19,587 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.260e+01 8.894e+01 9.302e+01 1.027e+02 1.348e+02, threshold=1.860e+02, percent-clipped=1.0 2023-11-26 17:35:21,436 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.1.conv_module1.whiten, num_groups=1, num_channels=192, metric=13.19 vs. 
limit=15.0 2023-11-26 17:35:36,263 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 524550 2023-11-26 17:35:39,487 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=3496973.3333333335, ans=0.125 2023-11-26 17:35:40,716 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=3496973.3333333335, ans=0.2 2023-11-26 17:36:03,369 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3497106.6666666665, ans=0.0 2023-11-26 17:36:05,372 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=3497106.6666666665, ans=0.125 2023-11-26 17:36:07,362 INFO [train_asr.py:1235] (3/4) Epoch 44, batch 7550, loss[loss=0.05945, simple_loss=0.07929, pruned_loss=0.008284, audio_tagging_loss=0.01152, over 14790.00 frames. ], tot_loss[loss=0.06501, simple_loss=0.08858, pruned_loss=0.01215, audio_tagging_loss=0.008565, over 3031249.03 frames. ], batch size: 55, lr: 1.53e-03, grad_scale: 16.0 2023-11-26 17:36:13,595 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=3497173.3333333335, ans=0.2 2023-11-26 17:36:32,055 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 524600 2023-11-26 17:36:32,731 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=8.33 vs. limit=15.0 2023-11-26 17:36:37,847 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=3497306.6666666665, ans=0.0 2023-11-26 17:36:40,528 INFO [scaling.py:1022] (3/4) Whitening: name=encoder_embed.convnext.out_whiten, num_groups=1, num_channels=128, metric=4.24 vs. limit=5.0 2023-11-26 17:37:03,391 INFO [train_asr.py:1235] (3/4) Epoch 44, batch 7600, loss[loss=0.07196, simple_loss=0.0956, pruned_loss=0.014, audio_tagging_loss=0.01015, over 15836.00 frames. ], tot_loss[loss=0.06568, simple_loss=0.08963, pruned_loss=0.01237, audio_tagging_loss=0.008492, over 3036172.06 frames. 
], batch size: 58, lr: 1.53e-03, grad_scale: 32.0 2023-11-26 17:37:03,643 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3497506.6666666665, ans=0.125 2023-11-26 17:37:07,842 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=3497506.6666666665, ans=0.125 2023-11-26 17:37:10,767 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.143e+01 8.779e+01 9.690e+01 1.052e+02 1.195e+02, threshold=1.938e+02, percent-clipped=0.0 2023-11-26 17:37:17,085 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=3497573.3333333335, ans=0.0 2023-11-26 17:37:27,967 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 524650 2023-11-26 17:37:53,333 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.min_abs, batch_count=3497773.3333333335, ans=0.5 2023-11-26 17:37:54,385 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=3497773.3333333335, ans=0.125 2023-11-26 17:37:58,860 INFO [train_asr.py:1235] (3/4) Epoch 44, batch 7650, loss[loss=0.09197, simple_loss=0.1268, pruned_loss=0.02137, audio_tagging_loss=0.007191, over 14604.00 frames. ], tot_loss[loss=0.06492, simple_loss=0.08864, pruned_loss=0.01214, audio_tagging_loss=0.008466, over 3032243.61 frames. ], batch size: 56, lr: 1.53e-03, grad_scale: 32.0 2023-11-26 17:38:17,165 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3497906.6666666665, ans=0.1 2023-11-26 17:38:20,494 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.min_abs, batch_count=3497973.3333333335, ans=0.5 2023-11-26 17:38:23,442 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 524700 2023-11-26 17:38:29,975 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3497973.3333333335, ans=0.125 2023-11-26 17:38:40,888 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=3498040.0, ans=0.0 2023-11-26 17:38:54,585 INFO [train_asr.py:1235] (3/4) Epoch 44, batch 7700, loss[loss=0.07616, simple_loss=0.111, pruned_loss=0.01124, audio_tagging_loss=0.009424, over 15978.00 frames. ], tot_loss[loss=0.06494, simple_loss=0.08848, pruned_loss=0.01215, audio_tagging_loss=0.008551, over 3027886.37 frames. ], batch size: 57, lr: 1.53e-03, grad_scale: 32.0 2023-11-26 17:38:56,960 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=3498173.3333333335, ans=0.0 2023-11-26 17:39:02,510 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.452e+01 9.021e+01 9.589e+01 1.019e+02 1.364e+02, threshold=1.918e+02, percent-clipped=0.0 2023-11-26 17:39:06,017 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3498240.0, ans=0.1 2023-11-26 17:39:06,329 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=19.11 vs. 
limit=22.5 2023-11-26 17:39:09,513 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.48 vs. limit=6.0 2023-11-26 17:39:16,500 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=3498306.6666666665, ans=0.0 2023-11-26 17:39:18,500 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 524750 2023-11-26 17:39:19,679 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=3498306.6666666665, ans=0.0 2023-11-26 17:39:32,132 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=3498373.3333333335, ans=0.125 2023-11-26 17:39:40,646 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3498440.0, ans=0.1 2023-11-26 17:39:41,171 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=7.15 vs. limit=15.0 2023-11-26 17:39:50,768 INFO [train_asr.py:1235] (3/4) Epoch 44, batch 7750, loss[loss=0.07666, simple_loss=0.1074, pruned_loss=0.01164, audio_tagging_loss=0.0113, over 14769.00 frames. ], tot_loss[loss=0.06481, simple_loss=0.08815, pruned_loss=0.01212, audio_tagging_loss=0.008607, over 3031822.82 frames. ], batch size: 55, lr: 1.53e-03, grad_scale: 32.0 2023-11-26 17:39:56,260 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3498506.6666666665, ans=0.1 2023-11-26 17:40:03,699 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3498573.3333333335, ans=0.125 2023-11-26 17:40:04,714 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3498573.3333333335, ans=0.1 2023-11-26 17:40:14,460 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 524800 2023-11-26 17:40:20,932 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=3498640.0, ans=0.0 2023-11-26 17:40:35,209 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3498773.3333333335, ans=0.125 2023-11-26 17:40:45,533 INFO [train_asr.py:1235] (3/4) Epoch 44, batch 7800, loss[loss=0.06586, simple_loss=0.0926, pruned_loss=0.01031, audio_tagging_loss=0.009248, over 15058.00 frames. ], tot_loss[loss=0.06553, simple_loss=0.08897, pruned_loss=0.0124, audio_tagging_loss=0.008645, over 3034702.07 frames. 
], batch size: 58, lr: 1.53e-03, grad_scale: 16.0 2023-11-26 17:40:54,548 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.572e+01 8.862e+01 9.365e+01 1.011e+02 1.202e+02, threshold=1.873e+02, percent-clipped=0.0 2023-11-26 17:40:54,797 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=3498840.0, ans=0.0 2023-11-26 17:41:07,094 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-26 17:41:11,077 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 524850 2023-11-26 17:41:13,584 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=7.09 vs. limit=12.0 2023-11-26 17:41:41,886 INFO [train_asr.py:1235] (3/4) Epoch 44, batch 7850, loss[loss=0.07005, simple_loss=0.09801, pruned_loss=0.01204, audio_tagging_loss=0.009006, over 16488.00 frames. ], tot_loss[loss=0.06527, simple_loss=0.08854, pruned_loss=0.01227, audio_tagging_loss=0.008736, over 3042557.14 frames. ], batch size: 62, lr: 1.53e-03, grad_scale: 16.0 2023-11-26 17:42:06,282 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 524900 2023-11-26 17:42:38,164 INFO [train_asr.py:1235] (3/4) Epoch 44, batch 7900, loss[loss=0.06893, simple_loss=0.09382, pruned_loss=0.01466, audio_tagging_loss=0.007367, over 14609.00 frames. ], tot_loss[loss=0.06552, simple_loss=0.08865, pruned_loss=0.0124, audio_tagging_loss=0.008797, over 3043566.24 frames. ], batch size: 56, lr: 1.53e-03, grad_scale: 16.0 2023-11-26 17:42:38,388 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=3499506.6666666665, ans=0.0 2023-11-26 17:42:42,524 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3499506.6666666665, ans=0.125 2023-11-26 17:42:42,592 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=3499506.6666666665, ans=0.09899494936611666 2023-11-26 17:42:45,599 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=3499506.6666666665, ans=0.0 2023-11-26 17:42:45,675 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=3499506.6666666665, ans=0.2 2023-11-26 17:42:45,739 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=3499506.6666666665, ans=0.0 2023-11-26 17:42:46,458 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.148e+01 9.062e+01 9.709e+01 1.039e+02 1.444e+02, threshold=1.942e+02, percent-clipped=0.0 2023-11-26 17:42:47,773 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3499573.3333333335, ans=0.1 2023-11-26 17:42:49,843 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=3499573.3333333335, ans=0.125 2023-11-26 17:43:01,416 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 524950 2023-11-26 17:43:10,916 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=3499706.6666666665, ans=0.125 2023-11-26 17:43:17,612 INFO [scaling.py:1118] (3/4) WithLoss: 
name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-26 17:43:21,845 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=3499773.3333333335, ans=0.2 2023-11-26 17:43:32,487 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=3499840.0, ans=0.125 2023-11-26 17:43:33,419 INFO [train_asr.py:1235] (3/4) Epoch 44, batch 7950, loss[loss=0.06117, simple_loss=0.08261, pruned_loss=0.008779, audio_tagging_loss=0.01108, over 15024.00 frames. ], tot_loss[loss=0.06544, simple_loss=0.08844, pruned_loss=0.01234, audio_tagging_loss=0.008882, over 3043074.71 frames. ], batch size: 55, lr: 1.53e-03, grad_scale: 16.0 2023-11-26 17:43:34,814 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3499840.0, ans=0.125 2023-11-26 17:43:36,824 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3499840.0, ans=0.125 2023-11-26 17:43:39,168 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=11.43 vs. limit=15.0 2023-11-26 17:43:50,591 WARNING [train_asr.py:1481] (3/4) Exclude cut with ID unbalanced/uQjH4tNUZ_g_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-26 17:43:59,128 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 525000 2023-11-26 17:43:59,332 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3499973.3333333335, ans=0.1 2023-11-26 17:44:06,021 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=3499973.3333333335, ans=0.125 2023-11-26 17:44:18,799 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=9.72 vs. limit=15.0 2023-11-26 17:44:20,699 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=3500106.6666666665, ans=0.07 2023-11-26 17:44:28,917 INFO [train_asr.py:1235] (3/4) Epoch 44, batch 8000, loss[loss=0.05211, simple_loss=0.07254, pruned_loss=0.007831, audio_tagging_loss=0.008006, over 14892.00 frames. ], tot_loss[loss=0.06539, simple_loss=0.08836, pruned_loss=0.01229, audio_tagging_loss=0.008918, over 3036818.84 frames. ], batch size: 56, lr: 1.53e-03, grad_scale: 32.0 2023-11-26 17:44:29,396 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=9.79 vs. 
limit=15.0 2023-11-26 17:44:38,612 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.125e+01 8.982e+01 9.655e+01 1.047e+02 1.497e+02, threshold=1.931e+02, percent-clipped=0.0 2023-11-26 17:44:41,383 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=3500240.0, ans=0.2 2023-11-26 17:44:54,222 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 525050 2023-11-26 17:45:08,233 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=3500373.3333333335, ans=0.125 2023-11-26 17:45:25,876 INFO [train_asr.py:1235] (3/4) Epoch 44, batch 8050, loss[loss=0.0846, simple_loss=0.1175, pruned_loss=0.01999, audio_tagging_loss=0.00585, over 14659.00 frames. ], tot_loss[loss=0.06533, simple_loss=0.08797, pruned_loss=0.01228, audio_tagging_loss=0.009057, over 3039482.09 frames. ], batch size: 55, lr: 1.53e-03, grad_scale: 32.0 2023-11-26 17:45:28,306 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=3500506.6666666665, ans=0.2 2023-11-26 17:45:36,612 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3500573.3333333335, ans=0.1 2023-11-26 17:45:49,202 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 525100 2023-11-26 17:46:21,176 INFO [train_asr.py:1235] (3/4) Epoch 44, batch 8100, loss[loss=0.07007, simple_loss=0.1017, pruned_loss=0.01349, audio_tagging_loss=0.005714, over 15047.00 frames. ], tot_loss[loss=0.06563, simple_loss=0.08874, pruned_loss=0.0123, audio_tagging_loss=0.00896, over 3039276.47 frames. ], batch size: 56, lr: 1.53e-03, grad_scale: 32.0 2023-11-26 17:46:22,380 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=3500840.0, ans=0.2 2023-11-26 17:46:29,608 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.030e+01 8.964e+01 9.424e+01 1.017e+02 1.199e+02, threshold=1.885e+02, percent-clipped=0.0 2023-11-26 17:46:32,061 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.min_positive, batch_count=3500906.6666666665, ans=0.05 2023-11-26 17:46:45,700 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 525150 2023-11-26 17:46:47,439 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=3500973.3333333335, ans=0.0 2023-11-26 17:46:57,216 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=3501040.0, ans=0.2 2023-11-26 17:46:58,143 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=3501040.0, ans=0.0 2023-11-26 17:47:16,041 INFO [train_asr.py:1235] (3/4) Epoch 44, batch 8150, loss[loss=0.06183, simple_loss=0.08436, pruned_loss=0.01313, audio_tagging_loss=0.006521, over 14165.00 frames. ], tot_loss[loss=0.0659, simple_loss=0.08957, pruned_loss=0.01237, audio_tagging_loss=0.008749, over 3042680.39 frames. 
], batch size: 53, lr: 1.53e-03, grad_scale: 32.0 2023-11-26 17:47:23,728 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3501173.3333333335, ans=0.1 2023-11-26 17:47:35,296 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3501240.0, ans=0.125 2023-11-26 17:47:37,081 INFO [scaling.py:1022] (3/4) Whitening: name=encoder_embed.convnext.out_whiten, num_groups=1, num_channels=128, metric=4.62 vs. limit=5.0 2023-11-26 17:47:38,935 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=6.25 vs. limit=10.0 2023-11-26 17:47:41,623 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 525200 2023-11-26 17:47:48,182 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=3501306.6666666665, ans=0.0 2023-11-26 17:47:58,771 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3501373.3333333335, ans=0.125 2023-11-26 17:48:13,421 INFO [train_asr.py:1235] (3/4) Epoch 44, batch 8200, loss[loss=0.08055, simple_loss=0.1154, pruned_loss=0.01859, audio_tagging_loss=0.004269, over 15580.00 frames. ], tot_loss[loss=0.06629, simple_loss=0.09054, pruned_loss=0.01239, audio_tagging_loss=0.008632, over 3049114.87 frames. ], batch size: 55, lr: 1.53e-03, grad_scale: 16.0 2023-11-26 17:48:16,602 WARNING [train_asr.py:1481] (3/4) Exclude cut with ID unbalanced/8C7biyx9TQ4_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-26 17:48:21,014 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3501506.6666666665, ans=0.1 2023-11-26 17:48:22,946 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.532e+01 8.766e+01 9.406e+01 1.001e+02 1.239e+02, threshold=1.881e+02, percent-clipped=0.0 2023-11-26 17:48:36,849 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 525250 2023-11-26 17:49:08,676 INFO [train_asr.py:1235] (3/4) Epoch 44, batch 8250, loss[loss=0.06115, simple_loss=0.07425, pruned_loss=0.01056, audio_tagging_loss=0.01346, over 14942.00 frames. ], tot_loss[loss=0.06562, simple_loss=0.08968, pruned_loss=0.01215, audio_tagging_loss=0.008635, over 3049837.79 frames. 
], batch size: 58, lr: 1.53e-03, grad_scale: 16.0 2023-11-26 17:49:30,598 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.min_positive, batch_count=3501973.3333333335, ans=0.025 2023-11-26 17:49:33,196 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 525300 2023-11-26 17:49:34,422 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3501973.3333333335, ans=0.125 2023-11-26 17:49:48,282 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3502040.0, ans=0.125 2023-11-26 17:50:03,796 INFO [train_asr.py:1235] (3/4) Epoch 44, batch 8300, loss[loss=0.06725, simple_loss=0.1009, pruned_loss=0.008946, audio_tagging_loss=0.007831, over 16450.00 frames. ], tot_loss[loss=0.06535, simple_loss=0.08914, pruned_loss=0.01213, audio_tagging_loss=0.008652, over 3049430.03 frames. ], batch size: 59, lr: 1.53e-03, grad_scale: 16.0 2023-11-26 17:50:06,057 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=3502173.3333333335, ans=0.125 2023-11-26 17:50:14,244 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.129e+01 8.963e+01 9.594e+01 1.033e+02 1.265e+02, threshold=1.919e+02, percent-clipped=0.0 2023-11-26 17:50:28,595 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 525350 2023-11-26 17:50:34,033 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=3502306.6666666665, ans=0.0 2023-11-26 17:50:48,938 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3502440.0, ans=0.0 2023-11-26 17:50:58,729 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=3502506.6666666665, ans=0.0 2023-11-26 17:50:59,502 INFO [train_asr.py:1235] (3/4) Epoch 44, batch 8350, loss[loss=0.0534, simple_loss=0.07702, pruned_loss=0.006875, audio_tagging_loss=0.008014, over 15270.00 frames. ], tot_loss[loss=0.0653, simple_loss=0.08908, pruned_loss=0.01219, audio_tagging_loss=0.008568, over 3041312.65 frames. ], batch size: 57, lr: 1.53e-03, grad_scale: 16.0 2023-11-26 17:51:07,720 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=11.69 vs. limit=15.0 2023-11-26 17:51:23,993 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 525400 2023-11-26 17:51:47,760 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=3502773.3333333335, ans=0.09899494936611666 2023-11-26 17:51:55,985 INFO [train_asr.py:1235] (3/4) Epoch 44, batch 8400, loss[loss=0.05862, simple_loss=0.08343, pruned_loss=0.008073, audio_tagging_loss=0.008834, over 15449.00 frames. ], tot_loss[loss=0.06505, simple_loss=0.0887, pruned_loss=0.0121, audio_tagging_loss=0.008594, over 3041696.14 frames. 
], batch size: 58, lr: 1.53e-03, grad_scale: 32.0 2023-11-26 17:52:03,563 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=3502840.0, ans=0.125 2023-11-26 17:52:05,392 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 6.713e+01 8.821e+01 9.539e+01 1.045e+02 1.757e+02, threshold=1.908e+02, percent-clipped=0.0 2023-11-26 17:52:19,648 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 525450 2023-11-26 17:52:23,882 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=10.63 vs. limit=15.0 2023-11-26 17:52:29,337 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3503040.0, ans=0.125 2023-11-26 17:52:34,437 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=7.30 vs. limit=10.0 2023-11-26 17:52:50,787 INFO [train_asr.py:1235] (3/4) Epoch 44, batch 8450, loss[loss=0.07859, simple_loss=0.1067, pruned_loss=0.01767, audio_tagging_loss=0.007589, over 15247.00 frames. ], tot_loss[loss=0.06449, simple_loss=0.08808, pruned_loss=0.01189, audio_tagging_loss=0.00856, over 3049071.14 frames. ], batch size: 56, lr: 1.53e-03, grad_scale: 32.0 2023-11-26 17:53:11,524 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=3503240.0, ans=0.07 2023-11-26 17:53:16,138 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 525500 2023-11-26 17:53:19,350 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.max_positive, batch_count=3503306.6666666665, ans=0.95 2023-11-26 17:53:19,855 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=8.33 vs. limit=15.0 2023-11-26 17:53:45,938 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=3503506.6666666665, ans=0.0 2023-11-26 17:53:46,680 INFO [train_asr.py:1235] (3/4) Epoch 44, batch 8500, loss[loss=0.06804, simple_loss=0.08983, pruned_loss=0.01556, audio_tagging_loss=0.00756, over 15117.00 frames. ], tot_loss[loss=0.06529, simple_loss=0.0892, pruned_loss=0.01222, audio_tagging_loss=0.008474, over 3052312.11 frames. ], batch size: 55, lr: 1.53e-03, grad_scale: 16.0 2023-11-26 17:53:57,859 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=6.14 vs. limit=10.0 2023-11-26 17:53:58,147 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 6.806e+01 8.914e+01 9.480e+01 1.019e+02 1.218e+02, threshold=1.896e+02, percent-clipped=0.0 2023-11-26 17:54:10,886 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 525550 2023-11-26 17:54:16,387 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=3503640.0, ans=0.0 2023-11-26 17:54:18,925 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=8.95 vs. limit=15.0 2023-11-26 17:54:19,018 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=6.80 vs. 
limit=15.0 2023-11-26 17:54:42,955 INFO [train_asr.py:1235] (3/4) Epoch 44, batch 8550, loss[loss=0.05516, simple_loss=0.0672, pruned_loss=0.008125, audio_tagging_loss=0.01344, over 15953.00 frames. ], tot_loss[loss=0.06569, simple_loss=0.09, pruned_loss=0.01221, audio_tagging_loss=0.008482, over 3049740.26 frames. ], batch size: 62, lr: 1.53e-03, grad_scale: 16.0 2023-11-26 17:54:44,317 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=3503840.0, ans=0.07 2023-11-26 17:54:53,061 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.68 vs. limit=6.0 2023-11-26 17:55:02,047 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3503906.6666666665, ans=0.125 2023-11-26 17:55:06,692 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 525600 2023-11-26 17:55:37,976 INFO [train_asr.py:1235] (3/4) Epoch 44, batch 8600, loss[loss=0.04563, simple_loss=0.05829, pruned_loss=0.006489, audio_tagging_loss=0.009999, over 14297.00 frames. ], tot_loss[loss=0.06486, simple_loss=0.08853, pruned_loss=0.01202, audio_tagging_loss=0.008579, over 3049882.82 frames. ], batch size: 56, lr: 1.53e-03, grad_scale: 16.0 2023-11-26 17:55:38,180 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3504173.3333333335, ans=0.125 2023-11-26 17:55:38,572 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=5.54 vs. limit=12.0 2023-11-26 17:55:45,674 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=3504173.3333333335, ans=0.125 2023-11-26 17:55:49,034 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.001e+01 8.808e+01 9.575e+01 1.022e+02 1.354e+02, threshold=1.915e+02, percent-clipped=0.0 2023-11-26 17:55:52,944 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=13.20 vs. limit=15.0 2023-11-26 17:56:00,279 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten1.whitening_limit, batch_count=3504306.6666666665, ans=10.0 2023-11-26 17:56:01,018 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3504306.6666666665, ans=0.125 2023-11-26 17:56:02,976 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 525650 2023-11-26 17:56:07,392 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=3504306.6666666665, ans=0.125 2023-11-26 17:56:07,593 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=11.16 vs. limit=15.0 2023-11-26 17:56:11,255 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=3504373.3333333335, ans=0.0 2023-11-26 17:56:33,780 INFO [train_asr.py:1235] (3/4) Epoch 44, batch 8650, loss[loss=0.06902, simple_loss=0.09206, pruned_loss=0.01338, audio_tagging_loss=0.009616, over 15637.00 frames. 
], tot_loss[loss=0.06501, simple_loss=0.08883, pruned_loss=0.01199, audio_tagging_loss=0.008606, over 3055566.89 frames. ], batch size: 58, lr: 1.53e-03, grad_scale: 16.0 2023-11-26 17:56:39,782 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=12.78 vs. limit=15.0 2023-11-26 17:56:42,453 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=3504506.6666666665, ans=0.0 2023-11-26 17:56:58,844 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 525700 2023-11-26 17:57:30,013 INFO [train_asr.py:1235] (3/4) Epoch 44, batch 8700, loss[loss=0.0603, simple_loss=0.08492, pruned_loss=0.009651, audio_tagging_loss=0.008196, over 15250.00 frames. ], tot_loss[loss=0.06521, simple_loss=0.08893, pruned_loss=0.01198, audio_tagging_loss=0.008767, over 3049359.80 frames. ], batch size: 57, lr: 1.53e-03, grad_scale: 16.0 2023-11-26 17:57:31,665 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=6.11 vs. limit=15.0 2023-11-26 17:57:34,435 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=3504840.0, ans=0.125 2023-11-26 17:57:41,079 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.778e+01 8.922e+01 9.364e+01 9.934e+01 1.569e+02, threshold=1.873e+02, percent-clipped=0.0 2023-11-26 17:57:53,943 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 525750 2023-11-26 17:58:06,331 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-11-26 17:58:19,948 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3505106.6666666665, ans=0.125 2023-11-26 17:58:21,608 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=3505106.6666666665, ans=0.0 2023-11-26 17:58:25,623 INFO [train_asr.py:1235] (3/4) Epoch 44, batch 8750, loss[loss=0.06723, simple_loss=0.09098, pruned_loss=0.01333, audio_tagging_loss=0.008407, over 15347.00 frames. ], tot_loss[loss=0.06584, simple_loss=0.08977, pruned_loss=0.01223, audio_tagging_loss=0.008719, over 3046606.87 frames. ], batch size: 59, lr: 1.53e-03, grad_scale: 16.0 2023-11-26 17:58:35,118 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=3505240.0, ans=0.125 2023-11-26 17:58:43,787 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=8.83 vs. limit=15.0 2023-11-26 17:58:50,421 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 525800 2023-11-26 17:58:58,237 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=3505373.3333333335, ans=0.125 2023-11-26 17:59:06,711 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=7.74 vs. 
limit=15.0 2023-11-26 17:59:13,727 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=3505440.0, ans=0.2 2023-11-26 17:59:14,057 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=7.68 vs. limit=12.0 2023-11-26 17:59:17,208 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=17.71 vs. limit=22.5 2023-11-26 17:59:18,993 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.min_positive, batch_count=3505440.0, ans=0.05 2023-11-26 17:59:21,327 INFO [train_asr.py:1235] (3/4) Epoch 44, batch 8800, loss[loss=0.08033, simple_loss=0.1082, pruned_loss=0.01731, audio_tagging_loss=0.008941, over 15199.00 frames. ], tot_loss[loss=0.06668, simple_loss=0.09075, pruned_loss=0.01245, audio_tagging_loss=0.008858, over 3048227.92 frames. ], batch size: 55, lr: 1.53e-03, grad_scale: 32.0 2023-11-26 17:59:31,054 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=3505506.6666666665, ans=0.07 2023-11-26 17:59:32,886 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.888e+01 9.136e+01 9.670e+01 1.038e+02 1.622e+02, threshold=1.934e+02, percent-clipped=0.0 2023-11-26 17:59:37,339 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3505573.3333333335, ans=0.0 2023-11-26 17:59:45,715 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 525850 2023-11-26 17:59:57,437 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.00 vs. limit=15.0 2023-11-26 17:59:59,178 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.min_positive, batch_count=3505706.6666666665, ans=0.05 2023-11-26 18:00:17,409 INFO [train_asr.py:1235] (3/4) Epoch 44, batch 8850, loss[loss=0.05814, simple_loss=0.08544, pruned_loss=0.008149, audio_tagging_loss=0.007265, over 15104.00 frames. ], tot_loss[loss=0.06629, simple_loss=0.09046, pruned_loss=0.01215, audio_tagging_loss=0.008905, over 3045242.23 frames. ], batch size: 56, lr: 1.53e-03, grad_scale: 32.0 2023-11-26 18:00:21,251 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=10.85 vs. limit=15.0 2023-11-26 18:00:23,879 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3505840.0, ans=0.0 2023-11-26 18:00:30,096 WARNING [train_asr.py:1481] (3/4) Exclude cut with ID unbalanced/1Dq7QH61iXQ_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. 
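As in the earlier placeholder-cut WARNINGs in this section, the cut is dropped because its 100 input frames shrink to 23 frames after subsampling while its dummy transcript carries 24 BPE tokens: a transducer cannot emit more tokens than it has encoder frames, so the loss would be ill-defined. A minimal Python sketch of such a filter, assuming a convolutional front-end that maps T input frames to (T - 7) // subsampling_factor output frames (an assumption consistent with the logged 100 -> 23 at factor 4, not necessarily the exact icefall computation):

def should_exclude(num_frames: int, num_tokens: int, subsampling_factor: int = 4) -> bool:
    """True if the cut has fewer post-subsampling frames than output tokens."""
    # Assumed front-end shrinkage; reproduces the logged 100 -> 23 for factor 4.
    frames_after_subsampling = (num_frames - 7) // subsampling_factor
    # A transducer needs at least one encoder frame per emitted token.
    return frames_after_subsampling < num_tokens

# The excluded placeholder cuts: 100 frames -> 23 after subsampling, 24 tokens.
assert should_exclude(100, 24)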
Number of tokens: 24 2023-11-26 18:00:41,265 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 525900 2023-11-26 18:00:51,943 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3506040.0, ans=0.125 2023-11-26 18:01:08,635 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=7.96 vs. limit=10.0 2023-11-26 18:01:12,213 INFO [train_asr.py:1235] (3/4) Epoch 44, batch 8900, loss[loss=0.07552, simple_loss=0.1101, pruned_loss=0.01312, audio_tagging_loss=0.007352, over 15025.00 frames. ], tot_loss[loss=0.0663, simple_loss=0.09069, pruned_loss=0.01214, audio_tagging_loss=0.008809, over 3039590.05 frames. ], batch size: 53, lr: 1.53e-03, grad_scale: 32.0 2023-11-26 18:01:15,128 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=3506173.3333333335, ans=0.125 2023-11-26 18:01:22,398 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3506240.0, ans=0.1 2023-11-26 18:01:23,506 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-26 18:01:24,353 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.622e+01 8.724e+01 9.341e+01 9.971e+01 1.167e+02, threshold=1.868e+02, percent-clipped=0.0 2023-11-26 18:01:25,497 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=3506240.0, ans=0.1 2023-11-26 18:01:27,235 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-26 18:01:37,698 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 525950 2023-11-26 18:01:44,406 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=5.67 vs. limit=15.0 2023-11-26 18:02:07,762 INFO [train_asr.py:1235] (3/4) Epoch 44, batch 8950, loss[loss=0.04094, simple_loss=0.05531, pruned_loss=0.006131, audio_tagging_loss=0.00716, over 13593.00 frames. ], tot_loss[loss=0.06533, simple_loss=0.08953, pruned_loss=0.01191, audio_tagging_loss=0.008657, over 3038487.17 frames. ], batch size: 53, lr: 1.53e-03, grad_scale: 16.0 2023-11-26 18:02:21,168 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=3506573.3333333335, ans=0.0 2023-11-26 18:02:32,640 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 526000 2023-11-26 18:02:32,844 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=3506640.0, ans=0.0 2023-11-26 18:02:52,085 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3506773.3333333335, ans=0.1 2023-11-26 18:02:52,228 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=8.90 vs. 
limit=15.0 2023-11-26 18:03:02,679 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=3506773.3333333335, ans=0.0 2023-11-26 18:03:04,589 INFO [train_asr.py:1235] (3/4) Epoch 44, batch 9000, loss[loss=0.04851, simple_loss=0.05747, pruned_loss=0.008948, audio_tagging_loss=0.01083, over 14102.00 frames. ], tot_loss[loss=0.06563, simple_loss=0.08997, pruned_loss=0.01211, audio_tagging_loss=0.008531, over 3049038.69 frames. ], batch size: 53, lr: 1.53e-03, grad_scale: 8.0 2023-11-26 18:03:04,590 INFO [train_asr.py:1258] (3/4) Computing validation loss 2023-11-26 18:03:36,861 INFO [train_asr.py:1267] (3/4) Epoch 44, validation: loss=0.05857, simple_loss=0.05054, pruned_loss=0.005271, audio_tagging_loss=0.02803, over 4681554.00 frames. 2023-11-26 18:03:36,862 INFO [train_asr.py:1268] (3/4) Maximum memory allocated so far is 24894MB 2023-11-26 18:03:38,247 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3506840.0, ans=0.125 2023-11-26 18:03:48,103 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3506906.6666666665, ans=0.125 2023-11-26 18:03:50,483 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.728e+01 9.015e+01 9.647e+01 1.018e+02 1.400e+02, threshold=1.929e+02, percent-clipped=0.0 2023-11-26 18:04:02,232 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 526050 2023-11-26 18:04:02,369 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=3506973.3333333335, ans=0.125 2023-11-26 18:04:05,035 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=8.48 vs. limit=15.0 2023-11-26 18:04:08,679 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=3506973.3333333335, ans=0.125 2023-11-26 18:04:32,825 INFO [train_asr.py:1235] (3/4) Epoch 44, batch 9050, loss[loss=0.0614, simple_loss=0.08457, pruned_loss=0.01089, audio_tagging_loss=0.00823, over 14846.00 frames. ], tot_loss[loss=0.0654, simple_loss=0.08927, pruned_loss=0.01217, audio_tagging_loss=0.008594, over 3044237.76 frames. 
], batch size: 54, lr: 1.53e-03, grad_scale: 8.0 2023-11-26 18:04:43,682 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=3507240.0, ans=0.0 2023-11-26 18:04:57,321 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 526100 2023-11-26 18:04:59,565 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=3507306.6666666665, ans=0.125 2023-11-26 18:05:03,874 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=3507306.6666666665, ans=0.0 2023-11-26 18:05:04,957 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=3507373.3333333335, ans=0.0 2023-11-26 18:05:05,973 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3507373.3333333335, ans=0.1 2023-11-26 18:05:15,606 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3507373.3333333335, ans=0.125 2023-11-26 18:05:29,288 INFO [train_asr.py:1235] (3/4) Epoch 44, batch 9100, loss[loss=0.07814, simple_loss=0.1137, pruned_loss=0.01575, audio_tagging_loss=0.005558, over 14637.00 frames. ], tot_loss[loss=0.06577, simple_loss=0.08992, pruned_loss=0.01228, audio_tagging_loss=0.008533, over 3039362.44 frames. ], batch size: 54, lr: 1.53e-03, grad_scale: 8.0 2023-11-26 18:05:35,886 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3507506.6666666665, ans=0.1 2023-11-26 18:05:41,911 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.792e+01 8.875e+01 9.431e+01 1.015e+02 1.268e+02, threshold=1.886e+02, percent-clipped=0.0 2023-11-26 18:05:53,187 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 526150 2023-11-26 18:05:56,612 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=3507640.0, ans=0.0 2023-11-26 18:06:09,948 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=3507706.6666666665, ans=0.125 2023-11-26 18:06:24,510 INFO [train_asr.py:1235] (3/4) Epoch 44, batch 9150, loss[loss=0.05437, simple_loss=0.07078, pruned_loss=0.009583, audio_tagging_loss=0.009394, over 15175.00 frames. ], tot_loss[loss=0.06564, simple_loss=0.08945, pruned_loss=0.01232, audio_tagging_loss=0.008594, over 3043659.85 frames. ], batch size: 58, lr: 1.53e-03, grad_scale: 8.0 2023-11-26 18:06:39,355 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=10.62 vs. limit=15.0 2023-11-26 18:06:41,063 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=7.30 vs. limit=10.0 2023-11-26 18:06:42,242 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=9.19 vs. 
limit=10.0 2023-11-26 18:06:50,121 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 526200 2023-11-26 18:07:20,630 INFO [train_asr.py:1235] (3/4) Epoch 44, batch 9200, loss[loss=0.06994, simple_loss=0.08747, pruned_loss=0.01543, audio_tagging_loss=0.01077, over 14908.00 frames. ], tot_loss[loss=0.06518, simple_loss=0.08857, pruned_loss=0.01227, audio_tagging_loss=0.008617, over 3043449.43 frames. ], batch size: 58, lr: 1.53e-03, grad_scale: 16.0 2023-11-26 18:07:20,848 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3508173.3333333335, ans=0.1 2023-11-26 18:07:34,944 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.545e+01 8.887e+01 9.428e+01 1.018e+02 1.949e+02, threshold=1.886e+02, percent-clipped=1.0 2023-11-26 18:07:40,491 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.min_abs, batch_count=3508240.0, ans=0.5 2023-11-26 18:07:41,533 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=3508240.0, ans=0.0 2023-11-26 18:07:45,607 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 526250 2023-11-26 18:07:53,208 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=3508373.3333333335, ans=0.125 2023-11-26 18:08:14,385 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=3508440.0, ans=0.125 2023-11-26 18:08:16,952 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=8.02 vs. limit=10.0 2023-11-26 18:08:17,401 INFO [train_asr.py:1235] (3/4) Epoch 44, batch 9250, loss[loss=0.05361, simple_loss=0.06807, pruned_loss=0.009108, audio_tagging_loss=0.01047, over 14938.00 frames. ], tot_loss[loss=0.06502, simple_loss=0.08838, pruned_loss=0.01231, audio_tagging_loss=0.008525, over 3048698.41 frames. ], batch size: 57, lr: 1.53e-03, grad_scale: 16.0 2023-11-26 18:08:25,032 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-11-26 18:08:40,778 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 526300 2023-11-26 18:08:57,932 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=10.98 vs. limit=15.0 2023-11-26 18:09:00,891 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3508773.3333333335, ans=0.125 2023-11-26 18:09:12,330 INFO [train_asr.py:1235] (3/4) Epoch 44, batch 9300, loss[loss=0.06213, simple_loss=0.08031, pruned_loss=0.01332, audio_tagging_loss=0.008653, over 14864.00 frames. ], tot_loss[loss=0.06552, simple_loss=0.0892, pruned_loss=0.01234, audio_tagging_loss=0.008582, over 3053118.16 frames. 
], batch size: 56, lr: 1.53e-03, grad_scale: 16.0 2023-11-26 18:09:25,107 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.636e+01 8.680e+01 9.500e+01 1.023e+02 1.279e+02, threshold=1.900e+02, percent-clipped=0.0 2023-11-26 18:09:28,381 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=3508906.6666666665, ans=0.0 2023-11-26 18:09:37,819 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 526350 2023-11-26 18:09:44,305 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=3508973.3333333335, ans=0.125 2023-11-26 18:09:53,882 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3509040.0, ans=0.125 2023-11-26 18:09:54,773 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=3509040.0, ans=0.0 2023-11-26 18:09:58,054 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=3509106.6666666665, ans=0.125 2023-11-26 18:10:07,402 INFO [train_asr.py:1235] (3/4) Epoch 44, batch 9350, loss[loss=0.05781, simple_loss=0.08031, pruned_loss=0.008072, audio_tagging_loss=0.009584, over 15270.00 frames. ], tot_loss[loss=0.06573, simple_loss=0.08952, pruned_loss=0.01238, audio_tagging_loss=0.008593, over 3047913.76 frames. ], batch size: 57, lr: 1.53e-03, grad_scale: 16.0 2023-11-26 18:10:33,007 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 526400 2023-11-26 18:10:41,255 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=10.29 vs. limit=15.0 2023-11-26 18:10:46,424 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=9.66 vs. limit=15.0 2023-11-26 18:10:48,354 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.max_abs, batch_count=3509373.3333333335, ans=10.0 2023-11-26 18:10:49,174 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=3509373.3333333335, ans=0.1 2023-11-26 18:11:04,585 INFO [train_asr.py:1235] (3/4) Epoch 44, batch 9400, loss[loss=0.05681, simple_loss=0.07774, pruned_loss=0.008196, audio_tagging_loss=0.009745, over 16275.00 frames. ], tot_loss[loss=0.06616, simple_loss=0.09014, pruned_loss=0.01241, audio_tagging_loss=0.008683, over 3048887.95 frames. ], batch size: 60, lr: 1.53e-03, grad_scale: 16.0 2023-11-26 18:11:14,080 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=3509573.3333333335, ans=0.125 2023-11-26 18:11:15,342 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=3509573.3333333335, ans=0.125 2023-11-26 18:11:17,204 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.575e+01 8.876e+01 9.519e+01 1.023e+02 1.284e+02, threshold=1.904e+02, percent-clipped=0.0 2023-11-26 18:11:20,868 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=7.58 vs. 
limit=15.0 2023-11-26 18:11:27,067 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.max_abs, batch_count=3509640.0, ans=10.0 2023-11-26 18:11:27,907 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 526450 2023-11-26 18:11:45,277 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=3509706.6666666665, ans=0.125 2023-11-26 18:11:59,794 INFO [train_asr.py:1235] (3/4) Epoch 44, batch 9450, loss[loss=0.06336, simple_loss=0.08234, pruned_loss=0.01351, audio_tagging_loss=0.008681, over 14472.00 frames. ], tot_loss[loss=0.06633, simple_loss=0.09039, pruned_loss=0.01238, audio_tagging_loss=0.008757, over 3055012.11 frames. ], batch size: 55, lr: 1.53e-03, grad_scale: 16.0 2023-11-26 18:12:00,919 WARNING [train_asr.py:1481] (3/4) Exclude cut with ID unbalanced/jmSuJWEIizA_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-26 18:12:12,824 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.min_positive, batch_count=3509906.6666666665, ans=0.05 2023-11-26 18:12:24,994 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 526500 2023-11-26 18:12:32,375 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=6.13 vs. limit=12.0 2023-11-26 18:12:38,723 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=10.68 vs. limit=15.0 2023-11-26 18:12:52,694 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=11.03 vs. limit=22.5 2023-11-26 18:12:55,192 INFO [train_asr.py:1235] (3/4) Epoch 44, batch 9500, loss[loss=0.07681, simple_loss=0.1037, pruned_loss=0.01717, audio_tagging_loss=0.007808, over 14473.00 frames. ], tot_loss[loss=0.06614, simple_loss=0.0897, pruned_loss=0.01237, audio_tagging_loss=0.008924, over 3048611.52 frames. ], batch size: 56, lr: 1.53e-03, grad_scale: 16.0 2023-11-26 18:13:05,067 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=3510173.3333333335, ans=0.2 2023-11-26 18:13:09,444 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.180e+01 8.837e+01 9.537e+01 1.034e+02 1.299e+02, threshold=1.907e+02, percent-clipped=0.0 2023-11-26 18:13:20,790 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 526550 2023-11-26 18:13:30,346 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-26 18:13:48,217 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3510440.0, ans=0.0 2023-11-26 18:13:51,803 INFO [train_asr.py:1235] (3/4) Epoch 44, batch 9550, loss[loss=0.0601, simple_loss=0.07518, pruned_loss=0.008695, audio_tagging_loss=0.01381, over 15045.00 frames. 
2023-11-26 18:13:55,834 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3510506.6666666665, ans=0.125
2023-11-26 18:14:06,433 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=3510573.3333333335, ans=0.0
2023-11-26 18:14:06,713 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=512, metric=2.70 vs. limit=15.0
2023-11-26 18:14:15,672 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 526600
2023-11-26 18:14:22,441 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=3510640.0, ans=0.0
2023-11-26 18:14:23,536 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=3510706.6666666665, ans=0.2
2023-11-26 18:14:27,142 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=3510706.6666666665, ans=0.125
2023-11-26 18:14:40,486 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=3510773.3333333335, ans=0.0
2023-11-26 18:14:47,793 INFO [train_asr.py:1235] (3/4) Epoch 44, batch 9600, loss[loss=0.08062, simple_loss=0.1111, pruned_loss=0.01613, audio_tagging_loss=0.008956, over 16472.00 frames. ], tot_loss[loss=0.06612, simple_loss=0.08959, pruned_loss=0.0123, audio_tagging_loss=0.009031, over 3053541.30 frames. ], batch size: 60, lr: 1.53e-03, grad_scale: 32.0
2023-11-26 18:14:55,477 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=3510840.0, ans=0.2
2023-11-26 18:15:00,587 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.506e+01 8.909e+01 9.426e+01 1.006e+02 1.618e+02, threshold=1.885e+02, percent-clipped=0.0
2023-11-26 18:15:04,940 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=3510906.6666666665, ans=0.0
2023-11-26 18:15:11,657 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 526650
2023-11-26 18:15:43,133 INFO [train_asr.py:1235] (3/4) Epoch 44, batch 9650, loss[loss=0.05396, simple_loss=0.0742, pruned_loss=0.007581, audio_tagging_loss=0.009275, over 15944.00 frames. ], tot_loss[loss=0.06587, simple_loss=0.08913, pruned_loss=0.01226, audio_tagging_loss=0.009047, over 3048120.36 frames. ], batch size: 61, lr: 1.53e-03, grad_scale: 32.0
2023-11-26 18:15:45,801 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=6.68 vs. limit=15.0
2023-11-26 18:16:00,819 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3511240.0, ans=0.125
2023-11-26 18:16:04,196 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=12.36 vs. limit=15.0
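The scaling.py:213 lines print the current value (ans) of a ScheduledFloat: a float-valued module constant (skip rate, dropout, balancer bound, ...) interpolated piecewise-linearly in batch_count, so this late in training many have decayed to their final value (the conv_skip_rate entries above log ans=0.0). A minimal re-implementation of the idea; the schedule points below are hypothetical, not this run's:

    class ScheduledFloatSketch:
        def __init__(self, *points):      # (batch_count, value) pairs, ascending
            self.points = sorted(points)

        def value(self, batch_count: float) -> float:
            pts = self.points
            if batch_count <= pts[0][0]:
                return pts[0][1]
            for (x0, y0), (x1, y1) in zip(pts, pts[1:]):
                if batch_count <= x1:
                    t = (batch_count - x0) / (x1 - x0)
                    return y0 + t * (y1 - y0)
            return pts[-1][1]

    skip_rate = ScheduledFloatSketch((0.0, 0.2), (4000.0, 0.05), (16000.0, 0.0))
    print(skip_rate.value(3510573.33))    # -> 0.0 at this stage of training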
2023-11-26 18:16:05,138 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=3511306.6666666665, ans=0.2
2023-11-26 18:16:07,772 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=3511306.6666666665, ans=0.125
2023-11-26 18:16:08,687 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 526700
2023-11-26 18:16:30,453 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=10.75 vs. limit=15.0
2023-11-26 18:16:32,372 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.min_positive, batch_count=3511440.0, ans=0.05
2023-11-26 18:16:35,982 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=3511440.0, ans=0.125
2023-11-26 18:16:38,732 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=3511506.6666666665, ans=0.125
2023-11-26 18:16:38,815 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3511506.6666666665, ans=0.125
2023-11-26 18:16:40,148 INFO [train_asr.py:1235] (3/4) Epoch 44, batch 9700, loss[loss=0.04031, simple_loss=0.05636, pruned_loss=0.004631, audio_tagging_loss=0.007503, over 14922.00 frames. ], tot_loss[loss=0.06608, simple_loss=0.08962, pruned_loss=0.01238, audio_tagging_loss=0.008891, over 3046497.76 frames. ], batch size: 57, lr: 1.53e-03, grad_scale: 16.0
2023-11-26 18:16:52,592 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3511573.3333333335, ans=0.125
2023-11-26 18:16:54,490 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.934e+01 9.029e+01 9.553e+01 1.029e+02 1.538e+02, threshold=1.911e+02, percent-clipped=0.0
2023-11-26 18:17:04,058 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 526750
2023-11-26 18:17:04,193 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=3511640.0, ans=0.0
2023-11-26 18:17:10,442 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3511640.0, ans=0.125
2023-11-26 18:17:35,707 INFO [train_asr.py:1235] (3/4) Epoch 44, batch 9750, loss[loss=0.08385, simple_loss=0.1187, pruned_loss=0.01665, audio_tagging_loss=0.00783, over 15320.00 frames. ], tot_loss[loss=0.06603, simple_loss=0.08978, pruned_loss=0.01239, audio_tagging_loss=0.008751, over 3050300.17 frames. ], batch size: 55, lr: 1.53e-03, grad_scale: 16.0
2023-11-26 18:17:52,166 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3511906.6666666665, ans=0.125
2023-11-26 18:17:53,190 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=3511906.6666666665, ans=0.2
2023-11-26 18:17:56,389 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=3511973.3333333335, ans=0.2
2023-11-26 18:17:59,924 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 526800
2023-11-26 18:18:00,087 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3511973.3333333335, ans=0.125
2023-11-26 18:18:05,709 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.min_positive, batch_count=3511973.3333333335, ans=0.025
2023-11-26 18:18:10,217 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.37 vs. limit=6.0
2023-11-26 18:18:31,623 INFO [train_asr.py:1235] (3/4) Epoch 44, batch 9800, loss[loss=0.06838, simple_loss=0.09653, pruned_loss=0.01334, audio_tagging_loss=0.006777, over 14640.00 frames. ], tot_loss[loss=0.06575, simple_loss=0.08982, pruned_loss=0.01229, audio_tagging_loss=0.008551, over 3042171.72 frames. ], batch size: 56, lr: 1.53e-03, grad_scale: 16.0
2023-11-26 18:18:31,811 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=3512173.3333333335, ans=0.0
2023-11-26 18:18:45,885 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.579e+01 8.863e+01 9.415e+01 1.026e+02 1.443e+02, threshold=1.883e+02, percent-clipped=0.0
2023-11-26 18:18:47,778 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=3512240.0, ans=0.0
2023-11-26 18:18:55,795 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=3512306.6666666665, ans=0.04949747468305833
2023-11-26 18:18:56,654 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 526850
2023-11-26 18:18:57,277 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=6.07 vs. limit=10.0
2023-11-26 18:19:00,994 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00
2023-11-26 18:19:02,562 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=3512306.6666666665, ans=0.125
2023-11-26 18:19:22,558 WARNING [train_asr.py:1481] (3/4) Exclude cut with ID unbalanced/Bo4LcZjitzU_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24
2023-11-26 18:19:27,289 INFO [train_asr.py:1235] (3/4) Epoch 44, batch 9850, loss[loss=0.06359, simple_loss=0.08549, pruned_loss=0.01245, audio_tagging_loss=0.008399, over 15391.00 frames. ], tot_loss[loss=0.06601, simple_loss=0.09015, pruned_loss=0.01246, audio_tagging_loss=0.008476, over 3044552.94 frames. ], batch size: 59, lr: 1.53e-03, grad_scale: 16.0
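The train_asr.py:1481 WARNING above shows why the AudioSet placeholder cuts are dropped: 100 input frames become 23 encoder frames after 4x subsampling, fewer than the 24 BPE tokens of the dummy transcript, and a transducer cannot align more tokens than it has frames. A hedged sketch of such a filter; the exact subsampled-length formula is an assumption, though (T - 7) // 2 // 2 reproduces the logged 100 -> 23:

    def keep_cut(num_frames: int, num_tokens: int) -> bool:
        t_sub = (num_frames - 7) // 2 // 2    # frames after 4x subsampling (assumed)
        return t_sub >= num_tokens

    print(keep_cut(100, 24))                  # False: 23 frames < 24 tokens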
2023-11-26 18:19:40,268 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=3512573.3333333335, ans=0.125
2023-11-26 18:19:42,307 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=3512573.3333333335, ans=0.0
2023-11-26 18:19:50,211 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3512640.0, ans=0.125
2023-11-26 18:19:52,217 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 526900
2023-11-26 18:20:02,472 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.0.conv_module1.whiten, num_groups=1, num_channels=192, metric=12.49 vs. limit=15.0
2023-11-26 18:20:03,150 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=6.82 vs. limit=15.0
2023-11-26 18:20:18,450 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3512773.3333333335, ans=0.125
2023-11-26 18:20:22,070 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=5.59 vs. limit=12.0
2023-11-26 18:20:23,496 INFO [train_asr.py:1235] (3/4) Epoch 44, batch 9900, loss[loss=0.06056, simple_loss=0.07302, pruned_loss=0.01199, audio_tagging_loss=0.01206, over 15439.00 frames. ], tot_loss[loss=0.06576, simple_loss=0.08956, pruned_loss=0.01241, audio_tagging_loss=0.008567, over 3041805.42 frames. ], batch size: 58, lr: 1.53e-03, grad_scale: 16.0
2023-11-26 18:20:37,752 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.419e+01 8.893e+01 9.553e+01 1.033e+02 1.550e+02, threshold=1.911e+02, percent-clipped=0.0
2023-11-26 18:20:47,270 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 526950
2023-11-26 18:20:54,837 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3512973.3333333335, ans=0.125
2023-11-26 18:21:05,254 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=6.55 vs. limit=10.0
2023-11-26 18:21:05,315 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=6.00 vs. limit=15.0
2023-11-26 18:21:19,082 INFO [train_asr.py:1235] (3/4) Epoch 44, batch 9950, loss[loss=0.07307, simple_loss=0.1055, pruned_loss=0.0131, audio_tagging_loss=0.007202, over 14695.00 frames. ], tot_loss[loss=0.06553, simple_loss=0.08917, pruned_loss=0.01235, audio_tagging_loss=0.008595, over 3041096.01 frames. ], batch size: 56, lr: 1.53e-03, grad_scale: 16.0
2023-11-26 18:21:44,400 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 527000
2023-11-26 18:22:15,649 INFO [train_asr.py:1235] (3/4) Epoch 44, batch 10000, loss[loss=0.05952, simple_loss=0.07525, pruned_loss=0.00966, audio_tagging_loss=0.01224, over 14670.00 frames. ], tot_loss[loss=0.06531, simple_loss=0.08891, pruned_loss=0.01224, audio_tagging_loss=0.008619, over 3040695.84 frames. ], batch size: 56, lr: 1.53e-03, grad_scale: 32.0
2023-11-26 18:22:25,935 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=3513573.3333333335, ans=0.125
2023-11-26 18:22:27,229 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=11.89 vs. limit=15.0
2023-11-26 18:22:29,971 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.250e+01 8.698e+01 9.339e+01 1.006e+02 1.184e+02, threshold=1.868e+02, percent-clipped=0.0
2023-11-26 18:22:35,598 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=3513573.3333333335, ans=0.0
2023-11-26 18:22:39,659 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 527050
2023-11-26 18:22:50,564 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.83 vs. limit=6.0
2023-11-26 18:22:51,135 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=3513706.6666666665, ans=0.2
2023-11-26 18:23:05,295 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=3513773.3333333335, ans=0.015
2023-11-26 18:23:11,576 INFO [train_asr.py:1235] (3/4) Epoch 44, batch 10050, loss[loss=0.06127, simple_loss=0.08699, pruned_loss=0.00861, audio_tagging_loss=0.009162, over 15105.00 frames. ], tot_loss[loss=0.06495, simple_loss=0.08855, pruned_loss=0.01203, audio_tagging_loss=0.008648, over 3037450.10 frames. ], batch size: 55, lr: 1.53e-03, grad_scale: 32.0
2023-11-26 18:23:14,915 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00
2023-11-26 18:23:14,977 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=3513840.0, ans=0.125
2023-11-26 18:23:31,453 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=3513906.6666666665, ans=0.0
2023-11-26 18:23:35,625 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 527100
2023-11-26 18:23:53,361 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=3514040.0, ans=0.125
2023-11-26 18:23:56,974 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=12.74 vs. limit=22.5
2023-11-26 18:24:06,819 INFO [train_asr.py:1235] (3/4) Epoch 44, batch 10100, loss[loss=0.05682, simple_loss=0.08528, pruned_loss=0.007379, audio_tagging_loss=0.006796, over 15999.00 frames. ], tot_loss[loss=0.06508, simple_loss=0.08897, pruned_loss=0.01193, audio_tagging_loss=0.008665, over 3045895.85 frames. ], batch size: 57, lr: 1.53e-03, grad_scale: 32.0
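The grad_scale field in the progress lines (16.0 and 32.0 here, dropping to 8.0 further below) behaves like PyTorch's dynamic loss scaling for fp16 training: the scale halves when a step produces non-finite gradients and doubles after a long run of good steps. A generic sketch using the stock GradScaler API; the constructor values are illustrative, not this run's:

    import torch

    scaler = torch.cuda.amp.GradScaler(init_scale=16.0, growth_factor=2.0,
                                       backoff_factor=0.5, growth_interval=2000)
    # Typical training step:
    #   with torch.cuda.amp.autocast():
    #       loss = compute_loss(batch)
    #   scaler.scale(loss).backward()
    #   scaler.step(optimizer)
    #   scaler.update()               # grows or backs off the scale
    print(scaler.get_scale())         # -> 16.0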
2023-11-26 18:24:21,133 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.418e+01 8.926e+01 9.577e+01 1.044e+02 1.166e+02, threshold=1.915e+02, percent-clipped=0.0
2023-11-26 18:24:32,290 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 527150
2023-11-26 18:24:36,655 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=3514306.6666666665, ans=0.0
2023-11-26 18:24:53,036 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3514440.0, ans=0.1
2023-11-26 18:24:53,874 WARNING [train_asr.py:1481] (3/4) Exclude cut with ID unbalanced/_eq1Ry0UZGU_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24
2023-11-26 18:25:02,354 INFO [train_asr.py:1235] (3/4) Epoch 44, batch 10150, loss[loss=0.0809, simple_loss=0.1132, pruned_loss=0.01503, audio_tagging_loss=0.009281, over 15243.00 frames. ], tot_loss[loss=0.06538, simple_loss=0.0894, pruned_loss=0.01202, audio_tagging_loss=0.008658, over 3045396.80 frames. ], batch size: 54, lr: 1.53e-03, grad_scale: 16.0
2023-11-26 18:25:09,681 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=10.45 vs. limit=15.0
2023-11-26 18:25:12,326 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3514506.6666666665, ans=0.1
2023-11-26 18:25:15,514 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3514573.3333333335, ans=0.125
2023-11-26 18:25:16,532 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3514573.3333333335, ans=0.1
2023-11-26 18:25:17,669 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3514573.3333333335, ans=0.1
2023-11-26 18:25:26,107 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=3514640.0, ans=0.2
2023-11-26 18:25:26,958 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 527200
2023-11-26 18:25:30,456 WARNING [train_asr.py:1481] (3/4) Exclude cut with ID unbalanced/cw-21cbk02A_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24
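The flat lr: 1.53e-03 on every progress line is what icefall's Eden schedule produces this deep into training, since it decays smoothly in both batch and epoch counts. A sketch of the Eden rule; the hyperparameters passed below (base_lr=0.045, lr_batches=7500, lr_epochs=3.5) are assumptions for illustration:

    def eden_lr(base_lr: float, batch: int, epoch: float,
                lr_batches: float = 7500.0, lr_epochs: float = 3.5) -> float:
        batch_factor = ((batch / lr_batches) ** 2 + 1.0) ** -0.25
        epoch_factor = ((epoch / lr_epochs) ** 2 + 1.0) ** -0.25
        return base_lr * batch_factor * epoch_factor

    print(f"{eden_lr(0.045, 528000, 44):.2e}")  # ~1.51e-03, near the logged value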
2023-11-26 18:25:34,922 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer_na.min_abs, batch_count=3514706.6666666665, ans=0.02
2023-11-26 18:25:52,598 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=3514773.3333333335, ans=0.0
2023-11-26 18:25:58,624 INFO [train_asr.py:1235] (3/4) Epoch 44, batch 10200, loss[loss=0.06309, simple_loss=0.09023, pruned_loss=0.01141, audio_tagging_loss=0.006559, over 15880.00 frames. ], tot_loss[loss=0.06531, simple_loss=0.08919, pruned_loss=0.01197, audio_tagging_loss=0.00875, over 3042152.14 frames. ], batch size: 57, lr: 1.53e-03, grad_scale: 16.0
2023-11-26 18:26:04,055 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer_na.min_abs, batch_count=3514840.0, ans=0.02
2023-11-26 18:26:13,390 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.250e+01 9.101e+01 9.803e+01 1.043e+02 1.180e+02, threshold=1.961e+02, percent-clipped=0.0
2023-11-26 18:26:13,663 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=3514906.6666666665, ans=0.0
2023-11-26 18:26:19,722 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=12.76 vs. limit=15.0
2023-11-26 18:26:21,416 WARNING [train_asr.py:1481] (3/4) Exclude cut with ID unbalanced/hOT6Yokob90_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24
2023-11-26 18:26:21,604 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=3514973.3333333335, ans=0.0
2023-11-26 18:26:22,527 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 527250
2023-11-26 18:26:36,877 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.1.whiten, num_groups=1, num_channels=192, metric=3.36 vs. limit=12.0
2023-11-26 18:26:40,113 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.74 vs. limit=6.0
2023-11-26 18:26:53,055 INFO [train_asr.py:1235] (3/4) Epoch 44, batch 10250, loss[loss=0.07418, simple_loss=0.1001, pruned_loss=0.01366, audio_tagging_loss=0.01048, over 15223.00 frames. ], tot_loss[loss=0.06602, simple_loss=0.09027, pruned_loss=0.01214, audio_tagging_loss=0.008752, over 3045986.00 frames. ], batch size: 58, lr: 1.53e-03, grad_scale: 16.0
2023-11-26 18:26:59,562 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=3515173.3333333335, ans=0.0
2023-11-26 18:27:18,156 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 527300
2023-11-26 18:27:22,557 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=3515306.6666666665, ans=0.0
2023-11-26 18:27:27,089 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.min_positive, batch_count=3515373.3333333335, ans=0.025
2023-11-26 18:27:34,217 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=3515373.3333333335, ans=0.125
2023-11-26 18:27:39,771 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=3515440.0, ans=0.0
2023-11-26 18:27:47,735 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3515506.6666666665, ans=0.1
2023-11-26 18:27:48,580 INFO [train_asr.py:1235] (3/4) Epoch 44, batch 10300, loss[loss=0.06704, simple_loss=0.08705, pruned_loss=0.01443, audio_tagging_loss=0.009084, over 15798.00 frames. ], tot_loss[loss=0.06566, simple_loss=0.0895, pruned_loss=0.01208, audio_tagging_loss=0.008833, over 3049099.44 frames. ], batch size: 59, lr: 1.53e-03, grad_scale: 8.0
2023-11-26 18:27:51,995 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3515506.6666666665, ans=0.125
2023-11-26 18:28:00,618 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3515573.3333333335, ans=0.0
2023-11-26 18:28:05,633 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.445e+01 8.902e+01 9.462e+01 1.025e+02 1.207e+02, threshold=1.892e+02, percent-clipped=0.0
2023-11-26 18:28:06,409 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.1.self_attn1.whiten, num_groups=1, num_channels=192, metric=12.54 vs. limit=22.5
2023-11-26 18:28:13,127 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 527350
2023-11-26 18:28:19,596 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3515640.0, ans=0.125
2023-11-26 18:28:27,044 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3515706.6666666665, ans=0.125
2023-11-26 18:28:29,248 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=10.85 vs. limit=15.0
2023-11-26 18:28:45,128 INFO [train_asr.py:1235] (3/4) Epoch 44, batch 10350, loss[loss=0.06114, simple_loss=0.08204, pruned_loss=0.01145, audio_tagging_loss=0.008675, over 15318.00 frames. ], tot_loss[loss=0.066, simple_loss=0.08987, pruned_loss=0.01211, audio_tagging_loss=0.00896, over 3050918.29 frames. ], batch size: 58, lr: 1.53e-03, grad_scale: 8.0
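Each progress line pairs a single-batch loss[... over ~15000 frames] with a slowly moving tot_loss[... over ~3.0M frames]; the totals are consistent with an exponentially decayed, frame-weighted accumulator whose effective window is roughly 200 batches (200 * ~15k frames is about the ~3.0M logged). A sketch under that assumption; the 200-batch constant is inferred, not read from this output:

    class RunningLoss:
        def __init__(self, reset_interval: int = 200):   # assumed window
            self.decay = 1.0 - 1.0 / reset_interval
            self.loss_sum = 0.0
            self.frames = 0.0

        def update(self, batch_loss: float, batch_frames: float) -> float:
            self.loss_sum = self.decay * self.loss_sum + batch_loss * batch_frames
            self.frames = self.decay * self.frames + batch_frames
            return self.loss_sum / self.frames           # printed as tot_loss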
2023-11-26 18:28:56,021 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=3515906.6666666665, ans=0.125
2023-11-26 18:29:03,559 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=3515906.6666666665, ans=0.2
2023-11-26 18:29:06,648 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3515973.3333333335, ans=0.125
2023-11-26 18:29:08,709 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 527400
2023-11-26 18:29:15,075 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=3515973.3333333335, ans=0.0
2023-11-26 18:29:18,704 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=3516040.0, ans=0.125
2023-11-26 18:29:36,491 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=3516106.6666666665, ans=0.0
2023-11-26 18:29:36,613 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3516106.6666666665, ans=0.125
2023-11-26 18:29:40,532 INFO [train_asr.py:1235] (3/4) Epoch 44, batch 10400, loss[loss=0.06968, simple_loss=0.09469, pruned_loss=0.01486, audio_tagging_loss=0.007478, over 14379.00 frames. ], tot_loss[loss=0.06613, simple_loss=0.08986, pruned_loss=0.01219, audio_tagging_loss=0.009015, over 3048276.87 frames. ], batch size: 54, lr: 1.53e-03, grad_scale: 16.0
2023-11-26 18:29:57,020 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.796e+01 9.067e+01 9.465e+01 1.042e+02 1.301e+02, threshold=1.893e+02, percent-clipped=0.0
2023-11-26 18:30:05,658 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 527450
2023-11-26 18:30:05,853 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=3516306.6666666665, ans=0.0
2023-11-26 18:30:09,713 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=7.91 vs. limit=15.0
2023-11-26 18:30:13,632 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3516373.3333333335, ans=0.0
2023-11-26 18:30:23,739 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=10.20 vs. limit=15.0
2023-11-26 18:30:29,787 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=3516440.0, ans=0.2
2023-11-26 18:30:35,793 INFO [train_asr.py:1235] (3/4) Epoch 44, batch 10450, loss[loss=0.05625, simple_loss=0.0742, pruned_loss=0.009636, audio_tagging_loss=0.009519, over 15452.00 frames. ], tot_loss[loss=0.06574, simple_loss=0.08923, pruned_loss=0.01215, audio_tagging_loss=0.008978, over 3045057.77 frames. ], batch size: 59, lr: 1.53e-03, grad_scale: 16.0
2023-11-26 18:31:01,431 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 527500
2023-11-26 18:31:02,590 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=3516640.0, ans=0.125
2023-11-26 18:31:13,119 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=3516706.6666666665, ans=0.125
2023-11-26 18:31:13,583 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=8.66 vs. limit=15.0
2023-11-26 18:31:18,486 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=3516706.6666666665, ans=0.125
2023-11-26 18:31:21,065 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=8.36 vs. limit=15.0
2023-11-26 18:31:26,991 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=3516773.3333333335, ans=0.125
2023-11-26 18:31:33,191 INFO [train_asr.py:1235] (3/4) Epoch 44, batch 10500, loss[loss=0.05713, simple_loss=0.07293, pruned_loss=0.009457, audio_tagging_loss=0.01121, over 15159.00 frames. ], tot_loss[loss=0.06577, simple_loss=0.08936, pruned_loss=0.01226, audio_tagging_loss=0.008839, over 3051879.39 frames. ], batch size: 60, lr: 1.53e-03, grad_scale: 16.0
2023-11-26 18:31:33,469 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=3516840.0, ans=0.2
2023-11-26 18:31:39,682 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.min_positive, batch_count=3516840.0, ans=0.025
2023-11-26 18:31:48,953 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.579e+01 8.975e+01 9.604e+01 1.035e+02 1.568e+02, threshold=1.921e+02, percent-clipped=0.0
2023-11-26 18:31:56,520 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 527550
2023-11-26 18:32:27,954 INFO [train_asr.py:1235] (3/4) Epoch 44, batch 10550, loss[loss=0.06047, simple_loss=0.08958, pruned_loss=0.008907, audio_tagging_loss=0.006774, over 15601.00 frames. ], tot_loss[loss=0.06516, simple_loss=0.0887, pruned_loss=0.01201, audio_tagging_loss=0.008797, over 3054711.49 frames. ], batch size: 57, lr: 1.53e-03, grad_scale: 16.0
2023-11-26 18:32:35,424 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3517173.3333333335, ans=0.125
2023-11-26 18:32:40,776 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=3517240.0, ans=0.125
2023-11-26 18:32:42,125 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=10.53 vs. limit=15.0
2023-11-26 18:32:52,371 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 527600
2023-11-26 18:32:54,433 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=17.73 vs. limit=22.5
2023-11-26 18:33:02,524 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=3517373.3333333335, ans=0.2
2023-11-26 18:33:06,693 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3517373.3333333335, ans=0.125
2023-11-26 18:33:09,215 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn2.whiten.whitening_limit, batch_count=3517373.3333333335, ans=22.5
2023-11-26 18:33:17,171 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3517440.0, ans=0.125
2023-11-26 18:33:18,285 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=3517440.0, ans=0.0
2023-11-26 18:33:23,267 INFO [train_asr.py:1235] (3/4) Epoch 44, batch 10600, loss[loss=0.07318, simple_loss=0.09479, pruned_loss=0.01817, audio_tagging_loss=0.007617, over 15157.00 frames. ], tot_loss[loss=0.06535, simple_loss=0.08914, pruned_loss=0.01205, audio_tagging_loss=0.008729, over 3053714.45 frames. ], batch size: 58, lr: 1.53e-03, grad_scale: 16.0
2023-11-26 18:33:41,550 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 6.914e+01 8.748e+01 9.260e+01 9.948e+01 1.249e+02, threshold=1.852e+02, percent-clipped=0.0
2023-11-26 18:33:49,062 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 527650
2023-11-26 18:33:50,393 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=3517640.0, ans=0.0
2023-11-26 18:34:00,034 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=3517706.6666666665, ans=0.0
2023-11-26 18:34:02,400 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=512, metric=4.16 vs. limit=15.0
2023-11-26 18:34:09,466 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=3517773.3333333335, ans=0.035
2023-11-26 18:34:20,600 INFO [train_asr.py:1235] (3/4) Epoch 44, batch 10650, loss[loss=0.06442, simple_loss=0.09168, pruned_loss=0.01045, audio_tagging_loss=0.008122, over 15798.00 frames. ], tot_loss[loss=0.06545, simple_loss=0.08958, pruned_loss=0.01208, audio_tagging_loss=0.008574, over 3055808.45 frames. ], batch size: 56, lr: 1.53e-03, grad_scale: 16.0
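The scaling.py:1022 lines compare a per-module whitening metric against a limit that is itself scheduled (the whitening_limit entry above logs ans=22.5); the whitening penalty is active only while the metric exceeds the limit, which is why most of these lines sit below it. One natural metric is the normalized eigenvalue spread of the feature covariance, equal to 1.0 for perfectly white features; the exact formula in scaling.py may differ:

    import torch

    def whitening_metric(x: torch.Tensor) -> float:
        # x: (num_frames, num_channels), assumed zero-mean for simplicity
        cov = (x.T @ x) / x.shape[0]
        d = cov.shape[0]
        return (d * torch.trace(cov @ cov) / torch.trace(cov) ** 2).item()

    x = torch.randn(1000, 256)        # nearly white input
    print(whitening_metric(x))        # ~1.3; far below limits like 15.0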
2023-11-26 18:34:28,641 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=3517840.0, ans=0.125
2023-11-26 18:34:28,812 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3517840.0, ans=0.125
2023-11-26 18:34:44,374 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 527700
2023-11-26 18:34:49,744 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=3517973.3333333335, ans=0.0
2023-11-26 18:35:02,430 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3518040.0, ans=0.0
2023-11-26 18:35:11,068 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=3518106.6666666665, ans=0.125
2023-11-26 18:35:16,067 INFO [train_asr.py:1235] (3/4) Epoch 44, batch 10700, loss[loss=0.05546, simple_loss=0.07405, pruned_loss=0.008132, audio_tagging_loss=0.0103, over 16332.00 frames. ], tot_loss[loss=0.06536, simple_loss=0.08947, pruned_loss=0.01208, audio_tagging_loss=0.008552, over 3050954.96 frames. ], batch size: 61, lr: 1.53e-03, grad_scale: 16.0
2023-11-26 18:35:16,227 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3518173.3333333335, ans=0.125
2023-11-26 18:35:18,937 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.1.conv_module2.whiten, num_groups=1, num_channels=192, metric=13.65 vs. limit=15.0
2023-11-26 18:35:20,942 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.60 vs. limit=6.0
2023-11-26 18:35:23,212 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=7.13 vs. limit=15.0
2023-11-26 18:35:30,236 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3518240.0, ans=0.125
2023-11-26 18:35:32,091 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.164e+01 8.938e+01 9.482e+01 1.003e+02 1.497e+02, threshold=1.896e+02, percent-clipped=0.0
2023-11-26 18:35:36,751 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.min_abs, batch_count=3518306.6666666665, ans=0.5
2023-11-26 18:35:40,318 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 527750
2023-11-26 18:35:44,159 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=3518306.6666666665, ans=0.125
2023-11-26 18:35:51,795 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2023-11-26 18:36:11,558 INFO [train_asr.py:1235] (3/4) Epoch 44, batch 10750, loss[loss=0.05889, simple_loss=0.0877, pruned_loss=0.008189, audio_tagging_loss=0.006847, over 15657.00 frames. ], tot_loss[loss=0.06578, simple_loss=0.09005, pruned_loss=0.01217, audio_tagging_loss=0.008581, over 3050486.38 frames. ], batch size: 57, lr: 1.53e-03, grad_scale: 16.0
2023-11-26 18:36:19,220 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=3518506.6666666665, ans=0.125
2023-11-26 18:36:32,132 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3518573.3333333335, ans=0.125
2023-11-26 18:36:37,321 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 527800
2023-11-26 18:37:08,283 INFO [train_asr.py:1235] (3/4) Epoch 44, batch 10800, loss[loss=0.07727, simple_loss=0.1067, pruned_loss=0.01516, audio_tagging_loss=0.008762, over 16510.00 frames. ], tot_loss[loss=0.0656, simple_loss=0.08984, pruned_loss=0.01211, audio_tagging_loss=0.008572, over 3056680.51 frames. ], batch size: 60, lr: 1.53e-03, grad_scale: 32.0
2023-11-26 18:37:24,569 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=3518906.6666666665, ans=0.0
2023-11-26 18:37:25,433 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.531e+01 8.760e+01 9.363e+01 9.951e+01 1.169e+02, threshold=1.873e+02, percent-clipped=0.0
2023-11-26 18:37:25,662 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=3518906.6666666665, ans=0.125
2023-11-26 18:37:33,057 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 527850
2023-11-26 18:38:04,874 INFO [train_asr.py:1235] (3/4) Epoch 44, batch 10850, loss[loss=0.0633, simple_loss=0.08895, pruned_loss=0.01186, audio_tagging_loss=0.006961, over 14861.00 frames. ], tot_loss[loss=0.06538, simple_loss=0.08929, pruned_loss=0.01212, audio_tagging_loss=0.008623, over 3046763.55 frames. ], batch size: 57, lr: 1.53e-03, grad_scale: 16.0
2023-11-26 18:38:12,882 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=8.14 vs. limit=15.0
2023-11-26 18:38:19,311 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.37 vs. limit=6.0
2023-11-26 18:38:28,355 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 527900
2023-11-26 18:38:30,054 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.1.self_attn1.whiten, num_groups=1, num_channels=192, metric=12.73 vs. limit=22.5
2023-11-26 18:38:33,389 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=9.18 vs. limit=15.0
2023-11-26 18:38:47,673 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3519373.3333333335, ans=0.0
2023-11-26 18:38:50,081 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=19.58 vs. limit=22.5
2023-11-26 18:38:59,177 WARNING [train_asr.py:1481] (3/4) Exclude cut with ID unbalanced/XMxq2pgttuY_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24
2023-11-26 18:39:00,248 INFO [train_asr.py:1235] (3/4) Epoch 44, batch 10900, loss[loss=0.04864, simple_loss=0.06166, pruned_loss=0.007171, audio_tagging_loss=0.01064, over 14816.00 frames. ], tot_loss[loss=0.06522, simple_loss=0.08915, pruned_loss=0.01197, audio_tagging_loss=0.008666, over 3042788.55 frames. ], batch size: 58, lr: 1.53e-03, grad_scale: 16.0
2023-11-26 18:39:06,734 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3519506.6666666665, ans=0.1
2023-11-26 18:39:18,334 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.607e+01 8.701e+01 9.472e+01 1.014e+02 1.998e+02, threshold=1.894e+02, percent-clipped=0.0
2023-11-26 18:39:20,742 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=7.89 vs. limit=15.0
2023-11-26 18:39:25,359 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 527950
2023-11-26 18:39:35,754 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=3.69 vs. limit=10.0
2023-11-26 18:39:47,102 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=3519773.3333333335, ans=0.125
2023-11-26 18:39:52,835 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=3.57 vs. limit=15.0
2023-11-26 18:39:55,869 INFO [train_asr.py:1235] (3/4) Epoch 44, batch 10950, loss[loss=0.06439, simple_loss=0.08536, pruned_loss=0.01347, audio_tagging_loss=0.008239, over 15395.00 frames. ], tot_loss[loss=0.065, simple_loss=0.08843, pruned_loss=0.01201, audio_tagging_loss=0.008769, over 3040757.85 frames. ], batch size: 58, lr: 1.53e-03, grad_scale: 16.0
2023-11-26 18:40:05,816 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=3519840.0, ans=0.2
2023-11-26 18:40:08,972 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=3519906.6666666665, ans=0.0
2023-11-26 18:40:16,906 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3519906.6666666665, ans=0.0
2023-11-26 18:40:21,094 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 528000
2023-11-26 18:40:31,645 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.0.conv_module2.whiten, num_groups=1, num_channels=192, metric=12.26 vs. limit=15.0
2023-11-26 18:40:32,166 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3520040.0, ans=0.125
2023-11-26 18:40:41,202 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3520040.0, ans=0.125
2023-11-26 18:40:54,838 INFO [train_asr.py:1235] (3/4) Epoch 44, batch 11000, loss[loss=0.07746, simple_loss=0.1103, pruned_loss=0.01372, audio_tagging_loss=0.008602, over 15667.00 frames. ], tot_loss[loss=0.06539, simple_loss=0.08926, pruned_loss=0.01198, audio_tagging_loss=0.008776, over 3045006.26 frames. ], batch size: 56, lr: 1.53e-03, grad_scale: 16.0
2023-11-26 18:40:57,821 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=3520173.3333333335, ans=0.125
2023-11-26 18:41:06,049 WARNING [train_asr.py:1481] (3/4) Exclude cut with ID unbalanced/h6R5rMXN6pY_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24
2023-11-26 18:41:12,352 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.170e+01 8.767e+01 9.294e+01 1.014e+02 1.282e+02, threshold=1.859e+02, percent-clipped=1.0
2023-11-26 18:41:17,979 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=3520306.6666666665, ans=0.125
2023-11-26 18:41:18,921 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 528050
2023-11-26 18:41:25,562 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=3520306.6666666665, ans=0.2
2023-11-26 18:41:46,689 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=3520440.0, ans=0.0
2023-11-26 18:41:47,977 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=17.59 vs. limit=22.5
2023-11-26 18:41:50,613 INFO [train_asr.py:1235] (3/4) Epoch 44, batch 11050, loss[loss=0.05686, simple_loss=0.07417, pruned_loss=0.009067, audio_tagging_loss=0.01071, over 14319.00 frames. ], tot_loss[loss=0.06519, simple_loss=0.0888, pruned_loss=0.01185, audio_tagging_loss=0.008946, over 3053162.18 frames. ], batch size: 55, lr: 1.53e-03, grad_scale: 16.0
2023-11-26 18:42:11,194 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=3520573.3333333335, ans=0.2
2023-11-26 18:42:15,717 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 528100
2023-11-26 18:42:19,320 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=5.28 vs. limit=15.0
2023-11-26 18:42:23,836 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=3520706.6666666665, ans=0.125
2023-11-26 18:42:24,403 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.06 vs. limit=6.0
2023-11-26 18:42:27,027 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3520706.6666666665, ans=0.125
2023-11-26 18:42:28,488 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=8.96 vs. limit=15.0
2023-11-26 18:42:30,368 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=3520706.6666666665, ans=0.0
2023-11-26 18:42:31,428 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=3520706.6666666665, ans=0.125
2023-11-26 18:42:46,073 INFO [train_asr.py:1235] (3/4) Epoch 44, batch 11100, loss[loss=0.07114, simple_loss=0.09649, pruned_loss=0.01401, audio_tagging_loss=0.008884, over 14858.00 frames. ], tot_loss[loss=0.06549, simple_loss=0.08924, pruned_loss=0.01193, audio_tagging_loss=0.008938, over 3050678.56 frames. ], batch size: 54, lr: 1.53e-03, grad_scale: 16.0
2023-11-26 18:43:04,538 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.363e+01 9.164e+01 1.008e+02 1.089e+02 1.427e+02, threshold=2.015e+02, percent-clipped=0.0
2023-11-26 18:43:11,699 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 528150
2023-11-26 18:43:11,843 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=3520973.3333333335, ans=0.125
2023-11-26 18:43:43,207 INFO [train_asr.py:1235] (3/4) Epoch 44, batch 11150, loss[loss=0.06207, simple_loss=0.08219, pruned_loss=0.009727, audio_tagging_loss=0.01125, over 16320.00 frames. ], tot_loss[loss=0.06609, simple_loss=0.08985, pruned_loss=0.01209, audio_tagging_loss=0.009079, over 3050838.40 frames. ], batch size: 62, lr: 1.53e-03, grad_scale: 16.0
2023-11-26 18:43:44,530 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=3521173.3333333335, ans=0.0
2023-11-26 18:44:00,888 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3521240.0, ans=0.125
2023-11-26 18:44:07,159 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 528200
2023-11-26 18:44:08,712 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_whiten.whitening_limit, batch_count=3521306.6666666665, ans=15.0
2023-11-26 18:44:39,047 INFO [train_asr.py:1235] (3/4) Epoch 44, batch 11200, loss[loss=0.06279, simple_loss=0.09072, pruned_loss=0.00952, audio_tagging_loss=0.007904, over 16167.00 frames. ], tot_loss[loss=0.06628, simple_loss=0.09001, pruned_loss=0.01214, audio_tagging_loss=0.009138, over 3051683.59 frames. ], batch size: 58, lr: 1.53e-03, grad_scale: 16.0
2023-11-26 18:44:48,105 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=18.55 vs. limit=22.5
2023-11-26 18:44:58,207 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.556e+01 8.872e+01 9.383e+01 1.014e+02 1.200e+02, threshold=1.877e+02, percent-clipped=0.0
2023-11-26 18:45:01,642 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=3521640.0, ans=0.125
2023-11-26 18:45:04,149 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 528250
2023-11-26 18:45:34,288 INFO [train_asr.py:1235] (3/4) Epoch 44, batch 11250, loss[loss=0.06382, simple_loss=0.08108, pruned_loss=0.01363, audio_tagging_loss=0.009649, over 15860.00 frames. ], tot_loss[loss=0.06547, simple_loss=0.08886, pruned_loss=0.01193, audio_tagging_loss=0.009106, over 3051626.33 frames. ], batch size: 62, lr: 1.53e-03, grad_scale: 8.0
2023-11-26 18:45:40,985 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3521840.0, ans=0.125
2023-11-26 18:45:49,036 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3521906.6666666665, ans=0.1
2023-11-26 18:45:59,610 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 528300
2023-11-26 18:46:00,902 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3521973.3333333335, ans=0.125
2023-11-26 18:46:17,107 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=3522040.0, ans=0.0
2023-11-26 18:46:31,393 INFO [train_asr.py:1235] (3/4) Epoch 44, batch 11300, loss[loss=0.05934, simple_loss=0.0886, pruned_loss=0.008583, audio_tagging_loss=0.006458, over 14486.00 frames. ], tot_loss[loss=0.06565, simple_loss=0.08941, pruned_loss=0.01209, audio_tagging_loss=0.008857, over 3044281.13 frames. ], batch size: 55, lr: 1.53e-03, grad_scale: 8.0
2023-11-26 18:46:34,198 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=8.32 vs. limit=15.0
2023-11-26 18:46:38,017 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=3522173.3333333335, ans=0.0
2023-11-26 18:46:43,414 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3522240.0, ans=0.125
2023-11-26 18:46:45,550 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=3522240.0, ans=0.07
2023-11-26 18:46:51,050 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.700e+01 8.924e+01 9.340e+01 1.002e+02 1.202e+02, threshold=1.868e+02, percent-clipped=0.0
2023-11-26 18:46:53,877 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=9.09 vs. limit=10.0
2023-11-26 18:46:55,416 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 528350
2023-11-26 18:46:56,683 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3522306.6666666665, ans=0.125
2023-11-26 18:47:01,929 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=3522306.6666666665, ans=0.125
2023-11-26 18:47:05,758 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=7.20 vs. limit=15.0
2023-11-26 18:47:24,593 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=3522440.0, ans=0.0
2023-11-26 18:47:26,496 INFO [train_asr.py:1235] (3/4) Epoch 44, batch 11350, loss[loss=0.06783, simple_loss=0.09364, pruned_loss=0.01354, audio_tagging_loss=0.007472, over 15642.00 frames. ], tot_loss[loss=0.06577, simple_loss=0.08975, pruned_loss=0.0122, audio_tagging_loss=0.008697, over 3042089.91 frames. ], batch size: 57, lr: 1.53e-03, grad_scale: 8.0
2023-11-26 18:47:32,561 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=3522506.6666666665, ans=0.125
2023-11-26 18:47:33,578 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=3522506.6666666665, ans=0.125
2023-11-26 18:47:34,670 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00
2023-11-26 18:47:36,836 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=3522573.3333333335, ans=0.0
2023-11-26 18:47:37,895 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=3522573.3333333335, ans=0.125
2023-11-26 18:47:47,036 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3522573.3333333335, ans=0.125
2023-11-26 18:47:51,566 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 528400
2023-11-26 18:48:06,328 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=3522706.6666666665, ans=0.0
2023-11-26 18:48:20,752 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3522773.3333333335, ans=0.125
2023-11-26 18:48:22,540 INFO [train_asr.py:1235] (3/4) Epoch 44, batch 11400, loss[loss=0.07553, simple_loss=0.1069, pruned_loss=0.01743, audio_tagging_loss=0.004665, over 14406.00 frames. ], tot_loss[loss=0.06634, simple_loss=0.09057, pruned_loss=0.01242, audio_tagging_loss=0.008628, over 3039097.12 frames. ], batch size: 53, lr: 1.53e-03, grad_scale: 8.0
2023-11-26 18:48:43,230 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.848e+01 9.001e+01 9.594e+01 1.048e+02 1.378e+02, threshold=1.919e+02, percent-clipped=0.0
2023-11-26 18:48:47,594 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 528450
2023-11-26 18:48:51,982 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=3522973.3333333335, ans=0.0
2023-11-26 18:49:06,228 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=3523106.6666666665, ans=0.2
2023-11-26 18:49:10,549 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=3523106.6666666665, ans=0.0
2023-11-26 18:49:17,081 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer_na.min_abs, batch_count=3523106.6666666665, ans=0.02
2023-11-26 18:49:19,512 INFO [train_asr.py:1235] (3/4) Epoch 44, batch 11450, loss[loss=0.04847, simple_loss=0.05697, pruned_loss=0.009822, audio_tagging_loss=0.01016, over 15730.00 frames. ], tot_loss[loss=0.06607, simple_loss=0.09014, pruned_loss=0.01234, audio_tagging_loss=0.008662, over 3035199.05 frames. ], batch size: 61, lr: 1.53e-03, grad_scale: 8.0
], batch size: 61, lr: 1.53e-03, grad_scale: 8.0 2023-11-26 18:49:42,824 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 528500 2023-11-26 18:50:07,425 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-26 18:50:14,551 INFO [train_asr.py:1235] (3/4) Epoch 44, batch 11500, loss[loss=0.04943, simple_loss=0.06766, pruned_loss=0.006747, audio_tagging_loss=0.008855, over 15334.00 frames. ], tot_loss[loss=0.06604, simple_loss=0.09011, pruned_loss=0.01229, audio_tagging_loss=0.008699, over 3040691.89 frames. ], batch size: 60, lr: 1.53e-03, grad_scale: 8.0 2023-11-26 18:50:17,870 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3523506.6666666665, ans=0.125 2023-11-26 18:50:34,141 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.429e+01 8.724e+01 9.278e+01 1.017e+02 1.417e+02, threshold=1.856e+02, percent-clipped=0.0 2023-11-26 18:50:39,584 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 528550 2023-11-26 18:51:05,792 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=3523773.3333333335, ans=0.2 2023-11-26 18:51:09,837 INFO [train_asr.py:1235] (3/4) Epoch 44, batch 11550, loss[loss=0.05636, simple_loss=0.07512, pruned_loss=0.00823, audio_tagging_loss=0.01056, over 15141.00 frames. ], tot_loss[loss=0.06597, simple_loss=0.09012, pruned_loss=0.01227, audio_tagging_loss=0.008651, over 3048567.26 frames. ], batch size: 59, lr: 1.53e-03, grad_scale: 8.0 2023-11-26 18:51:22,262 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-26 18:51:23,311 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=3523906.6666666665, ans=0.0 2023-11-26 18:51:26,080 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=3523906.6666666665, ans=0.2 2023-11-26 18:51:29,253 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.min_positive, batch_count=3523906.6666666665, ans=0.05 2023-11-26 18:51:35,405 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 528600 2023-11-26 18:51:47,424 WARNING [train_asr.py:1481] (3/4) Exclude cut with ID unbalanced/NeYOsnhOi4k_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-26 18:51:53,085 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3524040.0, ans=0.125 2023-11-26 18:52:07,237 INFO [train_asr.py:1235] (3/4) Epoch 44, batch 11600, loss[loss=0.0839, simple_loss=0.1172, pruned_loss=0.01883, audio_tagging_loss=0.006464, over 15449.00 frames. ], tot_loss[loss=0.06636, simple_loss=0.0907, pruned_loss=0.0124, audio_tagging_loss=0.008611, over 3050193.54 frames. 
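The WARNING above shows the length filter at work: AudioSet cuts carry a dummy transcript, and this 1-second cut has 100 feature frames, which the convolutional frontend reduces to 23, fewer than its 24 BPE tokens, so the transducer loss cannot align it and the cut is dropped. A sketch of the check, assuming the usual icefall subsampling arithmetic ((T - 7) // 2 + 1) // 2, which reproduces the logged 100 -> 23; the exact frontend formula may differ.

    def keep_cut(num_frames: int, num_tokens: int) -> bool:
        t = ((num_frames - 7) // 2 + 1) // 2  # frames after ~4x subsampling
        # the pruned transducer needs at least as many frames as tokens
        return t >= num_tokens

    keep_cut(100, 24)  # False: the cut in the warning above is excluded
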
], batch size: 55, lr: 1.53e-03, grad_scale: 16.0 2023-11-26 18:52:10,470 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=8.65 vs. limit=15.0 2023-11-26 18:52:26,814 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.281e+01 8.975e+01 9.642e+01 1.029e+02 1.280e+02, threshold=1.928e+02, percent-clipped=0.0 2023-11-26 18:52:28,109 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=3524306.6666666665, ans=0.0 2023-11-26 18:52:31,139 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 528650 2023-11-26 18:52:37,625 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3524306.6666666665, ans=0.125 2023-11-26 18:52:45,138 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3524373.3333333335, ans=0.1 2023-11-26 18:52:54,678 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3524440.0, ans=0.1 2023-11-26 18:52:56,784 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=3524440.0, ans=0.2 2023-11-26 18:53:02,902 INFO [train_asr.py:1235] (3/4) Epoch 44, batch 11650, loss[loss=0.05832, simple_loss=0.08512, pruned_loss=0.008022, audio_tagging_loss=0.007739, over 15017.00 frames. ], tot_loss[loss=0.06622, simple_loss=0.09051, pruned_loss=0.01238, audio_tagging_loss=0.008581, over 3039451.64 frames. ], batch size: 56, lr: 1.53e-03, grad_scale: 16.0 2023-11-26 18:53:03,250 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=3524506.6666666665, ans=0.0 2023-11-26 18:53:14,833 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=3524573.3333333335, ans=0.125 2023-11-26 18:53:27,501 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 528700 2023-11-26 18:53:27,664 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=3524640.0, ans=0.0 2023-11-26 18:53:57,959 INFO [train_asr.py:1235] (3/4) Epoch 44, batch 11700, loss[loss=0.07593, simple_loss=0.107, pruned_loss=0.01576, audio_tagging_loss=0.006682, over 15683.00 frames. ], tot_loss[loss=0.06578, simple_loss=0.08976, pruned_loss=0.01227, audio_tagging_loss=0.008628, over 3041734.38 frames. ], batch size: 58, lr: 1.53e-03, grad_scale: 16.0 2023-11-26 18:54:00,299 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=3524840.0, ans=0.0 2023-11-26 18:54:00,354 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3524840.0, ans=0.0 2023-11-26 18:54:03,824 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=12.54 vs. 
limit=22.5 2023-11-26 18:54:05,662 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3524840.0, ans=0.125 2023-11-26 18:54:19,351 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 8.034e+01 9.030e+01 9.498e+01 1.025e+02 1.677e+02, threshold=1.900e+02, percent-clipped=0.0 2023-11-26 18:54:20,709 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=3524973.3333333335, ans=0.125 2023-11-26 18:54:23,706 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 528750 2023-11-26 18:54:25,940 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=3524973.3333333335, ans=0.125 2023-11-26 18:54:26,236 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=9.64 vs. limit=15.0 2023-11-26 18:54:30,143 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-26 18:54:36,861 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=6.92 vs. limit=15.0 2023-11-26 18:54:55,275 INFO [train_asr.py:1235] (3/4) Epoch 44, batch 11750, loss[loss=0.07298, simple_loss=0.09963, pruned_loss=0.01307, audio_tagging_loss=0.0101, over 14521.00 frames. ], tot_loss[loss=0.06622, simple_loss=0.09047, pruned_loss=0.01232, audio_tagging_loss=0.008663, over 3041207.78 frames. ], batch size: 56, lr: 1.53e-03, grad_scale: 16.0 2023-11-26 18:54:56,572 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3525173.3333333335, ans=0.125 2023-11-26 18:55:01,450 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=13.17 vs. limit=22.5 2023-11-26 18:55:04,721 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=18.56 vs. limit=22.5 2023-11-26 18:55:11,914 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-26 18:55:19,257 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 528800 2023-11-26 18:55:32,240 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=3525373.3333333335, ans=0.125 2023-11-26 18:55:47,550 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=18.47 vs. limit=22.5 2023-11-26 18:55:51,222 INFO [train_asr.py:1235] (3/4) Epoch 44, batch 11800, loss[loss=0.04371, simple_loss=0.06025, pruned_loss=0.005508, audio_tagging_loss=0.00808, over 16574.00 frames. ], tot_loss[loss=0.06655, simple_loss=0.09094, pruned_loss=0.01244, audio_tagging_loss=0.008647, over 3046157.93 frames. 
], batch size: 62, lr: 1.53e-03, grad_scale: 16.0 2023-11-26 18:56:05,332 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3525573.3333333335, ans=0.0 2023-11-26 18:56:10,347 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.664e+01 8.967e+01 9.583e+01 1.033e+02 1.275e+02, threshold=1.917e+02, percent-clipped=0.0 2023-11-26 18:56:14,756 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 528850 2023-11-26 18:56:46,564 INFO [train_asr.py:1235] (3/4) Epoch 44, batch 11850, loss[loss=0.06782, simple_loss=0.0903, pruned_loss=0.01435, audio_tagging_loss=0.008316, over 15501.00 frames. ], tot_loss[loss=0.06627, simple_loss=0.09029, pruned_loss=0.01236, audio_tagging_loss=0.008763, over 3047121.79 frames. ], batch size: 59, lr: 1.53e-03, grad_scale: 16.0 2023-11-26 18:56:46,740 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=3525840.0, ans=0.2 2023-11-26 18:57:12,170 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 528900 2023-11-26 18:57:42,552 INFO [train_asr.py:1235] (3/4) Epoch 44, batch 11900, loss[loss=0.07012, simple_loss=0.09603, pruned_loss=0.0137, audio_tagging_loss=0.008407, over 15973.00 frames. ], tot_loss[loss=0.06646, simple_loss=0.09026, pruned_loss=0.01246, audio_tagging_loss=0.008869, over 3047175.12 frames. ], batch size: 59, lr: 1.53e-03, grad_scale: 16.0 2023-11-26 18:57:58,773 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3526240.0, ans=0.1 2023-11-26 18:58:01,131 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=9.52 vs. limit=15.0 2023-11-26 18:58:02,718 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.524e+01 8.907e+01 9.565e+01 1.014e+02 1.302e+02, threshold=1.913e+02, percent-clipped=0.0 2023-11-26 18:58:02,885 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=3526240.0, ans=0.125 2023-11-26 18:58:07,181 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 528950 2023-11-26 18:58:15,878 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=3526373.3333333335, ans=0.2 2023-11-26 18:58:28,867 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3526440.0, ans=0.125 2023-11-26 18:58:39,287 INFO [train_asr.py:1235] (3/4) Epoch 44, batch 11950, loss[loss=0.04401, simple_loss=0.06081, pruned_loss=0.004207, audio_tagging_loss=0.009393, over 14492.00 frames. ], tot_loss[loss=0.06582, simple_loss=0.08898, pruned_loss=0.01223, audio_tagging_loss=0.009101, over 3040990.85 frames. 
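The Clipping_scale lines summarize a quantile-based gradient clipper: the five numbers are the min, 25%, median, 75% and max of recently observed gradient norms, and in every entry in this log the threshold equals clipping_scale times the median (for the entry above, 2.0 * 9.583e+01 ~= 1.917e+02); percent-clipped then reports how often norms exceeded that threshold. A sketch of the bookkeeping, assuming a sliding window of norms is available (the optimizer in optim.py tracks this internally):

    import torch

    def clipping_stats(recent_norms, clipping_scale=2.0):
        norms = torch.tensor(recent_norms)
        quartiles = torch.quantile(norms, torch.tensor([0.0, 0.25, 0.5, 0.75, 1.0]))
        threshold = clipping_scale * quartiles[2]  # 2.0 * median
        percent_clipped = 100.0 * (norms > threshold).float().mean()
        return quartiles, threshold, percent_clipped
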
], batch size: 57, lr: 1.53e-03, grad_scale: 16.0 2023-11-26 18:58:48,123 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=3526506.6666666665, ans=0.0 2023-11-26 18:59:02,672 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 529000 2023-11-26 18:59:22,111 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3526706.6666666665, ans=0.0 2023-11-26 18:59:29,287 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3526773.3333333335, ans=0.125 2023-11-26 18:59:31,499 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=8.47 vs. limit=15.0 2023-11-26 18:59:34,108 INFO [train_asr.py:1235] (3/4) Epoch 44, batch 12000, loss[loss=0.0565, simple_loss=0.07033, pruned_loss=0.00948, audio_tagging_loss=0.01186, over 15012.00 frames. ], tot_loss[loss=0.06579, simple_loss=0.08868, pruned_loss=0.01219, audio_tagging_loss=0.009263, over 3042648.45 frames. ], batch size: 58, lr: 1.53e-03, grad_scale: 32.0 2023-11-26 18:59:34,109 INFO [train_asr.py:1258] (3/4) Computing validation loss 2023-11-26 18:59:46,639 INFO [zipformer.py:1877] (3/4) name=encoder.encoders.2.encoder.layers.2.self_attn_weights, attn_weights_entropy = tensor([4.1637, 3.5149, 3.5234, 3.2210], device='cuda:3') 2023-11-26 18:59:55,286 INFO [zipformer.py:1877] (3/4) name=encoder.encoders.3.encoder.layers.3.self_attn_weights, attn_weights_entropy = tensor([3.9706, 3.1637, 2.9057, 3.1030, 3.4102, 2.8675, 3.4414, 2.6534], device='cuda:3') 2023-11-26 19:00:06,913 INFO [train_asr.py:1267] (3/4) Epoch 44, validation: loss=0.05801, simple_loss=0.05056, pruned_loss=0.005309, audio_tagging_loss=0.02742, over 4681554.00 frames. 2023-11-26 19:00:06,913 INFO [train_asr.py:1268] (3/4) Maximum memory allocated so far is 24894MB 2023-11-26 19:00:13,232 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.max_positive, batch_count=3526840.0, ans=0.95 2023-11-26 19:00:19,344 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=3526906.6666666665, ans=0.07 2023-11-26 19:00:19,356 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=3526906.6666666665, ans=0.07 2023-11-26 19:00:25,342 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.461e+01 8.909e+01 9.466e+01 1.042e+02 1.234e+02, threshold=1.893e+02, percent-clipped=0.0 2023-11-26 19:00:29,517 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 529050 2023-11-26 19:00:30,588 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3526973.3333333335, ans=0.125 2023-11-26 19:01:05,879 INFO [train_asr.py:1235] (3/4) Epoch 45, batch 0, loss[loss=0.07473, simple_loss=0.09043, pruned_loss=0.01068, audio_tagging_loss=0.01884, over 15663.00 frames. ], tot_loss[loss=0.07473, simple_loss=0.09043, pruned_loss=0.01068, audio_tagging_loss=0.01884, over 15663.00 frames. 
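Batch 12000 is a multiple of the validation interval, so training pauses for a validation pass: the dev loss is computed over the full 4681554-frame set (the zipformer.py lines print attention-weight entropies for selected layers along the way) and peak CUDA memory is reported afterwards. A minimal sketch of the pattern, with illustrative names rather than the train_asr.py source; compute_loss is an assumed helper.

    import torch

    @torch.no_grad()
    def run_validation(model, valid_dl, device):
        model.eval()
        tot_loss, tot_frames = 0.0, 0
        for batch in valid_dl:
            loss, num_frames = compute_loss(model, batch, device)  # assumed helper
            tot_loss += loss.item()
            tot_frames += num_frames
        model.train()
        mem_mb = torch.cuda.max_memory_allocated(device) // (1024 ** 2)
        return tot_loss / max(tot_frames, 1), mem_mb  # cf. "Maximum memory ... 24894MB"
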
], batch size: 58, lr: 1.51e-03, grad_scale: 32.0 2023-11-26 19:01:05,880 INFO [train_asr.py:1258] (3/4) Computing validation loss 2023-11-26 19:01:37,704 INFO [train_asr.py:1267] (3/4) Epoch 45, validation: loss=0.05755, simple_loss=0.05055, pruned_loss=0.005302, audio_tagging_loss=0.02697, over 4681554.00 frames. 2023-11-26 19:01:37,705 INFO [train_asr.py:1268] (3/4) Maximum memory allocated so far is 24894MB 2023-11-26 19:01:44,579 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=512, metric=18.71 vs. limit=22.5 2023-11-26 19:01:57,059 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3527080.0, ans=0.125 2023-11-26 19:02:05,022 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.min_positive, batch_count=3527146.6666666665, ans=0.025 2023-11-26 19:02:15,135 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=8.22 vs. limit=10.0 2023-11-26 19:02:24,730 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3527280.0, ans=0.125 2023-11-26 19:02:26,334 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=11.17 vs. limit=15.0 2023-11-26 19:02:28,770 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 529100 2023-11-26 19:02:32,908 INFO [train_asr.py:1235] (3/4) Epoch 45, batch 50, loss[loss=0.07526, simple_loss=0.09882, pruned_loss=0.009377, audio_tagging_loss=0.01647, over 16191.00 frames. ], tot_loss[loss=0.07307, simple_loss=0.08852, pruned_loss=0.01201, audio_tagging_loss=0.01681, over 690607.70 frames. ], batch size: 59, lr: 1.51e-03, grad_scale: 16.0 2023-11-26 19:02:36,297 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=3527346.6666666665, ans=0.125 2023-11-26 19:02:45,264 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=3527413.3333333335, ans=0.2 2023-11-26 19:03:08,405 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module2.whiten, num_groups=1, num_channels=512, metric=10.33 vs. limit=15.0 2023-11-26 19:03:12,390 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=3527546.6666666665, ans=0.07 2023-11-26 19:03:14,372 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=3527546.6666666665, ans=0.125 2023-11-26 19:03:20,571 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 8.836e+01 9.859e+01 1.043e+02 1.139e+02 1.375e+02, threshold=2.086e+02, percent-clipped=0.0 2023-11-26 19:03:23,786 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 529150 2023-11-26 19:03:28,436 INFO [train_asr.py:1235] (3/4) Epoch 45, batch 100, loss[loss=0.0744, simple_loss=0.09607, pruned_loss=0.01172, audio_tagging_loss=0.01465, over 15641.00 frames. ], tot_loss[loss=0.0743, simple_loss=0.09163, pruned_loss=0.01253, audio_tagging_loss=0.01595, over 1216187.16 frames. 
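Note the learning rate stepping down from 1.53e-03 to 1.51e-03 at the epoch boundary. Both values are reproduced by an Eden-style schedule, lr = base_lr * ((b^2 + B^2)/B^2)^-0.25 * ((e^2 + E^2)/E^2)^-0.25, assuming base_lr = 0.045, B = 7500 batches, E = 3.5 epochs (the recipe's usual settings) and e counting completed epochs:

    def eden_lr(batch, epoch, base_lr=0.045, lr_batches=7500.0, lr_epochs=3.5):
        # Eden schedule as in icefall's optim.py (sketch; parameters assumed)
        batch_factor = ((batch ** 2 + lr_batches ** 2) / lr_batches ** 2) ** -0.25
        epoch_factor = ((epoch ** 2 + lr_epochs ** 2) / lr_epochs ** 2) ** -0.25
        return base_lr * batch_factor * epoch_factor

    eden_lr(528300, 43)  # ~1.53e-03, matching the epoch-44 entries earlier
    eden_lr(529100, 44)  # ~1.51e-03, matching the epoch-45 entries above
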
], batch size: 59, lr: 1.51e-03, grad_scale: 16.0 2023-11-26 19:03:38,668 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=3527746.6666666665, ans=0.0 2023-11-26 19:03:45,481 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3527746.6666666665, ans=0.125 2023-11-26 19:03:47,652 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=3527746.6666666665, ans=0.125 2023-11-26 19:03:47,653 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=3527746.6666666665, ans=0.125 2023-11-26 19:03:49,701 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-26 19:04:01,407 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3527880.0, ans=0.125 2023-11-26 19:04:02,880 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=7.68 vs. limit=15.0 2023-11-26 19:04:03,891 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=7.17 vs. limit=10.0 2023-11-26 19:04:12,568 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=3527946.6666666665, ans=0.125 2023-11-26 19:04:19,250 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 529200 2023-11-26 19:04:23,713 INFO [train_asr.py:1235] (3/4) Epoch 45, batch 150, loss[loss=0.07962, simple_loss=0.106, pruned_loss=0.01577, audio_tagging_loss=0.01088, over 15324.00 frames. ], tot_loss[loss=0.07247, simple_loss=0.09171, pruned_loss=0.01244, audio_tagging_loss=0.01418, over 1622636.78 frames. ], batch size: 53, lr: 1.51e-03, grad_scale: 16.0 2023-11-26 19:04:25,482 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=8.73 vs. limit=10.0 2023-11-26 19:04:38,688 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=10.84 vs. 
limit=15.0 2023-11-26 19:04:50,114 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3528146.6666666665, ans=0.1 2023-11-26 19:04:52,124 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn2.whiten.whitening_limit, batch_count=3528146.6666666665, ans=22.5 2023-11-26 19:04:58,624 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.min_positive, batch_count=3528213.3333333335, ans=0.025 2023-11-26 19:05:09,876 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3528280.0, ans=0.1 2023-11-26 19:05:11,742 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.392e+01 9.212e+01 9.845e+01 1.053e+02 1.367e+02, threshold=1.969e+02, percent-clipped=0.0 2023-11-26 19:05:15,008 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 529250 2023-11-26 19:05:18,260 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=3528346.6666666665, ans=0.125 2023-11-26 19:05:19,255 INFO [train_asr.py:1235] (3/4) Epoch 45, batch 200, loss[loss=0.06084, simple_loss=0.08477, pruned_loss=0.01017, audio_tagging_loss=0.008281, over 15054.00 frames. ], tot_loss[loss=0.07015, simple_loss=0.09073, pruned_loss=0.01226, audio_tagging_loss=0.01252, over 1940824.38 frames. ], batch size: 55, lr: 1.51e-03, grad_scale: 16.0 2023-11-26 19:05:30,067 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=3528413.3333333335, ans=0.0 2023-11-26 19:05:35,884 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=3528413.3333333335, ans=0.125 2023-11-26 19:05:57,100 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=3528546.6666666665, ans=0.0 2023-11-26 19:05:59,297 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=3528546.6666666665, ans=0.2 2023-11-26 19:06:06,682 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=3528613.3333333335, ans=0.125 2023-11-26 19:06:09,745 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 529300 2023-11-26 19:06:09,872 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=3528613.3333333335, ans=0.2 2023-11-26 19:06:13,963 INFO [train_asr.py:1235] (3/4) Epoch 45, batch 250, loss[loss=0.06652, simple_loss=0.0879, pruned_loss=0.0118, audio_tagging_loss=0.01077, over 14874.00 frames. ], tot_loss[loss=0.06869, simple_loss=0.09007, pruned_loss=0.0123, audio_tagging_loss=0.01136, over 2190149.96 frames. ], batch size: 55, lr: 1.51e-03, grad_scale: 16.0 2023-11-26 19:06:16,918 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3528680.0, ans=0.1 2023-11-26 19:06:23,968 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=8.62 vs. 
limit=15.0 2023-11-26 19:06:59,084 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3528946.6666666665, ans=0.0 2023-11-26 19:07:01,140 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=3528946.6666666665, ans=0.2 2023-11-26 19:07:01,945 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.095e+01 9.038e+01 9.703e+01 1.049e+02 1.454e+02, threshold=1.941e+02, percent-clipped=0.0 2023-11-26 19:07:05,760 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 529350 2023-11-26 19:07:09,937 INFO [train_asr.py:1235] (3/4) Epoch 45, batch 300, loss[loss=0.06531, simple_loss=0.09819, pruned_loss=0.009831, audio_tagging_loss=0.006388, over 15162.00 frames. ], tot_loss[loss=0.06849, simple_loss=0.09081, pruned_loss=0.01244, audio_tagging_loss=0.01065, over 2386298.69 frames. ], batch size: 59, lr: 1.51e-03, grad_scale: 16.0 2023-11-26 19:07:20,344 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3529080.0, ans=0.1 2023-11-26 19:07:21,654 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.76 vs. limit=6.0 2023-11-26 19:07:23,571 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=3529080.0, ans=0.125 2023-11-26 19:07:45,304 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=3529213.3333333335, ans=0.125 2023-11-26 19:07:55,723 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=3529280.0, ans=0.0 2023-11-26 19:07:58,821 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=3529280.0, ans=0.0 2023-11-26 19:08:00,851 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 529400 2023-11-26 19:08:05,789 INFO [train_asr.py:1235] (3/4) Epoch 45, batch 350, loss[loss=0.0661, simple_loss=0.09288, pruned_loss=0.01092, audio_tagging_loss=0.008738, over 16322.00 frames. ], tot_loss[loss=0.06814, simple_loss=0.09148, pruned_loss=0.01244, audio_tagging_loss=0.009954, over 2539757.01 frames. ], batch size: 60, lr: 1.51e-03, grad_scale: 16.0 2023-11-26 19:08:09,574 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=7.45 vs. limit=10.0 2023-11-26 19:08:09,664 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=8.44 vs. 
limit=15.0 2023-11-26 19:08:12,394 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3529346.6666666665, ans=0.125 2023-11-26 19:08:15,567 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=3529413.3333333335, ans=0.125 2023-11-26 19:08:17,823 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=3529413.3333333335, ans=0.0 2023-11-26 19:08:21,076 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=3529413.3333333335, ans=0.0 2023-11-26 19:08:27,012 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=3529480.0, ans=0.0 2023-11-26 19:08:32,709 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=3529480.0, ans=0.0 2023-11-26 19:08:32,742 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3529480.0, ans=0.125 2023-11-26 19:08:36,034 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3529480.0, ans=0.125 2023-11-26 19:08:53,266 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.461e+01 8.894e+01 9.566e+01 1.035e+02 1.216e+02, threshold=1.913e+02, percent-clipped=0.0 2023-11-26 19:08:56,492 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 529450 2023-11-26 19:09:00,675 INFO [train_asr.py:1235] (3/4) Epoch 45, batch 400, loss[loss=0.05829, simple_loss=0.07053, pruned_loss=0.009683, audio_tagging_loss=0.01334, over 14773.00 frames. ], tot_loss[loss=0.06738, simple_loss=0.09069, pruned_loss=0.01234, audio_tagging_loss=0.009702, over 2649758.38 frames. ], batch size: 58, lr: 1.51e-03, grad_scale: 32.0 2023-11-26 19:09:22,039 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.whiten.whitening_limit, batch_count=3529746.6666666665, ans=15.0 2023-11-26 19:09:28,506 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=11.35 vs. limit=15.0 2023-11-26 19:09:52,817 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 529500 2023-11-26 19:09:56,846 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=3530013.3333333335, ans=0.125 2023-11-26 19:09:57,567 INFO [train_asr.py:1235] (3/4) Epoch 45, batch 450, loss[loss=0.06765, simple_loss=0.09968, pruned_loss=0.01178, audio_tagging_loss=0.006023, over 15455.00 frames. ], tot_loss[loss=0.06691, simple_loss=0.09047, pruned_loss=0.01224, audio_tagging_loss=0.009434, over 2739463.99 frames. ], batch size: 57, lr: 1.51e-03, grad_scale: 32.0 2023-11-26 19:10:07,798 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=512, metric=2.65 vs. limit=15.0 2023-11-26 19:10:24,311 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=18.15 vs. 
limit=22.5 2023-11-26 19:10:28,688 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=3530146.6666666665, ans=0.125 2023-11-26 19:10:46,548 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.397e+01 8.546e+01 9.095e+01 1.009e+02 1.358e+02, threshold=1.819e+02, percent-clipped=0.0 2023-11-26 19:10:48,785 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 529550 2023-11-26 19:10:53,024 INFO [train_asr.py:1235] (3/4) Epoch 45, batch 500, loss[loss=0.07462, simple_loss=0.1034, pruned_loss=0.01451, audio_tagging_loss=0.008423, over 14549.00 frames. ], tot_loss[loss=0.06687, simple_loss=0.09054, pruned_loss=0.01237, audio_tagging_loss=0.009226, over 2808126.02 frames. ], batch size: 53, lr: 1.51e-03, grad_scale: 16.0 2023-11-26 19:10:54,947 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=8.50 vs. limit=10.0 2023-11-26 19:11:01,076 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.min_abs, batch_count=3530346.6666666665, ans=0.5 2023-11-26 19:11:14,862 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=11.13 vs. limit=15.0 2023-11-26 19:11:15,374 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=3530480.0, ans=0.0 2023-11-26 19:11:29,456 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3530546.6666666665, ans=0.1 2023-11-26 19:11:41,314 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-11-26 19:11:44,893 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 529600 2023-11-26 19:11:49,389 INFO [train_asr.py:1235] (3/4) Epoch 45, batch 550, loss[loss=0.06452, simple_loss=0.0841, pruned_loss=0.01308, audio_tagging_loss=0.00939, over 15095.00 frames. ], tot_loss[loss=0.06605, simple_loss=0.08951, pruned_loss=0.01219, audio_tagging_loss=0.009103, over 2855243.56 frames. ], batch size: 57, lr: 1.51e-03, grad_scale: 16.0 2023-11-26 19:11:49,665 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.max_abs, batch_count=3530680.0, ans=10.0 2023-11-26 19:11:51,863 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3530680.0, ans=0.125 2023-11-26 19:12:01,864 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=512, metric=2.59 vs. 
limit=15.0 2023-11-26 19:12:13,872 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=3530813.3333333335, ans=0.0 2023-11-26 19:12:14,922 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=3530813.3333333335, ans=0.2 2023-11-26 19:12:32,694 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=3530880.0, ans=0.0 2023-11-26 19:12:38,569 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.min_positive, batch_count=3530946.6666666665, ans=0.05 2023-11-26 19:12:39,403 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.657e+01 8.958e+01 9.518e+01 1.034e+02 1.414e+02, threshold=1.904e+02, percent-clipped=0.0 2023-11-26 19:12:41,689 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 529650 2023-11-26 19:12:45,645 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3531013.3333333335, ans=0.1 2023-11-26 19:12:46,986 INFO [train_asr.py:1235] (3/4) Epoch 45, batch 600, loss[loss=0.06773, simple_loss=0.09959, pruned_loss=0.01003, audio_tagging_loss=0.007899, over 16097.00 frames. ], tot_loss[loss=0.06583, simple_loss=0.08892, pruned_loss=0.01227, audio_tagging_loss=0.009104, over 2886403.47 frames. ], batch size: 59, lr: 1.51e-03, grad_scale: 16.0 2023-11-26 19:12:54,435 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3531013.3333333335, ans=0.1 2023-11-26 19:13:10,521 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=3531146.6666666665, ans=0.2 2023-11-26 19:13:13,260 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=3531146.6666666665, ans=0.0 2023-11-26 19:13:13,304 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3531146.6666666665, ans=0.1 2023-11-26 19:13:31,235 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3531280.0, ans=0.125 2023-11-26 19:13:37,421 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 529700 2023-11-26 19:13:41,616 INFO [train_asr.py:1235] (3/4) Epoch 45, batch 650, loss[loss=0.06727, simple_loss=0.09769, pruned_loss=0.01008, audio_tagging_loss=0.008345, over 16318.00 frames. ], tot_loss[loss=0.06559, simple_loss=0.08873, pruned_loss=0.01216, audio_tagging_loss=0.009066, over 2925508.46 frames. ], batch size: 59, lr: 1.51e-03, grad_scale: 16.0 2023-11-26 19:13:46,028 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3531346.6666666665, ans=0.0 2023-11-26 19:13:48,302 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3531346.6666666665, ans=0.125 2023-11-26 19:13:53,717 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=11.88 vs. 
limit=22.5 2023-11-26 19:14:08,276 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=3531480.0, ans=0.07 2023-11-26 19:14:29,926 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 6.866e+01 9.030e+01 9.559e+01 1.040e+02 1.405e+02, threshold=1.912e+02, percent-clipped=0.0 2023-11-26 19:14:32,232 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 529750 2023-11-26 19:14:35,477 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-26 19:14:36,341 INFO [train_asr.py:1235] (3/4) Epoch 45, batch 700, loss[loss=0.05581, simple_loss=0.06926, pruned_loss=0.01023, audio_tagging_loss=0.01095, over 14697.00 frames. ], tot_loss[loss=0.0655, simple_loss=0.08864, pruned_loss=0.01215, audio_tagging_loss=0.00903, over 2950011.08 frames. ], batch size: 56, lr: 1.51e-03, grad_scale: 16.0 2023-11-26 19:14:36,494 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3531680.0, ans=0.0 2023-11-26 19:14:41,370 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.min_positive, batch_count=3531680.0, ans=0.025 2023-11-26 19:14:45,094 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=3531680.0, ans=0.0 2023-11-26 19:14:50,403 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=3531746.6666666665, ans=0.0 2023-11-26 19:15:15,470 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00 2023-11-26 19:15:27,964 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 529800 2023-11-26 19:15:32,450 INFO [train_asr.py:1235] (3/4) Epoch 45, batch 750, loss[loss=0.0781, simple_loss=0.1052, pruned_loss=0.01603, audio_tagging_loss=0.009461, over 15712.00 frames. ], tot_loss[loss=0.06586, simple_loss=0.08923, pruned_loss=0.01227, audio_tagging_loss=0.008969, over 2973378.75 frames. ], batch size: 58, lr: 1.51e-03, grad_scale: 8.0 2023-11-26 19:15:55,514 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3532146.6666666665, ans=0.1 2023-11-26 19:16:02,326 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=4.30 vs. limit=6.0 2023-11-26 19:16:06,771 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=3532213.3333333335, ans=0.0 2023-11-26 19:16:10,776 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=4.26 vs. 
limit=15.0 2023-11-26 19:16:22,632 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.575e+01 8.840e+01 9.480e+01 1.009e+02 1.765e+02, threshold=1.896e+02, percent-clipped=0.0 2023-11-26 19:16:23,784 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 529850 2023-11-26 19:16:24,890 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=3532280.0, ans=0.0 2023-11-26 19:16:27,891 INFO [train_asr.py:1235] (3/4) Epoch 45, batch 800, loss[loss=0.07308, simple_loss=0.09015, pruned_loss=0.01462, audio_tagging_loss=0.01339, over 16531.00 frames. ], tot_loss[loss=0.06616, simple_loss=0.08978, pruned_loss=0.01232, audio_tagging_loss=0.008956, over 2996011.99 frames. ], batch size: 65, lr: 1.51e-03, grad_scale: 16.0 2023-11-26 19:16:32,340 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=3532346.6666666665, ans=0.0 2023-11-26 19:16:32,658 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.whiten, num_groups=1, num_channels=512, metric=8.13 vs. limit=12.0 2023-11-26 19:16:40,865 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3532413.3333333335, ans=0.0 2023-11-26 19:16:44,379 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.21 vs. limit=6.0 2023-11-26 19:17:09,689 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.0.conv_module1.whiten, num_groups=1, num_channels=192, metric=14.57 vs. limit=15.0 2023-11-26 19:17:13,409 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=3532613.3333333335, ans=0.2 2023-11-26 19:17:18,525 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 529900 2023-11-26 19:17:18,645 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3532613.3333333335, ans=0.125 2023-11-26 19:17:22,613 INFO [train_asr.py:1235] (3/4) Epoch 45, batch 850, loss[loss=0.05276, simple_loss=0.07177, pruned_loss=0.00704, audio_tagging_loss=0.009832, over 14735.00 frames. ], tot_loss[loss=0.06647, simple_loss=0.09042, pruned_loss=0.01233, audio_tagging_loss=0.008931, over 3013709.55 frames. 
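The Whitening lines track how far a layer's activations are from having a white (decorrelated, equal-variance) covariance; the module penalizes gradients only when the metric exceeds its limit, and the log periodically samples the current metric against that limit. As best recalled from icefall's scaling.py, the metric is the mean diagonal of C^2 over the squared mean diagonal of C, which equals 1.0 for perfectly white features and grows with correlation; treat this sketch as illustrative rather than the exact source.

    import torch

    def whitening_metric(x: torch.Tensor, num_groups: int) -> torch.Tensor:
        # x: (num_frames, num_channels), channels split into groups as logged
        num_frames, num_channels = x.shape
        cpg = num_channels // num_groups                # channels per group
        x = x.reshape(num_frames, num_groups, cpg).transpose(0, 1)
        x = x - x.mean(dim=1, keepdim=True)             # centered covariance
        covar = torch.matmul(x.transpose(1, 2), x)      # (num_groups, cpg, cpg)
        mean_diag = covar.diagonal(dim1=1, dim2=2).mean()
        mean_sq_diag = (covar ** 2).sum() / (num_groups * cpg)
        # >= 1.0; equals 1.0 when each group's covariance is a multiple of I
        return mean_sq_diag / (mean_diag ** 2 + 1e-20)
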
], batch size: 55, lr: 1.51e-03, grad_scale: 16.0 2023-11-26 19:17:22,824 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=3532680.0, ans=0.0 2023-11-26 19:17:29,190 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3532680.0, ans=0.125 2023-11-26 19:17:35,089 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=3532746.6666666665, ans=0.125 2023-11-26 19:17:45,718 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=3532813.3333333335, ans=0.125 2023-11-26 19:18:11,998 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.322e+01 9.004e+01 9.716e+01 1.045e+02 1.656e+02, threshold=1.943e+02, percent-clipped=0.0 2023-11-26 19:18:13,137 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 529950 2023-11-26 19:18:18,470 INFO [train_asr.py:1235] (3/4) Epoch 45, batch 900, loss[loss=0.07813, simple_loss=0.1152, pruned_loss=0.01257, audio_tagging_loss=0.007941, over 16317.00 frames. ], tot_loss[loss=0.06669, simple_loss=0.09041, pruned_loss=0.01243, audio_tagging_loss=0.009053, over 3015898.76 frames. ], batch size: 60, lr: 1.51e-03, grad_scale: 16.0 2023-11-26 19:19:09,847 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 530000 2023-11-26 19:19:14,250 INFO [train_asr.py:1235] (3/4) Epoch 45, batch 950, loss[loss=0.07432, simple_loss=0.1069, pruned_loss=0.01177, audio_tagging_loss=0.009091, over 15015.00 frames. ], tot_loss[loss=0.06669, simple_loss=0.09067, pruned_loss=0.01244, audio_tagging_loss=0.008915, over 3025965.60 frames. ], batch size: 53, lr: 1.51e-03, grad_scale: 16.0 2023-11-26 19:19:27,361 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=3533413.3333333335, ans=0.2 2023-11-26 19:19:32,591 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=3533413.3333333335, ans=0.0 2023-11-26 19:19:34,681 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3533480.0, ans=0.1 2023-11-26 19:19:44,388 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3533480.0, ans=0.1 2023-11-26 19:19:50,176 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=3533546.6666666665, ans=0.125 2023-11-26 19:20:03,779 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.608e+01 8.669e+01 9.436e+01 1.000e+02 1.329e+02, threshold=1.887e+02, percent-clipped=0.0 2023-11-26 19:20:04,921 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 530050 2023-11-26 19:20:05,166 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=3533613.3333333335, ans=0.125 2023-11-26 19:20:09,171 INFO [train_asr.py:1235] (3/4) Epoch 45, batch 1000, loss[loss=0.07733, simple_loss=0.1111, pruned_loss=0.0129, audio_tagging_loss=0.008884, over 16202.00 frames. ], tot_loss[loss=0.06593, simple_loss=0.08983, pruned_loss=0.01217, audio_tagging_loss=0.008848, over 3030779.11 frames. 
], batch size: 60, lr: 1.51e-03, grad_scale: 16.0 2023-11-26 19:20:21,732 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3533746.6666666665, ans=0.125 2023-11-26 19:20:31,256 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=3533813.3333333335, ans=0.125 2023-11-26 19:20:31,261 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3533813.3333333335, ans=0.125 2023-11-26 19:20:33,248 WARNING [train_asr.py:1481] (3/4) Exclude cut with ID unbalanced/5Y6u9AlD9S0_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-26 19:20:33,724 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.11 vs. limit=6.0 2023-11-26 19:21:00,076 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 530100 2023-11-26 19:21:04,782 INFO [train_asr.py:1235] (3/4) Epoch 45, batch 1050, loss[loss=0.06303, simple_loss=0.08905, pruned_loss=0.009911, audio_tagging_loss=0.008588, over 15936.00 frames. ], tot_loss[loss=0.06561, simple_loss=0.08948, pruned_loss=0.01217, audio_tagging_loss=0.008702, over 3036496.70 frames. ], batch size: 58, lr: 1.51e-03, grad_scale: 16.0 2023-11-26 19:21:05,060 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3534013.3333333335, ans=0.125 2023-11-26 19:21:14,234 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=3534013.3333333335, ans=0.125 2023-11-26 19:21:26,954 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3534146.6666666665, ans=0.125 2023-11-26 19:21:33,719 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=6.74 vs. limit=15.0 2023-11-26 19:21:34,217 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=3534146.6666666665, ans=0.07 2023-11-26 19:21:39,517 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-11-26 19:21:48,541 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=3534280.0, ans=0.0 2023-11-26 19:21:54,693 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.578e+01 8.897e+01 9.575e+01 1.026e+02 1.368e+02, threshold=1.915e+02, percent-clipped=0.0 2023-11-26 19:21:55,826 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 530150 2023-11-26 19:21:58,105 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=3534280.0, ans=0.125 2023-11-26 19:22:00,002 INFO [train_asr.py:1235] (3/4) Epoch 45, batch 1100, loss[loss=0.06773, simple_loss=0.09533, pruned_loss=0.01229, audio_tagging_loss=0.007772, over 15154.00 frames. 
], tot_loss[loss=0.06544, simple_loss=0.08913, pruned_loss=0.01223, audio_tagging_loss=0.008645, over 3029575.50 frames. ], batch size: 56, lr: 1.51e-03, grad_scale: 16.0 2023-11-26 19:22:02,116 WARNING [train_asr.py:1481] (3/4) Exclude cut with ID unbalanced/AWHnJAqurec_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-26 19:22:02,341 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3534346.6666666665, ans=0.0 2023-11-26 19:22:20,507 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3534480.0, ans=0.125 2023-11-26 19:22:35,624 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=3534546.6666666665, ans=0.125 2023-11-26 19:22:40,279 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3534546.6666666665, ans=0.1 2023-11-26 19:22:43,559 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=3534613.3333333335, ans=0.125 2023-11-26 19:22:49,402 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=12.17 vs. limit=15.0 2023-11-26 19:22:50,892 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 530200 2023-11-26 19:22:55,389 INFO [train_asr.py:1235] (3/4) Epoch 45, batch 1150, loss[loss=0.07974, simple_loss=0.1147, pruned_loss=0.01585, audio_tagging_loss=0.00653, over 16341.00 frames. ], tot_loss[loss=0.065, simple_loss=0.08836, pruned_loss=0.01213, audio_tagging_loss=0.008687, over 3035176.30 frames. ], batch size: 59, lr: 1.51e-03, grad_scale: 16.0 2023-11-26 19:23:10,300 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=11.87 vs. limit=15.0 2023-11-26 19:23:27,470 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.whiten.whitening_limit, batch_count=3534813.3333333335, ans=15.0 2023-11-26 19:23:29,290 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3534880.0, ans=0.125 2023-11-26 19:23:44,886 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.438e+01 8.885e+01 9.403e+01 1.020e+02 1.405e+02, threshold=1.881e+02, percent-clipped=0.0 2023-11-26 19:23:46,010 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 530250 2023-11-26 19:23:48,815 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=6.54 vs. limit=10.0 2023-11-26 19:23:50,184 INFO [train_asr.py:1235] (3/4) Epoch 45, batch 1200, loss[loss=0.06025, simple_loss=0.09145, pruned_loss=0.008026, audio_tagging_loss=0.006498, over 14286.00 frames. ], tot_loss[loss=0.06401, simple_loss=0.0871, pruned_loss=0.01181, audio_tagging_loss=0.008642, over 3035236.80 frames. 
], batch size: 54, lr: 1.51e-03, grad_scale: 32.0 2023-11-26 19:23:53,305 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3535013.3333333335, ans=0.125 2023-11-26 19:23:59,462 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3535013.3333333335, ans=0.1 2023-11-26 19:24:01,705 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=3535080.0, ans=0.09899494936611666 2023-11-26 19:24:10,858 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3535080.0, ans=0.125 2023-11-26 19:24:16,494 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=10.84 vs. limit=15.0 2023-11-26 19:24:28,010 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-26 19:24:29,483 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=512, metric=2.73 vs. limit=15.0 2023-11-26 19:24:36,415 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=8.20 vs. limit=15.0 2023-11-26 19:24:43,512 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 530300 2023-11-26 19:24:43,917 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.16 vs. limit=6.0 2023-11-26 19:24:47,723 INFO [train_asr.py:1235] (3/4) Epoch 45, batch 1250, loss[loss=0.07526, simple_loss=0.1045, pruned_loss=0.01497, audio_tagging_loss=0.008053, over 15377.00 frames. ], tot_loss[loss=0.06483, simple_loss=0.08828, pruned_loss=0.012, audio_tagging_loss=0.008695, over 3047052.97 frames. ], batch size: 56, lr: 1.51e-03, grad_scale: 32.0 2023-11-26 19:24:48,335 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=6.98 vs. limit=12.0 2023-11-26 19:24:54,224 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.1.whiten, num_groups=1, num_channels=192, metric=3.64 vs. 
limit=12.0 2023-11-26 19:25:08,735 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=3535480.0, ans=0.0 2023-11-26 19:25:28,879 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=3535546.6666666665, ans=0.125 2023-11-26 19:25:29,976 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=3535546.6666666665, ans=0.125 2023-11-26 19:25:36,861 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=3535613.3333333335, ans=0.2 2023-11-26 19:25:38,826 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.083e+01 8.830e+01 9.575e+01 1.052e+02 2.949e+02, threshold=1.915e+02, percent-clipped=1.0 2023-11-26 19:25:38,936 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 530350 2023-11-26 19:25:43,055 INFO [train_asr.py:1235] (3/4) Epoch 45, batch 1300, loss[loss=0.06545, simple_loss=0.08426, pruned_loss=0.0128, audio_tagging_loss=0.01051, over 14096.00 frames. ], tot_loss[loss=0.06491, simple_loss=0.08844, pruned_loss=0.01205, audio_tagging_loss=0.008632, over 3046935.28 frames. ], batch size: 54, lr: 1.51e-03, grad_scale: 16.0 2023-11-26 19:26:10,894 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3535813.3333333335, ans=0.1 2023-11-26 19:26:33,581 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 530400 2023-11-26 19:26:33,815 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.min_positive, batch_count=3535946.6666666665, ans=0.05 2023-11-26 19:26:38,072 INFO [train_asr.py:1235] (3/4) Epoch 45, batch 1350, loss[loss=0.06958, simple_loss=0.09403, pruned_loss=0.01399, audio_tagging_loss=0.008567, over 14961.00 frames. ], tot_loss[loss=0.06512, simple_loss=0.08883, pruned_loss=0.01209, audio_tagging_loss=0.008615, over 3039669.14 frames. ], batch size: 55, lr: 1.51e-03, grad_scale: 16.0 2023-11-26 19:26:44,214 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3536013.3333333335, ans=0.125 2023-11-26 19:26:45,467 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.21 vs. limit=6.0 2023-11-26 19:26:53,408 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=3536080.0, ans=0.0 2023-11-26 19:27:01,510 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=3536146.6666666665, ans=0.0 2023-11-26 19:27:08,747 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3536146.6666666665, ans=0.125 2023-11-26 19:27:18,137 WARNING [train_asr.py:1481] (3/4) Exclude cut with ID unbalanced/XdmbboqRBmQ_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. 
2023-11-26 19:27:24,611 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=7.91 vs. limit=12.0 2023-11-26 19:27:30,511 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.624e+01 8.761e+01 9.344e+01 1.009e+02 1.308e+02, threshold=1.869e+02, percent-clipped=0.0 2023-11-26 19:27:30,635 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 530450 2023-11-26 19:27:34,958 INFO [train_asr.py:1235] (3/4) Epoch 45, batch 1400, loss[loss=0.05642, simple_loss=0.07386, pruned_loss=0.01118, audio_tagging_loss=0.008303, over 14208.00 frames. ], tot_loss[loss=0.06506, simple_loss=0.08869, pruned_loss=0.01206, audio_tagging_loss=0.008658, over 3038924.83 frames. ], batch size: 53, lr: 1.51e-03, grad_scale: 16.0 2023-11-26 19:27:37,408 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3536346.6666666665, ans=0.125 2023-11-26 19:27:59,108 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.min_positive, batch_count=3536480.0, ans=0.025 2023-11-26 19:28:17,723 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=3536546.6666666665, ans=0.0 2023-11-26 19:28:24,192 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=3536613.3333333335, ans=0.2 2023-11-26 19:28:26,121 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 530500 2023-11-26 19:28:30,084 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=3536680.0, ans=0.0 2023-11-26 19:28:30,465 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=19.59 vs. limit=22.5 2023-11-26 19:28:30,901 INFO [train_asr.py:1235] (3/4) Epoch 45, batch 1450, loss[loss=0.08564, simple_loss=0.1145, pruned_loss=0.02068, audio_tagging_loss=0.00771, over 15450.00 frames. ], tot_loss[loss=0.06564, simple_loss=0.08953, pruned_loss=0.01214, audio_tagging_loss=0.008735, over 3045415.18 frames.
], batch size: 56, lr: 1.51e-03, grad_scale: 16.0 2023-11-26 19:28:32,129 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3536680.0, ans=0.125 2023-11-26 19:28:39,645 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3536680.0, ans=0.125 2023-11-26 19:28:51,977 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=3536813.3333333335, ans=0.0 2023-11-26 19:28:52,972 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3536813.3333333335, ans=0.125 2023-11-26 19:28:58,311 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=3536813.3333333335, ans=0.0 2023-11-26 19:29:22,143 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.568e+01 8.955e+01 9.646e+01 1.028e+02 1.353e+02, threshold=1.929e+02, percent-clipped=0.0 2023-11-26 19:29:22,235 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 530550 2023-11-26 19:29:26,584 INFO [train_asr.py:1235] (3/4) Epoch 45, batch 1500, loss[loss=0.06955, simple_loss=0.09332, pruned_loss=0.01402, audio_tagging_loss=0.008868, over 15139.00 frames. ], tot_loss[loss=0.06617, simple_loss=0.09033, pruned_loss=0.01217, audio_tagging_loss=0.008834, over 3045053.83 frames. ], batch size: 55, lr: 1.51e-03, grad_scale: 16.0 2023-11-26 19:29:54,986 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.min_positive, batch_count=3537146.6666666665, ans=0.05 2023-11-26 19:30:01,018 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=3537213.3333333335, ans=0.0 2023-11-26 19:30:08,959 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.0.conv_module2.whiten, num_groups=1, num_channels=192, metric=12.45 vs. limit=15.0 2023-11-26 19:30:09,407 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=3537213.3333333335, ans=0.0 2023-11-26 19:30:14,415 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=3537280.0, ans=0.0 2023-11-26 19:30:17,749 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=512, metric=17.12 vs. limit=22.5 2023-11-26 19:30:18,354 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 530600 2023-11-26 19:30:23,554 INFO [train_asr.py:1235] (3/4) Epoch 45, batch 1550, loss[loss=0.06254, simple_loss=0.08298, pruned_loss=0.008853, audio_tagging_loss=0.0122, over 14763.00 frames. ], tot_loss[loss=0.06565, simple_loss=0.08967, pruned_loss=0.01196, audio_tagging_loss=0.008851, over 3044946.12 frames. 
], batch size: 56, lr: 1.51e-03, grad_scale: 16.0 2023-11-26 19:30:25,871 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=3537346.6666666665, ans=0.125 2023-11-26 19:30:36,468 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=3537413.3333333335, ans=0.0 2023-11-26 19:31:10,221 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=3537613.3333333335, ans=0.09899494936611666 2023-11-26 19:31:14,239 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.522e+01 9.127e+01 9.575e+01 1.042e+02 1.304e+02, threshold=1.915e+02, percent-clipped=0.0 2023-11-26 19:31:14,334 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 530650 2023-11-26 19:31:18,489 INFO [train_asr.py:1235] (3/4) Epoch 45, batch 1600, loss[loss=0.05589, simple_loss=0.07563, pruned_loss=0.007275, audio_tagging_loss=0.0108, over 14696.00 frames. ], tot_loss[loss=0.06537, simple_loss=0.08914, pruned_loss=0.01185, audio_tagging_loss=0.008949, over 3040707.00 frames. ], batch size: 56, lr: 1.51e-03, grad_scale: 32.0 2023-11-26 19:31:23,063 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3537680.0, ans=0.125 2023-11-26 19:31:24,595 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3537680.0, ans=0.0 2023-11-26 19:31:25,572 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=3537680.0, ans=0.0 2023-11-26 19:31:26,636 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3537680.0, ans=0.0 2023-11-26 19:31:51,728 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-26 19:31:51,872 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=3537880.0, ans=0.125 2023-11-26 19:32:10,430 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 530700 2023-11-26 19:32:14,681 INFO [train_asr.py:1235] (3/4) Epoch 45, batch 1650, loss[loss=0.05986, simple_loss=0.07834, pruned_loss=0.01085, audio_tagging_loss=0.009841, over 14286.00 frames. ], tot_loss[loss=0.06519, simple_loss=0.08866, pruned_loss=0.0119, audio_tagging_loss=0.008953, over 3033052.34 frames. ], batch size: 55, lr: 1.51e-03, grad_scale: 16.0 2023-11-26 19:32:16,974 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3538013.3333333335, ans=0.1 2023-11-26 19:32:38,847 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3538146.6666666665, ans=0.125 2023-11-26 19:32:40,929 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=3538146.6666666665, ans=0.125 2023-11-26 19:32:47,825 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=12.07 vs. 
limit=15.0 2023-11-26 19:32:56,357 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=3538213.3333333335, ans=0.0 2023-11-26 19:33:05,488 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3538280.0, ans=0.125 2023-11-26 19:33:05,513 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3538280.0, ans=0.1 2023-11-26 19:33:06,440 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 530750 2023-11-26 19:33:08,485 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.083e+01 8.661e+01 9.270e+01 1.008e+02 1.328e+02, threshold=1.854e+02, percent-clipped=0.0 2023-11-26 19:33:11,682 INFO [train_asr.py:1235] (3/4) Epoch 45, batch 1700, loss[loss=0.08159, simple_loss=0.1162, pruned_loss=0.01531, audio_tagging_loss=0.008179, over 14524.00 frames. ], tot_loss[loss=0.0656, simple_loss=0.08931, pruned_loss=0.01203, audio_tagging_loss=0.00892, over 3039445.85 frames. ], batch size: 54, lr: 1.51e-03, grad_scale: 8.0 2023-11-26 19:33:45,422 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=3538546.6666666665, ans=0.0 2023-11-26 19:33:50,283 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3538546.6666666665, ans=0.125 2023-11-26 19:33:54,626 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3538546.6666666665, ans=0.125 2023-11-26 19:34:03,010 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 530800 2023-11-26 19:34:07,596 INFO [train_asr.py:1235] (3/4) Epoch 45, batch 1750, loss[loss=0.07896, simple_loss=0.1075, pruned_loss=0.01793, audio_tagging_loss=0.007273, over 15332.00 frames. ], tot_loss[loss=0.06563, simple_loss=0.08934, pruned_loss=0.0121, audio_tagging_loss=0.008854, over 3044597.76 frames. ], batch size: 57, lr: 1.51e-03, grad_scale: 8.0 2023-11-26 19:34:33,322 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=3538813.3333333335, ans=0.2 2023-11-26 19:34:37,610 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=3538813.3333333335, ans=0.07 2023-11-26 19:34:58,827 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 530850 2023-11-26 19:35:00,912 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.543e+01 9.072e+01 9.578e+01 1.039e+02 1.393e+02, threshold=1.916e+02, percent-clipped=0.0 2023-11-26 19:35:03,575 INFO [train_asr.py:1235] (3/4) Epoch 45, batch 1800, loss[loss=0.07194, simple_loss=0.09397, pruned_loss=0.01411, audio_tagging_loss=0.01084, over 16756.00 frames. ], tot_loss[loss=0.06537, simple_loss=0.08923, pruned_loss=0.01198, audio_tagging_loss=0.008776, over 3053662.42 frames. 
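], batch size: 62, lr: 1.51e-03, grad_scale: 8.0

The grad_scale field in these loss lines tracks dynamic loss scaling for the fp16 mixed-precision run; over this stretch of the log it steps between 32, 16, and 8. The usual mechanics, sketched below with PyTorch's stock GradScaler (the recipe may tune the update rules differently), are that the scale is halved whenever a step produces inf/nan gradients and allowed to grow again after a run of stable steps:

```python
import torch

# Dynamic loss scaling as in torch.cuda.amp; a sketch of the behaviour
# behind the grad_scale values (32.0 -> 16.0 -> 8.0) seen in these lines.
scaler = torch.cuda.amp.GradScaler(
    init_scale=32.0,     # starting scale, as at the top of this excerpt
    backoff_factor=0.5,  # halve the scale on an inf/nan step
    growth_factor=2.0,   # double it again after enough clean steps
)

model = torch.nn.Linear(80, 500).cuda()
optimizer = torch.optim.SGD(model.parameters(), lr=1.51e-03)

features = torch.randn(8, 80, device="cuda")
with torch.cuda.amp.autocast():
    loss = model(features).pow(2).mean()
scaler.scale(loss).backward()  # backward through the scaled loss
scaler.step(optimizer)         # unscales grads; skips the step on overflow
scaler.update()                # adjusts grad_scale for the next batch
print(scaler.get_scale())
```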
2023-11-26 19:35:32,226 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=3539146.6666666665, ans=0.0 2023-11-26 19:35:39,613 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=3539213.3333333335, ans=0.125 2023-11-26 19:35:40,002 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=7.74 vs. limit=10.0 2023-11-26 19:35:43,289 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=7.63 vs. limit=10.0 2023-11-26 19:35:55,082 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 530900 2023-11-26 19:35:59,861 INFO [train_asr.py:1235] (3/4) Epoch 45, batch 1850, loss[loss=0.06388, simple_loss=0.08419, pruned_loss=0.0104, audio_tagging_loss=0.01138, over 14040.00 frames. ], tot_loss[loss=0.06606, simple_loss=0.09028, pruned_loss=0.01223, audio_tagging_loss=0.008698, over 3056812.10 frames. ], batch size: 57, lr: 1.51e-03, grad_scale: 8.0 2023-11-26 19:36:19,168 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=3539413.3333333335, ans=0.0 2023-11-26 19:36:35,706 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3539546.6666666665, ans=0.125 2023-11-26 19:36:35,752 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=3539546.6666666665, ans=0.2 2023-11-26 19:36:36,094 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=6.81 vs. limit=15.0 2023-11-26 19:36:51,403 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 530950 2023-11-26 19:36:52,808 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=11.43 vs. limit=15.0 2023-11-26 19:36:53,505 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.468e+01 8.894e+01 9.427e+01 1.010e+02 7.555e+02, threshold=1.885e+02, percent-clipped=1.0 2023-11-26 19:36:55,631 INFO [train_asr.py:1235] (3/4) Epoch 45, batch 1900, loss[loss=0.06724, simple_loss=0.09087, pruned_loss=0.01311, audio_tagging_loss=0.008687, over 14671.00 frames. ], tot_loss[loss=0.06567, simple_loss=0.08961, pruned_loss=0.01218, audio_tagging_loss=0.008686, over 3064204.16 frames. ], batch size: 55, lr: 1.51e-03, grad_scale: 8.0 2023-11-26 19:37:05,634 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-11-26 19:37:23,950 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=8.48 vs.
limit=15.0 2023-11-26 19:37:24,830 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3539813.3333333335, ans=0.1 2023-11-26 19:37:27,410 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=3539813.3333333335, ans=0.0 2023-11-26 19:37:46,345 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 531000 2023-11-26 19:37:50,783 INFO [train_asr.py:1235] (3/4) Epoch 45, batch 1950, loss[loss=0.05896, simple_loss=0.07336, pruned_loss=0.01452, audio_tagging_loss=0.007764, over 14432.00 frames. ], tot_loss[loss=0.06563, simple_loss=0.08952, pruned_loss=0.01213, audio_tagging_loss=0.008736, over 3055395.90 frames. ], batch size: 55, lr: 1.51e-03, grad_scale: 8.0 2023-11-26 19:37:57,489 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=3540013.3333333335, ans=0.2 2023-11-26 19:38:03,057 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.56 vs. limit=6.0 2023-11-26 19:38:42,142 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 531050 2023-11-26 19:38:44,172 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 6.825e+01 8.819e+01 9.302e+01 9.928e+01 1.179e+02, threshold=1.860e+02, percent-clipped=0.0 2023-11-26 19:38:46,887 INFO [train_asr.py:1235] (3/4) Epoch 45, batch 2000, loss[loss=0.0868, simple_loss=0.1169, pruned_loss=0.02189, audio_tagging_loss=0.006473, over 15621.00 frames. ], tot_loss[loss=0.06584, simple_loss=0.08982, pruned_loss=0.01225, audio_tagging_loss=0.008679, over 3055994.31 frames. ], batch size: 57, lr: 1.51e-03, grad_scale: 16.0 2023-11-26 19:38:49,661 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.54 vs. limit=6.0 2023-11-26 19:39:14,397 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3540480.0, ans=0.125 2023-11-26 19:39:18,593 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3540546.6666666665, ans=0.1 2023-11-26 19:39:29,509 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3540546.6666666665, ans=0.125 2023-11-26 19:39:36,268 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3540613.3333333335, ans=0.125 2023-11-26 19:39:38,329 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 531100 2023-11-26 19:39:42,513 INFO [train_asr.py:1235] (3/4) Epoch 45, batch 2050, loss[loss=0.05864, simple_loss=0.07316, pruned_loss=0.009738, audio_tagging_loss=0.01233, over 14865.00 frames. ], tot_loss[loss=0.06572, simple_loss=0.08958, pruned_loss=0.01226, audio_tagging_loss=0.008676, over 3049859.73 frames. ], batch size: 56, lr: 1.51e-03, grad_scale: 16.0 2023-11-26 19:39:50,236 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=3540680.0, ans=0.07 2023-11-26 19:39:54,894 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=3.09 vs. 
limit=15.0 2023-11-26 19:40:03,997 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=3540813.3333333335, ans=0.2 2023-11-26 19:40:05,072 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3540813.3333333335, ans=0.125 2023-11-26 19:40:10,191 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=18.94 vs. limit=22.5 2023-11-26 19:40:33,205 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 531150 2023-11-26 19:40:35,283 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.661e+01 8.849e+01 9.497e+01 1.034e+02 1.158e+02, threshold=1.899e+02, percent-clipped=0.0 2023-11-26 19:40:37,439 INFO [train_asr.py:1235] (3/4) Epoch 45, batch 2100, loss[loss=0.05787, simple_loss=0.07956, pruned_loss=0.006466, audio_tagging_loss=0.01162, over 14878.00 frames. ], tot_loss[loss=0.06527, simple_loss=0.08908, pruned_loss=0.01199, audio_tagging_loss=0.008741, over 3047610.04 frames. ], batch size: 58, lr: 1.51e-03, grad_scale: 16.0 2023-11-26 19:40:46,424 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=5.91 vs. limit=10.0 2023-11-26 19:40:51,113 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3541080.0, ans=0.1 2023-11-26 19:40:52,163 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=3541080.0, ans=0.125 2023-11-26 19:40:53,734 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3541080.0, ans=0.1 2023-11-26 19:41:06,592 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=3541146.6666666665, ans=0.0 2023-11-26 19:41:24,798 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3541280.0, ans=0.125 2023-11-26 19:41:28,862 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 531200 2023-11-26 19:41:33,344 INFO [train_asr.py:1235] (3/4) Epoch 45, batch 2150, loss[loss=0.05643, simple_loss=0.06831, pruned_loss=0.01084, audio_tagging_loss=0.01143, over 14899.00 frames. ], tot_loss[loss=0.0657, simple_loss=0.08989, pruned_loss=0.01205, audio_tagging_loss=0.008708, over 3045147.81 frames. 
], batch size: 56, lr: 1.51e-03, grad_scale: 16.0 2023-11-26 19:41:39,958 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-26 19:41:53,360 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=3541413.3333333335, ans=0.125 2023-11-26 19:41:56,436 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=3541480.0, ans=0.125 2023-11-26 19:41:57,627 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=3541480.0, ans=0.125 2023-11-26 19:42:04,075 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=3541480.0, ans=0.1 2023-11-26 19:42:06,942 WARNING [train_asr.py:1481] (3/4) Exclude cut with ID unbalanced/XkQ8YVd8u38_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-26 19:42:12,590 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=3541546.6666666665, ans=0.0 2023-11-26 19:42:12,596 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3541546.6666666665, ans=0.125 2023-11-26 19:42:13,438 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=3541546.6666666665, ans=0.1 2023-11-26 19:42:14,889 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=6.64 vs. limit=12.0 2023-11-26 19:42:17,423 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=3541613.3333333335, ans=0.125 2023-11-26 19:42:18,332 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=3541613.3333333335, ans=0.2 2023-11-26 19:42:18,515 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=3541613.3333333335, ans=0.2 2023-11-26 19:42:26,173 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 531250 2023-11-26 19:42:28,255 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.511e+01 8.897e+01 9.501e+01 1.024e+02 1.712e+02, threshold=1.900e+02, percent-clipped=0.0 2023-11-26 19:42:30,380 INFO [train_asr.py:1235] (3/4) Epoch 45, batch 2200, loss[loss=0.07261, simple_loss=0.1011, pruned_loss=0.01436, audio_tagging_loss=0.007712, over 14447.00 frames. ], tot_loss[loss=0.06605, simple_loss=0.09051, pruned_loss=0.01216, audio_tagging_loss=0.008629, over 3049483.97 frames. 
2023-11-26 19:42:34,720 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=3541680.0, ans=0.125 2023-11-26 19:43:19,652 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=3541946.6666666665, ans=0.125 2023-11-26 19:43:21,602 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 531300 2023-11-26 19:43:22,959 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3541946.6666666665, ans=0.125 2023-11-26 19:43:25,122 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=3542013.3333333335, ans=0.2 2023-11-26 19:43:25,875 INFO [train_asr.py:1235] (3/4) Epoch 45, batch 2250, loss[loss=0.06343, simple_loss=0.08115, pruned_loss=0.01336, audio_tagging_loss=0.009491, over 14191.00 frames. ], tot_loss[loss=0.06591, simple_loss=0.09006, pruned_loss=0.01218, audio_tagging_loss=0.008698, over 3042573.60 frames. ], batch size: 55, lr: 1.51e-03, grad_scale: 16.0 2023-11-26 19:43:33,624 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=3542013.3333333335, ans=0.0 2023-11-26 19:44:16,392 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3542280.0, ans=0.125 2023-11-26 19:44:17,341 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 531350 2023-11-26 19:44:19,358 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.901e+01 8.880e+01 9.630e+01 1.027e+02 2.263e+02, threshold=1.926e+02, percent-clipped=1.0 2023-11-26 19:44:21,511 INFO [train_asr.py:1235] (3/4) Epoch 45, batch 2300, loss[loss=0.05585, simple_loss=0.07688, pruned_loss=0.00922, audio_tagging_loss=0.008195, over 14558.00 frames. ], tot_loss[loss=0.06584, simple_loss=0.09008, pruned_loss=0.01209, audio_tagging_loss=0.0087, over 3045703.10 frames. ], batch size: 58, lr: 1.51e-03, grad_scale: 16.0 2023-11-26 19:44:22,859 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3542346.6666666665, ans=0.125 2023-11-26 19:44:52,701 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=3542480.0, ans=0.0 2023-11-26 19:44:54,824 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3542546.6666666665, ans=0.125 2023-11-26 19:45:01,269 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=3542546.6666666665, ans=0.0 2023-11-26 19:45:05,639 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=3542613.3333333335, ans=0.0 2023-11-26 19:45:11,715 WARNING [train_asr.py:1481] (3/4) Exclude cut with ID unbalanced/mx9RcUz8sr0_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible'].
Number of tokens: 24 2023-11-26 19:45:14,429 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 531400 2023-11-26 19:45:18,978 INFO [train_asr.py:1235] (3/4) Epoch 45, batch 2350, loss[loss=0.05837, simple_loss=0.07372, pruned_loss=0.01021, audio_tagging_loss=0.0113, over 16276.00 frames. ], tot_loss[loss=0.06568, simple_loss=0.08968, pruned_loss=0.01204, audio_tagging_loss=0.008802, over 3047215.67 frames. ], batch size: 63, lr: 1.51e-03, grad_scale: 16.0 2023-11-26 19:45:22,934 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=3542680.0, ans=0.125 2023-11-26 19:45:29,265 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=3542746.6666666665, ans=0.1 2023-11-26 19:45:46,556 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=3542813.3333333335, ans=0.125 2023-11-26 19:45:51,316 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3542880.0, ans=0.125 2023-11-26 19:46:10,700 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 531450 2023-11-26 19:46:12,734 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.423e+01 8.878e+01 9.481e+01 9.940e+01 1.139e+02, threshold=1.896e+02, percent-clipped=0.0 2023-11-26 19:46:14,886 INFO [train_asr.py:1235] (3/4) Epoch 45, batch 2400, loss[loss=0.06959, simple_loss=0.102, pruned_loss=0.01056, audio_tagging_loss=0.008022, over 15229.00 frames. ], tot_loss[loss=0.06604, simple_loss=0.09016, pruned_loss=0.01206, audio_tagging_loss=0.008901, over 3051708.01 frames. ], batch size: 54, lr: 1.51e-03, grad_scale: 32.0 2023-11-26 19:46:23,604 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3543013.3333333335, ans=0.0 2023-11-26 19:46:26,737 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=3543080.0, ans=0.0 2023-11-26 19:46:31,060 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=3543080.0, ans=0.125 2023-11-26 19:46:41,446 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=4.69 vs. limit=6.0 2023-11-26 19:46:50,297 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=3.64 vs. limit=15.0 2023-11-26 19:46:52,130 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=3543213.3333333335, ans=0.2 2023-11-26 19:47:05,840 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 531500 2023-11-26 19:47:10,001 INFO [train_asr.py:1235] (3/4) Epoch 45, batch 2450, loss[loss=0.05106, simple_loss=0.06346, pruned_loss=0.0077, audio_tagging_loss=0.01163, over 15610.00 frames. ], tot_loss[loss=0.0659, simple_loss=0.08963, pruned_loss=0.01201, audio_tagging_loss=0.009072, over 3050795.47 frames. 
], batch size: 58, lr: 1.51e-03, grad_scale: 16.0 2023-11-26 19:47:20,930 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00 2023-11-26 19:47:27,700 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=3543413.3333333335, ans=0.0 2023-11-26 19:47:44,589 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=8.26 vs. limit=15.0 2023-11-26 19:47:45,441 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=3543546.6666666665, ans=0.2 2023-11-26 19:47:48,539 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.min_abs, batch_count=3543546.6666666665, ans=0.5 2023-11-26 19:48:01,783 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=7.26 vs. limit=15.0 2023-11-26 19:48:02,290 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 531550 2023-11-26 19:48:05,933 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.504e+01 8.711e+01 9.383e+01 9.958e+01 1.270e+02, threshold=1.877e+02, percent-clipped=0.0 2023-11-26 19:48:07,072 INFO [train_asr.py:1235] (3/4) Epoch 45, batch 2500, loss[loss=0.05816, simple_loss=0.07957, pruned_loss=0.009247, audio_tagging_loss=0.009129, over 16116.00 frames. ], tot_loss[loss=0.06589, simple_loss=0.08956, pruned_loss=0.01205, audio_tagging_loss=0.00906, over 3044951.18 frames. ], batch size: 59, lr: 1.51e-03, grad_scale: 16.0 2023-11-26 19:48:10,537 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=3543680.0, ans=0.125 2023-11-26 19:48:13,767 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3543680.0, ans=0.125 2023-11-26 19:48:24,029 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3543746.6666666665, ans=0.1 2023-11-26 19:48:27,028 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=3543746.6666666665, ans=0.2 2023-11-26 19:48:30,388 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=3543813.3333333335, ans=0.0 2023-11-26 19:48:53,046 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3543946.6666666665, ans=0.125 2023-11-26 19:48:58,199 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 531600 2023-11-26 19:49:03,309 INFO [train_asr.py:1235] (3/4) Epoch 45, batch 2550, loss[loss=0.05631, simple_loss=0.08174, pruned_loss=0.008288, audio_tagging_loss=0.007152, over 14625.00 frames. ], tot_loss[loss=0.06559, simple_loss=0.08908, pruned_loss=0.01203, audio_tagging_loss=0.009014, over 3048837.73 frames. 
2023-11-26 19:49:16,156 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=3544080.0, ans=0.125 2023-11-26 19:49:19,327 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3544080.0, ans=0.1 2023-11-26 19:49:29,202 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=11.36 vs. limit=22.5 2023-11-26 19:49:54,488 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 531650 2023-11-26 19:49:55,633 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=3544280.0, ans=0.0 2023-11-26 19:49:57,593 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.308e+01 8.904e+01 9.624e+01 1.038e+02 1.426e+02, threshold=1.925e+02, percent-clipped=0.0 2023-11-26 19:49:58,729 INFO [train_asr.py:1235] (3/4) Epoch 45, batch 2600, loss[loss=0.07856, simple_loss=0.1101, pruned_loss=0.01403, audio_tagging_loss=0.009476, over 14708.00 frames. ], tot_loss[loss=0.06557, simple_loss=0.08906, pruned_loss=0.01216, audio_tagging_loss=0.008889, over 3046016.17 frames. ], batch size: 55, lr: 1.50e-03, grad_scale: 16.0 2023-11-26 19:50:04,565 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=12.13 vs. limit=15.0 2023-11-26 19:50:26,336 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=3544480.0, ans=0.0 2023-11-26 19:50:50,880 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 531700 2023-11-26 19:50:52,700 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=3544613.3333333335, ans=0.125 2023-11-26 19:50:53,559 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=3544613.3333333335, ans=0.125 2023-11-26 19:50:56,196 INFO [train_asr.py:1235] (3/4) Epoch 45, batch 2650, loss[loss=0.07351, simple_loss=0.102, pruned_loss=0.01498, audio_tagging_loss=0.007537, over 15237.00 frames. ], tot_loss[loss=0.06549, simple_loss=0.08915, pruned_loss=0.01213, audio_tagging_loss=0.008779, over 3043154.60 frames. ], batch size: 56, lr: 1.50e-03, grad_scale: 16.0 2023-11-26 19:50:59,552 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=3544680.0, ans=0.125 2023-11-26 19:51:00,628 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3544680.0, ans=0.0 2023-11-26 19:51:13,010 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3544746.6666666665, ans=0.125 2023-11-26 19:51:15,322 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=13.94 vs.
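limit=15.0

The [scaling.py:1022] Whitening lines are diagnostics from the Zipformer's whitening regularizer: metric measures how far the covariance of a module's activations is from "white" (all eigenvalues equal) across the given channel groups, and a penalty engages only when it exceeds limit; most records here, like the feed_forward2 one just above (13.94 vs. 15.0), stay under it. One plausible form of that metric, offered as a sketch rather than the exact icefall implementation:

```python
import torch

def whitening_metric(x: torch.Tensor, num_groups: int) -> float:
    """Ratio E[eig^2] / (E[eig])^2 of the per-group feature covariance:
    1.0 for perfectly white features, larger when a few directions dominate."""
    num_frames, num_channels = x.shape
    group_size = num_channels // num_groups
    x = x.reshape(num_frames, num_groups, group_size).transpose(0, 1)
    cov = x.transpose(1, 2) @ x / num_frames                # (groups, c, c)
    mean_eig = cov.diagonal(dim1=1, dim2=2).mean(dim=1)     # trace(C)/c
    mean_eig_sq = (cov ** 2).sum(dim=(1, 2)) / group_size   # trace(C^2)/c
    return (mean_eig_sq / mean_eig ** 2).mean().item()

x = torch.randn(2000, 256)                # near-white activations
print(whitening_metric(x, num_groups=1))  # ~1.1, far below limit=15.0
```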
2023-11-26 19:51:42,223 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=3544946.6666666665, ans=0.125 2023-11-26 19:51:47,396 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 531750 2023-11-26 19:51:50,500 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.276e+01 8.824e+01 9.475e+01 1.010e+02 1.281e+02, threshold=1.895e+02, percent-clipped=0.0 2023-11-26 19:51:51,587 INFO [train_asr.py:1235] (3/4) Epoch 45, batch 2700, loss[loss=0.06624, simple_loss=0.08451, pruned_loss=0.01363, audio_tagging_loss=0.01035, over 15875.00 frames. ], tot_loss[loss=0.06553, simple_loss=0.08925, pruned_loss=0.01218, audio_tagging_loss=0.008721, over 3044124.93 frames. ], batch size: 61, lr: 1.50e-03, grad_scale: 16.0 2023-11-26 19:51:52,930 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=3545013.3333333335, ans=0.125 2023-11-26 19:51:59,702 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3545013.3333333335, ans=0.125 2023-11-26 19:52:26,639 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=3545213.3333333335, ans=0.125 2023-11-26 19:52:29,155 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=13.73 vs. limit=22.5 2023-11-26 19:52:30,667 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=3545213.3333333335, ans=0.125 2023-11-26 19:52:30,860 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=3545213.3333333335, ans=0.0 2023-11-26 19:52:31,925 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=3545213.3333333335, ans=0.125 2023-11-26 19:52:42,950 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 531800 2023-11-26 19:52:45,413 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=3545280.0, ans=0.125 2023-11-26 19:52:46,627 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.max_positive, batch_count=3545346.6666666665, ans=0.95 2023-11-26 19:52:47,353 INFO [train_asr.py:1235] (3/4) Epoch 45, batch 2750, loss[loss=0.05314, simple_loss=0.07339, pruned_loss=0.008651, audio_tagging_loss=0.007796, over 15380.00 frames. ], tot_loss[loss=0.06515, simple_loss=0.08869, pruned_loss=0.01207, audio_tagging_loss=0.008734, over 3039110.08 frames. ], batch size: 56, lr: 1.50e-03, grad_scale: 16.0 2023-11-26 19:53:12,017 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=3545480.0, ans=0.2 2023-11-26 19:53:12,217 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=7.12 vs. limit=12.0 2023-11-26 19:53:19,918 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=16.94 vs.
limit=22.5 2023-11-26 19:53:26,261 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3545546.6666666665, ans=0.125 2023-11-26 19:53:27,434 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=3545546.6666666665, ans=0.0 2023-11-26 19:53:28,287 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=3545546.6666666665, ans=0.125 2023-11-26 19:53:34,572 WARNING [train_asr.py:1481] (3/4) Exclude cut with ID unbalanced/IMdT8_tuNp0_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-26 19:53:34,862 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=3545613.3333333335, ans=0.2 2023-11-26 19:53:37,816 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 531850 2023-11-26 19:53:42,035 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.836e+01 8.910e+01 9.547e+01 1.021e+02 1.473e+02, threshold=1.909e+02, percent-clipped=0.0 2023-11-26 19:53:43,095 INFO [train_asr.py:1235] (3/4) Epoch 45, batch 2800, loss[loss=0.0708, simple_loss=0.09979, pruned_loss=0.01407, audio_tagging_loss=0.006831, over 14832.00 frames. ], tot_loss[loss=0.06553, simple_loss=0.0895, pruned_loss=0.01215, audio_tagging_loss=0.008628, over 3040593.52 frames. ], batch size: 55, lr: 1.50e-03, grad_scale: 32.0 2023-11-26 19:53:46,980 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3545680.0, ans=0.125 2023-11-26 19:54:15,239 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=3545880.0, ans=0.125 2023-11-26 19:54:20,587 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-26 19:54:20,709 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=3545880.0, ans=0.125 2023-11-26 19:54:25,966 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=3545880.0, ans=0.0 2023-11-26 19:54:27,490 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.min_abs, batch_count=3545946.6666666665, ans=0.5 2023-11-26 19:54:34,811 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 531900 2023-11-26 19:54:35,061 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=3545946.6666666665, ans=0.0 2023-11-26 19:54:38,992 INFO [train_asr.py:1235] (3/4) Epoch 45, batch 2850, loss[loss=0.0828, simple_loss=0.114, pruned_loss=0.01741, audio_tagging_loss=0.008367, over 16353.00 frames. ], tot_loss[loss=0.06517, simple_loss=0.08889, pruned_loss=0.01212, audio_tagging_loss=0.0086, over 3046560.69 frames. 
], batch size: 58, lr: 1.50e-03, grad_scale: 32.0 2023-11-26 19:55:12,470 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3546213.3333333335, ans=0.125 2023-11-26 19:55:27,559 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.max_positive, batch_count=3546280.0, ans=0.95 2023-11-26 19:55:30,548 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 531950 2023-11-26 19:55:31,744 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=3546280.0, ans=0.125 2023-11-26 19:55:33,332 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=8.96 vs. limit=15.0 2023-11-26 19:55:34,705 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.532e+01 8.835e+01 9.315e+01 9.874e+01 1.722e+02, threshold=1.863e+02, percent-clipped=0.0 2023-11-26 19:55:34,732 INFO [train_asr.py:1235] (3/4) Epoch 45, batch 2900, loss[loss=0.05639, simple_loss=0.07303, pruned_loss=0.009881, audio_tagging_loss=0.009992, over 14893.00 frames. ], tot_loss[loss=0.06563, simple_loss=0.08978, pruned_loss=0.01216, audio_tagging_loss=0.008581, over 3051509.77 frames. ], batch size: 56, lr: 1.50e-03, grad_scale: 16.0 2023-11-26 19:55:35,016 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3546346.6666666665, ans=0.125 2023-11-26 19:55:46,340 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=3546413.3333333335, ans=0.0 2023-11-26 19:55:55,353 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=3546413.3333333335, ans=0.0 2023-11-26 19:56:07,146 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=3546480.0, ans=0.125 2023-11-26 19:56:10,740 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.1.self_attn1.whiten, num_groups=1, num_channels=192, metric=12.02 vs. limit=22.5 2023-11-26 19:56:26,678 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 532000 2023-11-26 19:56:33,904 INFO [train_asr.py:1235] (3/4) Epoch 45, batch 2950, loss[loss=0.04742, simple_loss=0.05733, pruned_loss=0.008076, audio_tagging_loss=0.01068, over 14604.00 frames. ], tot_loss[loss=0.06551, simple_loss=0.08959, pruned_loss=0.01212, audio_tagging_loss=0.008592, over 3056795.64 frames. ], batch size: 57, lr: 1.50e-03, grad_scale: 16.0 2023-11-26 19:56:38,351 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=3546680.0, ans=0.125 2023-11-26 19:56:48,957 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3546746.6666666665, ans=0.1 2023-11-26 19:57:09,170 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=6.48 vs. 
limit=15.0 2023-11-26 19:57:13,081 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=3546880.0, ans=0.125 2023-11-26 19:57:22,586 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=10.70 vs. limit=15.0 2023-11-26 19:57:26,198 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 532050 2023-11-26 19:57:30,293 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.646e+01 8.991e+01 9.608e+01 1.049e+02 1.344e+02, threshold=1.922e+02, percent-clipped=0.0 2023-11-26 19:57:30,324 INFO [train_asr.py:1235] (3/4) Epoch 45, batch 3000, loss[loss=0.06115, simple_loss=0.07897, pruned_loss=0.01273, audio_tagging_loss=0.008933, over 14923.00 frames. ], tot_loss[loss=0.06581, simple_loss=0.08969, pruned_loss=0.0123, audio_tagging_loss=0.008664, over 3051070.02 frames. ], batch size: 56, lr: 1.50e-03, grad_scale: 16.0 2023-11-26 19:57:30,325 INFO [train_asr.py:1258] (3/4) Computing validation loss 2023-11-26 19:58:03,031 INFO [train_asr.py:1267] (3/4) Epoch 45, validation: loss=0.05745, simple_loss=0.05048, pruned_loss=0.005228, audio_tagging_loss=0.02698, over 4681554.00 frames. 2023-11-26 19:58:03,031 INFO [train_asr.py:1268] (3/4) Maximum memory allocated so far is 24894MB 2023-11-26 19:58:05,463 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3547013.3333333335, ans=0.1 2023-11-26 19:58:33,130 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=12.68 vs. limit=22.5 2023-11-26 19:58:39,114 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=3547213.3333333335, ans=0.125 2023-11-26 19:58:54,330 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 532100 2023-11-26 19:58:54,516 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=3547280.0, ans=0.125 2023-11-26 19:59:00,132 INFO [train_asr.py:1235] (3/4) Epoch 45, batch 3050, loss[loss=0.08234, simple_loss=0.1121, pruned_loss=0.01648, audio_tagging_loss=0.009817, over 14955.00 frames. ], tot_loss[loss=0.06555, simple_loss=0.0893, pruned_loss=0.0121, audio_tagging_loss=0.008796, over 3049089.95 frames. ], batch size: 57, lr: 1.50e-03, grad_scale: 16.0 2023-11-26 19:59:07,985 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=3547346.6666666665, ans=0.0 2023-11-26 19:59:12,135 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3547413.3333333335, ans=0.125 2023-11-26 19:59:15,758 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=8.28 vs. limit=10.0 2023-11-26 19:59:29,783 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=3547480.0, ans=0.2 2023-11-26 19:59:31,803 WARNING [train_asr.py:1481] (3/4) Exclude cut with ID unbalanced/h0neUGB6j_g_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. 
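Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24

Batch 3000 above also triggers one of the periodic validation passes: training pauses, the full dev set is evaluated (validation loss=0.05745 over 4681554.00 frames), and the peak GPU memory since startup is reported (24894MB here). That figure corresponds to the high-water mark kept by PyTorch's CUDA caching allocator, readable as in this sketch (standard API; whether the script formats it exactly this way is an assumption):

```python
import torch

# The "Maximum memory allocated so far is 24894MB" line reports the
# high-water mark of CUDA allocator usage on this rank's device.
if torch.cuda.is_available():
    peak_bytes = torch.cuda.max_memory_allocated(torch.device("cuda:0"))
    print(f"Maximum memory allocated so far is {peak_bytes // 2**20}MB")
```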
2023-11-26 19:59:40,942 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=4.06 vs. limit=15.0 2023-11-26 19:59:51,776 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 532150 2023-11-26 19:59:55,897 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.630e+01 8.969e+01 9.484e+01 1.021e+02 1.234e+02, threshold=1.897e+02, percent-clipped=0.0 2023-11-26 19:59:55,924 INFO [train_asr.py:1235] (3/4) Epoch 45, batch 3100, loss[loss=0.06347, simple_loss=0.0873, pruned_loss=0.008789, audio_tagging_loss=0.01103, over 14198.00 frames. ], tot_loss[loss=0.06595, simple_loss=0.0897, pruned_loss=0.01221, audio_tagging_loss=0.008892, over 3047963.73 frames. ], batch size: 54, lr: 1.50e-03, grad_scale: 16.0 2023-11-26 20:00:03,570 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3547680.0, ans=0.125 2023-11-26 20:00:10,467 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=8.44 vs. limit=15.0 2023-11-26 20:00:14,934 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3547746.6666666665, ans=0.1 2023-11-26 20:00:15,873 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=3547746.6666666665, ans=0.0 2023-11-26 20:00:39,947 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=3547946.6666666665, ans=0.125 2023-11-26 20:00:47,252 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 532200 2023-11-26 20:00:51,718 INFO [train_asr.py:1235] (3/4) Epoch 45, batch 3150, loss[loss=0.07137, simple_loss=0.09001, pruned_loss=0.01416, audio_tagging_loss=0.01221, over 14632.00 frames. ], tot_loss[loss=0.06655, simple_loss=0.09047, pruned_loss=0.01235, audio_tagging_loss=0.008962, over 3044357.62 frames. ], batch size: 58, lr: 1.50e-03, grad_scale: 16.0 2023-11-26 20:01:19,598 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-11-26 20:01:43,587 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 532250 2023-11-26 20:01:44,998 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=11.80 vs. limit=22.5 2023-11-26 20:01:48,384 INFO [train_asr.py:1235] (3/4) Epoch 45, batch 3200, loss[loss=0.04837, simple_loss=0.0606, pruned_loss=0.008915, audio_tagging_loss=0.009152, over 14723.00 frames. ], tot_loss[loss=0.0664, simple_loss=0.08987, pruned_loss=0.01238, audio_tagging_loss=0.009082, over 3047286.82 frames. ], batch size: 56, lr: 1.50e-03, grad_scale: 16.0 2023-11-26 20:01:49,928 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.921e+01 8.870e+01 9.654e+01 1.076e+02 1.284e+02, threshold=1.931e+02, percent-clipped=0.0 2023-11-26 20:01:56,369 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=18.42 vs.
limit=22.5 2023-11-26 20:02:17,620 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=6.62 vs. limit=12.0 2023-11-26 20:02:35,966 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=17.61 vs. limit=22.5 2023-11-26 20:02:36,524 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=3548613.3333333335, ans=0.125 2023-11-26 20:02:40,650 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 532300 2023-11-26 20:02:41,834 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=3548613.3333333335, ans=0.125 2023-11-26 20:02:44,895 INFO [train_asr.py:1235] (3/4) Epoch 45, batch 3250, loss[loss=0.05771, simple_loss=0.08022, pruned_loss=0.008816, audio_tagging_loss=0.008781, over 15147.00 frames. ], tot_loss[loss=0.06636, simple_loss=0.08988, pruned_loss=0.01236, audio_tagging_loss=0.009058, over 3045982.31 frames. ], batch size: 57, lr: 1.50e-03, grad_scale: 16.0 2023-11-26 20:02:45,048 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=3548680.0, ans=0.2 2023-11-26 20:02:51,291 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=3548680.0, ans=0.0 2023-11-26 20:02:52,550 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=3548680.0, ans=0.0 2023-11-26 20:03:00,015 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=3548746.6666666665, ans=0.07 2023-11-26 20:03:00,035 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=3548746.6666666665, ans=0.2 2023-11-26 20:03:00,090 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3548746.6666666665, ans=0.0 2023-11-26 20:03:03,236 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=3548746.6666666665, ans=0.0 2023-11-26 20:03:13,000 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=4.40 vs. limit=15.0 2023-11-26 20:03:23,324 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3548880.0, ans=0.125 2023-11-26 20:03:25,503 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=3548880.0, ans=0.0 2023-11-26 20:03:30,633 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=3548946.6666666665, ans=0.0 2023-11-26 20:03:35,905 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 532350 2023-11-26 20:03:40,052 INFO [train_asr.py:1235] (3/4) Epoch 45, batch 3300, loss[loss=0.0771, simple_loss=0.108, pruned_loss=0.01435, audio_tagging_loss=0.008731, over 14847.00 frames. ], tot_loss[loss=0.06627, simple_loss=0.08964, pruned_loss=0.0123, audio_tagging_loss=0.009147, over 3048021.59 frames. 
], batch size: 56, lr: 1.50e-03, grad_scale: 16.0 2023-11-26 20:03:41,085 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.562e+01 8.931e+01 9.545e+01 1.032e+02 1.663e+02, threshold=1.909e+02, percent-clipped=0.0 2023-11-26 20:03:48,039 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=4.52 vs. limit=6.0 2023-11-26 20:03:50,963 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=3549080.0, ans=0.125 2023-11-26 20:04:05,361 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=3549146.6666666665, ans=0.125 2023-11-26 20:04:09,641 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3549146.6666666665, ans=0.1 2023-11-26 20:04:20,223 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3549213.3333333335, ans=0.125 2023-11-26 20:04:29,000 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=3549280.0, ans=0.125 2023-11-26 20:04:30,989 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 532400 2023-11-26 20:04:35,386 INFO [train_asr.py:1235] (3/4) Epoch 45, batch 3350, loss[loss=0.07646, simple_loss=0.1008, pruned_loss=0.01546, audio_tagging_loss=0.0106, over 14284.00 frames. ], tot_loss[loss=0.06682, simple_loss=0.09059, pruned_loss=0.01249, audio_tagging_loss=0.009028, over 3050448.40 frames. ], batch size: 53, lr: 1.50e-03, grad_scale: 8.0 2023-11-26 20:05:12,944 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=3549546.6666666665, ans=0.125 2023-11-26 20:05:27,171 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 532450 2023-11-26 20:05:31,385 INFO [train_asr.py:1235] (3/4) Epoch 45, batch 3400, loss[loss=0.07626, simple_loss=0.1051, pruned_loss=0.01429, audio_tagging_loss=0.009429, over 16067.00 frames. ], tot_loss[loss=0.06624, simple_loss=0.08997, pruned_loss=0.01239, audio_tagging_loss=0.008861, over 3058557.47 frames. ], batch size: 59, lr: 1.50e-03, grad_scale: 8.0 2023-11-26 20:05:33,490 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.319e+01 8.780e+01 9.356e+01 1.019e+02 1.296e+02, threshold=1.871e+02, percent-clipped=0.0 2023-11-26 20:05:40,374 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.60 vs. limit=6.0 2023-11-26 20:05:46,898 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=8.88 vs. limit=15.0 2023-11-26 20:05:48,840 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=7.52 vs. limit=15.0 2023-11-26 20:06:22,037 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 532500 2023-11-26 20:06:26,251 INFO [train_asr.py:1235] (3/4) Epoch 45, batch 3450, loss[loss=0.05966, simple_loss=0.08111, pruned_loss=0.01072, audio_tagging_loss=0.008386, over 15392.00 frames. 
], tot_loss[loss=0.06628, simple_loss=0.0904, pruned_loss=0.01232, audio_tagging_loss=0.008761, over 3060487.12 frames. ], batch size: 58, lr: 1.50e-03, grad_scale: 8.0 2023-11-26 20:06:33,192 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.27 vs. limit=6.0 2023-11-26 20:06:51,943 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3550146.6666666665, ans=0.125 2023-11-26 20:06:52,925 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=3550146.6666666665, ans=0.125 2023-11-26 20:06:58,562 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=4.83 vs. limit=15.0 2023-11-26 20:07:04,758 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3550213.3333333335, ans=0.125 2023-11-26 20:07:06,962 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=3550213.3333333335, ans=0.0 2023-11-26 20:07:13,462 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=12.06 vs. limit=22.5 2023-11-26 20:07:17,335 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 532550 2023-11-26 20:07:21,478 INFO [train_asr.py:1235] (3/4) Epoch 45, batch 3500, loss[loss=0.05609, simple_loss=0.07977, pruned_loss=0.01045, audio_tagging_loss=0.005752, over 14996.00 frames. ], tot_loss[loss=0.06558, simple_loss=0.08931, pruned_loss=0.01227, audio_tagging_loss=0.008653, over 3060157.64 frames. ], batch size: 58, lr: 1.50e-03, grad_scale: 8.0 2023-11-26 20:07:23,648 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.644e+01 8.941e+01 9.512e+01 1.027e+02 1.407e+02, threshold=1.902e+02, percent-clipped=0.0 2023-11-26 20:07:34,413 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3550413.3333333335, ans=0.1 2023-11-26 20:07:50,184 WARNING [train_asr.py:1481] (3/4) Exclude cut with ID unbalanced/DdDpuDqOyrA_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-26 20:08:04,088 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3550546.6666666665, ans=0.125 2023-11-26 20:08:14,253 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 532600 2023-11-26 20:08:19,284 INFO [train_asr.py:1235] (3/4) Epoch 45, batch 3550, loss[loss=0.08258, simple_loss=0.1075, pruned_loss=0.01939, audio_tagging_loss=0.009434, over 15453.00 frames. ], tot_loss[loss=0.06543, simple_loss=0.08899, pruned_loss=0.01229, audio_tagging_loss=0.008651, over 3059534.92 frames. 
2023-11-26 20:08:36,650 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3550746.6666666665, ans=0.1
2023-11-26 20:08:47,213 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3550813.3333333335, ans=0.125
2023-11-26 20:08:52,460 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3550880.0, ans=0.125
2023-11-26 20:08:56,252 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=3550880.0, ans=0.125
2023-11-26 20:09:03,084 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=3550946.6666666665, ans=0.0
2023-11-26 20:09:10,417 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 532650
2023-11-26 20:09:14,581 INFO [train_asr.py:1235] (3/4) Epoch 45, batch 3600, loss[loss=0.05774, simple_loss=0.07785, pruned_loss=0.01064, audio_tagging_loss=0.008171, over 15138.00 frames. ], tot_loss[loss=0.06478, simple_loss=0.0882, pruned_loss=0.01207, audio_tagging_loss=0.008613, over 3057165.79 frames. ], batch size: 56, lr: 1.50e-03, grad_scale: 16.0
2023-11-26 20:09:16,644 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.168e+01 8.819e+01 9.427e+01 1.004e+02 1.284e+02, threshold=1.885e+02, percent-clipped=0.0
2023-11-26 20:09:21,216 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3551013.3333333335, ans=0.1
2023-11-26 20:09:27,465 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=3551080.0, ans=0.125
2023-11-26 20:09:28,741 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3551080.0, ans=0.125
2023-11-26 20:09:52,273 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.whiten, num_groups=1, num_channels=512, metric=8.80 vs. limit=12.0
2023-11-26 20:09:52,275 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=8.72 vs. limit=15.0
2023-11-26 20:09:54,100 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3551213.3333333335, ans=0.1
2023-11-26 20:09:57,254 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=3551213.3333333335, ans=0.1
2023-11-26 20:10:05,622 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 532700
2023-11-26 20:10:09,758 INFO [train_asr.py:1235] (3/4) Epoch 45, batch 3650, loss[loss=0.07766, simple_loss=0.1046, pruned_loss=0.01648, audio_tagging_loss=0.008907, over 15678.00 frames. ], tot_loss[loss=0.06509, simple_loss=0.08884, pruned_loss=0.01202, audio_tagging_loss=0.008647, over 3063487.85 frames. ], batch size: 57, lr: 1.50e-03, grad_scale: 16.0
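
In the optim.py:476 records, the five values after "grad-norm quartiles" read as the min, 25%, median, 75% and max of recently observed gradient norms, and the clipping threshold consistently tracks Clipping_scale times the median (in the record above: 2.0 * 9.427e+01 ~= 1.885e+02). A hedged sketch of that bookkeeping over a rolling window; the function is hypothetical, not the optimizer's actual implementation:

    import torch

    def clipping_stats(recent_norms, clipping_scale=2.0):
        """recent_norms: 1-D float tensor of gradient norms from recent steps."""
        q = torch.quantile(recent_norms,
                           torch.tensor([0.0, 0.25, 0.5, 0.75, 1.0]))
        threshold = clipping_scale * q[2]           # scale times the median
        # percent-clipped: fraction of recent steps whose norm exceeded it
        percent_clipped = 100.0 * (recent_norms > threshold).float().mean()
        return q, threshold, percent_clipped
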
2023-11-26 20:10:14,910 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3551346.6666666665, ans=0.1
2023-11-26 20:10:25,497 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3551413.3333333335, ans=0.125
2023-11-26 20:10:33,538 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=3551480.0, ans=0.125
2023-11-26 20:10:36,692 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3551480.0, ans=0.1
2023-11-26 20:10:58,616 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3551613.3333333335, ans=0.125
2023-11-26 20:11:00,210 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=3551613.3333333335, ans=0.0
2023-11-26 20:11:02,765 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 532750
2023-11-26 20:11:06,845 INFO [train_asr.py:1235] (3/4) Epoch 45, batch 3700, loss[loss=0.08186, simple_loss=0.11, pruned_loss=0.02115, audio_tagging_loss=0.005716, over 14639.00 frames. ], tot_loss[loss=0.06504, simple_loss=0.0886, pruned_loss=0.01207, audio_tagging_loss=0.00867, over 3056435.05 frames. ], batch size: 57, lr: 1.50e-03, grad_scale: 16.0
2023-11-26 20:11:08,924 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.760e+01 8.774e+01 9.496e+01 1.016e+02 1.285e+02, threshold=1.899e+02, percent-clipped=0.0
2023-11-26 20:11:22,393 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.max_positive, batch_count=3551746.6666666665, ans=0.95
2023-11-26 20:11:26,693 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3551746.6666666665, ans=0.0
2023-11-26 20:11:50,304 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=19.26 vs. limit=22.5
2023-11-26 20:11:55,245 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=3551946.6666666665, ans=0.125
2023-11-26 20:11:57,992 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=3551946.6666666665, ans=0.0
2023-11-26 20:11:58,957 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 532800
2023-11-26 20:12:03,448 INFO [train_asr.py:1235] (3/4) Epoch 45, batch 3750, loss[loss=0.05784, simple_loss=0.08137, pruned_loss=0.01049, audio_tagging_loss=0.006672, over 15436.00 frames. ], tot_loss[loss=0.065, simple_loss=0.08824, pruned_loss=0.01216, audio_tagging_loss=0.008714, over 3060471.48 frames. ], batch size: 58, lr: 1.50e-03, grad_scale: 16.0
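
Each scaling.py:213 record prints the current value (ans=...) of a ScheduledFloat, a regularization hyper-parameter (dropout probability, skip rate, balancer probability, and so on) that follows a schedule keyed on the global batch_count instead of being a constant. A minimal sketch of such a piecewise-linear schedule; the breakpoints here are made up for illustration, not the recipe's actual settings:

    def scheduled_float(batch_count, schedule=((0.0, 0.3), (20000.0, 0.1))):
        """Piecewise-linear in batch_count, clamped at both ends; by this
        point in training (batch_count ~ 3.55e6) every schedule above has
        long since settled at its final value."""
        x0, y0 = schedule[0]
        if batch_count <= x0:
            return y0
        for x1, y1 in schedule[1:]:
            if batch_count <= x1:
                return y0 + (y1 - y0) * (batch_count - x0) / (x1 - x0)
            x0, y0 = x1, y1
        return y0
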
2023-11-26 20:12:03,634 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3552013.3333333335, ans=0.125
2023-11-26 20:12:31,572 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=3552146.6666666665, ans=0.0
2023-11-26 20:12:36,975 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=3552213.3333333335, ans=0.125
2023-11-26 20:12:41,466 WARNING [train_asr.py:1481] (3/4) Exclude cut with ID unbalanced/ZY_Bsi-RNuk_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24
2023-11-26 20:12:43,795 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3552213.3333333335, ans=0.125
2023-11-26 20:12:54,256 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 532850
2023-11-26 20:12:58,427 INFO [train_asr.py:1235] (3/4) Epoch 45, batch 3800, loss[loss=0.05749, simple_loss=0.08979, pruned_loss=0.006328, audio_tagging_loss=0.006265, over 14298.00 frames. ], tot_loss[loss=0.06552, simple_loss=0.08914, pruned_loss=0.01225, audio_tagging_loss=0.0087, over 3058473.27 frames. ], batch size: 53, lr: 1.50e-03, grad_scale: 16.0
2023-11-26 20:13:00,551 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 6.720e+01 8.984e+01 9.632e+01 1.029e+02 1.593e+02, threshold=1.926e+02, percent-clipped=0.0
2023-11-26 20:13:25,544 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.whiten.whitening_limit, batch_count=3552480.0, ans=12.0
2023-11-26 20:13:41,856 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3552613.3333333335, ans=0.1
2023-11-26 20:13:49,771 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 532900
2023-11-26 20:13:54,532 INFO [train_asr.py:1235] (3/4) Epoch 45, batch 3850, loss[loss=0.07585, simple_loss=0.1147, pruned_loss=0.01189, audio_tagging_loss=0.006614, over 15674.00 frames. ], tot_loss[loss=0.06538, simple_loss=0.0891, pruned_loss=0.01213, audio_tagging_loss=0.008703, over 3058556.80 frames. ], batch size: 54, lr: 1.50e-03, grad_scale: 16.0
2023-11-26 20:13:59,232 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=512, metric=3.90 vs. limit=15.0
2023-11-26 20:14:16,473 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3552813.3333333335, ans=0.125
2023-11-26 20:14:19,921 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=4.74 vs. limit=6.0
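
The WARNING above shows why these AudioSet cuts keep getting dropped: each carries the same 24-token dummy transcript, but a 1-second cut has only 100 feature frames, and after the encoder's roughly 4x subsampling just 23 frames remain, fewer than the number of tokens, which the transducer loss cannot align (it needs at least as many frames as tokens). A sketch of that filter; the exact subsampling formula is an assumption, chosen to match the logged 100 -> 23:

    def keep_cut(num_frames, num_tokens):
        """Drop cuts whose subsampled length is shorter than the token
        sequence: ((100 - 7) // 2 + 1) // 2 == 23 < 24 tokens -> excluded."""
        frames_after_subsampling = ((num_frames - 7) // 2 + 1) // 2
        return frames_after_subsampling >= num_tokens

    keep_cut(100, 24)   # False, as in the warnings above
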
2023-11-26 20:14:34,786 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3552880.0, ans=0.0
2023-11-26 20:14:45,360 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 532950
2023-11-26 20:14:49,665 INFO [train_asr.py:1235] (3/4) Epoch 45, batch 3900, loss[loss=0.06847, simple_loss=0.07617, pruned_loss=0.01835, audio_tagging_loss=0.01203, over 14402.00 frames. ], tot_loss[loss=0.06544, simple_loss=0.08879, pruned_loss=0.01222, audio_tagging_loss=0.00882, over 3054266.62 frames. ], batch size: 55, lr: 1.50e-03, grad_scale: 16.0
2023-11-26 20:14:52,293 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.155e+01 8.931e+01 9.529e+01 1.011e+02 1.303e+02, threshold=1.906e+02, percent-clipped=0.0
2023-11-26 20:14:57,864 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=3553013.3333333335, ans=0.0
2023-11-26 20:14:58,038 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=10.43 vs. limit=15.0
2023-11-26 20:14:58,795 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3553013.3333333335, ans=0.125
2023-11-26 20:15:00,547 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=8.37 vs. limit=15.0
2023-11-26 20:15:06,555 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=15.44 vs. limit=15.0
2023-11-26 20:15:22,227 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3553213.3333333335, ans=0.1
2023-11-26 20:15:31,890 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3553213.3333333335, ans=0.1
2023-11-26 20:15:31,931 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=3553213.3333333335, ans=0.2
2023-11-26 20:15:36,723 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=3553280.0, ans=0.0
2023-11-26 20:15:40,864 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 533000
2023-11-26 20:15:42,108 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=3553280.0, ans=0.125
2023-11-26 20:15:45,314 INFO [train_asr.py:1235] (3/4) Epoch 45, batch 3950, loss[loss=0.08749, simple_loss=0.1238, pruned_loss=0.01742, audio_tagging_loss=0.00816, over 15272.00 frames. ], tot_loss[loss=0.06618, simple_loss=0.08954, pruned_loss=0.01241, audio_tagging_loss=0.009001, over 3048037.09 frames. ], batch size: 54, lr: 1.50e-03, grad_scale: 16.0
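
The scaling.py:1022 records track, per module, how far the channel covariance of its activations is from "white" (proportional to the identity); a corrective gradient is applied only while the metric exceeds the limit, so most records above report an inactive constraint (e.g. metric=10.43 vs. limit=15.0), while metric=15.44 vs. limit=15.0 is just over its limit. A hedged reconstruction of such a metric, normalized so that perfectly white features give 1.0; this approximates the idea, it is not scaling.py itself:

    import torch

    def whitening_metric(x):
        """x: (num_frames, num_channels) activations for one group.
        Ratio of the second moment of the covariance eigenvalues to the
        squared mean eigenvalue; equals 1.0 iff the covariance is c * I."""
        x = x - x.mean(dim=0)
        cov = (x.T @ x) / x.shape[0]
        eigs = torch.linalg.eigvalsh(cov)     # symmetric, so eigs are real
        return ((eigs ** 2).mean() / (eigs.mean() ** 2 + 1e-20)).item()
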
2023-11-26 20:15:58,231 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=3553413.3333333335, ans=0.2
2023-11-26 20:16:01,954 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3553413.3333333335, ans=0.1
2023-11-26 20:16:09,555 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=3553480.0, ans=0.125
2023-11-26 20:16:10,821 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=17.04 vs. limit=22.5
2023-11-26 20:16:12,081 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=11.62 vs. limit=15.0
2023-11-26 20:16:12,705 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=3553480.0, ans=0.125
2023-11-26 20:16:13,735 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=3553480.0, ans=0.035
2023-11-26 20:16:13,826 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=3553480.0, ans=0.0
2023-11-26 20:16:30,174 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=3553613.3333333335, ans=0.0
2023-11-26 20:16:36,490 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 533050
2023-11-26 20:16:41,006 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=3553680.0, ans=0.2
2023-11-26 20:16:42,272 INFO [train_asr.py:1235] (3/4) Epoch 45, batch 4000, loss[loss=0.08526, simple_loss=0.1116, pruned_loss=0.01985, audio_tagging_loss=0.009639, over 14765.00 frames. ], tot_loss[loss=0.06716, simple_loss=0.09115, pruned_loss=0.01261, audio_tagging_loss=0.008972, over 3043787.26 frames. ], batch size: 56, lr: 1.50e-03, grad_scale: 32.0
2023-11-26 20:16:44,357 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.930e+01 8.850e+01 9.399e+01 1.031e+02 1.680e+02, threshold=1.880e+02, percent-clipped=0.0
2023-11-26 20:16:51,036 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3553680.0, ans=0.0
2023-11-26 20:17:04,228 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=3553813.3333333335, ans=0.0
2023-11-26 20:17:05,452 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3553813.3333333335, ans=0.125
2023-11-26 20:17:13,223 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=5.90 vs. limit=15.0
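
The grad_scale field moving between 8.0, 16.0 and 32.0 across these batches is the dynamic loss scale of the fp16 run: the scaler doubles the scale after a stretch of overflow-free steps (16.0 -> 32.0 at batch 4000 above) and halves it again, skipping that step, when an inf/nan gradient shows up. A sketch of the standard torch.cuda.amp pattern assumed to be behind these numbers; the surrounding function is illustrative, not train_asr.py itself:

    import torch

    scaler = torch.cuda.amp.GradScaler(init_scale=16.0)  # logged as grad_scale

    def fp16_step(model, optimizer, compute_loss, batch):
        optimizer.zero_grad()
        with torch.cuda.amp.autocast():     # half-precision forward pass
            loss = compute_loss(model, batch)
        scaler.scale(loss).backward()       # backward on the scaled loss
        scaler.step(optimizer)              # skipped if inf/nan gradients
        scaler.update()                     # halve on overflow, double after
                                            # enough clean steps: 8 <-> 16 <-> 32
        return loss.detach()
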
2023-11-26 20:17:19,150 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3553880.0, ans=0.1
2023-11-26 20:17:31,457 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3553946.6666666665, ans=0.125
2023-11-26 20:17:33,297 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 533100
2023-11-26 20:17:33,761 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=11.93 vs. limit=15.0
2023-11-26 20:17:37,528 INFO [train_asr.py:1235] (3/4) Epoch 45, batch 4050, loss[loss=0.07674, simple_loss=0.1182, pruned_loss=0.01217, audio_tagging_loss=0.005461, over 15170.00 frames. ], tot_loss[loss=0.06729, simple_loss=0.09162, pruned_loss=0.01255, audio_tagging_loss=0.008929, over 3037155.72 frames. ], batch size: 54, lr: 1.50e-03, grad_scale: 32.0
2023-11-26 20:17:39,675 WARNING [train_asr.py:1481] (3/4) Exclude cut with ID unbalanced/-7b0f9TyPFU_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24
2023-11-26 20:17:45,377 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=3554013.3333333335, ans=0.125
2023-11-26 20:18:15,607 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=23.97 vs. limit=22.5
2023-11-26 20:18:22,995 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=6.56 vs. limit=12.0
2023-11-26 20:18:29,019 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 533150
2023-11-26 20:18:33,721 INFO [train_asr.py:1235] (3/4) Epoch 45, batch 4100, loss[loss=0.06856, simple_loss=0.09132, pruned_loss=0.01592, audio_tagging_loss=0.006985, over 15227.00 frames. ], tot_loss[loss=0.06766, simple_loss=0.09253, pruned_loss=0.01255, audio_tagging_loss=0.008848, over 3045784.39 frames. ], batch size: 56, lr: 1.50e-03, grad_scale: 16.0
2023-11-26 20:18:36,815 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.406e+01 8.812e+01 9.418e+01 1.019e+02 1.290e+02, threshold=1.884e+02, percent-clipped=0.0
2023-11-26 20:18:38,122 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3554346.6666666665, ans=0.125
2023-11-26 20:18:54,688 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3554413.3333333335, ans=0.125
2023-11-26 20:19:07,646 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3554546.6666666665, ans=0.1
2023-11-26 20:19:10,942 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=3.89 vs.
limit=10.0 2023-11-26 20:19:19,775 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3554613.3333333335, ans=0.125 2023-11-26 20:19:24,909 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 533200 2023-11-26 20:19:29,892 INFO [train_asr.py:1235] (3/4) Epoch 45, batch 4150, loss[loss=0.05383, simple_loss=0.06328, pruned_loss=0.01115, audio_tagging_loss=0.01104, over 14216.00 frames. ], tot_loss[loss=0.0672, simple_loss=0.09199, pruned_loss=0.01245, audio_tagging_loss=0.008758, over 3043110.08 frames. ], batch size: 56, lr: 1.50e-03, grad_scale: 16.0 2023-11-26 20:19:37,929 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=22.05 vs. limit=22.5 2023-11-26 20:19:52,547 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=3554813.3333333335, ans=0.2 2023-11-26 20:19:55,735 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3554813.3333333335, ans=0.1 2023-11-26 20:19:56,164 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=7.81 vs. limit=15.0 2023-11-26 20:19:57,804 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=3554813.3333333335, ans=0.2 2023-11-26 20:20:07,868 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=3554880.0, ans=0.0 2023-11-26 20:20:08,955 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3554880.0, ans=0.125 2023-11-26 20:20:09,853 WARNING [train_asr.py:1481] (3/4) Exclude cut with ID unbalanced/5BkClLNthIQ_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-26 20:20:16,839 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.min_positive, batch_count=3554946.6666666665, ans=0.025 2023-11-26 20:20:21,935 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 533250 2023-11-26 20:20:26,196 INFO [train_asr.py:1235] (3/4) Epoch 45, batch 4200, loss[loss=0.07289, simple_loss=0.102, pruned_loss=0.01084, audio_tagging_loss=0.01103, over 16075.00 frames. ], tot_loss[loss=0.06736, simple_loss=0.09225, pruned_loss=0.01262, audio_tagging_loss=0.00861, over 3041273.95 frames. 
], batch size: 59, lr: 1.50e-03, grad_scale: 16.0 2023-11-26 20:20:29,353 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.451e+01 8.868e+01 9.396e+01 9.993e+01 1.238e+02, threshold=1.879e+02, percent-clipped=0.0 2023-11-26 20:20:29,646 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=3555013.3333333335, ans=0.0 2023-11-26 20:20:56,745 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=3555146.6666666665, ans=0.0 2023-11-26 20:20:59,398 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=3555213.3333333335, ans=0.0 2023-11-26 20:21:11,218 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3555280.0, ans=0.125 2023-11-26 20:21:12,268 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3555280.0, ans=0.125 2023-11-26 20:21:17,430 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 533300 2023-11-26 20:21:21,706 INFO [train_asr.py:1235] (3/4) Epoch 45, batch 4250, loss[loss=0.07476, simple_loss=0.1045, pruned_loss=0.01419, audio_tagging_loss=0.008332, over 15323.00 frames. ], tot_loss[loss=0.06681, simple_loss=0.09153, pruned_loss=0.01246, audio_tagging_loss=0.008585, over 3041246.29 frames. ], batch size: 57, lr: 1.50e-03, grad_scale: 16.0 2023-11-26 20:21:28,889 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3555346.6666666665, ans=0.1 2023-11-26 20:21:34,205 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=3555413.3333333335, ans=0.0 2023-11-26 20:21:50,825 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3555480.0, ans=0.125 2023-11-26 20:22:08,958 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer_ff3.min_abs, batch_count=3555613.3333333335, ans=0.2 2023-11-26 20:22:09,028 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=3555613.3333333335, ans=0.0 2023-11-26 20:22:13,660 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 533350 2023-11-26 20:22:17,924 INFO [train_asr.py:1235] (3/4) Epoch 45, batch 4300, loss[loss=0.07022, simple_loss=0.09372, pruned_loss=0.01551, audio_tagging_loss=0.007848, over 15251.00 frames. ], tot_loss[loss=0.06656, simple_loss=0.09128, pruned_loss=0.01237, audio_tagging_loss=0.008547, over 3040616.14 frames. 
], batch size: 57, lr: 1.50e-03, grad_scale: 16.0 2023-11-26 20:22:21,695 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.535e+01 9.104e+01 9.879e+01 1.029e+02 1.419e+02, threshold=1.976e+02, percent-clipped=0.0 2023-11-26 20:22:39,909 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=3555813.3333333335, ans=0.125 2023-11-26 20:22:47,326 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3555813.3333333335, ans=0.125 2023-11-26 20:23:10,890 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 533400 2023-11-26 20:23:15,332 INFO [train_asr.py:1235] (3/4) Epoch 45, batch 4350, loss[loss=0.03898, simple_loss=0.0486, pruned_loss=0.004303, audio_tagging_loss=0.01038, over 14587.00 frames. ], tot_loss[loss=0.06647, simple_loss=0.091, pruned_loss=0.01238, audio_tagging_loss=0.008596, over 3042838.91 frames. ], batch size: 58, lr: 1.50e-03, grad_scale: 16.0 2023-11-26 20:23:25,235 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=3556080.0, ans=0.125 2023-11-26 20:23:34,769 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.min_abs, batch_count=3556080.0, ans=0.5 2023-11-26 20:23:38,965 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3556146.6666666665, ans=0.125 2023-11-26 20:23:40,125 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=3556146.6666666665, ans=0.07 2023-11-26 20:23:55,232 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=7.14 vs. limit=15.0 2023-11-26 20:23:56,166 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=13.91 vs. limit=15.0 2023-11-26 20:24:00,038 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=3556280.0, ans=0.2 2023-11-26 20:24:04,268 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.max_abs, batch_count=3556280.0, ans=10.0 2023-11-26 20:24:06,178 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 533450 2023-11-26 20:24:10,362 INFO [train_asr.py:1235] (3/4) Epoch 45, batch 4400, loss[loss=0.07031, simple_loss=0.103, pruned_loss=0.01195, audio_tagging_loss=0.006859, over 15936.00 frames. ], tot_loss[loss=0.06653, simple_loss=0.09117, pruned_loss=0.01242, audio_tagging_loss=0.008521, over 3038214.49 frames. 
], batch size: 58, lr: 1.50e-03, grad_scale: 32.0 2023-11-26 20:24:13,576 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 6.889e+01 8.838e+01 9.451e+01 1.042e+02 1.230e+02, threshold=1.890e+02, percent-clipped=0.0 2023-11-26 20:24:22,365 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3556413.3333333335, ans=0.1 2023-11-26 20:24:58,743 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3556613.3333333335, ans=0.1 2023-11-26 20:25:01,846 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 533500 2023-11-26 20:25:06,098 INFO [train_asr.py:1235] (3/4) Epoch 45, batch 4450, loss[loss=0.06858, simple_loss=0.09927, pruned_loss=0.01242, audio_tagging_loss=0.006523, over 15323.00 frames. ], tot_loss[loss=0.06623, simple_loss=0.09093, pruned_loss=0.01229, audio_tagging_loss=0.008479, over 3041391.92 frames. ], batch size: 56, lr: 1.50e-03, grad_scale: 32.0 2023-11-26 20:25:17,473 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=3556746.6666666665, ans=0.0 2023-11-26 20:25:20,919 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=3556746.6666666665, ans=0.125 2023-11-26 20:25:27,202 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=3556746.6666666665, ans=0.0 2023-11-26 20:25:51,617 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3556946.6666666665, ans=0.125 2023-11-26 20:25:53,082 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer_ff2.min_abs, batch_count=3556946.6666666665, ans=0.1 2023-11-26 20:25:58,342 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 533550 2023-11-26 20:26:02,377 INFO [train_asr.py:1235] (3/4) Epoch 45, batch 4500, loss[loss=0.07306, simple_loss=0.1014, pruned_loss=0.01557, audio_tagging_loss=0.006821, over 15766.00 frames. ], tot_loss[loss=0.066, simple_loss=0.09081, pruned_loss=0.01221, audio_tagging_loss=0.008386, over 3039976.30 frames. ], batch size: 59, lr: 1.50e-03, grad_scale: 32.0 2023-11-26 20:26:04,765 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3557013.3333333335, ans=0.1 2023-11-26 20:26:05,648 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 8.109e+01 9.005e+01 9.519e+01 1.049e+02 1.463e+02, threshold=1.904e+02, percent-clipped=0.0 2023-11-26 20:26:43,080 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=12.81 vs. limit=15.0 2023-11-26 20:26:53,405 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 533600 2023-11-26 20:26:57,879 INFO [train_asr.py:1235] (3/4) Epoch 45, batch 4550, loss[loss=0.06654, simple_loss=0.09577, pruned_loss=0.0119, audio_tagging_loss=0.00675, over 15681.00 frames. ], tot_loss[loss=0.06581, simple_loss=0.09072, pruned_loss=0.01203, audio_tagging_loss=0.008428, over 3035300.69 frames. 
], batch size: 57, lr: 1.50e-03, grad_scale: 32.0 2023-11-26 20:27:01,599 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=7.63 vs. limit=10.0 2023-11-26 20:27:06,583 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=3557346.6666666665, ans=0.0 2023-11-26 20:27:33,256 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=3557546.6666666665, ans=0.09899494936611666 2023-11-26 20:27:40,700 WARNING [train_asr.py:1481] (3/4) Exclude cut with ID unbalanced/_II2Klfnn4Y_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-26 20:27:49,232 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 533650 2023-11-26 20:27:51,492 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=3557613.3333333335, ans=0.0 2023-11-26 20:27:53,365 INFO [train_asr.py:1235] (3/4) Epoch 45, batch 4600, loss[loss=0.06986, simple_loss=0.0973, pruned_loss=0.0125, audio_tagging_loss=0.008716, over 15012.00 frames. ], tot_loss[loss=0.06524, simple_loss=0.0896, pruned_loss=0.01191, audio_tagging_loss=0.008534, over 3040232.37 frames. ], batch size: 55, lr: 1.50e-03, grad_scale: 32.0 2023-11-26 20:27:56,455 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.271e+01 8.920e+01 9.626e+01 1.020e+02 1.318e+02, threshold=1.925e+02, percent-clipped=0.0 2023-11-26 20:28:11,541 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3557746.6666666665, ans=0.125 2023-11-26 20:28:14,140 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.02 vs. limit=6.0 2023-11-26 20:28:15,608 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=3557813.3333333335, ans=0.125 2023-11-26 20:28:32,948 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=3557880.0, ans=0.125 2023-11-26 20:28:45,466 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 533700 2023-11-26 20:28:50,178 INFO [train_asr.py:1235] (3/4) Epoch 45, batch 4650, loss[loss=0.06972, simple_loss=0.09017, pruned_loss=0.01639, audio_tagging_loss=0.008241, over 14892.00 frames. ], tot_loss[loss=0.06546, simple_loss=0.08943, pruned_loss=0.01207, audio_tagging_loss=0.008673, over 3041792.71 frames. 
], batch size: 56, lr: 1.50e-03, grad_scale: 16.0 2023-11-26 20:28:58,513 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-11-26 20:29:29,986 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3558213.3333333335, ans=0.125 2023-11-26 20:29:42,057 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 533750 2023-11-26 20:29:42,479 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=7.96 vs. limit=15.0 2023-11-26 20:29:46,214 INFO [train_asr.py:1235] (3/4) Epoch 45, batch 4700, loss[loss=0.07702, simple_loss=0.1047, pruned_loss=0.01579, audio_tagging_loss=0.008861, over 15476.00 frames. ], tot_loss[loss=0.06526, simple_loss=0.08899, pruned_loss=0.01199, audio_tagging_loss=0.008771, over 3041119.34 frames. ], batch size: 55, lr: 1.50e-03, grad_scale: 16.0 2023-11-26 20:29:50,420 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.173e+01 8.876e+01 9.435e+01 1.008e+02 1.247e+02, threshold=1.887e+02, percent-clipped=0.0 2023-11-26 20:29:52,730 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=3558346.6666666665, ans=0.07 2023-11-26 20:30:02,298 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3558413.3333333335, ans=0.125 2023-11-26 20:30:04,325 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00 2023-11-26 20:30:04,649 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=512, metric=2.66 vs. limit=15.0 2023-11-26 20:30:15,388 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3558480.0, ans=0.125 2023-11-26 20:30:33,507 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.0.whiten, num_groups=1, num_channels=192, metric=4.55 vs. limit=12.0 2023-11-26 20:30:37,103 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 533800 2023-11-26 20:30:41,606 INFO [train_asr.py:1235] (3/4) Epoch 45, batch 4750, loss[loss=0.05445, simple_loss=0.0631, pruned_loss=0.009527, audio_tagging_loss=0.01338, over 16459.00 frames. ], tot_loss[loss=0.06525, simple_loss=0.08861, pruned_loss=0.01202, audio_tagging_loss=0.008922, over 3038557.52 frames. 
], batch size: 63, lr: 1.50e-03, grad_scale: 16.0 2023-11-26 20:30:47,244 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=3558680.0, ans=0.2 2023-11-26 20:30:48,182 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=3558680.0, ans=0.2 2023-11-26 20:30:53,511 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=3558746.6666666665, ans=0.125 2023-11-26 20:31:24,105 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=3558880.0, ans=0.0 2023-11-26 20:31:33,060 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 533850 2023-11-26 20:31:33,464 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=3.23 vs. limit=15.0 2023-11-26 20:31:38,355 INFO [train_asr.py:1235] (3/4) Epoch 45, batch 4800, loss[loss=0.06758, simple_loss=0.09248, pruned_loss=0.01127, audio_tagging_loss=0.01007, over 16656.00 frames. ], tot_loss[loss=0.0658, simple_loss=0.08935, pruned_loss=0.0122, audio_tagging_loss=0.008922, over 3044684.67 frames. ], batch size: 62, lr: 1.50e-03, grad_scale: 32.0 2023-11-26 20:31:38,539 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=3559013.3333333335, ans=0.0 2023-11-26 20:31:42,641 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.975e+01 9.037e+01 9.476e+01 1.008e+02 1.757e+02, threshold=1.895e+02, percent-clipped=0.0 2023-11-26 20:31:44,303 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=6.26 vs. limit=10.0 2023-11-26 20:31:47,720 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3559013.3333333335, ans=0.125 2023-11-26 20:31:47,852 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=3559013.3333333335, ans=0.0 2023-11-26 20:31:55,180 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.max_positive, batch_count=3559080.0, ans=0.95 2023-11-26 20:32:04,136 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=8.25 vs. limit=15.0 2023-11-26 20:32:05,175 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=6.87 vs. limit=12.0 2023-11-26 20:32:28,675 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3559280.0, ans=0.125 2023-11-26 20:32:29,575 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 533900 2023-11-26 20:32:31,254 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=3559280.0, ans=0.0 2023-11-26 20:32:34,182 INFO [train_asr.py:1235] (3/4) Epoch 45, batch 4850, loss[loss=0.05044, simple_loss=0.06805, pruned_loss=0.006017, audio_tagging_loss=0.0104, over 15062.00 frames. ], tot_loss[loss=0.06616, simple_loss=0.08985, pruned_loss=0.0123, audio_tagging_loss=0.008941, over 3040601.40 frames. 
], batch size: 56, lr: 1.50e-03, grad_scale: 16.0 2023-11-26 20:32:47,726 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=9.70 vs. limit=15.0 2023-11-26 20:32:53,715 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=3559413.3333333335, ans=0.0 2023-11-26 20:33:07,169 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3559546.6666666665, ans=0.125 2023-11-26 20:33:11,342 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=2.586e-03 2023-11-26 20:33:18,244 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=3559613.3333333335, ans=0.2 2023-11-26 20:33:25,461 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 533950 2023-11-26 20:33:26,113 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=7.56 vs. limit=15.0 2023-11-26 20:33:29,634 INFO [train_asr.py:1235] (3/4) Epoch 45, batch 4900, loss[loss=0.06252, simple_loss=0.0876, pruned_loss=0.01091, audio_tagging_loss=0.007816, over 14736.00 frames. ], tot_loss[loss=0.06571, simple_loss=0.089, pruned_loss=0.0122, audio_tagging_loss=0.009012, over 3036259.60 frames. ], batch size: 55, lr: 1.50e-03, grad_scale: 16.0 2023-11-26 20:33:31,982 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=3559680.0, ans=0.0 2023-11-26 20:33:32,842 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=3559680.0, ans=0.125 2023-11-26 20:33:34,854 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.612e+01 8.987e+01 9.681e+01 1.025e+02 1.624e+02, threshold=1.936e+02, percent-clipped=0.0 2023-11-26 20:33:41,927 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=3559746.6666666665, ans=0.125 2023-11-26 20:33:58,007 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=3559813.3333333335, ans=0.125 2023-11-26 20:34:20,407 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 534000 2023-11-26 20:34:25,487 INFO [train_asr.py:1235] (3/4) Epoch 45, batch 4950, loss[loss=0.06472, simple_loss=0.09036, pruned_loss=0.01275, audio_tagging_loss=0.006798, over 14850.00 frames. ], tot_loss[loss=0.06605, simple_loss=0.08955, pruned_loss=0.01245, audio_tagging_loss=0.008822, over 3031164.53 frames. 
], batch size: 55, lr: 1.50e-03, grad_scale: 16.0 2023-11-26 20:34:27,229 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3560013.3333333335, ans=0.0 2023-11-26 20:34:44,632 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=3560080.0, ans=0.125 2023-11-26 20:34:47,875 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3560146.6666666665, ans=0.1 2023-11-26 20:34:48,990 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=3560146.6666666665, ans=0.125 2023-11-26 20:35:05,404 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3560213.3333333335, ans=0.1 2023-11-26 20:35:10,022 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3560280.0, ans=0.125 2023-11-26 20:35:14,345 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=3560280.0, ans=0.125 2023-11-26 20:35:16,316 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 534050 2023-11-26 20:35:18,549 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=3560280.0, ans=0.125 2023-11-26 20:35:20,443 INFO [train_asr.py:1235] (3/4) Epoch 45, batch 5000, loss[loss=0.06519, simple_loss=0.08957, pruned_loss=0.01171, audio_tagging_loss=0.008692, over 15719.00 frames. ], tot_loss[loss=0.06624, simple_loss=0.09014, pruned_loss=0.01251, audio_tagging_loss=0.008664, over 3037025.88 frames. ], batch size: 60, lr: 1.50e-03, grad_scale: 16.0 2023-11-26 20:35:23,350 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.0.conv_module2.whiten, num_groups=1, num_channels=192, metric=12.56 vs. limit=15.0 2023-11-26 20:35:26,188 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.628e+01 9.102e+01 9.598e+01 1.044e+02 1.473e+02, threshold=1.920e+02, percent-clipped=0.0 2023-11-26 20:35:38,383 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.whiten, num_groups=1, num_channels=512, metric=8.48 vs. limit=12.0 2023-11-26 20:35:43,363 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3560480.0, ans=0.125 2023-11-26 20:35:43,371 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=3560480.0, ans=0.0 2023-11-26 20:36:04,803 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=7.25 vs. limit=15.0 2023-11-26 20:36:12,092 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 534100 2023-11-26 20:36:16,213 INFO [train_asr.py:1235] (3/4) Epoch 45, batch 5050, loss[loss=0.05915, simple_loss=0.09013, pruned_loss=0.007062, audio_tagging_loss=0.007021, over 14913.00 frames. ], tot_loss[loss=0.06548, simple_loss=0.08906, pruned_loss=0.01231, audio_tagging_loss=0.008639, over 3045562.99 frames. 
], batch size: 56, lr: 1.50e-03, grad_scale: 16.0 2023-11-26 20:36:39,625 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=6.19 vs. limit=10.0 2023-11-26 20:36:46,761 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3560813.3333333335, ans=0.1 2023-11-26 20:36:55,889 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=3560880.0, ans=0.0 2023-11-26 20:37:07,430 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 534150 2023-11-26 20:37:08,029 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=12.33 vs. limit=15.0 2023-11-26 20:37:12,197 INFO [train_asr.py:1235] (3/4) Epoch 45, batch 5100, loss[loss=0.07365, simple_loss=0.1027, pruned_loss=0.0156, audio_tagging_loss=0.006672, over 15713.00 frames. ], tot_loss[loss=0.06584, simple_loss=0.08987, pruned_loss=0.0123, audio_tagging_loss=0.008614, over 3046142.25 frames. ], batch size: 59, lr: 1.50e-03, grad_scale: 16.0 2023-11-26 20:37:18,599 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.475e+01 8.922e+01 9.558e+01 1.035e+02 1.358e+02, threshold=1.912e+02, percent-clipped=0.0 2023-11-26 20:37:23,181 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=3561080.0, ans=0.2 2023-11-26 20:37:25,299 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-11-26 20:37:32,754 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3561080.0, ans=0.125 2023-11-26 20:37:37,631 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3561146.6666666665, ans=0.125 2023-11-26 20:37:38,777 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=3561146.6666666665, ans=0.2 2023-11-26 20:38:00,690 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=9.80 vs. limit=15.0 2023-11-26 20:38:04,570 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 534200 2023-11-26 20:38:04,639 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=3561280.0, ans=0.125 2023-11-26 20:38:09,083 INFO [train_asr.py:1235] (3/4) Epoch 45, batch 5150, loss[loss=0.06015, simple_loss=0.07703, pruned_loss=0.01164, audio_tagging_loss=0.009998, over 15082.00 frames. ], tot_loss[loss=0.06519, simple_loss=0.08914, pruned_loss=0.01204, audio_tagging_loss=0.008583, over 3039504.05 frames. ], batch size: 58, lr: 1.50e-03, grad_scale: 16.0 2023-11-26 20:38:16,212 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=11.69 vs. 
limit=15.0 2023-11-26 20:38:16,733 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3561346.6666666665, ans=0.1 2023-11-26 20:38:18,796 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3561413.3333333335, ans=0.1 2023-11-26 20:38:44,964 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=11.95 vs. limit=15.0 2023-11-26 20:38:54,208 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=3561613.3333333335, ans=0.2 2023-11-26 20:38:58,424 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=3561613.3333333335, ans=0.0 2023-11-26 20:39:00,367 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 534250 2023-11-26 20:39:02,532 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3561613.3333333335, ans=0.125 2023-11-26 20:39:05,067 INFO [train_asr.py:1235] (3/4) Epoch 45, batch 5200, loss[loss=0.06111, simple_loss=0.07706, pruned_loss=0.01202, audio_tagging_loss=0.01056, over 15181.00 frames. ], tot_loss[loss=0.06636, simple_loss=0.09076, pruned_loss=0.01245, audio_tagging_loss=0.008527, over 3037630.67 frames. ], batch size: 56, lr: 1.50e-03, grad_scale: 32.0 2023-11-26 20:39:10,322 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.315e+01 8.817e+01 9.257e+01 9.950e+01 1.216e+02, threshold=1.851e+02, percent-clipped=0.0 2023-11-26 20:39:11,636 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=3561680.0, ans=0.0 2023-11-26 20:39:11,697 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=3561680.0, ans=0.0 2023-11-26 20:39:20,731 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=3561746.6666666665, ans=0.2 2023-11-26 20:39:40,352 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=8.08 vs. limit=10.0 2023-11-26 20:39:45,357 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=3561880.0, ans=0.0 2023-11-26 20:39:56,219 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 534300 2023-11-26 20:40:00,344 INFO [train_asr.py:1235] (3/4) Epoch 45, batch 5250, loss[loss=0.07023, simple_loss=0.09299, pruned_loss=0.01655, audio_tagging_loss=0.007182, over 17104.00 frames. ], tot_loss[loss=0.06635, simple_loss=0.09078, pruned_loss=0.01249, audio_tagging_loss=0.008468, over 3038187.54 frames. 
], batch size: 62, lr: 1.50e-03, grad_scale: 32.0 2023-11-26 20:40:00,641 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3562013.3333333335, ans=0.1 2023-11-26 20:40:02,191 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.max_positive, batch_count=3562013.3333333335, ans=0.95 2023-11-26 20:40:08,602 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3562013.3333333335, ans=0.1 2023-11-26 20:40:15,977 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-26 20:40:18,213 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3562080.0, ans=0.1 2023-11-26 20:40:30,524 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=8.97 vs. limit=15.0 2023-11-26 20:40:53,572 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 534350 2023-11-26 20:40:57,737 INFO [train_asr.py:1235] (3/4) Epoch 45, batch 5300, loss[loss=0.05847, simple_loss=0.07532, pruned_loss=0.01112, audio_tagging_loss=0.009692, over 15546.00 frames. ], tot_loss[loss=0.06614, simple_loss=0.09063, pruned_loss=0.01241, audio_tagging_loss=0.008423, over 3034963.70 frames. ], batch size: 59, lr: 1.50e-03, grad_scale: 32.0 2023-11-26 20:40:58,361 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=2.31 vs. limit=15.0 2023-11-26 20:41:01,091 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=3562346.6666666665, ans=0.2 2023-11-26 20:41:02,969 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.709e+01 8.749e+01 9.362e+01 1.021e+02 1.179e+02, threshold=1.872e+02, percent-clipped=0.0 2023-11-26 20:41:11,062 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=18.88 vs. limit=22.5 2023-11-26 20:41:18,722 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-26 20:41:33,026 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=10.12 vs. limit=15.0 2023-11-26 20:41:37,409 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=3562546.6666666665, ans=0.2 2023-11-26 20:41:38,661 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=3.93 vs. 
limit=6.0 2023-11-26 20:41:41,464 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3562613.3333333335, ans=0.125 2023-11-26 20:41:45,797 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=3562613.3333333335, ans=0.125 2023-11-26 20:41:48,765 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 534400 2023-11-26 20:41:53,341 INFO [train_asr.py:1235] (3/4) Epoch 45, batch 5350, loss[loss=0.07396, simple_loss=0.1029, pruned_loss=0.01329, audio_tagging_loss=0.009245, over 14882.00 frames. ], tot_loss[loss=0.06647, simple_loss=0.09087, pruned_loss=0.01259, audio_tagging_loss=0.008444, over 3035535.43 frames. ], batch size: 56, lr: 1.50e-03, grad_scale: 16.0 2023-11-26 20:42:01,773 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=8.09 vs. limit=15.0 2023-11-26 20:42:02,760 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-11-26 20:42:35,149 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3562880.0, ans=0.0 2023-11-26 20:42:45,093 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 534450 2023-11-26 20:42:48,753 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=12.64 vs. limit=15.0 2023-11-26 20:42:49,304 INFO [train_asr.py:1235] (3/4) Epoch 45, batch 5400, loss[loss=0.0673, simple_loss=0.09216, pruned_loss=0.01179, audio_tagging_loss=0.009429, over 16047.00 frames. ], tot_loss[loss=0.06669, simple_loss=0.09122, pruned_loss=0.0126, audio_tagging_loss=0.008481, over 3041309.69 frames. ], batch size: 60, lr: 1.50e-03, grad_scale: 16.0 2023-11-26 20:42:50,482 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=3563013.3333333335, ans=0.125 2023-11-26 20:42:51,638 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=3563013.3333333335, ans=0.1 2023-11-26 20:42:56,045 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.432e+01 8.834e+01 9.520e+01 1.043e+02 1.175e+02, threshold=1.904e+02, percent-clipped=0.0 2023-11-26 20:43:01,653 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=3563080.0, ans=0.125 2023-11-26 20:43:04,831 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-26 20:43:13,371 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=3563146.6666666665, ans=0.2 2023-11-26 20:43:24,021 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3563213.3333333335, ans=0.125 2023-11-26 20:43:41,988 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 534500 2023-11-26 20:43:46,137 INFO [train_asr.py:1235] (3/4) Epoch 45, batch 5450, loss[loss=0.07096, simple_loss=0.08782, pruned_loss=0.01755, audio_tagging_loss=0.009503, over 13929.00 frames. 
], tot_loss[loss=0.0667, simple_loss=0.09106, pruned_loss=0.01267, audio_tagging_loss=0.008503, over 3038765.05 frames. ], batch size: 54, lr: 1.50e-03, grad_scale: 16.0 2023-11-26 20:44:03,300 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3563413.3333333335, ans=0.1 2023-11-26 20:44:05,414 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3563413.3333333335, ans=0.125 2023-11-26 20:44:37,377 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 534550 2023-11-26 20:44:41,538 INFO [train_asr.py:1235] (3/4) Epoch 45, batch 5500, loss[loss=0.06605, simple_loss=0.07763, pruned_loss=0.0152, audio_tagging_loss=0.01204, over 16159.00 frames. ], tot_loss[loss=0.06661, simple_loss=0.09089, pruned_loss=0.01256, audio_tagging_loss=0.008604, over 3045930.58 frames. ], batch size: 63, lr: 1.50e-03, grad_scale: 16.0 2023-11-26 20:44:46,007 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=3563680.0, ans=0.125 2023-11-26 20:44:47,928 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.824e+01 9.118e+01 9.897e+01 1.074e+02 1.555e+02, threshold=1.979e+02, percent-clipped=0.0 2023-11-26 20:44:51,323 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=3563746.6666666665, ans=0.0 2023-11-26 20:45:32,790 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 534600 2023-11-26 20:45:37,300 INFO [train_asr.py:1235] (3/4) Epoch 45, batch 5550, loss[loss=0.06147, simple_loss=0.08531, pruned_loss=0.00957, audio_tagging_loss=0.009246, over 15623.00 frames. ], tot_loss[loss=0.0668, simple_loss=0.09122, pruned_loss=0.01255, audio_tagging_loss=0.008645, over 3046505.99 frames. ], batch size: 56, lr: 1.50e-03, grad_scale: 16.0 2023-11-26 20:45:50,496 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_whiten.whitening_limit, batch_count=3564080.0, ans=15.0 2023-11-26 20:45:53,167 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=3564080.0, ans=0.0 2023-11-26 20:46:00,771 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=3564146.6666666665, ans=0.2 2023-11-26 20:46:04,815 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3564146.6666666665, ans=0.125 2023-11-26 20:46:11,215 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.min_abs, batch_count=3564213.3333333335, ans=0.5 2023-11-26 20:46:22,639 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=3564280.0, ans=0.09899494936611666 2023-11-26 20:46:24,330 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.74 vs. limit=6.0 2023-11-26 20:46:29,938 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 534650 2023-11-26 20:46:32,640 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=10.42 vs. 
limit=15.0 2023-11-26 20:46:34,599 INFO [train_asr.py:1235] (3/4) Epoch 45, batch 5600, loss[loss=0.06546, simple_loss=0.0817, pruned_loss=0.0155, audio_tagging_loss=0.009112, over 15194.00 frames. ], tot_loss[loss=0.06626, simple_loss=0.09048, pruned_loss=0.01219, audio_tagging_loss=0.008826, over 3050221.86 frames. ], batch size: 60, lr: 1.50e-03, grad_scale: 32.0 2023-11-26 20:46:37,320 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=6.33 vs. limit=15.0 2023-11-26 20:46:40,928 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.218e+01 8.848e+01 9.516e+01 1.047e+02 1.275e+02, threshold=1.903e+02, percent-clipped=0.0 2023-11-26 20:46:48,754 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3564413.3333333335, ans=0.125 2023-11-26 20:47:00,683 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=7.46 vs. limit=15.0 2023-11-26 20:47:06,103 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=3564546.6666666665, ans=0.2 2023-11-26 20:47:13,760 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=11.67 vs. limit=15.0 2023-11-26 20:47:14,943 WARNING [train_asr.py:1481] (3/4) Exclude cut with ID unbalanced/ze0LsBtoDm0_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-26 20:47:20,587 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=8.34 vs. limit=15.0 2023-11-26 20:47:25,531 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 534700 2023-11-26 20:47:29,706 INFO [train_asr.py:1235] (3/4) Epoch 45, batch 5650, loss[loss=0.08166, simple_loss=0.1038, pruned_loss=0.01932, audio_tagging_loss=0.01046, over 16056.00 frames. ], tot_loss[loss=0.06646, simple_loss=0.09058, pruned_loss=0.01227, audio_tagging_loss=0.008894, over 3052413.79 frames. 
], batch size: 61, lr: 1.50e-03, grad_scale: 16.0 2023-11-26 20:47:37,355 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3564680.0, ans=0.125 2023-11-26 20:47:39,511 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=3564746.6666666665, ans=0.125 2023-11-26 20:47:44,954 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=3564746.6666666665, ans=0.125 2023-11-26 20:48:21,012 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 534750 2023-11-26 20:48:23,282 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3564946.6666666665, ans=0.125 2023-11-26 20:48:25,209 INFO [train_asr.py:1235] (3/4) Epoch 45, batch 5700, loss[loss=0.07594, simple_loss=0.1029, pruned_loss=0.01404, audio_tagging_loss=0.01046, over 15941.00 frames. ], tot_loss[loss=0.06633, simple_loss=0.09046, pruned_loss=0.01222, audio_tagging_loss=0.008885, over 3054674.09 frames. ], batch size: 58, lr: 1.50e-03, grad_scale: 16.0 2023-11-26 20:48:29,592 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=3565013.3333333335, ans=0.09899494936611666 2023-11-26 20:48:33,021 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.188e+01 8.707e+01 9.299e+01 1.009e+02 1.151e+02, threshold=1.860e+02, percent-clipped=0.0 2023-11-26 20:48:34,968 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=3565013.3333333335, ans=0.2 2023-11-26 20:48:41,282 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3565080.0, ans=0.1 2023-11-26 20:48:45,537 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3565080.0, ans=0.125 2023-11-26 20:48:53,988 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=3565146.6666666665, ans=0.2 2023-11-26 20:48:57,163 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=3565146.6666666665, ans=0.0 2023-11-26 20:49:09,128 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=3565280.0, ans=0.0 2023-11-26 20:49:12,255 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=3565280.0, ans=0.0 2023-11-26 20:49:16,927 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 534800 2023-11-26 20:49:21,612 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.11 vs. limit=6.0 2023-11-26 20:49:21,916 INFO [train_asr.py:1235] (3/4) Epoch 45, batch 5750, loss[loss=0.06548, simple_loss=0.09324, pruned_loss=0.0128, audio_tagging_loss=0.00606, over 14767.00 frames. ], tot_loss[loss=0.06582, simple_loss=0.08972, pruned_loss=0.01221, audio_tagging_loss=0.008742, over 3054964.72 frames. 
], batch size: 53, lr: 1.50e-03, grad_scale: 16.0 2023-11-26 20:49:23,128 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3565346.6666666665, ans=0.0 2023-11-26 20:49:23,662 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=5.36 vs. limit=12.0 2023-11-26 20:49:35,832 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3565413.3333333335, ans=0.125 2023-11-26 20:50:00,367 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.min_positive, batch_count=3565546.6666666665, ans=0.05 2023-11-26 20:50:01,464 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3565546.6666666665, ans=0.1 2023-11-26 20:50:07,466 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=3565613.3333333335, ans=0.125 2023-11-26 20:50:12,609 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 534850 2023-11-26 20:50:16,757 INFO [train_asr.py:1235] (3/4) Epoch 45, batch 5800, loss[loss=0.06612, simple_loss=0.0972, pruned_loss=0.01108, audio_tagging_loss=0.006443, over 15039.00 frames. ], tot_loss[loss=0.0662, simple_loss=0.0903, pruned_loss=0.0124, audio_tagging_loss=0.008651, over 3059393.24 frames. ], batch size: 56, lr: 1.50e-03, grad_scale: 16.0 2023-11-26 20:50:16,911 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.max_positive, batch_count=3565680.0, ans=0.95 2023-11-26 20:50:18,560 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.0.whiten, num_groups=1, num_channels=192, metric=3.84 vs. limit=12.0 2023-11-26 20:50:24,136 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.524e+01 8.906e+01 9.529e+01 1.040e+02 1.512e+02, threshold=1.906e+02, percent-clipped=0.0 2023-11-26 20:50:48,158 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3565813.3333333335, ans=0.0 2023-11-26 20:50:49,227 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=3565880.0, ans=0.0 2023-11-26 20:50:56,634 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=3565880.0, ans=0.125 2023-11-26 20:51:07,138 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 534900 2023-11-26 20:51:11,320 INFO [train_asr.py:1235] (3/4) Epoch 45, batch 5850, loss[loss=0.07286, simple_loss=0.0988, pruned_loss=0.01291, audio_tagging_loss=0.01055, over 15201.00 frames. ], tot_loss[loss=0.06607, simple_loss=0.09005, pruned_loss=0.01237, audio_tagging_loss=0.008678, over 3047177.21 frames. 
], batch size: 58, lr: 1.50e-03, grad_scale: 16.0 2023-11-26 20:51:19,975 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3566013.3333333335, ans=0.125 2023-11-26 20:51:23,756 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=3566080.0, ans=0.125 2023-11-26 20:51:25,067 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=9.61 vs. limit=15.0 2023-11-26 20:51:32,697 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3566146.6666666665, ans=0.125 2023-11-26 20:51:37,062 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=3566146.6666666665, ans=0.0 2023-11-26 20:51:45,480 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3566213.3333333335, ans=0.1 2023-11-26 20:51:52,210 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=11.97 vs. limit=15.0 2023-11-26 20:51:58,752 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=5.12 vs. limit=12.0 2023-11-26 20:52:01,271 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 534950 2023-11-26 20:52:05,960 INFO [train_asr.py:1235] (3/4) Epoch 45, batch 5900, loss[loss=0.05773, simple_loss=0.08427, pruned_loss=0.008197, audio_tagging_loss=0.007402, over 15556.00 frames. ], tot_loss[loss=0.0659, simple_loss=0.09005, pruned_loss=0.01229, audio_tagging_loss=0.008591, over 3050801.55 frames. ], batch size: 56, lr: 1.50e-03, grad_scale: 16.0 2023-11-26 20:52:14,441 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.237e+01 8.767e+01 9.381e+01 1.012e+02 1.422e+02, threshold=1.876e+02, percent-clipped=0.0 2023-11-26 20:52:21,593 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=9.38 vs. limit=15.0 2023-11-26 20:52:42,633 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3566546.6666666665, ans=0.125 2023-11-26 20:52:53,138 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3566613.3333333335, ans=0.125 2023-11-26 20:52:57,786 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 535000 2023-11-26 20:53:02,261 INFO [train_asr.py:1235] (3/4) Epoch 45, batch 5950, loss[loss=0.08169, simple_loss=0.1056, pruned_loss=0.01692, audio_tagging_loss=0.01196, over 16231.00 frames. ], tot_loss[loss=0.06575, simple_loss=0.08949, pruned_loss=0.01234, audio_tagging_loss=0.008668, over 3051705.47 frames. 
], batch size: 61, lr: 1.50e-03, grad_scale: 16.0 2023-11-26 20:53:08,997 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer_na.min_abs, batch_count=3566680.0, ans=0.02 2023-11-26 20:53:12,048 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3566746.6666666665, ans=0.125 2023-11-26 20:53:12,115 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-26 20:53:28,632 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3566813.3333333335, ans=0.125 2023-11-26 20:53:34,584 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=3566880.0, ans=0.2 2023-11-26 20:53:35,689 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=3566880.0, ans=0.125 2023-11-26 20:53:53,255 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 535050 2023-11-26 20:53:57,414 INFO [train_asr.py:1235] (3/4) Epoch 45, batch 6000, loss[loss=0.05068, simple_loss=0.06639, pruned_loss=0.005951, audio_tagging_loss=0.01154, over 15543.00 frames. ], tot_loss[loss=0.06521, simple_loss=0.08837, pruned_loss=0.01225, audio_tagging_loss=0.008779, over 3049227.25 frames. ], batch size: 59, lr: 1.50e-03, grad_scale: 32.0 2023-11-26 20:53:57,415 INFO [train_asr.py:1258] (3/4) Computing validation loss 2023-11-26 20:54:09,090 INFO [zipformer.py:1877] (3/4) name=encoder.encoders.3.encoder.layers.0.self_attn_weights, attn_weights_entropy = tensor([3.4833, 2.0617, 3.1611, 3.2190, 2.9987, 3.1409, 2.9213, 3.1741], device='cuda:3') 2023-11-26 20:54:12,174 INFO [zipformer.py:1877] (3/4) name=encoder.encoders.4.encoder.layers.0.self_attn_weights, attn_weights_entropy = tensor([3.5859, 2.5570, 3.4140, 2.6967], device='cuda:3') 2023-11-26 20:54:29,558 INFO [train_asr.py:1267] (3/4) Epoch 45, validation: loss=0.05766, simple_loss=0.05058, pruned_loss=0.005348, audio_tagging_loss=0.02702, over 4681554.00 frames. 2023-11-26 20:54:29,559 INFO [train_asr.py:1268] (3/4) Maximum memory allocated so far is 24894MB 2023-11-26 20:54:37,350 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.342e+01 8.765e+01 9.407e+01 1.018e+02 1.240e+02, threshold=1.881e+02, percent-clipped=0.0 2023-11-26 20:54:40,830 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3567080.0, ans=0.0 2023-11-26 20:54:54,860 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=2.34 vs. limit=15.0 2023-11-26 20:55:05,548 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=12.80 vs. limit=15.0 2023-11-26 20:55:09,187 WARNING [train_asr.py:1481] (3/4) Exclude cut with ID unbalanced/NoNxFjwXuuc_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. 
Number of tokens: 24 2023-11-26 20:55:20,911 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 535100 2023-11-26 20:55:25,136 INFO [train_asr.py:1235] (3/4) Epoch 45, batch 6050, loss[loss=0.07095, simple_loss=0.09754, pruned_loss=0.01477, audio_tagging_loss=0.007413, over 13749.00 frames. ], tot_loss[loss=0.0654, simple_loss=0.08901, pruned_loss=0.01219, audio_tagging_loss=0.008702, over 3045799.46 frames. ], batch size: 52, lr: 1.50e-03, grad_scale: 32.0 2023-11-26 20:55:39,316 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=3567413.3333333335, ans=0.0 2023-11-26 20:55:45,689 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=3567480.0, ans=0.0 2023-11-26 20:55:50,426 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3567480.0, ans=0.125 2023-11-26 20:55:51,053 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3567480.0, ans=0.125 2023-11-26 20:56:02,861 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=3567546.6666666665, ans=0.125 2023-11-26 20:56:16,588 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 535150 2023-11-26 20:56:20,755 INFO [train_asr.py:1235] (3/4) Epoch 45, batch 6100, loss[loss=0.07532, simple_loss=0.1062, pruned_loss=0.01588, audio_tagging_loss=0.006328, over 15003.00 frames. ], tot_loss[loss=0.06535, simple_loss=0.08906, pruned_loss=0.01218, audio_tagging_loss=0.008642, over 3048123.77 frames. ], batch size: 56, lr: 1.50e-03, grad_scale: 32.0 2023-11-26 20:56:28,139 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.197e+01 8.965e+01 9.690e+01 1.035e+02 1.368e+02, threshold=1.938e+02, percent-clipped=0.0 2023-11-26 20:56:41,840 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=10.20 vs. limit=15.0 2023-11-26 20:56:50,399 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=3567813.3333333335, ans=0.0 2023-11-26 20:56:53,692 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=3567880.0, ans=0.125 2023-11-26 20:57:11,504 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 535200 2023-11-26 20:57:17,034 INFO [train_asr.py:1235] (3/4) Epoch 45, batch 6150, loss[loss=0.04635, simple_loss=0.06392, pruned_loss=0.005032, audio_tagging_loss=0.009361, over 15609.00 frames. ], tot_loss[loss=0.06563, simple_loss=0.08954, pruned_loss=0.01225, audio_tagging_loss=0.008611, over 3053514.13 frames. 
], batch size: 60, lr: 1.50e-03, grad_scale: 32.0 2023-11-26 20:57:23,126 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=3568013.3333333335, ans=0.0 2023-11-26 20:57:30,018 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=3568080.0, ans=0.04949747468305833 2023-11-26 20:57:39,475 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=3568146.6666666665, ans=0.125 2023-11-26 20:57:44,092 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=512, metric=8.76 vs. limit=15.0 2023-11-26 20:57:51,508 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=512, metric=10.10 vs. limit=15.0 2023-11-26 20:58:08,726 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 535250 2023-11-26 20:58:08,946 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=3568280.0, ans=0.125 2023-11-26 20:58:13,469 INFO [train_asr.py:1235] (3/4) Epoch 45, batch 6200, loss[loss=0.06604, simple_loss=0.09348, pruned_loss=0.01138, audio_tagging_loss=0.007923, over 13869.00 frames. ], tot_loss[loss=0.06529, simple_loss=0.08901, pruned_loss=0.01213, audio_tagging_loss=0.008645, over 3050344.31 frames. ], batch size: 53, lr: 1.50e-03, grad_scale: 32.0 2023-11-26 20:58:20,988 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.437e+01 8.899e+01 9.421e+01 1.012e+02 1.333e+02, threshold=1.884e+02, percent-clipped=0.0 2023-11-26 20:58:22,352 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3568346.6666666665, ans=0.125 2023-11-26 20:58:25,686 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=13.63 vs. limit=15.0 2023-11-26 20:58:30,911 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3568413.3333333335, ans=0.125 2023-11-26 20:58:32,885 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.max_abs, batch_count=3568413.3333333335, ans=10.0 2023-11-26 20:58:33,001 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=3568413.3333333335, ans=0.07 2023-11-26 20:59:03,994 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 535300 2023-11-26 20:59:04,097 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3568613.3333333335, ans=0.1 2023-11-26 20:59:08,232 INFO [train_asr.py:1235] (3/4) Epoch 45, batch 6250, loss[loss=0.06955, simple_loss=0.1001, pruned_loss=0.01149, audio_tagging_loss=0.008008, over 14596.00 frames. ], tot_loss[loss=0.06497, simple_loss=0.08863, pruned_loss=0.0119, audio_tagging_loss=0.008746, over 3048515.89 frames. 
], batch size: 55, lr: 1.50e-03, grad_scale: 32.0 2023-11-26 20:59:18,831 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=3568746.6666666665, ans=0.0 2023-11-26 20:59:22,552 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=3568746.6666666665, ans=0.0 2023-11-26 20:59:34,780 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=3568813.3333333335, ans=0.2 2023-11-26 20:59:36,756 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3568813.3333333335, ans=0.125 2023-11-26 20:59:36,858 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=3568813.3333333335, ans=0.125 2023-11-26 20:59:40,916 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=6.59 vs. limit=12.0 2023-11-26 20:59:56,398 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3568946.6666666665, ans=0.125 2023-11-26 20:59:58,307 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 535350 2023-11-26 21:00:02,485 INFO [train_asr.py:1235] (3/4) Epoch 45, batch 6300, loss[loss=0.06029, simple_loss=0.07392, pruned_loss=0.0111, audio_tagging_loss=0.01223, over 14123.00 frames. ], tot_loss[loss=0.06502, simple_loss=0.08843, pruned_loss=0.01189, audio_tagging_loss=0.008918, over 3044159.20 frames. ], batch size: 55, lr: 1.50e-03, grad_scale: 16.0 2023-11-26 21:00:03,648 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=3569013.3333333335, ans=0.125 2023-11-26 21:00:12,055 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.533e+01 8.840e+01 9.586e+01 1.026e+02 1.198e+02, threshold=1.917e+02, percent-clipped=0.0 2023-11-26 21:00:13,656 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=7.47 vs. limit=10.0 2023-11-26 21:00:19,595 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.max_abs, batch_count=3569080.0, ans=10.0 2023-11-26 21:00:20,738 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=3569080.0, ans=0.2 2023-11-26 21:00:29,747 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-26 21:00:54,198 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 535400 2023-11-26 21:00:58,579 INFO [train_asr.py:1235] (3/4) Epoch 45, batch 6350, loss[loss=0.05829, simple_loss=0.08475, pruned_loss=0.007658, audio_tagging_loss=0.008261, over 15197.00 frames. ], tot_loss[loss=0.06518, simple_loss=0.08856, pruned_loss=0.01195, audio_tagging_loss=0.00895, over 3042037.98 frames. ], batch size: 59, lr: 1.50e-03, grad_scale: 16.0 2023-11-26 21:01:08,879 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=3569413.3333333335, ans=0.0 2023-11-26 21:01:17,671 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=11.45 vs. 
limit=15.0 2023-11-26 21:01:30,025 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=12.13 vs. limit=15.0 2023-11-26 21:01:32,415 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=3569546.6666666665, ans=0.0 2023-11-26 21:01:37,704 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-26 21:01:49,236 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 535450 2023-11-26 21:01:53,954 INFO [train_asr.py:1235] (3/4) Epoch 45, batch 6400, loss[loss=0.0612, simple_loss=0.08364, pruned_loss=0.01144, audio_tagging_loss=0.007942, over 14598.00 frames. ], tot_loss[loss=0.06507, simple_loss=0.08845, pruned_loss=0.01189, audio_tagging_loss=0.008958, over 3040365.54 frames. ], batch size: 57, lr: 1.50e-03, grad_scale: 32.0 2023-11-26 21:02:02,600 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.564e+01 8.580e+01 9.385e+01 1.005e+02 1.222e+02, threshold=1.877e+02, percent-clipped=0.0 2023-11-26 21:02:27,208 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3569880.0, ans=0.125 2023-11-26 21:02:39,566 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-26 21:02:40,585 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3569946.6666666665, ans=0.125 2023-11-26 21:02:44,673 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 535500 2023-11-26 21:02:48,860 INFO [train_asr.py:1235] (3/4) Epoch 45, batch 6450, loss[loss=0.06398, simple_loss=0.09463, pruned_loss=0.008666, audio_tagging_loss=0.008002, over 16016.00 frames. ], tot_loss[loss=0.06504, simple_loss=0.08857, pruned_loss=0.01177, audio_tagging_loss=0.008983, over 3041693.76 frames. ], batch size: 57, lr: 1.50e-03, grad_scale: 32.0 2023-11-26 21:02:53,985 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3570013.3333333335, ans=0.125 2023-11-26 21:02:56,627 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3570013.3333333335, ans=0.1 2023-11-26 21:03:02,100 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=3570080.0, ans=0.0 2023-11-26 21:03:04,173 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3570080.0, ans=0.125 2023-11-26 21:03:07,263 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=3570080.0, ans=0.0 2023-11-26 21:03:11,822 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=14.48 vs. 
limit=15.0 2023-11-26 21:03:13,764 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3570146.6666666665, ans=0.125 2023-11-26 21:03:36,118 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3570280.0, ans=0.0 2023-11-26 21:03:40,696 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 535550 2023-11-26 21:03:44,898 INFO [train_asr.py:1235] (3/4) Epoch 45, batch 6500, loss[loss=0.06781, simple_loss=0.08467, pruned_loss=0.01312, audio_tagging_loss=0.01235, over 16711.00 frames. ], tot_loss[loss=0.06545, simple_loss=0.08893, pruned_loss=0.01197, audio_tagging_loss=0.009013, over 3054040.39 frames. ], batch size: 63, lr: 1.50e-03, grad_scale: 32.0 2023-11-26 21:03:53,423 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 6.873e+01 8.670e+01 9.516e+01 1.047e+02 1.238e+02, threshold=1.903e+02, percent-clipped=0.0 2023-11-26 21:03:57,816 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=3570413.3333333335, ans=0.0 2023-11-26 21:04:03,726 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=3570413.3333333335, ans=0.125 2023-11-26 21:04:07,018 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=3570480.0, ans=0.2 2023-11-26 21:04:11,070 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3570480.0, ans=0.0 2023-11-26 21:04:33,448 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=3570613.3333333335, ans=0.2 2023-11-26 21:04:34,478 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3570613.3333333335, ans=0.125 2023-11-26 21:04:35,325 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 535600 2023-11-26 21:04:37,837 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3570613.3333333335, ans=0.1 2023-11-26 21:04:39,830 INFO [train_asr.py:1235] (3/4) Epoch 45, batch 6550, loss[loss=0.07899, simple_loss=0.1168, pruned_loss=0.01538, audio_tagging_loss=0.005224, over 16190.00 frames. ], tot_loss[loss=0.0657, simple_loss=0.08952, pruned_loss=0.01219, audio_tagging_loss=0.008744, over 3055108.63 frames. 
], batch size: 58, lr: 1.50e-03, grad_scale: 32.0 2023-11-26 21:04:40,003 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=3570680.0, ans=0.0 2023-11-26 21:04:51,342 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3570746.6666666665, ans=0.125 2023-11-26 21:04:54,356 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3570746.6666666665, ans=0.0 2023-11-26 21:05:31,087 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 535650 2023-11-26 21:05:31,321 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=3570946.6666666665, ans=0.04949747468305833 2023-11-26 21:05:35,389 INFO [train_asr.py:1235] (3/4) Epoch 45, batch 6600, loss[loss=0.06574, simple_loss=0.08485, pruned_loss=0.01545, audio_tagging_loss=0.007858, over 15150.00 frames. ], tot_loss[loss=0.06488, simple_loss=0.08861, pruned_loss=0.01187, audio_tagging_loss=0.008707, over 3051382.21 frames. ], batch size: 58, lr: 1.50e-03, grad_scale: 32.0 2023-11-26 21:05:44,994 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.643e+01 8.935e+01 9.455e+01 1.019e+02 1.266e+02, threshold=1.891e+02, percent-clipped=0.0 2023-11-26 21:05:59,294 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=14.59 vs. limit=22.5 2023-11-26 21:06:02,178 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=3571146.6666666665, ans=0.2 2023-11-26 21:06:15,445 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=13.54 vs. limit=15.0 2023-11-26 21:06:26,887 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 535700 2023-11-26 21:06:27,123 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=3571280.0, ans=0.2 2023-11-26 21:06:30,989 INFO [train_asr.py:1235] (3/4) Epoch 45, batch 6650, loss[loss=0.06297, simple_loss=0.09621, pruned_loss=0.009621, audio_tagging_loss=0.005244, over 15740.00 frames. ], tot_loss[loss=0.06515, simple_loss=0.08893, pruned_loss=0.01206, audio_tagging_loss=0.008626, over 3057944.62 frames. ], batch size: 59, lr: 1.50e-03, grad_scale: 32.0 2023-11-26 21:06:33,685 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn1.whiten.whitening_limit, batch_count=3571346.6666666665, ans=22.5 2023-11-26 21:06:58,227 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=3571480.0, ans=0.125 2023-11-26 21:07:09,502 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=3571546.6666666665, ans=0.125 2023-11-26 21:07:21,148 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 535750 2023-11-26 21:07:25,285 INFO [train_asr.py:1235] (3/4) Epoch 45, batch 6700, loss[loss=0.0656, simple_loss=0.09198, pruned_loss=0.01134, audio_tagging_loss=0.008273, over 15975.00 frames. ], tot_loss[loss=0.06493, simple_loss=0.08857, pruned_loss=0.01204, audio_tagging_loss=0.008611, over 3046898.45 frames. 
], batch size: 59, lr: 1.50e-03, grad_scale: 16.0 2023-11-26 21:07:27,558 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=3571680.0, ans=0.0 2023-11-26 21:07:30,698 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=3571680.0, ans=0.125 2023-11-26 21:07:34,794 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.049e+01 8.689e+01 9.559e+01 1.023e+02 3.616e+02, threshold=1.912e+02, percent-clipped=1.0 2023-11-26 21:07:46,221 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=3571813.3333333335, ans=0.125 2023-11-26 21:07:49,387 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=13.74 vs. limit=22.5 2023-11-26 21:08:03,040 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=3571880.0, ans=0.125 2023-11-26 21:08:12,809 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=3571946.6666666665, ans=0.2 2023-11-26 21:08:15,807 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 535800 2023-11-26 21:08:20,252 INFO [train_asr.py:1235] (3/4) Epoch 45, batch 6750, loss[loss=0.07639, simple_loss=0.1019, pruned_loss=0.01714, audio_tagging_loss=0.0083, over 15844.00 frames. ], tot_loss[loss=0.06506, simple_loss=0.08862, pruned_loss=0.01212, audio_tagging_loss=0.008631, over 3051208.16 frames. ], batch size: 57, lr: 1.50e-03, grad_scale: 16.0 2023-11-26 21:08:27,810 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=11.41 vs. limit=15.0 2023-11-26 21:08:42,320 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3572146.6666666665, ans=0.0 2023-11-26 21:08:44,792 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=8.23 vs. limit=15.0 2023-11-26 21:09:11,891 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 535850 2023-11-26 21:09:16,567 INFO [train_asr.py:1235] (3/4) Epoch 45, batch 6800, loss[loss=0.06842, simple_loss=0.09113, pruned_loss=0.01534, audio_tagging_loss=0.007525, over 15221.00 frames. ], tot_loss[loss=0.0651, simple_loss=0.08888, pruned_loss=0.01206, audio_tagging_loss=0.0086, over 3056382.65 frames. 
], batch size: 56, lr: 1.50e-03, grad_scale: 32.0 2023-11-26 21:09:26,075 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.921e+01 8.870e+01 9.420e+01 1.023e+02 1.274e+02, threshold=1.884e+02, percent-clipped=0.0 2023-11-26 21:09:30,693 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3572413.3333333335, ans=0.0 2023-11-26 21:09:38,000 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3572480.0, ans=0.125 2023-11-26 21:09:50,295 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=3572546.6666666665, ans=0.2 2023-11-26 21:09:59,781 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3572613.3333333335, ans=0.1 2023-11-26 21:10:07,134 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 535900 2023-11-26 21:10:07,276 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=3572613.3333333335, ans=0.0 2023-11-26 21:10:09,393 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.min_abs, batch_count=3572613.3333333335, ans=0.5 2023-11-26 21:10:11,383 INFO [train_asr.py:1235] (3/4) Epoch 45, batch 6850, loss[loss=0.08351, simple_loss=0.1109, pruned_loss=0.0169, audio_tagging_loss=0.01116, over 14526.00 frames. ], tot_loss[loss=0.06573, simple_loss=0.08979, pruned_loss=0.01229, audio_tagging_loss=0.008544, over 3053776.10 frames. ], batch size: 53, lr: 1.50e-03, grad_scale: 32.0 2023-11-26 21:10:16,024 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=3572680.0, ans=0.2 2023-11-26 21:10:24,599 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=3572746.6666666665, ans=0.2 2023-11-26 21:10:27,717 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.min_positive, batch_count=3572746.6666666665, ans=0.025 2023-11-26 21:10:37,267 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=3572813.3333333335, ans=0.07 2023-11-26 21:11:02,649 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 535950 2023-11-26 21:11:06,885 INFO [train_asr.py:1235] (3/4) Epoch 45, batch 6900, loss[loss=0.05002, simple_loss=0.06419, pruned_loss=0.01014, audio_tagging_loss=0.007793, over 13508.00 frames. ], tot_loss[loss=0.06565, simple_loss=0.08974, pruned_loss=0.01219, audio_tagging_loss=0.008585, over 3046929.03 frames. ], batch size: 53, lr: 1.50e-03, grad_scale: 32.0 2023-11-26 21:11:18,669 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.403e+01 8.747e+01 9.465e+01 1.018e+02 1.501e+02, threshold=1.893e+02, percent-clipped=0.0 2023-11-26 21:11:20,258 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=7.49 vs. 
limit=12.0 2023-11-26 21:11:32,514 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=3573146.6666666665, ans=0.125 2023-11-26 21:11:34,530 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3573146.6666666665, ans=0.125 2023-11-26 21:11:34,994 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=3.10 vs. limit=15.0 2023-11-26 21:11:42,267 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=12.22 vs. limit=22.5 2023-11-26 21:11:47,733 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=8.97 vs. limit=10.0 2023-11-26 21:11:50,323 WARNING [train_asr.py:1481] (3/4) Exclude cut with ID unbalanced/Xez1ffAcb0w_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-26 21:11:57,687 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 536000 2023-11-26 21:12:05,230 INFO [train_asr.py:1235] (3/4) Epoch 45, batch 6950, loss[loss=0.07355, simple_loss=0.107, pruned_loss=0.0139, audio_tagging_loss=0.006166, over 15059.00 frames. ], tot_loss[loss=0.06559, simple_loss=0.08973, pruned_loss=0.01211, audio_tagging_loss=0.008605, over 3046720.20 frames. ], batch size: 54, lr: 1.50e-03, grad_scale: 16.0 2023-11-26 21:12:13,807 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=10.38 vs. limit=15.0 2023-11-26 21:12:15,464 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=3573413.3333333335, ans=0.125 2023-11-26 21:12:20,767 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3573413.3333333335, ans=0.125 2023-11-26 21:12:25,019 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=3573413.3333333335, ans=0.0 2023-11-26 21:12:34,660 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3573480.0, ans=0.125 2023-11-26 21:12:40,054 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3573546.6666666665, ans=0.125 2023-11-26 21:12:46,375 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=3573546.6666666665, ans=0.2 2023-11-26 21:12:48,360 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=10.97 vs. 
limit=15.0 2023-11-26 21:12:56,845 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 536050 2023-11-26 21:13:01,003 INFO [train_asr.py:1235] (3/4) Epoch 45, batch 7000, loss[loss=0.07716, simple_loss=0.1104, pruned_loss=0.01581, audio_tagging_loss=0.006156, over 14835.00 frames. ], tot_loss[loss=0.0655, simple_loss=0.08968, pruned_loss=0.01209, audio_tagging_loss=0.008569, over 3041529.75 frames. ], batch size: 55, lr: 1.50e-03, grad_scale: 8.0 2023-11-26 21:13:12,666 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.349e+01 8.901e+01 9.470e+01 1.019e+02 1.225e+02, threshold=1.894e+02, percent-clipped=0.0 2023-11-26 21:13:26,493 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=7.00 vs. limit=15.0 2023-11-26 21:13:35,129 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=3573880.0, ans=0.0 2023-11-26 21:13:40,389 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=3573880.0, ans=0.0 2023-11-26 21:13:46,673 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-26 21:13:51,881 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 536100 2023-11-26 21:13:52,119 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=3573946.6666666665, ans=0.0 2023-11-26 21:13:56,050 INFO [train_asr.py:1235] (3/4) Epoch 45, batch 7050, loss[loss=0.05706, simple_loss=0.07081, pruned_loss=0.01004, audio_tagging_loss=0.01161, over 14493.00 frames. ], tot_loss[loss=0.06555, simple_loss=0.08965, pruned_loss=0.01201, audio_tagging_loss=0.008718, over 3051775.95 frames. ], batch size: 59, lr: 1.50e-03, grad_scale: 8.0 2023-11-26 21:13:58,838 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.1.self_attn2.whiten, num_groups=1, num_channels=192, metric=8.88 vs. limit=22.5 2023-11-26 21:14:02,607 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=3574013.3333333335, ans=0.125 2023-11-26 21:14:15,804 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3574080.0, ans=0.0 2023-11-26 21:14:16,909 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=3574080.0, ans=0.1 2023-11-26 21:14:35,978 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3574213.3333333335, ans=0.1 2023-11-26 21:14:46,447 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 536150 2023-11-26 21:14:51,229 INFO [train_asr.py:1235] (3/4) Epoch 45, batch 7100, loss[loss=0.05831, simple_loss=0.08004, pruned_loss=0.01029, audio_tagging_loss=0.008004, over 14443.00 frames. ], tot_loss[loss=0.06611, simple_loss=0.09031, pruned_loss=0.01214, audio_tagging_loss=0.008815, over 3060221.95 frames. 
], batch size: 53, lr: 1.50e-03, grad_scale: 8.0 2023-11-26 21:15:04,200 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.303e+01 8.863e+01 9.458e+01 1.036e+02 1.512e+02, threshold=1.892e+02, percent-clipped=0.0 2023-11-26 21:15:12,895 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.max_abs, batch_count=3574480.0, ans=10.0 2023-11-26 21:15:21,525 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=3574480.0, ans=0.0 2023-11-26 21:15:27,904 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.max_abs, batch_count=3574546.6666666665, ans=10.0 2023-11-26 21:15:43,232 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 536200 2023-11-26 21:15:47,678 INFO [train_asr.py:1235] (3/4) Epoch 45, batch 7150, loss[loss=0.08112, simple_loss=0.1199, pruned_loss=0.01492, audio_tagging_loss=0.006263, over 15520.00 frames. ], tot_loss[loss=0.06599, simple_loss=0.08999, pruned_loss=0.01214, audio_tagging_loss=0.008855, over 3047380.23 frames. ], batch size: 56, lr: 1.50e-03, grad_scale: 8.0 2023-11-26 21:15:49,963 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-26 21:15:52,711 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=7.76 vs. limit=15.0 2023-11-26 21:15:56,261 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=3574680.0, ans=0.0 2023-11-26 21:16:32,738 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3574946.6666666665, ans=0.125 2023-11-26 21:16:36,925 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3574946.6666666665, ans=0.125 2023-11-26 21:16:37,852 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 536250 2023-11-26 21:16:42,032 INFO [train_asr.py:1235] (3/4) Epoch 45, batch 7200, loss[loss=0.07329, simple_loss=0.1029, pruned_loss=0.01391, audio_tagging_loss=0.007943, over 16081.00 frames. ], tot_loss[loss=0.06674, simple_loss=0.09136, pruned_loss=0.01234, audio_tagging_loss=0.008721, over 3042368.40 frames. ], batch size: 57, lr: 1.50e-03, grad_scale: 16.0 2023-11-26 21:16:53,653 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.282e+01 8.982e+01 9.532e+01 1.041e+02 1.325e+02, threshold=1.906e+02, percent-clipped=0.0 2023-11-26 21:17:08,199 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3575146.6666666665, ans=0.0 2023-11-26 21:17:32,477 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 536300 2023-11-26 21:17:36,683 INFO [train_asr.py:1235] (3/4) Epoch 45, batch 7250, loss[loss=0.05032, simple_loss=0.05929, pruned_loss=0.007845, audio_tagging_loss=0.01283, over 15059.00 frames. ], tot_loss[loss=0.06675, simple_loss=0.09119, pruned_loss=0.01236, audio_tagging_loss=0.008797, over 3039579.19 frames. ], batch size: 57, lr: 1.50e-03, grad_scale: 16.0 2023-11-26 21:17:41,186 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=5.63 vs. 
2023-11-26 21:17:50,432 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3575413.3333333335, ans=0.125
2023-11-26 21:17:57,678 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=3575413.3333333335, ans=0.0
2023-11-26 21:18:10,881 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten1.whitening_limit, batch_count=3575546.6666666665, ans=10.0
2023-11-26 21:18:17,004 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=3575546.6666666665, ans=0.0
2023-11-26 21:18:17,936 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=3575546.6666666665, ans=0.0
2023-11-26 21:18:28,403 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 536350
2023-11-26 21:18:33,114 INFO [train_asr.py:1235] (3/4) Epoch 45, batch 7300, loss[loss=0.08159, simple_loss=0.1146, pruned_loss=0.01505, audio_tagging_loss=0.009255, over 16363.00 frames. ], tot_loss[loss=0.06654, simple_loss=0.09092, pruned_loss=0.0123, audio_tagging_loss=0.008782, over 3040144.22 frames. ], batch size: 60, lr: 1.50e-03, grad_scale: 8.0
2023-11-26 21:18:35,474 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3575680.0, ans=0.125
2023-11-26 21:18:42,967 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=3575746.6666666665, ans=0.0
2023-11-26 21:18:45,941 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.580e+01 8.748e+01 9.464e+01 1.022e+02 1.262e+02, threshold=1.893e+02, percent-clipped=0.0
2023-11-26 21:18:48,614 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=15.58 vs. limit=22.5
2023-11-26 21:18:54,916 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=17.18 vs. limit=22.5
2023-11-26 21:19:08,364 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=3575880.0, ans=0.07
2023-11-26 21:19:17,408 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3575946.6666666665, ans=0.125
2023-11-26 21:19:23,596 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 536400
2023-11-26 21:19:28,022 INFO [train_asr.py:1235] (3/4) Epoch 45, batch 7350, loss[loss=0.05577, simple_loss=0.07371, pruned_loss=0.009218, audio_tagging_loss=0.009702, over 14819.00 frames. ], tot_loss[loss=0.06593, simple_loss=0.08991, pruned_loss=0.01225, audio_tagging_loss=0.008729, over 3048805.65 frames. ], batch size: 57, lr: 1.50e-03, grad_scale: 8.0
2023-11-26 21:19:54,525 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=3576146.6666666665, ans=0.0
2023-11-26 21:20:02,087 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=8.20 vs. limit=10.0
2023-11-26 21:20:02,487 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=3576213.3333333335, ans=0.125
2023-11-26 21:20:18,446 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 536450
2023-11-26 21:20:21,109 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=18.17 vs. limit=22.5
2023-11-26 21:20:22,636 INFO [train_asr.py:1235] (3/4) Epoch 45, batch 7400, loss[loss=0.06454, simple_loss=0.08766, pruned_loss=0.01229, audio_tagging_loss=0.008428, over 14711.00 frames. ], tot_loss[loss=0.06619, simple_loss=0.09003, pruned_loss=0.01239, audio_tagging_loss=0.008777, over 3040539.28 frames. ], batch size: 57, lr: 1.50e-03, grad_scale: 8.0
2023-11-26 21:20:29,765 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=3576346.6666666665, ans=0.0
2023-11-26 21:20:36,524 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.430e+01 8.979e+01 9.560e+01 1.029e+02 2.303e+02, threshold=1.912e+02, percent-clipped=1.0
2023-11-26 21:20:44,883 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=3576480.0, ans=0.125
2023-11-26 21:21:08,881 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3576613.3333333335, ans=0.125
2023-11-26 21:21:14,568 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 536500
2023-11-26 21:21:18,753 INFO [train_asr.py:1235] (3/4) Epoch 45, batch 7450, loss[loss=0.05598, simple_loss=0.0687, pruned_loss=0.0101, audio_tagging_loss=0.01153, over 14906.00 frames. ], tot_loss[loss=0.0662, simple_loss=0.09004, pruned_loss=0.01243, audio_tagging_loss=0.008741, over 3034074.76 frames. ], batch size: 57, lr: 1.50e-03, grad_scale: 8.0
2023-11-26 21:21:19,010 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=3576680.0, ans=0.1
2023-11-26 21:21:30,221 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=3576746.6666666665, ans=0.0
2023-11-26 21:21:50,845 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.min_positive, batch_count=3576880.0, ans=0.05
2023-11-26 21:21:55,737 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3576880.0, ans=0.125
2023-11-26 21:22:09,797 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 536550
2023-11-26 21:22:13,906 INFO [train_asr.py:1235] (3/4) Epoch 45, batch 7500, loss[loss=0.077, simple_loss=0.1075, pruned_loss=0.0143, audio_tagging_loss=0.008943, over 15042.00 frames. ], tot_loss[loss=0.06631, simple_loss=0.09044, pruned_loss=0.01242, audio_tagging_loss=0.008673, over 3041293.94 frames. ], batch size: 55, lr: 1.50e-03, grad_scale: 8.0
2023-11-26 21:22:16,277 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=3577013.3333333335, ans=0.2
2023-11-26 21:22:16,526 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=4.30 vs. limit=15.0
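The scaling.py:213 entries record the current value (ans=...) of a ScheduledFloat at the given batch_count: skip rates, dropout probabilities and balancer constants are annealed over training rather than fixed, which is why most of them have long since reached their final values (0.0, 0.1, 0.125, ...) at this point in the run. Purely as an illustration of the idea (icefall's real ScheduledFloat in scaling.py is more general, and the breakpoints below are invented, not read from this configuration), a piecewise-linear schedule can be written as:

    class PiecewiseLinear:
        # Value interpolated linearly between (batch_count, value) breakpoints.
        def __init__(self, *points):
            self.points = sorted(points)

        def __call__(self, batch_count):
            pts = self.points
            if batch_count <= pts[0][0]:
                return pts[0][1]
            if batch_count >= pts[-1][0]:
                return pts[-1][1]
            for (x0, y0), (x1, y1) in zip(pts, pts[1:]):
                if x0 <= batch_count <= x1:
                    return y0 + (y1 - y0) * (batch_count - x0) / (x1 - x0)

    ff3_skip_rate = PiecewiseLinear((0.0, 0.1), (20000.0, 0.0))  # invented breakpoints
    print(ff3_skip_rate(3573880.0))  # far past the last breakpoint -> 0.0, as in the log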
2023-11-26 21:22:22,688 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3577013.3333333335, ans=0.1
2023-11-26 21:22:23,813 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=3577080.0, ans=0.125
2023-11-26 21:22:26,630 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.500e+01 8.830e+01 9.434e+01 1.016e+02 1.615e+02, threshold=1.887e+02, percent-clipped=0.0
2023-11-26 21:22:26,807 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3577080.0, ans=0.125
2023-11-26 21:22:52,902 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=3577213.3333333335, ans=0.125
2023-11-26 21:23:04,456 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 536600
2023-11-26 21:23:08,918 INFO [train_asr.py:1235] (3/4) Epoch 45, batch 7550, loss[loss=0.07811, simple_loss=0.09867, pruned_loss=0.02078, audio_tagging_loss=0.007994, over 15178.00 frames. ], tot_loss[loss=0.06579, simple_loss=0.08964, pruned_loss=0.01231, audio_tagging_loss=0.008659, over 3046312.66 frames. ], batch size: 57, lr: 1.50e-03, grad_scale: 8.0
2023-11-26 21:23:16,089 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3577346.6666666665, ans=0.125
2023-11-26 21:23:21,935 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=3577413.3333333335, ans=0.0
2023-11-26 21:23:27,186 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3577413.3333333335, ans=0.125
2023-11-26 21:23:46,996 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3577546.6666666665, ans=0.125
2023-11-26 21:23:56,973 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=3577613.3333333335, ans=0.125
2023-11-26 21:24:00,013 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 536650
2023-11-26 21:24:04,759 INFO [train_asr.py:1235] (3/4) Epoch 45, batch 7600, loss[loss=0.07809, simple_loss=0.1026, pruned_loss=0.01904, audio_tagging_loss=0.007744, over 15085.00 frames. ], tot_loss[loss=0.06555, simple_loss=0.0893, pruned_loss=0.01229, audio_tagging_loss=0.008603, over 3041469.12 frames. ], batch size: 56, lr: 1.50e-03, grad_scale: 16.0
2023-11-26 21:24:08,554 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=6.55 vs. limit=12.0
2023-11-26 21:24:14,509 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=3577746.6666666665, ans=0.0
2023-11-26 21:24:17,497 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.414e+01 8.794e+01 9.367e+01 9.817e+01 1.272e+02, threshold=1.873e+02, percent-clipped=0.0
2023-11-26 21:24:28,854 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=3577813.3333333335, ans=0.0
2023-11-26 21:24:56,188 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 536700
2023-11-26 21:25:00,344 INFO [train_asr.py:1235] (3/4) Epoch 45, batch 7650, loss[loss=0.08709, simple_loss=0.1218, pruned_loss=0.01801, audio_tagging_loss=0.008178, over 15588.00 frames. ], tot_loss[loss=0.06599, simple_loss=0.09016, pruned_loss=0.01243, audio_tagging_loss=0.008483, over 3046654.79 frames. ], batch size: 58, lr: 1.50e-03, grad_scale: 16.0
2023-11-26 21:25:31,343 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3578146.6666666665, ans=0.1
2023-11-26 21:25:38,788 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=3578213.3333333335, ans=0.0
2023-11-26 21:25:46,848 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=3578280.0, ans=0.0
2023-11-26 21:25:52,127 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 536750
2023-11-26 21:25:54,337 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=3578280.0, ans=0.0
2023-11-26 21:25:56,376 INFO [train_asr.py:1235] (3/4) Epoch 45, batch 7700, loss[loss=0.06845, simple_loss=0.09939, pruned_loss=0.01134, audio_tagging_loss=0.007417, over 15052.00 frames. ], tot_loss[loss=0.06566, simple_loss=0.08978, pruned_loss=0.01225, audio_tagging_loss=0.008526, over 3047373.36 frames. ], batch size: 56, lr: 1.50e-03, grad_scale: 16.0
2023-11-26 21:25:59,788 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3578346.6666666665, ans=0.0
2023-11-26 21:26:10,140 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.691e+01 8.777e+01 9.451e+01 1.024e+02 1.236e+02, threshold=1.890e+02, percent-clipped=0.0
2023-11-26 21:26:17,650 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=19.59 vs. limit=22.5
2023-11-26 21:26:19,428 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.min_positive, batch_count=3578480.0, ans=0.05
2023-11-26 21:26:35,106 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.44 vs. limit=6.0
2023-11-26 21:26:44,516 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=3.63 vs. limit=15.0
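The optim.py:476 lines summarise recent gradient norms as quartiles (min/25%/50%/75%/max) next to the clipping threshold in force; in this stretch the median norm sits near 9.5e+01 while the threshold stays near 1.9e+02, i.e. roughly Clipping_scale=2.0 times the median, and percent-clipped stays at 0.0. As a hedged sketch of that relationship only (not icefall's ScaledAdam implementation), a clipper that derives its threshold from a running median of observed norms could look like:

    from collections import deque
    import numpy as np
    import torch

    class MedianClipper:
        # Clip at clipping_scale * median of recently observed grad norms (sketch).
        def __init__(self, clipping_scale=2.0, window=128):
            self.scale = clipping_scale
            self.norms = deque(maxlen=window)

        def __call__(self, parameters):
            params = [p for p in parameters if p.grad is not None]
            # max_norm=inf: measure the total norm without clipping anything yet.
            norm = float(torch.nn.utils.clip_grad_norm_(params, float("inf")))
            self.norms.append(norm)
            threshold = self.scale * float(np.median(self.norms))
            if norm > threshold:
                for p in params:
                    p.grad.mul_(threshold / norm)
            return norm, threshold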
2023-11-26 21:26:47,864 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 536800
2023-11-26 21:26:49,050 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3578613.3333333335, ans=0.125
2023-11-26 21:26:52,884 INFO [train_asr.py:1235] (3/4) Epoch 45, batch 7750, loss[loss=0.07241, simple_loss=0.1061, pruned_loss=0.01197, audio_tagging_loss=0.007396, over 15794.00 frames. ], tot_loss[loss=0.06562, simple_loss=0.08965, pruned_loss=0.01224, audio_tagging_loss=0.008562, over 3042742.62 frames. ], batch size: 56, lr: 1.50e-03, grad_scale: 16.0
2023-11-26 21:27:09,536 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3578746.6666666665, ans=0.1
2023-11-26 21:27:13,056 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=8.65 vs. limit=10.0
2023-11-26 21:27:24,789 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=3578880.0, ans=0.125
2023-11-26 21:27:32,745 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.max_abs, batch_count=3578880.0, ans=10.0
2023-11-26 21:27:44,483 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 536850
2023-11-26 21:27:48,632 INFO [train_asr.py:1235] (3/4) Epoch 45, batch 7800, loss[loss=0.05929, simple_loss=0.07766, pruned_loss=0.01133, audio_tagging_loss=0.009129, over 15367.00 frames. ], tot_loss[loss=0.06546, simple_loss=0.08933, pruned_loss=0.01219, audio_tagging_loss=0.008607, over 3040757.18 frames. ], batch size: 59, lr: 1.50e-03, grad_scale: 16.0
2023-11-26 21:27:53,029 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=3579013.3333333335, ans=0.125
2023-11-26 21:27:55,414 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=13.51 vs. limit=15.0
2023-11-26 21:27:56,348 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=3579013.3333333335, ans=0.125
2023-11-26 21:28:01,828 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.237e+01 9.125e+01 9.673e+01 1.032e+02 1.227e+02, threshold=1.935e+02, percent-clipped=0.0
2023-11-26 21:28:09,829 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=4.52 vs. limit=6.0
2023-11-26 21:28:20,800 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=3579146.6666666665, ans=0.0
2023-11-26 21:28:21,151 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=6.87 vs. limit=15.0
2023-11-26 21:28:30,581 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=11.73 vs. limit=15.0
2023-11-26 21:28:39,490 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 536900
2023-11-26 21:28:44,305 INFO [train_asr.py:1235] (3/4) Epoch 45, batch 7850, loss[loss=0.06879, simple_loss=0.09263, pruned_loss=0.01356, audio_tagging_loss=0.008911, over 15336.00 frames. ], tot_loss[loss=0.06624, simple_loss=0.09065, pruned_loss=0.01232, audio_tagging_loss=0.008592, over 3042850.35 frames. ], batch size: 56, lr: 1.50e-03, grad_scale: 16.0
2023-11-26 21:29:08,700 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=512, metric=3.93 vs. limit=15.0
2023-11-26 21:29:10,450 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3579480.0, ans=0.125
2023-11-26 21:29:21,412 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=10.90 vs. limit=15.0
2023-11-26 21:29:35,295 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 536950
2023-11-26 21:29:39,961 INFO [train_asr.py:1235] (3/4) Epoch 45, batch 7900, loss[loss=0.07232, simple_loss=0.09722, pruned_loss=0.01372, audio_tagging_loss=0.009995, over 16126.00 frames. ], tot_loss[loss=0.06643, simple_loss=0.09069, pruned_loss=0.01238, audio_tagging_loss=0.008707, over 3051829.49 frames. ], batch size: 60, lr: 1.50e-03, grad_scale: 16.0
2023-11-26 21:29:53,832 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.143e+01 8.961e+01 9.633e+01 1.012e+02 1.259e+02, threshold=1.927e+02, percent-clipped=0.0
2023-11-26 21:29:56,192 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3579746.6666666665, ans=0.1
2023-11-26 21:30:02,566 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2023-11-26 21:30:20,766 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=3579880.0, ans=0.2
2023-11-26 21:30:32,174 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 537000
2023-11-26 21:30:33,388 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=3579946.6666666665, ans=0.125
2023-11-26 21:30:36,637 INFO [train_asr.py:1235] (3/4) Epoch 45, batch 7950, loss[loss=0.06019, simple_loss=0.07371, pruned_loss=0.01029, audio_tagging_loss=0.01304, over 14558.00 frames. ], tot_loss[loss=0.06607, simple_loss=0.08986, pruned_loss=0.0123, audio_tagging_loss=0.008843, over 3053213.16 frames. ], batch size: 59, lr: 1.50e-03, grad_scale: 16.0
2023-11-26 21:30:48,583 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=3580080.0, ans=0.0
2023-11-26 21:30:50,462 WARNING [train_asr.py:1481] (3/4) Exclude cut with ID unbalanced/uQjH4tNUZ_g_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24
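The WARNING above is the AudioSet side of the mux: tagging-only clips carry a dummy transcript, and this one is dropped because its 100 input frames shrink to 23 after subsampling while the dummy text tokenizes to 24 BPE tokens, so no valid alignment fits. A sketch of such a filter, with the subsampling arithmetic reverse-engineered from the numbers in the warning (the real encoder_embed may compute it differently):

    def frames_after_subsampling(num_frames: int) -> int:
        # Reproduces 100 -> 23 for a subsampling factor of 4; an assumption,
        # not copied from the icefall code.
        return (num_frames - 7) // 4

    def keep_cut(cut, sp) -> bool:
        # `cut` is a lhotse Cut, `sp` a SentencePiece processor; mirrors the
        # comparison printed in the warning (23 frames vs. 24 tokens).
        tokens = sp.encode(cut.supervisions[0].text, out_type=str)
        return frames_after_subsampling(cut.num_frames) >= len(tokens)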
2023-11-26 21:30:50,664 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=3580080.0, ans=0.0
2023-11-26 21:30:59,766 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=3580146.6666666665, ans=0.125
2023-11-26 21:31:07,291 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3580146.6666666665, ans=0.1
2023-11-26 21:31:09,295 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=3580213.3333333335, ans=0.125
2023-11-26 21:31:18,698 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.whiten, num_groups=1, num_channels=512, metric=8.55 vs. limit=12.0
2023-11-26 21:31:19,293 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=3580213.3333333335, ans=0.0
2023-11-26 21:31:25,786 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=3580280.0, ans=0.0
2023-11-26 21:31:27,710 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 537050
2023-11-26 21:31:31,825 INFO [train_asr.py:1235] (3/4) Epoch 45, batch 8000, loss[loss=0.06308, simple_loss=0.08047, pruned_loss=0.01448, audio_tagging_loss=0.008357, over 16001.00 frames. ], tot_loss[loss=0.06556, simple_loss=0.08862, pruned_loss=0.0122, audio_tagging_loss=0.009058, over 3051219.10 frames. ], batch size: 60, lr: 1.50e-03, grad_scale: 32.0
2023-11-26 21:31:38,893 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=3580346.6666666665, ans=0.0
2023-11-26 21:31:43,093 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=3580413.3333333335, ans=0.125
2023-11-26 21:31:45,570 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.179e+01 8.727e+01 9.223e+01 9.988e+01 1.687e+02, threshold=1.845e+02, percent-clipped=0.0
2023-11-26 21:31:46,698 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=3580413.3333333335, ans=0.125
2023-11-26 21:31:53,442 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.93 vs. limit=15.0
2023-11-26 21:32:18,370 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=3580613.3333333335, ans=0.0
2023-11-26 21:32:20,455 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3580613.3333333335, ans=0.125
2023-11-26 21:32:22,937 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 537100
2023-11-26 21:32:27,636 INFO [train_asr.py:1235] (3/4) Epoch 45, batch 8050, loss[loss=0.07607, simple_loss=0.1069, pruned_loss=0.0159, audio_tagging_loss=0.006731, over 15701.00 frames. ], tot_loss[loss=0.06567, simple_loss=0.08882, pruned_loss=0.01221, audio_tagging_loss=0.009045, over 3046899.64 frames. ], batch size: 56, lr: 1.50e-03, grad_scale: 32.0
2023-11-26 21:32:33,751 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3580680.0, ans=0.125
2023-11-26 21:32:44,505 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=7.51 vs. limit=15.0
2023-11-26 21:33:10,538 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.max_abs, batch_count=3580880.0, ans=10.0
2023-11-26 21:33:19,945 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 537150
2023-11-26 21:33:23,236 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=3581013.3333333335, ans=0.125
2023-11-26 21:33:24,161 INFO [train_asr.py:1235] (3/4) Epoch 45, batch 8100, loss[loss=0.05064, simple_loss=0.07137, pruned_loss=0.006908, audio_tagging_loss=0.008048, over 14458.00 frames. ], tot_loss[loss=0.06625, simple_loss=0.08993, pruned_loss=0.01236, audio_tagging_loss=0.008926, over 3049493.59 frames. ], batch size: 56, lr: 1.50e-03, grad_scale: 32.0
2023-11-26 21:33:24,434 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=3581013.3333333335, ans=0.125
2023-11-26 21:33:35,041 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=3581080.0, ans=0.2
2023-11-26 21:33:36,872 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.566e+01 8.942e+01 9.751e+01 1.046e+02 1.316e+02, threshold=1.950e+02, percent-clipped=0.0
2023-11-26 21:33:42,596 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=4.36 vs. limit=15.0
2023-11-26 21:33:52,143 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3581146.6666666665, ans=0.125
2023-11-26 21:34:15,182 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 537200
2023-11-26 21:34:19,657 INFO [train_asr.py:1235] (3/4) Epoch 45, batch 8150, loss[loss=0.06124, simple_loss=0.07059, pruned_loss=0.0161, audio_tagging_loss=0.009841, over 15070.00 frames. ], tot_loss[loss=0.06565, simple_loss=0.08934, pruned_loss=0.01217, audio_tagging_loss=0.008812, over 3054025.09 frames. ], batch size: 57, lr: 1.50e-03, grad_scale: 32.0
2023-11-26 21:35:05,920 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module1.whiten, num_groups=1, num_channels=512, metric=3.87 vs. limit=15.0
2023-11-26 21:35:06,687 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=3581613.3333333335, ans=0.09899494936611666
2023-11-26 21:35:10,739 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 537250
2023-11-26 21:35:15,038 INFO [train_asr.py:1235] (3/4) Epoch 45, batch 8200, loss[loss=0.06385, simple_loss=0.09247, pruned_loss=0.008342, audio_tagging_loss=0.009276, over 16040.00 frames. ], tot_loss[loss=0.06585, simple_loss=0.08988, pruned_loss=0.01222, audio_tagging_loss=0.008699, over 3052694.98 frames. ], batch size: 59, lr: 1.50e-03, grad_scale: 32.0
2023-11-26 21:35:17,761 WARNING [train_asr.py:1481] (3/4) Exclude cut with ID unbalanced/8C7biyx9TQ4_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24
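grad_scale in the batch lines is the AMP loss scale of fp16 training; it drops (e.g. to 8.0) after overflow steps and is periodically doubled again (16.0, then 32.0 in the batches above). The loop itself is not in the log, but the standard PyTorch pattern behind those numbers looks like this:

    import torch
    from torch.cuda.amp import GradScaler, autocast

    scaler = GradScaler()

    def train_step(model, optimizer, batch, compute_loss):
        optimizer.zero_grad()
        with autocast():                  # forward/loss in fp16 where safe
            loss = compute_loss(model, batch)
        scaler.scale(loss).backward()     # scaled loss keeps fp16 grads finite
        scaler.step(optimizer)            # unscales; skips the step on inf/nan
        scaler.update()                   # adjusts the scale logged as grad_scale
        return loss.detach()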
2023-11-26 21:35:17,961 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=3581680.0, ans=0.0
2023-11-26 21:35:29,854 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.292e+01 8.811e+01 9.586e+01 1.032e+02 1.518e+02, threshold=1.917e+02, percent-clipped=0.0
2023-11-26 21:36:03,050 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.max_abs, batch_count=3581946.6666666665, ans=10.0
2023-11-26 21:36:08,181 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 537300
2023-11-26 21:36:12,434 INFO [train_asr.py:1235] (3/4) Epoch 45, batch 8250, loss[loss=0.06696, simple_loss=0.09586, pruned_loss=0.01203, audio_tagging_loss=0.006991, over 14839.00 frames. ], tot_loss[loss=0.0655, simple_loss=0.08924, pruned_loss=0.01218, audio_tagging_loss=0.008702, over 3044867.87 frames. ], batch size: 55, lr: 1.50e-03, grad_scale: 32.0
2023-11-26 21:36:13,749 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=3582013.3333333335, ans=0.125
2023-11-26 21:36:13,764 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=3582013.3333333335, ans=0.125
2023-11-26 21:36:14,838 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=3582013.3333333335, ans=0.0
2023-11-26 21:36:26,406 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2023-11-26 21:36:43,242 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.52 vs. limit=6.0
2023-11-26 21:36:49,494 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=16.53 vs. limit=22.5
2023-11-26 21:36:50,420 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3582213.3333333335, ans=0.125
2023-11-26 21:36:53,983 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=3582213.3333333335, ans=0.125
2023-11-26 21:37:03,420 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 537350
2023-11-26 21:37:07,514 INFO [train_asr.py:1235] (3/4) Epoch 45, batch 8300, loss[loss=0.05564, simple_loss=0.07639, pruned_loss=0.009595, audio_tagging_loss=0.007845, over 14626.00 frames. ], tot_loss[loss=0.06583, simple_loss=0.08991, pruned_loss=0.01225, audio_tagging_loss=0.008619, over 3050089.92 frames. ], batch size: 57, lr: 1.50e-03, grad_scale: 32.0
2023-11-26 21:37:20,177 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.577e+01 8.995e+01 9.587e+01 1.028e+02 1.257e+02, threshold=1.917e+02, percent-clipped=0.0
2023-11-26 21:37:38,446 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3582480.0, ans=0.125
2023-11-26 21:37:39,483 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=3582480.0, ans=0.0
2023-11-26 21:37:40,566 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3582546.6666666665, ans=0.1
2023-11-26 21:37:58,403 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 537400
2023-11-26 21:37:58,587 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.max_abs, batch_count=3582613.3333333335, ans=10.0
2023-11-26 21:38:02,098 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=3582680.0, ans=0.0
2023-11-26 21:38:02,862 INFO [train_asr.py:1235] (3/4) Epoch 45, batch 8350, loss[loss=0.07744, simple_loss=0.1143, pruned_loss=0.01468, audio_tagging_loss=0.005606, over 16053.00 frames. ], tot_loss[loss=0.06528, simple_loss=0.08928, pruned_loss=0.01204, audio_tagging_loss=0.008603, over 3044421.09 frames. ], batch size: 60, lr: 1.50e-03, grad_scale: 32.0
2023-11-26 21:38:20,161 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=3582746.6666666665, ans=0.0
2023-11-26 21:38:32,700 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3582813.3333333335, ans=0.125
2023-11-26 21:38:38,088 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3582880.0, ans=0.125
2023-11-26 21:38:42,171 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3582880.0, ans=0.125
2023-11-26 21:38:49,158 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=3582946.6666666665, ans=0.125
2023-11-26 21:38:54,704 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 537450
2023-11-26 21:38:59,489 INFO [train_asr.py:1235] (3/4) Epoch 45, batch 8400, loss[loss=0.06126, simple_loss=0.0812, pruned_loss=0.01005, audio_tagging_loss=0.01061, over 15164.00 frames. ], tot_loss[loss=0.06472, simple_loss=0.08852, pruned_loss=0.01185, audio_tagging_loss=0.008603, over 3046594.67 frames. ], batch size: 57, lr: 1.50e-03, grad_scale: 32.0
2023-11-26 21:39:02,838 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3583013.3333333335, ans=0.125
2023-11-26 21:39:02,901 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=3583013.3333333335, ans=0.0
2023-11-26 21:39:13,341 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 6.643e+01 8.925e+01 9.429e+01 9.938e+01 1.352e+02, threshold=1.886e+02, percent-clipped=0.0
2023-11-26 21:39:38,076 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=3583213.3333333335, ans=0.125
2023-11-26 21:39:39,330 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=4.03 vs. limit=15.0
2023-11-26 21:39:50,061 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 537500
2023-11-26 21:39:51,203 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3583280.0, ans=0.125
2023-11-26 21:39:54,181 INFO [train_asr.py:1235] (3/4) Epoch 45, batch 8450, loss[loss=0.07736, simple_loss=0.1028, pruned_loss=0.01885, audio_tagging_loss=0.007108, over 14807.00 frames. ], tot_loss[loss=0.06454, simple_loss=0.08831, pruned_loss=0.01181, audio_tagging_loss=0.008574, over 3042948.98 frames. ], batch size: 56, lr: 1.50e-03, grad_scale: 32.0
2023-11-26 21:40:02,395 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.42 vs. limit=6.0
2023-11-26 21:40:10,325 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=3583413.3333333335, ans=0.2
2023-11-26 21:40:14,219 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=12.38 vs. limit=15.0
2023-11-26 21:40:44,876 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 537550
2023-11-26 21:40:49,054 INFO [train_asr.py:1235] (3/4) Epoch 45, batch 8500, loss[loss=0.07865, simple_loss=0.1019, pruned_loss=0.01819, audio_tagging_loss=0.009513, over 14567.00 frames. ], tot_loss[loss=0.0654, simple_loss=0.08935, pruned_loss=0.01216, audio_tagging_loss=0.00857, over 3047153.40 frames. ], batch size: 53, lr: 1.50e-03, grad_scale: 32.0
2023-11-26 21:40:57,176 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=3583680.0, ans=0.125
2023-11-26 21:41:03,237 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=3583746.6666666665, ans=0.2
2023-11-26 21:41:04,449 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.148e+01 8.764e+01 9.533e+01 1.022e+02 1.336e+02, threshold=1.907e+02, percent-clipped=0.0
2023-11-26 21:41:06,700 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3583746.6666666665, ans=0.125
2023-11-26 21:41:14,699 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3583813.3333333335, ans=0.1
2023-11-26 21:41:27,411 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3583880.0, ans=0.1
2023-11-26 21:41:32,762 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00
2023-11-26 21:41:40,581 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 537600
2023-11-26 21:41:45,595 INFO [train_asr.py:1235] (3/4) Epoch 45, batch 8550, loss[loss=0.04741, simple_loss=0.06375, pruned_loss=0.007378, audio_tagging_loss=0.008153, over 14608.00 frames. ], tot_loss[loss=0.06519, simple_loss=0.08908, pruned_loss=0.012, audio_tagging_loss=0.008656, over 3052818.45 frames. ], batch size: 59, lr: 1.50e-03, grad_scale: 32.0
2023-11-26 21:42:04,712 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.max_positive, batch_count=3584080.0, ans=0.95
2023-11-26 21:42:11,908 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=3584146.6666666665, ans=0.0
2023-11-26 21:42:29,715 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=1.95 vs. limit=6.0
2023-11-26 21:42:34,851 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=3584280.0, ans=0.125
2023-11-26 21:42:37,274 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 537650
2023-11-26 21:42:41,428 INFO [train_asr.py:1235] (3/4) Epoch 45, batch 8600, loss[loss=0.04461, simple_loss=0.05545, pruned_loss=0.008081, audio_tagging_loss=0.008809, over 14401.00 frames. ], tot_loss[loss=0.06529, simple_loss=0.08903, pruned_loss=0.01205, audio_tagging_loss=0.008723, over 3049231.22 frames. ], batch size: 55, lr: 1.50e-03, grad_scale: 32.0
2023-11-26 21:42:55,226 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.372e+01 8.837e+01 9.386e+01 1.001e+02 1.418e+02, threshold=1.877e+02, percent-clipped=0.0
2023-11-26 21:43:16,331 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3584546.6666666665, ans=0.125
2023-11-26 21:43:16,843 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.1.conv_module1.whiten, num_groups=1, num_channels=192, metric=13.04 vs. limit=15.0
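Note that tot_loss[...] is reported "over ~3.05e6 frames", i.e. it is a running aggregate across many recent batches, not the statistics of one batch. The exact bookkeeping is not visible in the log; the frame-weighted decayed average below is one plausible reading, with the decay constant invented purely for illustration:

    class RunningLoss:
        # Frame-weighted, exponentially decayed aggregate (sketch).
        def __init__(self, decay=0.999):
            self.decay = decay
            self.loss_sum = 0.0
            self.frame_sum = 0.0

        def update(self, mean_loss, num_frames):
            self.loss_sum = self.decay * self.loss_sum + mean_loss * num_frames
            self.frame_sum = self.decay * self.frame_sum + num_frames

        @property
        def value(self):
            return self.loss_sum / max(self.frame_sum, 1.0)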
2023-11-26 21:43:22,259 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3584546.6666666665, ans=0.125
2023-11-26 21:43:32,604 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 537700
2023-11-26 21:43:36,829 INFO [train_asr.py:1235] (3/4) Epoch 45, batch 8650, loss[loss=0.05672, simple_loss=0.07508, pruned_loss=0.009836, audio_tagging_loss=0.009348, over 14525.00 frames. ], tot_loss[loss=0.06554, simple_loss=0.08929, pruned_loss=0.01212, audio_tagging_loss=0.008775, over 3050360.32 frames. ], batch size: 60, lr: 1.50e-03, grad_scale: 32.0
2023-11-26 21:43:41,314 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=3584680.0, ans=0.2
2023-11-26 21:43:45,046 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=6.31 vs. limit=10.0
2023-11-26 21:43:56,192 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=13.56 vs. limit=22.5
2023-11-26 21:44:05,916 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3584813.3333333335, ans=0.125
2023-11-26 21:44:23,220 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=17.50 vs. limit=22.5
2023-11-26 21:44:28,160 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 537750
2023-11-26 21:44:33,344 INFO [train_asr.py:1235] (3/4) Epoch 45, batch 8700, loss[loss=0.08575, simple_loss=0.1072, pruned_loss=0.02312, audio_tagging_loss=0.009052, over 15186.00 frames. ], tot_loss[loss=0.06584, simple_loss=0.08959, pruned_loss=0.01224, audio_tagging_loss=0.008808, over 3047143.01 frames. ], batch size: 58, lr: 1.50e-03, grad_scale: 16.0
2023-11-26 21:44:49,092 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.824e+01 9.138e+01 9.810e+01 1.049e+02 1.289e+02, threshold=1.962e+02, percent-clipped=0.0
2023-11-26 21:44:50,337 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3585080.0, ans=0.0
2023-11-26 21:45:20,736 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=3585280.0, ans=0.0
2023-11-26 21:45:24,807 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 537800
2023-11-26 21:45:26,014 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3585280.0, ans=0.125
2023-11-26 21:45:29,268 INFO [train_asr.py:1235] (3/4) Epoch 45, batch 8750, loss[loss=0.08949, simple_loss=0.1342, pruned_loss=0.0183, audio_tagging_loss=0.004104, over 14882.00 frames. ], tot_loss[loss=0.06595, simple_loss=0.0897, pruned_loss=0.01225, audio_tagging_loss=0.008844, over 3046701.20 frames. ], batch size: 54, lr: 1.50e-03, grad_scale: 16.0
2023-11-26 21:45:42,720 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3585413.3333333335, ans=0.125
2023-11-26 21:45:57,475 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3585480.0, ans=0.125
2023-11-26 21:45:57,724 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=12.84 vs. limit=15.0
2023-11-26 21:46:20,572 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 537850
2023-11-26 21:46:20,680 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3585613.3333333335, ans=0.1
2023-11-26 21:46:24,661 INFO [train_asr.py:1235] (3/4) Epoch 45, batch 8800, loss[loss=0.06853, simple_loss=0.0973, pruned_loss=0.01208, audio_tagging_loss=0.007803, over 14903.00 frames. ], tot_loss[loss=0.06628, simple_loss=0.09019, pruned_loss=0.01229, audio_tagging_loss=0.008887, over 3045925.45 frames. ], batch size: 59, lr: 1.50e-03, grad_scale: 32.0
2023-11-26 21:46:37,276 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3585746.6666666665, ans=0.1
2023-11-26 21:46:40,771 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.730e+01 8.993e+01 9.548e+01 1.016e+02 1.284e+02, threshold=1.910e+02, percent-clipped=0.0
2023-11-26 21:46:52,152 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=3585813.3333333335, ans=0.09899494936611666
2023-11-26 21:46:58,908 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=11.65 vs. limit=15.0
2023-11-26 21:47:07,449 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=3585880.0, ans=0.2
2023-11-26 21:47:09,842 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=4.61 vs. limit=15.0
2023-11-26 21:47:15,776 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 537900
2023-11-26 21:47:20,552 INFO [train_asr.py:1235] (3/4) Epoch 45, batch 8850, loss[loss=0.07212, simple_loss=0.101, pruned_loss=0.01326, audio_tagging_loss=0.008349, over 15914.00 frames. ], tot_loss[loss=0.06639, simple_loss=0.09039, pruned_loss=0.01227, audio_tagging_loss=0.008922, over 3054245.89 frames. ], batch size: 59, lr: 1.50e-03, grad_scale: 16.0
2023-11-26 21:47:22,293 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3586013.3333333335, ans=0.0
2023-11-26 21:47:22,457 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=3586013.3333333335, ans=0.125
2023-11-26 21:47:33,208 WARNING [train_asr.py:1481] (3/4) Exclude cut with ID unbalanced/1Dq7QH61iXQ_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24
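Whitening entries (scaling.py:1022) print a module's measured whitening metric against its current limit ("metric=12.84 vs. limit=15.0"); modules that creep up to or past their limit are the ones worth watching. A small scan for them, written against exactly this message format (not an icefall utility):

    import re

    WHITEN = re.compile(
        r"Whitening: name=(?P<name>\S+), num_groups=\d+, num_channels=\d+, "
        r"metric=(?P<metric>[0-9.]+) vs\. limit=(?P<limit>[0-9.]+)"
    )

    def near_limit(path, margin=0.9):
        """Yield modules whose whitening metric exceeds margin * limit."""
        with open(path) as f:
            for line in f:
                m = WHITEN.search(line)
                if m and float(m["metric"]) > margin * float(m["limit"]):
                    yield m["name"], float(m["metric"]), float(m["limit"])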
2023-11-26 21:47:34,521 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=3586080.0, ans=0.125
2023-11-26 21:47:55,352 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3586213.3333333335, ans=0.1
2023-11-26 21:47:56,445 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=3586213.3333333335, ans=0.07
2023-11-26 21:48:12,665 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 537950
2023-11-26 21:48:13,826 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3586280.0, ans=0.125
2023-11-26 21:48:16,101 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3586346.6666666665, ans=0.1
2023-11-26 21:48:16,855 INFO [train_asr.py:1235] (3/4) Epoch 45, batch 8900, loss[loss=0.05907, simple_loss=0.07807, pruned_loss=0.0104, audio_tagging_loss=0.00964, over 14994.00 frames. ], tot_loss[loss=0.06625, simple_loss=0.09059, pruned_loss=0.01216, audio_tagging_loss=0.008796, over 3048563.82 frames. ], batch size: 58, lr: 1.50e-03, grad_scale: 16.0
2023-11-26 21:48:33,338 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.240e+01 9.001e+01 9.576e+01 1.032e+02 1.288e+02, threshold=1.915e+02, percent-clipped=0.0
2023-11-26 21:48:33,665 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=3586413.3333333335, ans=0.0
2023-11-26 21:48:34,728 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=3586413.3333333335, ans=0.04949747468305833
2023-11-26 21:48:42,319 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=3.90 vs. limit=6.0
2023-11-26 21:48:43,052 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=3586480.0, ans=0.125
2023-11-26 21:48:55,782 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=6.26 vs. limit=15.0
2023-11-26 21:48:56,569 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=3586546.6666666665, ans=0.2
2023-11-26 21:48:57,575 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=3586546.6666666665, ans=0.0
2023-11-26 21:49:05,978 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.min_abs, batch_count=3586613.3333333335, ans=0.5
2023-11-26 21:49:07,874 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 538000
2023-11-26 21:49:12,814 INFO [train_asr.py:1235] (3/4) Epoch 45, batch 8950, loss[loss=0.0723, simple_loss=0.1017, pruned_loss=0.01498, audio_tagging_loss=0.006478, over 15641.00 frames. ], tot_loss[loss=0.06621, simple_loss=0.09091, pruned_loss=0.01216, audio_tagging_loss=0.008595, over 3054796.24 frames. ], batch size: 57, lr: 1.50e-03, grad_scale: 16.0
2023-11-26 21:49:14,108 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3586680.0, ans=0.125
2023-11-26 21:49:30,975 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3586746.6666666665, ans=0.125
2023-11-26 21:49:32,070 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=3586746.6666666665, ans=0.2
2023-11-26 21:49:33,136 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=3586746.6666666665, ans=0.2
2023-11-26 21:49:35,026 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=6.00 vs. limit=12.0
2023-11-26 21:49:58,775 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=3586946.6666666665, ans=0.125
2023-11-26 21:50:03,806 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 538050
2023-11-26 21:50:08,016 INFO [train_asr.py:1235] (3/4) Epoch 45, batch 9000, loss[loss=0.08386, simple_loss=0.1125, pruned_loss=0.01955, audio_tagging_loss=0.008059, over 14976.00 frames. ], tot_loss[loss=0.06539, simple_loss=0.08983, pruned_loss=0.01189, audio_tagging_loss=0.008579, over 3053164.79 frames. ], batch size: 55, lr: 1.50e-03, grad_scale: 16.0
2023-11-26 21:50:08,017 INFO [train_asr.py:1258] (3/4) Computing validation loss
2023-11-26 21:50:40,486 INFO [train_asr.py:1267] (3/4) Epoch 45, validation: loss=0.05836, simple_loss=0.0505, pruned_loss=0.005274, audio_tagging_loss=0.02784, over 4681554.00 frames.
2023-11-26 21:50:40,486 INFO [train_asr.py:1268] (3/4) Maximum memory allocated so far is 24894MB
2023-11-26 21:50:40,764 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=3587013.3333333335, ans=0.125
2023-11-26 21:50:56,766 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.438e+01 8.868e+01 9.363e+01 9.972e+01 1.329e+02, threshold=1.873e+02, percent-clipped=0.0
2023-11-26 21:51:19,861 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=3587213.3333333335, ans=0.0
2023-11-26 21:51:24,094 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=3587280.0, ans=0.2
2023-11-26 21:51:31,235 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 538100
2023-11-26 21:51:35,383 INFO [train_asr.py:1235] (3/4) Epoch 45, batch 9050, loss[loss=0.06626, simple_loss=0.08653, pruned_loss=0.01447, audio_tagging_loss=0.008527, over 15265.00 frames. ], tot_loss[loss=0.06579, simple_loss=0.09058, pruned_loss=0.012, audio_tagging_loss=0.008498, over 3051270.26 frames. ], batch size: 57, lr: 1.50e-03, grad_scale: 16.0
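The validation block above also reports "Maximum memory allocated so far is 24894MB"; that is PyTorch's built-in peak-allocation counter, which any script can read back the same way:

    import torch

    def log_peak_memory(device: torch.device) -> None:
        # Peak of torch.cuda.memory_allocated() since startup (or last reset),
        # the same statistic as the "Maximum memory allocated" log line.
        peak_mb = torch.cuda.max_memory_allocated(device) // (1024 * 1024)
        print(f"Maximum memory allocated so far is {peak_mb}MB")

    # torch.cuda.reset_peak_memory_stats(device) starts a fresh window.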
2023-11-26 21:51:56,849 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3587413.3333333335, ans=0.125
2023-11-26 21:52:20,845 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=3587613.3333333335, ans=0.2
2023-11-26 21:52:26,841 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 538150
2023-11-26 21:52:31,945 INFO [train_asr.py:1235] (3/4) Epoch 45, batch 9100, loss[loss=0.06068, simple_loss=0.07871, pruned_loss=0.01348, audio_tagging_loss=0.00784, over 14816.00 frames. ], tot_loss[loss=0.06591, simple_loss=0.09062, pruned_loss=0.01206, audio_tagging_loss=0.008539, over 3050121.15 frames. ], batch size: 56, lr: 1.50e-03, grad_scale: 16.0
2023-11-26 21:52:34,209 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer_na.min_abs, batch_count=3587680.0, ans=0.02
2023-11-26 21:52:49,591 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.208e+01 8.688e+01 9.524e+01 1.031e+02 1.397e+02, threshold=1.905e+02, percent-clipped=0.0
2023-11-26 21:53:23,844 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 538200
2023-11-26 21:53:28,232 INFO [train_asr.py:1235] (3/4) Epoch 45, batch 9150, loss[loss=0.07206, simple_loss=0.09381, pruned_loss=0.01548, audio_tagging_loss=0.009676, over 15499.00 frames. ], tot_loss[loss=0.0661, simple_loss=0.09089, pruned_loss=0.01218, audio_tagging_loss=0.008469, over 3044943.51 frames. ], batch size: 62, lr: 1.50e-03, grad_scale: 8.0
2023-11-26 21:53:34,661 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=3588013.3333333335, ans=0.125
2023-11-26 21:53:37,810 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3588080.0, ans=0.125
2023-11-26 21:53:45,412 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=3588080.0, ans=0.0
2023-11-26 21:54:11,052 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3588213.3333333335, ans=0.125
2023-11-26 21:54:19,288 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 538250
2023-11-26 21:54:21,639 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3588280.0, ans=0.125
2023-11-26 21:54:23,513 INFO [train_asr.py:1235] (3/4) Epoch 45, batch 9200, loss[loss=0.06306, simple_loss=0.08688, pruned_loss=0.01048, audio_tagging_loss=0.009144, over 16245.00 frames. ], tot_loss[loss=0.06579, simple_loss=0.09048, pruned_loss=0.01211, audio_tagging_loss=0.008444, over 3050828.82 frames. ], batch size: 60, lr: 1.50e-03, grad_scale: 16.0
2023-11-26 21:54:42,044 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.381e+01 8.859e+01 9.629e+01 1.034e+02 1.503e+02, threshold=1.926e+02, percent-clipped=0.0
2023-11-26 21:55:13,929 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=17.44 vs. limit=22.5
2023-11-26 21:55:15,019 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 538300
2023-11-26 21:55:19,667 INFO [train_asr.py:1235] (3/4) Epoch 45, batch 9250, loss[loss=0.05428, simple_loss=0.06924, pruned_loss=0.01012, audio_tagging_loss=0.009548, over 16365.00 frames. ], tot_loss[loss=0.06554, simple_loss=0.09002, pruned_loss=0.01212, audio_tagging_loss=0.008414, over 3053902.83 frames. ], batch size: 61, lr: 1.50e-03, grad_scale: 16.0
2023-11-26 21:56:04,013 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=3588946.6666666665, ans=0.125
2023-11-26 21:56:11,272 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 538350
2023-11-26 21:56:15,963 INFO [train_asr.py:1235] (3/4) Epoch 45, batch 9300, loss[loss=0.04264, simple_loss=0.05296, pruned_loss=0.0068, audio_tagging_loss=0.009361, over 14504.00 frames. ], tot_loss[loss=0.06536, simple_loss=0.0894, pruned_loss=0.01214, audio_tagging_loss=0.008524, over 3053751.94 frames. ], batch size: 57, lr: 1.50e-03, grad_scale: 16.0
2023-11-26 21:56:17,321 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=3589013.3333333335, ans=0.125
2023-11-26 21:56:32,741 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.167e+01 8.799e+01 9.431e+01 1.003e+02 1.401e+02, threshold=1.886e+02, percent-clipped=0.0
2023-11-26 21:56:47,682 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=12.77 vs. limit=22.5
2023-11-26 21:56:59,623 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=3589280.0, ans=0.125
2023-11-26 21:57:07,057 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 538400
2023-11-26 21:57:11,536 INFO [train_asr.py:1235] (3/4) Epoch 45, batch 9350, loss[loss=0.06539, simple_loss=0.08881, pruned_loss=0.01381, audio_tagging_loss=0.00717, over 15769.00 frames. ], tot_loss[loss=0.06593, simple_loss=0.09016, pruned_loss=0.01228, audio_tagging_loss=0.008567, over 3056849.19 frames. ], batch size: 58, lr: 1.50e-03, grad_scale: 16.0
2023-11-26 21:57:17,125 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2023-11-26 21:57:42,805 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=5.99 vs. limit=10.0
2023-11-26 21:57:43,864 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=7.01 vs. limit=15.0
limit=15.0 2023-11-26 21:57:53,992 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3589546.6666666665, ans=0.125 2023-11-26 21:57:55,052 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3589613.3333333335, ans=0.125 2023-11-26 21:58:00,242 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=3589613.3333333335, ans=0.125 2023-11-26 21:58:02,245 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 538450 2023-11-26 21:58:06,457 INFO [train_asr.py:1235] (3/4) Epoch 45, batch 9400, loss[loss=0.06125, simple_loss=0.08708, pruned_loss=0.01041, audio_tagging_loss=0.007296, over 15058.00 frames. ], tot_loss[loss=0.06559, simple_loss=0.08946, pruned_loss=0.01215, audio_tagging_loss=0.008714, over 3049785.97 frames. ], batch size: 59, lr: 1.50e-03, grad_scale: 16.0 2023-11-26 21:58:07,050 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=512, metric=2.76 vs. limit=15.0 2023-11-26 21:58:19,001 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3589746.6666666665, ans=0.1 2023-11-26 21:58:23,249 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=3589746.6666666665, ans=0.0 2023-11-26 21:58:25,073 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.062e+01 9.009e+01 9.595e+01 1.056e+02 1.388e+02, threshold=1.919e+02, percent-clipped=0.0 2023-11-26 21:58:25,767 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=7.42 vs. limit=12.0 2023-11-26 21:58:49,358 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.0.whiten, num_groups=1, num_channels=192, metric=4.38 vs. limit=12.0 2023-11-26 21:58:58,865 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 538500 2023-11-26 21:59:03,565 INFO [train_asr.py:1235] (3/4) Epoch 45, batch 9450, loss[loss=0.05643, simple_loss=0.07817, pruned_loss=0.008764, audio_tagging_loss=0.008581, over 14909.00 frames. ], tot_loss[loss=0.06514, simple_loss=0.08894, pruned_loss=0.01195, audio_tagging_loss=0.008723, over 3052836.72 frames. ], batch size: 58, lr: 1.50e-03, grad_scale: 16.0 2023-11-26 21:59:03,594 WARNING [train_asr.py:1481] (3/4) Exclude cut with ID unbalanced/jmSuJWEIizA_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-26 21:59:19,983 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=3590080.0, ans=0.125 2023-11-26 21:59:28,621 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=3590146.6666666665, ans=0.125 2023-11-26 21:59:47,152 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=14.16 vs. 
limit=22.5 2023-11-26 21:59:48,863 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=3590280.0, ans=0.2 2023-11-26 21:59:55,075 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 538550 2023-11-26 21:59:59,324 INFO [train_asr.py:1235] (3/4) Epoch 45, batch 9500, loss[loss=0.06721, simple_loss=0.09539, pruned_loss=0.01028, audio_tagging_loss=0.009232, over 16699.00 frames. ], tot_loss[loss=0.06588, simple_loss=0.08975, pruned_loss=0.01224, audio_tagging_loss=0.008772, over 3054753.19 frames. ], batch size: 61, lr: 1.50e-03, grad_scale: 8.0 2023-11-26 22:00:18,519 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.889e+01 9.000e+01 9.693e+01 1.049e+02 2.337e+02, threshold=1.939e+02, percent-clipped=1.0 2023-11-26 22:00:22,526 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3590480.0, ans=0.125 2023-11-26 22:00:45,709 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=3590613.3333333335, ans=0.0 2023-11-26 22:00:46,140 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=8.20 vs. limit=15.0 2023-11-26 22:00:50,781 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 538600 2023-11-26 22:00:53,309 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=3590613.3333333335, ans=0.0 2023-11-26 22:00:55,219 INFO [train_asr.py:1235] (3/4) Epoch 45, batch 9550, loss[loss=0.0778, simple_loss=0.09869, pruned_loss=0.01988, audio_tagging_loss=0.008574, over 14985.00 frames. ], tot_loss[loss=0.06676, simple_loss=0.09112, pruned_loss=0.01238, audio_tagging_loss=0.008817, over 3052700.82 frames. ], batch size: 55, lr: 1.50e-03, grad_scale: 8.0 2023-11-26 22:01:00,197 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=3590680.0, ans=0.125 2023-11-26 22:01:03,945 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=3590680.0, ans=0.1 2023-11-26 22:01:10,510 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=5.81 vs. limit=12.0 2023-11-26 22:01:12,482 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=3590746.6666666665, ans=0.0 2023-11-26 22:01:16,844 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=12.25 vs. 
limit=22.5 2023-11-26 22:01:18,799 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-26 22:01:22,082 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=3590813.3333333335, ans=0.125 2023-11-26 22:01:34,825 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=3590880.0, ans=0.2 2023-11-26 22:01:37,000 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=3590880.0, ans=0.2 2023-11-26 22:01:39,135 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=3590946.6666666665, ans=0.0 2023-11-26 22:01:47,409 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 538650 2023-11-26 22:01:52,744 INFO [train_asr.py:1235] (3/4) Epoch 45, batch 9600, loss[loss=0.06206, simple_loss=0.08219, pruned_loss=0.01254, audio_tagging_loss=0.008427, over 15630.00 frames. ], tot_loss[loss=0.06712, simple_loss=0.09164, pruned_loss=0.01242, audio_tagging_loss=0.008881, over 3054121.45 frames. ], batch size: 59, lr: 1.50e-03, grad_scale: 16.0 2023-11-26 22:01:54,039 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=3591013.3333333335, ans=0.1 2023-11-26 22:02:10,548 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.211e+01 8.846e+01 9.558e+01 1.014e+02 1.385e+02, threshold=1.912e+02, percent-clipped=0.0 2023-11-26 22:02:10,776 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=3591080.0, ans=0.0 2023-11-26 22:02:30,117 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=3591213.3333333335, ans=0.07 2023-11-26 22:02:32,101 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3591213.3333333335, ans=0.125 2023-11-26 22:02:32,140 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=3591213.3333333335, ans=0.0 2023-11-26 22:02:33,771 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=3591213.3333333335, ans=0.2 2023-11-26 22:02:33,873 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=3591213.3333333335, ans=0.2 2023-11-26 22:02:40,758 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=3591280.0, ans=0.0 2023-11-26 22:02:43,711 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 538700 2023-11-26 22:02:47,900 INFO [train_asr.py:1235] (3/4) Epoch 45, batch 9650, loss[loss=0.06468, simple_loss=0.08704, pruned_loss=0.01326, audio_tagging_loss=0.007901, over 16256.00 frames. ], tot_loss[loss=0.06701, simple_loss=0.09148, pruned_loss=0.01239, audio_tagging_loss=0.00888, over 3056261.21 frames. 
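
The recurring train_asr.py:1235 entries report a per-batch loss[...] and a running tot_loss[...], each split into the simple_loss and pruned_loss of the pruned RNN-T objective plus the distilled audio_tagging_loss. The logged totals are consistent with a fixed weighted sum, 0.5 * simple_loss + pruned_loss + 1.0 * audio_tagging_loss; the sketch below reproduces them, though the weights are inferred from the log and any warmup-dependent weighting inside train_asr.py is not visible here.

    # Sketch of the loss combination implied by the logged numbers; the
    # 0.5/1.0 weights are inferred from the log, not read from the recipe.
    def combine_losses(simple_loss: float,
                       pruned_loss: float,
                       audio_tagging_loss: float,
                       simple_loss_scale: float = 0.5,
                       audio_tagging_loss_scale: float = 1.0) -> float:
        return (simple_loss_scale * simple_loss
                + pruned_loss
                + audio_tagging_loss_scale * audio_tagging_loss)

    # "batch 9100" above: loss=0.06068 with components 0.07871/0.01348/0.00784
    assert abs(combine_losses(0.07871, 0.01348, 0.00784) - 0.06068) < 1e-4
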
], batch size: 60, lr: 1.50e-03, grad_scale: 16.0 2023-11-26 22:02:56,710 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=3591346.6666666665, ans=0.125 2023-11-26 22:02:56,747 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=3591346.6666666665, ans=0.0 2023-11-26 22:02:58,775 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3591413.3333333335, ans=0.125 2023-11-26 22:03:21,583 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=3591546.6666666665, ans=0.125 2023-11-26 22:03:21,615 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=3591546.6666666665, ans=0.07 2023-11-26 22:03:23,774 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=3591546.6666666665, ans=0.125 2023-11-26 22:03:25,904 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3591546.6666666665, ans=0.125 2023-11-26 22:03:28,032 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=3591546.6666666665, ans=0.0 2023-11-26 22:03:38,574 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 538750 2023-11-26 22:03:42,872 INFO [train_asr.py:1235] (3/4) Epoch 45, batch 9700, loss[loss=0.05156, simple_loss=0.06866, pruned_loss=0.008901, audio_tagging_loss=0.008332, over 15652.00 frames. ], tot_loss[loss=0.06744, simple_loss=0.0921, pruned_loss=0.01265, audio_tagging_loss=0.008741, over 3060608.68 frames. ], batch size: 59, lr: 1.50e-03, grad_scale: 16.0 2023-11-26 22:03:54,960 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-26 22:03:55,964 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3591746.6666666665, ans=0.125 2023-11-26 22:04:02,701 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.048e+01 8.825e+01 9.473e+01 1.012e+02 1.378e+02, threshold=1.895e+02, percent-clipped=0.0 2023-11-26 22:04:10,445 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3591813.3333333335, ans=0.125 2023-11-26 22:04:15,629 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=3591880.0, ans=0.125 2023-11-26 22:04:17,166 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=7.30 vs. 
limit=15.0 2023-11-26 22:04:21,038 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=3591880.0, ans=0.0 2023-11-26 22:04:24,103 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3591880.0, ans=0.125 2023-11-26 22:04:25,229 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=3591880.0, ans=0.0 2023-11-26 22:04:34,594 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 538800 2023-11-26 22:04:39,049 INFO [train_asr.py:1235] (3/4) Epoch 45, batch 9750, loss[loss=0.07926, simple_loss=0.1081, pruned_loss=0.01718, audio_tagging_loss=0.008038, over 14799.00 frames. ], tot_loss[loss=0.06719, simple_loss=0.09202, pruned_loss=0.01261, audio_tagging_loss=0.008577, over 3054652.64 frames. ], batch size: 55, lr: 1.49e-03, grad_scale: 16.0 2023-11-26 22:04:50,773 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3592080.0, ans=0.125 2023-11-26 22:05:00,284 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=3592146.6666666665, ans=0.0 2023-11-26 22:05:24,590 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=3592280.0, ans=0.0 2023-11-26 22:05:30,180 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 538850 2023-11-26 22:05:34,343 INFO [train_asr.py:1235] (3/4) Epoch 45, batch 9800, loss[loss=0.04444, simple_loss=0.05182, pruned_loss=0.008486, audio_tagging_loss=0.01004, over 14984.00 frames. ], tot_loss[loss=0.06644, simple_loss=0.09088, pruned_loss=0.01247, audio_tagging_loss=0.008529, over 3051649.12 frames. ], batch size: 60, lr: 1.49e-03, grad_scale: 16.0 2023-11-26 22:05:40,159 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=7.96 vs. limit=15.0 2023-11-26 22:05:52,323 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.272e+01 8.733e+01 9.432e+01 1.005e+02 1.366e+02, threshold=1.886e+02, percent-clipped=0.0 2023-11-26 22:06:00,131 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=3592480.0, ans=0.2 2023-11-26 22:06:02,849 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.0.self_attn1.whiten, num_groups=1, num_channels=192, metric=11.85 vs. limit=22.5 2023-11-26 22:06:05,401 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=3592480.0, ans=0.125 2023-11-26 22:06:09,179 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3592546.6666666665, ans=0.1 2023-11-26 22:06:25,736 WARNING [train_asr.py:1481] (3/4) Exclude cut with ID unbalanced/Bo4LcZjitzU_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. 
Number of tokens: 24 2023-11-26 22:06:25,783 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 538900 2023-11-26 22:06:29,956 INFO [train_asr.py:1235] (3/4) Epoch 45, batch 9850, loss[loss=0.06902, simple_loss=0.09874, pruned_loss=0.01173, audio_tagging_loss=0.007919, over 15357.00 frames. ], tot_loss[loss=0.06594, simple_loss=0.09022, pruned_loss=0.01234, audio_tagging_loss=0.008487, over 3051830.11 frames. ], batch size: 57, lr: 1.49e-03, grad_scale: 16.0 2023-11-26 22:06:35,329 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3592680.0, ans=0.0 2023-11-26 22:07:03,672 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=3.13 vs. limit=15.0 2023-11-26 22:07:12,943 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=3592880.0, ans=0.125 2023-11-26 22:07:21,334 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 538950 2023-11-26 22:07:23,108 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-26 22:07:26,009 INFO [train_asr.py:1235] (3/4) Epoch 45, batch 9900, loss[loss=0.06863, simple_loss=0.09507, pruned_loss=0.01261, audio_tagging_loss=0.008477, over 15235.00 frames. ], tot_loss[loss=0.06586, simple_loss=0.09003, pruned_loss=0.01243, audio_tagging_loss=0.008414, over 3048650.96 frames. ], batch size: 56, lr: 1.49e-03, grad_scale: 16.0 2023-11-26 22:07:28,350 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3593013.3333333335, ans=0.1 2023-11-26 22:07:30,548 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=3593013.3333333335, ans=0.0 2023-11-26 22:07:33,244 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=3593013.3333333335, ans=0.125 2023-11-26 22:07:37,373 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=3593080.0, ans=0.125 2023-11-26 22:07:37,771 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.whiten, num_groups=1, num_channels=512, metric=10.16 vs. limit=12.0 2023-11-26 22:07:45,064 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.355e+01 9.069e+01 9.666e+01 1.030e+02 3.243e+02, threshold=1.933e+02, percent-clipped=1.0 2023-11-26 22:07:46,345 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=3593080.0, ans=0.07 2023-11-26 22:07:59,460 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=12.98 vs. limit=15.0 2023-11-26 22:08:16,898 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 539000 2023-11-26 22:08:21,870 INFO [train_asr.py:1235] (3/4) Epoch 45, batch 9950, loss[loss=0.06543, simple_loss=0.09085, pruned_loss=0.01162, audio_tagging_loss=0.008392, over 17122.00 frames. ], tot_loss[loss=0.06564, simple_loss=0.08991, pruned_loss=0.01234, audio_tagging_loss=0.008342, over 3046103.84 frames. 
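
The WARNING [train_asr.py:1481] entries above drop AudioSet cuts whose placeholder transcript is longer than the encoder output: a 1-second cut has 100 feature frames, the convolutional front-end reduces that to 23, and the dummy text tokenizes to 24 BPE pieces, so the transducer loss (which needs at least one encoder frame per token) cannot be computed. A sketch of the check, with the subsampling formula assumed from the zipformer recipes and chosen to match the logged 100 -> 23 figure:

    def frames_after_subsampling(num_frames: int) -> int:
        # Conv front-end reduction (assumed); reproduces 100 -> 23 above.
        return ((num_frames - 7) // 2 + 1) // 2

    def keep_cut(num_frames: int, num_tokens: int) -> bool:
        # Transducer loss needs T >= U: one encoder frame per token at least.
        return frames_after_subsampling(num_frames) >= num_tokens

    assert frames_after_subsampling(100) == 23
    assert not keep_cut(100, 24)   # hence "Exclude cut ..." in the warnings
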
], batch size: 62, lr: 1.49e-03, grad_scale: 16.0 2023-11-26 22:08:38,314 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=6.18 vs. limit=12.0 2023-11-26 22:08:49,496 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=3593480.0, ans=0.0 2023-11-26 22:09:12,411 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=10.36 vs. limit=15.0 2023-11-26 22:09:12,881 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 539050 2023-11-26 22:09:17,140 INFO [train_asr.py:1235] (3/4) Epoch 45, batch 10000, loss[loss=0.04829, simple_loss=0.06665, pruned_loss=0.00682, audio_tagging_loss=0.00814, over 14683.00 frames. ], tot_loss[loss=0.06524, simple_loss=0.08944, pruned_loss=0.01221, audio_tagging_loss=0.008304, over 3043072.17 frames. ], batch size: 56, lr: 1.49e-03, grad_scale: 32.0 2023-11-26 22:09:33,787 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3593746.6666666665, ans=0.125 2023-11-26 22:09:35,524 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.167e+01 8.750e+01 9.330e+01 1.017e+02 1.273e+02, threshold=1.866e+02, percent-clipped=0.0 2023-11-26 22:09:50,631 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=3593880.0, ans=0.2 2023-11-26 22:10:01,354 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3593946.6666666665, ans=0.125 2023-11-26 22:10:06,697 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.min_positive, batch_count=3593946.6666666665, ans=0.025 2023-11-26 22:10:07,567 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 539100 2023-11-26 22:10:11,486 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=3594013.3333333335, ans=0.0 2023-11-26 22:10:12,308 INFO [train_asr.py:1235] (3/4) Epoch 45, batch 10050, loss[loss=0.07783, simple_loss=0.1055, pruned_loss=0.01542, audio_tagging_loss=0.009667, over 15007.00 frames. ], tot_loss[loss=0.06535, simple_loss=0.08947, pruned_loss=0.01223, audio_tagging_loss=0.008386, over 3047676.89 frames. 
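
The optim.py:476 entries summarize recent parameter-gradient norms. The five numbers read naturally as min/25%/median/75%/max of a recent window, and the reported threshold is consistently Clipping_scale (2.0) times the median (e.g. 2 x 9.330e+01 = 1.866e+02 above); percent-clipped is then the fraction of recent norms over that threshold. A sketch of that bookkeeping, assuming a simple sliding window rather than ScaledAdam's exact internals:

    import torch

    def clip_stats(recent_norms: torch.Tensor, clipping_scale: float = 2.0):
        # recent_norms: 1-D float tensor of gradient norms from recent batches.
        quartiles = torch.quantile(
            recent_norms, torch.tensor([0.0, 0.25, 0.5, 0.75, 1.0]))
        threshold = clipping_scale * quartiles[2]            # 2.0 * median
        percent_clipped = 100.0 * (recent_norms > threshold).float().mean()
        return quartiles, threshold, percent_clipped
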
], batch size: 57, lr: 1.49e-03, grad_scale: 32.0 2023-11-26 22:10:14,650 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3594013.3333333335, ans=0.125 2023-11-26 22:10:16,905 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=3594013.3333333335, ans=0.2 2023-11-26 22:10:17,857 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=3594013.3333333335, ans=0.0 2023-11-26 22:10:32,817 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3594080.0, ans=0.125 2023-11-26 22:10:59,538 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3594280.0, ans=0.1 2023-11-26 22:11:03,172 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 539150 2023-11-26 22:11:07,345 INFO [train_asr.py:1235] (3/4) Epoch 45, batch 10100, loss[loss=0.04798, simple_loss=0.06068, pruned_loss=0.009278, audio_tagging_loss=0.008363, over 14845.00 frames. ], tot_loss[loss=0.06531, simple_loss=0.08927, pruned_loss=0.01225, audio_tagging_loss=0.008421, over 3053312.62 frames. ], batch size: 57, lr: 1.49e-03, grad_scale: 16.0 2023-11-26 22:11:12,915 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=3594346.6666666665, ans=0.2 2023-11-26 22:11:22,012 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=3594413.3333333335, ans=0.125 2023-11-26 22:11:23,391 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.out_whiten.whitening_limit, batch_count=3594413.3333333335, ans=15.0 2023-11-26 22:11:27,042 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.124e+01 9.131e+01 9.595e+01 1.046e+02 1.257e+02, threshold=1.919e+02, percent-clipped=0.0 2023-11-26 22:11:28,683 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=512, metric=3.94 vs. limit=15.0 2023-11-26 22:11:35,296 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=3594480.0, ans=0.125 2023-11-26 22:11:53,699 WARNING [train_asr.py:1481] (3/4) Exclude cut with ID unbalanced/_eq1Ry0UZGU_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-26 22:11:56,455 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=18.40 vs. limit=22.5 2023-11-26 22:11:58,609 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 539200 2023-11-26 22:12:03,080 INFO [train_asr.py:1235] (3/4) Epoch 45, batch 10150, loss[loss=0.04585, simple_loss=0.05219, pruned_loss=0.007535, audio_tagging_loss=0.01223, over 16127.00 frames. ], tot_loss[loss=0.06514, simple_loss=0.08898, pruned_loss=0.01208, audio_tagging_loss=0.008564, over 3055122.09 frames. 
], batch size: 63, lr: 1.49e-03, grad_scale: 16.0 2023-11-26 22:12:09,757 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=3594680.0, ans=0.0 2023-11-26 22:12:23,550 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.23 vs. limit=6.0 2023-11-26 22:12:30,750 WARNING [train_asr.py:1481] (3/4) Exclude cut with ID unbalanced/cw-21cbk02A_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-26 22:12:44,331 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=3594880.0, ans=0.04949747468305833 2023-11-26 22:12:48,417 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3594946.6666666665, ans=0.125 2023-11-26 22:12:48,464 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.min_positive, batch_count=3594946.6666666665, ans=0.05 2023-11-26 22:12:49,614 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3594946.6666666665, ans=0.125 2023-11-26 22:12:53,681 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 539250 2023-11-26 22:12:58,466 INFO [train_asr.py:1235] (3/4) Epoch 45, batch 10200, loss[loss=0.06619, simple_loss=0.09258, pruned_loss=0.01076, audio_tagging_loss=0.00914, over 15420.00 frames. ], tot_loss[loss=0.06491, simple_loss=0.08826, pruned_loss=0.01205, audio_tagging_loss=0.008722, over 3049899.37 frames. ], batch size: 58, lr: 1.49e-03, grad_scale: 16.0 2023-11-26 22:13:17,120 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=512, metric=3.06 vs. limit=15.0 2023-11-26 22:13:18,619 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.716e+01 9.041e+01 9.563e+01 1.048e+02 1.575e+02, threshold=1.913e+02, percent-clipped=0.0 2023-11-26 22:13:20,762 WARNING [train_asr.py:1481] (3/4) Exclude cut with ID unbalanced/hOT6Yokob90_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-26 22:13:47,864 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=10.63 vs. limit=15.0 2023-11-26 22:13:49,499 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 539300 2023-11-26 22:13:50,629 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=3595280.0, ans=0.035 2023-11-26 22:13:50,971 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=7.82 vs. 
limit=15.0 2023-11-26 22:13:53,451 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=3595346.6666666665, ans=0.125 2023-11-26 22:13:54,227 INFO [train_asr.py:1235] (3/4) Epoch 45, batch 10250, loss[loss=0.05205, simple_loss=0.06801, pruned_loss=0.007206, audio_tagging_loss=0.01084, over 15534.00 frames. ], tot_loss[loss=0.06497, simple_loss=0.08834, pruned_loss=0.01199, audio_tagging_loss=0.008813, over 3050893.08 frames. ], batch size: 60, lr: 1.49e-03, grad_scale: 16.0 2023-11-26 22:13:54,373 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=3595346.6666666665, ans=0.125 2023-11-26 22:13:54,425 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3595346.6666666665, ans=0.125 2023-11-26 22:14:06,058 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=3595413.3333333335, ans=0.0 2023-11-26 22:14:09,939 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3595413.3333333335, ans=0.125 2023-11-26 22:14:15,508 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=7.48 vs. limit=15.0 2023-11-26 22:14:37,142 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=3595546.6666666665, ans=0.125 2023-11-26 22:14:45,470 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 539350 2023-11-26 22:14:49,565 INFO [train_asr.py:1235] (3/4) Epoch 45, batch 10300, loss[loss=0.06096, simple_loss=0.08232, pruned_loss=0.008724, audio_tagging_loss=0.01108, over 15087.00 frames. ], tot_loss[loss=0.06586, simple_loss=0.08954, pruned_loss=0.01226, audio_tagging_loss=0.008829, over 3049777.93 frames. ], batch size: 56, lr: 1.49e-03, grad_scale: 16.0 2023-11-26 22:14:53,501 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3595680.0, ans=0.125 2023-11-26 22:15:10,406 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.546e+01 9.184e+01 9.815e+01 1.071e+02 1.317e+02, threshold=1.963e+02, percent-clipped=0.0 2023-11-26 22:15:41,416 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 539400 2023-11-26 22:15:44,081 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.max_positive, batch_count=3595946.6666666665, ans=0.95 2023-11-26 22:15:46,011 INFO [train_asr.py:1235] (3/4) Epoch 45, batch 10350, loss[loss=0.08618, simple_loss=0.1274, pruned_loss=0.01478, audio_tagging_loss=0.007716, over 16190.00 frames. ], tot_loss[loss=0.0662, simple_loss=0.09014, pruned_loss=0.01223, audio_tagging_loss=0.008897, over 3048409.57 frames. ], batch size: 60, lr: 1.49e-03, grad_scale: 16.0 2023-11-26 22:16:38,453 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 539450 2023-11-26 22:16:43,140 INFO [train_asr.py:1235] (3/4) Epoch 45, batch 10400, loss[loss=0.07458, simple_loss=0.1038, pruned_loss=0.01401, audio_tagging_loss=0.008668, over 15742.00 frames. ], tot_loss[loss=0.06609, simple_loss=0.08967, pruned_loss=0.01224, audio_tagging_loss=0.009015, over 3046093.81 frames. 
], batch size: 57, lr: 1.49e-03, grad_scale: 32.0 2023-11-26 22:17:02,386 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.615e+01 8.954e+01 9.594e+01 1.032e+02 1.312e+02, threshold=1.919e+02, percent-clipped=0.0 2023-11-26 22:17:03,110 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.1.self_attn2.whiten, num_groups=1, num_channels=192, metric=8.79 vs. limit=22.5 2023-11-26 22:17:06,873 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.1.self_attn2.whiten, num_groups=1, num_channels=192, metric=9.86 vs. limit=22.5 2023-11-26 22:17:18,066 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3596546.6666666665, ans=0.1 2023-11-26 22:17:24,612 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=3596546.6666666665, ans=0.09899494936611666 2023-11-26 22:17:28,410 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=3596613.3333333335, ans=0.2 2023-11-26 22:17:30,775 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=15.02 vs. limit=22.5 2023-11-26 22:17:31,613 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=3596613.3333333335, ans=0.0 2023-11-26 22:17:34,656 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 539500 2023-11-26 22:17:38,826 INFO [train_asr.py:1235] (3/4) Epoch 45, batch 10450, loss[loss=0.06353, simple_loss=0.09545, pruned_loss=0.008834, audio_tagging_loss=0.006966, over 15289.00 frames. ], tot_loss[loss=0.06632, simple_loss=0.09014, pruned_loss=0.01233, audio_tagging_loss=0.008916, over 3042927.83 frames. ], batch size: 57, lr: 1.49e-03, grad_scale: 32.0 2023-11-26 22:17:45,490 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3596680.0, ans=0.0 2023-11-26 22:17:59,774 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.min_positive, batch_count=3596746.6666666665, ans=0.05 2023-11-26 22:17:59,860 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3596746.6666666665, ans=0.1 2023-11-26 22:18:08,901 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3596813.3333333335, ans=0.1 2023-11-26 22:18:18,494 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3596880.0, ans=0.1 2023-11-26 22:18:22,616 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3596946.6666666665, ans=0.125 2023-11-26 22:18:29,977 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 539550 2023-11-26 22:18:34,744 INFO [train_asr.py:1235] (3/4) Epoch 45, batch 10500, loss[loss=0.06591, simple_loss=0.09862, pruned_loss=0.01109, audio_tagging_loss=0.005513, over 16217.00 frames. ], tot_loss[loss=0.06568, simple_loss=0.08937, pruned_loss=0.0122, audio_tagging_loss=0.008794, over 3044585.76 frames. 
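
grad_scale in the loss lines is the torch.cuda.amp.GradScaler scale used for fp16 training: it is halved whenever a step produces inf/nan gradients and doubled again after a run of clean steps, which is why it wanders over powers of two (8.0 / 16.0 / 32.0) in this log. A minimal loop showing where the value comes from; the model, data, and scaler arguments are toy stand-ins, not the recipe's:

    import torch

    model = torch.nn.Linear(8, 8).cuda()
    opt = torch.optim.SGD(model.parameters(), lr=0.1)
    scaler = torch.cuda.amp.GradScaler(
        init_scale=16.0, growth_factor=2.0, backoff_factor=0.5,
        growth_interval=2000)   # illustrative values only

    for _ in range(3):
        opt.zero_grad()
        with torch.cuda.amp.autocast():
            loss = model(torch.randn(4, 8, device="cuda")).pow(2).mean()
        scaler.scale(loss).backward()
        scaler.step(opt)    # skipped internally on inf/nan gradients
        scaler.update()     # x0.5 on overflow, x2.0 after enough clean steps
        print(scaler.get_scale())   # the "grad_scale" reported above
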
], batch size: 57, lr: 1.49e-03, grad_scale: 32.0 2023-11-26 22:18:45,707 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3597080.0, ans=0.1 2023-11-26 22:18:54,539 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=3597080.0, ans=0.2 2023-11-26 22:18:55,227 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.808e+01 8.749e+01 9.296e+01 1.026e+02 1.262e+02, threshold=1.859e+02, percent-clipped=0.0 2023-11-26 22:19:26,535 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 539600 2023-11-26 22:19:30,990 INFO [train_asr.py:1235] (3/4) Epoch 45, batch 10550, loss[loss=0.06937, simple_loss=0.09259, pruned_loss=0.01369, audio_tagging_loss=0.009391, over 16173.00 frames. ], tot_loss[loss=0.06527, simple_loss=0.08894, pruned_loss=0.01204, audio_tagging_loss=0.008758, over 3043394.18 frames. ], batch size: 61, lr: 1.49e-03, grad_scale: 32.0 2023-11-26 22:19:33,827 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3597346.6666666665, ans=0.1 2023-11-26 22:19:34,942 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3597346.6666666665, ans=0.125 2023-11-26 22:19:37,014 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3597346.6666666665, ans=0.125 2023-11-26 22:19:48,711 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-11-26 22:19:51,285 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=10.46 vs. limit=15.0 2023-11-26 22:19:57,627 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=7.65 vs. limit=12.0 2023-11-26 22:20:03,023 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=3597546.6666666665, ans=0.125 2023-11-26 22:20:12,568 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-26 22:20:17,567 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=4.94 vs. limit=10.0 2023-11-26 22:20:22,554 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 539650 2023-11-26 22:20:26,748 INFO [train_asr.py:1235] (3/4) Epoch 45, batch 10600, loss[loss=0.06211, simple_loss=0.08369, pruned_loss=0.01197, audio_tagging_loss=0.008302, over 15046.00 frames. ], tot_loss[loss=0.06505, simple_loss=0.08876, pruned_loss=0.012, audio_tagging_loss=0.008675, over 3041174.37 frames. ], batch size: 56, lr: 1.49e-03, grad_scale: 32.0 2023-11-26 22:20:45,001 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=5.90 vs. 
limit=15.0 2023-11-26 22:20:47,032 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.182e+01 8.604e+01 9.125e+01 9.885e+01 1.207e+02, threshold=1.825e+02, percent-clipped=0.0 2023-11-26 22:20:53,110 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-26 22:21:17,948 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 539700 2023-11-26 22:21:22,184 INFO [train_asr.py:1235] (3/4) Epoch 45, batch 10650, loss[loss=0.0834, simple_loss=0.1168, pruned_loss=0.01784, audio_tagging_loss=0.007156, over 14775.00 frames. ], tot_loss[loss=0.06518, simple_loss=0.08903, pruned_loss=0.01214, audio_tagging_loss=0.008527, over 3044854.31 frames. ], batch size: 53, lr: 1.49e-03, grad_scale: 32.0 2023-11-26 22:21:23,747 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=7.35 vs. limit=10.0 2023-11-26 22:21:53,319 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3598146.6666666665, ans=0.125 2023-11-26 22:22:00,834 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=8.44 vs. limit=12.0 2023-11-26 22:22:14,150 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 539750 2023-11-26 22:22:15,317 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3598280.0, ans=0.125 2023-11-26 22:22:18,302 INFO [train_asr.py:1235] (3/4) Epoch 45, batch 10700, loss[loss=0.0548, simple_loss=0.06774, pruned_loss=0.01105, audio_tagging_loss=0.009874, over 15687.00 frames. ], tot_loss[loss=0.06547, simple_loss=0.08933, pruned_loss=0.01229, audio_tagging_loss=0.008523, over 3045696.14 frames. 
], batch size: 62, lr: 1.49e-03, grad_scale: 32.0 2023-11-26 22:22:21,659 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3598346.6666666665, ans=0.0 2023-11-26 22:22:27,526 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3598346.6666666665, ans=0.1 2023-11-26 22:22:29,677 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=3598413.3333333335, ans=0.2 2023-11-26 22:22:37,282 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-26 22:22:37,963 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.681e+01 8.850e+01 9.452e+01 1.010e+02 1.228e+02, threshold=1.890e+02, percent-clipped=0.0 2023-11-26 22:22:39,337 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00 2023-11-26 22:22:42,517 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3598480.0, ans=0.125 2023-11-26 22:23:03,018 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=3598613.3333333335, ans=0.0 2023-11-26 22:23:04,118 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3598613.3333333335, ans=0.1 2023-11-26 22:23:08,849 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3598613.3333333335, ans=0.1 2023-11-26 22:23:09,822 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 539800 2023-11-26 22:23:14,294 INFO [train_asr.py:1235] (3/4) Epoch 45, batch 10750, loss[loss=0.0647, simple_loss=0.08519, pruned_loss=0.01189, audio_tagging_loss=0.01023, over 14690.00 frames. ], tot_loss[loss=0.06565, simple_loss=0.08986, pruned_loss=0.01224, audio_tagging_loss=0.008485, over 3050111.24 frames. 
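
The very frequent scaling.py:213 entries dump ScheduledFloat values: module hyperparameters (dropout p, skip rates, balancer probs, bypass scale_min, ...) that are piecewise-linear functions of the global batch count rather than constants. By batch_count ~3.59e6 nearly all of them have long since settled at their final values (ans=0.125, ans=0.0, ans=0.2, ...). A minimal sketch of the schedule lookup; the breakpoints below are made up for illustration:

    class ScheduledFloatSketch:
        """Piecewise-linear value of the global batch count."""

        def __init__(self, *points: tuple):
            self.points = sorted(points)   # (batch_count, value) pairs

        def value(self, batch_count: float) -> float:
            pts = self.points
            if batch_count <= pts[0][0]:
                return pts[0][1]
            for (x0, y0), (x1, y1) in zip(pts, pts[1:]):
                if batch_count <= x1:
                    return y0 + (batch_count - x0) / (x1 - x0) * (y1 - y0)
            return pts[-1][1]

    # e.g. a dropout decaying 0.3 -> 0.125 over the first 20k batches:
    sched = ScheduledFloatSketch((0.0, 0.3), (20000.0, 0.125))
    assert sched.value(3_588_013.0) == 0.125   # matches the flat ans= values
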
], batch size: 53, lr: 1.49e-03, grad_scale: 16.0 2023-11-26 22:23:18,598 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-26 22:23:19,696 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=3598680.0, ans=0.0 2023-11-26 22:23:32,402 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3598746.6666666665, ans=0.125 2023-11-26 22:23:43,613 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.min_positive, batch_count=3598813.3333333335, ans=0.05 2023-11-26 22:23:44,809 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3598813.3333333335, ans=0.125 2023-11-26 22:23:52,693 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=3598880.0, ans=0.2 2023-11-26 22:23:54,691 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=3598880.0, ans=0.125 2023-11-26 22:23:56,999 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3598880.0, ans=0.0 2023-11-26 22:24:05,267 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 539850 2023-11-26 22:24:06,786 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=11.10 vs. limit=15.0 2023-11-26 22:24:07,570 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3598946.6666666665, ans=0.1 2023-11-26 22:24:09,426 INFO [train_asr.py:1235] (3/4) Epoch 45, batch 10800, loss[loss=0.04679, simple_loss=0.04569, pruned_loss=0.009853, audio_tagging_loss=0.01409, over 13919.00 frames. ], tot_loss[loss=0.06521, simple_loss=0.08912, pruned_loss=0.0122, audio_tagging_loss=0.008447, over 3042731.36 frames. ], batch size: 55, lr: 1.49e-03, grad_scale: 32.0 2023-11-26 22:24:31,743 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.551e+01 8.827e+01 9.312e+01 1.017e+02 1.289e+02, threshold=1.862e+02, percent-clipped=0.0 2023-11-26 22:24:38,245 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=3599146.6666666665, ans=0.0 2023-11-26 22:24:41,404 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-26 22:24:41,830 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=512, metric=3.26 vs. limit=15.0 2023-11-26 22:24:54,202 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3599280.0, ans=0.125 2023-11-26 22:25:01,000 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 539900 2023-11-26 22:25:06,354 INFO [train_asr.py:1235] (3/4) Epoch 45, batch 10850, loss[loss=0.06273, simple_loss=0.08545, pruned_loss=0.01222, audio_tagging_loss=0.007784, over 14752.00 frames. ], tot_loss[loss=0.06516, simple_loss=0.08926, pruned_loss=0.01208, audio_tagging_loss=0.008444, over 3040766.19 frames. 
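
The scaling.py:1118 entries (WithLoss: ... loss-sum=0.000e+00) track auxiliary penalties attached directly to intermediate tensors such as attention weights; a zero loss-sum means the penalty is currently inactive. One way to implement the attach-a-loss mechanism, shown here as a generic autograd sketch rather than icefall's exact code:

    import torch

    class AttachLoss(torch.autograd.Function):
        """Return x unchanged, but make backward() also minimize aux_loss."""

        @staticmethod
        def forward(ctx, x, aux_loss):
            ctx.save_for_backward(aux_loss)
            return x

        @staticmethod
        def backward(ctx, grad_out):
            (aux_loss,) = ctx.saved_tensors
            # The main gradient passes through; the auxiliary scalar gets
            # gradient 1.0, so whatever produced it is pushed downhill too.
            return grad_out, torch.ones_like(aux_loss)

    x = torch.randn(4, 4, requires_grad=True)
    attn = torch.softmax(x, dim=-1)
    penalty = (attn.amax(dim=-1) - 0.9).clamp(min=0).sum()  # 0 if attn is soft
    attn = AttachLoss.apply(attn, penalty)
    attn.sum().backward()   # trains the main path and the penalty together
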
], batch size: 55, lr: 1.49e-03, grad_scale: 32.0 2023-11-26 22:25:10,857 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=3599346.6666666665, ans=0.1 2023-11-26 22:25:33,557 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=3599480.0, ans=0.0 2023-11-26 22:25:50,415 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=7.41 vs. limit=15.0 2023-11-26 22:25:52,283 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3599613.3333333335, ans=0.125 2023-11-26 22:25:57,013 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3599613.3333333335, ans=0.125 2023-11-26 22:25:57,961 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 539950 2023-11-26 22:26:00,075 WARNING [train_asr.py:1481] (3/4) Exclude cut with ID unbalanced/XMxq2pgttuY_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-26 22:26:02,156 INFO [train_asr.py:1235] (3/4) Epoch 45, batch 10900, loss[loss=0.07665, simple_loss=0.1128, pruned_loss=0.01425, audio_tagging_loss=0.006006, over 15377.00 frames. ], tot_loss[loss=0.06593, simple_loss=0.09034, pruned_loss=0.0123, audio_tagging_loss=0.008465, over 3042847.13 frames. ], batch size: 55, lr: 1.49e-03, grad_scale: 16.0 2023-11-26 22:26:08,925 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.22 vs. 
limit=6.0 2023-11-26 22:26:17,179 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3599746.6666666665, ans=0.125 2023-11-26 22:26:23,454 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.190e+01 9.085e+01 9.626e+01 1.024e+02 1.281e+02, threshold=1.925e+02, percent-clipped=0.0 2023-11-26 22:26:23,801 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3599813.3333333335, ans=0.125 2023-11-26 22:26:26,272 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3599813.3333333335, ans=0.125 2023-11-26 22:26:27,908 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=3599813.3333333335, ans=0.0 2023-11-26 22:26:41,678 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=3599880.0, ans=0.0 2023-11-26 22:26:48,089 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=3599946.6666666665, ans=0.125 2023-11-26 22:26:53,221 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 540000 2023-11-26 22:26:59,566 INFO [train_asr.py:1235] (3/4) Epoch 45, batch 10950, loss[loss=0.07628, simple_loss=0.1142, pruned_loss=0.0128, audio_tagging_loss=0.006368, over 15186.00 frames. ], tot_loss[loss=0.06564, simple_loss=0.08985, pruned_loss=0.01213, audio_tagging_loss=0.008582, over 3047094.92 frames. ], batch size: 60, lr: 1.49e-03, grad_scale: 16.0 2023-11-26 22:27:04,352 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=512, metric=18.29 vs. limit=22.5 2023-11-26 22:27:16,778 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=5.96 vs. limit=15.0 2023-11-26 22:27:18,117 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=3600080.0, ans=0.0 2023-11-26 22:27:41,094 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3600213.3333333335, ans=0.125 2023-11-26 22:27:42,139 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=3600213.3333333335, ans=0.0 2023-11-26 22:27:49,462 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=3600280.0, ans=0.0 2023-11-26 22:27:50,448 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 540050 2023-11-26 22:27:51,866 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=6.07 vs. limit=10.0 2023-11-26 22:27:55,671 INFO [train_asr.py:1235] (3/4) Epoch 45, batch 11000, loss[loss=0.06204, simple_loss=0.08998, pruned_loss=0.01006, audio_tagging_loss=0.006986, over 16196.00 frames. ], tot_loss[loss=0.06616, simple_loss=0.09072, pruned_loss=0.01219, audio_tagging_loss=0.008607, over 3053013.16 frames. ], batch size: 59, lr: 1.49e-03, grad_scale: 16.0 2023-11-26 22:28:00,324 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=7.52 vs. 
limit=10.0 2023-11-26 22:28:03,644 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=3600346.6666666665, ans=0.1 2023-11-26 22:28:07,281 WARNING [train_asr.py:1481] (3/4) Exclude cut with ID unbalanced/h6R5rMXN6pY_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-26 22:28:08,590 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=3600413.3333333335, ans=0.2 2023-11-26 22:28:17,885 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.702e+01 8.990e+01 9.480e+01 9.957e+01 3.729e+02, threshold=1.896e+02, percent-clipped=1.0 2023-11-26 22:28:21,428 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3600480.0, ans=0.125 2023-11-26 22:28:22,344 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=3600480.0, ans=0.0 2023-11-26 22:28:23,511 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=3600480.0, ans=0.1 2023-11-26 22:28:38,966 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=8.19 vs. limit=15.0 2023-11-26 22:28:47,271 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 540100 2023-11-26 22:28:49,098 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3600613.3333333335, ans=0.125 2023-11-26 22:28:52,034 INFO [train_asr.py:1235] (3/4) Epoch 45, batch 11050, loss[loss=0.0714, simple_loss=0.09868, pruned_loss=0.01158, audio_tagging_loss=0.01048, over 15834.00 frames. ], tot_loss[loss=0.06614, simple_loss=0.09055, pruned_loss=0.01217, audio_tagging_loss=0.008691, over 3056077.47 frames. ], batch size: 56, lr: 1.49e-03, grad_scale: 16.0 2023-11-26 22:29:12,274 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=3600813.3333333335, ans=0.2 2023-11-26 22:29:14,878 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=10.79 vs. limit=15.0 2023-11-26 22:29:39,455 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3600946.6666666665, ans=0.0 2023-11-26 22:29:42,420 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 540150 2023-11-26 22:29:46,543 INFO [train_asr.py:1235] (3/4) Epoch 45, batch 11100, loss[loss=0.06982, simple_loss=0.09876, pruned_loss=0.01099, audio_tagging_loss=0.009454, over 14846.00 frames. ], tot_loss[loss=0.06548, simple_loss=0.08945, pruned_loss=0.01195, audio_tagging_loss=0.008806, over 3050940.28 frames. 
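
The scaling.py:1022 Whitening entries compare an activation-covariance statistic against a per-module limit (metric=X vs. limit=Y); the Whiten modules only start penalizing a tensor once its channel covariance drifts far enough from isotropic. The metric below has the same flavor for the num_groups=1 case (1.0 for perfectly white features, large when a few directions dominate), but it is not guaranteed to be scaling.py's exact formula:

    import torch

    def whitening_metric(x: torch.Tensor) -> torch.Tensor:
        # x: (num_frames, num_channels), assumed roughly zero-mean
        cov = (x.T @ x) / x.shape[0]
        eigs = torch.linalg.eigvalsh(cov)
        return (eigs ** 2).mean() / eigs.mean() ** 2   # >= 1.0

    x = torch.randn(1000, 384)       # near-white input: metric close to 1.0
    print(whitening_metric(x))
    x[:, 0] *= 20.0                  # one dominant direction
    print(whitening_metric(x))       # large; would trip a limit like 22.5
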
], batch size: 56, lr: 1.49e-03, grad_scale: 16.0 2023-11-26 22:29:51,024 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3601013.3333333335, ans=0.125 2023-11-26 22:30:08,733 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.991e+01 9.032e+01 9.689e+01 1.034e+02 1.564e+02, threshold=1.938e+02, percent-clipped=0.0 2023-11-26 22:30:12,653 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-26 22:30:17,363 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3601146.6666666665, ans=0.1 2023-11-26 22:30:18,389 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=3601146.6666666665, ans=0.0 2023-11-26 22:30:23,216 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=8.16 vs. limit=15.0 2023-11-26 22:30:26,890 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=3601213.3333333335, ans=0.0 2023-11-26 22:30:37,334 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 540200 2023-11-26 22:30:42,447 INFO [train_asr.py:1235] (3/4) Epoch 45, batch 11150, loss[loss=0.06318, simple_loss=0.0873, pruned_loss=0.00954, audio_tagging_loss=0.009991, over 15981.00 frames. ], tot_loss[loss=0.06595, simple_loss=0.08984, pruned_loss=0.01209, audio_tagging_loss=0.008943, over 3056148.05 frames. ], batch size: 60, lr: 1.49e-03, grad_scale: 16.0 2023-11-26 22:30:52,753 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=3601413.3333333335, ans=0.07 2023-11-26 22:30:56,810 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.54 vs. limit=6.0 2023-11-26 22:31:01,306 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=3601413.3333333335, ans=0.1 2023-11-26 22:31:30,776 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3601613.3333333335, ans=0.0 2023-11-26 22:31:33,874 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 540250 2023-11-26 22:31:38,611 INFO [train_asr.py:1235] (3/4) Epoch 45, batch 11200, loss[loss=0.05732, simple_loss=0.08298, pruned_loss=0.007643, audio_tagging_loss=0.008185, over 16491.00 frames. ], tot_loss[loss=0.06553, simple_loss=0.08919, pruned_loss=0.01194, audio_tagging_loss=0.008999, over 3050554.41 frames. 
], batch size: 62, lr: 1.49e-03, grad_scale: 16.0 2023-11-26 22:32:01,319 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.000e+01 8.768e+01 9.515e+01 1.029e+02 1.320e+02, threshold=1.903e+02, percent-clipped=0.0 2023-11-26 22:32:09,166 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=3601813.3333333335, ans=0.125 2023-11-26 22:32:16,652 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3601880.0, ans=0.1 2023-11-26 22:32:28,283 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=3601946.6666666665, ans=0.125 2023-11-26 22:32:30,200 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 540300 2023-11-26 22:32:30,270 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3601946.6666666665, ans=0.1 2023-11-26 22:32:34,393 INFO [train_asr.py:1235] (3/4) Epoch 45, batch 11250, loss[loss=0.06117, simple_loss=0.07673, pruned_loss=0.01411, audio_tagging_loss=0.008693, over 14652.00 frames. ], tot_loss[loss=0.06543, simple_loss=0.08893, pruned_loss=0.01195, audio_tagging_loss=0.009015, over 3053542.67 frames. ], batch size: 55, lr: 1.49e-03, grad_scale: 16.0 2023-11-26 22:32:49,985 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3602080.0, ans=0.125 2023-11-26 22:33:00,166 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.max_positive, batch_count=3602146.6666666665, ans=0.95 2023-11-26 22:33:25,523 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 540350 2023-11-26 22:33:29,738 INFO [train_asr.py:1235] (3/4) Epoch 45, batch 11300, loss[loss=0.08464, simple_loss=0.1274, pruned_loss=0.01536, audio_tagging_loss=0.005599, over 15889.00 frames. ], tot_loss[loss=0.06569, simple_loss=0.08957, pruned_loss=0.01208, audio_tagging_loss=0.008827, over 3046209.27 frames. ], batch size: 56, lr: 1.49e-03, grad_scale: 16.0 2023-11-26 22:33:30,133 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=12.03 vs. limit=15.0 2023-11-26 22:33:43,172 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=8.86 vs. 
limit=15.0 2023-11-26 22:33:49,625 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=3602413.3333333335, ans=0.2 2023-11-26 22:33:49,631 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3602413.3333333335, ans=0.0 2023-11-26 22:33:54,061 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.703e+01 8.683e+01 9.336e+01 1.007e+02 1.340e+02, threshold=1.867e+02, percent-clipped=0.0 2023-11-26 22:34:07,051 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3602546.6666666665, ans=0.1 2023-11-26 22:34:09,293 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3602546.6666666665, ans=0.1 2023-11-26 22:34:21,810 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 540400 2023-11-26 22:34:21,964 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3602613.3333333335, ans=0.125 2023-11-26 22:34:25,429 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=3602680.0, ans=0.035 2023-11-26 22:34:26,382 INFO [train_asr.py:1235] (3/4) Epoch 45, batch 11350, loss[loss=0.06992, simple_loss=0.1077, pruned_loss=0.00894, audio_tagging_loss=0.007116, over 14928.00 frames. ], tot_loss[loss=0.06593, simple_loss=0.08983, pruned_loss=0.01224, audio_tagging_loss=0.008771, over 3043876.89 frames. ], batch size: 52, lr: 1.49e-03, grad_scale: 16.0 2023-11-26 22:34:44,597 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer_na.min_abs, batch_count=3602746.6666666665, ans=0.02 2023-11-26 22:34:48,789 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=3602813.3333333335, ans=0.0 2023-11-26 22:34:54,396 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=8.52 vs. limit=15.0 2023-11-26 22:35:01,560 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3602880.0, ans=0.1 2023-11-26 22:35:17,901 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 540450 2023-11-26 22:35:20,650 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=3602946.6666666665, ans=0.2 2023-11-26 22:35:22,601 INFO [train_asr.py:1235] (3/4) Epoch 45, batch 11400, loss[loss=0.07957, simple_loss=0.1191, pruned_loss=0.01499, audio_tagging_loss=0.005008, over 14957.00 frames. ], tot_loss[loss=0.06617, simple_loss=0.09064, pruned_loss=0.01232, audio_tagging_loss=0.008534, over 3043879.03 frames. 
], batch size: 55, lr: 1.49e-03, grad_scale: 16.0 2023-11-26 22:35:31,351 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.min_abs, batch_count=3603013.3333333335, ans=0.5 2023-11-26 22:35:43,151 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3603080.0, ans=0.1 2023-11-26 22:35:46,008 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.908e+01 8.997e+01 9.516e+01 1.035e+02 1.684e+02, threshold=1.903e+02, percent-clipped=0.0 2023-11-26 22:35:50,266 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=3603146.6666666665, ans=0.125 2023-11-26 22:35:53,067 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=3603146.6666666665, ans=0.2 2023-11-26 22:36:13,516 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 540500 2023-11-26 22:36:17,737 INFO [train_asr.py:1235] (3/4) Epoch 45, batch 11450, loss[loss=0.06876, simple_loss=0.09294, pruned_loss=0.0131, audio_tagging_loss=0.009192, over 14244.00 frames. ], tot_loss[loss=0.06644, simple_loss=0.09097, pruned_loss=0.01245, audio_tagging_loss=0.008501, over 3046919.71 frames. ], batch size: 53, lr: 1.49e-03, grad_scale: 16.0 2023-11-26 22:36:17,873 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=3603346.6666666665, ans=0.125 2023-11-26 22:36:24,742 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=7.40 vs. limit=12.0 2023-11-26 22:36:32,909 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.min_positive, batch_count=3603413.3333333335, ans=0.025 2023-11-26 22:36:42,924 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3603480.0, ans=0.125 2023-11-26 22:37:02,540 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-26 22:37:09,962 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 540550 2023-11-26 22:37:14,185 INFO [train_asr.py:1235] (3/4) Epoch 45, batch 11500, loss[loss=0.03903, simple_loss=0.05057, pruned_loss=0.006043, audio_tagging_loss=0.007701, over 16100.00 frames. ], tot_loss[loss=0.06606, simple_loss=0.09034, pruned_loss=0.01245, audio_tagging_loss=0.008441, over 3049235.02 frames. 
], batch size: 64, lr: 1.49e-03, grad_scale: 16.0 2023-11-26 22:37:15,486 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=3603680.0, ans=0.0 2023-11-26 22:37:25,342 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3603746.6666666665, ans=0.125 2023-11-26 22:37:36,519 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=3603813.3333333335, ans=0.0 2023-11-26 22:37:37,302 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.778e+01 8.979e+01 9.575e+01 1.038e+02 1.869e+02, threshold=1.915e+02, percent-clipped=0.0 2023-11-26 22:37:41,869 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=3603813.3333333335, ans=0.2 2023-11-26 22:38:02,372 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3603946.6666666665, ans=0.0 2023-11-26 22:38:02,497 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3603946.6666666665, ans=0.125 2023-11-26 22:38:05,438 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 540600 2023-11-26 22:38:09,897 INFO [train_asr.py:1235] (3/4) Epoch 45, batch 11550, loss[loss=0.1062, simple_loss=0.1428, pruned_loss=0.02753, audio_tagging_loss=0.007264, over 14888.00 frames. ], tot_loss[loss=0.06651, simple_loss=0.09091, pruned_loss=0.0126, audio_tagging_loss=0.008456, over 3050583.49 frames. ], batch size: 57, lr: 1.49e-03, grad_scale: 16.0 2023-11-26 22:38:19,179 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-11-26 22:38:29,671 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=9.31 vs. limit=15.0 2023-11-26 22:38:46,080 WARNING [train_asr.py:1481] (3/4) Exclude cut with ID unbalanced/NeYOsnhOi4k_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-26 22:38:48,371 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=3604213.3333333335, ans=0.125 2023-11-26 22:38:48,778 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=8.76 vs. limit=15.0 2023-11-26 22:38:53,123 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=8.47 vs. limit=15.0 2023-11-26 22:38:59,951 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=8.27 vs. 
limit=15.0 2023-11-26 22:39:01,625 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 540650 2023-11-26 22:39:05,789 INFO [train_asr.py:1235] (3/4) Epoch 45, batch 11600, loss[loss=0.05812, simple_loss=0.08524, pruned_loss=0.005536, audio_tagging_loss=0.00996, over 14326.00 frames. ], tot_loss[loss=0.06662, simple_loss=0.09111, pruned_loss=0.0126, audio_tagging_loss=0.008471, over 3050553.68 frames. ], batch size: 54, lr: 1.49e-03, grad_scale: 32.0 2023-11-26 22:39:13,910 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.min_abs, batch_count=3604346.6666666665, ans=0.5 2023-11-26 22:39:30,778 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.533e+01 8.938e+01 9.802e+01 1.035e+02 1.553e+02, threshold=1.960e+02, percent-clipped=0.0 2023-11-26 22:39:43,238 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=3604546.6666666665, ans=0.0 2023-11-26 22:39:50,787 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=3604613.3333333335, ans=0.0 2023-11-26 22:39:58,049 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 540700 2023-11-26 22:39:59,294 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3604613.3333333335, ans=0.1 2023-11-26 22:40:00,632 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=512, metric=2.74 vs. limit=15.0 2023-11-26 22:40:02,169 INFO [train_asr.py:1235] (3/4) Epoch 45, batch 11650, loss[loss=0.054, simple_loss=0.08672, pruned_loss=0.0048, audio_tagging_loss=0.005837, over 15109.00 frames. ], tot_loss[loss=0.06619, simple_loss=0.09032, pruned_loss=0.01245, audio_tagging_loss=0.008579, over 3051506.54 frames. ], batch size: 58, lr: 1.49e-03, grad_scale: 16.0 2023-11-26 22:40:14,558 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=3604746.6666666665, ans=0.2 2023-11-26 22:40:29,112 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=3604813.3333333335, ans=0.0 2023-11-26 22:40:48,341 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=3604946.6666666665, ans=0.0 2023-11-26 22:40:53,887 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 540750 2023-11-26 22:40:54,619 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.0.conv_module1.whiten, num_groups=1, num_channels=192, metric=12.79 vs. limit=15.0 2023-11-26 22:40:55,126 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3604946.6666666665, ans=0.1 2023-11-26 22:40:58,075 INFO [train_asr.py:1235] (3/4) Epoch 45, batch 11700, loss[loss=0.04527, simple_loss=0.06121, pruned_loss=0.006419, audio_tagging_loss=0.008246, over 16664.00 frames. ], tot_loss[loss=0.06558, simple_loss=0.08929, pruned_loss=0.01227, audio_tagging_loss=0.008663, over 3053286.57 frames. 
], batch size: 64, lr: 1.49e-03, grad_scale: 16.0 2023-11-26 22:41:10,015 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=3605080.0, ans=0.0 2023-11-26 22:41:19,604 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3605146.6666666665, ans=0.1 2023-11-26 22:41:22,841 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=7.60 vs. limit=10.0 2023-11-26 22:41:23,207 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.360e+01 8.888e+01 9.584e+01 1.031e+02 1.555e+02, threshold=1.917e+02, percent-clipped=0.0 2023-11-26 22:41:25,593 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=3605146.6666666665, ans=0.1 2023-11-26 22:41:37,835 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=3605213.3333333335, ans=0.2 2023-11-26 22:41:49,351 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 540800 2023-11-26 22:41:54,391 INFO [train_asr.py:1235] (3/4) Epoch 45, batch 11750, loss[loss=0.05845, simple_loss=0.07565, pruned_loss=0.009736, audio_tagging_loss=0.01089, over 14583.00 frames. ], tot_loss[loss=0.0651, simple_loss=0.08859, pruned_loss=0.01204, audio_tagging_loss=0.008766, over 3053474.85 frames. ], batch size: 56, lr: 1.49e-03, grad_scale: 16.0 2023-11-26 22:41:54,589 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=3605346.6666666665, ans=0.125 2023-11-26 22:42:23,631 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=3605480.0, ans=0.125 2023-11-26 22:42:24,644 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3605480.0, ans=0.125 2023-11-26 22:42:45,801 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 540850 2023-11-26 22:42:48,613 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=3605613.3333333335, ans=0.125 2023-11-26 22:42:50,423 INFO [train_asr.py:1235] (3/4) Epoch 45, batch 11800, loss[loss=0.065, simple_loss=0.09909, pruned_loss=0.008391, audio_tagging_loss=0.00706, over 16296.00 frames. ], tot_loss[loss=0.06513, simple_loss=0.08896, pruned_loss=0.01197, audio_tagging_loss=0.008684, over 3050844.10 frames. 
], batch size: 60, lr: 1.49e-03, grad_scale: 16.0 2023-11-26 22:42:54,819 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=3605680.0, ans=0.125 2023-11-26 22:42:59,514 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3605680.0, ans=0.1 2023-11-26 22:42:59,591 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3605680.0, ans=0.125 2023-11-26 22:43:04,973 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3605746.6666666665, ans=0.125 2023-11-26 22:43:14,288 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.320e+01 8.659e+01 9.498e+01 1.042e+02 1.310e+02, threshold=1.900e+02, percent-clipped=0.0 2023-11-26 22:43:35,502 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=9.35 vs. limit=15.0 2023-11-26 22:43:42,127 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 540900 2023-11-26 22:43:46,313 INFO [train_asr.py:1235] (3/4) Epoch 45, batch 11850, loss[loss=0.06382, simple_loss=0.07593, pruned_loss=0.01506, audio_tagging_loss=0.0108, over 14939.00 frames. ], tot_loss[loss=0.06509, simple_loss=0.08862, pruned_loss=0.01196, audio_tagging_loss=0.008818, over 3043642.54 frames. ], batch size: 56, lr: 1.49e-03, grad_scale: 16.0 2023-11-26 22:43:47,583 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3606013.3333333335, ans=0.125 2023-11-26 22:44:17,908 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=3606146.6666666665, ans=0.125 2023-11-26 22:44:18,058 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=3606146.6666666665, ans=0.2 2023-11-26 22:44:26,991 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=3606213.3333333335, ans=0.125 2023-11-26 22:44:29,188 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3606213.3333333335, ans=0.125 2023-11-26 22:44:36,749 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=14.93 vs. limit=22.5 2023-11-26 22:44:37,455 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 540950 2023-11-26 22:44:41,651 INFO [train_asr.py:1235] (3/4) Epoch 45, batch 11900, loss[loss=0.07324, simple_loss=0.09959, pruned_loss=0.0127, audio_tagging_loss=0.01075, over 15390.00 frames. ], tot_loss[loss=0.06542, simple_loss=0.08885, pruned_loss=0.01205, audio_tagging_loss=0.008945, over 3042701.25 frames. ], batch size: 57, lr: 1.49e-03, grad_scale: 16.0 2023-11-26 22:44:59,257 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3606413.3333333335, ans=0.1 2023-11-26 22:45:04,826 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=5.95 vs. 
limit=15.0 2023-11-26 22:45:07,012 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.637e+01 8.979e+01 9.678e+01 1.018e+02 1.926e+02, threshold=1.936e+02, percent-clipped=1.0 2023-11-26 22:45:12,728 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.min_positive, batch_count=3606480.0, ans=0.05 2023-11-26 22:45:21,088 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3606546.6666666665, ans=0.125 2023-11-26 22:45:29,778 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3606613.3333333335, ans=0.1 2023-11-26 22:45:33,275 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 541000 2023-11-26 22:45:33,508 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=3606613.3333333335, ans=0.125 2023-11-26 22:45:36,169 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=8.28 vs. limit=10.0 2023-11-26 22:45:36,713 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=3606680.0, ans=0.125 2023-11-26 22:45:38,254 INFO [train_asr.py:1235] (3/4) Epoch 45, batch 11950, loss[loss=0.05063, simple_loss=0.06408, pruned_loss=0.009827, audio_tagging_loss=0.008766, over 13744.00 frames. ], tot_loss[loss=0.06571, simple_loss=0.08953, pruned_loss=0.01204, audio_tagging_loss=0.008911, over 3044873.40 frames. ], batch size: 53, lr: 1.49e-03, grad_scale: 16.0 2023-11-26 22:45:40,059 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.0.conv_module2.whiten, num_groups=1, num_channels=192, metric=13.30 vs. limit=15.0 2023-11-26 22:45:45,786 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3606680.0, ans=0.125 2023-11-26 22:46:05,330 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=3606813.3333333335, ans=0.0 2023-11-26 22:46:09,577 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.max_abs, batch_count=3606880.0, ans=10.0 2023-11-26 22:46:14,213 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=14.42 vs. limit=15.0 2023-11-26 22:46:27,978 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 541050 2023-11-26 22:46:31,995 INFO [train_asr.py:1235] (3/4) Epoch 45, batch 12000, loss[loss=0.07214, simple_loss=0.09773, pruned_loss=0.01595, audio_tagging_loss=0.007332, over 15391.00 frames. ], tot_loss[loss=0.06592, simple_loss=0.08963, pruned_loss=0.01216, audio_tagging_loss=0.008943, over 3039413.75 frames. ], batch size: 56, lr: 1.49e-03, grad_scale: 32.0 2023-11-26 22:46:31,996 INFO [train_asr.py:1258] (3/4) Computing validation loss 2023-11-26 22:47:04,436 INFO [train_asr.py:1267] (3/4) Epoch 45, validation: loss=0.05747, simple_loss=0.05048, pruned_loss=0.005268, audio_tagging_loss=0.02696, over 4681554.00 frames. 
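For reference, the per-batch "loss=" figures in the records above decompose as a weighted sum of their components: the "Epoch 45, validation" record just above (loss=0.05747, simple_loss=0.05048, pruned_loss=0.005268, audio_tagging_loss=0.02696) is consistent with loss = 0.5 * simple_loss + pruned_loss + 1.0 * audio_tagging_loss. A minimal sketch assuming those scales follows; combine_losses is a hypothetical helper for illustration, not the actual train_asr.py code.

def combine_losses(simple_loss: float,
                   pruned_loss: float,
                   audio_tagging_loss: float,
                   simple_loss_scale: float = 0.5,
                   audio_tagging_loss_scale: float = 1.0) -> float:
    # Weighted total as it appears in the "loss=" field of each record.
    # The scale values are assumptions inferred from the logged numbers.
    return (simple_loss_scale * simple_loss
            + pruned_loss
            + audio_tagging_loss_scale * audio_tagging_loss)

# Reproduces the "Epoch 45, validation" record above to logging precision:
assert abs(combine_losses(0.05048, 0.005268, 0.02696) - 0.05747) < 1e-4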
2023-11-26 22:47:04,436 INFO [train_asr.py:1268] (3/4) Maximum memory allocated so far is 24894MB 2023-11-26 22:47:10,746 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=3607013.3333333335, ans=0.2 2023-11-26 22:47:11,900 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=3607013.3333333335, ans=0.0 2023-11-26 22:47:12,801 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=3607013.3333333335, ans=0.2 2023-11-26 22:47:18,230 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.45 vs. limit=6.0 2023-11-26 22:47:26,931 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.640e+01 9.102e+01 9.829e+01 1.057e+02 1.323e+02, threshold=1.966e+02, percent-clipped=0.0 2023-11-26 22:47:58,052 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=3607186.6666666665, ans=0.0 2023-11-26 22:47:58,257 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=8.91 vs. limit=15.0 2023-11-26 22:48:02,330 INFO [train_asr.py:1235] (3/4) Epoch 46, batch 0, loss[loss=0.08053, simple_loss=0.09232, pruned_loss=0.01364, audio_tagging_loss=0.02073, over 15318.00 frames. ], tot_loss[loss=0.08053, simple_loss=0.09232, pruned_loss=0.01364, audio_tagging_loss=0.02073, over 15318.00 frames. ], batch size: 57, lr: 1.48e-03, grad_scale: 32.0 2023-11-26 22:48:02,331 INFO [train_asr.py:1258] (3/4) Computing validation loss 2023-11-26 22:48:12,993 INFO [zipformer.py:1877] (3/4) name=encoder.encoders.0.layers.1.self_attn_weights, attn_weights_entropy = tensor([5.3349, 4.8221, 5.1970, 4.5673], device='cuda:3') 2023-11-26 22:48:33,842 INFO [train_asr.py:1267] (3/4) Epoch 46, validation: loss=0.05779, simple_loss=0.05056, pruned_loss=0.005325, audio_tagging_loss=0.02718, over 4681554.00 frames. 2023-11-26 22:48:33,843 INFO [train_asr.py:1268] (3/4) Maximum memory allocated so far is 24894MB 2023-11-26 22:48:36,136 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=3607186.6666666665, ans=0.0 2023-11-26 22:48:44,584 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=3607253.3333333335, ans=0.0 2023-11-26 22:48:51,309 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=3607253.3333333335, ans=0.0 2023-11-26 22:48:51,365 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=3607253.3333333335, ans=0.125 2023-11-26 22:48:55,509 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 541100 2023-11-26 22:49:05,444 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=3607386.6666666665, ans=0.0 2023-11-26 22:49:06,435 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-26 22:49:28,561 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=6.77 vs. 
limit=15.0 2023-11-26 22:49:28,984 INFO [train_asr.py:1235] (3/4) Epoch 46, batch 50, loss[loss=0.09759, simple_loss=0.1205, pruned_loss=0.0195, audio_tagging_loss=0.01782, over 15747.00 frames. ], tot_loss[loss=0.07518, simple_loss=0.09091, pruned_loss=0.01263, audio_tagging_loss=0.0171, over 695558.59 frames. ], batch size: 57, lr: 1.48e-03, grad_scale: 32.0 2023-11-26 22:49:31,157 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=512, metric=8.07 vs. limit=15.0 2023-11-26 22:49:50,758 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 541150 2023-11-26 22:49:52,013 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00 2023-11-26 22:49:54,580 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=3607653.3333333335, ans=0.125 2023-11-26 22:50:03,689 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=3607720.0, ans=0.0 2023-11-26 22:50:05,870 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3607720.0, ans=0.125 2023-11-26 22:50:20,083 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 8.136e+01 9.821e+01 1.049e+02 1.148e+02 1.594e+02, threshold=2.098e+02, percent-clipped=0.0 2023-11-26 22:50:24,342 INFO [train_asr.py:1235] (3/4) Epoch 46, batch 100, loss[loss=0.06414, simple_loss=0.08581, pruned_loss=0.01021, audio_tagging_loss=0.01103, over 15340.00 frames. ], tot_loss[loss=0.07312, simple_loss=0.08975, pruned_loss=0.01213, audio_tagging_loss=0.01612, over 1214773.00 frames. ], batch size: 57, lr: 1.48e-03, grad_scale: 32.0 2023-11-26 22:50:40,936 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=3607920.0, ans=0.07 2023-11-26 22:50:47,217 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 541200 2023-11-26 22:50:49,719 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3607986.6666666665, ans=0.125 2023-11-26 22:50:54,922 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=3607986.6666666665, ans=0.0 2023-11-26 22:50:58,767 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3608053.3333333335, ans=0.125 2023-11-26 22:51:15,818 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3608120.0, ans=0.125 2023-11-26 22:51:20,223 INFO [train_asr.py:1235] (3/4) Epoch 46, batch 150, loss[loss=0.06995, simple_loss=0.09659, pruned_loss=0.01306, audio_tagging_loss=0.008598, over 15281.00 frames. ], tot_loss[loss=0.07103, simple_loss=0.08893, pruned_loss=0.01204, audio_tagging_loss=0.01452, over 1622020.42 frames. ], batch size: 56, lr: 1.48e-03, grad_scale: 32.0 2023-11-26 22:51:26,952 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=3608186.6666666665, ans=0.125 2023-11-26 22:51:35,966 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.1.self_attn2.whiten, num_groups=1, num_channels=192, metric=8.85 vs. 
limit=22.5 2023-11-26 22:51:43,311 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 541250 2023-11-26 22:52:12,648 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 6.648e+01 9.369e+01 9.812e+01 1.037e+02 1.267e+02, threshold=1.962e+02, percent-clipped=0.0 2023-11-26 22:52:16,856 INFO [train_asr.py:1235] (3/4) Epoch 46, batch 200, loss[loss=0.07263, simple_loss=0.1015, pruned_loss=0.01243, audio_tagging_loss=0.009459, over 15299.00 frames. ], tot_loss[loss=0.07007, simple_loss=0.09006, pruned_loss=0.01217, audio_tagging_loss=0.01287, over 1942807.47 frames. ], batch size: 59, lr: 1.47e-03, grad_scale: 32.0 2023-11-26 22:52:30,351 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=3608586.6666666665, ans=0.2 2023-11-26 22:52:38,593 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 541300 2023-11-26 22:52:50,476 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3608720.0, ans=0.1 2023-11-26 22:52:54,103 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=3608720.0, ans=0.0 2023-11-26 22:52:59,505 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3608720.0, ans=0.125 2023-11-26 22:53:09,621 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=3608786.6666666665, ans=0.0 2023-11-26 22:53:10,773 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3608786.6666666665, ans=0.1 2023-11-26 22:53:12,694 INFO [train_asr.py:1235] (3/4) Epoch 46, batch 250, loss[loss=0.06116, simple_loss=0.08487, pruned_loss=0.009308, audio_tagging_loss=0.009422, over 15215.00 frames. ], tot_loss[loss=0.06919, simple_loss=0.09021, pruned_loss=0.01242, audio_tagging_loss=0.01166, over 2185252.74 frames. ], batch size: 56, lr: 1.47e-03, grad_scale: 32.0 2023-11-26 22:53:24,832 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=3608920.0, ans=0.125 2023-11-26 22:53:35,460 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 541350 2023-11-26 22:53:44,536 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3608986.6666666665, ans=0.1 2023-11-26 22:53:56,959 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=3609120.0, ans=0.125 2023-11-26 22:54:01,351 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3609120.0, ans=0.125 2023-11-26 22:54:05,261 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.789e+01 9.060e+01 9.556e+01 1.038e+02 1.375e+02, threshold=1.911e+02, percent-clipped=0.0 2023-11-26 22:54:09,070 INFO [train_asr.py:1235] (3/4) Epoch 46, batch 300, loss[loss=0.06121, simple_loss=0.08289, pruned_loss=0.01186, audio_tagging_loss=0.007909, over 15026.00 frames. ], tot_loss[loss=0.0684, simple_loss=0.09035, pruned_loss=0.01241, audio_tagging_loss=0.0108, over 2375079.94 frames. 
], batch size: 57, lr: 1.47e-03, grad_scale: 16.0 2023-11-26 22:54:32,032 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 541400 2023-11-26 22:54:36,774 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=3609320.0, ans=0.125 2023-11-26 22:54:36,830 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=3609320.0, ans=0.125 2023-11-26 22:54:52,992 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=3609453.3333333335, ans=0.125 2023-11-26 22:54:57,991 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=12.91 vs. limit=15.0 2023-11-26 22:55:04,954 INFO [train_asr.py:1235] (3/4) Epoch 46, batch 350, loss[loss=0.06013, simple_loss=0.09143, pruned_loss=0.006789, audio_tagging_loss=0.007627, over 15090.00 frames. ], tot_loss[loss=0.06733, simple_loss=0.08994, pruned_loss=0.01222, audio_tagging_loss=0.01014, over 2531019.09 frames. ], batch size: 56, lr: 1.47e-03, grad_scale: 16.0 2023-11-26 22:55:11,022 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=3609520.0, ans=0.0 2023-11-26 22:55:13,865 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3609520.0, ans=0.1 2023-11-26 22:55:18,071 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=3609586.6666666665, ans=0.125 2023-11-26 22:55:24,344 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=3609586.6666666665, ans=0.0 2023-11-26 22:55:27,481 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 541450 2023-11-26 22:55:28,654 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3609653.3333333335, ans=0.1 2023-11-26 22:55:57,956 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.483e+01 8.928e+01 9.545e+01 1.017e+02 1.635e+02, threshold=1.909e+02, percent-clipped=0.0 2023-11-26 22:56:01,277 INFO [train_asr.py:1235] (3/4) Epoch 46, batch 400, loss[loss=0.06508, simple_loss=0.09809, pruned_loss=0.01085, audio_tagging_loss=0.005191, over 15147.00 frames. ], tot_loss[loss=0.0667, simple_loss=0.08944, pruned_loss=0.01214, audio_tagging_loss=0.009834, over 2641615.21 frames. ], batch size: 58, lr: 1.47e-03, grad_scale: 32.0 2023-11-26 22:56:08,915 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3609853.3333333335, ans=0.0 2023-11-26 22:56:11,208 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=3609920.0, ans=0.125 2023-11-26 22:56:11,306 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=12.64 vs. 
limit=15.0 2023-11-26 22:56:13,329 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=3609920.0, ans=0.0 2023-11-26 22:56:20,703 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=3609920.0, ans=0.0 2023-11-26 22:56:23,744 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 541500 2023-11-26 22:56:31,196 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.min_positive, batch_count=3609986.6666666665, ans=0.05 2023-11-26 22:56:43,900 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=3610053.3333333335, ans=0.0 2023-11-26 22:56:43,986 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3610053.3333333335, ans=0.125 2023-11-26 22:56:56,571 INFO [train_asr.py:1235] (3/4) Epoch 46, batch 450, loss[loss=0.04875, simple_loss=0.06329, pruned_loss=0.004811, audio_tagging_loss=0.0123, over 16080.00 frames. ], tot_loss[loss=0.06655, simple_loss=0.08951, pruned_loss=0.01226, audio_tagging_loss=0.009536, over 2731884.56 frames. ], batch size: 63, lr: 1.47e-03, grad_scale: 32.0 2023-11-26 22:57:07,827 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=8.12 vs. limit=12.0 2023-11-26 22:57:18,532 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=3610320.0, ans=0.125 2023-11-26 22:57:20,019 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 541550 2023-11-26 22:57:20,189 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3610320.0, ans=0.125 2023-11-26 22:57:21,212 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3610320.0, ans=0.125 2023-11-26 22:57:49,768 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.752e+01 9.008e+01 9.621e+01 1.046e+02 1.513e+02, threshold=1.924e+02, percent-clipped=0.0 2023-11-26 22:57:53,099 INFO [train_asr.py:1235] (3/4) Epoch 46, batch 500, loss[loss=0.05856, simple_loss=0.0823, pruned_loss=0.008149, audio_tagging_loss=0.009259, over 15144.00 frames. ], tot_loss[loss=0.06639, simple_loss=0.08943, pruned_loss=0.01228, audio_tagging_loss=0.009399, over 2800153.11 frames. ], batch size: 56, lr: 1.47e-03, grad_scale: 32.0 2023-11-26 22:58:10,410 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.whiten, num_groups=1, num_channels=512, metric=9.72 vs. 
limit=12.0 2023-11-26 22:58:15,235 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 541600 2023-11-26 22:58:31,713 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer_ff3.min_abs, batch_count=3610720.0, ans=0.2 2023-11-26 22:58:42,230 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3610786.6666666665, ans=0.1 2023-11-26 22:58:48,828 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3610853.3333333335, ans=0.125 2023-11-26 22:58:49,531 INFO [train_asr.py:1235] (3/4) Epoch 46, batch 550, loss[loss=0.05284, simple_loss=0.0778, pruned_loss=0.007413, audio_tagging_loss=0.006525, over 16415.00 frames. ], tot_loss[loss=0.06551, simple_loss=0.08864, pruned_loss=0.01195, audio_tagging_loss=0.009239, over 2856865.53 frames. ], batch size: 61, lr: 1.47e-03, grad_scale: 16.0 2023-11-26 22:58:53,867 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3610853.3333333335, ans=0.125 2023-11-26 22:59:03,437 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=3610920.0, ans=0.07 2023-11-26 22:59:03,577 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3610920.0, ans=0.125 2023-11-26 22:59:07,100 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3610920.0, ans=0.0 2023-11-26 22:59:11,752 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 541650 2023-11-26 22:59:15,458 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=512, metric=19.89 vs. limit=22.5 2023-11-26 22:59:26,430 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3611053.3333333335, ans=0.125 2023-11-26 22:59:31,154 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=3611053.3333333335, ans=0.125 2023-11-26 22:59:42,592 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.191e+01 8.846e+01 9.307e+01 1.018e+02 1.266e+02, threshold=1.861e+02, percent-clipped=0.0 2023-11-26 22:59:44,776 INFO [train_asr.py:1235] (3/4) Epoch 46, batch 600, loss[loss=0.06318, simple_loss=0.08015, pruned_loss=0.01114, audio_tagging_loss=0.01196, over 16539.00 frames. ], tot_loss[loss=0.06585, simple_loss=0.08952, pruned_loss=0.01192, audio_tagging_loss=0.009177, over 2901921.97 frames. 
], batch size: 63, lr: 1.47e-03, grad_scale: 16.0 2023-11-26 22:59:47,603 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3611186.6666666665, ans=0.125 2023-11-26 22:59:49,707 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=3611186.6666666665, ans=0.0 2023-11-26 22:59:58,950 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=3611253.3333333335, ans=0.125 2023-11-26 23:00:01,420 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten.whitening_limit, batch_count=3611253.3333333335, ans=22.5 2023-11-26 23:00:03,071 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=3611253.3333333335, ans=0.0 2023-11-26 23:00:03,512 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=512, metric=21.78 vs. limit=22.5 2023-11-26 23:00:04,198 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=3611253.3333333335, ans=0.0 2023-11-26 23:00:07,242 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 541700 2023-11-26 23:00:13,483 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=10.24 vs. limit=15.0 2023-11-26 23:00:14,771 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3611320.0, ans=0.125 2023-11-26 23:00:15,163 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=512, metric=21.20 vs. limit=22.5 2023-11-26 23:00:18,090 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=3611386.6666666665, ans=0.125 2023-11-26 23:00:20,179 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-26 23:00:24,383 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=3611386.6666666665, ans=0.0 2023-11-26 23:00:32,281 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=3611453.3333333335, ans=0.125 2023-11-26 23:00:41,241 INFO [train_asr.py:1235] (3/4) Epoch 46, batch 650, loss[loss=0.05839, simple_loss=0.0794, pruned_loss=0.01014, audio_tagging_loss=0.008552, over 15934.00 frames. ], tot_loss[loss=0.0657, simple_loss=0.08936, pruned_loss=0.01187, audio_tagging_loss=0.009146, over 2936500.02 frames. ], batch size: 59, lr: 1.47e-03, grad_scale: 16.0 2023-11-26 23:00:44,012 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=10.60 vs. limit=15.0 2023-11-26 23:00:51,502 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=13.85 vs. 
limit=22.5 2023-11-26 23:00:53,797 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=3611586.6666666665, ans=0.0 2023-11-26 23:01:03,902 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 541750 2023-11-26 23:01:09,471 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.min_abs, batch_count=3611653.3333333335, ans=0.5 2023-11-26 23:01:32,919 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=3611786.6666666665, ans=0.125 2023-11-26 23:01:35,265 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.404e+01 8.767e+01 9.555e+01 1.054e+02 1.204e+02, threshold=1.911e+02, percent-clipped=0.0 2023-11-26 23:01:37,484 INFO [train_asr.py:1235] (3/4) Epoch 46, batch 700, loss[loss=0.05593, simple_loss=0.07151, pruned_loss=0.01033, audio_tagging_loss=0.009852, over 14891.00 frames. ], tot_loss[loss=0.0658, simple_loss=0.08957, pruned_loss=0.01198, audio_tagging_loss=0.009029, over 2955273.39 frames. ], batch size: 58, lr: 1.47e-03, grad_scale: 16.0 2023-11-26 23:01:46,751 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3611853.3333333335, ans=0.1 2023-11-26 23:01:59,830 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 541800 2023-11-26 23:02:33,560 INFO [train_asr.py:1235] (3/4) Epoch 46, batch 750, loss[loss=0.07109, simple_loss=0.09455, pruned_loss=0.0142, audio_tagging_loss=0.009611, over 14337.00 frames. ], tot_loss[loss=0.06577, simple_loss=0.08941, pruned_loss=0.01204, audio_tagging_loss=0.009029, over 2973583.43 frames. ], batch size: 54, lr: 1.47e-03, grad_scale: 16.0 2023-11-26 23:02:35,978 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=3612186.6666666665, ans=0.0 2023-11-26 23:02:46,607 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=3612253.3333333335, ans=0.125 2023-11-26 23:02:55,925 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 541850 2023-11-26 23:02:57,018 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3612320.0, ans=0.0 2023-11-26 23:03:00,869 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=3612320.0, ans=0.125 2023-11-26 23:03:14,143 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3612386.6666666665, ans=0.125 2023-11-26 23:03:16,244 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=3612386.6666666665, ans=0.07 2023-11-26 23:03:21,040 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=3612453.3333333335, ans=0.125 2023-11-26 23:03:24,532 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=11.38 vs. 
limit=15.0 2023-11-26 23:03:27,821 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 6.932e+01 9.015e+01 9.591e+01 1.028e+02 1.389e+02, threshold=1.918e+02, percent-clipped=0.0 2023-11-26 23:03:29,963 INFO [train_asr.py:1235] (3/4) Epoch 46, batch 800, loss[loss=0.08943, simple_loss=0.1305, pruned_loss=0.01692, audio_tagging_loss=0.007263, over 15185.00 frames. ], tot_loss[loss=0.06582, simple_loss=0.08955, pruned_loss=0.012, audio_tagging_loss=0.009046, over 2990459.58 frames. ], batch size: 57, lr: 1.47e-03, grad_scale: 32.0 2023-11-26 23:03:34,574 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=3612520.0, ans=0.04949747468305833 2023-11-26 23:03:44,710 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=3612586.6666666665, ans=0.0 2023-11-26 23:03:52,077 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 541900 2023-11-26 23:04:00,116 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=3612653.3333333335, ans=0.2 2023-11-26 23:04:01,135 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3612653.3333333335, ans=0.1 2023-11-26 23:04:05,049 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3612720.0, ans=0.125 2023-11-26 23:04:11,753 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3612720.0, ans=0.125 2023-11-26 23:04:13,807 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=3612786.6666666665, ans=0.0 2023-11-26 23:04:22,263 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3612786.6666666665, ans=0.125 2023-11-26 23:04:25,723 INFO [train_asr.py:1235] (3/4) Epoch 46, batch 850, loss[loss=0.05045, simple_loss=0.057, pruned_loss=0.00965, audio_tagging_loss=0.0123, over 14704.00 frames. ], tot_loss[loss=0.06577, simple_loss=0.08935, pruned_loss=0.01202, audio_tagging_loss=0.00908, over 2998858.19 frames. ], batch size: 56, lr: 1.47e-03, grad_scale: 32.0 2023-11-26 23:04:33,401 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=3612853.3333333335, ans=0.0 2023-11-26 23:04:48,355 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 541950 2023-11-26 23:05:19,280 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.559e+01 8.857e+01 9.463e+01 1.019e+02 1.516e+02, threshold=1.893e+02, percent-clipped=0.0 2023-11-26 23:05:21,978 INFO [train_asr.py:1235] (3/4) Epoch 46, batch 900, loss[loss=0.08104, simple_loss=0.1168, pruned_loss=0.01519, audio_tagging_loss=0.007434, over 16501.00 frames. ], tot_loss[loss=0.06604, simple_loss=0.08954, pruned_loss=0.01206, audio_tagging_loss=0.00921, over 3012448.81 frames. 
], batch size: 58, lr: 1.47e-03, grad_scale: 32.0 2023-11-26 23:05:25,514 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=3613186.6666666665, ans=0.0 2023-11-26 23:05:35,117 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3613253.3333333335, ans=0.1 2023-11-26 23:05:36,187 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=3613253.3333333335, ans=0.0 2023-11-26 23:05:43,942 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.out_whiten.whitening_limit, batch_count=3613320.0, ans=15.0 2023-11-26 23:05:44,498 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 542000 2023-11-26 23:05:45,644 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3613320.0, ans=0.125 2023-11-26 23:06:15,163 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=3613453.3333333335, ans=0.125 2023-11-26 23:06:18,771 INFO [train_asr.py:1235] (3/4) Epoch 46, batch 950, loss[loss=0.0591, simple_loss=0.07976, pruned_loss=0.01136, audio_tagging_loss=0.00786, over 14487.00 frames. ], tot_loss[loss=0.0662, simple_loss=0.08996, pruned_loss=0.01223, audio_tagging_loss=0.00899, over 3017273.22 frames. ], batch size: 57, lr: 1.47e-03, grad_scale: 16.0 2023-11-26 23:06:34,618 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-26 23:06:37,050 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=512, metric=19.43 vs. limit=22.5 2023-11-26 23:06:40,884 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 542050 2023-11-26 23:07:01,739 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3613720.0, ans=0.125 2023-11-26 23:07:13,290 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 6.799e+01 8.708e+01 9.328e+01 1.031e+02 1.282e+02, threshold=1.866e+02, percent-clipped=0.0 2023-11-26 23:07:14,455 INFO [train_asr.py:1235] (3/4) Epoch 46, batch 1000, loss[loss=0.06717, simple_loss=0.0862, pruned_loss=0.01545, audio_tagging_loss=0.008623, over 14632.00 frames. ], tot_loss[loss=0.06632, simple_loss=0.09042, pruned_loss=0.0123, audio_tagging_loss=0.008815, over 3021609.02 frames. ], batch size: 56, lr: 1.47e-03, grad_scale: 16.0 2023-11-26 23:07:16,855 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=3613853.3333333335, ans=0.0 2023-11-26 23:07:27,874 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=3613920.0, ans=0.125 2023-11-26 23:07:33,844 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=3613920.0, ans=0.125 2023-11-26 23:07:37,446 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 542100 2023-11-26 23:07:38,448 WARNING [train_asr.py:1481] (3/4) Exclude cut with ID unbalanced/5Y6u9AlD9S0_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. 
Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-26 23:07:47,717 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3614053.3333333335, ans=0.125 2023-11-26 23:07:49,134 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=10.79 vs. limit=15.0 2023-11-26 23:08:10,452 INFO [train_asr.py:1235] (3/4) Epoch 46, batch 1050, loss[loss=0.06798, simple_loss=0.09832, pruned_loss=0.01071, audio_tagging_loss=0.008107, over 16311.00 frames. ], tot_loss[loss=0.06585, simple_loss=0.09011, pruned_loss=0.01206, audio_tagging_loss=0.008733, over 3028706.56 frames. ], batch size: 60, lr: 1.47e-03, grad_scale: 16.0 2023-11-26 23:08:33,556 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 542150 2023-11-26 23:08:48,035 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=3614386.6666666665, ans=0.125 2023-11-26 23:08:53,397 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=3614386.6666666665, ans=0.07 2023-11-26 23:09:03,548 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=3614453.3333333335, ans=0.0 2023-11-26 23:09:05,473 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.666e+01 8.633e+01 9.310e+01 1.034e+02 1.583e+02, threshold=1.862e+02, percent-clipped=0.0 2023-11-26 23:09:06,535 INFO [train_asr.py:1235] (3/4) Epoch 46, batch 1100, loss[loss=0.06098, simple_loss=0.08176, pruned_loss=0.01304, audio_tagging_loss=0.007058, over 14256.00 frames. ], tot_loss[loss=0.06536, simple_loss=0.08926, pruned_loss=0.01202, audio_tagging_loss=0.00871, over 3033863.26 frames. ], batch size: 56, lr: 1.47e-03, grad_scale: 16.0 2023-11-26 23:09:09,284 WARNING [train_asr.py:1481] (3/4) Exclude cut with ID unbalanced/AWHnJAqurec_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. 
Number of tokens: 24 2023-11-26 23:09:29,089 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 542200 2023-11-26 23:09:34,707 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=3614653.3333333335, ans=0.0 2023-11-26 23:09:46,642 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-26 23:09:49,766 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=3614720.0, ans=0.04949747468305833 2023-11-26 23:09:51,459 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=3614786.6666666665, ans=0.2 2023-11-26 23:09:58,816 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-26 23:10:02,936 INFO [train_asr.py:1235] (3/4) Epoch 46, batch 1150, loss[loss=0.05887, simple_loss=0.08766, pruned_loss=0.008471, audio_tagging_loss=0.006568, over 14018.00 frames. ], tot_loss[loss=0.06532, simple_loss=0.0892, pruned_loss=0.01203, audio_tagging_loss=0.008692, over 3028009.00 frames. ], batch size: 54, lr: 1.47e-03, grad_scale: 16.0 2023-11-26 23:10:15,536 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=3614920.0, ans=0.125 2023-11-26 23:10:16,684 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3614920.0, ans=0.125 2023-11-26 23:10:17,588 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3614920.0, ans=0.125 2023-11-26 23:10:19,074 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=15.05 vs. limit=22.5 2023-11-26 23:10:25,019 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 542250 2023-11-26 23:10:46,131 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=3615053.3333333335, ans=0.0 2023-11-26 23:10:49,757 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=3615120.0, ans=0.125 2023-11-26 23:10:57,956 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.595e+01 8.828e+01 9.396e+01 9.982e+01 1.339e+02, threshold=1.879e+02, percent-clipped=0.0 2023-11-26 23:10:59,042 INFO [train_asr.py:1235] (3/4) Epoch 46, batch 1200, loss[loss=0.06, simple_loss=0.08019, pruned_loss=0.01187, audio_tagging_loss=0.008035, over 14305.00 frames. ], tot_loss[loss=0.06524, simple_loss=0.089, pruned_loss=0.0121, audio_tagging_loss=0.008637, over 3027401.71 frames. ], batch size: 56, lr: 1.47e-03, grad_scale: 32.0 2023-11-26 23:11:01,526 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=3615186.6666666665, ans=0.0 2023-11-26 23:11:05,214 INFO [scaling.py:1022] (3/4) Whitening: name=encoder_embed.out_whiten, num_groups=1, num_channels=192, metric=7.33 vs. 
limit=8.0 2023-11-26 23:11:21,986 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 542300 2023-11-26 23:11:37,518 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3615386.6666666665, ans=0.125 2023-11-26 23:11:54,895 INFO [train_asr.py:1235] (3/4) Epoch 46, batch 1250, loss[loss=0.07488, simple_loss=0.1053, pruned_loss=0.01501, audio_tagging_loss=0.007215, over 14839.00 frames. ], tot_loss[loss=0.06538, simple_loss=0.0893, pruned_loss=0.01213, audio_tagging_loss=0.008605, over 3031037.05 frames. ], batch size: 53, lr: 1.47e-03, grad_scale: 32.0 2023-11-26 23:12:17,456 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 542350 2023-11-26 23:12:48,136 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=3615786.6666666665, ans=0.2 2023-11-26 23:12:49,947 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.362e+01 8.892e+01 9.390e+01 1.002e+02 1.440e+02, threshold=1.878e+02, percent-clipped=0.0 2023-11-26 23:12:51,052 INFO [train_asr.py:1235] (3/4) Epoch 46, batch 1300, loss[loss=0.0491, simple_loss=0.06791, pruned_loss=0.007898, audio_tagging_loss=0.007253, over 13996.00 frames. ], tot_loss[loss=0.06493, simple_loss=0.08894, pruned_loss=0.0119, audio_tagging_loss=0.00856, over 3033115.45 frames. ], batch size: 55, lr: 1.47e-03, grad_scale: 32.0 2023-11-26 23:12:57,137 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=3615853.3333333335, ans=0.2 2023-11-26 23:13:04,668 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=3615920.0, ans=0.2 2023-11-26 23:13:12,893 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 542400 2023-11-26 23:13:15,508 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3615986.6666666665, ans=0.1 2023-11-26 23:13:20,286 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3615986.6666666665, ans=0.1 2023-11-26 23:13:21,428 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3615986.6666666665, ans=0.125 2023-11-26 23:13:37,320 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=3616120.0, ans=0.0 2023-11-26 23:13:47,132 INFO [train_asr.py:1235] (3/4) Epoch 46, batch 1350, loss[loss=0.05035, simple_loss=0.06034, pruned_loss=0.008191, audio_tagging_loss=0.01199, over 17218.00 frames. ], tot_loss[loss=0.0648, simple_loss=0.08853, pruned_loss=0.01195, audio_tagging_loss=0.008584, over 3031781.60 frames. ], batch size: 65, lr: 1.47e-03, grad_scale: 32.0 2023-11-26 23:13:57,572 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=3616253.3333333335, ans=0.125 2023-11-26 23:14:09,906 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 542450 2023-11-26 23:14:13,204 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=3616320.0, ans=0.0 2023-11-26 23:14:18,838 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=14.92 vs. 
limit=22.5 2023-11-26 23:14:26,377 WARNING [train_asr.py:1481] (3/4) Exclude cut with ID unbalanced/XdmbboqRBmQ_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-26 23:14:30,911 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=3616453.3333333335, ans=0.2 2023-11-26 23:14:34,215 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=12.72 vs. limit=15.0 2023-11-26 23:14:41,708 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.060e+01 9.028e+01 9.621e+01 1.020e+02 1.402e+02, threshold=1.924e+02, percent-clipped=0.0 2023-11-26 23:14:42,811 INFO [train_asr.py:1235] (3/4) Epoch 46, batch 1400, loss[loss=0.05482, simple_loss=0.0712, pruned_loss=0.01189, audio_tagging_loss=0.007333, over 14735.00 frames. ], tot_loss[loss=0.06507, simple_loss=0.08865, pruned_loss=0.01207, audio_tagging_loss=0.008678, over 3032862.78 frames. ], batch size: 60, lr: 1.47e-03, grad_scale: 32.0 2023-11-26 23:15:05,790 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 542500 2023-11-26 23:15:21,066 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=5.05 vs. limit=12.0 2023-11-26 23:15:32,685 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=3616786.6666666665, ans=0.125 2023-11-26 23:15:34,957 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3616786.6666666665, ans=0.125 2023-11-26 23:15:39,458 INFO [train_asr.py:1235] (3/4) Epoch 46, batch 1450, loss[loss=0.06708, simple_loss=0.09493, pruned_loss=0.0121, audio_tagging_loss=0.007519, over 15766.00 frames. ], tot_loss[loss=0.066, simple_loss=0.09009, pruned_loss=0.01223, audio_tagging_loss=0.008721, over 3041572.09 frames. 
], batch size: 59, lr: 1.47e-03, grad_scale: 32.0 2023-11-26 23:15:44,047 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=3616853.3333333335, ans=0.125 2023-11-26 23:15:46,078 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3616853.3333333335, ans=0.1 2023-11-26 23:15:48,772 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3616853.3333333335, ans=0.1 2023-11-26 23:15:49,842 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3616920.0, ans=0.1 2023-11-26 23:16:01,443 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 542550 2023-11-26 23:16:15,496 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=3617053.3333333335, ans=0.2 2023-11-26 23:16:17,661 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3617053.3333333335, ans=0.125 2023-11-26 23:16:30,537 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3617120.0, ans=0.125 2023-11-26 23:16:34,552 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 8.043e+01 9.029e+01 9.878e+01 1.085e+02 1.417e+02, threshold=1.976e+02, percent-clipped=0.0 2023-11-26 23:16:35,646 INFO [train_asr.py:1235] (3/4) Epoch 46, batch 1500, loss[loss=0.07793, simple_loss=0.1133, pruned_loss=0.01428, audio_tagging_loss=0.007022, over 14071.00 frames. ], tot_loss[loss=0.06614, simple_loss=0.09026, pruned_loss=0.01218, audio_tagging_loss=0.008833, over 3042186.77 frames. ], batch size: 54, lr: 1.47e-03, grad_scale: 32.0 2023-11-26 23:16:40,487 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=13.04 vs. limit=15.0 2023-11-26 23:16:41,703 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=10.10 vs. limit=15.0 2023-11-26 23:16:57,234 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3617320.0, ans=0.125 2023-11-26 23:16:58,175 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 542600 2023-11-26 23:17:30,327 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3617520.0, ans=0.1 2023-11-26 23:17:31,313 INFO [train_asr.py:1235] (3/4) Epoch 46, batch 1550, loss[loss=0.07105, simple_loss=0.09693, pruned_loss=0.01315, audio_tagging_loss=0.009439, over 16007.00 frames. ], tot_loss[loss=0.06561, simple_loss=0.08967, pruned_loss=0.01202, audio_tagging_loss=0.008761, over 3041244.07 frames. 
], batch size: 62, lr: 1.47e-03, grad_scale: 32.0 2023-11-26 23:17:39,903 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3617520.0, ans=0.125 2023-11-26 23:17:51,295 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.min_positive, batch_count=3617586.6666666665, ans=0.05 2023-11-26 23:17:54,307 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 542650 2023-11-26 23:18:03,474 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3617653.3333333335, ans=0.125 2023-11-26 23:18:19,129 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=9.85 vs. limit=15.0 2023-11-26 23:18:26,574 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.455e+01 9.195e+01 9.800e+01 1.042e+02 1.280e+02, threshold=1.960e+02, percent-clipped=0.0 2023-11-26 23:18:26,911 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=3617853.3333333335, ans=0.2 2023-11-26 23:18:27,655 INFO [train_asr.py:1235] (3/4) Epoch 46, batch 1600, loss[loss=0.06504, simple_loss=0.08599, pruned_loss=0.01196, audio_tagging_loss=0.01009, over 15067.00 frames. ], tot_loss[loss=0.06557, simple_loss=0.08946, pruned_loss=0.01199, audio_tagging_loss=0.008846, over 3049121.45 frames. ], batch size: 56, lr: 1.47e-03, grad_scale: 32.0 2023-11-26 23:18:32,544 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=9.02 vs. limit=15.0 2023-11-26 23:18:49,981 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 542700 2023-11-26 23:19:10,403 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3618053.3333333335, ans=0.125 2023-11-26 23:19:20,980 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=3618120.0, ans=0.2 2023-11-26 23:19:24,000 INFO [train_asr.py:1235] (3/4) Epoch 46, batch 1650, loss[loss=0.07423, simple_loss=0.1185, pruned_loss=0.009963, audio_tagging_loss=0.005032, over 15156.00 frames. ], tot_loss[loss=0.06562, simple_loss=0.08951, pruned_loss=0.01201, audio_tagging_loss=0.008853, over 3046038.33 frames. ], batch size: 54, lr: 1.47e-03, grad_scale: 32.0 2023-11-26 23:19:33,743 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3618253.3333333335, ans=0.1 2023-11-26 23:19:45,816 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 542750 2023-11-26 23:19:49,921 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=8.26 vs. limit=12.0 2023-11-26 23:19:56,029 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3618386.6666666665, ans=0.125 2023-11-26 23:20:03,886 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=17.35 vs. 
limit=22.5 2023-11-26 23:20:19,455 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.128e+01 8.999e+01 9.528e+01 1.009e+02 1.834e+02, threshold=1.906e+02, percent-clipped=0.0 2023-11-26 23:20:19,494 INFO [train_asr.py:1235] (3/4) Epoch 46, batch 1700, loss[loss=0.08993, simple_loss=0.1326, pruned_loss=0.01868, audio_tagging_loss=0.004924, over 16340.00 frames. ], tot_loss[loss=0.06539, simple_loss=0.08906, pruned_loss=0.01207, audio_tagging_loss=0.008789, over 3042487.76 frames. ], batch size: 58, lr: 1.47e-03, grad_scale: 16.0 2023-11-26 23:20:37,056 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=4.95 vs. limit=15.0 2023-11-26 23:20:41,991 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 542800 2023-11-26 23:20:55,653 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.min_abs, batch_count=3618720.0, ans=0.5 2023-11-26 23:21:15,667 INFO [train_asr.py:1235] (3/4) Epoch 46, batch 1750, loss[loss=0.04414, simple_loss=0.05316, pruned_loss=0.003985, audio_tagging_loss=0.01358, over 16525.00 frames. ], tot_loss[loss=0.06504, simple_loss=0.0887, pruned_loss=0.01193, audio_tagging_loss=0.008755, over 3048806.16 frames. ], batch size: 63, lr: 1.47e-03, grad_scale: 16.0 2023-11-26 23:21:17,204 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=3.28 vs. limit=15.0 2023-11-26 23:21:22,249 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3618853.3333333335, ans=0.125 2023-11-26 23:21:28,130 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=3618920.0, ans=0.125 2023-11-26 23:21:38,134 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 542850 2023-11-26 23:21:38,228 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3618986.6666666665, ans=0.125 2023-11-26 23:21:40,491 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-26 23:21:40,741 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=13.97 vs. limit=22.5 2023-11-26 23:21:58,553 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=3619053.3333333335, ans=0.125 2023-11-26 23:22:11,433 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.723e+01 8.875e+01 9.667e+01 1.011e+02 1.829e+02, threshold=1.933e+02, percent-clipped=0.0 2023-11-26 23:22:11,464 INFO [train_asr.py:1235] (3/4) Epoch 46, batch 1800, loss[loss=0.0474, simple_loss=0.06721, pruned_loss=0.005327, audio_tagging_loss=0.008464, over 14102.00 frames. ], tot_loss[loss=0.06524, simple_loss=0.0892, pruned_loss=0.01197, audio_tagging_loss=0.008673, over 3049001.34 frames. 
], batch size: 55, lr: 1.47e-03, grad_scale: 16.0 2023-11-26 23:22:15,508 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3619186.6666666665, ans=0.1 2023-11-26 23:22:28,402 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.min_positive, batch_count=3619253.3333333335, ans=0.025 2023-11-26 23:22:28,620 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=7.33 vs. limit=12.0 2023-11-26 23:22:29,393 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3619253.3333333335, ans=0.125 2023-11-26 23:22:33,123 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.max_abs, batch_count=3619320.0, ans=10.0 2023-11-26 23:22:34,112 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 542900 2023-11-26 23:22:36,315 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3619320.0, ans=0.1 2023-11-26 23:22:55,062 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=12.27 vs. limit=15.0 2023-11-26 23:23:06,164 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=12.22 vs. limit=22.5 2023-11-26 23:23:07,871 INFO [train_asr.py:1235] (3/4) Epoch 46, batch 1850, loss[loss=0.06206, simple_loss=0.07987, pruned_loss=0.01052, audio_tagging_loss=0.0116, over 16922.00 frames. ], tot_loss[loss=0.06505, simple_loss=0.08887, pruned_loss=0.01198, audio_tagging_loss=0.008639, over 3046944.59 frames. ], batch size: 64, lr: 1.47e-03, grad_scale: 16.0 2023-11-26 23:23:11,565 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=5.23 vs. limit=10.0 2023-11-26 23:23:20,790 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=3619586.6666666665, ans=0.125 2023-11-26 23:23:24,956 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=3619586.6666666665, ans=0.035 2023-11-26 23:23:30,134 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 542950 2023-11-26 23:24:00,688 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=3619786.6666666665, ans=0.125 2023-11-26 23:24:04,167 INFO [train_asr.py:1235] (3/4) Epoch 46, batch 1900, loss[loss=0.05061, simple_loss=0.06738, pruned_loss=0.008012, audio_tagging_loss=0.008909, over 14575.00 frames. ], tot_loss[loss=0.06511, simple_loss=0.0891, pruned_loss=0.01198, audio_tagging_loss=0.008574, over 3047672.01 frames. 
], batch size: 55, lr: 1.47e-03, grad_scale: 8.0 2023-11-26 23:24:05,242 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 8.061e+01 9.189e+01 9.752e+01 1.031e+02 1.213e+02, threshold=1.950e+02, percent-clipped=0.0 2023-11-26 23:24:21,125 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-26 23:24:23,335 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=3619920.0, ans=0.0 2023-11-26 23:24:26,790 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 543000 2023-11-26 23:24:43,551 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3620053.3333333335, ans=0.125 2023-11-26 23:24:54,799 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.max_abs, batch_count=3620120.0, ans=10.0 2023-11-26 23:24:59,809 INFO [train_asr.py:1235] (3/4) Epoch 46, batch 1950, loss[loss=0.06102, simple_loss=0.08338, pruned_loss=0.01044, audio_tagging_loss=0.008888, over 14893.00 frames. ], tot_loss[loss=0.06423, simple_loss=0.08769, pruned_loss=0.01184, audio_tagging_loss=0.008552, over 3047458.84 frames. ], batch size: 56, lr: 1.47e-03, grad_scale: 8.0 2023-11-26 23:25:00,116 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3620186.6666666665, ans=0.125 2023-11-26 23:25:22,796 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 543050 2023-11-26 23:25:33,402 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=3.70 vs. limit=6.0 2023-11-26 23:25:41,674 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3620386.6666666665, ans=0.1 2023-11-26 23:25:56,326 INFO [train_asr.py:1235] (3/4) Epoch 46, batch 2000, loss[loss=0.05122, simple_loss=0.06403, pruned_loss=0.008672, audio_tagging_loss=0.01054, over 14342.00 frames. ], tot_loss[loss=0.06418, simple_loss=0.08743, pruned_loss=0.0118, audio_tagging_loss=0.008667, over 3053140.38 frames. ], batch size: 54, lr: 1.47e-03, grad_scale: 16.0 2023-11-26 23:25:57,375 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.406e+01 8.817e+01 9.525e+01 1.016e+02 1.209e+02, threshold=1.905e+02, percent-clipped=0.0 2023-11-26 23:25:58,706 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=3620520.0, ans=0.125 2023-11-26 23:26:09,119 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=9.16 vs. limit=10.0 2023-11-26 23:26:10,471 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3620586.6666666665, ans=0.125 2023-11-26 23:26:18,545 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 543100 2023-11-26 23:26:18,652 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3620653.3333333335, ans=0.125 2023-11-26 23:26:34,672 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=7.54 vs. 
limit=15.0 2023-11-26 23:26:52,275 INFO [train_asr.py:1235] (3/4) Epoch 46, batch 2050, loss[loss=0.0614, simple_loss=0.08937, pruned_loss=0.01035, audio_tagging_loss=0.006364, over 15386.00 frames. ], tot_loss[loss=0.06428, simple_loss=0.08772, pruned_loss=0.01179, audio_tagging_loss=0.008638, over 3045512.30 frames. ], batch size: 58, lr: 1.47e-03, grad_scale: 16.0 2023-11-26 23:26:54,115 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3620853.3333333335, ans=0.125 2023-11-26 23:27:14,772 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 543150 2023-11-26 23:27:40,305 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=7.08 vs. limit=15.0 2023-11-26 23:27:45,081 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=3621120.0, ans=0.0 2023-11-26 23:27:48,127 INFO [train_asr.py:1235] (3/4) Epoch 46, batch 2100, loss[loss=0.05695, simple_loss=0.06613, pruned_loss=0.0119, audio_tagging_loss=0.01199, over 13451.00 frames. ], tot_loss[loss=0.06457, simple_loss=0.08829, pruned_loss=0.01185, audio_tagging_loss=0.008581, over 3048219.10 frames. ], batch size: 54, lr: 1.47e-03, grad_scale: 8.0 2023-11-26 23:27:50,227 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.517e+01 8.873e+01 9.430e+01 1.020e+02 1.802e+02, threshold=1.886e+02, percent-clipped=0.0 2023-11-26 23:28:10,387 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 543200 2023-11-26 23:28:13,589 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=3621320.0, ans=0.2 2023-11-26 23:28:17,875 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=3621320.0, ans=0.125 2023-11-26 23:28:20,547 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=3621320.0, ans=0.125 2023-11-26 23:28:24,028 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=12.10 vs. limit=22.5 2023-11-26 23:28:27,312 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=11.83 vs. limit=22.5 2023-11-26 23:28:31,347 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3621386.6666666665, ans=0.1 2023-11-26 23:28:40,575 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=3621453.3333333335, ans=0.0 2023-11-26 23:28:44,526 INFO [train_asr.py:1235] (3/4) Epoch 46, batch 2150, loss[loss=0.06659, simple_loss=0.08813, pruned_loss=0.01373, audio_tagging_loss=0.0088, over 15855.00 frames. ], tot_loss[loss=0.06532, simple_loss=0.08943, pruned_loss=0.01201, audio_tagging_loss=0.00859, over 3048775.82 frames. ], batch size: 58, lr: 1.47e-03, grad_scale: 8.0 2023-11-26 23:28:45,210 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=13.37 vs. 
limit=22.5 2023-11-26 23:29:07,518 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 543250 2023-11-26 23:29:08,847 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=3621653.3333333335, ans=0.07 2023-11-26 23:29:14,959 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=3621653.3333333335, ans=0.125 2023-11-26 23:29:17,488 WARNING [train_asr.py:1481] (3/4) Exclude cut with ID unbalanced/XkQ8YVd8u38_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-26 23:29:32,554 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3621786.6666666665, ans=0.125 2023-11-26 23:29:41,011 INFO [train_asr.py:1235] (3/4) Epoch 46, batch 2200, loss[loss=0.05728, simple_loss=0.07949, pruned_loss=0.007356, audio_tagging_loss=0.01018, over 15784.00 frames. ], tot_loss[loss=0.06548, simple_loss=0.08965, pruned_loss=0.01204, audio_tagging_loss=0.008611, over 3051730.44 frames. ], batch size: 59, lr: 1.47e-03, grad_scale: 8.0 2023-11-26 23:29:43,114 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.448e+01 8.935e+01 9.696e+01 1.032e+02 1.602e+02, threshold=1.939e+02, percent-clipped=0.0 2023-11-26 23:29:52,711 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=8.19 vs. limit=15.0 2023-11-26 23:30:03,480 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 543300 2023-11-26 23:30:32,762 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3622120.0, ans=0.125 2023-11-26 23:30:36,862 INFO [train_asr.py:1235] (3/4) Epoch 46, batch 2250, loss[loss=0.09511, simple_loss=0.1366, pruned_loss=0.02066, audio_tagging_loss=0.006146, over 16086.00 frames. ], tot_loss[loss=0.06593, simple_loss=0.09034, pruned_loss=0.01215, audio_tagging_loss=0.00861, over 3051383.40 frames. ], batch size: 57, lr: 1.47e-03, grad_scale: 8.0 2023-11-26 23:30:54,513 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.min_abs, batch_count=3622253.3333333335, ans=0.5 2023-11-26 23:30:56,685 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3622253.3333333335, ans=0.125 2023-11-26 23:30:58,756 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 543350 2023-11-26 23:31:08,916 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=3622386.6666666665, ans=0.0 2023-11-26 23:31:13,050 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.33 vs. limit=6.0 2023-11-26 23:31:25,245 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=9.72 vs. 
limit=15.0 2023-11-26 23:31:32,012 INFO [train_asr.py:1235] (3/4) Epoch 46, batch 2300, loss[loss=0.07035, simple_loss=0.09908, pruned_loss=0.01348, audio_tagging_loss=0.007335, over 15406.00 frames. ], tot_loss[loss=0.06618, simple_loss=0.09078, pruned_loss=0.01221, audio_tagging_loss=0.008586, over 3054517.78 frames. ], batch size: 58, lr: 1.47e-03, grad_scale: 8.0 2023-11-26 23:31:34,118 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.834e+01 8.796e+01 9.547e+01 1.006e+02 1.160e+02, threshold=1.909e+02, percent-clipped=0.0 2023-11-26 23:31:40,726 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3622520.0, ans=0.0 2023-11-26 23:31:43,939 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=3622586.6666666665, ans=0.2 2023-11-26 23:31:50,278 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=3622586.6666666665, ans=0.125 2023-11-26 23:31:52,986 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=3622586.6666666665, ans=0.2 2023-11-26 23:31:55,049 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 543400 2023-11-26 23:32:20,296 WARNING [train_asr.py:1481] (3/4) Exclude cut with ID unbalanced/mx9RcUz8sr0_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-26 23:32:21,564 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=3622786.6666666665, ans=0.0 2023-11-26 23:32:27,719 INFO [train_asr.py:1235] (3/4) Epoch 46, batch 2350, loss[loss=0.05578, simple_loss=0.07668, pruned_loss=0.009111, audio_tagging_loss=0.008332, over 14938.00 frames. ], tot_loss[loss=0.06609, simple_loss=0.09045, pruned_loss=0.01222, audio_tagging_loss=0.008642, over 3051127.54 frames. ], batch size: 59, lr: 1.47e-03, grad_scale: 8.0 2023-11-26 23:32:39,391 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3622920.0, ans=0.1 2023-11-26 23:32:44,740 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-26 23:32:51,415 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 543450 2023-11-26 23:33:17,418 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3623120.0, ans=0.0 2023-11-26 23:33:22,482 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=16.70 vs. limit=22.5 2023-11-26 23:33:25,282 INFO [train_asr.py:1235] (3/4) Epoch 46, batch 2400, loss[loss=0.06857, simple_loss=0.08932, pruned_loss=0.01293, audio_tagging_loss=0.01098, over 15335.00 frames. ], tot_loss[loss=0.06548, simple_loss=0.08928, pruned_loss=0.01198, audio_tagging_loss=0.008857, over 3044097.78 frames. 
], batch size: 57, lr: 1.47e-03, grad_scale: 16.0 2023-11-26 23:33:27,449 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.661e+01 8.979e+01 9.586e+01 1.037e+02 1.629e+02, threshold=1.917e+02, percent-clipped=0.0 2023-11-26 23:33:27,761 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-26 23:33:36,998 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=3623253.3333333335, ans=0.0 2023-11-26 23:33:43,281 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.min_positive, batch_count=3623253.3333333335, ans=0.025 2023-11-26 23:33:47,465 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 543500 2023-11-26 23:33:54,381 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=9.26 vs. limit=15.0 2023-11-26 23:34:21,516 INFO [train_asr.py:1235] (3/4) Epoch 46, batch 2450, loss[loss=0.08367, simple_loss=0.1106, pruned_loss=0.0167, audio_tagging_loss=0.01168, over 15025.00 frames. ], tot_loss[loss=0.06527, simple_loss=0.08909, pruned_loss=0.01188, audio_tagging_loss=0.00885, over 3047698.33 frames. ], batch size: 56, lr: 1.47e-03, grad_scale: 16.0 2023-11-26 23:34:43,432 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3623653.3333333335, ans=0.125 2023-11-26 23:34:44,223 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 543550 2023-11-26 23:34:44,455 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=3623653.3333333335, ans=0.125 2023-11-26 23:34:48,057 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3623653.3333333335, ans=0.1 2023-11-26 23:35:02,184 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=3623720.0, ans=0.2 2023-11-26 23:35:10,703 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=3623786.6666666665, ans=0.125 2023-11-26 23:35:13,854 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3623786.6666666665, ans=0.1 2023-11-26 23:35:14,221 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=19.16 vs. limit=22.5 2023-11-26 23:35:16,741 INFO [train_asr.py:1235] (3/4) Epoch 46, batch 2500, loss[loss=0.06262, simple_loss=0.08586, pruned_loss=0.01375, audio_tagging_loss=0.005942, over 15026.00 frames. ], tot_loss[loss=0.06543, simple_loss=0.08918, pruned_loss=0.01194, audio_tagging_loss=0.0089, over 3045343.99 frames. ], batch size: 57, lr: 1.47e-03, grad_scale: 16.0 2023-11-26 23:35:18,809 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 6.712e+01 8.886e+01 9.376e+01 1.002e+02 1.338e+02, threshold=1.875e+02, percent-clipped=0.0 2023-11-26 23:35:33,576 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=11.58 vs. 
limit=15.0 2023-11-26 23:35:40,206 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 543600 2023-11-26 23:35:43,383 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=8.70 vs. limit=15.0 2023-11-26 23:35:52,454 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=3624053.3333333335, ans=0.0 2023-11-26 23:36:14,158 INFO [train_asr.py:1235] (3/4) Epoch 46, batch 2550, loss[loss=0.05431, simple_loss=0.06856, pruned_loss=0.0103, audio_tagging_loss=0.009735, over 15130.00 frames. ], tot_loss[loss=0.06541, simple_loss=0.08893, pruned_loss=0.0121, audio_tagging_loss=0.008843, over 3047795.72 frames. ], batch size: 57, lr: 1.47e-03, grad_scale: 16.0 2023-11-26 23:36:23,587 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=3624186.6666666665, ans=0.2 2023-11-26 23:36:26,667 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=3624253.3333333335, ans=0.2 2023-11-26 23:36:33,041 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3624253.3333333335, ans=0.1 2023-11-26 23:36:36,161 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 543650 2023-11-26 23:36:44,473 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=3624320.0, ans=0.015 2023-11-26 23:36:55,774 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.45 vs. limit=6.0 2023-11-26 23:37:02,273 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=3624453.3333333335, ans=0.0 2023-11-26 23:37:04,696 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=7.77 vs. limit=15.0 2023-11-26 23:37:06,986 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=3624453.3333333335, ans=0.2 2023-11-26 23:37:09,906 INFO [train_asr.py:1235] (3/4) Epoch 46, batch 2600, loss[loss=0.04087, simple_loss=0.0543, pruned_loss=0.006546, audio_tagging_loss=0.007172, over 14392.00 frames. ], tot_loss[loss=0.06492, simple_loss=0.08849, pruned_loss=0.01203, audio_tagging_loss=0.008654, over 3045983.04 frames. 
], batch size: 55, lr: 1.47e-03, grad_scale: 16.0 2023-11-26 23:37:10,153 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=3624520.0, ans=0.0 2023-11-26 23:37:11,966 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.383e+01 8.743e+01 9.424e+01 1.014e+02 1.712e+02, threshold=1.885e+02, percent-clipped=0.0 2023-11-26 23:37:12,323 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=3624520.0, ans=0.2 2023-11-26 23:37:13,227 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3624520.0, ans=0.125 2023-11-26 23:37:16,516 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=3624520.0, ans=0.125 2023-11-26 23:37:18,560 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3624520.0, ans=0.0 2023-11-26 23:37:23,303 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=11.23 vs. limit=15.0 2023-11-26 23:37:24,281 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=512, metric=7.45 vs. limit=15.0 2023-11-26 23:37:31,211 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=7.28 vs. limit=10.0 2023-11-26 23:37:31,802 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 543700 2023-11-26 23:37:40,909 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=3624653.3333333335, ans=0.125 2023-11-26 23:38:02,218 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=3624786.6666666665, ans=0.0 2023-11-26 23:38:05,171 INFO [train_asr.py:1235] (3/4) Epoch 46, batch 2650, loss[loss=0.0536, simple_loss=0.0764, pruned_loss=0.007644, audio_tagging_loss=0.007753, over 13828.00 frames. ], tot_loss[loss=0.06506, simple_loss=0.08896, pruned_loss=0.01201, audio_tagging_loss=0.00857, over 3044435.25 frames. ], batch size: 54, lr: 1.47e-03, grad_scale: 16.0 2023-11-26 23:38:28,297 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 543750 2023-11-26 23:39:00,007 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3625120.0, ans=0.125 2023-11-26 23:39:01,795 INFO [train_asr.py:1235] (3/4) Epoch 46, batch 2700, loss[loss=0.06598, simple_loss=0.09611, pruned_loss=0.01043, audio_tagging_loss=0.007499, over 15227.00 frames. ], tot_loss[loss=0.06508, simple_loss=0.08895, pruned_loss=0.01199, audio_tagging_loss=0.008607, over 3040755.11 frames. 
], batch size: 56, lr: 1.47e-03, grad_scale: 16.0 2023-11-26 23:39:03,858 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.301e+01 8.924e+01 9.565e+01 1.006e+02 1.395e+02, threshold=1.913e+02, percent-clipped=0.0 2023-11-26 23:39:04,133 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3625186.6666666665, ans=0.0 2023-11-26 23:39:13,878 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.min_positive, batch_count=3625253.3333333335, ans=0.025 2023-11-26 23:39:24,296 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 543800 2023-11-26 23:39:53,940 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-11-26 23:39:57,948 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=7.49 vs. limit=15.0 2023-11-26 23:39:58,528 INFO [train_asr.py:1235] (3/4) Epoch 46, batch 2750, loss[loss=0.06283, simple_loss=0.08967, pruned_loss=0.01058, audio_tagging_loss=0.007417, over 15369.00 frames. ], tot_loss[loss=0.0651, simple_loss=0.08897, pruned_loss=0.01208, audio_tagging_loss=0.008533, over 3048583.44 frames. ], batch size: 59, lr: 1.47e-03, grad_scale: 16.0 2023-11-26 23:39:59,764 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=3625520.0, ans=0.0 2023-11-26 23:40:17,718 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=3625586.6666666665, ans=0.0 2023-11-26 23:40:20,212 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 543850 2023-11-26 23:40:33,717 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=3625720.0, ans=0.125 2023-11-26 23:40:43,913 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.min_positive, batch_count=3625786.6666666665, ans=0.05 2023-11-26 23:40:44,681 WARNING [train_asr.py:1481] (3/4) Exclude cut with ID unbalanced/IMdT8_tuNp0_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-26 23:40:49,089 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3625786.6666666665, ans=0.1 2023-11-26 23:40:51,232 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3625786.6666666665, ans=0.125 2023-11-26 23:40:52,241 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=3625853.3333333335, ans=0.125 2023-11-26 23:40:53,077 INFO [train_asr.py:1235] (3/4) Epoch 46, batch 2800, loss[loss=0.06337, simple_loss=0.09118, pruned_loss=0.009438, audio_tagging_loss=0.008344, over 14940.00 frames. ], tot_loss[loss=0.06534, simple_loss=0.08932, pruned_loss=0.01214, audio_tagging_loss=0.008542, over 3051234.73 frames. 
], batch size: 55, lr: 1.47e-03, grad_scale: 32.0 2023-11-26 23:40:53,336 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=3625853.3333333335, ans=0.125 2023-11-26 23:40:55,216 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.871e+01 8.947e+01 9.554e+01 1.028e+02 1.223e+02, threshold=1.911e+02, percent-clipped=0.0 2023-11-26 23:40:59,960 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=6.16 vs. limit=15.0 2023-11-26 23:41:06,142 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=3625920.0, ans=0.125 2023-11-26 23:41:15,463 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 543900 2023-11-26 23:41:15,994 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=17.36 vs. limit=22.5 2023-11-26 23:41:18,828 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3625986.6666666665, ans=0.125 2023-11-26 23:41:20,599 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=3625986.6666666665, ans=0.0 2023-11-26 23:41:27,402 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=3626053.3333333335, ans=0.0 2023-11-26 23:41:49,638 INFO [train_asr.py:1235] (3/4) Epoch 46, batch 2850, loss[loss=0.0673, simple_loss=0.08903, pruned_loss=0.01443, audio_tagging_loss=0.008351, over 14318.00 frames. ], tot_loss[loss=0.06566, simple_loss=0.08967, pruned_loss=0.01228, audio_tagging_loss=0.00855, over 3046697.62 frames. ], batch size: 54, lr: 1.47e-03, grad_scale: 32.0 2023-11-26 23:41:49,865 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=3626186.6666666665, ans=0.2 2023-11-26 23:41:49,926 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3626186.6666666665, ans=0.125 2023-11-26 23:41:54,086 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3626186.6666666665, ans=0.1 2023-11-26 23:42:08,214 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=3626253.3333333335, ans=0.2 2023-11-26 23:42:12,197 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 543950 2023-11-26 23:42:14,709 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.59 vs. 
limit=6.0 2023-11-26 23:42:17,631 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=3626320.0, ans=0.0 2023-11-26 23:42:18,727 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-26 23:42:27,561 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3626386.6666666665, ans=0.1 2023-11-26 23:42:45,070 INFO [train_asr.py:1235] (3/4) Epoch 46, batch 2900, loss[loss=0.06827, simple_loss=0.0921, pruned_loss=0.0118, audio_tagging_loss=0.01043, over 14622.00 frames. ], tot_loss[loss=0.06543, simple_loss=0.08932, pruned_loss=0.01217, audio_tagging_loss=0.008603, over 3044429.06 frames. ], batch size: 56, lr: 1.47e-03, grad_scale: 32.0 2023-11-26 23:42:47,743 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.048e+01 8.936e+01 9.597e+01 1.046e+02 1.381e+02, threshold=1.919e+02, percent-clipped=0.0 2023-11-26 23:43:02,453 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=3626586.6666666665, ans=0.2 2023-11-26 23:43:07,771 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 544000 2023-11-26 23:43:07,974 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=3626653.3333333335, ans=0.0 2023-11-26 23:43:13,097 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=3626653.3333333335, ans=0.125 2023-11-26 23:43:24,130 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=3626720.0, ans=0.2 2023-11-26 23:43:36,210 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=4.08 vs. limit=15.0 2023-11-26 23:43:41,155 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=3626786.6666666665, ans=0.035 2023-11-26 23:43:44,236 INFO [train_asr.py:1235] (3/4) Epoch 46, batch 2950, loss[loss=0.05205, simple_loss=0.06613, pruned_loss=0.007013, audio_tagging_loss=0.01198, over 14746.00 frames. ], tot_loss[loss=0.06584, simple_loss=0.08992, pruned_loss=0.01225, audio_tagging_loss=0.008633, over 3044581.11 frames. ], batch size: 60, lr: 1.47e-03, grad_scale: 32.0 2023-11-26 23:44:04,021 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=10.97 vs. 
limit=15.0 2023-11-26 23:44:06,774 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 544050 2023-11-26 23:44:12,200 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=3626986.6666666665, ans=0.07 2023-11-26 23:44:15,992 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=3626986.6666666665, ans=0.2 2023-11-26 23:44:31,505 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3627120.0, ans=0.1 2023-11-26 23:44:32,616 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=3627120.0, ans=0.1 2023-11-26 23:44:33,590 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=3627120.0, ans=0.125 2023-11-26 23:44:33,679 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=3627120.0, ans=0.09899494936611666 2023-11-26 23:44:39,419 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3627186.6666666665, ans=0.1 2023-11-26 23:44:40,274 INFO [train_asr.py:1235] (3/4) Epoch 46, batch 3000, loss[loss=0.0901, simple_loss=0.1234, pruned_loss=0.02217, audio_tagging_loss=0.006219, over 15619.00 frames. ], tot_loss[loss=0.06576, simple_loss=0.0896, pruned_loss=0.01226, audio_tagging_loss=0.008701, over 3052228.93 frames. ], batch size: 56, lr: 1.47e-03, grad_scale: 32.0 2023-11-26 23:44:40,275 INFO [train_asr.py:1258] (3/4) Computing validation loss 2023-11-26 23:45:12,593 INFO [train_asr.py:1267] (3/4) Epoch 46, validation: loss=0.0572, simple_loss=0.05043, pruned_loss=0.00523, audio_tagging_loss=0.02676, over 4681554.00 frames. 2023-11-26 23:45:12,593 INFO [train_asr.py:1268] (3/4) Maximum memory allocated so far is 24894MB 2023-11-26 23:45:15,215 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.354e+01 9.002e+01 9.589e+01 1.016e+02 1.351e+02, threshold=1.918e+02, percent-clipped=0.0 2023-11-26 23:45:17,518 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3627186.6666666665, ans=0.1 2023-11-26 23:45:19,673 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3627186.6666666665, ans=0.125 2023-11-26 23:45:23,241 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=512, metric=2.82 vs. 
limit=15.0 2023-11-26 23:45:35,148 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 544100 2023-11-26 23:45:35,298 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-26 23:45:39,047 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=3627320.0, ans=0.125 2023-11-26 23:45:46,460 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=3627386.6666666665, ans=0.125 2023-11-26 23:45:46,558 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3627386.6666666665, ans=0.125 2023-11-26 23:46:05,065 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.whiten.whitening_limit, batch_count=3627453.3333333335, ans=15.0 2023-11-26 23:46:06,077 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=5.83 vs. limit=15.0 2023-11-26 23:46:07,823 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=3627520.0, ans=0.0 2023-11-26 23:46:08,045 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten2.whitening_limit, batch_count=3627520.0, ans=15.0 2023-11-26 23:46:08,639 INFO [train_asr.py:1235] (3/4) Epoch 46, batch 3050, loss[loss=0.07936, simple_loss=0.1123, pruned_loss=0.01641, audio_tagging_loss=0.006803, over 16414.00 frames. ], tot_loss[loss=0.06591, simple_loss=0.08974, pruned_loss=0.01227, audio_tagging_loss=0.008771, over 3052385.99 frames. ], batch size: 59, lr: 1.47e-03, grad_scale: 32.0 2023-11-26 23:46:21,828 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=512, metric=18.41 vs. limit=22.5 2023-11-26 23:46:30,844 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 544150 2023-11-26 23:46:39,917 WARNING [train_asr.py:1481] (3/4) Exclude cut with ID unbalanced/h0neUGB6j_g_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-26 23:46:40,032 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=3627653.3333333335, ans=0.0 2023-11-26 23:47:03,539 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=3627853.3333333335, ans=0.0 2023-11-26 23:47:04,321 INFO [train_asr.py:1235] (3/4) Epoch 46, batch 3100, loss[loss=0.08367, simple_loss=0.1216, pruned_loss=0.01533, audio_tagging_loss=0.00755, over 15886.00 frames. ], tot_loss[loss=0.06598, simple_loss=0.08974, pruned_loss=0.0123, audio_tagging_loss=0.00881, over 3049056.43 frames. 
], batch size: 57, lr: 1.47e-03, grad_scale: 16.0 2023-11-26 23:47:08,080 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.471e+01 9.067e+01 9.651e+01 1.052e+02 1.316e+02, threshold=1.930e+02, percent-clipped=0.0 2023-11-26 23:47:20,834 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=9.32 vs. limit=15.0 2023-11-26 23:47:27,264 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 544200 2023-11-26 23:47:34,137 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=3627986.6666666665, ans=0.0 2023-11-26 23:47:54,543 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3628120.0, ans=0.125 2023-11-26 23:47:56,609 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=3628120.0, ans=0.125 2023-11-26 23:48:01,336 INFO [train_asr.py:1235] (3/4) Epoch 46, batch 3150, loss[loss=0.06074, simple_loss=0.08785, pruned_loss=0.007606, audio_tagging_loss=0.009213, over 15248.00 frames. ], tot_loss[loss=0.06631, simple_loss=0.09025, pruned_loss=0.01231, audio_tagging_loss=0.008871, over 3051331.64 frames. ], batch size: 57, lr: 1.47e-03, grad_scale: 16.0 2023-11-26 23:48:11,282 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.min_positive, batch_count=3628253.3333333335, ans=0.05 2023-11-26 23:48:17,139 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=3628253.3333333335, ans=0.0 2023-11-26 23:48:18,080 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3628253.3333333335, ans=0.0 2023-11-26 23:48:23,419 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 544250 2023-11-26 23:48:28,422 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=3628320.0, ans=0.125 2023-11-26 23:48:28,435 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3628320.0, ans=0.0 2023-11-26 23:48:31,188 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3628320.0, ans=0.1 2023-11-26 23:48:55,029 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=3628453.3333333335, ans=0.125 2023-11-26 23:48:57,462 INFO [train_asr.py:1235] (3/4) Epoch 46, batch 3200, loss[loss=0.06302, simple_loss=0.08945, pruned_loss=0.009867, audio_tagging_loss=0.008433, over 16929.00 frames. ], tot_loss[loss=0.06591, simple_loss=0.08956, pruned_loss=0.0122, audio_tagging_loss=0.008929, over 3050295.31 frames. ], batch size: 63, lr: 1.47e-03, grad_scale: 32.0 2023-11-26 23:49:00,187 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=4.93 vs. 
limit=6.0 2023-11-26 23:49:00,632 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.270e+01 8.824e+01 9.434e+01 1.022e+02 1.249e+02, threshold=1.887e+02, percent-clipped=0.0 2023-11-26 23:49:00,911 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3628520.0, ans=0.1 2023-11-26 23:49:04,112 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3628520.0, ans=0.1 2023-11-26 23:49:12,034 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3628586.6666666665, ans=0.125 2023-11-26 23:49:16,910 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=3628586.6666666665, ans=0.125 2023-11-26 23:49:19,856 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 544300 2023-11-26 23:49:20,468 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=12.19 vs. limit=15.0 2023-11-26 23:49:22,160 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=3628653.3333333335, ans=0.0 2023-11-26 23:49:42,564 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3628786.6666666665, ans=0.125 2023-11-26 23:49:49,224 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=9.91 vs. limit=15.0 2023-11-26 23:49:53,386 INFO [train_asr.py:1235] (3/4) Epoch 46, batch 3250, loss[loss=0.08209, simple_loss=0.116, pruned_loss=0.01583, audio_tagging_loss=0.008248, over 15164.00 frames. ], tot_loss[loss=0.06614, simple_loss=0.08991, pruned_loss=0.01226, audio_tagging_loss=0.008928, over 3048563.77 frames. ], batch size: 58, lr: 1.47e-03, grad_scale: 32.0 2023-11-26 23:50:06,785 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3628920.0, ans=0.125 2023-11-26 23:50:11,904 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_whiten.whitening_limit, batch_count=3628920.0, ans=15.0 2023-11-26 23:50:15,634 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 544350 2023-11-26 23:50:24,610 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=3628986.6666666665, ans=0.2 2023-11-26 23:50:26,844 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3629053.3333333335, ans=0.125 2023-11-26 23:50:29,926 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3629053.3333333335, ans=0.1 2023-11-26 23:50:39,598 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3629120.0, ans=0.125 2023-11-26 23:50:48,930 INFO [train_asr.py:1235] (3/4) Epoch 46, batch 3300, loss[loss=0.08659, simple_loss=0.1141, pruned_loss=0.01994, audio_tagging_loss=0.009582, over 15020.00 frames. ], tot_loss[loss=0.06647, simple_loss=0.09015, pruned_loss=0.01239, audio_tagging_loss=0.009003, over 3048389.43 frames. 
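Note on the optim.py:476 entries: the five "grad-norm quartiles" read as [min, 25%, median, 75%, max] of recent gradient norms, and in every entry in this excerpt the clipping threshold equals Clipping_scale times the median, e.g. 2.0 * 9.434e+01 = 1.887e+02 in the line above. That the threshold is derived exactly this way is an inference from the logged numbers rather than from the optimizer source:

# Minimal sketch: the relation the logged clipping entries satisfy.
clipping_scale = 2.0
quartiles = [7.270e+01, 8.824e+01, 9.434e+01, 1.022e+02, 1.249e+02]
threshold = clipping_scale * quartiles[2]   # 2.0 * median
print(f"threshold={threshold:.3e}")         # threshold=1.887e+02, as logged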
], batch size: 57, lr: 1.47e-03, grad_scale: 32.0 2023-11-26 23:50:52,764 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.794e+01 9.136e+01 9.828e+01 1.104e+02 1.362e+02, threshold=1.966e+02, percent-clipped=0.0 2023-11-26 23:50:56,343 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3629186.6666666665, ans=0.125 2023-11-26 23:51:03,655 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3629253.3333333335, ans=0.125 2023-11-26 23:51:08,383 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3629253.3333333335, ans=0.125 2023-11-26 23:51:11,464 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 544400 2023-11-26 23:51:12,577 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=3629320.0, ans=0.0 2023-11-26 23:51:21,405 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=3629320.0, ans=0.0 2023-11-26 23:51:35,797 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=3629453.3333333335, ans=0.0 2023-11-26 23:51:42,489 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=5.02 vs. limit=15.0 2023-11-26 23:51:45,138 INFO [train_asr.py:1235] (3/4) Epoch 46, batch 3350, loss[loss=0.05151, simple_loss=0.06712, pruned_loss=0.01, audio_tagging_loss=0.007941, over 15393.00 frames. ], tot_loss[loss=0.06597, simple_loss=0.08961, pruned_loss=0.01226, audio_tagging_loss=0.008909, over 3051368.20 frames. ], batch size: 58, lr: 1.47e-03, grad_scale: 16.0 2023-11-26 23:51:46,563 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=4.43 vs. limit=15.0 2023-11-26 23:52:07,888 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 544450 2023-11-26 23:52:40,872 INFO [train_asr.py:1235] (3/4) Epoch 46, batch 3400, loss[loss=0.07067, simple_loss=0.1009, pruned_loss=0.01131, audio_tagging_loss=0.008918, over 16001.00 frames. ], tot_loss[loss=0.06577, simple_loss=0.0896, pruned_loss=0.01217, audio_tagging_loss=0.008793, over 3051663.72 frames. ], batch size: 58, lr: 1.47e-03, grad_scale: 16.0 2023-11-26 23:52:45,602 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.259e+01 8.870e+01 9.488e+01 1.024e+02 1.498e+02, threshold=1.898e+02, percent-clipped=0.0 2023-11-26 23:52:50,972 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=6.65 vs. 
limit=10.0 2023-11-26 23:52:53,731 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=3629920.0, ans=0.2 2023-11-26 23:53:01,665 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3629920.0, ans=0.125 2023-11-26 23:53:03,585 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 544500 2023-11-26 23:53:06,835 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3629986.6666666665, ans=0.1 2023-11-26 23:53:06,935 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=3629986.6666666665, ans=0.125 2023-11-26 23:53:11,176 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=3629986.6666666665, ans=0.0 2023-11-26 23:53:23,598 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=3630053.3333333335, ans=0.2 2023-11-26 23:53:37,193 INFO [train_asr.py:1235] (3/4) Epoch 46, batch 3450, loss[loss=0.05675, simple_loss=0.07324, pruned_loss=0.01101, audio_tagging_loss=0.009119, over 14425.00 frames. ], tot_loss[loss=0.06585, simple_loss=0.09003, pruned_loss=0.01223, audio_tagging_loss=0.008603, over 3054807.18 frames. ], batch size: 56, lr: 1.47e-03, grad_scale: 16.0 2023-11-26 23:53:45,411 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-26 23:53:51,755 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=3630253.3333333335, ans=0.2 2023-11-26 23:53:55,840 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3630253.3333333335, ans=0.0 2023-11-26 23:53:58,893 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 544550 2023-11-26 23:54:08,349 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3630320.0, ans=0.1 2023-11-26 23:54:32,564 INFO [train_asr.py:1235] (3/4) Epoch 46, batch 3500, loss[loss=0.05622, simple_loss=0.08261, pruned_loss=0.007906, audio_tagging_loss=0.007011, over 15012.00 frames. ], tot_loss[loss=0.06637, simple_loss=0.09085, pruned_loss=0.01241, audio_tagging_loss=0.008532, over 3055231.51 frames. ], batch size: 56, lr: 1.47e-03, grad_scale: 16.0 2023-11-26 23:54:34,988 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3630520.0, ans=0.125 2023-11-26 23:54:36,791 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.154e+01 9.117e+01 9.795e+01 1.053e+02 1.409e+02, threshold=1.959e+02, percent-clipped=0.0 2023-11-26 23:54:41,255 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3630520.0, ans=0.1 2023-11-26 23:54:43,734 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=3.08 vs. 
limit=15.0 2023-11-26 23:54:49,823 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3630586.6666666665, ans=0.1 2023-11-26 23:54:55,517 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 544600 2023-11-26 23:55:00,251 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3630653.3333333335, ans=0.125 2023-11-26 23:55:01,004 WARNING [train_asr.py:1481] (3/4) Exclude cut with ID unbalanced/DdDpuDqOyrA_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-26 23:55:01,229 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=3630653.3333333335, ans=0.0 2023-11-26 23:55:13,546 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3630720.0, ans=0.1 2023-11-26 23:55:28,211 INFO [train_asr.py:1235] (3/4) Epoch 46, batch 3550, loss[loss=0.08258, simple_loss=0.1183, pruned_loss=0.01548, audio_tagging_loss=0.007935, over 16100.00 frames. ], tot_loss[loss=0.06603, simple_loss=0.09034, pruned_loss=0.01233, audio_tagging_loss=0.008525, over 3055712.15 frames. ], batch size: 57, lr: 1.47e-03, grad_scale: 16.0 2023-11-26 23:55:35,189 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=6.19 vs. limit=15.0 2023-11-26 23:55:36,497 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=3630853.3333333335, ans=0.0 2023-11-26 23:55:51,567 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 544650 2023-11-26 23:56:03,586 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=3631053.3333333335, ans=0.125 2023-11-26 23:56:05,862 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=4.70 vs. limit=10.0 2023-11-26 23:56:17,517 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3631120.0, ans=0.125 2023-11-26 23:56:25,421 INFO [train_asr.py:1235] (3/4) Epoch 46, batch 3600, loss[loss=0.05263, simple_loss=0.07092, pruned_loss=0.008158, audio_tagging_loss=0.009014, over 13685.00 frames. ], tot_loss[loss=0.06539, simple_loss=0.0894, pruned_loss=0.01213, audio_tagging_loss=0.008561, over 3053732.70 frames. 
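Note on the WARNING entries that exclude "Dummy text" cuts: each excluded 1-second AudioSet clip has 100 feature frames, 23 frames after subsampling, and 24 BPE tokens. A transducer alignment needs at least one encoder frame per output token (T >= U), so these cuts cannot be trained on and are dropped. The exact predicate is not printed in the log; a plausible sketch, with hypothetical names, of the filter the WARNINGs imply:

# Plausible sketch (hypothetical names) of the filter implied by the WARNINGs:
# keep a cut only if it has at least as many subsampled frames as tokens.
def keep_cut(frames_after_subsampling: int, num_tokens: int) -> bool:
    return frames_after_subsampling >= num_tokens

print(keep_cut(23, 24))   # False -> the dummy cut is excluded, as logged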
], batch size: 52, lr: 1.47e-03, grad_scale: 32.0 2023-11-26 23:56:29,612 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 6.853e+01 8.770e+01 9.299e+01 1.012e+02 1.507e+02, threshold=1.860e+02, percent-clipped=0.0 2023-11-26 23:56:47,216 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 544700 2023-11-26 23:56:50,574 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=3631320.0, ans=0.0 2023-11-26 23:56:50,651 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=3631320.0, ans=0.0 2023-11-26 23:56:58,717 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=3631386.6666666665, ans=0.0 2023-11-26 23:57:01,798 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=5.56 vs. limit=15.0 2023-11-26 23:57:20,897 INFO [train_asr.py:1235] (3/4) Epoch 46, batch 3650, loss[loss=0.0444, simple_loss=0.0574, pruned_loss=0.005115, audio_tagging_loss=0.01058, over 14414.00 frames. ], tot_loss[loss=0.06562, simple_loss=0.08984, pruned_loss=0.01216, audio_tagging_loss=0.008542, over 3051248.56 frames. ], batch size: 56, lr: 1.47e-03, grad_scale: 32.0 2023-11-26 23:57:25,324 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3631520.0, ans=0.125 2023-11-26 23:57:35,209 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=512, metric=3.90 vs. limit=15.0 2023-11-26 23:57:40,757 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=3631586.6666666665, ans=0.125 2023-11-26 23:57:43,362 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 544750 2023-11-26 23:58:16,358 INFO [train_asr.py:1235] (3/4) Epoch 46, batch 3700, loss[loss=0.06295, simple_loss=0.09062, pruned_loss=0.01074, audio_tagging_loss=0.006902, over 15079.00 frames. ], tot_loss[loss=0.06559, simple_loss=0.08992, pruned_loss=0.01224, audio_tagging_loss=0.008392, over 3047230.49 frames. 
], batch size: 54, lr: 1.47e-03, grad_scale: 32.0 2023-11-26 23:58:20,621 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.470e+01 8.914e+01 9.498e+01 1.020e+02 1.600e+02, threshold=1.900e+02, percent-clipped=0.0 2023-11-26 23:58:35,433 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3631920.0, ans=0.125 2023-11-26 23:58:37,447 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=3631920.0, ans=0.125 2023-11-26 23:58:40,015 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 544800 2023-11-26 23:58:52,418 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=3632053.3333333335, ans=0.125 2023-11-26 23:58:53,475 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=3632053.3333333335, ans=0.125 2023-11-26 23:58:55,450 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=3632053.3333333335, ans=0.0 2023-11-26 23:58:56,524 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=3632053.3333333335, ans=0.125 2023-11-26 23:59:13,825 INFO [train_asr.py:1235] (3/4) Epoch 46, batch 3750, loss[loss=0.09522, simple_loss=0.1367, pruned_loss=0.02122, audio_tagging_loss=0.005632, over 16019.00 frames. ], tot_loss[loss=0.06599, simple_loss=0.09043, pruned_loss=0.01232, audio_tagging_loss=0.008453, over 3053478.41 frames. ], batch size: 55, lr: 1.47e-03, grad_scale: 16.0 2023-11-26 23:59:25,373 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3632253.3333333335, ans=0.1 2023-11-26 23:59:35,718 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 544850 2023-11-26 23:59:37,912 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3632320.0, ans=0.1 2023-11-26 23:59:45,504 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=3632386.6666666665, ans=0.2 2023-11-26 23:59:48,207 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=3632386.6666666665, ans=0.2 2023-11-26 23:59:51,098 WARNING [train_asr.py:1481] (3/4) Exclude cut with ID unbalanced/ZY_Bsi-RNuk_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-27 00:00:09,623 INFO [train_asr.py:1235] (3/4) Epoch 46, batch 3800, loss[loss=0.06644, simple_loss=0.08396, pruned_loss=0.01285, audio_tagging_loss=0.01162, over 14638.00 frames. ], tot_loss[loss=0.06631, simple_loss=0.09068, pruned_loss=0.01243, audio_tagging_loss=0.00854, over 3047542.92 frames. 
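Note on the scaling.py:213 entries: each prints the current value (ans) of a named ScheduledFloat hyperparameter (dropout probabilities, skip rates, balancer probs, whitening limits, ...) at the module's running batch_count. To my understanding these values follow a piecewise-linear schedule over batch_count; the breakpoints below are invented purely for illustration:

# Sketch (assumption): a piecewise-linear schedule keyed on batch_count,
# in the spirit of the ScheduledFloat values printed above.
def scheduled_float(batch_count, points):
    # points: ascending [(batch_count, value), ...] breakpoints
    x0, y0 = points[0]
    if batch_count <= x0:
        return y0
    for x1, y1 in points[1:]:
        if batch_count <= x1:
            return y0 + (batch_count - x0) * (y1 - y0) / (x1 - x0)
        x0, y0 = x1, y1
    return y0  # clamp past the last breakpoint

# e.g. a dropout_p decaying 0.3 -> 0.1 over the first 20k batches:
print(scheduled_float(3631920.0, [(0.0, 0.3), (20000.0, 0.1)]))  # 0.1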
], batch size: 56, lr: 1.47e-03, grad_scale: 16.0 2023-11-27 00:00:14,900 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.742e+01 9.124e+01 9.737e+01 1.067e+02 1.479e+02, threshold=1.947e+02, percent-clipped=0.0 2023-11-27 00:00:31,615 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 544900 2023-11-27 00:00:47,202 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=3632720.0, ans=0.2 2023-11-27 00:00:59,775 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=3632786.6666666665, ans=0.125 2023-11-27 00:01:04,887 INFO [train_asr.py:1235] (3/4) Epoch 46, batch 3850, loss[loss=0.08222, simple_loss=0.1142, pruned_loss=0.01712, audio_tagging_loss=0.007989, over 14274.00 frames. ], tot_loss[loss=0.06625, simple_loss=0.09036, pruned_loss=0.01246, audio_tagging_loss=0.008605, over 3039660.70 frames. ], batch size: 55, lr: 1.47e-03, grad_scale: 16.0 2023-11-27 00:01:10,409 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.max_abs, batch_count=3632853.3333333335, ans=10.0 2023-11-27 00:01:22,925 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3632920.0, ans=0.125 2023-11-27 00:01:28,069 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 544950 2023-11-27 00:02:01,465 INFO [train_asr.py:1235] (3/4) Epoch 46, batch 3900, loss[loss=0.05946, simple_loss=0.06743, pruned_loss=0.01089, audio_tagging_loss=0.01486, over 16692.00 frames. ], tot_loss[loss=0.06597, simple_loss=0.08967, pruned_loss=0.01234, audio_tagging_loss=0.008797, over 3044738.19 frames. ], batch size: 64, lr: 1.47e-03, grad_scale: 16.0 2023-11-27 00:02:01,750 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3633186.6666666665, ans=0.125 2023-11-27 00:02:07,293 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.712e+01 8.766e+01 9.510e+01 1.042e+02 1.590e+02, threshold=1.902e+02, percent-clipped=0.0 2023-11-27 00:02:10,737 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3633186.6666666665, ans=0.125 2023-11-27 00:02:23,091 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3633320.0, ans=0.1 2023-11-27 00:02:23,949 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 545000 2023-11-27 00:02:58,143 INFO [train_asr.py:1235] (3/4) Epoch 46, batch 3950, loss[loss=0.06394, simple_loss=0.07776, pruned_loss=0.01269, audio_tagging_loss=0.01237, over 14361.00 frames. ], tot_loss[loss=0.06636, simple_loss=0.09011, pruned_loss=0.01237, audio_tagging_loss=0.008928, over 3045476.97 frames. ], batch size: 56, lr: 1.47e-03, grad_scale: 16.0 2023-11-27 00:02:59,612 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=6.89 vs. 
limit=12.0 2023-11-27 00:03:11,223 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=3633586.6666666665, ans=0.125 2023-11-27 00:03:12,223 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=3633586.6666666665, ans=0.2 2023-11-27 00:03:19,642 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 545050 2023-11-27 00:03:53,740 INFO [train_asr.py:1235] (3/4) Epoch 46, batch 4000, loss[loss=0.06758, simple_loss=0.08556, pruned_loss=0.01515, audio_tagging_loss=0.00965, over 14902.00 frames. ], tot_loss[loss=0.06637, simple_loss=0.09003, pruned_loss=0.01234, audio_tagging_loss=0.009012, over 3043437.29 frames. ], batch size: 58, lr: 1.47e-03, grad_scale: 32.0 2023-11-27 00:03:56,066 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=3633853.3333333335, ans=0.035 2023-11-27 00:03:59,123 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.786e+01 9.088e+01 9.544e+01 1.045e+02 1.311e+02, threshold=1.909e+02, percent-clipped=0.0 2023-11-27 00:04:14,137 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3633920.0, ans=0.125 2023-11-27 00:04:16,123 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 545100 2023-11-27 00:04:17,832 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=3633986.6666666665, ans=0.125 2023-11-27 00:04:21,485 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3633986.6666666665, ans=0.1 2023-11-27 00:04:37,390 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.min_abs, batch_count=3634120.0, ans=0.5 2023-11-27 00:04:49,505 INFO [train_asr.py:1235] (3/4) Epoch 46, batch 4050, loss[loss=0.05314, simple_loss=0.06979, pruned_loss=0.009297, audio_tagging_loss=0.008951, over 15795.00 frames. ], tot_loss[loss=0.06613, simple_loss=0.08987, pruned_loss=0.01218, audio_tagging_loss=0.009021, over 3043680.03 frames. ], batch size: 60, lr: 1.47e-03, grad_scale: 32.0 2023-11-27 00:04:52,315 WARNING [train_asr.py:1481] (3/4) Exclude cut with ID unbalanced/-7b0f9TyPFU_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-27 00:04:55,727 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3634186.6666666665, ans=0.125 2023-11-27 00:04:56,864 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=3634186.6666666665, ans=0.0 2023-11-27 00:05:12,255 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 545150 2023-11-27 00:05:14,753 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module2.whiten, num_groups=1, num_channels=512, metric=12.38 vs. 
limit=15.0 2023-11-27 00:05:26,218 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3634386.6666666665, ans=0.0 2023-11-27 00:05:34,639 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=3634453.3333333335, ans=0.0 2023-11-27 00:05:43,233 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=3634453.3333333335, ans=0.0 2023-11-27 00:05:46,169 INFO [train_asr.py:1235] (3/4) Epoch 46, batch 4100, loss[loss=0.06917, simple_loss=0.09808, pruned_loss=0.01086, audio_tagging_loss=0.00927, over 14702.00 frames. ], tot_loss[loss=0.0664, simple_loss=0.09035, pruned_loss=0.01225, audio_tagging_loss=0.008981, over 3042484.83 frames. ], batch size: 55, lr: 1.47e-03, grad_scale: 16.0 2023-11-27 00:05:48,646 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3634520.0, ans=0.1 2023-11-27 00:05:50,791 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=4.95 vs. limit=12.0 2023-11-27 00:05:52,466 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.629e+01 8.888e+01 9.665e+01 1.037e+02 1.522e+02, threshold=1.933e+02, percent-clipped=0.0 2023-11-27 00:05:57,004 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=3634586.6666666665, ans=0.125 2023-11-27 00:06:07,385 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 545200 2023-11-27 00:06:22,104 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=7.56 vs. limit=12.0 2023-11-27 00:06:41,832 INFO [train_asr.py:1235] (3/4) Epoch 46, batch 4150, loss[loss=0.05537, simple_loss=0.09011, pruned_loss=0.006316, audio_tagging_loss=0.003997, over 15074.00 frames. ], tot_loss[loss=0.06614, simple_loss=0.09029, pruned_loss=0.01224, audio_tagging_loss=0.008751, over 3044166.50 frames. ], batch size: 54, lr: 1.47e-03, grad_scale: 16.0 2023-11-27 00:06:46,254 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=3634853.3333333335, ans=0.125 2023-11-27 00:07:04,248 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 545250 2023-11-27 00:07:20,308 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-27 00:07:20,782 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=19.02 vs. limit=22.5 2023-11-27 00:07:22,302 WARNING [train_asr.py:1481] (3/4) Exclude cut with ID unbalanced/5BkClLNthIQ_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-27 00:07:37,621 INFO [train_asr.py:1235] (3/4) Epoch 46, batch 4200, loss[loss=0.05286, simple_loss=0.06624, pruned_loss=0.009852, audio_tagging_loss=0.009886, over 16312.00 frames. 
], tot_loss[loss=0.06578, simple_loss=0.09013, pruned_loss=0.01208, audio_tagging_loss=0.008635, over 3041495.69 frames. ], batch size: 62, lr: 1.47e-03, grad_scale: 16.0 2023-11-27 00:07:44,533 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.471e+01 9.031e+01 9.580e+01 1.007e+02 1.196e+02, threshold=1.916e+02, percent-clipped=0.0 2023-11-27 00:07:45,943 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=3635186.6666666665, ans=0.0 2023-11-27 00:07:46,951 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=3635186.6666666665, ans=0.125 2023-11-27 00:08:00,807 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 545300 2023-11-27 00:08:01,043 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3635320.0, ans=0.125 2023-11-27 00:08:08,858 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=14.44 vs. limit=22.5 2023-11-27 00:08:18,358 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=12.03 vs. limit=15.0 2023-11-27 00:08:23,579 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=3635453.3333333335, ans=0.2 2023-11-27 00:08:33,921 INFO [train_asr.py:1235] (3/4) Epoch 46, batch 4250, loss[loss=0.03213, simple_loss=0.03474, pruned_loss=0.004334, audio_tagging_loss=0.01043, over 14586.00 frames. ], tot_loss[loss=0.06529, simple_loss=0.08968, pruned_loss=0.01193, audio_tagging_loss=0.008521, over 3034889.19 frames. ], batch size: 57, lr: 1.47e-03, grad_scale: 16.0 2023-11-27 00:08:33,990 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=3635520.0, ans=0.125 2023-11-27 00:08:40,015 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3635520.0, ans=0.125 2023-11-27 00:08:45,866 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3635586.6666666665, ans=0.125 2023-11-27 00:08:52,166 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3635586.6666666665, ans=0.125 2023-11-27 00:08:56,311 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 545350 2023-11-27 00:08:57,672 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=13.24 vs. limit=15.0 2023-11-27 00:09:10,543 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=11.45 vs. limit=15.0 2023-11-27 00:09:23,451 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=3635786.6666666665, ans=0.0 2023-11-27 00:09:26,555 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3635786.6666666665, ans=0.125 2023-11-27 00:09:30,141 INFO [train_asr.py:1235] (3/4) Epoch 46, batch 4300, loss[loss=0.06336, simple_loss=0.08811, pruned_loss=0.01067, audio_tagging_loss=0.008635, over 15117.00 frames. 
], tot_loss[loss=0.06562, simple_loss=0.08985, pruned_loss=0.01209, audio_tagging_loss=0.008605, over 3036132.09 frames. ], batch size: 56, lr: 1.47e-03, grad_scale: 16.0 2023-11-27 00:09:30,430 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=3635853.3333333335, ans=0.125 2023-11-27 00:09:33,710 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3635853.3333333335, ans=0.125 2023-11-27 00:09:36,590 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.859e+01 9.001e+01 9.508e+01 1.030e+02 1.268e+02, threshold=1.902e+02, percent-clipped=0.0 2023-11-27 00:09:40,051 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=3635920.0, ans=0.07 2023-11-27 00:09:49,617 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=3635920.0, ans=0.125 2023-11-27 00:09:52,687 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 545400 2023-11-27 00:10:17,401 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=3636120.0, ans=0.0 2023-11-27 00:10:19,581 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=3636120.0, ans=0.0 2023-11-27 00:10:25,661 INFO [train_asr.py:1235] (3/4) Epoch 46, batch 4350, loss[loss=0.05416, simple_loss=0.06365, pruned_loss=0.01238, audio_tagging_loss=0.009961, over 13922.00 frames. ], tot_loss[loss=0.06578, simple_loss=0.09015, pruned_loss=0.01212, audio_tagging_loss=0.008578, over 3031962.17 frames. ], batch size: 54, lr: 1.47e-03, grad_scale: 8.0 2023-11-27 00:10:49,130 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 545450 2023-11-27 00:10:52,493 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3636320.0, ans=0.1 2023-11-27 00:10:53,910 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=4.45 vs. limit=6.0 2023-11-27 00:11:03,707 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3636386.6666666665, ans=0.125 2023-11-27 00:11:14,283 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3636453.3333333335, ans=0.125 2023-11-27 00:11:16,709 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=10.02 vs. limit=22.5 2023-11-27 00:11:18,610 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=11.59 vs. limit=15.0 2023-11-27 00:11:21,624 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.max_abs, batch_count=3636520.0, ans=10.0 2023-11-27 00:11:22,382 INFO [train_asr.py:1235] (3/4) Epoch 46, batch 4400, loss[loss=0.05759, simple_loss=0.08123, pruned_loss=0.009313, audio_tagging_loss=0.007661, over 14712.00 frames. ], tot_loss[loss=0.06633, simple_loss=0.09081, pruned_loss=0.01241, audio_tagging_loss=0.008508, over 3042338.58 frames. 
], batch size: 57, lr: 1.47e-03, grad_scale: 16.0 2023-11-27 00:11:30,499 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 6.966e+01 9.047e+01 9.734e+01 1.041e+02 1.241e+02, threshold=1.947e+02, percent-clipped=0.0 2023-11-27 00:11:45,032 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 545500 2023-11-27 00:11:45,267 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3636653.3333333335, ans=0.1 2023-11-27 00:11:54,512 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=3636653.3333333335, ans=0.125 2023-11-27 00:11:58,518 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.24 vs. limit=15.0 2023-11-27 00:12:12,945 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=6.86 vs. limit=15.0 2023-11-27 00:12:18,831 INFO [train_asr.py:1235] (3/4) Epoch 46, batch 4450, loss[loss=0.05225, simple_loss=0.07037, pruned_loss=0.0109, audio_tagging_loss=0.006162, over 14877.00 frames. ], tot_loss[loss=0.06613, simple_loss=0.09058, pruned_loss=0.0124, audio_tagging_loss=0.008441, over 3049668.27 frames. ], batch size: 59, lr: 1.47e-03, grad_scale: 16.0 2023-11-27 00:12:21,651 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=6.60 vs. limit=15.0 2023-11-27 00:12:27,001 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=3636853.3333333335, ans=0.0 2023-11-27 00:12:41,855 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 545550 2023-11-27 00:12:53,123 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=3637053.3333333335, ans=0.2 2023-11-27 00:12:54,127 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=3637053.3333333335, ans=0.2 2023-11-27 00:13:14,887 INFO [train_asr.py:1235] (3/4) Epoch 46, batch 4500, loss[loss=0.066, simple_loss=0.1051, pruned_loss=0.007584, audio_tagging_loss=0.00587, over 15602.00 frames. ], tot_loss[loss=0.06589, simple_loss=0.09032, pruned_loss=0.01226, audio_tagging_loss=0.008468, over 3043890.54 frames. ], batch size: 56, lr: 1.47e-03, grad_scale: 16.0 2023-11-27 00:13:23,384 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.527e+01 8.728e+01 9.573e+01 1.027e+02 1.215e+02, threshold=1.915e+02, percent-clipped=0.0 2023-11-27 00:13:34,290 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3637253.3333333335, ans=0.125 2023-11-27 00:13:37,798 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 545600 2023-11-27 00:13:39,007 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=3637320.0, ans=0.2 2023-11-27 00:13:39,329 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=21.55 vs. 
limit=22.5 2023-11-27 00:13:43,547 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=3637320.0, ans=0.2 2023-11-27 00:13:52,511 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=3637386.6666666665, ans=0.125 2023-11-27 00:14:11,576 INFO [train_asr.py:1235] (3/4) Epoch 46, batch 4550, loss[loss=0.08089, simple_loss=0.12, pruned_loss=0.01679, audio_tagging_loss=0.004107, over 15471.00 frames. ], tot_loss[loss=0.06569, simple_loss=0.08995, pruned_loss=0.01221, audio_tagging_loss=0.008506, over 3036144.88 frames. ], batch size: 55, lr: 1.47e-03, grad_scale: 16.0 2023-11-27 00:14:24,539 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=6.37 vs. limit=15.0 2023-11-27 00:14:33,598 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 545650 2023-11-27 00:14:46,744 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3637720.0, ans=0.1 2023-11-27 00:14:54,422 WARNING [train_asr.py:1481] (3/4) Exclude cut with ID unbalanced/_II2Klfnn4Y_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-27 00:14:55,799 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=3637786.6666666665, ans=0.125 2023-11-27 00:15:07,689 INFO [train_asr.py:1235] (3/4) Epoch 46, batch 4600, loss[loss=0.07166, simple_loss=0.1012, pruned_loss=0.01337, audio_tagging_loss=0.007682, over 17306.00 frames. ], tot_loss[loss=0.06562, simple_loss=0.08989, pruned_loss=0.01213, audio_tagging_loss=0.008541, over 3040836.26 frames. ], batch size: 65, lr: 1.47e-03, grad_scale: 16.0 2023-11-27 00:15:15,119 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.389e+01 8.975e+01 9.578e+01 1.039e+02 1.809e+02, threshold=1.916e+02, percent-clipped=0.0 2023-11-27 00:15:26,877 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=3637920.0, ans=0.125 2023-11-27 00:15:28,932 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3637986.6666666665, ans=0.1 2023-11-27 00:15:29,863 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 545700 2023-11-27 00:15:34,088 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=17.64 vs. limit=22.5 2023-11-27 00:15:44,091 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=12.01 vs. limit=22.5 2023-11-27 00:15:47,437 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=5.28 vs. limit=15.0 2023-11-27 00:15:48,501 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=7.63 vs. 
limit=15.0 2023-11-27 00:16:02,963 INFO [train_asr.py:1235] (3/4) Epoch 46, batch 4650, loss[loss=0.0737, simple_loss=0.09279, pruned_loss=0.01903, audio_tagging_loss=0.008272, over 14260.00 frames. ], tot_loss[loss=0.06521, simple_loss=0.08892, pruned_loss=0.01204, audio_tagging_loss=0.008713, over 3045626.06 frames. ], batch size: 54, lr: 1.47e-03, grad_scale: 16.0 2023-11-27 00:16:12,765 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=3638186.6666666665, ans=0.125 2023-11-27 00:16:15,340 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=3638253.3333333335, ans=0.035 2023-11-27 00:16:18,511 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-27 00:16:22,819 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3638253.3333333335, ans=0.125 2023-11-27 00:16:26,552 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 545750 2023-11-27 00:16:28,734 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=3638320.0, ans=0.07 2023-11-27 00:16:28,818 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=3638320.0, ans=0.0 2023-11-27 00:16:36,190 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=3638386.6666666665, ans=0.125 2023-11-27 00:16:54,275 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=3638453.3333333335, ans=0.125 2023-11-27 00:16:59,985 INFO [train_asr.py:1235] (3/4) Epoch 46, batch 4700, loss[loss=0.07606, simple_loss=0.1029, pruned_loss=0.0155, audio_tagging_loss=0.009112, over 15474.00 frames. ], tot_loss[loss=0.06591, simple_loss=0.09006, pruned_loss=0.01216, audio_tagging_loss=0.008713, over 3046983.23 frames. ], batch size: 57, lr: 1.47e-03, grad_scale: 16.0 2023-11-27 00:17:07,411 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.556e+01 9.156e+01 9.734e+01 1.046e+02 1.264e+02, threshold=1.947e+02, percent-clipped=0.0 2023-11-27 00:17:08,759 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3638520.0, ans=0.125 2023-11-27 00:17:10,407 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=3638586.6666666665, ans=0.125 2023-11-27 00:17:21,958 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 545800 2023-11-27 00:17:54,769 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=3638786.6666666665, ans=0.125 2023-11-27 00:17:55,825 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3638853.3333333335, ans=0.125 2023-11-27 00:17:56,608 INFO [train_asr.py:1235] (3/4) Epoch 46, batch 4750, loss[loss=0.06514, simple_loss=0.09235, pruned_loss=0.009823, audio_tagging_loss=0.009137, over 15266.00 frames. ], tot_loss[loss=0.06564, simple_loss=0.08984, pruned_loss=0.012, audio_tagging_loss=0.008717, over 3046771.32 frames. 
], batch size: 55, lr: 1.47e-03, grad_scale: 16.0 2023-11-27 00:18:01,199 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3638853.3333333335, ans=0.125 2023-11-27 00:18:15,052 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=3638920.0, ans=0.0 2023-11-27 00:18:16,109 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=3638920.0, ans=0.2 2023-11-27 00:18:17,705 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=3638986.6666666665, ans=0.0 2023-11-27 00:18:18,629 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 545850 2023-11-27 00:18:26,687 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=3638986.6666666665, ans=0.0 2023-11-27 00:18:28,908 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=3639053.3333333335, ans=0.125 2023-11-27 00:18:41,516 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=7.58 vs. limit=15.0 2023-11-27 00:18:51,445 INFO [train_asr.py:1235] (3/4) Epoch 46, batch 4800, loss[loss=0.07067, simple_loss=0.09947, pruned_loss=0.01299, audio_tagging_loss=0.00795, over 16864.00 frames. ], tot_loss[loss=0.06564, simple_loss=0.08975, pruned_loss=0.01194, audio_tagging_loss=0.008826, over 3043940.52 frames. ], batch size: 59, lr: 1.47e-03, grad_scale: 32.0 2023-11-27 00:18:59,447 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.631e+01 8.803e+01 9.667e+01 1.040e+02 1.360e+02, threshold=1.933e+02, percent-clipped=0.0 2023-11-27 00:19:02,496 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=3639253.3333333335, ans=0.2 2023-11-27 00:19:12,650 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=3639253.3333333335, ans=0.2 2023-11-27 00:19:13,580 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3639320.0, ans=0.125 2023-11-27 00:19:14,554 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 545900 2023-11-27 00:19:37,305 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=19.74 vs. limit=22.5 2023-11-27 00:19:42,791 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3639453.3333333335, ans=0.1 2023-11-27 00:19:43,814 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-27 00:19:48,942 INFO [train_asr.py:1235] (3/4) Epoch 46, batch 4850, loss[loss=0.06819, simple_loss=0.09776, pruned_loss=0.00949, audio_tagging_loss=0.00982, over 14755.00 frames. ], tot_loss[loss=0.06577, simple_loss=0.09003, pruned_loss=0.01191, audio_tagging_loss=0.00885, over 3049249.11 frames. ], batch size: 56, lr: 1.47e-03, grad_scale: 32.0 2023-11-27 00:19:51,792 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=1.77 vs. 
limit=6.0 2023-11-27 00:19:52,427 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=3639520.0, ans=0.125 2023-11-27 00:19:56,020 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=7.44 vs. limit=15.0 2023-11-27 00:20:08,963 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=3639586.6666666665, ans=0.0 2023-11-27 00:20:10,920 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 545950 2023-11-27 00:20:14,265 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=3639653.3333333335, ans=0.0 2023-11-27 00:20:39,341 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=3639786.6666666665, ans=0.125 2023-11-27 00:20:44,072 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=3639853.3333333335, ans=0.125 2023-11-27 00:20:44,965 INFO [train_asr.py:1235] (3/4) Epoch 46, batch 4900, loss[loss=0.05739, simple_loss=0.0837, pruned_loss=0.007859, audio_tagging_loss=0.007677, over 15859.00 frames. ], tot_loss[loss=0.06569, simple_loss=0.08972, pruned_loss=0.01203, audio_tagging_loss=0.008808, over 3049525.10 frames. ], batch size: 61, lr: 1.47e-03, grad_scale: 32.0 2023-11-27 00:20:52,383 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.349e+01 8.929e+01 9.407e+01 1.023e+02 1.723e+02, threshold=1.881e+02, percent-clipped=0.0 2023-11-27 00:21:00,714 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=9.76 vs. limit=15.0 2023-11-27 00:21:06,442 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 546000 2023-11-27 00:21:06,561 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3639986.6666666665, ans=0.1 2023-11-27 00:21:13,283 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=3639986.6666666665, ans=0.0 2023-11-27 00:21:26,580 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=3640053.3333333335, ans=0.0 2023-11-27 00:21:33,456 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=6.21 vs. limit=10.0 2023-11-27 00:21:34,101 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=3640120.0, ans=0.0 2023-11-27 00:21:40,256 INFO [train_asr.py:1235] (3/4) Epoch 46, batch 4950, loss[loss=0.0572, simple_loss=0.08204, pruned_loss=0.008743, audio_tagging_loss=0.007438, over 17066.00 frames. ], tot_loss[loss=0.06516, simple_loss=0.08917, pruned_loss=0.01195, audio_tagging_loss=0.008628, over 3042789.80 frames. 
], batch size: 64, lr: 1.47e-03, grad_scale: 32.0 2023-11-27 00:21:44,704 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=3640186.6666666665, ans=0.2 2023-11-27 00:21:46,683 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=3640186.6666666665, ans=0.2 2023-11-27 00:21:50,410 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=3640253.3333333335, ans=0.125 2023-11-27 00:21:53,564 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=3640253.3333333335, ans=0.125 2023-11-27 00:21:57,400 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=3640253.3333333335, ans=0.0 2023-11-27 00:21:59,905 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=3640253.3333333335, ans=0.125 2023-11-27 00:22:02,999 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 546050 2023-11-27 00:22:04,215 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=3640320.0, ans=0.0 2023-11-27 00:22:16,433 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3640386.6666666665, ans=0.1 2023-11-27 00:22:20,825 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=3640386.6666666665, ans=0.0 2023-11-27 00:22:22,788 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3640386.6666666665, ans=0.1 2023-11-27 00:22:26,183 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=3640453.3333333335, ans=0.2 2023-11-27 00:22:35,936 INFO [train_asr.py:1235] (3/4) Epoch 46, batch 5000, loss[loss=0.09064, simple_loss=0.1225, pruned_loss=0.02096, audio_tagging_loss=0.008423, over 15549.00 frames. ], tot_loss[loss=0.06525, simple_loss=0.08938, pruned_loss=0.01199, audio_tagging_loss=0.008568, over 3036633.01 frames. ], batch size: 56, lr: 1.47e-03, grad_scale: 32.0 2023-11-27 00:22:37,218 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=3640520.0, ans=0.125 2023-11-27 00:22:42,417 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=3640520.0, ans=0.125 2023-11-27 00:22:44,481 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.730e+01 8.925e+01 9.606e+01 1.023e+02 1.240e+02, threshold=1.921e+02, percent-clipped=0.0 2023-11-27 00:22:46,298 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.40 vs. limit=6.0 2023-11-27 00:22:50,482 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=18.74 vs. limit=22.5 2023-11-27 00:22:53,471 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.0.whiten, num_groups=1, num_channels=192, metric=4.22 vs. 
limit=12.0 2023-11-27 00:22:56,073 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=3640586.6666666665, ans=0.0 2023-11-27 00:22:59,178 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 546100 2023-11-27 00:23:13,038 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=3640720.0, ans=0.2 2023-11-27 00:23:32,443 INFO [train_asr.py:1235] (3/4) Epoch 46, batch 5050, loss[loss=0.06071, simple_loss=0.09012, pruned_loss=0.008666, audio_tagging_loss=0.006982, over 15864.00 frames. ], tot_loss[loss=0.06457, simple_loss=0.0884, pruned_loss=0.01184, audio_tagging_loss=0.008529, over 3038822.56 frames. ], batch size: 57, lr: 1.47e-03, grad_scale: 32.0 2023-11-27 00:23:38,400 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-27 00:23:44,891 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-11-27 00:23:53,428 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=3640986.6666666665, ans=0.0 2023-11-27 00:23:54,340 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 546150 2023-11-27 00:23:54,550 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=3640986.6666666665, ans=0.0 2023-11-27 00:24:01,398 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3640986.6666666665, ans=0.1 2023-11-27 00:24:11,527 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=12.22 vs. limit=15.0 2023-11-27 00:24:12,184 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=3641053.3333333335, ans=0.0 2023-11-27 00:24:19,128 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.min_abs, batch_count=3641120.0, ans=0.5 2023-11-27 00:24:22,405 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer_ff2.min_abs, batch_count=3641120.0, ans=0.1 2023-11-27 00:24:24,581 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=3641120.0, ans=0.0 2023-11-27 00:24:28,499 INFO [train_asr.py:1235] (3/4) Epoch 46, batch 5100, loss[loss=0.0595, simple_loss=0.08179, pruned_loss=0.009724, audio_tagging_loss=0.008878, over 16187.00 frames. ], tot_loss[loss=0.06411, simple_loss=0.08767, pruned_loss=0.01178, audio_tagging_loss=0.008496, over 3035639.37 frames. 
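The [optim.py:476] lines report the distribution of recent gradient norms (min, 25%, median, 75%, max) and the clipping threshold derived from it; the logged values fit threshold = Clipping_scale * median, e.g. 2.0 * 9.606e+01 = 1.921e+02 in the entry at 00:22:44 above. A sketch of that policy under those assumptions; the real icefall optimizer works per parameter group and differs in detail:

```python
# Sketch of the clipping policy suggested by the [optim.py:476] lines:
# keep a window of recent gradient norms, log their quartiles, and clip to
# clipping_scale times the running median. Illustrative only.
from collections import deque

import torch

class MedianGradClipper:
    def __init__(self, clipping_scale=2.0, window=1000):
        self.clipping_scale = clipping_scale
        self.norms = deque(maxlen=window)
        self.num_steps = 0
        self.num_clipped = 0

    def step(self, parameters):
        params = [p for p in parameters if p.grad is not None]
        norm = torch.norm(torch.stack([p.grad.norm() for p in params])).item()
        self.norms.append(norm)
        q = torch.quantile(torch.tensor(list(self.norms)),
                           torch.tensor([0.0, 0.25, 0.5, 0.75, 1.0]))
        threshold = self.clipping_scale * q[2].item()  # scale * median
        self.num_steps += 1
        if norm > threshold:
            self.num_clipped += 1
            for p in params:
                p.grad.mul_(threshold / norm)
        # quartiles, threshold, percent-clipped, as in the log lines
        return q.tolist(), threshold, 100.0 * self.num_clipped / self.num_steps
```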
], batch size: 62, lr: 1.47e-03, grad_scale: 32.0 2023-11-27 00:24:33,068 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3641186.6666666665, ans=0.1 2023-11-27 00:24:35,998 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.517e+01 8.921e+01 9.596e+01 1.036e+02 1.225e+02, threshold=1.919e+02, percent-clipped=0.0 2023-11-27 00:24:45,266 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=3641253.3333333335, ans=0.0 2023-11-27 00:24:51,074 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 546200 2023-11-27 00:24:52,603 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=3.88 vs. limit=6.0 2023-11-27 00:25:01,045 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3641320.0, ans=0.125 2023-11-27 00:25:10,709 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=3641386.6666666665, ans=0.125 2023-11-27 00:25:24,921 INFO [train_asr.py:1235] (3/4) Epoch 46, batch 5150, loss[loss=0.06036, simple_loss=0.08598, pruned_loss=0.01024, audio_tagging_loss=0.007128, over 16656.00 frames. ], tot_loss[loss=0.06447, simple_loss=0.08814, pruned_loss=0.01186, audio_tagging_loss=0.008542, over 3040536.07 frames. ], batch size: 65, lr: 1.47e-03, grad_scale: 32.0 2023-11-27 00:25:30,987 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3641520.0, ans=0.125 2023-11-27 00:25:39,045 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=3641586.6666666665, ans=0.0 2023-11-27 00:25:44,234 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.whiten, num_groups=1, num_channels=512, metric=8.91 vs. limit=12.0 2023-11-27 00:25:47,970 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 546250 2023-11-27 00:26:13,647 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.min_abs, batch_count=3641786.6666666665, ans=0.5 2023-11-27 00:26:20,888 INFO [train_asr.py:1235] (3/4) Epoch 46, batch 5200, loss[loss=0.07313, simple_loss=0.1059, pruned_loss=0.01488, audio_tagging_loss=0.005316, over 14956.00 frames. ], tot_loss[loss=0.0657, simple_loss=0.09, pruned_loss=0.01226, audio_tagging_loss=0.008444, over 3038897.29 frames. 
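The many [scaling.py:213] lines are ScheduledFloat hyperparameters: per-module dropout probabilities, skip rates and balancer settings that are functions of batch_count rather than constants, with `ans` being the value currently in effect. A piecewise-linear schedule in that spirit (the breakpoints below are invented for illustration; the real class in icefall's scaling.py carries more machinery):

```python
# Illustrative piecewise-linear schedule in the spirit of the ScheduledFloat
# values logged by [scaling.py:213]. The breakpoints below are invented.
class PiecewiseLinearSchedule:
    def __init__(self, *points):
        # points: (batch_count, value) pairs
        self.points = sorted(points)

    def __call__(self, batch_count: float) -> float:
        pts = self.points
        if batch_count <= pts[0][0]:
            return pts[0][1]
        if batch_count >= pts[-1][0]:
            return pts[-1][1]
        for (x0, y0), (x1, y1) in zip(pts, pts[1:]):
            if x0 <= batch_count <= x1:
                return y0 + (batch_count - x0) / (x1 - x0) * (y1 - y0)

# A dropout that decays from 0.3 to 0.1 over the first 20k batches and then
# stays flat would log ans=0.1 at batch_count=3639986.67, like the lines above.
dropout_p = PiecewiseLinearSchedule((0.0, 0.3), (20000.0, 0.1))
assert dropout_p(3639986.67) == 0.1
```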
], batch size: 55, lr: 1.47e-03, grad_scale: 32.0 2023-11-27 00:26:28,452 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3641853.3333333335, ans=0.125 2023-11-27 00:26:29,257 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.652e+01 9.022e+01 9.726e+01 1.018e+02 1.270e+02, threshold=1.945e+02, percent-clipped=0.0 2023-11-27 00:26:30,556 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=3641853.3333333335, ans=0.125 2023-11-27 00:26:34,831 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=3641920.0, ans=0.09899494936611666 2023-11-27 00:26:43,157 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 546300 2023-11-27 00:26:52,534 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=10.47 vs. limit=15.0 2023-11-27 00:26:58,110 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=3642053.3333333335, ans=0.1 2023-11-27 00:27:15,718 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3642186.6666666665, ans=0.125 2023-11-27 00:27:16,514 INFO [train_asr.py:1235] (3/4) Epoch 46, batch 5250, loss[loss=0.0639, simple_loss=0.09696, pruned_loss=0.01065, audio_tagging_loss=0.004774, over 15581.00 frames. ], tot_loss[loss=0.06574, simple_loss=0.08999, pruned_loss=0.01228, audio_tagging_loss=0.008465, over 3044045.96 frames. ], batch size: 59, lr: 1.47e-03, grad_scale: 16.0 2023-11-27 00:27:35,311 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=3642253.3333333335, ans=0.1 2023-11-27 00:27:38,972 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 546350 2023-11-27 00:28:05,650 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=3642453.3333333335, ans=0.0 2023-11-27 00:28:11,781 INFO [train_asr.py:1235] (3/4) Epoch 46, batch 5300, loss[loss=0.06607, simple_loss=0.08262, pruned_loss=0.01648, audio_tagging_loss=0.008283, over 16559.00 frames. ], tot_loss[loss=0.06509, simple_loss=0.08906, pruned_loss=0.01209, audio_tagging_loss=0.008468, over 3043069.20 frames. ], batch size: 63, lr: 1.47e-03, grad_scale: 8.0 2023-11-27 00:28:11,987 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=3642520.0, ans=0.0 2023-11-27 00:28:18,817 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=15.31 vs. 
limit=22.5 2023-11-27 00:28:22,503 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.462e+01 9.037e+01 9.686e+01 1.067e+02 1.240e+02, threshold=1.937e+02, percent-clipped=0.0 2023-11-27 00:28:22,734 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=3642586.6666666665, ans=0.2 2023-11-27 00:28:31,727 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=3642586.6666666665, ans=0.04949747468305833 2023-11-27 00:28:31,804 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=3642586.6666666665, ans=0.125 2023-11-27 00:28:35,407 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 546400 2023-11-27 00:28:44,197 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.min_positive, batch_count=3642653.3333333335, ans=0.05 2023-11-27 00:28:48,483 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=3642720.0, ans=0.0 2023-11-27 00:29:08,575 INFO [train_asr.py:1235] (3/4) Epoch 46, batch 5350, loss[loss=0.07746, simple_loss=0.1033, pruned_loss=0.01722, audio_tagging_loss=0.00859, over 13834.00 frames. ], tot_loss[loss=0.06532, simple_loss=0.08952, pruned_loss=0.01203, audio_tagging_loss=0.008532, over 3046003.06 frames. ], batch size: 52, lr: 1.47e-03, grad_scale: 8.0 2023-11-27 00:29:19,425 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3642920.0, ans=0.125 2023-11-27 00:29:24,859 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3642920.0, ans=0.125 2023-11-27 00:29:30,197 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3642986.6666666665, ans=0.125 2023-11-27 00:29:31,164 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 546450 2023-11-27 00:29:33,375 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=3642986.6666666665, ans=0.125 2023-11-27 00:29:37,701 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.max_positive, batch_count=3642986.6666666665, ans=0.95 2023-11-27 00:29:57,417 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=3643120.0, ans=0.125 2023-11-27 00:30:01,647 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=3643120.0, ans=0.09899494936611666 2023-11-27 00:30:05,136 INFO [train_asr.py:1235] (3/4) Epoch 46, batch 5400, loss[loss=0.06399, simple_loss=0.08939, pruned_loss=0.00946, audio_tagging_loss=0.009834, over 16112.00 frames. ], tot_loss[loss=0.06541, simple_loss=0.08928, pruned_loss=0.01205, audio_tagging_loss=0.008719, over 3046256.91 frames. ], batch size: 59, lr: 1.47e-03, grad_scale: 8.0 2023-11-27 00:30:13,181 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=10.21 vs. 
limit=15.0 2023-11-27 00:30:14,662 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.823e+01 8.994e+01 9.613e+01 1.047e+02 1.327e+02, threshold=1.923e+02, percent-clipped=0.0 2023-11-27 00:30:15,984 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=3643253.3333333335, ans=0.0 2023-11-27 00:30:27,033 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 546500 2023-11-27 00:30:34,108 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3643320.0, ans=0.1 2023-11-27 00:30:49,289 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=6.88 vs. limit=10.0 2023-11-27 00:30:55,223 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=3643453.3333333335, ans=0.0 2023-11-27 00:31:00,369 INFO [train_asr.py:1235] (3/4) Epoch 46, batch 5450, loss[loss=0.07468, simple_loss=0.09315, pruned_loss=0.01952, audio_tagging_loss=0.008586, over 15459.00 frames. ], tot_loss[loss=0.06559, simple_loss=0.08946, pruned_loss=0.01211, audio_tagging_loss=0.008754, over 3048048.60 frames. ], batch size: 58, lr: 1.47e-03, grad_scale: 8.0 2023-11-27 00:31:08,680 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=3643520.0, ans=0.0 2023-11-27 00:31:11,295 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=3643586.6666666665, ans=0.125 2023-11-27 00:31:23,089 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 546550 2023-11-27 00:31:51,275 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=4.47 vs. limit=15.0 2023-11-27 00:31:56,713 INFO [train_asr.py:1235] (3/4) Epoch 46, batch 5500, loss[loss=0.04701, simple_loss=0.06323, pruned_loss=0.005216, audio_tagging_loss=0.01018, over 15344.00 frames. ], tot_loss[loss=0.06554, simple_loss=0.08947, pruned_loss=0.01204, audio_tagging_loss=0.008772, over 3049582.66 frames. 
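The [scaling.py:1022] lines compare a per-module whitening metric against a (scheduled) limit; a penalty gradient is injected only when the metric exceeds the limit, pushing the activations toward a more isotropic channel covariance. The exact metric in icefall's Whiten module is not reproduced here; the sketch below only illustrates the shape of such a covariance-spread statistic:

```python
# Hedged sketch of a covariance-spread statistic like the whitening metric
# logged by [scaling.py:1022]. The exact definition in icefall differs; the
# point is only that a penalty fires when the per-group channel covariance
# is far from isotropic (metric > limit).
import torch

def whitening_metric(x: torch.Tensor, num_groups: int) -> torch.Tensor:
    # x: (num_frames, num_channels)
    n, c = x.shape
    x = x.reshape(n, num_groups, c // num_groups).transpose(0, 1)
    x = x - x.mean(dim=1, keepdim=True)
    cov = x.transpose(1, 2) @ x / n    # per-group channel covariance
    eigs = torch.linalg.eigvalsh(cov)  # real, ascending
    # E[lambda^2] / E[lambda]^2 equals 1.0 for a perfectly white covariance
    metric = (eigs ** 2).mean(dim=1) / eigs.mean(dim=1).clamp(min=1e-20) ** 2
    return metric.mean()

x = torch.randn(4096, 384)
m = whitening_metric(x, num_groups=1)  # close to 1 for well-spread features
penalize = m > 15.0                    # mirrors "metric=10.21 vs. limit=15.0"
```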
], batch size: 58, lr: 1.47e-03, grad_scale: 8.0 2023-11-27 00:32:03,325 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=3643853.3333333335, ans=0.0 2023-11-27 00:32:06,995 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.209e+01 8.879e+01 9.698e+01 1.044e+02 1.314e+02, threshold=1.940e+02, percent-clipped=0.0 2023-11-27 00:32:13,562 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=3643920.0, ans=0.125 2023-11-27 00:32:19,448 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 546600 2023-11-27 00:32:24,075 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=3643986.6666666665, ans=0.0 2023-11-27 00:32:27,338 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=3643986.6666666665, ans=0.0 2023-11-27 00:32:36,599 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=3644053.3333333335, ans=0.125 2023-11-27 00:32:39,954 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3644053.3333333335, ans=0.1 2023-11-27 00:32:53,006 INFO [train_asr.py:1235] (3/4) Epoch 46, batch 5550, loss[loss=0.07594, simple_loss=0.1087, pruned_loss=0.01416, audio_tagging_loss=0.007456, over 15802.00 frames. ], tot_loss[loss=0.06553, simple_loss=0.08924, pruned_loss=0.01202, audio_tagging_loss=0.008884, over 3040392.18 frames. ], batch size: 59, lr: 1.47e-03, grad_scale: 8.0 2023-11-27 00:33:15,229 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 546650 2023-11-27 00:33:24,530 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=3644320.0, ans=0.125 2023-11-27 00:33:26,620 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=3644386.6666666665, ans=0.2 2023-11-27 00:33:34,394 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=3644386.6666666665, ans=0.0 2023-11-27 00:33:45,720 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=3644453.3333333335, ans=0.125 2023-11-27 00:33:47,758 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=3644520.0, ans=0.125 2023-11-27 00:33:48,616 INFO [train_asr.py:1235] (3/4) Epoch 46, batch 5600, loss[loss=0.07313, simple_loss=0.1056, pruned_loss=0.01169, audio_tagging_loss=0.008633, over 15529.00 frames. ], tot_loss[loss=0.06609, simple_loss=0.09002, pruned_loss=0.01214, audio_tagging_loss=0.008945, over 3047191.11 frames. 
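grad_scale in the batch summaries is the dynamic fp16 loss scale (use_fp16=True, and a grad scaler state dict was loaded at startup): it is halved whenever a step overflows to inf/nan and grown back after a long run of clean steps, which is why it stepped 32 -> 16 -> 8 around batches 5250-5300 above and climbs back to 16 and 32 in the summaries that follow. torch.cuda.amp.GradScaler implements the real thing; the toy below shows just the policy, with an illustrative growth_interval:

```python
# Minimal sketch of the dynamic loss-scale policy behind grad_scale:
# back off on overflow, grow back after a stable run. growth_interval here
# is illustrative, not the value this run uses.
class ToyLossScaler:
    def __init__(self, scale=32.0, growth_factor=2.0,
                 backoff_factor=0.5, growth_interval=2000):
        self.scale = scale
        self.growth_factor = growth_factor
        self.backoff_factor = backoff_factor
        self.growth_interval = growth_interval
        self._good_steps = 0

    def update(self, found_inf: bool) -> None:
        if found_inf:            # overflow: back off, e.g. 32 -> 16 -> 8
            self.scale *= self.backoff_factor
            self._good_steps = 0
        else:                    # long clean run: grow back, 8 -> 16 -> 32
            self._good_steps += 1
            if self._good_steps % self.growth_interval == 0:
                self.scale *= self.growth_factor
```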
], batch size: 56, lr: 1.47e-03, grad_scale: 16.0 2023-11-27 00:33:52,028 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=3644520.0, ans=0.0 2023-11-27 00:33:52,998 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=3644520.0, ans=0.0 2023-11-27 00:33:58,691 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.232e+01 8.835e+01 9.433e+01 1.028e+02 1.297e+02, threshold=1.887e+02, percent-clipped=0.0 2023-11-27 00:34:01,961 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module1.whiten, num_groups=1, num_channels=512, metric=4.00 vs. limit=15.0 2023-11-27 00:34:05,823 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=3644586.6666666665, ans=0.0 2023-11-27 00:34:06,079 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=15.04 vs. limit=22.5 2023-11-27 00:34:11,089 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 546700 2023-11-27 00:34:27,072 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=3644720.0, ans=0.0 2023-11-27 00:34:28,987 WARNING [train_asr.py:1481] (3/4) Exclude cut with ID unbalanced/ze0LsBtoDm0_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-27 00:34:44,704 INFO [train_asr.py:1235] (3/4) Epoch 46, batch 5650, loss[loss=0.0498, simple_loss=0.0664, pruned_loss=0.007617, audio_tagging_loss=0.008982, over 15730.00 frames. ], tot_loss[loss=0.06591, simple_loss=0.08966, pruned_loss=0.01208, audio_tagging_loss=0.009001, over 3049641.33 frames. ], batch size: 60, lr: 1.47e-03, grad_scale: 16.0 2023-11-27 00:34:44,915 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3644853.3333333335, ans=0.125 2023-11-27 00:34:51,303 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3644853.3333333335, ans=0.125 2023-11-27 00:34:56,049 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=3644920.0, ans=0.2 2023-11-27 00:34:59,169 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.min_positive, batch_count=3644920.0, ans=0.05 2023-11-27 00:35:01,629 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=19.46 vs. limit=22.5 2023-11-27 00:35:02,680 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.13 vs. 
limit=6.0 2023-11-27 00:35:06,511 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 546750 2023-11-27 00:35:21,708 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3645053.3333333335, ans=0.125 2023-11-27 00:35:30,897 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=3645120.0, ans=0.125 2023-11-27 00:35:39,150 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=4.03 vs. limit=6.0 2023-11-27 00:35:40,811 INFO [train_asr.py:1235] (3/4) Epoch 46, batch 5700, loss[loss=0.0661, simple_loss=0.09205, pruned_loss=0.01032, audio_tagging_loss=0.009748, over 15485.00 frames. ], tot_loss[loss=0.06535, simple_loss=0.08893, pruned_loss=0.01187, audio_tagging_loss=0.00901, over 3047260.64 frames. ], batch size: 58, lr: 1.47e-03, grad_scale: 16.0 2023-11-27 00:35:48,465 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3645186.6666666665, ans=0.1 2023-11-27 00:35:50,416 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.868e+01 8.853e+01 9.368e+01 1.022e+02 1.504e+02, threshold=1.874e+02, percent-clipped=0.0 2023-11-27 00:36:03,316 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 546800 2023-11-27 00:36:09,595 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=3645320.0, ans=0.0 2023-11-27 00:36:10,522 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=3645320.0, ans=0.125 2023-11-27 00:36:22,637 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3645386.6666666665, ans=0.125 2023-11-27 00:36:23,677 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3645386.6666666665, ans=0.125 2023-11-27 00:36:25,197 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=512, metric=19.39 vs. limit=22.5 2023-11-27 00:36:36,204 INFO [train_asr.py:1235] (3/4) Epoch 46, batch 5750, loss[loss=0.08052, simple_loss=0.1275, pruned_loss=0.01255, audio_tagging_loss=0.004233, over 16070.00 frames. ], tot_loss[loss=0.06505, simple_loss=0.08868, pruned_loss=0.01186, audio_tagging_loss=0.008855, over 3051658.53 frames. ], batch size: 56, lr: 1.47e-03, grad_scale: 16.0 2023-11-27 00:36:41,462 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=17.71 vs. limit=22.5 2023-11-27 00:36:50,376 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=3645586.6666666665, ans=0.125 2023-11-27 00:36:59,312 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 546850 2023-11-27 00:36:59,446 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=3645653.3333333335, ans=0.125 2023-11-27 00:37:32,707 INFO [train_asr.py:1235] (3/4) Epoch 46, batch 5800, loss[loss=0.06005, simple_loss=0.08372, pruned_loss=0.008306, audio_tagging_loss=0.009881, over 15017.00 frames. 
], tot_loss[loss=0.06537, simple_loss=0.08902, pruned_loss=0.01208, audio_tagging_loss=0.008777, over 3052776.43 frames. ], batch size: 56, lr: 1.47e-03, grad_scale: 16.0 2023-11-27 00:37:42,694 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.557e+01 8.951e+01 9.661e+01 1.044e+02 1.253e+02, threshold=1.932e+02, percent-clipped=0.0 2023-11-27 00:37:50,996 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=3645920.0, ans=0.125 2023-11-27 00:37:55,135 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 546900 2023-11-27 00:38:15,903 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3646053.3333333335, ans=0.1 2023-11-27 00:38:29,077 INFO [train_asr.py:1235] (3/4) Epoch 46, batch 5850, loss[loss=0.0685, simple_loss=0.09658, pruned_loss=0.01415, audio_tagging_loss=0.006068, over 15316.00 frames. ], tot_loss[loss=0.0655, simple_loss=0.08919, pruned_loss=0.01218, audio_tagging_loss=0.008727, over 3055077.98 frames. ], batch size: 55, lr: 1.47e-03, grad_scale: 16.0 2023-11-27 00:38:50,967 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 546950 2023-11-27 00:39:14,026 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=3646453.3333333335, ans=0.125 2023-11-27 00:39:18,203 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=3646453.3333333335, ans=0.0 2023-11-27 00:39:24,506 INFO [train_asr.py:1235] (3/4) Epoch 46, batch 5900, loss[loss=0.07153, simple_loss=0.08628, pruned_loss=0.01838, audio_tagging_loss=0.01001, over 15032.00 frames. ], tot_loss[loss=0.06577, simple_loss=0.0898, pruned_loss=0.01233, audio_tagging_loss=0.008545, over 3048719.39 frames. ], batch size: 56, lr: 1.47e-03, grad_scale: 16.0 2023-11-27 00:39:27,846 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=3646520.0, ans=0.2 2023-11-27 00:39:31,118 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=3646520.0, ans=0.125 2023-11-27 00:39:34,492 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.679e+01 8.740e+01 9.357e+01 9.859e+01 1.378e+02, threshold=1.871e+02, percent-clipped=0.0 2023-11-27 00:39:36,379 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=3646586.6666666665, ans=0.07 2023-11-27 00:39:42,068 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.max_abs, batch_count=3646586.6666666665, ans=10.0 2023-11-27 00:39:47,251 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 547000 2023-11-27 00:39:51,047 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten1.whitening_limit, batch_count=3646653.3333333335, ans=10.0 2023-11-27 00:40:14,770 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3646786.6666666665, ans=0.1 2023-11-27 00:40:20,830 INFO [train_asr.py:1235] (3/4) Epoch 46, batch 5950, loss[loss=0.07038, simple_loss=0.09401, pruned_loss=0.01575, audio_tagging_loss=0.007626, over 15735.00 frames. 
], tot_loss[loss=0.06521, simple_loss=0.089, pruned_loss=0.01214, audio_tagging_loss=0.008569, over 3048420.31 frames. ], batch size: 59, lr: 1.47e-03, grad_scale: 16.0 2023-11-27 00:40:34,322 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3646920.0, ans=0.1 2023-11-27 00:40:42,363 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=3646986.6666666665, ans=0.2 2023-11-27 00:40:43,311 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 547050 2023-11-27 00:40:50,920 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=3646986.6666666665, ans=0.125 2023-11-27 00:41:16,153 INFO [train_asr.py:1235] (3/4) Epoch 46, batch 6000, loss[loss=0.05894, simple_loss=0.07569, pruned_loss=0.01066, audio_tagging_loss=0.01043, over 15253.00 frames. ], tot_loss[loss=0.06486, simple_loss=0.08841, pruned_loss=0.01203, audio_tagging_loss=0.008621, over 3045214.57 frames. ], batch size: 58, lr: 1.47e-03, grad_scale: 32.0 2023-11-27 00:41:16,154 INFO [train_asr.py:1258] (3/4) Computing validation loss 2023-11-27 00:41:43,246 INFO [zipformer.py:1877] (3/4) name=encoder.encoders.1.encoder.layers.1.self_attn_weights, attn_weights_entropy = tensor([4.7883, 4.9657, 5.0685, 4.9247], device='cuda:3') 2023-11-27 00:41:44,104 INFO [zipformer.py:1877] (3/4) name=encoder.encoders.4.encoder.layers.0.self_attn_weights, attn_weights_entropy = tensor([4.5146, 3.5108, 3.7639, 3.6672], device='cuda:3') 2023-11-27 00:41:48,440 INFO [train_asr.py:1267] (3/4) Epoch 46, validation: loss=0.05759, simple_loss=0.05057, pruned_loss=0.005367, audio_tagging_loss=0.02694, over 4681554.00 frames. 2023-11-27 00:41:48,440 INFO [train_asr.py:1268] (3/4) Maximum memory allocated so far is 24894MB 2023-11-27 00:41:58,336 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.728e+01 8.712e+01 9.506e+01 1.018e+02 1.169e+02, threshold=1.901e+02, percent-clipped=0.0 2023-11-27 00:42:10,645 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 547100 2023-11-27 00:42:27,944 WARNING [train_asr.py:1481] (3/4) Exclude cut with ID unbalanced/NoNxFjwXuuc_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-27 00:42:29,233 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=3647386.6666666665, ans=0.0 2023-11-27 00:42:41,312 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=3647453.3333333335, ans=0.0 2023-11-27 00:42:42,865 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3647520.0, ans=0.125 2023-11-27 00:42:44,302 INFO [train_asr.py:1235] (3/4) Epoch 46, batch 6050, loss[loss=0.06429, simple_loss=0.08127, pruned_loss=0.01449, audio_tagging_loss=0.009159, over 15663.00 frames. ], tot_loss[loss=0.06473, simple_loss=0.08822, pruned_loss=0.01198, audio_tagging_loss=0.008639, over 3051659.44 frames. 
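The WARNING from [train_asr.py:1481] above drops an AudioSet cut because its placeholder transcript is longer in BPE tokens (24) than the cut is in subsampled frames (23): a transducer can emit at most one symbol per encoder frame, and the 4x front-end maps the 1-second cut's 100 frames to 23. A sketch of that filter; the exact expression in icefall may differ, but the arithmetic below reproduces the logged 100 -> 23 mapping:

```python
# Sketch of the cut filter implied by the [train_asr.py:1481] warnings.
def frames_after_subsampling(t: int) -> int:
    # reproduces the logged 100 -> 23 mapping of the 4x front-end
    return ((t - 7) // 2 + 1) // 2

def keep_cut(num_frames: int, num_tokens: int) -> bool:
    # a transducer can emit at most one symbol per encoder frame
    return num_tokens <= frames_after_subsampling(num_frames)

# The dummy AudioSet caption tokenizes to 24 BPE tokens (see the warning),
# but a 1-second cut yields only 23 subsampled frames, so the cut is dropped.
assert frames_after_subsampling(100) == 23
assert keep_cut(100, 24) is False
```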
], batch size: 61, lr: 1.47e-03, grad_scale: 32.0 2023-11-27 00:43:06,147 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 547150 2023-11-27 00:43:40,386 INFO [train_asr.py:1235] (3/4) Epoch 46, batch 6100, loss[loss=0.06932, simple_loss=0.099, pruned_loss=0.01179, audio_tagging_loss=0.008034, over 14329.00 frames. ], tot_loss[loss=0.0652, simple_loss=0.08926, pruned_loss=0.01201, audio_tagging_loss=0.008566, over 3051013.56 frames. ], batch size: 53, lr: 1.47e-03, grad_scale: 32.0 2023-11-27 00:43:44,745 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=3647853.3333333335, ans=0.0 2023-11-27 00:43:49,751 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.697e+01 8.942e+01 9.763e+01 1.039e+02 1.274e+02, threshold=1.953e+02, percent-clipped=0.0 2023-11-27 00:44:01,499 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 547200 2023-11-27 00:44:01,733 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3647986.6666666665, ans=0.125 2023-11-27 00:44:08,267 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=3647986.6666666665, ans=0.0 2023-11-27 00:44:10,494 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3647986.6666666665, ans=0.1 2023-11-27 00:44:14,185 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=3648053.3333333335, ans=0.2 2023-11-27 00:44:16,346 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.min_positive, batch_count=3648053.3333333335, ans=0.05 2023-11-27 00:44:24,225 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3648120.0, ans=0.125 2023-11-27 00:44:24,255 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=3648120.0, ans=0.0 2023-11-27 00:44:25,246 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3648120.0, ans=0.125 2023-11-27 00:44:35,852 INFO [train_asr.py:1235] (3/4) Epoch 46, batch 6150, loss[loss=0.06538, simple_loss=0.08829, pruned_loss=0.01145, audio_tagging_loss=0.009778, over 15481.00 frames. ], tot_loss[loss=0.06516, simple_loss=0.08892, pruned_loss=0.01198, audio_tagging_loss=0.008722, over 3056299.65 frames. 
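The [zipformer.py:1877] tensors printed during the batch 6000 validation pass above are per-head entropies of the self-attention weight distributions: values near log(sequence length) indicate diffuse, near-uniform attention, values near zero indicate collapsed one-hot attention, so they serve as a quick health check on each attention module. A sketch of the quantity as it is presumably computed:

```python
# Sketch of the attention-entropy diagnostic from [zipformer.py:1877]: the
# mean Shannon entropy of each head's attention distribution.
import torch

def attn_weights_entropy(attn: torch.Tensor) -> torch.Tensor:
    # attn: (num_heads, tgt_len, src_len), rows summing to 1
    ent = -(attn * (attn + 1e-20).log()).sum(dim=-1)  # (num_heads, tgt_len)
    return ent.mean(dim=-1)                           # one value per head

attn = torch.softmax(torch.randn(4, 50, 50), dim=-1)
print(attn_weights_entropy(attn))  # a 4-element tensor like the ones logged
```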
], batch size: 56, lr: 1.47e-03, grad_scale: 16.0 2023-11-27 00:44:53,956 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3648253.3333333335, ans=0.1 2023-11-27 00:44:55,107 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=3648253.3333333335, ans=0.0 2023-11-27 00:44:58,679 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 547250 2023-11-27 00:44:58,757 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3648320.0, ans=0.125 2023-11-27 00:45:03,576 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=3648320.0, ans=0.125 2023-11-27 00:45:09,221 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=6.52 vs. limit=10.0 2023-11-27 00:45:11,008 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3648386.6666666665, ans=0.125 2023-11-27 00:45:23,565 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=3648453.3333333335, ans=0.1 2023-11-27 00:45:24,802 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3648453.3333333335, ans=0.125 2023-11-27 00:45:31,495 INFO [train_asr.py:1235] (3/4) Epoch 46, batch 6200, loss[loss=0.07234, simple_loss=0.1007, pruned_loss=0.01294, audio_tagging_loss=0.00903, over 14973.00 frames. ], tot_loss[loss=0.06505, simple_loss=0.08866, pruned_loss=0.01192, audio_tagging_loss=0.008806, over 3054316.99 frames. ], batch size: 56, lr: 1.47e-03, grad_scale: 16.0 2023-11-27 00:45:33,658 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=2.91 vs. limit=15.0 2023-11-27 00:45:42,346 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.min_positive, batch_count=3648586.6666666665, ans=0.05 2023-11-27 00:45:43,666 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.816e+01 8.925e+01 9.447e+01 1.055e+02 1.440e+02, threshold=1.889e+02, percent-clipped=0.0 2023-11-27 00:45:54,303 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 547300 2023-11-27 00:46:12,115 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=10.78 vs. limit=15.0 2023-11-27 00:46:14,574 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=3648720.0, ans=0.0 2023-11-27 00:46:17,711 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=3648786.6666666665, ans=0.0 2023-11-27 00:46:28,167 INFO [train_asr.py:1235] (3/4) Epoch 46, batch 6250, loss[loss=0.08947, simple_loss=0.1232, pruned_loss=0.01862, audio_tagging_loss=0.009223, over 16262.00 frames. ], tot_loss[loss=0.06556, simple_loss=0.08929, pruned_loss=0.01212, audio_tagging_loss=0.0088, over 3058028.71 frames. 
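The batch sizes in these summaries drift between roughly 52 and 65 because the SimpleCutSampler chosen at startup packs each batch to the max_duration budget (1000 seconds in the config) rather than to a fixed count, so batches of longer cuts hold fewer of them. A simplified version of duration-budgeted batching; the real lhotse sampler also handles shuffling, epochs and distributed splitting:

```python
# Simplified duration-budgeted batching in the spirit of a cut sampler
# configured with max_duration=1000 (seconds).
from typing import Iterable, Iterator, List

def duration_batches(durations: Iterable[float],
                     max_duration: float = 1000.0) -> Iterator[List[float]]:
    batch: List[float] = []
    total = 0.0
    for d in durations:
        if batch and total + d > max_duration:
            yield batch
            batch, total = [], 0.0
        batch.append(d)
        total += d
    if batch:
        yield batch

# ~16 s cuts pack 62 to a batch, ~18 s cuts only 55, hence the drift above.
assert len(next(duration_batches([16.0] * 100))) == 62
```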
], batch size: 59, lr: 1.47e-03, grad_scale: 16.0 2023-11-27 00:46:49,452 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 547350 2023-11-27 00:46:51,753 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3648986.6666666665, ans=0.125 2023-11-27 00:47:01,430 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=9.34 vs. limit=15.0 2023-11-27 00:47:05,507 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3649053.3333333335, ans=0.1 2023-11-27 00:47:11,381 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=3649120.0, ans=0.0 2023-11-27 00:47:14,661 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=3649120.0, ans=0.0 2023-11-27 00:47:16,621 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3649120.0, ans=0.1 2023-11-27 00:47:22,756 INFO [train_asr.py:1235] (3/4) Epoch 46, batch 6300, loss[loss=0.04721, simple_loss=0.06494, pruned_loss=0.007251, audio_tagging_loss=0.007491, over 14308.00 frames. ], tot_loss[loss=0.06512, simple_loss=0.08862, pruned_loss=0.01193, audio_tagging_loss=0.008879, over 3056881.42 frames. ], batch size: 56, lr: 1.47e-03, grad_scale: 16.0 2023-11-27 00:47:28,359 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3649186.6666666665, ans=0.125 2023-11-27 00:47:33,389 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.486e+01 8.827e+01 9.482e+01 1.035e+02 1.564e+02, threshold=1.896e+02, percent-clipped=0.0 2023-11-27 00:47:45,066 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 547400 2023-11-27 00:47:46,324 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3649320.0, ans=0.1 2023-11-27 00:48:14,249 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=11.09 vs. limit=15.0 2023-11-27 00:48:18,651 INFO [train_asr.py:1235] (3/4) Epoch 46, batch 6350, loss[loss=0.06748, simple_loss=0.08991, pruned_loss=0.01376, audio_tagging_loss=0.008763, over 15132.00 frames. ], tot_loss[loss=0.06482, simple_loss=0.08808, pruned_loss=0.01173, audio_tagging_loss=0.009044, over 3055959.64 frames. 
], batch size: 57, lr: 1.47e-03, grad_scale: 16.0 2023-11-27 00:48:26,813 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=3649520.0, ans=0.125 2023-11-27 00:48:41,682 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 547450 2023-11-27 00:48:43,953 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=3649653.3333333335, ans=0.0 2023-11-27 00:49:10,637 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=3649786.6666666665, ans=0.2 2023-11-27 00:49:12,914 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-27 00:49:13,785 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3649853.3333333335, ans=0.1 2023-11-27 00:49:15,278 INFO [train_asr.py:1235] (3/4) Epoch 46, batch 6400, loss[loss=0.09827, simple_loss=0.1321, pruned_loss=0.02555, audio_tagging_loss=0.00668, over 15752.00 frames. ], tot_loss[loss=0.06571, simple_loss=0.08942, pruned_loss=0.01204, audio_tagging_loss=0.008955, over 3052305.35 frames. ], batch size: 57, lr: 1.47e-03, grad_scale: 32.0 2023-11-27 00:49:26,398 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.405e+01 8.880e+01 9.472e+01 1.045e+02 1.391e+02, threshold=1.894e+02, percent-clipped=0.0 2023-11-27 00:49:26,670 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=3649920.0, ans=0.0 2023-11-27 00:49:31,899 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3649920.0, ans=0.125 2023-11-27 00:49:37,183 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 547500 2023-11-27 00:49:59,269 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=7.07 vs. limit=12.0 2023-11-27 00:50:08,147 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=3650120.0, ans=0.0 2023-11-27 00:50:11,002 INFO [train_asr.py:1235] (3/4) Epoch 46, batch 6450, loss[loss=0.07987, simple_loss=0.1052, pruned_loss=0.01857, audio_tagging_loss=0.008683, over 14135.00 frames. ], tot_loss[loss=0.06566, simple_loss=0.08943, pruned_loss=0.01191, audio_tagging_loss=0.009029, over 3052892.36 frames. 
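Entries like balancer2.prob, balancer1.max_abs and nonlin_attention.balancer.min_positive above schedule Balancer parameters: with the logged probability, the module checks per-channel activation statistics and nudges channels whose positive-sign rate or mean magnitude leaves the allowed band. The sketch below only computes the violation mask; the real icefall Balancer applies the correction inside autograd:

```python
# Hedged sketch of what a Balancer constrains, per the min_positive /
# max_positive / min_abs / max_abs values scheduled above. Only the
# violation check is shown, not the corrective gradient.
import torch

def balancer_violations(x: torch.Tensor,
                        min_positive=0.05, max_positive=0.95,
                        min_abs=0.2, max_abs=100.0) -> torch.Tensor:
    # x: (num_frames, num_channels)
    pos_rate = (x > 0).float().mean(dim=0)  # fraction of positive values
    mean_abs = x.abs().mean(dim=0)          # mean magnitude per channel
    return ((pos_rate < min_positive) | (pos_rate > max_positive)
            | (mean_abs < min_abs) | (mean_abs > max_abs))

bad = balancer_violations(torch.randn(1000, 256))  # channels needing a nudge
```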
], batch size: 53, lr: 1.47e-03, grad_scale: 32.0 2023-11-27 00:50:13,379 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=3650186.6666666665, ans=0.0 2023-11-27 00:50:16,584 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=3650186.6666666665, ans=0.0 2023-11-27 00:50:19,794 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3650186.6666666665, ans=0.125 2023-11-27 00:50:28,784 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=3650253.3333333335, ans=0.2 2023-11-27 00:50:33,256 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 547550 2023-11-27 00:50:33,442 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=3650320.0, ans=0.0 2023-11-27 00:51:00,922 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3650453.3333333335, ans=0.125 2023-11-27 00:51:05,275 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=3.88 vs. limit=10.0 2023-11-27 00:51:05,935 INFO [train_asr.py:1235] (3/4) Epoch 46, batch 6500, loss[loss=0.05615, simple_loss=0.07221, pruned_loss=0.009018, audio_tagging_loss=0.01102, over 15609.00 frames. ], tot_loss[loss=0.06519, simple_loss=0.08882, pruned_loss=0.0117, audio_tagging_loss=0.009082, over 3047747.45 frames. ], batch size: 59, lr: 1.47e-03, grad_scale: 32.0 2023-11-27 00:51:11,564 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=3650520.0, ans=0.125 2023-11-27 00:51:17,708 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.492e+01 8.951e+01 9.386e+01 1.000e+02 1.193e+02, threshold=1.877e+02, percent-clipped=0.0 2023-11-27 00:51:29,078 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 547600 2023-11-27 00:51:30,558 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.43 vs. limit=6.0 2023-11-27 00:51:51,874 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=3650786.6666666665, ans=0.2 2023-11-27 00:52:02,833 INFO [train_asr.py:1235] (3/4) Epoch 46, batch 6550, loss[loss=0.07707, simple_loss=0.1128, pruned_loss=0.01552, audio_tagging_loss=0.005159, over 14600.00 frames. ], tot_loss[loss=0.06502, simple_loss=0.08877, pruned_loss=0.0117, audio_tagging_loss=0.008932, over 3046497.01 frames. 
], batch size: 53, lr: 1.47e-03, grad_scale: 32.0 2023-11-27 00:52:25,150 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 547650 2023-11-27 00:52:26,357 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=3650986.6666666665, ans=0.2 2023-11-27 00:52:34,118 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3650986.6666666665, ans=0.1 2023-11-27 00:52:44,171 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=3651053.3333333335, ans=0.125 2023-11-27 00:52:50,093 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-27 00:52:58,297 INFO [train_asr.py:1235] (3/4) Epoch 46, batch 6600, loss[loss=0.06728, simple_loss=0.09702, pruned_loss=0.01292, audio_tagging_loss=0.005853, over 14899.00 frames. ], tot_loss[loss=0.065, simple_loss=0.08874, pruned_loss=0.01186, audio_tagging_loss=0.008775, over 3043600.22 frames. ], batch size: 53, lr: 1.47e-03, grad_scale: 32.0 2023-11-27 00:53:02,213 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=3651186.6666666665, ans=0.2 2023-11-27 00:53:09,340 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.438e+01 8.821e+01 9.435e+01 1.031e+02 1.384e+02, threshold=1.887e+02, percent-clipped=0.0 2023-11-27 00:53:17,465 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3651253.3333333335, ans=0.1 2023-11-27 00:53:21,060 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 547700 2023-11-27 00:53:24,488 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3651320.0, ans=0.125 2023-11-27 00:53:25,440 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=3651320.0, ans=0.07 2023-11-27 00:53:30,222 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3651320.0, ans=0.125 2023-11-27 00:53:34,532 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=3651386.6666666665, ans=0.125 2023-11-27 00:53:53,403 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3651520.0, ans=0.1 2023-11-27 00:53:54,166 INFO [train_asr.py:1235] (3/4) Epoch 46, batch 6650, loss[loss=0.06081, simple_loss=0.07954, pruned_loss=0.01211, audio_tagging_loss=0.008932, over 16374.00 frames. ], tot_loss[loss=0.06465, simple_loss=0.08811, pruned_loss=0.01185, audio_tagging_loss=0.008743, over 3037980.99 frames. 
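The [scaling.py:1118] WithLoss lines attach an auxiliary penalty to the self-attention weights and report its accumulated value; loss-sum=0.000e+00 means the penalty has been inactive since the last report. Conceptually this is a loss injected directly into a tensor's backward pass, which a custom autograd function can illustrate (an illustration of the mechanism, not the icefall class):

```python
# Illustration of attaching an auxiliary loss to a tensor's backward pass,
# in the spirit of the WithLoss lines from [scaling.py:1118]. A zero
# auxiliary gradient corresponds to the logged loss-sum=0.000e+00.
import torch

class WithAuxLoss(torch.autograd.Function):
    @staticmethod
    def forward(ctx, x, aux_grad):
        ctx.save_for_backward(aux_grad)
        return x.view_as(x)

    @staticmethod
    def backward(ctx, grad_output):
        (aux_grad,) = ctx.saved_tensors
        return grad_output + aux_grad, None

x = torch.randn(8, requires_grad=True)
y = WithAuxLoss.apply(x, torch.zeros(8))  # inactive penalty
y.sum().backward()
assert torch.allclose(x.grad, torch.ones(8))
```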
], batch size: 63, lr: 1.47e-03, grad_scale: 32.0 2023-11-27 00:53:59,096 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=3651520.0, ans=0.125 2023-11-27 00:54:16,251 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=3651653.3333333335, ans=0.2 2023-11-27 00:54:17,095 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 547750 2023-11-27 00:54:32,338 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3651720.0, ans=0.125 2023-11-27 00:54:37,766 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=3651786.6666666665, ans=0.0 2023-11-27 00:54:39,881 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=3651786.6666666665, ans=0.2 2023-11-27 00:54:50,314 INFO [train_asr.py:1235] (3/4) Epoch 46, batch 6700, loss[loss=0.07031, simple_loss=0.08767, pruned_loss=0.0152, audio_tagging_loss=0.01127, over 15112.00 frames. ], tot_loss[loss=0.06472, simple_loss=0.08853, pruned_loss=0.01181, audio_tagging_loss=0.008646, over 3039774.11 frames. ], batch size: 56, lr: 1.47e-03, grad_scale: 32.0 2023-11-27 00:54:50,617 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=3651853.3333333335, ans=0.125 2023-11-27 00:55:01,537 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.189e+01 8.865e+01 9.450e+01 1.017e+02 1.235e+02, threshold=1.890e+02, percent-clipped=0.0 2023-11-27 00:55:12,861 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 547800 2023-11-27 00:55:43,453 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3652120.0, ans=0.125 2023-11-27 00:55:46,479 INFO [train_asr.py:1235] (3/4) Epoch 46, batch 6750, loss[loss=0.06978, simple_loss=0.1072, pruned_loss=0.01047, audio_tagging_loss=0.005721, over 14752.00 frames. ], tot_loss[loss=0.0644, simple_loss=0.08823, pruned_loss=0.01164, audio_tagging_loss=0.008652, over 3038865.78 frames. ], batch size: 53, lr: 1.47e-03, grad_scale: 32.0 2023-11-27 00:55:49,827 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3652186.6666666665, ans=0.125 2023-11-27 00:55:49,933 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=3652186.6666666665, ans=0.07 2023-11-27 00:55:52,090 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3652186.6666666665, ans=0.125 2023-11-27 00:56:01,016 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=3652253.3333333335, ans=0.125 2023-11-27 00:56:01,275 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=13.70 vs. 
limit=22.5 2023-11-27 00:56:09,337 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 547850 2023-11-27 00:56:24,665 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=3652386.6666666665, ans=0.125 2023-11-27 00:56:32,250 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=3652453.3333333335, ans=0.0 2023-11-27 00:56:42,051 INFO [train_asr.py:1235] (3/4) Epoch 46, batch 6800, loss[loss=0.05184, simple_loss=0.07338, pruned_loss=0.006162, audio_tagging_loss=0.008989, over 14789.00 frames. ], tot_loss[loss=0.06512, simple_loss=0.08925, pruned_loss=0.01191, audio_tagging_loss=0.008587, over 3030987.34 frames. ], batch size: 56, lr: 1.47e-03, grad_scale: 32.0 2023-11-27 00:56:53,721 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.866e+01 8.978e+01 9.815e+01 1.051e+02 1.384e+02, threshold=1.963e+02, percent-clipped=0.0 2023-11-27 00:56:56,188 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=3652586.6666666665, ans=0.0 2023-11-27 00:56:57,200 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=3652586.6666666665, ans=0.0 2023-11-27 00:56:58,148 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3652586.6666666665, ans=0.125 2023-11-27 00:57:04,887 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 547900 2023-11-27 00:57:15,551 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3652720.0, ans=0.1 2023-11-27 00:57:24,722 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3652720.0, ans=0.125 2023-11-27 00:57:38,345 INFO [train_asr.py:1235] (3/4) Epoch 46, batch 6850, loss[loss=0.0729, simple_loss=0.09338, pruned_loss=0.01412, audio_tagging_loss=0.01209, over 14490.00 frames. ], tot_loss[loss=0.06487, simple_loss=0.08883, pruned_loss=0.01191, audio_tagging_loss=0.008547, over 3033861.17 frames. ], batch size: 56, lr: 1.47e-03, grad_scale: 32.0 2023-11-27 00:57:51,945 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3652920.0, ans=0.125 2023-11-27 00:58:00,230 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 547950 2023-11-27 00:58:07,815 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=3652986.6666666665, ans=0.125 2023-11-27 00:58:34,335 INFO [train_asr.py:1235] (3/4) Epoch 46, batch 6900, loss[loss=0.06281, simple_loss=0.08072, pruned_loss=0.009207, audio_tagging_loss=0.01324, over 15870.00 frames. ], tot_loss[loss=0.06542, simple_loss=0.08974, pruned_loss=0.0121, audio_tagging_loss=0.008446, over 3047327.27 frames. 
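The various conv_skip_rate, attention_skip_rate, ff*_skip_rate and bypass.skip_rate schedules seen throughout implement stochastic module skipping, a stochastic-depth style regularizer: during training each sub-module's output is dropped with the scheduled probability and only the bypass path survives; by this point in training most of these have decayed to 0.0 or small constants such as 0.07. A minimal sketch of the idea, not the exact zipformer wiring (which, for instance, uses learned bypass scales rather than a plain residual):

```python
# Minimal sketch of stochastic module skipping behind the *_skip_rate
# schedules: with the scheduled probability the sub-module is bypassed
# during training.
import torch
import torch.nn as nn

class SkippableResidual(nn.Module):
    def __init__(self, module: nn.Module, skip_rate: float = 0.07):
        super().__init__()
        self.module = module
        self.skip_rate = skip_rate  # scheduled against batch_count in the log

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        if self.training and torch.rand(()).item() < self.skip_rate:
            return x               # skipped: identity path only
        return x + self.module(x)  # normal residual path

layer = SkippableResidual(nn.Linear(256, 256), skip_rate=0.07)
```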
], batch size: 62, lr: 1.47e-03, grad_scale: 32.0 2023-11-27 00:58:37,640 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=3653186.6666666665, ans=0.2 2023-11-27 00:58:40,929 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=3653186.6666666665, ans=0.125 2023-11-27 00:58:43,177 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=3653186.6666666665, ans=0.0 2023-11-27 00:58:45,074 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.731e+01 8.904e+01 9.598e+01 1.032e+02 1.208e+02, threshold=1.920e+02, percent-clipped=0.0 2023-11-27 00:58:46,345 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=3653253.3333333335, ans=0.125 2023-11-27 00:58:56,289 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 548000 2023-11-27 00:58:57,994 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3653320.0, ans=0.0 2023-11-27 00:59:06,696 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=3653320.0, ans=0.0 2023-11-27 00:59:09,902 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=3653386.6666666665, ans=0.0 2023-11-27 00:59:10,850 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=3653386.6666666665, ans=0.125 2023-11-27 00:59:19,657 WARNING [train_asr.py:1481] (3/4) Exclude cut with ID unbalanced/Xez1ffAcb0w_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-27 00:59:31,379 INFO [train_asr.py:1235] (3/4) Epoch 46, batch 6950, loss[loss=0.05056, simple_loss=0.06724, pruned_loss=0.009193, audio_tagging_loss=0.007748, over 14686.00 frames. ], tot_loss[loss=0.06543, simple_loss=0.08976, pruned_loss=0.01202, audio_tagging_loss=0.008524, over 3048922.41 frames. ], batch size: 56, lr: 1.47e-03, grad_scale: 32.0 2023-11-27 00:59:33,741 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=3653520.0, ans=0.125 2023-11-27 00:59:45,212 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=512, metric=3.62 vs. 
limit=15.0 2023-11-27 00:59:49,182 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=3653586.6666666665, ans=0.07 2023-11-27 00:59:53,304 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3653653.3333333335, ans=0.0 2023-11-27 00:59:54,813 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 548050 2023-11-27 00:59:58,209 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=3653653.3333333335, ans=0.0 2023-11-27 01:00:12,901 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.max_abs, batch_count=3653720.0, ans=10.0 2023-11-27 01:00:17,104 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.min_positive, batch_count=3653786.6666666665, ans=0.05 2023-11-27 01:00:24,426 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3653786.6666666665, ans=0.0 2023-11-27 01:00:27,984 INFO [train_asr.py:1235] (3/4) Epoch 46, batch 7000, loss[loss=0.0501, simple_loss=0.07042, pruned_loss=0.007449, audio_tagging_loss=0.007439, over 16147.00 frames. ], tot_loss[loss=0.06587, simple_loss=0.09019, pruned_loss=0.01213, audio_tagging_loss=0.008648, over 3050828.83 frames. ], batch size: 61, lr: 1.47e-03, grad_scale: 32.0 2023-11-27 01:00:34,023 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.0.self_attn1.whiten, num_groups=1, num_channels=192, metric=15.07 vs. limit=22.5 2023-11-27 01:00:35,879 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=9.14 vs. limit=15.0 2023-11-27 01:00:39,178 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 6.987e+01 8.912e+01 9.354e+01 1.017e+02 1.441e+02, threshold=1.871e+02, percent-clipped=0.0 2023-11-27 01:00:40,809 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=12.23 vs. limit=15.0 2023-11-27 01:00:48,691 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=3653986.6666666665, ans=0.125 2023-11-27 01:00:49,674 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 548100 2023-11-27 01:01:07,109 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=3654053.3333333335, ans=0.2 2023-11-27 01:01:23,268 INFO [train_asr.py:1235] (3/4) Epoch 46, batch 7050, loss[loss=0.05363, simple_loss=0.06383, pruned_loss=0.01142, audio_tagging_loss=0.0103, over 15554.00 frames. ], tot_loss[loss=0.06556, simple_loss=0.08954, pruned_loss=0.01204, audio_tagging_loss=0.00875, over 3059140.23 frames. 
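The "Exclude cut" WARNING above drops an AudioSet clip whose placeholder transcript has more BPE tokens (24) than encoder frames left after the 4x subsampling (23). A hedged sketch of that kind of length filter; the subsampling formula below is an assumption chosen to reproduce the logged 100 -> 23 mapping, not necessarily the exact front-end code:

    def frames_after_subsampling(num_frames: int) -> int:
        # two stride-2 stages in the convolutional front-end (assumed);
        # maps the logged 100 input frames to 23 output frames
        return ((num_frames - 7) // 2 + 1) // 2

    def keep_cut(num_frames: int, num_tokens: int) -> bool:
        # presumably because the pruned transducer loss needs at least
        # as many encoder frames as output tokens
        return frames_after_subsampling(num_frames) >= num_tokens

    keep_cut(100, 24)  # -> False: 23 frames < 24 tokens, so the cut is dropped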
], batch size: 58, lr: 1.47e-03, grad_scale: 16.0 2023-11-27 01:01:44,567 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 548150 2023-11-27 01:01:53,195 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3654320.0, ans=0.1 2023-11-27 01:01:57,523 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=3654386.6666666665, ans=0.125 2023-11-27 01:01:59,648 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-11-27 01:02:05,456 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3654386.6666666665, ans=0.1 2023-11-27 01:02:12,945 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=3654453.3333333335, ans=0.125 2023-11-27 01:02:18,132 INFO [train_asr.py:1235] (3/4) Epoch 46, batch 7100, loss[loss=0.07214, simple_loss=0.1004, pruned_loss=0.0141, audio_tagging_loss=0.007824, over 14970.00 frames. ], tot_loss[loss=0.06556, simple_loss=0.08953, pruned_loss=0.01204, audio_tagging_loss=0.008754, over 3055743.72 frames. ], batch size: 57, lr: 1.47e-03, grad_scale: 16.0 2023-11-27 01:02:26,627 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=3654520.0, ans=0.1 2023-11-27 01:02:26,714 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.min_positive, batch_count=3654520.0, ans=0.025 2023-11-27 01:02:30,244 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.735e+01 8.909e+01 9.590e+01 1.018e+02 1.394e+02, threshold=1.918e+02, percent-clipped=0.0 2023-11-27 01:02:36,266 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-27 01:02:40,417 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 548200 2023-11-27 01:02:46,431 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=3654653.3333333335, ans=0.0 2023-11-27 01:02:48,684 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.min_abs, batch_count=3654653.3333333335, ans=0.5 2023-11-27 01:02:55,375 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=4.68 vs. limit=6.0 2023-11-27 01:03:08,861 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=3654786.6666666665, ans=0.125 2023-11-27 01:03:13,936 INFO [train_asr.py:1235] (3/4) Epoch 46, batch 7150, loss[loss=0.06019, simple_loss=0.08663, pruned_loss=0.008499, audio_tagging_loss=0.008378, over 15533.00 frames. ], tot_loss[loss=0.06576, simple_loss=0.08966, pruned_loss=0.01209, audio_tagging_loss=0.008841, over 3055691.14 frames. ], batch size: 58, lr: 1.47e-03, grad_scale: 16.0 2023-11-27 01:03:28,527 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=11.14 vs. 
limit=15.0 2023-11-27 01:03:33,509 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3654920.0, ans=0.1 2023-11-27 01:03:36,464 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 548250 2023-11-27 01:03:55,971 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3655053.3333333335, ans=0.125 2023-11-27 01:04:09,625 INFO [train_asr.py:1235] (3/4) Epoch 46, batch 7200, loss[loss=0.05608, simple_loss=0.06463, pruned_loss=0.009879, audio_tagging_loss=0.01389, over 14248.00 frames. ], tot_loss[loss=0.06519, simple_loss=0.08859, pruned_loss=0.01194, audio_tagging_loss=0.008955, over 3058231.10 frames. ], batch size: 54, lr: 1.47e-03, grad_scale: 16.0 2023-11-27 01:04:16,038 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=3655186.6666666665, ans=0.125 2023-11-27 01:04:22,353 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.766e+01 9.112e+01 9.564e+01 1.040e+02 1.454e+02, threshold=1.913e+02, percent-clipped=0.0 2023-11-27 01:04:28,914 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3655253.3333333335, ans=0.1 2023-11-27 01:04:30,988 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 548300 2023-11-27 01:04:32,135 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=3655320.0, ans=0.04949747468305833 2023-11-27 01:04:39,723 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=3655320.0, ans=0.125 2023-11-27 01:04:41,729 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=3655386.6666666665, ans=0.0 2023-11-27 01:04:41,800 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=3655386.6666666665, ans=0.2 2023-11-27 01:05:00,771 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=3655453.3333333335, ans=0.125 2023-11-27 01:05:03,876 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3655520.0, ans=0.125 2023-11-27 01:05:04,728 INFO [train_asr.py:1235] (3/4) Epoch 46, batch 7250, loss[loss=0.08614, simple_loss=0.1239, pruned_loss=0.01803, audio_tagging_loss=0.006175, over 13982.00 frames. ], tot_loss[loss=0.065, simple_loss=0.08861, pruned_loss=0.01175, audio_tagging_loss=0.008941, over 3050227.20 frames. ], batch size: 53, lr: 1.47e-03, grad_scale: 16.0 2023-11-27 01:05:16,789 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=10.44 vs. 
limit=15.0 2023-11-27 01:05:19,962 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3655586.6666666665, ans=0.125 2023-11-27 01:05:27,651 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 548350 2023-11-27 01:05:52,707 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3655786.6666666665, ans=0.125 2023-11-27 01:05:53,690 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=3655786.6666666665, ans=0.0 2023-11-27 01:05:56,355 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=7.19 vs. limit=15.0 2023-11-27 01:05:59,832 INFO [train_asr.py:1235] (3/4) Epoch 46, batch 7300, loss[loss=0.06891, simple_loss=0.09465, pruned_loss=0.01154, audio_tagging_loss=0.01005, over 16704.00 frames. ], tot_loss[loss=0.06546, simple_loss=0.08937, pruned_loss=0.01196, audio_tagging_loss=0.008814, over 3046031.26 frames. ], batch size: 63, lr: 1.47e-03, grad_scale: 16.0 2023-11-27 01:06:02,316 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3655853.3333333335, ans=0.1 2023-11-27 01:06:12,288 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=3655920.0, ans=0.125 2023-11-27 01:06:13,359 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=3655920.0, ans=0.2 2023-11-27 01:06:14,684 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.602e+01 8.978e+01 9.664e+01 1.039e+02 1.460e+02, threshold=1.933e+02, percent-clipped=0.0 2023-11-27 01:06:19,130 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-11-27 01:06:23,175 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 548400 2023-11-27 01:06:25,786 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-11-27 01:06:46,035 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=3656120.0, ans=0.125 2023-11-27 01:06:47,046 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3656120.0, ans=0.125 2023-11-27 01:06:47,482 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=4.13 vs. limit=15.0 2023-11-27 01:06:47,536 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=10.87 vs. limit=15.0 2023-11-27 01:06:57,552 INFO [train_asr.py:1235] (3/4) Epoch 46, batch 7350, loss[loss=0.05153, simple_loss=0.07932, pruned_loss=0.004289, audio_tagging_loss=0.007581, over 16049.00 frames. ], tot_loss[loss=0.06509, simple_loss=0.08873, pruned_loss=0.01195, audio_tagging_loss=0.008777, over 3043031.48 frames. 
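In the Clipping_scale lines, the five numbers are the min/25%/50%/75%/max of recent gradient norms, and the logged threshold tracks Clipping_scale times the median (e.g. 2.0 * 9.664e+01 ≈ 1.933e+02 just above). A minimal sketch of that rule, assuming a sliding window of recent norms:

    import numpy as np

    def clip_threshold(recent_grad_norms, clipping_scale=2.0):
        lo, q1, med, q3, hi = np.percentile(recent_grad_norms,
                                            [0, 25, 50, 75, 100])
        # norms above the returned threshold get rescaled; percent-clipped
        # reports how often that happened in the window
        return clipping_scale * med

percent-clipped=0.0 means nothing in the window exceeded the threshold; the isolated percent-clipped=1.0 entries later in this log coincide with outlier maxima (e.g. 2.470e+02 against a 1.961e+02 threshold).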
], batch size: 61, lr: 1.47e-03, grad_scale: 16.0 2023-11-27 01:07:01,883 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=3656186.6666666665, ans=0.0 2023-11-27 01:07:04,323 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=512, metric=3.98 vs. limit=15.0 2023-11-27 01:07:05,087 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3656186.6666666665, ans=0.1 2023-11-27 01:07:06,180 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3656186.6666666665, ans=0.1 2023-11-27 01:07:08,240 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=3656253.3333333335, ans=0.125 2023-11-27 01:07:13,455 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3656253.3333333335, ans=0.125 2023-11-27 01:07:17,898 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3656320.0, ans=0.125 2023-11-27 01:07:18,765 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 548450 2023-11-27 01:07:18,975 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=3656320.0, ans=0.125 2023-11-27 01:07:49,596 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=15.26 vs. limit=22.5 2023-11-27 01:07:52,334 INFO [train_asr.py:1235] (3/4) Epoch 46, batch 7400, loss[loss=0.04561, simple_loss=0.06339, pruned_loss=0.00448, audio_tagging_loss=0.009432, over 14576.00 frames. ], tot_loss[loss=0.0646, simple_loss=0.08809, pruned_loss=0.01187, audio_tagging_loss=0.008688, over 3036891.27 frames. ], batch size: 55, lr: 1.47e-03, grad_scale: 16.0 2023-11-27 01:08:05,013 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.471e+01 8.855e+01 9.450e+01 1.015e+02 1.303e+02, threshold=1.890e+02, percent-clipped=0.0 2023-11-27 01:08:14,705 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 548500 2023-11-27 01:08:23,449 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=3656653.3333333335, ans=0.0 2023-11-27 01:08:47,515 INFO [train_asr.py:1235] (3/4) Epoch 46, batch 7450, loss[loss=0.05927, simple_loss=0.08427, pruned_loss=0.008262, audio_tagging_loss=0.008872, over 14606.00 frames. ], tot_loss[loss=0.06472, simple_loss=0.08837, pruned_loss=0.01196, audio_tagging_loss=0.008577, over 3032833.31 frames. 
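The loss fields in these lines combine as loss = 0.5 * simple_loss + pruned_loss + 1.0 * audio_tagging_loss, consistent with simple_loss_scale=0.5, CTC disabled, and audio_tagging_loss_scale=1.0 in this run's config. Checking against the tot_loss entry immediately above:

    simple_loss, pruned_loss, audio_tagging_loss = 0.08837, 0.01196, 0.008577
    loss = 0.5 * simple_loss + pruned_loss + 1.0 * audio_tagging_loss
    print(round(loss, 5))  # 0.06472, matching the logged tot_loss

The validation entry at epoch 46, batch 9000 further down obeys the same combination (0.5 * 0.05049 + 0.005306 + 0.02824 ≈ 0.05879).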
], batch size: 56, lr: 1.47e-03, grad_scale: 16.0 2023-11-27 01:09:01,728 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=3656920.0, ans=0.125 2023-11-27 01:09:05,405 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=3656920.0, ans=0.0 2023-11-27 01:09:10,526 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 548550 2023-11-27 01:09:22,866 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.whiten.whitening_limit, batch_count=3657053.3333333335, ans=15.0 2023-11-27 01:09:26,703 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=3657053.3333333335, ans=0.125 2023-11-27 01:09:27,688 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=3657053.3333333335, ans=0.2 2023-11-27 01:09:39,433 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3657120.0, ans=0.125 2023-11-27 01:09:41,545 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=3657120.0, ans=0.0 2023-11-27 01:09:42,628 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=3657186.6666666665, ans=0.125 2023-11-27 01:09:43,442 INFO [train_asr.py:1235] (3/4) Epoch 46, batch 7500, loss[loss=0.06192, simple_loss=0.07915, pruned_loss=0.01167, audio_tagging_loss=0.01068, over 15883.00 frames. ], tot_loss[loss=0.06461, simple_loss=0.08825, pruned_loss=0.01192, audio_tagging_loss=0.008559, over 3038122.25 frames. ], batch size: 63, lr: 1.47e-03, grad_scale: 16.0 2023-11-27 01:09:44,708 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=3657186.6666666665, ans=0.2 2023-11-27 01:09:54,328 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=3657253.3333333335, ans=0.0 2023-11-27 01:09:57,374 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.455e+01 8.963e+01 9.690e+01 1.036e+02 1.410e+02, threshold=1.938e+02, percent-clipped=0.0 2023-11-27 01:09:58,663 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=3657253.3333333335, ans=0.125 2023-11-27 01:09:58,733 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3657253.3333333335, ans=0.125 2023-11-27 01:10:00,664 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3657253.3333333335, ans=0.125 2023-11-27 01:10:02,941 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=3657253.3333333335, ans=0.2 2023-11-27 01:10:05,774 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 548600 2023-11-27 01:10:15,629 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=6.67 vs. 
limit=15.0 2023-11-27 01:10:16,284 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3657386.6666666665, ans=0.1 2023-11-27 01:10:23,811 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3657386.6666666665, ans=0.125 2023-11-27 01:10:29,836 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=6.08 vs. limit=15.0 2023-11-27 01:10:33,155 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3657453.3333333335, ans=0.0 2023-11-27 01:10:39,354 INFO [train_asr.py:1235] (3/4) Epoch 46, batch 7550, loss[loss=0.05697, simple_loss=0.07823, pruned_loss=0.01073, audio_tagging_loss=0.007125, over 14377.00 frames. ], tot_loss[loss=0.06441, simple_loss=0.08816, pruned_loss=0.01178, audio_tagging_loss=0.00855, over 3039095.18 frames. ], batch size: 54, lr: 1.47e-03, grad_scale: 16.0 2023-11-27 01:10:43,777 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=3657520.0, ans=0.0 2023-11-27 01:10:53,338 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=3657586.6666666665, ans=0.125 2023-11-27 01:11:01,652 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 548650 2023-11-27 01:11:03,957 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3657653.3333333335, ans=0.1 2023-11-27 01:11:06,021 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3657653.3333333335, ans=0.1 2023-11-27 01:11:06,057 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3657653.3333333335, ans=0.125 2023-11-27 01:11:10,806 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3657653.3333333335, ans=0.1 2023-11-27 01:11:25,963 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=3657786.6666666665, ans=0.07 2023-11-27 01:11:34,187 INFO [train_asr.py:1235] (3/4) Epoch 46, batch 7600, loss[loss=0.07158, simple_loss=0.08351, pruned_loss=0.01766, audio_tagging_loss=0.01217, over 13791.00 frames. ], tot_loss[loss=0.06396, simple_loss=0.08735, pruned_loss=0.01173, audio_tagging_loss=0.008551, over 3044076.14 frames. ], batch size: 53, lr: 1.46e-03, grad_scale: 32.0 2023-11-27 01:11:47,466 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=512, metric=17.97 vs. 
limit=22.5 2023-11-27 01:11:47,970 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.020e+01 8.781e+01 9.560e+01 1.034e+02 1.331e+02, threshold=1.912e+02, percent-clipped=0.0 2023-11-27 01:11:50,504 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3657920.0, ans=0.1 2023-11-27 01:11:57,277 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 548700 2023-11-27 01:12:08,512 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=3658053.3333333335, ans=0.125 2023-11-27 01:12:11,716 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3658053.3333333335, ans=0.125 2023-11-27 01:12:29,475 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=3658186.6666666665, ans=0.0 2023-11-27 01:12:30,372 INFO [train_asr.py:1235] (3/4) Epoch 46, batch 7650, loss[loss=0.0613, simple_loss=0.08854, pruned_loss=0.009952, audio_tagging_loss=0.007079, over 15859.00 frames. ], tot_loss[loss=0.06396, simple_loss=0.08746, pruned_loss=0.01175, audio_tagging_loss=0.008483, over 3039205.15 frames. ], batch size: 62, lr: 1.46e-03, grad_scale: 32.0 2023-11-27 01:12:41,864 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=3658253.3333333335, ans=0.09899494936611666 2023-11-27 01:12:52,766 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 548750 2023-11-27 01:13:08,887 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=3658386.6666666665, ans=0.125 2023-11-27 01:13:10,984 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=3658386.6666666665, ans=0.07 2023-11-27 01:13:26,549 INFO [train_asr.py:1235] (3/4) Epoch 46, batch 7700, loss[loss=0.07764, simple_loss=0.1071, pruned_loss=0.0139, audio_tagging_loss=0.01017, over 15026.00 frames. ], tot_loss[loss=0.06478, simple_loss=0.08874, pruned_loss=0.01198, audio_tagging_loss=0.008428, over 3037127.39 frames. ], batch size: 58, lr: 1.46e-03, grad_scale: 16.0 2023-11-27 01:13:27,886 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3658520.0, ans=0.1 2023-11-27 01:13:30,192 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=7.13 vs. limit=12.0 2023-11-27 01:13:40,201 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.592e+01 8.982e+01 9.750e+01 1.038e+02 1.363e+02, threshold=1.950e+02, percent-clipped=0.0 2023-11-27 01:13:45,208 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=11.99 vs. limit=15.0 2023-11-27 01:13:48,763 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 548800 2023-11-27 01:13:58,705 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=3658720.0, ans=0.0 2023-11-27 01:14:11,862 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=8.79 vs. 
limit=15.0 2023-11-27 01:14:21,534 INFO [train_asr.py:1235] (3/4) Epoch 46, batch 7750, loss[loss=0.05722, simple_loss=0.07521, pruned_loss=0.009759, audio_tagging_loss=0.009856, over 16787.00 frames. ], tot_loss[loss=0.06468, simple_loss=0.08854, pruned_loss=0.01191, audio_tagging_loss=0.008497, over 3034444.27 frames. ], batch size: 61, lr: 1.46e-03, grad_scale: 16.0 2023-11-27 01:14:22,072 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten2.whitening_limit, batch_count=3658853.3333333335, ans=15.0 2023-11-27 01:14:23,887 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=3658853.3333333335, ans=0.07 2023-11-27 01:14:30,045 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.max_abs, batch_count=3658853.3333333335, ans=10.0 2023-11-27 01:14:44,220 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 548850 2023-11-27 01:15:01,709 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.min_positive, batch_count=3659053.3333333335, ans=0.05 2023-11-27 01:15:05,944 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3659120.0, ans=0.0 2023-11-27 01:15:17,428 INFO [train_asr.py:1235] (3/4) Epoch 46, batch 7800, loss[loss=0.07388, simple_loss=0.1028, pruned_loss=0.01363, audio_tagging_loss=0.008833, over 14919.00 frames. ], tot_loss[loss=0.06564, simple_loss=0.08976, pruned_loss=0.01217, audio_tagging_loss=0.00859, over 3038886.58 frames. ], batch size: 54, lr: 1.46e-03, grad_scale: 16.0 2023-11-27 01:15:31,595 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 8.098e+01 9.034e+01 9.648e+01 1.056e+02 1.237e+02, threshold=1.930e+02, percent-clipped=0.0 2023-11-27 01:15:34,926 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3659253.3333333335, ans=0.1 2023-11-27 01:15:39,501 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 548900 2023-11-27 01:15:48,615 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=13.52 vs. limit=22.5 2023-11-27 01:15:49,251 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=3659386.6666666665, ans=0.125 2023-11-27 01:16:09,930 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=3659453.3333333335, ans=0.0 2023-11-27 01:16:11,473 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=9.50 vs. limit=15.0 2023-11-27 01:16:12,937 INFO [train_asr.py:1235] (3/4) Epoch 46, batch 7850, loss[loss=0.06808, simple_loss=0.09648, pruned_loss=0.01247, audio_tagging_loss=0.007372, over 16390.00 frames. ], tot_loss[loss=0.06525, simple_loss=0.08906, pruned_loss=0.01209, audio_tagging_loss=0.008627, over 3036171.36 frames. ], batch size: 62, lr: 1.46e-03, grad_scale: 16.0 2023-11-27 01:16:21,445 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=512, metric=15.54 vs. 
limit=22.5 2023-11-27 01:16:35,379 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 548950 2023-11-27 01:16:35,454 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3659653.3333333335, ans=0.125 2023-11-27 01:16:39,411 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=5.84 vs. limit=15.0 2023-11-27 01:16:42,493 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=3659653.3333333335, ans=0.0 2023-11-27 01:17:08,652 INFO [train_asr.py:1235] (3/4) Epoch 46, batch 7900, loss[loss=0.09541, simple_loss=0.1395, pruned_loss=0.0191, audio_tagging_loss=0.006544, over 16621.00 frames. ], tot_loss[loss=0.06527, simple_loss=0.08886, pruned_loss=0.01212, audio_tagging_loss=0.008721, over 3043978.12 frames. ], batch size: 60, lr: 1.46e-03, grad_scale: 16.0 2023-11-27 01:17:15,167 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3659853.3333333335, ans=0.1 2023-11-27 01:17:19,298 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3659920.0, ans=0.125 2023-11-27 01:17:23,301 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.697e+01 9.289e+01 9.929e+01 1.057e+02 1.408e+02, threshold=1.986e+02, percent-clipped=0.0 2023-11-27 01:17:23,497 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=3659920.0, ans=0.0 2023-11-27 01:17:31,358 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 549000 2023-11-27 01:17:41,370 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3660053.3333333335, ans=0.125 2023-11-27 01:17:59,231 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=3660120.0, ans=0.0 2023-11-27 01:18:04,845 INFO [train_asr.py:1235] (3/4) Epoch 46, batch 7950, loss[loss=0.06123, simple_loss=0.08831, pruned_loss=0.007628, audio_tagging_loss=0.009445, over 15059.00 frames. ], tot_loss[loss=0.06532, simple_loss=0.08926, pruned_loss=0.01192, audio_tagging_loss=0.00877, over 3045195.15 frames. ], batch size: 58, lr: 1.46e-03, grad_scale: 16.0 2023-11-27 01:18:16,266 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3660253.3333333335, ans=0.125 2023-11-27 01:18:18,138 WARNING [train_asr.py:1481] (3/4) Exclude cut with ID unbalanced/uQjH4tNUZ_g_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. 
Number of tokens: 24 2023-11-27 01:18:21,434 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=3660253.3333333335, ans=0.1 2023-11-27 01:18:23,743 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=3660253.3333333335, ans=0.1 2023-11-27 01:18:24,767 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3660253.3333333335, ans=0.125 2023-11-27 01:18:26,794 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 549050 2023-11-27 01:18:33,839 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=3660320.0, ans=0.0 2023-11-27 01:18:44,669 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-27 01:18:47,818 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3660386.6666666665, ans=0.125 2023-11-27 01:18:53,164 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=3660453.3333333335, ans=0.0 2023-11-27 01:18:54,272 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=3660453.3333333335, ans=0.0 2023-11-27 01:18:57,285 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=6.76 vs. limit=15.0 2023-11-27 01:19:00,889 INFO [train_asr.py:1235] (3/4) Epoch 46, batch 8000, loss[loss=0.04536, simple_loss=0.05269, pruned_loss=0.007176, audio_tagging_loss=0.01183, over 15110.00 frames. ], tot_loss[loss=0.06559, simple_loss=0.08937, pruned_loss=0.01204, audio_tagging_loss=0.00887, over 3044433.48 frames. ], batch size: 60, lr: 1.46e-03, grad_scale: 32.0 2023-11-27 01:19:14,454 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.336e+01 9.017e+01 9.575e+01 1.027e+02 1.291e+02, threshold=1.915e+02, percent-clipped=0.0 2023-11-27 01:19:22,502 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 549100 2023-11-27 01:19:48,612 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=3660786.6666666665, ans=0.1 2023-11-27 01:19:55,672 INFO [train_asr.py:1235] (3/4) Epoch 46, batch 8050, loss[loss=0.07113, simple_loss=0.09452, pruned_loss=0.0143, audio_tagging_loss=0.009567, over 15080.00 frames. ], tot_loss[loss=0.06517, simple_loss=0.0886, pruned_loss=0.01194, audio_tagging_loss=0.008927, over 3047748.58 frames. 
], batch size: 58, lr: 1.46e-03, grad_scale: 32.0 2023-11-27 01:19:58,130 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.min_positive, batch_count=3660853.3333333335, ans=0.025 2023-11-27 01:20:01,787 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.min_abs, batch_count=3660853.3333333335, ans=0.5 2023-11-27 01:20:14,435 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3660920.0, ans=0.125 2023-11-27 01:20:16,313 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=3660920.0, ans=0.0 2023-11-27 01:20:18,371 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 549150 2023-11-27 01:20:19,791 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=12.65 vs. limit=15.0 2023-11-27 01:20:27,294 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=3660986.6666666665, ans=0.2 2023-11-27 01:20:29,365 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3661053.3333333335, ans=0.125 2023-11-27 01:20:37,289 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=6.03 vs. limit=10.0 2023-11-27 01:20:50,501 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=7.09 vs. limit=15.0 2023-11-27 01:20:51,913 INFO [train_asr.py:1235] (3/4) Epoch 46, batch 8100, loss[loss=0.06093, simple_loss=0.08142, pruned_loss=0.009867, audio_tagging_loss=0.01035, over 15617.00 frames. ], tot_loss[loss=0.06455, simple_loss=0.08767, pruned_loss=0.01185, audio_tagging_loss=0.008861, over 3047755.61 frames. ], batch size: 58, lr: 1.46e-03, grad_scale: 16.0 2023-11-27 01:20:56,506 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=1.88 vs. limit=6.0 2023-11-27 01:21:07,269 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.755e+01 8.808e+01 9.534e+01 1.042e+02 1.593e+02, threshold=1.907e+02, percent-clipped=0.0 2023-11-27 01:21:09,691 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=3661253.3333333335, ans=0.125 2023-11-27 01:21:13,732 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 549200 2023-11-27 01:21:37,240 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=3661453.3333333335, ans=0.0 2023-11-27 01:21:44,402 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3661453.3333333335, ans=0.125 2023-11-27 01:21:47,908 INFO [train_asr.py:1235] (3/4) Epoch 46, batch 8150, loss[loss=0.06999, simple_loss=0.09657, pruned_loss=0.01358, audio_tagging_loss=0.008121, over 14827.00 frames. ], tot_loss[loss=0.0646, simple_loss=0.08788, pruned_loss=0.01189, audio_tagging_loss=0.008767, over 3046342.81 frames. 
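The Whitening lines report a per-module isotropy statistic ("metric") against a limit; the whitening penalty only engages when the metric exceeds the limit. A sketch of one such measure, assuming the metric is 1.0 when the module's channel covariance is proportional to the identity and grows as variance concentrates in fewer directions (num_groups=1 case; the real computation works on grouped covariances and may differ):

    import torch

    def whitening_metric(x: torch.Tensor) -> float:
        # x: (num_frames, num_channels) activations of one module
        x = x - x.mean(dim=0)
        cov = (x.T @ x) / x.shape[0]
        eigs = torch.linalg.eigvalsh(cov)
        # 1.0 when all eigenvalues are equal (white covariance),
        # larger when a few directions dominate
        return float((eigs ** 2).mean() / (eigs.mean() ** 2 + 1e-20))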
], batch size: 55, lr: 1.46e-03, grad_scale: 16.0 2023-11-27 01:21:49,352 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=11.49 vs. limit=15.0 2023-11-27 01:21:50,173 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.min_positive, batch_count=3661520.0, ans=0.05 2023-11-27 01:22:08,643 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=8.93 vs. limit=15.0 2023-11-27 01:22:09,095 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 549250 2023-11-27 01:22:10,785 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3661653.3333333335, ans=0.125 2023-11-27 01:22:13,998 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=7.33 vs. limit=15.0 2023-11-27 01:22:23,422 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=3661720.0, ans=0.125 2023-11-27 01:22:35,722 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.min_positive, batch_count=3661786.6666666665, ans=0.05 2023-11-27 01:22:41,926 WARNING [train_asr.py:1481] (3/4) Exclude cut with ID unbalanced/8C7biyx9TQ4_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-27 01:22:42,946 INFO [train_asr.py:1235] (3/4) Epoch 46, batch 8200, loss[loss=0.06738, simple_loss=0.09292, pruned_loss=0.01257, audio_tagging_loss=0.008348, over 15035.00 frames. ], tot_loss[loss=0.06469, simple_loss=0.08807, pruned_loss=0.01195, audio_tagging_loss=0.008703, over 3043804.29 frames. ], batch size: 56, lr: 1.46e-03, grad_scale: 16.0 2023-11-27 01:22:58,790 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 6.772e+01 8.840e+01 9.434e+01 1.030e+02 1.387e+02, threshold=1.887e+02, percent-clipped=0.0 2023-11-27 01:23:05,250 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 549300 2023-11-27 01:23:14,449 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3661986.6666666665, ans=0.125 2023-11-27 01:23:14,538 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3661986.6666666665, ans=0.125 2023-11-27 01:23:20,999 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=11.94 vs. limit=15.0 2023-11-27 01:23:38,491 INFO [train_asr.py:1235] (3/4) Epoch 46, batch 8250, loss[loss=0.07622, simple_loss=0.1007, pruned_loss=0.01924, audio_tagging_loss=0.006611, over 15351.00 frames. ], tot_loss[loss=0.06545, simple_loss=0.08966, pruned_loss=0.01217, audio_tagging_loss=0.008453, over 3045920.07 frames. 
], batch size: 58, lr: 1.46e-03, grad_scale: 16.0 2023-11-27 01:23:50,934 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3662253.3333333335, ans=0.125 2023-11-27 01:24:00,864 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 549350 2023-11-27 01:24:03,000 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=3662320.0, ans=0.125 2023-11-27 01:24:09,899 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=6.33 vs. limit=15.0 2023-11-27 01:24:10,447 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3662386.6666666665, ans=0.1 2023-11-27 01:24:14,147 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=5.36 vs. limit=15.0 2023-11-27 01:24:16,830 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=3662386.6666666665, ans=0.125 2023-11-27 01:24:33,352 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3662520.0, ans=0.1 2023-11-27 01:24:34,735 INFO [train_asr.py:1235] (3/4) Epoch 46, batch 8300, loss[loss=0.06406, simple_loss=0.08726, pruned_loss=0.0121, audio_tagging_loss=0.008337, over 15137.00 frames. ], tot_loss[loss=0.06559, simple_loss=0.08991, pruned_loss=0.01216, audio_tagging_loss=0.008469, over 3048907.22 frames. ], batch size: 56, lr: 1.46e-03, grad_scale: 16.0 2023-11-27 01:24:34,885 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=3662520.0, ans=0.0 2023-11-27 01:24:49,550 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.731e+01 9.008e+01 9.718e+01 1.064e+02 1.333e+02, threshold=1.944e+02, percent-clipped=0.0 2023-11-27 01:24:56,117 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 549400 2023-11-27 01:25:06,119 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3662653.3333333335, ans=0.125 2023-11-27 01:25:07,417 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=5.93 vs. limit=10.0 2023-11-27 01:25:10,222 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=3662720.0, ans=0.125 2023-11-27 01:25:14,422 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=3662720.0, ans=0.125 2023-11-27 01:25:29,695 INFO [train_asr.py:1235] (3/4) Epoch 46, batch 8350, loss[loss=0.0681, simple_loss=0.09814, pruned_loss=0.01108, audio_tagging_loss=0.007947, over 15850.00 frames. ], tot_loss[loss=0.06548, simple_loss=0.08972, pruned_loss=0.01217, audio_tagging_loss=0.008447, over 3043277.53 frames. 
], batch size: 58, lr: 1.46e-03, grad_scale: 8.0 2023-11-27 01:25:52,451 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 549450 2023-11-27 01:26:03,575 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3663053.3333333335, ans=0.125 2023-11-27 01:26:19,938 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=9.18 vs. limit=10.0 2023-11-27 01:26:24,607 INFO [train_asr.py:1235] (3/4) Epoch 46, batch 8400, loss[loss=0.05379, simple_loss=0.07371, pruned_loss=0.008286, audio_tagging_loss=0.008651, over 15943.00 frames. ], tot_loss[loss=0.06532, simple_loss=0.08953, pruned_loss=0.01213, audio_tagging_loss=0.008435, over 3047043.41 frames. ], batch size: 59, lr: 1.46e-03, grad_scale: 16.0 2023-11-27 01:26:42,700 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.425e+01 8.598e+01 9.317e+01 1.002e+02 1.221e+02, threshold=1.863e+02, percent-clipped=0.0 2023-11-27 01:26:48,134 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 549500 2023-11-27 01:26:55,622 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3663320.0, ans=0.125 2023-11-27 01:27:02,931 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.max_abs, batch_count=3663386.6666666665, ans=10.0 2023-11-27 01:27:06,691 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=7.95 vs. limit=15.0 2023-11-27 01:27:17,629 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.19 vs. limit=6.0 2023-11-27 01:27:21,401 INFO [train_asr.py:1235] (3/4) Epoch 46, batch 8450, loss[loss=0.05146, simple_loss=0.07265, pruned_loss=0.00721, audio_tagging_loss=0.007926, over 16140.00 frames. ], tot_loss[loss=0.06502, simple_loss=0.08899, pruned_loss=0.01208, audio_tagging_loss=0.00844, over 3050433.04 frames. ], batch size: 63, lr: 1.46e-03, grad_scale: 16.0 2023-11-27 01:27:36,129 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=10.59 vs. limit=15.0 2023-11-27 01:27:43,109 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 549550 2023-11-27 01:28:15,341 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=1.82 vs. limit=6.0 2023-11-27 01:28:16,775 INFO [train_asr.py:1235] (3/4) Epoch 46, batch 8500, loss[loss=0.05235, simple_loss=0.07276, pruned_loss=0.007866, audio_tagging_loss=0.008102, over 16332.00 frames. ], tot_loss[loss=0.06518, simple_loss=0.08945, pruned_loss=0.01199, audio_tagging_loss=0.008461, over 3053679.91 frames. ], batch size: 60, lr: 1.46e-03, grad_scale: 16.0 2023-11-27 01:28:21,134 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3663853.3333333335, ans=0.125 2023-11-27 01:28:22,491 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=7.48 vs. 
limit=10.0 2023-11-27 01:28:25,427 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3663853.3333333335, ans=0.1 2023-11-27 01:28:32,911 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.581e+01 8.917e+01 9.803e+01 1.059e+02 2.470e+02, threshold=1.961e+02, percent-clipped=1.0 2023-11-27 01:28:38,839 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 549600 2023-11-27 01:28:53,282 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=19.99 vs. limit=22.5 2023-11-27 01:29:11,611 INFO [train_asr.py:1235] (3/4) Epoch 46, batch 8550, loss[loss=0.05104, simple_loss=0.06457, pruned_loss=0.009228, audio_tagging_loss=0.009523, over 16376.00 frames. ], tot_loss[loss=0.06502, simple_loss=0.08933, pruned_loss=0.01183, audio_tagging_loss=0.008522, over 3058870.00 frames. ], batch size: 63, lr: 1.46e-03, grad_scale: 16.0 2023-11-27 01:29:26,715 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=3664253.3333333335, ans=0.0 2023-11-27 01:29:30,050 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=3664253.3333333335, ans=0.0 2023-11-27 01:29:35,114 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 549650 2023-11-27 01:30:03,860 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=3664453.3333333335, ans=0.2 2023-11-27 01:30:07,864 INFO [train_asr.py:1235] (3/4) Epoch 46, batch 8600, loss[loss=0.06913, simple_loss=0.09889, pruned_loss=0.0118, audio_tagging_loss=0.007878, over 14836.00 frames. ], tot_loss[loss=0.06555, simple_loss=0.08986, pruned_loss=0.01198, audio_tagging_loss=0.008646, over 3065580.38 frames. 
], batch size: 57, lr: 1.46e-03, grad_scale: 16.0 2023-11-27 01:30:15,935 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=3664520.0, ans=0.2 2023-11-27 01:30:22,277 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3664586.6666666665, ans=0.125 2023-11-27 01:30:24,204 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.616e+01 8.820e+01 9.467e+01 9.988e+01 1.186e+02, threshold=1.893e+02, percent-clipped=0.0 2023-11-27 01:30:29,570 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 549700 2023-11-27 01:30:29,805 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3664653.3333333335, ans=0.125 2023-11-27 01:30:35,041 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=3664653.3333333335, ans=0.0 2023-11-27 01:30:47,756 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3664720.0, ans=0.1 2023-11-27 01:31:02,664 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3664853.3333333335, ans=0.125 2023-11-27 01:31:02,766 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=3664853.3333333335, ans=0.125 2023-11-27 01:31:03,578 INFO [train_asr.py:1235] (3/4) Epoch 46, batch 8650, loss[loss=0.0762, simple_loss=0.1042, pruned_loss=0.01485, audio_tagging_loss=0.009247, over 15973.00 frames. ], tot_loss[loss=0.06529, simple_loss=0.08948, pruned_loss=0.01189, audio_tagging_loss=0.008661, over 3062460.28 frames. ], batch size: 56, lr: 1.46e-03, grad_scale: 16.0 2023-11-27 01:31:22,829 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=10.59 vs. 
limit=15.0 2023-11-27 01:31:26,030 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 549750 2023-11-27 01:31:26,234 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3664986.6666666665, ans=0.0 2023-11-27 01:31:26,267 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=3664986.6666666665, ans=0.04949747468305833 2023-11-27 01:31:32,518 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=3664986.6666666665, ans=0.07 2023-11-27 01:31:39,758 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=3665053.3333333335, ans=0.015 2023-11-27 01:31:39,933 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=3665053.3333333335, ans=0.09899494936611666 2023-11-27 01:31:51,523 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=3665120.0, ans=0.125 2023-11-27 01:31:53,661 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.max_positive, batch_count=3665120.0, ans=0.95 2023-11-27 01:31:58,579 INFO [train_asr.py:1235] (3/4) Epoch 46, batch 8700, loss[loss=0.05998, simple_loss=0.07901, pruned_loss=0.01047, audio_tagging_loss=0.01001, over 14064.00 frames. ], tot_loss[loss=0.06619, simple_loss=0.09085, pruned_loss=0.01207, audio_tagging_loss=0.008687, over 3058540.80 frames. ], batch size: 56, lr: 1.46e-03, grad_scale: 16.0 2023-11-27 01:32:07,248 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.min_positive, batch_count=3665186.6666666665, ans=0.05 2023-11-27 01:32:15,443 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.944e+01 9.069e+01 9.762e+01 1.053e+02 1.470e+02, threshold=1.952e+02, percent-clipped=0.0 2023-11-27 01:32:16,838 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=3665253.3333333335, ans=0.07 2023-11-27 01:32:21,544 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 549800 2023-11-27 01:32:36,434 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3665386.6666666665, ans=0.125 2023-11-27 01:32:38,446 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3665386.6666666665, ans=0.125 2023-11-27 01:32:53,332 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3665453.3333333335, ans=0.1 2023-11-27 01:32:55,287 INFO [train_asr.py:1235] (3/4) Epoch 46, batch 8750, loss[loss=0.06705, simple_loss=0.08282, pruned_loss=0.01564, audio_tagging_loss=0.01001, over 14920.00 frames. ], tot_loss[loss=0.06638, simple_loss=0.09084, pruned_loss=0.01219, audio_tagging_loss=0.00877, over 3051511.22 frames. 
], batch size: 56, lr: 1.46e-03, grad_scale: 16.0 2023-11-27 01:32:56,435 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=3665520.0, ans=0.125 2023-11-27 01:33:17,395 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 549850 2023-11-27 01:33:28,182 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=9.91 vs. limit=15.0 2023-11-27 01:33:39,135 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.1.self_attn1.whiten, num_groups=1, num_channels=192, metric=14.60 vs. limit=22.5 2023-11-27 01:33:50,742 INFO [train_asr.py:1235] (3/4) Epoch 46, batch 8800, loss[loss=0.09247, simple_loss=0.135, pruned_loss=0.0198, audio_tagging_loss=0.005161, over 16152.00 frames. ], tot_loss[loss=0.06689, simple_loss=0.0914, pruned_loss=0.01233, audio_tagging_loss=0.008858, over 3055664.25 frames. ], batch size: 54, lr: 1.46e-03, grad_scale: 16.0 2023-11-27 01:33:53,521 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=3665853.3333333335, ans=0.125 2023-11-27 01:34:08,226 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.927e+01 8.987e+01 9.532e+01 1.025e+02 1.979e+02, threshold=1.906e+02, percent-clipped=1.0 2023-11-27 01:34:08,646 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=3.74 vs. limit=10.0 2023-11-27 01:34:13,063 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 549900 2023-11-27 01:34:17,890 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=3665986.6666666665, ans=0.0 2023-11-27 01:34:31,323 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3666053.3333333335, ans=0.1 2023-11-27 01:34:46,290 INFO [train_asr.py:1235] (3/4) Epoch 46, batch 8850, loss[loss=0.05404, simple_loss=0.06649, pruned_loss=0.01183, audio_tagging_loss=0.008961, over 15272.00 frames. ], tot_loss[loss=0.06643, simple_loss=0.09054, pruned_loss=0.01227, audio_tagging_loss=0.008897, over 3048942.05 frames. ], batch size: 59, lr: 1.46e-03, grad_scale: 16.0 2023-11-27 01:34:47,517 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3666186.6666666665, ans=0.1 2023-11-27 01:34:47,612 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=3666186.6666666665, ans=0.0 2023-11-27 01:34:55,322 WARNING [train_asr.py:1481] (3/4) Exclude cut with ID unbalanced/1Dq7QH61iXQ_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. 
Number of tokens: 24 2023-11-27 01:35:09,299 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 549950 2023-11-27 01:35:23,840 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3666386.6666666665, ans=0.125 2023-11-27 01:35:41,023 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=6.79 vs. limit=12.0 2023-11-27 01:35:42,759 INFO [train_asr.py:1235] (3/4) Epoch 46, batch 8900, loss[loss=0.06408, simple_loss=0.0868, pruned_loss=0.01112, audio_tagging_loss=0.009557, over 14807.00 frames. ], tot_loss[loss=0.06672, simple_loss=0.09128, pruned_loss=0.01231, audio_tagging_loss=0.008768, over 3045303.63 frames. ], batch size: 56, lr: 1.46e-03, grad_scale: 16.0 2023-11-27 01:36:00,346 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.867e+01 8.952e+01 9.534e+01 1.026e+02 1.525e+02, threshold=1.907e+02, percent-clipped=0.0 2023-11-27 01:36:05,262 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 550000 2023-11-27 01:36:16,262 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3666720.0, ans=0.125 2023-11-27 01:36:17,188 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3666720.0, ans=0.1 2023-11-27 01:36:18,367 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=3666720.0, ans=0.125 2023-11-27 01:36:18,434 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=3666720.0, ans=0.125 2023-11-27 01:36:21,559 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=3666720.0, ans=0.0 2023-11-27 01:36:38,837 INFO [train_asr.py:1235] (3/4) Epoch 46, batch 8950, loss[loss=0.08599, simple_loss=0.1172, pruned_loss=0.02015, audio_tagging_loss=0.007255, over 15779.00 frames. ], tot_loss[loss=0.0661, simple_loss=0.09049, pruned_loss=0.01221, audio_tagging_loss=0.008638, over 3048024.48 frames. ], batch size: 56, lr: 1.46e-03, grad_scale: 8.0 2023-11-27 01:36:43,242 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=3666853.3333333335, ans=0.0 2023-11-27 01:36:50,244 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=3666920.0, ans=0.0 2023-11-27 01:37:00,462 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 550050 2023-11-27 01:37:07,290 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module2.whiten, num_groups=1, num_channels=512, metric=7.96 vs. limit=15.0 2023-11-27 01:37:27,991 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=11.29 vs. limit=15.0 2023-11-27 01:37:34,330 INFO [train_asr.py:1235] (3/4) Epoch 46, batch 9000, loss[loss=0.0639, simple_loss=0.08524, pruned_loss=0.01147, audio_tagging_loss=0.009814, over 13846.00 frames. ], tot_loss[loss=0.06609, simple_loss=0.09066, pruned_loss=0.01226, audio_tagging_loss=0.008497, over 3055147.37 frames. 
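The WARNING at train_asr.py:1481 above drops an AudioSet cut whose placeholder transcript is longer than the acoustic sequence the encoder will produce: 100 input frames survive subsampling as only 23 frames, which cannot align against 24 BPE tokens under the transducer loss. A sketch of the length check implied by the logged numbers; the exact frontend arithmetic is an assumption based on the standard zipformer recipe, but it reproduces the logged 100 -> 23:

    def frames_after_subsampling(num_frames: int) -> int:
        # Conv frontend with overall subsampling factor 4 (assumed):
        # two stride-2 stages, each trimming a few boundary frames.
        return ((num_frames - 7) // 2 + 1) // 2

    def keep_cut(num_frames: int, num_tokens: int) -> bool:
        # A transducer cannot emit more tokens than it has encoder
        # frames, so such cuts are excluded from training.
        return frames_after_subsampling(num_frames) >= num_tokens

    assert frames_after_subsampling(100) == 23   # as in the warning
    assert keep_cut(100, 24) is False            # 23 frames < 24 tokens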
], batch size: 53, lr: 1.46e-03, grad_scale: 8.0 2023-11-27 01:37:34,331 INFO [train_asr.py:1258] (3/4) Computing validation loss 2023-11-27 01:38:07,100 INFO [train_asr.py:1267] (3/4) Epoch 46, validation: loss=0.05879, simple_loss=0.05049, pruned_loss=0.005306, audio_tagging_loss=0.02824, over 4681554.00 frames. 2023-11-27 01:38:07,100 INFO [train_asr.py:1268] (3/4) Maximum memory allocated so far is 24894MB 2023-11-27 01:38:13,086 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=3667186.6666666665, ans=0.125 2023-11-27 01:38:15,155 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=3667186.6666666665, ans=0.125 2023-11-27 01:38:25,417 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.363e+01 8.928e+01 9.533e+01 1.025e+02 1.320e+02, threshold=1.907e+02, percent-clipped=0.0 2023-11-27 01:38:26,809 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3667253.3333333335, ans=0.125 2023-11-27 01:38:29,412 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 550100 2023-11-27 01:38:33,147 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=6.88 vs. limit=12.0 2023-11-27 01:38:34,848 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=3667320.0, ans=0.125 2023-11-27 01:38:38,055 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3667320.0, ans=0.0 2023-11-27 01:38:42,256 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=3667386.6666666665, ans=0.0 2023-11-27 01:39:02,707 INFO [train_asr.py:1235] (3/4) Epoch 46, batch 9050, loss[loss=0.07071, simple_loss=0.106, pruned_loss=0.01075, audio_tagging_loss=0.006951, over 15271.00 frames. ], tot_loss[loss=0.06614, simple_loss=0.09093, pruned_loss=0.01233, audio_tagging_loss=0.008342, over 3056324.95 frames. ], batch size: 55, lr: 1.46e-03, grad_scale: 4.0 2023-11-27 01:39:03,942 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=3667520.0, ans=0.0 2023-11-27 01:39:12,993 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3667586.6666666665, ans=0.0 2023-11-27 01:39:24,572 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 550150 2023-11-27 01:39:33,162 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.min_positive, batch_count=3667653.3333333335, ans=0.05 2023-11-27 01:39:58,464 INFO [train_asr.py:1235] (3/4) Epoch 46, batch 9100, loss[loss=0.08149, simple_loss=0.1236, pruned_loss=0.01385, audio_tagging_loss=0.005826, over 16417.00 frames. ], tot_loss[loss=0.06575, simple_loss=0.09071, pruned_loss=0.01214, audio_tagging_loss=0.00826, over 3054426.13 frames. 
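The per-validation memory line ("Maximum memory allocated so far is 24894MB") most likely reads the CUDA allocator's high-water mark. A sketch of how such a figure is obtained; the exact divisor used for "MB" is an assumption:

    import torch

    def log_max_memory(device: str = "cuda:3") -> str:
        mb = torch.cuda.max_memory_allocated(device) // (1000 * 1000)
        return f"Maximum memory allocated so far is {mb}MB"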
], batch size: 56, lr: 1.46e-03, grad_scale: 8.0 2023-11-27 01:40:13,787 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=3667920.0, ans=0.2 2023-11-27 01:40:19,179 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.684e+01 9.136e+01 9.567e+01 1.016e+02 1.322e+02, threshold=1.913e+02, percent-clipped=0.0 2023-11-27 01:40:21,398 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 550200 2023-11-27 01:40:51,222 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=3668120.0, ans=0.125 2023-11-27 01:40:54,723 INFO [train_asr.py:1235] (3/4) Epoch 46, batch 9150, loss[loss=0.0644, simple_loss=0.09289, pruned_loss=0.008956, audio_tagging_loss=0.009003, over 16552.00 frames. ], tot_loss[loss=0.06592, simple_loss=0.09097, pruned_loss=0.01215, audio_tagging_loss=0.008287, over 3053481.80 frames. ], batch size: 60, lr: 1.46e-03, grad_scale: 8.0 2023-11-27 01:41:07,025 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=3668253.3333333335, ans=0.015 2023-11-27 01:41:13,431 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3668253.3333333335, ans=0.1 2023-11-27 01:41:16,513 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 550250 2023-11-27 01:41:37,707 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=7.51 vs. limit=10.0 2023-11-27 01:41:40,582 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=3668453.3333333335, ans=0.0 2023-11-27 01:41:46,383 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=3668453.3333333335, ans=0.125 2023-11-27 01:41:50,293 INFO [train_asr.py:1235] (3/4) Epoch 46, batch 9200, loss[loss=0.04713, simple_loss=0.06297, pruned_loss=0.00558, audio_tagging_loss=0.01007, over 15050.00 frames. ], tot_loss[loss=0.06564, simple_loss=0.09048, pruned_loss=0.01215, audio_tagging_loss=0.008258, over 3050974.18 frames. ], batch size: 57, lr: 1.46e-03, grad_scale: 16.0 2023-11-27 01:41:57,809 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3668520.0, ans=0.125 2023-11-27 01:42:09,843 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.485e+01 8.971e+01 9.683e+01 1.056e+02 2.334e+02, threshold=1.937e+02, percent-clipped=1.0 2023-11-27 01:42:12,024 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 550300 2023-11-27 01:42:21,757 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3668653.3333333335, ans=0.0 2023-11-27 01:42:27,547 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3668720.0, ans=0.125 2023-11-27 01:42:41,364 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=3668786.6666666665, ans=0.125 2023-11-27 01:42:45,341 INFO [train_asr.py:1235] (3/4) Epoch 46, batch 9250, loss[loss=0.06744, simple_loss=0.09395, pruned_loss=0.01334, audio_tagging_loss=0.00712, over 15421.00 frames. 
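Each train_asr.py:1235 line reports a per-batch loss[...] and a running tot_loss[...]; the fractional frame counts on tot_loss (e.g. 3054426.13) indicate a decayed running aggregate rather than a plain sum over an epoch. Throughout this section the totals are consistent with a fixed weighting of the three components, loss = 0.5 * simple_loss + pruned_loss + audio_tagging_loss. This weighting is inferred from the logged numbers, not from reading the training code; a spot check against the batch 9150 entry above:

    simple, pruned, tagging = 0.09289, 0.008956, 0.009003  # batch 9150
    loss = 0.5 * simple + pruned + tagging
    assert f"{loss:.4g}" == "0.0644"   # matches the logged loss=0.0644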
], tot_loss[loss=0.06539, simple_loss=0.08966, pruned_loss=0.01218, audio_tagging_loss=0.00838, over 3048970.95 frames. ], batch size: 58, lr: 1.46e-03, grad_scale: 16.0 2023-11-27 01:42:48,289 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.0.self_attn1.whiten, num_groups=1, num_channels=192, metric=13.33 vs. limit=22.5 2023-11-27 01:42:53,581 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=3668853.3333333335, ans=0.125 2023-11-27 01:43:08,926 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 550350 2023-11-27 01:43:09,159 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-11-27 01:43:41,742 INFO [train_asr.py:1235] (3/4) Epoch 46, batch 9300, loss[loss=0.0608, simple_loss=0.08567, pruned_loss=0.008872, audio_tagging_loss=0.009098, over 15354.00 frames. ], tot_loss[loss=0.06539, simple_loss=0.0897, pruned_loss=0.01214, audio_tagging_loss=0.008397, over 3056875.33 frames. ], batch size: 58, lr: 1.46e-03, grad_scale: 16.0 2023-11-27 01:43:45,719 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=3669186.6666666665, ans=0.125 2023-11-27 01:43:48,787 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3669186.6666666665, ans=0.125 2023-11-27 01:44:01,833 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.974e+01 8.933e+01 9.435e+01 1.011e+02 1.310e+02, threshold=1.887e+02, percent-clipped=0.0 2023-11-27 01:44:04,075 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 550400 2023-11-27 01:44:12,224 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=20.16 vs. limit=22.5 2023-11-27 01:44:17,342 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=5.19 vs. limit=10.0 2023-11-27 01:44:18,408 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.74 vs. limit=6.0 2023-11-27 01:44:27,147 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=3669453.3333333335, ans=0.0 2023-11-27 01:44:38,041 INFO [train_asr.py:1235] (3/4) Epoch 46, batch 9350, loss[loss=0.04518, simple_loss=0.0535, pruned_loss=0.009206, audio_tagging_loss=0.009219, over 15135.00 frames. ], tot_loss[loss=0.06526, simple_loss=0.08979, pruned_loss=0.01193, audio_tagging_loss=0.00843, over 3057430.04 frames. ], batch size: 57, lr: 1.46e-03, grad_scale: 16.0 2023-11-27 01:44:38,230 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3669520.0, ans=0.1 2023-11-27 01:44:59,438 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 550450 2023-11-27 01:45:14,482 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=8.55 vs. 
limit=15.0 2023-11-27 01:45:16,612 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=3669720.0, ans=0.0 2023-11-27 01:45:17,694 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=3669720.0, ans=0.0 2023-11-27 01:45:28,116 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=3669786.6666666665, ans=0.125 2023-11-27 01:45:33,197 INFO [train_asr.py:1235] (3/4) Epoch 46, batch 9400, loss[loss=0.06757, simple_loss=0.09273, pruned_loss=0.01446, audio_tagging_loss=0.006742, over 15628.00 frames. ], tot_loss[loss=0.06494, simple_loss=0.08896, pruned_loss=0.01186, audio_tagging_loss=0.008609, over 3046871.41 frames. ], batch size: 59, lr: 1.46e-03, grad_scale: 16.0 2023-11-27 01:45:37,763 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=3669853.3333333335, ans=0.125 2023-11-27 01:45:38,170 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=6.71 vs. limit=10.0 2023-11-27 01:45:39,762 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=3669853.3333333335, ans=0.1 2023-11-27 01:45:54,113 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.770e+01 8.852e+01 9.637e+01 1.052e+02 1.350e+02, threshold=1.927e+02, percent-clipped=0.0 2023-11-27 01:45:55,383 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3669986.6666666665, ans=0.125 2023-11-27 01:45:56,321 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 550500 2023-11-27 01:46:21,251 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=3670120.0, ans=0.1 2023-11-27 01:46:23,390 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=3670120.0, ans=0.2 2023-11-27 01:46:25,312 WARNING [train_asr.py:1481] (3/4) Exclude cut with ID unbalanced/jmSuJWEIizA_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-27 01:46:25,574 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=3670120.0, ans=0.0 2023-11-27 01:46:29,034 INFO [train_asr.py:1235] (3/4) Epoch 46, batch 9450, loss[loss=0.05354, simple_loss=0.06421, pruned_loss=0.01103, audio_tagging_loss=0.0104, over 15335.00 frames. ], tot_loss[loss=0.06481, simple_loss=0.08858, pruned_loss=0.01179, audio_tagging_loss=0.008733, over 3051336.52 frames. 
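The optim.py:476 lines summarise the gradient-norm statistics used for clipping: with Clipping_scale=2.0 the logged threshold tracks twice the logged median (just above, 2 x 9.637e+01 = 1.927e+02), and percent-clipped counts how often a batch exceeded it. A simplified analogue of that median-based clipping, assuming a buffer of recent norms stands in for whatever running statistics the real optimiser keeps:

    import torch

    def clip_by_median(params, recent_norms, clipping_scale=2.0):
        grads = [p.grad for p in params if p.grad is not None]
        total_norm = torch.norm(torch.stack([g.norm() for g in grads]))
        recent_norms.append(total_norm.item())
        q = torch.quantile(
            torch.tensor(recent_norms[-128:]),
            torch.tensor([0.0, 0.25, 0.5, 0.75, 1.0]),
        )
        threshold = clipping_scale * q[2].item()     # 2.0 x median
        clipped = total_norm.item() > threshold      # feeds percent-clipped
        if clipped:
            for g in grads:
                g.mul_(threshold / total_norm)
        return q, threshold, clipped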
], batch size: 61, lr: 1.46e-03, grad_scale: 16.0 2023-11-27 01:46:38,799 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3670186.6666666665, ans=0.125 2023-11-27 01:46:45,770 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=3670253.3333333335, ans=0.0 2023-11-27 01:46:51,968 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 550550 2023-11-27 01:46:55,347 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=3670320.0, ans=0.0 2023-11-27 01:47:02,635 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=3670386.6666666665, ans=0.125 2023-11-27 01:47:25,737 INFO [train_asr.py:1235] (3/4) Epoch 46, batch 9500, loss[loss=0.07671, simple_loss=0.1149, pruned_loss=0.01253, audio_tagging_loss=0.006735, over 15331.00 frames. ], tot_loss[loss=0.06504, simple_loss=0.08878, pruned_loss=0.01186, audio_tagging_loss=0.008788, over 3045079.95 frames. ], batch size: 55, lr: 1.46e-03, grad_scale: 16.0 2023-11-27 01:47:33,475 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=3670520.0, ans=0.2 2023-11-27 01:47:42,914 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=3670586.6666666665, ans=0.125 2023-11-27 01:47:44,653 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.655e+01 9.026e+01 9.482e+01 1.013e+02 1.263e+02, threshold=1.896e+02, percent-clipped=0.0 2023-11-27 01:47:46,838 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 550600 2023-11-27 01:48:03,373 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-11-27 01:48:17,118 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=10.44 vs. limit=15.0 2023-11-27 01:48:20,137 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=7.50 vs. limit=10.0 2023-11-27 01:48:20,640 INFO [train_asr.py:1235] (3/4) Epoch 46, batch 9550, loss[loss=0.06519, simple_loss=0.09039, pruned_loss=0.009868, audio_tagging_loss=0.01012, over 15172.00 frames. ], tot_loss[loss=0.06487, simple_loss=0.08825, pruned_loss=0.01179, audio_tagging_loss=0.008951, over 3047551.41 frames. 
], batch size: 59, lr: 1.46e-03, grad_scale: 16.0 2023-11-27 01:48:31,451 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=3670920.0, ans=0.2 2023-11-27 01:48:39,418 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=3670920.0, ans=0.0 2023-11-27 01:48:43,566 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 550650 2023-11-27 01:48:45,777 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=3670986.6666666665, ans=0.0 2023-11-27 01:49:08,669 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3671120.0, ans=0.1 2023-11-27 01:49:15,919 INFO [train_asr.py:1235] (3/4) Epoch 46, batch 9600, loss[loss=0.07905, simple_loss=0.1133, pruned_loss=0.01604, audio_tagging_loss=0.00635, over 16161.00 frames. ], tot_loss[loss=0.0649, simple_loss=0.0882, pruned_loss=0.01186, audio_tagging_loss=0.008944, over 3046270.39 frames. ], batch size: 59, lr: 1.46e-03, grad_scale: 32.0 2023-11-27 01:49:21,873 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3671186.6666666665, ans=0.125 2023-11-27 01:49:37,136 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.585e+01 8.787e+01 9.468e+01 1.030e+02 1.227e+02, threshold=1.894e+02, percent-clipped=0.0 2023-11-27 01:49:39,340 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 550700 2023-11-27 01:49:42,828 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=9.23 vs. limit=15.0 2023-11-27 01:49:54,413 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3671386.6666666665, ans=0.125 2023-11-27 01:49:54,593 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=3671386.6666666665, ans=0.0 2023-11-27 01:50:12,745 INFO [train_asr.py:1235] (3/4) Epoch 46, batch 9650, loss[loss=0.0572, simple_loss=0.07326, pruned_loss=0.01067, audio_tagging_loss=0.009901, over 15233.00 frames. ], tot_loss[loss=0.06507, simple_loss=0.08812, pruned_loss=0.01209, audio_tagging_loss=0.008921, over 3039356.82 frames. ], batch size: 58, lr: 1.46e-03, grad_scale: 32.0 2023-11-27 01:50:16,091 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=9.48 vs. limit=15.0 2023-11-27 01:50:32,581 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=3671586.6666666665, ans=0.2 2023-11-27 01:50:33,774 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.04 vs. limit=6.0 2023-11-27 01:50:34,488 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 550750 2023-11-27 01:50:46,818 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=3671720.0, ans=0.0 2023-11-27 01:51:08,166 INFO [train_asr.py:1235] (3/4) Epoch 46, batch 9700, loss[loss=0.08441, simple_loss=0.1082, pruned_loss=0.02138, audio_tagging_loss=0.008922, over 15921.00 frames. 
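The scaling.py:1022 lines report a whitening metric for a chosen activation against a limit (metric=X vs. limit=Y); the module presumably penalises activations whose covariance drifts too far from a multiple of the identity. A sketch of one metric with exactly these properties: it equals 1.0 for perfectly white features and grows with covariance anisotropy. Whether this is the precise formula in scaling.py is an assumption:

    import torch

    def whitening_metric(x: torch.Tensor, num_groups: int = 1) -> float:
        # x: (num_frames, num_channels), channels split into groups
        n, c = x.shape
        x = x.reshape(n, num_groups, c // num_groups).transpose(0, 1)
        covar = x.transpose(1, 2) @ x / n        # per-group covariance
        eigs = torch.linalg.eigvalsh(covar)
        # mean(eig^2) / mean(eig)^2 == 1.0 iff all eigenvalues are equal
        return ((eigs ** 2).mean() / eigs.mean() ** 2).item()

    x = torch.randn(10000, 256)                  # roughly white input
    assert whitening_metric(x) < 1.5             # far below limit=15.0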
], tot_loss[loss=0.06525, simple_loss=0.08909, pruned_loss=0.01196, audio_tagging_loss=0.008748, over 3043150.71 frames. ], batch size: 58, lr: 1.46e-03, grad_scale: 16.0 2023-11-27 01:51:14,711 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=3671853.3333333335, ans=0.2 2023-11-27 01:51:17,084 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=14.20 vs. limit=22.5 2023-11-27 01:51:17,922 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=3671920.0, ans=0.0 2023-11-27 01:51:22,486 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.78 vs. limit=15.0 2023-11-27 01:51:23,410 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-27 01:51:28,970 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.432e+01 8.991e+01 9.696e+01 1.056e+02 1.366e+02, threshold=1.939e+02, percent-clipped=0.0 2023-11-27 01:51:30,140 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 550800 2023-11-27 01:51:35,366 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=3671986.6666666665, ans=0.125 2023-11-27 01:51:57,472 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=3672120.0, ans=0.125 2023-11-27 01:52:03,770 INFO [train_asr.py:1235] (3/4) Epoch 46, batch 9750, loss[loss=0.07579, simple_loss=0.1058, pruned_loss=0.01682, audio_tagging_loss=0.006069, over 15177.00 frames. ], tot_loss[loss=0.06531, simple_loss=0.08934, pruned_loss=0.01199, audio_tagging_loss=0.008652, over 3042423.81 frames. ], batch size: 53, lr: 1.46e-03, grad_scale: 16.0 2023-11-27 01:52:14,631 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=3672253.3333333335, ans=0.0 2023-11-27 01:52:27,146 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 550850 2023-11-27 01:52:44,174 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.min_abs, batch_count=3672386.6666666665, ans=0.5 2023-11-27 01:52:46,274 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=3672386.6666666665, ans=0.125 2023-11-27 01:52:59,843 INFO [train_asr.py:1235] (3/4) Epoch 46, batch 9800, loss[loss=0.05138, simple_loss=0.06953, pruned_loss=0.008591, audio_tagging_loss=0.008028, over 14255.00 frames. ], tot_loss[loss=0.06473, simple_loss=0.08882, pruned_loss=0.01176, audio_tagging_loss=0.008564, over 3039327.42 frames. 
], batch size: 53, lr: 1.46e-03, grad_scale: 16.0 2023-11-27 01:53:20,920 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.534e+01 8.968e+01 9.826e+01 1.047e+02 1.265e+02, threshold=1.965e+02, percent-clipped=0.0 2023-11-27 01:53:22,068 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 550900 2023-11-27 01:53:31,792 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3672720.0, ans=0.125 2023-11-27 01:53:47,915 WARNING [train_asr.py:1481] (3/4) Exclude cut with ID unbalanced/Bo4LcZjitzU_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-27 01:53:55,683 INFO [train_asr.py:1235] (3/4) Epoch 46, batch 9850, loss[loss=0.07982, simple_loss=0.117, pruned_loss=0.01429, audio_tagging_loss=0.007041, over 15200.00 frames. ], tot_loss[loss=0.066, simple_loss=0.09079, pruned_loss=0.01215, audio_tagging_loss=0.008451, over 3043360.77 frames. ], batch size: 58, lr: 1.46e-03, grad_scale: 8.0 2023-11-27 01:53:55,958 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3672853.3333333335, ans=0.125 2023-11-27 01:53:58,002 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3672853.3333333335, ans=0.125 2023-11-27 01:54:06,584 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3672920.0, ans=0.1 2023-11-27 01:54:13,012 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3672920.0, ans=0.1 2023-11-27 01:54:17,508 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 550950 2023-11-27 01:54:31,457 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=3673053.3333333335, ans=0.0 2023-11-27 01:54:32,889 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=3673053.3333333335, ans=0.2 2023-11-27 01:54:34,161 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=13.37 vs. limit=15.0 2023-11-27 01:54:47,245 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=10.36 vs. limit=15.0 2023-11-27 01:54:50,665 INFO [train_asr.py:1235] (3/4) Epoch 46, batch 9900, loss[loss=0.06984, simple_loss=0.1023, pruned_loss=0.01173, audio_tagging_loss=0.006979, over 15322.00 frames. ], tot_loss[loss=0.06606, simple_loss=0.09108, pruned_loss=0.01221, audio_tagging_loss=0.008312, over 3047814.18 frames. 
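The grad_scale column drifts between 4.0 and 32.0 across this section: with fp16 training this is the dynamic loss scale, halved by the gradient scaler when gradients overflow and pushed back up while training is stable. A sketch of that loop in the standard torch.cuda.amp style, with the periodic doubling of small scales that the 4.0 -> 8.0 -> 16.0 -> 32.0 recovery above suggests; the exact schedule in train_asr.py is an assumption:

    from torch.cuda.amp import GradScaler, autocast

    scaler = GradScaler(enabled=True, init_scale=1.0)

    def fp16_step(model, optimizer, batch, compute_loss, batch_idx):
        optimizer.zero_grad()
        with autocast():
            loss = compute_loss(model, batch)
        scaler.scale(loss).backward()   # scale up to avoid fp16 underflow
        scaler.step(optimizer)          # internally skipped on overflow
        scaler.update()                 # halves the scale after an overflow
        cur = scaler.get_scale()
        # nudge small scales back up; stable runs climb toward 32.0
        if cur < 8.0 or (cur < 32.0 and batch_idx % 400 == 0):
            scaler.update(cur * 2.0)
        return loss.detach()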
], batch size: 58, lr: 1.46e-03, grad_scale: 8.0 2023-11-27 01:54:50,853 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3673186.6666666665, ans=0.1 2023-11-27 01:54:55,140 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=3673186.6666666665, ans=0.0 2023-11-27 01:54:56,219 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=3673186.6666666665, ans=0.125 2023-11-27 01:55:06,164 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=7.46 vs. limit=12.0 2023-11-27 01:55:10,574 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.min_abs, batch_count=3673253.3333333335, ans=0.5 2023-11-27 01:55:13,595 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.560e+01 8.996e+01 9.617e+01 1.030e+02 1.836e+02, threshold=1.923e+02, percent-clipped=0.0 2023-11-27 01:55:13,727 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 551000 2023-11-27 01:55:21,231 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=3673320.0, ans=0.0 2023-11-27 01:55:24,940 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.71 vs. limit=6.0 2023-11-27 01:55:26,436 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=3673386.6666666665, ans=0.0 2023-11-27 01:55:33,724 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=3673386.6666666665, ans=0.125 2023-11-27 01:55:36,908 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3673453.3333333335, ans=0.1 2023-11-27 01:55:47,251 INFO [train_asr.py:1235] (3/4) Epoch 46, batch 9950, loss[loss=0.07724, simple_loss=0.1095, pruned_loss=0.01494, audio_tagging_loss=0.007556, over 15703.00 frames. ], tot_loss[loss=0.06613, simple_loss=0.0911, pruned_loss=0.01223, audio_tagging_loss=0.008346, over 3047678.06 frames. ], batch size: 60, lr: 1.46e-03, grad_scale: 8.0 2023-11-27 01:56:09,644 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 551050 2023-11-27 01:56:09,739 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=3673653.3333333335, ans=0.0 2023-11-27 01:56:34,498 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=6.28 vs. limit=15.0 2023-11-27 01:56:42,970 INFO [train_asr.py:1235] (3/4) Epoch 46, batch 10000, loss[loss=0.05973, simple_loss=0.07895, pruned_loss=0.01218, audio_tagging_loss=0.00807, over 15049.00 frames. ], tot_loss[loss=0.06566, simple_loss=0.09015, pruned_loss=0.01222, audio_tagging_loss=0.008364, over 3043152.56 frames. ], batch size: 55, lr: 1.46e-03, grad_scale: 16.0 2023-11-27 01:56:43,518 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=512, metric=20.05 vs. 
limit=22.5 2023-11-27 01:56:46,954 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=3673853.3333333335, ans=0.2 2023-11-27 01:56:49,088 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=3673853.3333333335, ans=0.0 2023-11-27 01:56:52,676 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=10.14 vs. limit=15.0 2023-11-27 01:57:02,339 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=11.05 vs. limit=15.0 2023-11-27 01:57:05,441 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.059e+01 8.920e+01 9.463e+01 1.026e+02 1.255e+02, threshold=1.893e+02, percent-clipped=0.0 2023-11-27 01:57:05,541 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 551100 2023-11-27 01:57:12,651 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3673986.6666666665, ans=0.125 2023-11-27 01:57:19,861 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=10.46 vs. limit=15.0 2023-11-27 01:57:22,585 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=3674053.3333333335, ans=0.0 2023-11-27 01:57:24,690 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-27 01:57:26,096 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=6.55 vs. limit=15.0 2023-11-27 01:57:31,476 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3674120.0, ans=0.1 2023-11-27 01:57:38,694 INFO [train_asr.py:1235] (3/4) Epoch 46, batch 10050, loss[loss=0.072, simple_loss=0.08231, pruned_loss=0.01692, audio_tagging_loss=0.01392, over 13734.00 frames. ], tot_loss[loss=0.06547, simple_loss=0.08979, pruned_loss=0.01212, audio_tagging_loss=0.008452, over 3039115.84 frames. ], batch size: 52, lr: 1.46e-03, grad_scale: 16.0 2023-11-27 01:58:01,644 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 551150 2023-11-27 01:58:03,229 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=9.39 vs. limit=15.0 2023-11-27 01:58:11,742 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=12.25 vs. 
limit=15.0 2023-11-27 01:58:12,851 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3674386.6666666665, ans=0.0 2023-11-27 01:58:22,232 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3674453.3333333335, ans=0.0 2023-11-27 01:58:24,269 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=3674453.3333333335, ans=0.015 2023-11-27 01:58:32,526 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=9.94 vs. limit=15.0 2023-11-27 01:58:34,212 INFO [train_asr.py:1235] (3/4) Epoch 46, batch 10100, loss[loss=0.06801, simple_loss=0.09323, pruned_loss=0.01506, audio_tagging_loss=0.006335, over 14779.00 frames. ], tot_loss[loss=0.06557, simple_loss=0.09004, pruned_loss=0.01207, audio_tagging_loss=0.008478, over 3040322.65 frames. ], batch size: 56, lr: 1.46e-03, grad_scale: 16.0 2023-11-27 01:58:44,083 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=3674520.0, ans=0.0 2023-11-27 01:58:56,273 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=3674653.3333333335, ans=0.0 2023-11-27 01:58:57,102 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.829e+01 8.911e+01 9.483e+01 1.012e+02 1.276e+02, threshold=1.897e+02, percent-clipped=0.0 2023-11-27 01:58:57,199 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 551200 2023-11-27 01:59:04,394 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=2.26 vs. limit=15.0 2023-11-27 01:59:18,139 WARNING [train_asr.py:1481] (3/4) Exclude cut with ID unbalanced/_eq1Ry0UZGU_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-27 01:59:30,771 INFO [train_asr.py:1235] (3/4) Epoch 46, batch 10150, loss[loss=0.05096, simple_loss=0.06254, pruned_loss=0.008166, audio_tagging_loss=0.01152, over 14934.00 frames. ], tot_loss[loss=0.06585, simple_loss=0.09015, pruned_loss=0.01219, audio_tagging_loss=0.00859, over 3051027.17 frames. ], batch size: 57, lr: 1.46e-03, grad_scale: 16.0 2023-11-27 01:59:35,307 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3674853.3333333335, ans=0.125 2023-11-27 01:59:35,432 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=3674853.3333333335, ans=0.125 2023-11-27 01:59:48,226 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=13.30 vs. 
limit=15.0 2023-11-27 01:59:52,888 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 551250 2023-11-27 01:59:54,100 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=3674986.6666666665, ans=0.125 2023-11-27 01:59:55,504 WARNING [train_asr.py:1481] (3/4) Exclude cut with ID unbalanced/cw-21cbk02A_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-27 02:00:26,903 INFO [train_asr.py:1235] (3/4) Epoch 46, batch 10200, loss[loss=0.06699, simple_loss=0.09439, pruned_loss=0.009889, audio_tagging_loss=0.009911, over 15315.00 frames. ], tot_loss[loss=0.06548, simple_loss=0.08942, pruned_loss=0.01202, audio_tagging_loss=0.00875, over 3053908.76 frames. ], batch size: 57, lr: 1.46e-03, grad_scale: 16.0 2023-11-27 02:00:34,739 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.out_whiten.whitening_limit, batch_count=3675186.6666666665, ans=15.0 2023-11-27 02:00:39,286 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.min_positive, batch_count=3675253.3333333335, ans=0.05 2023-11-27 02:00:45,172 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3675253.3333333335, ans=0.125 2023-11-27 02:00:45,954 WARNING [train_asr.py:1481] (3/4) Exclude cut with ID unbalanced/hOT6Yokob90_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-27 02:00:49,094 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.594e+01 8.864e+01 9.560e+01 1.043e+02 1.445e+02, threshold=1.912e+02, percent-clipped=0.0 2023-11-27 02:00:49,188 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 551300 2023-11-27 02:01:10,673 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3675453.3333333335, ans=0.125 2023-11-27 02:01:12,811 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3675453.3333333335, ans=0.0 2023-11-27 02:01:22,518 INFO [train_asr.py:1235] (3/4) Epoch 46, batch 10250, loss[loss=0.06697, simple_loss=0.09122, pruned_loss=0.01296, audio_tagging_loss=0.008393, over 15301.00 frames. ], tot_loss[loss=0.06552, simple_loss=0.08928, pruned_loss=0.012, audio_tagging_loss=0.008875, over 3055203.84 frames. ], batch size: 57, lr: 1.46e-03, grad_scale: 16.0 2023-11-27 02:01:28,667 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=3675520.0, ans=0.0 2023-11-27 02:01:30,169 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.1.conv_module1.whiten, num_groups=1, num_channels=192, metric=13.84 vs. 
limit=15.0 2023-11-27 02:01:30,576 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=3675520.0, ans=0.035 2023-11-27 02:01:44,914 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 551350 2023-11-27 02:01:52,926 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=3675653.3333333335, ans=0.125 2023-11-27 02:01:53,964 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=3675653.3333333335, ans=0.125 2023-11-27 02:02:01,789 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=9.43 vs. limit=15.0 2023-11-27 02:02:02,767 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=3.66 vs. limit=15.0 2023-11-27 02:02:07,433 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=512, metric=16.32 vs. limit=22.5 2023-11-27 02:02:18,466 INFO [train_asr.py:1235] (3/4) Epoch 46, batch 10300, loss[loss=0.07151, simple_loss=0.09392, pruned_loss=0.01459, audio_tagging_loss=0.00996, over 15024.00 frames. ], tot_loss[loss=0.06513, simple_loss=0.08835, pruned_loss=0.01206, audio_tagging_loss=0.008892, over 3053132.39 frames. ], batch size: 56, lr: 1.46e-03, grad_scale: 16.0 2023-11-27 02:02:18,907 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=6.57 vs. limit=12.0 2023-11-27 02:02:23,589 INFO [scaling.py:1022] (3/4) Whitening: name=encoder_embed.out_whiten, num_groups=1, num_channels=192, metric=7.09 vs. limit=8.0 2023-11-27 02:02:25,135 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.min_positive, batch_count=3675853.3333333335, ans=0.05 2023-11-27 02:02:27,217 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3675853.3333333335, ans=0.1 2023-11-27 02:02:32,547 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=3675920.0, ans=0.125 2023-11-27 02:02:32,999 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=1.87 vs. limit=6.0 2023-11-27 02:02:40,220 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.373e+01 8.971e+01 9.641e+01 1.026e+02 1.769e+02, threshold=1.928e+02, percent-clipped=0.0 2023-11-27 02:02:40,310 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 551400 2023-11-27 02:02:41,469 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=3675986.6666666665, ans=0.125 2023-11-27 02:02:55,720 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=3676053.3333333335, ans=0.125 2023-11-27 02:03:13,886 INFO [train_asr.py:1235] (3/4) Epoch 46, batch 10350, loss[loss=0.07554, simple_loss=0.1063, pruned_loss=0.01242, audio_tagging_loss=0.009976, over 15108.00 frames. ], tot_loss[loss=0.06514, simple_loss=0.08823, pruned_loss=0.01201, audio_tagging_loss=0.009015, over 3053309.83 frames. 
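Many of the ScheduledFloat names above belong to balancers (balancer.prob, min_positive, max_positive, min_abs, max_abs): modules that keep per-channel activation statistics inside a target range, applied stochastically with the logged prob. A sketch of the statistics being constrained, assuming the balancer acts on the fraction of positive values and the mean absolute value per channel; the corrective mechanism itself, which in zipformer works through the backward pass, is omitted:

    import torch

    def balancer_stats(x: torch.Tensor):
        # x: (..., num_channels); reduce over everything but channels
        dims = tuple(range(x.dim() - 1))
        frac_positive = (x > 0).float().mean(dim=dims)
        mean_abs = x.abs().mean(dim=dims)
        return frac_positive, mean_abs

    x = torch.randn(100, 16, 256)
    fp, ma = balancer_stats(x)
    # e.g. min_positive=0.05 / max_positive=0.95 bound frac_positive,
    # and min_abs=0.5 / max_abs=10.0 bound mean_abs, per channel.
    assert fp.shape == ma.shape == (256,)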
], batch size: 55, lr: 1.46e-03, grad_scale: 16.0 2023-11-27 02:03:18,676 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=20.30 vs. limit=22.5 2023-11-27 02:03:26,233 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3676253.3333333335, ans=0.1 2023-11-27 02:03:36,794 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 551450 2023-11-27 02:03:41,212 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=3676320.0, ans=0.125 2023-11-27 02:03:42,729 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3676320.0, ans=0.0 2023-11-27 02:03:57,511 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=3676453.3333333335, ans=0.0 2023-11-27 02:04:09,399 INFO [train_asr.py:1235] (3/4) Epoch 46, batch 10400, loss[loss=0.0564, simple_loss=0.0812, pruned_loss=0.008742, audio_tagging_loss=0.007061, over 16208.00 frames. ], tot_loss[loss=0.06499, simple_loss=0.08821, pruned_loss=0.0119, audio_tagging_loss=0.008987, over 3054906.74 frames. ], batch size: 63, lr: 1.46e-03, grad_scale: 32.0 2023-11-27 02:04:19,054 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=3676520.0, ans=0.0 2023-11-27 02:04:32,025 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.446e+01 9.109e+01 9.691e+01 1.057e+02 2.130e+02, threshold=1.938e+02, percent-clipped=1.0 2023-11-27 02:04:32,122 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 551500 2023-11-27 02:04:50,116 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=10.40 vs. limit=22.5 2023-11-27 02:04:56,372 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3676786.6666666665, ans=0.1 2023-11-27 02:04:58,488 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=3676786.6666666665, ans=0.125 2023-11-27 02:05:05,157 INFO [train_asr.py:1235] (3/4) Epoch 46, batch 10450, loss[loss=0.06177, simple_loss=0.08942, pruned_loss=0.01233, audio_tagging_loss=0.004726, over 14083.00 frames. ], tot_loss[loss=0.06489, simple_loss=0.08809, pruned_loss=0.0118, audio_tagging_loss=0.00904, over 3051555.37 frames. ], batch size: 54, lr: 1.46e-03, grad_scale: 32.0 2023-11-27 02:05:15,520 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=12.80 vs. limit=15.0 2023-11-27 02:05:25,243 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=5.03 vs. 
limit=12.0 2023-11-27 02:05:26,716 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 551550 2023-11-27 02:05:29,051 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3676986.6666666665, ans=0.125 2023-11-27 02:05:31,652 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3676986.6666666665, ans=0.1 2023-11-27 02:05:32,678 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-27 02:05:35,450 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=3676986.6666666665, ans=0.1 2023-11-27 02:05:57,698 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3677120.0, ans=0.1 2023-11-27 02:06:00,586 INFO [train_asr.py:1235] (3/4) Epoch 46, batch 10500, loss[loss=0.09095, simple_loss=0.1371, pruned_loss=0.01658, audio_tagging_loss=0.005828, over 16611.00 frames. ], tot_loss[loss=0.06483, simple_loss=0.08818, pruned_loss=0.01183, audio_tagging_loss=0.008907, over 3057311.11 frames. ], batch size: 61, lr: 1.46e-03, grad_scale: 32.0 2023-11-27 02:06:01,068 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.whiten_keys.whitening_limit, batch_count=3677186.6666666665, ans=6.0 2023-11-27 02:06:02,884 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3677186.6666666665, ans=0.0 2023-11-27 02:06:22,969 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.664e+01 9.041e+01 9.594e+01 1.033e+02 2.053e+02, threshold=1.919e+02, percent-clipped=1.0 2023-11-27 02:06:23,059 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 551600 2023-11-27 02:06:48,856 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=3677453.3333333335, ans=0.125 2023-11-27 02:06:56,000 INFO [train_asr.py:1235] (3/4) Epoch 46, batch 10550, loss[loss=0.0661, simple_loss=0.08833, pruned_loss=0.01466, audio_tagging_loss=0.007273, over 15590.00 frames. ], tot_loss[loss=0.06528, simple_loss=0.08929, pruned_loss=0.01191, audio_tagging_loss=0.008719, over 3056376.36 frames. ], batch size: 58, lr: 1.46e-03, grad_scale: 32.0 2023-11-27 02:07:00,522 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3677520.0, ans=0.1 2023-11-27 02:07:10,690 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=9.41 vs. 
limit=15.0 2023-11-27 02:07:19,451 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 551650 2023-11-27 02:07:20,737 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=3677653.3333333335, ans=0.0 2023-11-27 02:07:20,845 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=3677653.3333333335, ans=0.2 2023-11-27 02:07:44,114 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=3677786.6666666665, ans=0.125 2023-11-27 02:07:52,991 INFO [train_asr.py:1235] (3/4) Epoch 46, batch 10600, loss[loss=0.06193, simple_loss=0.09603, pruned_loss=0.008319, audio_tagging_loss=0.005591, over 15309.00 frames. ], tot_loss[loss=0.06509, simple_loss=0.08922, pruned_loss=0.01182, audio_tagging_loss=0.008653, over 3055375.39 frames. ], batch size: 62, lr: 1.46e-03, grad_scale: 32.0 2023-11-27 02:07:59,375 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=11.70 vs. limit=15.0 2023-11-27 02:08:06,606 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=3677920.0, ans=0.125 2023-11-27 02:08:07,701 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=3677920.0, ans=0.0 2023-11-27 02:08:14,800 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.458e+01 8.957e+01 9.483e+01 1.042e+02 1.260e+02, threshold=1.897e+02, percent-clipped=0.0 2023-11-27 02:08:14,896 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 551700 2023-11-27 02:08:27,656 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=3678053.3333333335, ans=0.0 2023-11-27 02:08:31,682 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten2.whitening_limit, batch_count=3678053.3333333335, ans=15.0 2023-11-27 02:08:43,383 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3678120.0, ans=0.125 2023-11-27 02:08:48,509 INFO [train_asr.py:1235] (3/4) Epoch 46, batch 10650, loss[loss=0.08337, simple_loss=0.1158, pruned_loss=0.01938, audio_tagging_loss=0.006106, over 14043.00 frames. ], tot_loss[loss=0.06503, simple_loss=0.08901, pruned_loss=0.01187, audio_tagging_loss=0.008653, over 3052915.08 frames. ], batch size: 53, lr: 1.46e-03, grad_scale: 16.0 2023-11-27 02:09:09,303 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=3678320.0, ans=0.125 2023-11-27 02:09:10,284 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 551750 2023-11-27 02:09:13,268 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3678320.0, ans=0.125 2023-11-27 02:09:20,291 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=14.00 vs. 
limit=22.5 2023-11-27 02:09:26,603 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3678386.6666666665, ans=0.1 2023-11-27 02:09:26,840 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=17.07 vs. limit=22.5 2023-11-27 02:09:30,651 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3678386.6666666665, ans=0.125 2023-11-27 02:09:43,015 INFO [train_asr.py:1235] (3/4) Epoch 46, batch 10700, loss[loss=0.07398, simple_loss=0.09976, pruned_loss=0.01697, audio_tagging_loss=0.007135, over 15340.00 frames. ], tot_loss[loss=0.06621, simple_loss=0.09084, pruned_loss=0.0122, audio_tagging_loss=0.008587, over 3049272.94 frames. ], batch size: 57, lr: 1.46e-03, grad_scale: 16.0 2023-11-27 02:10:00,084 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=1.83 vs. limit=6.0 2023-11-27 02:10:06,384 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 551800 2023-11-27 02:10:06,493 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3678653.3333333335, ans=0.125 2023-11-27 02:10:07,316 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.732e+01 8.981e+01 9.456e+01 1.028e+02 1.264e+02, threshold=1.891e+02, percent-clipped=0.0 2023-11-27 02:10:09,004 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=3678653.3333333335, ans=0.125 2023-11-27 02:10:18,757 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.whiten.whitening_limit, batch_count=3678720.0, ans=15.0 2023-11-27 02:10:34,246 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=12.98 vs. limit=15.0 2023-11-27 02:10:40,252 INFO [train_asr.py:1235] (3/4) Epoch 46, batch 10750, loss[loss=0.06998, simple_loss=0.09796, pruned_loss=0.01285, audio_tagging_loss=0.008142, over 16395.00 frames. ], tot_loss[loss=0.06589, simple_loss=0.09052, pruned_loss=0.01204, audio_tagging_loss=0.008592, over 3049100.13 frames. ], batch size: 60, lr: 1.46e-03, grad_scale: 16.0 2023-11-27 02:10:41,774 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=6.06 vs. 
limit=12.0 2023-11-27 02:11:00,077 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3678920.0, ans=0.1 2023-11-27 02:11:01,942 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 551850 2023-11-27 02:11:11,660 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=3679053.3333333335, ans=0.0 2023-11-27 02:11:18,496 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-27 02:11:25,395 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=3679120.0, ans=0.0 2023-11-27 02:11:35,276 INFO [train_asr.py:1235] (3/4) Epoch 46, batch 10800, loss[loss=0.06899, simple_loss=0.09557, pruned_loss=0.01208, audio_tagging_loss=0.009122, over 15354.00 frames. ], tot_loss[loss=0.06581, simple_loss=0.09077, pruned_loss=0.01201, audio_tagging_loss=0.008408, over 3048396.17 frames. ], batch size: 59, lr: 1.46e-03, grad_scale: 16.0 2023-11-27 02:11:49,367 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3679253.3333333335, ans=0.125 2023-11-27 02:11:57,091 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 551900 2023-11-27 02:11:59,124 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.497e+01 8.776e+01 9.602e+01 1.034e+02 1.420e+02, threshold=1.920e+02, percent-clipped=0.0 2023-11-27 02:12:10,801 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=5.27 vs. limit=15.0 2023-11-27 02:12:30,732 INFO [train_asr.py:1235] (3/4) Epoch 46, batch 10850, loss[loss=0.0656, simple_loss=0.08828, pruned_loss=0.01153, audio_tagging_loss=0.009922, over 15325.00 frames. ], tot_loss[loss=0.06589, simple_loss=0.09054, pruned_loss=0.01214, audio_tagging_loss=0.008485, over 3052550.43 frames. ], batch size: 58, lr: 1.46e-03, grad_scale: 16.0 2023-11-27 02:12:40,955 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3679586.6666666665, ans=0.125 2023-11-27 02:12:54,127 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 551950 2023-11-27 02:13:02,822 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=3679653.3333333335, ans=0.125 2023-11-27 02:13:20,988 WARNING [train_asr.py:1481] (3/4) Exclude cut with ID unbalanced/XMxq2pgttuY_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-27 02:13:25,518 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=3679853.3333333335, ans=0.2 2023-11-27 02:13:26,286 INFO [train_asr.py:1235] (3/4) Epoch 46, batch 10900, loss[loss=0.0448, simple_loss=0.05441, pruned_loss=0.005532, audio_tagging_loss=0.01206, over 16721.00 frames. ], tot_loss[loss=0.06571, simple_loss=0.09003, pruned_loss=0.01214, audio_tagging_loss=0.008562, over 3052174.39 frames. 
], batch size: 65, lr: 1.46e-03, grad_scale: 16.0 2023-11-27 02:13:49,354 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 552000 2023-11-27 02:13:50,635 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3679986.6666666665, ans=0.1 2023-11-27 02:13:53,604 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.440e+01 8.930e+01 9.586e+01 1.062e+02 1.591e+02, threshold=1.917e+02, percent-clipped=0.0 2023-11-27 02:13:55,902 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.max_abs, batch_count=3679986.6666666665, ans=10.0 2023-11-27 02:14:04,416 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=3680053.3333333335, ans=0.0 2023-11-27 02:14:08,194 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3680053.3333333335, ans=0.125 2023-11-27 02:14:08,213 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.min_positive, batch_count=3680053.3333333335, ans=0.05 2023-11-27 02:14:08,217 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3680053.3333333335, ans=0.1 2023-11-27 02:14:16,771 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=3680120.0, ans=0.125 2023-11-27 02:14:20,334 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=3680120.0, ans=0.125 2023-11-27 02:14:23,606 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=3680120.0, ans=0.125 2023-11-27 02:14:25,458 INFO [train_asr.py:1235] (3/4) Epoch 46, batch 10950, loss[loss=0.05806, simple_loss=0.07678, pruned_loss=0.01032, audio_tagging_loss=0.009354, over 15390.00 frames. ], tot_loss[loss=0.06567, simple_loss=0.08981, pruned_loss=0.01218, audio_tagging_loss=0.008593, over 3053368.25 frames. ], batch size: 58, lr: 1.46e-03, grad_scale: 16.0 2023-11-27 02:14:30,825 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=3680186.6666666665, ans=0.2 2023-11-27 02:14:31,927 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=3680186.6666666665, ans=0.2 2023-11-27 02:14:46,787 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 552050 2023-11-27 02:14:46,937 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=3680320.0, ans=0.0 2023-11-27 02:14:49,601 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3680320.0, ans=0.125 2023-11-27 02:14:59,302 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=6.12 vs. 
limit=15.0 2023-11-27 02:15:00,039 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=3680386.6666666665, ans=0.125 2023-11-27 02:15:00,231 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=3680386.6666666665, ans=0.0 2023-11-27 02:15:06,019 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=3680386.6666666665, ans=0.2 2023-11-27 02:15:13,615 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=3680453.3333333335, ans=0.125 2023-11-27 02:15:20,768 INFO [train_asr.py:1235] (3/4) Epoch 46, batch 11000, loss[loss=0.08104, simple_loss=0.1153, pruned_loss=0.01729, audio_tagging_loss=0.006118, over 15819.00 frames. ], tot_loss[loss=0.06498, simple_loss=0.08847, pruned_loss=0.01206, audio_tagging_loss=0.008685, over 3047223.14 frames. ], batch size: 55, lr: 1.46e-03, grad_scale: 16.0 2023-11-27 02:15:27,168 WARNING [train_asr.py:1481] (3/4) Exclude cut with ID unbalanced/h6R5rMXN6pY_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-27 02:15:43,601 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 552100 2023-11-27 02:15:46,144 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.719e+01 8.927e+01 9.605e+01 1.045e+02 1.330e+02, threshold=1.921e+02, percent-clipped=0.0 2023-11-27 02:15:47,529 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3680653.3333333335, ans=0.125 2023-11-27 02:15:52,745 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.min_abs, batch_count=3680653.3333333335, ans=0.5 2023-11-27 02:16:03,313 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3680720.0, ans=0.125 2023-11-27 02:16:16,361 INFO [train_asr.py:1235] (3/4) Epoch 46, batch 11050, loss[loss=0.072, simple_loss=0.09841, pruned_loss=0.01551, audio_tagging_loss=0.007284, over 14520.00 frames. ], tot_loss[loss=0.06499, simple_loss=0.08854, pruned_loss=0.01198, audio_tagging_loss=0.008737, over 3047282.19 frames. ], batch size: 55, lr: 1.46e-03, grad_scale: 16.0 2023-11-27 02:16:31,091 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=12.13 vs. 
limit=15.0 2023-11-27 02:16:39,090 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 552150 2023-11-27 02:16:40,269 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=3680986.6666666665, ans=0.0 2023-11-27 02:16:40,298 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3680986.6666666665, ans=0.125 2023-11-27 02:16:41,429 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=3680986.6666666665, ans=0.125 2023-11-27 02:16:48,855 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=3681053.3333333335, ans=0.125 2023-11-27 02:16:57,837 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=3681053.3333333335, ans=0.125 2023-11-27 02:17:13,203 INFO [train_asr.py:1235] (3/4) Epoch 46, batch 11100, loss[loss=0.0945, simple_loss=0.129, pruned_loss=0.02202, audio_tagging_loss=0.007985, over 17063.00 frames. ], tot_loss[loss=0.06536, simple_loss=0.08909, pruned_loss=0.01199, audio_tagging_loss=0.00883, over 3046367.25 frames. ], batch size: 62, lr: 1.46e-03, grad_scale: 16.0 2023-11-27 02:17:31,519 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=3681253.3333333335, ans=0.2 2023-11-27 02:17:34,492 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 552200 2023-11-27 02:17:36,805 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.717e+01 8.882e+01 9.437e+01 1.044e+02 2.360e+02, threshold=1.887e+02, percent-clipped=0.0 2023-11-27 02:17:45,587 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=3681386.6666666665, ans=0.0 2023-11-27 02:17:57,618 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3681453.3333333335, ans=0.0 2023-11-27 02:18:00,832 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=3681453.3333333335, ans=0.125 2023-11-27 02:18:08,111 INFO [train_asr.py:1235] (3/4) Epoch 46, batch 11150, loss[loss=0.08886, simple_loss=0.1229, pruned_loss=0.02062, audio_tagging_loss=0.006769, over 14567.00 frames. ], tot_loss[loss=0.06589, simple_loss=0.08991, pruned_loss=0.01208, audio_tagging_loss=0.008852, over 3050635.26 frames. ], batch size: 53, lr: 1.46e-03, grad_scale: 16.0 2023-11-27 02:18:15,795 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3681520.0, ans=0.125 2023-11-27 02:18:21,612 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=3681586.6666666665, ans=0.2 2023-11-27 02:18:27,873 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.1.self_attn2.whiten, num_groups=1, num_channels=192, metric=9.95 vs. 
limit=22.5 2023-11-27 02:18:30,418 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 552250 2023-11-27 02:18:31,566 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3681653.3333333335, ans=0.125 2023-11-27 02:18:31,634 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=3681653.3333333335, ans=0.0 2023-11-27 02:18:31,681 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=3681653.3333333335, ans=0.04949747468305833 2023-11-27 02:18:46,421 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=3681720.0, ans=0.2 2023-11-27 02:19:03,617 INFO [train_asr.py:1235] (3/4) Epoch 46, batch 11200, loss[loss=0.0679, simple_loss=0.08649, pruned_loss=0.01448, audio_tagging_loss=0.01017, over 15584.00 frames. ], tot_loss[loss=0.06545, simple_loss=0.08879, pruned_loss=0.01207, audio_tagging_loss=0.008977, over 3052864.30 frames. ], batch size: 58, lr: 1.46e-03, grad_scale: 32.0 2023-11-27 02:19:03,752 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3681853.3333333335, ans=0.0 2023-11-27 02:19:08,780 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3681853.3333333335, ans=0.1 2023-11-27 02:19:14,082 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=3681920.0, ans=0.0 2023-11-27 02:19:23,553 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3681920.0, ans=0.1 2023-11-27 02:19:26,613 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 552300 2023-11-27 02:19:26,826 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3681986.6666666665, ans=0.1 2023-11-27 02:19:28,647 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.316e+01 8.953e+01 9.456e+01 1.019e+02 1.233e+02, threshold=1.891e+02, percent-clipped=1.0 2023-11-27 02:19:38,467 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=3682053.3333333335, ans=0.09899494936611666 2023-11-27 02:19:52,148 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3682120.0, ans=0.1 2023-11-27 02:19:57,696 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.whiten, num_groups=1, num_channels=512, metric=8.06 vs. limit=12.0 2023-11-27 02:19:58,536 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3682186.6666666665, ans=0.125 2023-11-27 02:19:59,857 INFO [train_asr.py:1235] (3/4) Epoch 46, batch 11250, loss[loss=0.05335, simple_loss=0.07109, pruned_loss=0.00909, audio_tagging_loss=0.008717, over 14369.00 frames. ], tot_loss[loss=0.06474, simple_loss=0.08765, pruned_loss=0.01189, audio_tagging_loss=0.009031, over 3046931.71 frames. 
], batch size: 54, lr: 1.46e-03, grad_scale: 16.0 2023-11-27 02:20:05,935 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=3682186.6666666665, ans=0.2 2023-11-27 02:20:21,697 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 552350 2023-11-27 02:20:31,895 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3682386.6666666665, ans=0.125 2023-11-27 02:20:36,587 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3682386.6666666665, ans=0.125 2023-11-27 02:20:43,595 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=3682453.3333333335, ans=0.0 2023-11-27 02:20:50,724 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=10.25 vs. limit=15.0 2023-11-27 02:20:55,480 INFO [train_asr.py:1235] (3/4) Epoch 46, batch 11300, loss[loss=0.0777, simple_loss=0.1156, pruned_loss=0.01367, audio_tagging_loss=0.00625, over 14833.00 frames. ], tot_loss[loss=0.06509, simple_loss=0.08863, pruned_loss=0.01195, audio_tagging_loss=0.008828, over 3044469.55 frames. ], batch size: 55, lr: 1.46e-03, grad_scale: 16.0 2023-11-27 02:21:01,130 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=10.11 vs. limit=15.0 2023-11-27 02:21:12,144 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=3682586.6666666665, ans=0.125 2023-11-27 02:21:17,700 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 552400 2023-11-27 02:21:17,886 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=3682653.3333333335, ans=0.125 2023-11-27 02:21:21,101 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.046e+01 9.061e+01 9.736e+01 1.047e+02 2.003e+02, threshold=1.947e+02, percent-clipped=1.0 2023-11-27 02:21:21,356 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=3682653.3333333335, ans=0.125 2023-11-27 02:21:50,791 INFO [train_asr.py:1235] (3/4) Epoch 46, batch 11350, loss[loss=0.06154, simple_loss=0.08203, pruned_loss=0.01129, audio_tagging_loss=0.009232, over 15438.00 frames. ], tot_loss[loss=0.06504, simple_loss=0.08887, pruned_loss=0.0119, audio_tagging_loss=0.008705, over 3046569.51 frames. ], batch size: 57, lr: 1.46e-03, grad_scale: 16.0 2023-11-27 02:21:56,682 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=3682853.3333333335, ans=0.125 2023-11-27 02:21:59,687 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=11.50 vs. 
limit=15.0 2023-11-27 02:22:06,692 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3682920.0, ans=0.125 2023-11-27 02:22:10,360 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=3682920.0, ans=0.0 2023-11-27 02:22:13,339 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 552450 2023-11-27 02:22:32,813 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=3683053.3333333335, ans=0.0 2023-11-27 02:22:44,387 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-27 02:22:46,338 INFO [train_asr.py:1235] (3/4) Epoch 46, batch 11400, loss[loss=0.05926, simple_loss=0.07732, pruned_loss=0.01139, audio_tagging_loss=0.009213, over 14565.00 frames. ], tot_loss[loss=0.06502, simple_loss=0.08888, pruned_loss=0.01191, audio_tagging_loss=0.008667, over 3040877.24 frames. ], batch size: 54, lr: 1.46e-03, grad_scale: 16.0 2023-11-27 02:22:50,667 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=5.96 vs. limit=12.0 2023-11-27 02:22:59,036 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=6.49 vs. limit=10.0 2023-11-27 02:23:04,604 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3683253.3333333335, ans=0.125 2023-11-27 02:23:08,777 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 552500 2023-11-27 02:23:10,298 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=10.17 vs. limit=15.0 2023-11-27 02:23:11,782 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 6.783e+01 9.008e+01 9.574e+01 1.020e+02 1.271e+02, threshold=1.915e+02, percent-clipped=0.0 2023-11-27 02:23:14,145 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=3683320.0, ans=0.125 2023-11-27 02:23:28,057 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=13.08 vs. limit=15.0 2023-11-27 02:23:30,057 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=3683453.3333333335, ans=0.0 2023-11-27 02:23:34,715 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=3683453.3333333335, ans=0.2 2023-11-27 02:23:41,792 INFO [train_asr.py:1235] (3/4) Epoch 46, batch 11450, loss[loss=0.0453, simple_loss=0.05629, pruned_loss=0.007298, audio_tagging_loss=0.009854, over 15538.00 frames. ], tot_loss[loss=0.06473, simple_loss=0.08839, pruned_loss=0.01189, audio_tagging_loss=0.008646, over 3040962.02 frames. 
], batch size: 58, lr: 1.46e-03, grad_scale: 16.0 2023-11-27 02:23:49,761 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3683520.0, ans=0.125 2023-11-27 02:24:03,999 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 552550 2023-11-27 02:24:05,283 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=3683653.3333333335, ans=0.125 2023-11-27 02:24:27,120 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=6.56 vs. limit=12.0 2023-11-27 02:24:29,004 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3683786.6666666665, ans=0.1 2023-11-27 02:24:31,564 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=12.11 vs. limit=15.0 2023-11-27 02:24:37,127 INFO [train_asr.py:1235] (3/4) Epoch 46, batch 11500, loss[loss=0.06462, simple_loss=0.08737, pruned_loss=0.01385, audio_tagging_loss=0.007081, over 15953.00 frames. ], tot_loss[loss=0.06466, simple_loss=0.08828, pruned_loss=0.01192, audio_tagging_loss=0.008608, over 3045280.29 frames. ], batch size: 60, lr: 1.46e-03, grad_scale: 16.0 2023-11-27 02:24:47,244 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=3683920.0, ans=0.125 2023-11-27 02:24:51,060 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=3683920.0, ans=0.125 2023-11-27 02:24:53,045 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=3683920.0, ans=0.0 2023-11-27 02:24:57,901 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=3683920.0, ans=0.125 2023-11-27 02:24:59,788 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 552600 2023-11-27 02:25:03,200 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.546e+01 8.865e+01 9.337e+01 9.934e+01 1.227e+02, threshold=1.867e+02, percent-clipped=0.0 2023-11-27 02:25:15,529 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=3684053.3333333335, ans=0.2 2023-11-27 02:25:22,024 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=3684120.0, ans=0.125 2023-11-27 02:25:29,720 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=3684120.0, ans=0.0 2023-11-27 02:25:33,361 INFO [train_asr.py:1235] (3/4) Epoch 46, batch 11550, loss[loss=0.04896, simple_loss=0.07048, pruned_loss=0.007499, audio_tagging_loss=0.006219, over 14842.00 frames. ], tot_loss[loss=0.06504, simple_loss=0.08878, pruned_loss=0.01207, audio_tagging_loss=0.008586, over 3044708.96 frames. 
], batch size: 58, lr: 1.46e-03, grad_scale: 16.0 2023-11-27 02:25:41,419 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3684186.6666666665, ans=0.125 2023-11-27 02:25:55,504 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 552650 2023-11-27 02:26:05,040 WARNING [train_asr.py:1481] (3/4) Exclude cut with ID unbalanced/NeYOsnhOi4k_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-27 02:26:17,908 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=3684453.3333333335, ans=0.0 2023-11-27 02:26:28,799 INFO [train_asr.py:1235] (3/4) Epoch 46, batch 11600, loss[loss=0.06269, simple_loss=0.07987, pruned_loss=0.01348, audio_tagging_loss=0.00928, over 14515.00 frames. ], tot_loss[loss=0.06545, simple_loss=0.08945, pruned_loss=0.01218, audio_tagging_loss=0.008555, over 3043633.55 frames. ], batch size: 56, lr: 1.46e-03, grad_scale: 32.0 2023-11-27 02:26:33,454 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=9.16 vs. limit=15.0 2023-11-27 02:26:37,450 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=3684520.0, ans=0.0 2023-11-27 02:26:50,921 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 552700 2023-11-27 02:26:52,206 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=3684653.3333333335, ans=0.125 2023-11-27 02:26:53,973 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.249e+01 8.971e+01 9.757e+01 1.054e+02 1.398e+02, threshold=1.951e+02, percent-clipped=0.0 2023-11-27 02:26:55,191 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3684653.3333333335, ans=0.0 2023-11-27 02:26:56,354 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=7.84 vs. limit=15.0 2023-11-27 02:27:14,207 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=3684786.6666666665, ans=0.0 2023-11-27 02:27:18,461 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=3684786.6666666665, ans=0.0 2023-11-27 02:27:20,622 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=3684786.6666666665, ans=0.125 2023-11-27 02:27:24,036 INFO [train_asr.py:1235] (3/4) Epoch 46, batch 11650, loss[loss=0.07817, simple_loss=0.1087, pruned_loss=0.01691, audio_tagging_loss=0.006935, over 15006.00 frames. ], tot_loss[loss=0.06515, simple_loss=0.08908, pruned_loss=0.012, audio_tagging_loss=0.008612, over 3042661.50 frames. 
], batch size: 54, lr: 1.46e-03, grad_scale: 32.0 2023-11-27 02:27:29,519 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=3684853.3333333335, ans=0.2 2023-11-27 02:27:42,064 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.min_positive, batch_count=3684920.0, ans=0.05 2023-11-27 02:27:46,616 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 552750 2023-11-27 02:27:56,268 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=3685053.3333333335, ans=0.0 2023-11-27 02:27:57,291 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=3685053.3333333335, ans=0.0 2023-11-27 02:28:19,274 INFO [train_asr.py:1235] (3/4) Epoch 46, batch 11700, loss[loss=0.05138, simple_loss=0.06578, pruned_loss=0.009113, audio_tagging_loss=0.00937, over 15254.00 frames. ], tot_loss[loss=0.06514, simple_loss=0.08858, pruned_loss=0.01214, audio_tagging_loss=0.008719, over 3044835.70 frames. ], batch size: 59, lr: 1.46e-03, grad_scale: 32.0 2023-11-27 02:28:21,144 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3685186.6666666665, ans=0.1 2023-11-27 02:28:32,310 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=3685253.3333333335, ans=0.125 2023-11-27 02:28:41,476 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 552800 2023-11-27 02:28:44,819 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.511e+01 8.894e+01 9.455e+01 1.028e+02 1.281e+02, threshold=1.891e+02, percent-clipped=0.0 2023-11-27 02:28:45,114 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=3685320.0, ans=0.0 2023-11-27 02:28:47,163 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=3685320.0, ans=0.0 2023-11-27 02:28:57,609 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=3685386.6666666665, ans=0.015 2023-11-27 02:29:00,073 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=5.41 vs. limit=10.0 2023-11-27 02:29:07,910 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=3685453.3333333335, ans=0.0 2023-11-27 02:29:12,677 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=3685453.3333333335, ans=0.125 2023-11-27 02:29:15,666 INFO [train_asr.py:1235] (3/4) Epoch 46, batch 11750, loss[loss=0.07686, simple_loss=0.1005, pruned_loss=0.0176, audio_tagging_loss=0.00902, over 15056.00 frames. ], tot_loss[loss=0.06573, simple_loss=0.08959, pruned_loss=0.01228, audio_tagging_loss=0.008659, over 3048867.72 frames. ], batch size: 57, lr: 1.46e-03, grad_scale: 32.0 2023-11-27 02:29:23,231 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=3685520.0, ans=0.125 2023-11-27 02:29:25,744 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=5.56 vs. 
limit=10.0 2023-11-27 02:29:26,599 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3685586.6666666665, ans=0.125 2023-11-27 02:29:29,928 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=7.54 vs. limit=10.0 2023-11-27 02:29:34,484 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=3685586.6666666665, ans=0.07 2023-11-27 02:29:35,495 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=3685586.6666666665, ans=0.2 2023-11-27 02:29:38,025 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 552850 2023-11-27 02:29:41,242 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=3685653.3333333335, ans=0.125 2023-11-27 02:29:50,222 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=3685720.0, ans=0.2 2023-11-27 02:30:10,482 INFO [train_asr.py:1235] (3/4) Epoch 46, batch 11800, loss[loss=0.07368, simple_loss=0.08858, pruned_loss=0.01677, audio_tagging_loss=0.01262, over 15840.00 frames. ], tot_loss[loss=0.06577, simple_loss=0.08971, pruned_loss=0.01231, audio_tagging_loss=0.008605, over 3044559.87 frames. ], batch size: 61, lr: 1.46e-03, grad_scale: 32.0 2023-11-27 02:30:21,645 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=8.41 vs. limit=12.0 2023-11-27 02:30:34,019 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 552900 2023-11-27 02:30:34,112 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=3685986.6666666665, ans=0.0 2023-11-27 02:30:37,094 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 8.095e+01 8.976e+01 9.582e+01 1.019e+02 1.579e+02, threshold=1.916e+02, percent-clipped=0.0 2023-11-27 02:30:37,379 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=3685986.6666666665, ans=0.0 2023-11-27 02:30:38,583 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=8.79 vs. limit=15.0 2023-11-27 02:30:43,678 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3686053.3333333335, ans=0.1 2023-11-27 02:30:48,994 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3686053.3333333335, ans=0.1 2023-11-27 02:30:49,014 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=3686053.3333333335, ans=0.125 2023-11-27 02:30:54,309 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=3686120.0, ans=0.2 2023-11-27 02:31:06,368 INFO [train_asr.py:1235] (3/4) Epoch 46, batch 11850, loss[loss=0.06516, simple_loss=0.08664, pruned_loss=0.01245, audio_tagging_loss=0.009385, over 15851.00 frames. ], tot_loss[loss=0.06583, simple_loss=0.08973, pruned_loss=0.01228, audio_tagging_loss=0.008687, over 3043095.19 frames. 
], batch size: 59, lr: 1.46e-03, grad_scale: 32.0 2023-11-27 02:31:21,392 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=3686253.3333333335, ans=0.125 2023-11-27 02:31:28,521 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 552950 2023-11-27 02:31:28,669 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-27 02:32:02,446 INFO [train_asr.py:1235] (3/4) Epoch 46, batch 11900, loss[loss=0.0503, simple_loss=0.06804, pruned_loss=0.00672, audio_tagging_loss=0.009559, over 16037.00 frames. ], tot_loss[loss=0.06572, simple_loss=0.08954, pruned_loss=0.01212, audio_tagging_loss=0.008827, over 3044518.85 frames. ], batch size: 63, lr: 1.46e-03, grad_scale: 32.0 2023-11-27 02:32:23,614 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 553000 2023-11-27 02:32:27,254 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.408e+01 8.694e+01 9.517e+01 1.011e+02 1.462e+02, threshold=1.903e+02, percent-clipped=0.0 2023-11-27 02:32:27,498 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=3686653.3333333335, ans=0.0 2023-11-27 02:32:42,709 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=3686720.0, ans=0.2 2023-11-27 02:32:53,272 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3686786.6666666665, ans=0.125 2023-11-27 02:32:57,282 INFO [train_asr.py:1235] (3/4) Epoch 46, batch 11950, loss[loss=0.04778, simple_loss=0.05998, pruned_loss=0.004899, audio_tagging_loss=0.01289, over 14029.00 frames. ], tot_loss[loss=0.0654, simple_loss=0.08911, pruned_loss=0.01201, audio_tagging_loss=0.008836, over 3044363.91 frames. ], batch size: 53, lr: 1.46e-03, grad_scale: 32.0 2023-11-27 02:33:06,079 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3686853.3333333335, ans=0.125 2023-11-27 02:33:09,762 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=3686920.0, ans=0.09899494936611666 2023-11-27 02:33:20,251 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 553050 2023-11-27 02:33:33,356 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=3687053.3333333335, ans=0.0 2023-11-27 02:33:51,402 INFO [train_asr.py:1235] (3/4) Epoch 46, batch 12000, loss[loss=0.05797, simple_loss=0.07911, pruned_loss=0.009807, audio_tagging_loss=0.008606, over 14951.00 frames. ], tot_loss[loss=0.06571, simple_loss=0.0894, pruned_loss=0.01208, audio_tagging_loss=0.008933, over 3043895.49 frames. ], batch size: 55, lr: 1.46e-03, grad_scale: 32.0 2023-11-27 02:33:51,403 INFO [train_asr.py:1258] (3/4) Computing validation loss 2023-11-27 02:34:18,596 INFO [zipformer.py:1877] (3/4) name=encoder.encoders.0.layers.1.self_attn_weights, attn_weights_entropy = tensor([5.8258, 5.8771, 5.9090, 5.9042], device='cuda:3') 2023-11-27 02:34:23,569 INFO [train_asr.py:1267] (3/4) Epoch 46, validation: loss=0.05804, simple_loss=0.0505, pruned_loss=0.005297, audio_tagging_loss=0.02749, over 4681554.00 frames. 
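The train_asr.py:1235 records above report two quantities per step: the loss on the current batch and a running tot_loss[...], each broken into simple_loss, pruned_loss and audio_tagging_loss components and normalized per frame. The logged numbers are consistent with the total being 0.5 * simple_loss + pruned_loss + 1.0 * audio_tagging_loss (at epoch 46, batch 11950: 0.5 * 0.08911 + 0.01201 + 0.008836 ~= 0.0654, the logged tot_loss), and with tot_loss being a frame-weighted running average over roughly 3.0M frames. The sketch below reconstructs that bookkeeping under those assumptions; the scales are inferred from the logged values, and the class is a stand-in for the tracker icefall actually uses, not a copy of it.

    def combined_loss(simple_loss, pruned_loss, audio_tagging_loss,
                      simple_loss_scale=0.5, audio_tagging_loss_scale=1.0):
        # Scales inferred from the log, e.g. epoch 46 batch 11950:
        # 0.5 * 0.08911 + 0.01201 + 0.008836 ~= 0.0654 (the logged tot_loss).
        return (simple_loss_scale * simple_loss
                + pruned_loss
                + audio_tagging_loss_scale * audio_tagging_loss)

    class RunningLoss:
        """Frame-weighted running averages, mimicking the tot_loss[...] fields
        ("over N frames"). A sketch only, not icefall's actual tracker."""

        def __init__(self):
            self.sums = {}       # component name -> sum of (value * num_frames)
            self.frames = 0.0

        def update(self, components, num_frames):
            for name, value in components.items():
                self.sums[name] = self.sums.get(name, 0.0) + value * num_frames
            self.frames += num_frames

        def averages(self):
            return {name: s / self.frames for name, s in self.sums.items()}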
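The WARNING records in this stretch (cuts unbalanced/XMxq2pgttuY_0.000_1.000.wav, h6R5rMXN6pY_0.000_1.000.wav and NeYOsnhOi4k_0.000_1.000.wav) all trip the same guard: a one-second clip gives 100 feature frames, only 23 encoder frames survive the roughly 4x subsampling, and the placeholder transcript tokenizes to 24 BPE tokens; with at most one non-blank symbol per encoder frame, 24 tokens cannot be aligned to 23 frames, so the cut is dropped. The sketch below reconstructs that check; the exact subsampling formula is an assumption that happens to reproduce the logged 100 -> 23:

    def frames_after_subsampling(num_frames):
        # Assumed conv-subsampling arithmetic (roughly 4x reduction); it
        # reproduces the logged pair: 100 frames in -> 23 frames out.
        return ((num_frames - 7) // 2 + 1) // 2

    def keep_cut(num_frames, num_tokens):
        # An utterance whose token count exceeds its post-subsampling frame
        # count has no valid transducer alignment, so it must be excluded.
        return frames_after_subsampling(num_frames) >= num_tokens

    assert frames_after_subsampling(100) == 23
    assert not keep_cut(100, 24)   # matches the excluded dummy-text cuts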
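The optim.py:476 records print five gradient-norm quantiles (min, 25%, median, 75%, max) over a recent window, plus Clipping_scale=2.0, a threshold and percent-clipped. Throughout this section the threshold sits at twice the logged median (for example 9.586e+01 -> 1.917e+02, and 9.582e+01 -> 1.916e+02), which suggests clipping at clipping_scale times a running median of gradient norms, with percent-clipped the share of recent batches that exceeded it. A sketch under that assumption; the window size and function names are hypothetical:

    import torch

    def clip_by_running_median(params, recent_norms, clipping_scale=2.0,
                               window=128):
        grads = [p.grad for p in params if p.grad is not None]
        norm = torch.norm(torch.stack([g.norm() for g in grads]))
        recent_norms.append(float(norm))

        # Quantiles over the recent window, as printed in the log:
        # min / 25% / median / 75% / max.
        q = torch.quantile(torch.tensor(recent_norms[-window:]),
                           torch.tensor([0.0, 0.25, 0.5, 0.75, 1.0]))
        threshold = clipping_scale * float(q[2])   # 2.0 x median, per the log

        clipped = float(norm) > threshold          # tallied as percent-clipped
        if clipped:
            for g in grads:
                g.mul_(threshold / float(norm))
        return q, threshold, clipped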
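Most of the scaling.py:213 records trace ScheduledFloat parameters: regularization knobs (skip rates, balancer probabilities, bypass scale_min, dropout_p and so on) whose value depends on batch_count rather than being constant, which is why every record carries batch_count next to ans. By batch_count around 3.68e6 they have reached their end-of-schedule values (skip rates at 0.0, most balancer probabilities at 0.125). A minimal piecewise-linear sketch, assuming (batch_count, value) breakpoints as the schedule format; the real class lives in the zipformer's scaling.py:

    def scheduled_float(batch_count, schedule):
        # schedule: sorted [(batch_count, value), ...] breakpoints (assumed
        # format). Linear interpolation, clamped at both endpoints.
        if batch_count <= schedule[0][0]:
            return schedule[0][1]
        for (x0, y0), (x1, y1) in zip(schedule, schedule[1:]):
            if batch_count <= x1:
                return y0 + (y1 - y0) * (batch_count - x0) / (x1 - x0)
        return schedule[-1][1]

    # A skip rate fully annealed to 0.0, like the conv_skip_rate entries:
    print(scheduled_float(3_677_920.0, [(0.0, 0.5), (20_000.0, 0.0)]))  # 0.0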
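The scaling.py:1022 records ("Whitening: ... metric=12.98 vs. limit=15.0") come from modules that measure how far a layer's activation covariance is from white and would inject a corrective gradient only once the metric exceeds its limit; every comparison in this stretch stays below its limit. One plausible whiteness metric, ignoring the num_groups split shown in the log, is the eigenvalue-spread ratio mean(lambda^2) / mean(lambda)^2 of the covariance, which is exactly 1.0 when the covariance is a multiple of the identity; this is an assumption about the measurement, not the verbatim implementation:

    import torch

    def whitening_metric(x):
        # x: (..., num_channels); flatten everything except the channel dim.
        x = x.reshape(-1, x.shape[-1]).float()
        x = x - x.mean(dim=0, keepdim=True)
        cov = (x.t() @ x) / x.shape[0]            # channel covariance
        d = cov.shape[0]
        mean_eig = torch.diagonal(cov).mean()     # trace/d == mean eigenvalue
        mean_eig_sq = (cov * cov).sum() / d       # tr(cov @ cov)/d == mean eig^2
        # Always >= 1.0; equals 1.0 iff cov is a multiple of the identity.
        return mean_eig_sq / (mean_eig * mean_eig)

    def exceeds_limit(x, limit):
        # The log reports only this comparison; a penalty applies when True.
        return bool(whitening_metric(x) > limit)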
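Finally, the "Maximum memory allocated so far is 24894MB" record that follows each validation pass (the next record below) comes from CUDA's allocator statistics, which track the peak allocation for the whole process on a device. A minimal equivalent, assuming the MB figure is the peak byte count divided by 1024^2:

    import torch

    def max_mem_mb(device="cuda:3"):
        # Peak memory ever allocated by this process on `device`, in MB.
        return int(torch.cuda.max_memory_allocated(device) / (1024 * 1024))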
2023-11-27 02:34:23,569 INFO [train_asr.py:1268] (3/4) Maximum memory allocated so far is 24894MB 2023-11-27 02:34:44,512 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 553100 2023-11-27 02:35:18,841 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.357e+01 8.955e+01 9.759e+01 1.053e+02 1.237e+02, threshold=1.952e+02, percent-clipped=0.0 2023-11-27 02:35:18,869 INFO [train_asr.py:1235] (3/4) Epoch 47, batch 0, loss[loss=0.06308, simple_loss=0.07826, pruned_loss=0.005547, audio_tagging_loss=0.0184, over 15074.00 frames. ], tot_loss[loss=0.06308, simple_loss=0.07826, pruned_loss=0.005547, audio_tagging_loss=0.0184, over 15074.00 frames. ], batch size: 56, lr: 1.44e-03, grad_scale: 32.0 2023-11-27 02:35:18,869 INFO [train_asr.py:1258] (3/4) Computing validation loss 2023-11-27 02:35:50,393 INFO [train_asr.py:1267] (3/4) Epoch 47, validation: loss=0.05785, simple_loss=0.05054, pruned_loss=0.005317, audio_tagging_loss=0.02726, over 4681554.00 frames. 2023-11-27 02:35:50,393 INFO [train_asr.py:1268] (3/4) Maximum memory allocated so far is 24894MB 2023-11-27 02:35:57,957 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=3687340.0, ans=0.2 2023-11-27 02:36:15,561 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3687473.3333333335, ans=0.125 2023-11-27 02:36:35,627 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.0.conv_module2.whiten, num_groups=1, num_channels=192, metric=12.15 vs. limit=15.0 2023-11-27 02:36:42,294 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 553150 2023-11-27 02:36:45,422 INFO [train_asr.py:1235] (3/4) Epoch 47, batch 50, loss[loss=0.06837, simple_loss=0.08144, pruned_loss=0.01149, audio_tagging_loss=0.01615, over 13574.00 frames. ], tot_loss[loss=0.07214, simple_loss=0.0885, pruned_loss=0.01151, audio_tagging_loss=0.01638, over 690772.15 frames. ], batch size: 53, lr: 1.44e-03, grad_scale: 32.0 2023-11-27 02:36:46,673 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=3687673.3333333335, ans=0.125 2023-11-27 02:36:47,141 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=2.67 vs. 
limit=15.0 2023-11-27 02:36:49,975 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=3687673.3333333335, ans=0.0 2023-11-27 02:37:05,328 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=3687740.0, ans=0.1 2023-11-27 02:37:06,546 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-11-27 02:37:07,486 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3687806.6666666665, ans=0.1 2023-11-27 02:37:08,611 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=3687806.6666666665, ans=0.0 2023-11-27 02:37:08,669 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=3687806.6666666665, ans=0.0 2023-11-27 02:37:13,891 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=3687806.6666666665, ans=0.125 2023-11-27 02:37:37,431 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 553200 2023-11-27 02:37:41,649 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 8.064e+01 9.815e+01 1.050e+02 1.145e+02 1.417e+02, threshold=2.101e+02, percent-clipped=0.0 2023-11-27 02:37:41,680 INFO [train_asr.py:1235] (3/4) Epoch 47, batch 100, loss[loss=0.07039, simple_loss=0.08121, pruned_loss=0.01257, audio_tagging_loss=0.01722, over 14824.00 frames. ], tot_loss[loss=0.07293, simple_loss=0.09051, pruned_loss=0.01217, audio_tagging_loss=0.0155, over 1213157.70 frames. ], batch size: 57, lr: 1.44e-03, grad_scale: 32.0 2023-11-27 02:37:57,424 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=3688073.3333333335, ans=0.125 2023-11-27 02:37:58,418 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=3688073.3333333335, ans=0.125 2023-11-27 02:38:02,545 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3688140.0, ans=0.125 2023-11-27 02:38:08,853 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=3688140.0, ans=0.0 2023-11-27 02:38:08,945 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=3688140.0, ans=0.0 2023-11-27 02:38:12,247 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3688140.0, ans=0.125 2023-11-27 02:38:14,287 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=3688206.6666666665, ans=0.125 2023-11-27 02:38:20,063 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=3688206.6666666665, ans=0.2 2023-11-27 02:38:21,655 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3688206.6666666665, ans=0.1 2023-11-27 02:38:22,709 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=3688206.6666666665, ans=0.0 2023-11-27 02:38:22,795 INFO [scaling.py:213] (3/4) ScheduledFloat: 
name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3688206.6666666665, ans=0.1 2023-11-27 02:38:34,163 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 553250 2023-11-27 02:38:37,321 INFO [train_asr.py:1235] (3/4) Epoch 47, batch 150, loss[loss=0.0611, simple_loss=0.08447, pruned_loss=0.007962, audio_tagging_loss=0.01091, over 15284.00 frames. ], tot_loss[loss=0.07065, simple_loss=0.08942, pruned_loss=0.01177, audio_tagging_loss=0.01417, over 1619221.47 frames. ], batch size: 58, lr: 1.44e-03, grad_scale: 8.0 2023-11-27 02:38:45,913 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=3688340.0, ans=0.0 2023-11-27 02:38:50,732 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=9.37 vs. limit=15.0 2023-11-27 02:38:53,558 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=3688406.6666666665, ans=0.0 2023-11-27 02:38:55,510 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=3688406.6666666665, ans=0.0 2023-11-27 02:38:57,697 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=3688473.3333333335, ans=0.07 2023-11-27 02:39:29,317 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 553300 2023-11-27 02:39:30,395 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=3688606.6666666665, ans=0.0 2023-11-27 02:39:32,429 INFO [train_asr.py:1235] (3/4) Epoch 47, batch 200, loss[loss=0.08275, simple_loss=0.1083, pruned_loss=0.01932, audio_tagging_loss=0.009308, over 14990.00 frames. ], tot_loss[loss=0.0693, simple_loss=0.0894, pruned_loss=0.01196, audio_tagging_loss=0.01264, over 1933397.23 frames. ], batch size: 55, lr: 1.44e-03, grad_scale: 8.0 2023-11-27 02:39:34,370 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=10.98 vs. limit=15.0 2023-11-27 02:39:34,555 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 8.211e+01 9.198e+01 9.713e+01 1.048e+02 1.227e+02, threshold=1.943e+02, percent-clipped=0.0 2023-11-27 02:39:37,070 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn2.whiten.whitening_limit, batch_count=3688673.3333333335, ans=22.5 2023-11-27 02:39:41,017 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3688673.3333333335, ans=0.1 2023-11-27 02:40:06,767 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=5.44 vs. limit=15.0 2023-11-27 02:40:14,857 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=3688873.3333333335, ans=0.125 2023-11-27 02:40:24,210 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 553350 2023-11-27 02:40:27,872 INFO [train_asr.py:1235] (3/4) Epoch 47, batch 250, loss[loss=0.08638, simple_loss=0.1252, pruned_loss=0.01778, audio_tagging_loss=0.005988, over 15731.00 frames. ], tot_loss[loss=0.06859, simple_loss=0.09059, pruned_loss=0.01206, audio_tagging_loss=0.01124, over 2187638.70 frames. 
], batch size: 57, lr: 1.44e-03, grad_scale: 8.0 2023-11-27 02:40:36,707 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=3689006.6666666665, ans=0.2 2023-11-27 02:40:44,545 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=3689073.3333333335, ans=0.0 2023-11-27 02:40:51,944 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3689140.0, ans=0.125 2023-11-27 02:41:00,456 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3689206.6666666665, ans=0.125 2023-11-27 02:41:09,483 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=3689206.6666666665, ans=0.125 2023-11-27 02:41:17,167 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3689273.3333333335, ans=0.0 2023-11-27 02:41:21,385 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 553400 2023-11-27 02:41:24,717 INFO [train_asr.py:1235] (3/4) Epoch 47, batch 300, loss[loss=0.07758, simple_loss=0.1032, pruned_loss=0.01467, audio_tagging_loss=0.01133, over 17096.00 frames. ], tot_loss[loss=0.06839, simple_loss=0.09124, pruned_loss=0.01225, audio_tagging_loss=0.01052, over 2384640.64 frames. ], batch size: 62, lr: 1.44e-03, grad_scale: 8.0 2023-11-27 02:41:26,839 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.875e+01 9.232e+01 1.015e+02 1.128e+02 1.500e+02, threshold=2.030e+02, percent-clipped=0.0 2023-11-27 02:41:28,159 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3689340.0, ans=0.125 2023-11-27 02:41:53,597 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=3689473.3333333335, ans=0.2 2023-11-27 02:42:14,527 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3689606.6666666665, ans=0.125 2023-11-27 02:42:16,582 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 553450 2023-11-27 02:42:19,774 INFO [train_asr.py:1235] (3/4) Epoch 47, batch 350, loss[loss=0.07713, simple_loss=0.1131, pruned_loss=0.01481, audio_tagging_loss=0.005785, over 15524.00 frames. ], tot_loss[loss=0.06758, simple_loss=0.09077, pruned_loss=0.01219, audio_tagging_loss=0.01001, over 2545038.90 frames. 
], batch size: 57, lr: 1.44e-03, grad_scale: 8.0 2023-11-27 02:42:20,032 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=3689673.3333333335, ans=0.2 2023-11-27 02:42:40,257 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=3689740.0, ans=0.0 2023-11-27 02:42:54,941 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=3689873.3333333335, ans=0.0 2023-11-27 02:43:11,786 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 553500 2023-11-27 02:43:11,927 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3689940.0, ans=0.1 2023-11-27 02:43:15,457 INFO [train_asr.py:1235] (3/4) Epoch 47, batch 400, loss[loss=0.07204, simple_loss=0.1033, pruned_loss=0.01318, audio_tagging_loss=0.007228, over 15323.00 frames. ], tot_loss[loss=0.06726, simple_loss=0.09056, pruned_loss=0.01231, audio_tagging_loss=0.009671, over 2658686.53 frames. ], batch size: 56, lr: 1.44e-03, grad_scale: 16.0 2023-11-27 02:43:15,646 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3690006.6666666665, ans=0.125 2023-11-27 02:43:18,096 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 8.006e+01 8.931e+01 9.402e+01 1.042e+02 1.214e+02, threshold=1.880e+02, percent-clipped=0.0 2023-11-27 02:43:23,752 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=3690006.6666666665, ans=0.125 2023-11-27 02:43:33,177 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=3690073.3333333335, ans=0.1 2023-11-27 02:43:38,507 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3690140.0, ans=0.125 2023-11-27 02:43:52,532 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=9.63 vs. limit=15.0 2023-11-27 02:44:08,449 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 553550 2023-11-27 02:44:08,530 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3690273.3333333335, ans=0.1 2023-11-27 02:44:11,509 INFO [train_asr.py:1235] (3/4) Epoch 47, batch 450, loss[loss=0.05772, simple_loss=0.07081, pruned_loss=0.01086, audio_tagging_loss=0.01146, over 15763.00 frames. ], tot_loss[loss=0.06641, simple_loss=0.08957, pruned_loss=0.01216, audio_tagging_loss=0.009467, over 2744247.64 frames. 
], batch size: 60, lr: 1.44e-03, grad_scale: 16.0 2023-11-27 02:44:22,897 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3690406.6666666665, ans=0.1 2023-11-27 02:44:36,603 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=3690473.3333333335, ans=0.035 2023-11-27 02:44:37,812 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3690473.3333333335, ans=0.125 2023-11-27 02:44:43,096 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=3690473.3333333335, ans=0.2 2023-11-27 02:45:02,018 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=3690606.6666666665, ans=0.0 2023-11-27 02:45:03,906 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 553600 2023-11-27 02:45:05,040 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=3690606.6666666665, ans=0.125 2023-11-27 02:45:07,281 INFO [train_asr.py:1235] (3/4) Epoch 47, batch 500, loss[loss=0.04853, simple_loss=0.05976, pruned_loss=0.008198, audio_tagging_loss=0.01046, over 14990.00 frames. ], tot_loss[loss=0.06581, simple_loss=0.08894, pruned_loss=0.01207, audio_tagging_loss=0.009273, over 2805264.50 frames. ], batch size: 57, lr: 1.44e-03, grad_scale: 16.0 2023-11-27 02:45:09,480 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.401e+01 8.932e+01 9.491e+01 1.008e+02 1.797e+02, threshold=1.898e+02, percent-clipped=0.0 2023-11-27 02:45:20,317 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=3690740.0, ans=0.0 2023-11-27 02:45:21,408 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=3690740.0, ans=0.125 2023-11-27 02:45:57,905 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=13.94 vs. limit=15.0 2023-11-27 02:45:59,603 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 553650 2023-11-27 02:46:02,699 INFO [train_asr.py:1235] (3/4) Epoch 47, batch 550, loss[loss=0.05168, simple_loss=0.07453, pruned_loss=0.00664, audio_tagging_loss=0.007772, over 15410.00 frames. ], tot_loss[loss=0.06569, simple_loss=0.08899, pruned_loss=0.01208, audio_tagging_loss=0.00912, over 2852311.50 frames. ], batch size: 58, lr: 1.44e-03, grad_scale: 16.0 2023-11-27 02:46:17,373 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3691073.3333333335, ans=0.125 2023-11-27 02:46:21,565 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=3691073.3333333335, ans=0.125 2023-11-27 02:46:41,157 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=3691206.6666666665, ans=0.1 2023-11-27 02:46:42,585 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=7.29 vs. 
limit=10.0 2023-11-27 02:46:43,282 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=3691206.6666666665, ans=0.125 2023-11-27 02:46:44,741 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.05 vs. limit=15.0 2023-11-27 02:46:53,788 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=3691273.3333333335, ans=0.125 2023-11-27 02:46:55,927 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 553700 2023-11-27 02:46:59,536 INFO [train_asr.py:1235] (3/4) Epoch 47, batch 600, loss[loss=0.07091, simple_loss=0.1015, pruned_loss=0.01293, audio_tagging_loss=0.007237, over 15290.00 frames. ], tot_loss[loss=0.0654, simple_loss=0.08849, pruned_loss=0.01201, audio_tagging_loss=0.00915, over 2895062.98 frames. ], batch size: 57, lr: 1.44e-03, grad_scale: 16.0 2023-11-27 02:47:01,682 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 6.959e+01 8.835e+01 9.409e+01 1.013e+02 1.233e+02, threshold=1.882e+02, percent-clipped=0.0 2023-11-27 02:47:20,475 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=3691473.3333333335, ans=0.125 2023-11-27 02:47:30,436 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3691473.3333333335, ans=0.125 2023-11-27 02:47:31,023 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=10.85 vs. limit=15.0 2023-11-27 02:47:51,517 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 553750 2023-11-27 02:47:55,220 INFO [train_asr.py:1235] (3/4) Epoch 47, batch 650, loss[loss=0.0853, simple_loss=0.1173, pruned_loss=0.02026, audio_tagging_loss=0.006401, over 16646.00 frames. ], tot_loss[loss=0.06601, simple_loss=0.08961, pruned_loss=0.01212, audio_tagging_loss=0.009086, over 2938648.29 frames. ], batch size: 63, lr: 1.44e-03, grad_scale: 16.0 2023-11-27 02:48:01,796 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=3691673.3333333335, ans=0.0 2023-11-27 02:48:10,296 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00 2023-11-27 02:48:11,857 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-27 02:48:26,366 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=7.04 vs. 
limit=15.0 2023-11-27 02:48:35,488 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.max_abs, batch_count=3691873.3333333335, ans=10.0 2023-11-27 02:48:42,498 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=3691940.0, ans=0.2 2023-11-27 02:48:47,650 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 553800 2023-11-27 02:48:47,751 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=3691940.0, ans=0.0 2023-11-27 02:48:51,055 INFO [train_asr.py:1235] (3/4) Epoch 47, batch 700, loss[loss=0.07252, simple_loss=0.09452, pruned_loss=0.01381, audio_tagging_loss=0.01145, over 15059.00 frames. ], tot_loss[loss=0.06607, simple_loss=0.09013, pruned_loss=0.01204, audio_tagging_loss=0.008966, over 2963138.52 frames. ], batch size: 56, lr: 1.44e-03, grad_scale: 16.0 2023-11-27 02:48:52,249 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=3692006.6666666665, ans=0.2 2023-11-27 02:48:53,138 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.109e+01 8.860e+01 9.509e+01 1.038e+02 1.459e+02, threshold=1.902e+02, percent-clipped=0.0 2023-11-27 02:49:03,891 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=3692073.3333333335, ans=0.0 2023-11-27 02:49:44,256 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 553850 2023-11-27 02:49:47,903 INFO [train_asr.py:1235] (3/4) Epoch 47, batch 750, loss[loss=0.06561, simple_loss=0.08405, pruned_loss=0.01322, audio_tagging_loss=0.01037, over 15529.00 frames. ], tot_loss[loss=0.06668, simple_loss=0.09099, pruned_loss=0.01229, audio_tagging_loss=0.008891, over 2978104.70 frames. ], batch size: 58, lr: 1.44e-03, grad_scale: 16.0 2023-11-27 02:50:09,188 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=3.34 vs. limit=15.0 2023-11-27 02:50:16,244 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3692473.3333333335, ans=0.125 2023-11-27 02:50:17,136 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=3692473.3333333335, ans=0.0 2023-11-27 02:50:40,016 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 553900 2023-11-27 02:50:43,109 INFO [train_asr.py:1235] (3/4) Epoch 47, batch 800, loss[loss=0.07992, simple_loss=0.1039, pruned_loss=0.02084, audio_tagging_loss=0.007126, over 15508.00 frames. ], tot_loss[loss=0.06636, simple_loss=0.0905, pruned_loss=0.01217, audio_tagging_loss=0.008941, over 2995226.95 frames. ], batch size: 57, lr: 1.44e-03, grad_scale: 32.0 2023-11-27 02:50:45,225 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.898e+01 9.051e+01 9.571e+01 1.030e+02 1.342e+02, threshold=1.914e+02, percent-clipped=0.0 2023-11-27 02:51:08,259 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3692806.6666666665, ans=0.125 2023-11-27 02:51:11,846 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=512, metric=4.11 vs. 
limit=15.0 2023-11-27 02:51:20,555 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=3692873.3333333335, ans=0.0 2023-11-27 02:51:30,314 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=3692940.0, ans=0.125 2023-11-27 02:51:35,972 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 553950 2023-11-27 02:51:39,023 INFO [train_asr.py:1235] (3/4) Epoch 47, batch 850, loss[loss=0.07074, simple_loss=0.0897, pruned_loss=0.01587, audio_tagging_loss=0.01003, over 15999.00 frames. ], tot_loss[loss=0.06634, simple_loss=0.09039, pruned_loss=0.01209, audio_tagging_loss=0.009054, over 3014398.88 frames. ], batch size: 61, lr: 1.44e-03, grad_scale: 32.0 2023-11-27 02:51:46,099 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3693006.6666666665, ans=0.0 2023-11-27 02:52:15,804 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=3693206.6666666665, ans=0.125 2023-11-27 02:52:32,227 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 554000 2023-11-27 02:52:32,339 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3693273.3333333335, ans=0.1 2023-11-27 02:52:35,565 INFO [train_asr.py:1235] (3/4) Epoch 47, batch 900, loss[loss=0.06612, simple_loss=0.09366, pruned_loss=0.01159, audio_tagging_loss=0.007696, over 15944.00 frames. ], tot_loss[loss=0.06614, simple_loss=0.08989, pruned_loss=0.01207, audio_tagging_loss=0.009121, over 3020887.97 frames. ], batch size: 58, lr: 1.44e-03, grad_scale: 16.0 2023-11-27 02:52:36,748 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3693340.0, ans=0.125 2023-11-27 02:52:39,245 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.556e+01 8.856e+01 9.562e+01 1.034e+02 1.273e+02, threshold=1.912e+02, percent-clipped=0.0 2023-11-27 02:52:52,200 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3693406.6666666665, ans=0.125 2023-11-27 02:53:16,289 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=8.18 vs. limit=15.0 2023-11-27 02:53:18,218 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.max_abs, batch_count=3693540.0, ans=10.0 2023-11-27 02:53:19,194 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=3693606.6666666665, ans=0.07 2023-11-27 02:53:26,154 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3693606.6666666665, ans=0.1 2023-11-27 02:53:28,091 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 554050 2023-11-27 02:53:31,190 INFO [train_asr.py:1235] (3/4) Epoch 47, batch 950, loss[loss=0.04432, simple_loss=0.04734, pruned_loss=0.006726, audio_tagging_loss=0.01393, over 13884.00 frames. ], tot_loss[loss=0.0664, simple_loss=0.09056, pruned_loss=0.01216, audio_tagging_loss=0.008957, over 3033455.62 frames. 
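
[Editor's sketch.] In each [optim.py:476] record, the five grad-norm quartiles read as (min, 25%, median, 75%, max) of recent per-batch gradient norms, and the printed threshold equals Clipping_scale times the median: in the record above, 2.0 * 9.562e+01 = 1.912e+02. A sketch of that bookkeeping under this assumption (not the actual optimizer clipping code):

import torch

def clipping_summary(grad_norms: torch.Tensor, clipping_scale: float = 2.0):
    # Quartile summary matching the five numbers in the log records.
    q = torch.quantile(grad_norms, torch.tensor([0.0, 0.25, 0.5, 0.75, 1.0]))
    threshold = clipping_scale * q[2]            # assumed: scale * median
    percent_clipped = 100.0 * (grad_norms > threshold).float().mean()
    return q, threshold, percent_clipped

# With norms clustered around ~95, the threshold sits near 190, which is why
# percent-clipped is 0.0 in almost every record in this section.
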
], batch size: 54, lr: 1.44e-03, grad_scale: 16.0 2023-11-27 02:53:31,916 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.1.conv_module2.whiten, num_groups=1, num_channels=192, metric=13.42 vs. limit=15.0 2023-11-27 02:53:55,734 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3693806.6666666665, ans=0.125 2023-11-27 02:54:04,423 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3693873.3333333335, ans=0.1 2023-11-27 02:54:23,210 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 554100 2023-11-27 02:54:26,296 INFO [train_asr.py:1235] (3/4) Epoch 47, batch 1000, loss[loss=0.0553, simple_loss=0.07219, pruned_loss=0.009889, audio_tagging_loss=0.009316, over 14932.00 frames. ], tot_loss[loss=0.06518, simple_loss=0.08925, pruned_loss=0.01173, audio_tagging_loss=0.008824, over 3036194.87 frames. ], batch size: 58, lr: 1.44e-03, grad_scale: 16.0 2023-11-27 02:54:29,441 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.094e+01 9.006e+01 9.495e+01 1.025e+02 1.376e+02, threshold=1.899e+02, percent-clipped=0.0 2023-11-27 02:54:29,752 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3694006.6666666665, ans=0.125 2023-11-27 02:54:44,996 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=3694073.3333333335, ans=0.2 2023-11-27 02:54:49,637 WARNING [train_asr.py:1481] (3/4) Exclude cut with ID unbalanced/5Y6u9AlD9S0_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-27 02:54:54,006 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=3694140.0, ans=0.125 2023-11-27 02:55:06,742 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer_ff3.min_abs, batch_count=3694206.6666666665, ans=0.2 2023-11-27 02:55:10,112 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=2.550e-03 2023-11-27 02:55:19,482 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 554150 2023-11-27 02:55:23,095 INFO [train_asr.py:1235] (3/4) Epoch 47, batch 1050, loss[loss=0.09765, simple_loss=0.1309, pruned_loss=0.02198, audio_tagging_loss=0.01022, over 14420.00 frames. ], tot_loss[loss=0.06475, simple_loss=0.08854, pruned_loss=0.01165, audio_tagging_loss=0.008822, over 3035252.74 frames. ], batch size: 53, lr: 1.44e-03, grad_scale: 16.0 2023-11-27 02:55:58,344 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=6.16 vs. limit=12.0 2023-11-27 02:56:15,436 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 554200 2023-11-27 02:56:17,173 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=8.00 vs. 
limit=15.0 2023-11-27 02:56:18,832 INFO [train_asr.py:1235] (3/4) Epoch 47, batch 1100, loss[loss=0.08011, simple_loss=0.1135, pruned_loss=0.01559, audio_tagging_loss=0.007789, over 14884.00 frames. ], tot_loss[loss=0.06454, simple_loss=0.08815, pruned_loss=0.01169, audio_tagging_loss=0.008767, over 3035363.96 frames. ], batch size: 54, lr: 1.44e-03, grad_scale: 16.0 2023-11-27 02:56:19,886 WARNING [train_asr.py:1481] (3/4) Exclude cut with ID unbalanced/AWHnJAqurec_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-27 02:56:21,956 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.547e+01 8.965e+01 9.717e+01 1.039e+02 1.284e+02, threshold=1.943e+02, percent-clipped=0.0 2023-11-27 02:56:25,322 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3694673.3333333335, ans=0.125 2023-11-27 02:56:40,138 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=3694806.6666666665, ans=0.04949747468305833 2023-11-27 02:56:41,190 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=3694806.6666666665, ans=0.2 2023-11-27 02:56:44,493 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=8.26 vs. limit=15.0 2023-11-27 02:56:48,603 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3694806.6666666665, ans=0.125 2023-11-27 02:56:57,832 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3694873.3333333335, ans=0.1 2023-11-27 02:57:09,410 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=3694940.0, ans=0.0 2023-11-27 02:57:10,353 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 554250 2023-11-27 02:57:13,425 INFO [train_asr.py:1235] (3/4) Epoch 47, batch 1150, loss[loss=0.05911, simple_loss=0.08498, pruned_loss=0.008649, audio_tagging_loss=0.007972, over 15274.00 frames. ], tot_loss[loss=0.06471, simple_loss=0.08861, pruned_loss=0.0118, audio_tagging_loss=0.008603, over 3034322.42 frames. ], batch size: 57, lr: 1.44e-03, grad_scale: 16.0 2023-11-27 02:57:21,918 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=3695006.6666666665, ans=0.0 2023-11-27 02:57:38,938 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3695140.0, ans=0.125 2023-11-27 02:57:44,270 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=3695140.0, ans=0.0 2023-11-27 02:57:45,823 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=6.50 vs. 
limit=15.0 2023-11-27 02:57:51,668 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=3695206.6666666665, ans=0.0 2023-11-27 02:58:05,417 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 554300 2023-11-27 02:58:08,344 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3695340.0, ans=0.1 2023-11-27 02:58:09,071 INFO [train_asr.py:1235] (3/4) Epoch 47, batch 1200, loss[loss=0.05176, simple_loss=0.07083, pruned_loss=0.007479, audio_tagging_loss=0.008865, over 16199.00 frames. ], tot_loss[loss=0.06474, simple_loss=0.08856, pruned_loss=0.01186, audio_tagging_loss=0.008601, over 3042663.11 frames. ], batch size: 63, lr: 1.44e-03, grad_scale: 32.0 2023-11-27 02:58:09,796 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=3695340.0, ans=0.2 2023-11-27 02:58:10,850 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=3695340.0, ans=0.2 2023-11-27 02:58:12,706 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.838e+01 8.950e+01 9.657e+01 1.053e+02 1.302e+02, threshold=1.931e+02, percent-clipped=0.0 2023-11-27 02:58:16,556 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=11.39 vs. limit=15.0 2023-11-27 02:58:26,656 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=512, metric=20.91 vs. limit=22.5 2023-11-27 02:58:32,633 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=3695473.3333333335, ans=0.0 2023-11-27 02:58:47,549 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=3695540.0, ans=0.2 2023-11-27 02:59:02,061 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 554350 2023-11-27 02:59:05,177 INFO [train_asr.py:1235] (3/4) Epoch 47, batch 1250, loss[loss=0.05181, simple_loss=0.07184, pruned_loss=0.007112, audio_tagging_loss=0.008779, over 14775.00 frames. ], tot_loss[loss=0.06474, simple_loss=0.08854, pruned_loss=0.01191, audio_tagging_loss=0.00856, over 3041974.00 frames. ], batch size: 55, lr: 1.44e-03, grad_scale: 32.0 2023-11-27 02:59:13,880 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=3695673.3333333335, ans=0.2 2023-11-27 02:59:15,592 INFO [scaling.py:1022] (3/4) Whitening: name=encoder_embed.convnext.out_whiten, num_groups=1, num_channels=128, metric=4.75 vs. limit=5.0 2023-11-27 02:59:20,354 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3695740.0, ans=0.125 2023-11-27 02:59:21,390 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=3695740.0, ans=0.0 2023-11-27 02:59:21,629 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=4.82 vs. 
limit=15.0 2023-11-27 02:59:28,745 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=3695806.6666666665, ans=0.125 2023-11-27 02:59:57,588 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 554400 2023-11-27 02:59:57,827 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=3695940.0, ans=0.0 2023-11-27 03:00:01,010 INFO [train_asr.py:1235] (3/4) Epoch 47, batch 1300, loss[loss=0.06734, simple_loss=0.09323, pruned_loss=0.009378, audio_tagging_loss=0.01134, over 15192.00 frames. ], tot_loss[loss=0.0648, simple_loss=0.089, pruned_loss=0.01178, audio_tagging_loss=0.00852, over 3044886.26 frames. ], batch size: 57, lr: 1.44e-03, grad_scale: 32.0 2023-11-27 03:00:04,121 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.705e+01 8.987e+01 9.539e+01 1.033e+02 1.348e+02, threshold=1.908e+02, percent-clipped=0.0 2023-11-27 03:00:31,100 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3696140.0, ans=0.125 2023-11-27 03:00:41,114 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=2.31 vs. limit=15.0 2023-11-27 03:00:50,116 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=3696273.3333333335, ans=0.125 2023-11-27 03:00:53,265 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 554450 2023-11-27 03:00:56,948 INFO [train_asr.py:1235] (3/4) Epoch 47, batch 1350, loss[loss=0.07302, simple_loss=0.1047, pruned_loss=0.01485, audio_tagging_loss=0.005818, over 14992.00 frames. ], tot_loss[loss=0.06465, simple_loss=0.08847, pruned_loss=0.01184, audio_tagging_loss=0.008574, over 3038120.54 frames. ], batch size: 55, lr: 1.44e-03, grad_scale: 32.0 2023-11-27 03:01:04,145 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=3696340.0, ans=0.1 2023-11-27 03:01:25,382 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3696473.3333333335, ans=0.125 2023-11-27 03:01:35,753 WARNING [train_asr.py:1481] (3/4) Exclude cut with ID unbalanced/XdmbboqRBmQ_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-27 03:01:50,033 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 554500 2023-11-27 03:01:53,140 INFO [train_asr.py:1235] (3/4) Epoch 47, batch 1400, loss[loss=0.06406, simple_loss=0.08922, pruned_loss=0.009756, audio_tagging_loss=0.009696, over 16045.00 frames. ], tot_loss[loss=0.06482, simple_loss=0.08864, pruned_loss=0.01188, audio_tagging_loss=0.008616, over 3050290.78 frames. 
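
[Editor's sketch.] The train_asr.py:1481 WARNING records, like the one above, drop the 1-second AudioSet placeholder cuts: after the frontend's roughly 4x subsampling, 100 input frames leave 23 encoder frames, one fewer than the 24 BPE tokens of the dummy transcript, and a transducer alignment needs at least one frame per output token. A sketch of the filter, assuming the usual ((T - 7) // 2) // 2 convolutional subsampling arithmetic, which reproduces the logged 100 -> 23:

def frames_after_subsampling(num_frames: int) -> int:
    # Assumed Conv2d frontend arithmetic; yields 23 for the 100-frame cuts above.
    return ((num_frames - 7) // 2) // 2

def keep_cut(num_frames: int, num_tokens: int) -> bool:
    # Exclude cuts whose subsampled length cannot cover the token sequence.
    return frames_after_subsampling(num_frames) >= num_tokens

assert frames_after_subsampling(100) == 23
assert not keep_cut(100, 24)   # hence "Exclude cut ..." in the log
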
], batch size: 61, lr: 1.44e-03, grad_scale: 16.0 2023-11-27 03:01:55,427 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3696673.3333333335, ans=0.125 2023-11-27 03:01:57,319 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.439e+01 8.796e+01 9.481e+01 1.017e+02 1.266e+02, threshold=1.896e+02, percent-clipped=0.0 2023-11-27 03:02:00,753 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3696673.3333333335, ans=0.125 2023-11-27 03:02:04,269 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.56 vs. limit=6.0 2023-11-27 03:02:07,030 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=3696740.0, ans=0.0 2023-11-27 03:02:09,207 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=3696740.0, ans=0.125 2023-11-27 03:02:29,130 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=3696873.3333333335, ans=0.2 2023-11-27 03:02:37,001 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=8.30 vs. limit=10.0 2023-11-27 03:02:44,922 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 554550 2023-11-27 03:02:48,029 INFO [train_asr.py:1235] (3/4) Epoch 47, batch 1450, loss[loss=0.05964, simple_loss=0.08154, pruned_loss=0.01028, audio_tagging_loss=0.008597, over 15932.00 frames. ], tot_loss[loss=0.06455, simple_loss=0.08799, pruned_loss=0.01186, audio_tagging_loss=0.008701, over 3047646.82 frames. ], batch size: 62, lr: 1.44e-03, grad_scale: 16.0 2023-11-27 03:03:06,477 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.min_abs, batch_count=3697073.3333333335, ans=0.5 2023-11-27 03:03:25,401 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3697206.6666666665, ans=0.125 2023-11-27 03:03:38,715 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=6.96 vs. limit=15.0 2023-11-27 03:03:40,316 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 554600 2023-11-27 03:03:43,658 INFO [train_asr.py:1235] (3/4) Epoch 47, batch 1500, loss[loss=0.07541, simple_loss=0.08995, pruned_loss=0.01926, audio_tagging_loss=0.01117, over 15167.00 frames. ], tot_loss[loss=0.06542, simple_loss=0.08938, pruned_loss=0.01215, audio_tagging_loss=0.008583, over 3048552.30 frames. ], batch size: 57, lr: 1.44e-03, grad_scale: 16.0 2023-11-27 03:03:48,362 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.418e+01 9.023e+01 9.880e+01 1.062e+02 1.307e+02, threshold=1.976e+02, percent-clipped=0.0 2023-11-27 03:03:59,396 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=7.55 vs. 
limit=15.0 2023-11-27 03:04:02,937 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=3697406.6666666665, ans=0.0 2023-11-27 03:04:03,972 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=3697406.6666666665, ans=0.125 2023-11-27 03:04:11,813 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.40 vs. limit=6.0 2023-11-27 03:04:14,559 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=3697473.3333333335, ans=0.125 2023-11-27 03:04:36,697 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 554650 2023-11-27 03:04:37,927 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.min_positive, batch_count=3697606.6666666665, ans=0.025 2023-11-27 03:04:40,349 INFO [train_asr.py:1235] (3/4) Epoch 47, batch 1550, loss[loss=0.05284, simple_loss=0.06666, pruned_loss=0.007732, audio_tagging_loss=0.01178, over 14677.00 frames. ], tot_loss[loss=0.06533, simple_loss=0.08899, pruned_loss=0.01217, audio_tagging_loss=0.008668, over 3047061.35 frames. ], batch size: 56, lr: 1.44e-03, grad_scale: 16.0 2023-11-27 03:04:48,792 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=7.53 vs. limit=10.0 2023-11-27 03:04:54,891 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3697740.0, ans=0.125 2023-11-27 03:04:59,073 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=3697740.0, ans=0.125 2023-11-27 03:05:02,335 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3697806.6666666665, ans=0.0 2023-11-27 03:05:29,896 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=3697940.0, ans=0.0 2023-11-27 03:05:32,950 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 554700 2023-11-27 03:05:36,041 INFO [train_asr.py:1235] (3/4) Epoch 47, batch 1600, loss[loss=0.07107, simple_loss=0.09901, pruned_loss=0.01265, audio_tagging_loss=0.00891, over 14396.00 frames. ], tot_loss[loss=0.0653, simple_loss=0.08889, pruned_loss=0.01214, audio_tagging_loss=0.008715, over 3045311.32 frames. ], batch size: 56, lr: 1.44e-03, grad_scale: 16.0 2023-11-27 03:05:38,969 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=10.58 vs. 
limit=15.0 2023-11-27 03:05:40,326 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=3698006.6666666665, ans=0.125 2023-11-27 03:05:41,276 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.275e+01 8.979e+01 9.588e+01 1.025e+02 1.510e+02, threshold=1.918e+02, percent-clipped=0.0 2023-11-27 03:06:01,586 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=3698140.0, ans=0.0 2023-11-27 03:06:06,435 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3698140.0, ans=0.0 2023-11-27 03:06:07,481 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3698140.0, ans=0.125 2023-11-27 03:06:27,980 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 554750 2023-11-27 03:06:31,094 INFO [train_asr.py:1235] (3/4) Epoch 47, batch 1650, loss[loss=0.06232, simple_loss=0.08835, pruned_loss=0.007613, audio_tagging_loss=0.01053, over 15308.00 frames. ], tot_loss[loss=0.06563, simple_loss=0.08935, pruned_loss=0.01213, audio_tagging_loss=0.008825, over 3044438.69 frames. ], batch size: 57, lr: 1.44e-03, grad_scale: 16.0 2023-11-27 03:06:51,138 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=8.07 vs. limit=15.0 2023-11-27 03:06:51,945 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3698406.6666666665, ans=0.1 2023-11-27 03:07:03,109 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=3698473.3333333335, ans=0.0 2023-11-27 03:07:03,113 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3698473.3333333335, ans=0.125 2023-11-27 03:07:03,139 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=3698473.3333333335, ans=0.2 2023-11-27 03:07:07,305 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3698540.0, ans=0.125 2023-11-27 03:07:07,377 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=3698540.0, ans=0.0 2023-11-27 03:07:12,498 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=3698540.0, ans=0.2 2023-11-27 03:07:23,990 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 554800 2023-11-27 03:07:27,485 INFO [train_asr.py:1235] (3/4) Epoch 47, batch 1700, loss[loss=0.0436, simple_loss=0.05493, pruned_loss=0.007461, audio_tagging_loss=0.008674, over 15407.00 frames. ], tot_loss[loss=0.06574, simple_loss=0.08934, pruned_loss=0.01222, audio_tagging_loss=0.008845, over 3046291.10 frames. ], batch size: 61, lr: 1.44e-03, grad_scale: 16.0 2023-11-27 03:07:29,120 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=18.31 vs. 
limit=22.5 2023-11-27 03:07:33,350 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.387e+01 8.918e+01 9.493e+01 1.014e+02 1.179e+02, threshold=1.899e+02, percent-clipped=0.0 2023-11-27 03:07:34,606 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3698673.3333333335, ans=0.125 2023-11-27 03:07:35,826 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-11-27 03:07:35,842 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3698673.3333333335, ans=0.1 2023-11-27 03:07:37,895 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=3698740.0, ans=0.125 2023-11-27 03:07:46,829 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=3698740.0, ans=0.125 2023-11-27 03:07:50,640 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=13.57 vs. limit=22.5 2023-11-27 03:08:08,505 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=11.65 vs. limit=15.0 2023-11-27 03:08:20,296 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 554850 2023-11-27 03:08:24,005 INFO [train_asr.py:1235] (3/4) Epoch 47, batch 1750, loss[loss=0.07171, simple_loss=0.09806, pruned_loss=0.01417, audio_tagging_loss=0.008507, over 15469.00 frames. ], tot_loss[loss=0.066, simple_loss=0.08993, pruned_loss=0.01228, audio_tagging_loss=0.008745, over 3049342.12 frames. ], batch size: 59, lr: 1.44e-03, grad_scale: 16.0 2023-11-27 03:08:46,164 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3699140.0, ans=0.1 2023-11-27 03:08:47,893 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=3699140.0, ans=0.2 2023-11-27 03:09:00,282 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=3699206.6666666665, ans=0.125 2023-11-27 03:09:13,496 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=3699273.3333333335, ans=0.0 2023-11-27 03:09:14,583 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=3699273.3333333335, ans=0.0 2023-11-27 03:09:16,561 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 554900 2023-11-27 03:09:19,729 INFO [train_asr.py:1235] (3/4) Epoch 47, batch 1800, loss[loss=0.08843, simple_loss=0.1312, pruned_loss=0.01789, audio_tagging_loss=0.004937, over 16320.00 frames. ], tot_loss[loss=0.06605, simple_loss=0.09035, pruned_loss=0.01227, audio_tagging_loss=0.008602, over 3045086.86 frames. 
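
[Editor's sketch.] The [scaling.py:1022] Whitening records, several of which appear above, track how far a module's activations are from having a white (isotropic) covariance; the log prints cases where the metric approaches its limit. One plausible formulation of such a metric, stated here as an assumption rather than a transcription of scaling.py: d * tr(C^2) / tr(C)^2 over the d-channel covariance C, which equals 1 exactly when C is a multiple of the identity and grows toward d as the spectrum collapses.

import torch

def whitening_metric(x: torch.Tensor) -> torch.Tensor:
    # x: (num_frames, d) activations for one channel group.
    x = x - x.mean(dim=0)                 # center the features
    c = (x.T @ x) / x.shape[0]            # covariance estimate, (d, d)
    d = c.shape[0]
    # Cauchy-Schwarz gives d * tr(C^2) / tr(C)^2 >= 1,
    # with equality iff C is proportional to the identity.
    return d * (c * c).sum() / c.diagonal().sum() ** 2

white = torch.randn(8192, 256)
print(float(whitening_metric(white)))     # ~1.03: near-white features
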
], batch size: 58, lr: 1.44e-03, grad_scale: 8.0 2023-11-27 03:09:26,754 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.056e+01 8.924e+01 9.583e+01 9.926e+01 1.257e+02, threshold=1.917e+02, percent-clipped=0.0 2023-11-27 03:09:35,895 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.min_abs, batch_count=3699406.6666666665, ans=0.5 2023-11-27 03:09:40,765 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3699406.6666666665, ans=0.1 2023-11-27 03:09:57,129 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3699540.0, ans=0.0 2023-11-27 03:10:12,793 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 554950 2023-11-27 03:10:16,019 INFO [train_asr.py:1235] (3/4) Epoch 47, batch 1850, loss[loss=0.07817, simple_loss=0.1136, pruned_loss=0.01441, audio_tagging_loss=0.006982, over 15060.00 frames. ], tot_loss[loss=0.06621, simple_loss=0.0906, pruned_loss=0.0123, audio_tagging_loss=0.008604, over 3044407.11 frames. ], batch size: 56, lr: 1.44e-03, grad_scale: 8.0 2023-11-27 03:10:43,751 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.min_positive, batch_count=3699806.6666666665, ans=0.05 2023-11-27 03:10:59,681 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3699940.0, ans=0.1 2023-11-27 03:11:05,040 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.18 vs. limit=6.0 2023-11-27 03:11:05,727 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.min_abs, batch_count=3699940.0, ans=0.5 2023-11-27 03:11:08,699 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 555000 2023-11-27 03:11:12,149 INFO [train_asr.py:1235] (3/4) Epoch 47, batch 1900, loss[loss=0.08165, simple_loss=0.1081, pruned_loss=0.02145, audio_tagging_loss=0.006156, over 16115.00 frames. ], tot_loss[loss=0.0655, simple_loss=0.08975, pruned_loss=0.01213, audio_tagging_loss=0.008495, over 3041367.65 frames. ], batch size: 58, lr: 1.44e-03, grad_scale: 8.0 2023-11-27 03:11:18,506 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.220e+01 9.107e+01 9.832e+01 1.049e+02 1.489e+02, threshold=1.966e+02, percent-clipped=0.0 2023-11-27 03:11:18,772 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=3700006.6666666665, ans=0.125 2023-11-27 03:11:20,761 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=7.55 vs. limit=12.0 2023-11-27 03:12:05,090 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 555050 2023-11-27 03:12:08,240 INFO [train_asr.py:1235] (3/4) Epoch 47, batch 1950, loss[loss=0.05418, simple_loss=0.06201, pruned_loss=0.01119, audio_tagging_loss=0.01199, over 15477.00 frames. ], tot_loss[loss=0.06571, simple_loss=0.09027, pruned_loss=0.0121, audio_tagging_loss=0.008474, over 3039270.10 frames. ], batch size: 58, lr: 1.44e-03, grad_scale: 8.0 2023-11-27 03:12:18,773 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.whiten, num_groups=1, num_channels=512, metric=9.14 vs. 
limit=12.0 2023-11-27 03:12:21,075 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=3700406.6666666665, ans=0.07 2023-11-27 03:12:47,568 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3700540.0, ans=0.125 2023-11-27 03:12:51,813 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=3700606.6666666665, ans=0.125 2023-11-27 03:13:00,677 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 555100 2023-11-27 03:13:04,284 INFO [train_asr.py:1235] (3/4) Epoch 47, batch 2000, loss[loss=0.07542, simple_loss=0.09973, pruned_loss=0.01487, audio_tagging_loss=0.01069, over 15178.00 frames. ], tot_loss[loss=0.06499, simple_loss=0.08897, pruned_loss=0.01193, audio_tagging_loss=0.00857, over 3034920.15 frames. ], batch size: 54, lr: 1.44e-03, grad_scale: 16.0 2023-11-27 03:13:09,860 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=3700673.3333333335, ans=0.0 2023-11-27 03:13:11,254 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.430e+01 8.780e+01 9.356e+01 1.007e+02 1.266e+02, threshold=1.871e+02, percent-clipped=0.0 2023-11-27 03:13:36,489 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-11-27 03:13:57,180 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 555150 2023-11-27 03:14:00,289 INFO [train_asr.py:1235] (3/4) Epoch 47, batch 2050, loss[loss=0.06245, simple_loss=0.09748, pruned_loss=0.007517, audio_tagging_loss=0.006197, over 16366.00 frames. ], tot_loss[loss=0.06409, simple_loss=0.0877, pruned_loss=0.01161, audio_tagging_loss=0.008638, over 3039246.13 frames. ], batch size: 60, lr: 1.44e-03, grad_scale: 16.0 2023-11-27 03:14:04,051 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=13.02 vs. limit=15.0 2023-11-27 03:14:11,131 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-27 03:14:15,402 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=3701073.3333333335, ans=0.0 2023-11-27 03:14:30,703 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3701140.0, ans=0.1 2023-11-27 03:14:49,622 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=512, metric=7.88 vs. limit=15.0 2023-11-27 03:14:52,361 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 555200 2023-11-27 03:14:55,682 INFO [train_asr.py:1235] (3/4) Epoch 47, batch 2100, loss[loss=0.05747, simple_loss=0.07259, pruned_loss=0.01156, audio_tagging_loss=0.009607, over 14471.00 frames. ], tot_loss[loss=0.06457, simple_loss=0.0885, pruned_loss=0.01179, audio_tagging_loss=0.008532, over 3042661.04 frames. ], batch size: 56, lr: 1.44e-03, grad_scale: 16.0 2023-11-27 03:14:57,285 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=10.10 vs. 
limit=15.0 2023-11-27 03:15:02,568 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.623e+01 8.818e+01 9.814e+01 1.041e+02 1.368e+02, threshold=1.963e+02, percent-clipped=0.0 2023-11-27 03:15:11,652 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=3701406.6666666665, ans=0.0 2023-11-27 03:15:26,111 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.58 vs. limit=6.0 2023-11-27 03:15:49,105 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 555250 2023-11-27 03:15:52,281 INFO [train_asr.py:1235] (3/4) Epoch 47, batch 2150, loss[loss=0.06596, simple_loss=0.08802, pruned_loss=0.01314, audio_tagging_loss=0.008809, over 14840.00 frames. ], tot_loss[loss=0.06507, simple_loss=0.08896, pruned_loss=0.01199, audio_tagging_loss=0.008597, over 3036473.73 frames. ], batch size: 57, lr: 1.44e-03, grad_scale: 16.0 2023-11-27 03:15:59,445 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=3701673.3333333335, ans=0.07 2023-11-27 03:16:24,273 WARNING [train_asr.py:1481] (3/4) Exclude cut with ID unbalanced/XkQ8YVd8u38_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-27 03:16:45,541 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 555300 2023-11-27 03:16:45,701 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer_ff2.min_abs, batch_count=3701940.0, ans=0.1 2023-11-27 03:16:48,648 INFO [train_asr.py:1235] (3/4) Epoch 47, batch 2200, loss[loss=0.06872, simple_loss=0.09723, pruned_loss=0.01067, audio_tagging_loss=0.009431, over 15496.00 frames. ], tot_loss[loss=0.06537, simple_loss=0.08931, pruned_loss=0.01211, audio_tagging_loss=0.008607, over 3040032.13 frames. ], batch size: 57, lr: 1.44e-03, grad_scale: 16.0 2023-11-27 03:16:48,966 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=3702006.6666666665, ans=0.125 2023-11-27 03:16:55,030 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 8.053e+01 9.078e+01 9.706e+01 1.033e+02 2.180e+02, threshold=1.941e+02, percent-clipped=1.0 2023-11-27 03:17:04,864 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=3702073.3333333335, ans=0.0 2023-11-27 03:17:20,177 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=3702140.0, ans=0.0 2023-11-27 03:17:21,176 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.min_positive, batch_count=3702206.6666666665, ans=0.025 2023-11-27 03:17:22,674 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=8.47 vs. 
limit=15.0 2023-11-27 03:17:24,305 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=3702206.6666666665, ans=0.0 2023-11-27 03:17:26,448 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=3702206.6666666665, ans=0.0 2023-11-27 03:17:32,191 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3702273.3333333335, ans=0.125 2023-11-27 03:17:38,639 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=3702273.3333333335, ans=0.125 2023-11-27 03:17:38,673 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3702273.3333333335, ans=0.1 2023-11-27 03:17:40,561 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 555350 2023-11-27 03:17:41,816 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3702273.3333333335, ans=0.125 2023-11-27 03:17:43,601 INFO [train_asr.py:1235] (3/4) Epoch 47, batch 2250, loss[loss=0.06163, simple_loss=0.07353, pruned_loss=0.01376, audio_tagging_loss=0.01111, over 15668.00 frames. ], tot_loss[loss=0.06528, simple_loss=0.08887, pruned_loss=0.01217, audio_tagging_loss=0.008677, over 3041801.59 frames. ], batch size: 59, lr: 1.44e-03, grad_scale: 16.0 2023-11-27 03:17:44,808 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3702340.0, ans=0.1 2023-11-27 03:17:54,918 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=3702406.6666666665, ans=0.0 2023-11-27 03:18:08,746 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3702473.3333333335, ans=0.125 2023-11-27 03:18:34,810 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=3702606.6666666665, ans=0.0 2023-11-27 03:18:34,890 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=3702606.6666666665, ans=0.0 2023-11-27 03:18:35,788 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 555400 2023-11-27 03:18:39,677 INFO [train_asr.py:1235] (3/4) Epoch 47, batch 2300, loss[loss=0.07931, simple_loss=0.1107, pruned_loss=0.01571, audio_tagging_loss=0.008264, over 15185.00 frames. ], tot_loss[loss=0.06592, simple_loss=0.08975, pruned_loss=0.01235, audio_tagging_loss=0.00869, over 3042831.14 frames. 
], batch size: 55, lr: 1.44e-03, grad_scale: 16.0 2023-11-27 03:18:42,063 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=3702673.3333333335, ans=0.2 2023-11-27 03:18:45,899 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3702673.3333333335, ans=0.0 2023-11-27 03:18:46,633 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.925e+01 9.197e+01 9.925e+01 1.066e+02 1.274e+02, threshold=1.985e+02, percent-clipped=0.0 2023-11-27 03:18:49,076 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=3702673.3333333335, ans=0.125 2023-11-27 03:18:55,375 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=9.55 vs. limit=15.0 2023-11-27 03:19:12,845 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3702873.3333333335, ans=0.1 2023-11-27 03:19:17,196 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3702873.3333333335, ans=0.0 2023-11-27 03:19:20,814 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=3702873.3333333335, ans=0.0 2023-11-27 03:19:24,504 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3702940.0, ans=0.125 2023-11-27 03:19:27,959 WARNING [train_asr.py:1481] (3/4) Exclude cut with ID unbalanced/mx9RcUz8sr0_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-27 03:19:32,274 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 555450 2023-11-27 03:19:33,524 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=3702940.0, ans=0.2 2023-11-27 03:19:35,982 INFO [train_asr.py:1235] (3/4) Epoch 47, batch 2350, loss[loss=0.07127, simple_loss=0.09997, pruned_loss=0.01258, audio_tagging_loss=0.008706, over 15394.00 frames. ], tot_loss[loss=0.06606, simple_loss=0.09001, pruned_loss=0.01231, audio_tagging_loss=0.00875, over 3042655.35 frames. ], batch size: 60, lr: 1.44e-03, grad_scale: 16.0 2023-11-27 03:19:45,909 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=3703073.3333333335, ans=0.125 2023-11-27 03:19:46,973 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3703073.3333333335, ans=0.125 2023-11-27 03:20:04,431 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=13.81 vs. 
limit=22.5 2023-11-27 03:20:24,804 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=3703273.3333333335, ans=0.0 2023-11-27 03:20:27,960 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 555500 2023-11-27 03:20:31,098 INFO [train_asr.py:1235] (3/4) Epoch 47, batch 2400, loss[loss=0.05601, simple_loss=0.07038, pruned_loss=0.007863, audio_tagging_loss=0.01295, over 16656.00 frames. ], tot_loss[loss=0.06609, simple_loss=0.09003, pruned_loss=0.01218, audio_tagging_loss=0.008891, over 3040098.89 frames. ], batch size: 63, lr: 1.44e-03, grad_scale: 32.0 2023-11-27 03:20:31,706 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=19.04 vs. limit=22.5 2023-11-27 03:20:37,430 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.394e+01 8.851e+01 9.612e+01 1.018e+02 1.276e+02, threshold=1.922e+02, percent-clipped=0.0 2023-11-27 03:20:42,023 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3703406.6666666665, ans=0.125 2023-11-27 03:20:50,629 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-27 03:21:14,888 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3703606.6666666665, ans=0.125 2023-11-27 03:21:23,242 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 555550 2023-11-27 03:21:26,361 INFO [train_asr.py:1235] (3/4) Epoch 47, batch 2450, loss[loss=0.08584, simple_loss=0.1234, pruned_loss=0.01928, audio_tagging_loss=0.004851, over 16099.00 frames. ], tot_loss[loss=0.06624, simple_loss=0.09017, pruned_loss=0.01223, audio_tagging_loss=0.008927, over 3038016.13 frames. ], batch size: 57, lr: 1.44e-03, grad_scale: 32.0 2023-11-27 03:21:32,438 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3703673.3333333335, ans=0.0 2023-11-27 03:21:37,504 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3703740.0, ans=0.1 2023-11-27 03:21:53,072 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=3703806.6666666665, ans=0.0 2023-11-27 03:21:59,668 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module1.whiten, num_groups=1, num_channels=512, metric=4.16 vs. limit=15.0 2023-11-27 03:22:08,980 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=1.97 vs. limit=6.0 2023-11-27 03:22:18,761 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=3703940.0, ans=0.0 2023-11-27 03:22:19,714 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 555600 2023-11-27 03:22:23,111 INFO [train_asr.py:1235] (3/4) Epoch 47, batch 2500, loss[loss=0.06959, simple_loss=0.09956, pruned_loss=0.01103, audio_tagging_loss=0.008773, over 15594.00 frames. ], tot_loss[loss=0.06627, simple_loss=0.09025, pruned_loss=0.01225, audio_tagging_loss=0.008891, over 3034013.64 frames. 
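
[Editor's sketch.] The ScheduledFloat records that dominate this log print the current value (ans=...) of a schedulable hyperparameter, such as a dropout_p, skip_rate, or scale_min, at the given batch_count; each such parameter follows a piecewise-linear schedule over the training batch index. A minimal sketch of that interpolation; the breakpoints below are illustrative only, not the schedules behind these records:

def scheduled_float(batch_count: float,
                    points: list[tuple[float, float]]) -> float:
    # points: sorted (batch_count, value) breakpoints; linear in between,
    # clamped to the first/last value outside the range.
    if batch_count <= points[0][0]:
        return points[0][1]
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        if batch_count <= x1:
            return y0 + (batch_count - x0) / (x1 - x0) * (y1 - y0)
    return points[-1][1]

# E.g. a dropout decaying from 0.3 to a 0.1 floor over the first 20k batches
# would have long since reached its floor at batch_count ~3.7e6:
print(scheduled_float(3_702_206.0, [(0.0, 0.3), (20_000.0, 0.1)]))   # -> 0.1
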
], batch size: 59, lr: 1.44e-03, grad_scale: 32.0 2023-11-27 03:22:30,032 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.789e+01 9.086e+01 9.685e+01 1.022e+02 1.331e+02, threshold=1.937e+02, percent-clipped=0.0 2023-11-27 03:22:30,516 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.out_whiten.whitening_limit, batch_count=3704006.6666666665, ans=15.0 2023-11-27 03:22:49,702 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=7.32 vs. limit=10.0 2023-11-27 03:22:50,435 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=3704140.0, ans=0.125 2023-11-27 03:23:04,788 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=3704206.6666666665, ans=0.0 2023-11-27 03:23:08,609 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.min_positive, batch_count=3704273.3333333335, ans=0.05 2023-11-27 03:23:09,662 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3704273.3333333335, ans=0.125 2023-11-27 03:23:15,743 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 555650 2023-11-27 03:23:18,858 INFO [train_asr.py:1235] (3/4) Epoch 47, batch 2550, loss[loss=0.06205, simple_loss=0.08958, pruned_loss=0.009979, audio_tagging_loss=0.007283, over 15484.00 frames. ], tot_loss[loss=0.06584, simple_loss=0.08963, pruned_loss=0.01223, audio_tagging_loss=0.008794, over 3030592.04 frames. ], batch size: 59, lr: 1.44e-03, grad_scale: 32.0 2023-11-27 03:23:21,063 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3704340.0, ans=0.1 2023-11-27 03:23:24,320 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3704340.0, ans=0.125 2023-11-27 03:23:25,229 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=3704340.0, ans=0.125 2023-11-27 03:23:26,347 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3704340.0, ans=0.125 2023-11-27 03:23:28,583 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=3704406.6666666665, ans=0.0 2023-11-27 03:23:36,444 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=3704406.6666666665, ans=0.125 2023-11-27 03:23:46,485 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.whiten, num_groups=1, num_channels=512, metric=10.77 vs. 
limit=12.0 2023-11-27 03:23:49,931 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3704473.3333333335, ans=0.1 2023-11-27 03:23:56,428 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3704540.0, ans=0.1 2023-11-27 03:24:10,990 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 555700 2023-11-27 03:24:14,133 INFO [train_asr.py:1235] (3/4) Epoch 47, batch 2600, loss[loss=0.05289, simple_loss=0.07846, pruned_loss=0.007214, audio_tagging_loss=0.006444, over 15758.00 frames. ], tot_loss[loss=0.06529, simple_loss=0.08929, pruned_loss=0.01199, audio_tagging_loss=0.008651, over 3031757.59 frames. ], batch size: 58, lr: 1.44e-03, grad_scale: 16.0 2023-11-27 03:24:17,531 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.min_positive, batch_count=3704673.3333333335, ans=0.025 2023-11-27 03:24:22,130 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.465e+01 9.053e+01 9.535e+01 1.024e+02 1.234e+02, threshold=1.907e+02, percent-clipped=0.0 2023-11-27 03:24:32,880 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=6.69 vs. limit=12.0 2023-11-27 03:24:41,987 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=3704806.6666666665, ans=0.2 2023-11-27 03:24:49,867 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=13.66 vs. limit=15.0 2023-11-27 03:24:51,482 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3704873.3333333335, ans=0.125 2023-11-27 03:25:01,563 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=3704940.0, ans=0.2 2023-11-27 03:25:02,565 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3704940.0, ans=0.125 2023-11-27 03:25:07,327 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 555750 2023-11-27 03:25:10,355 INFO [train_asr.py:1235] (3/4) Epoch 47, batch 2650, loss[loss=0.05834, simple_loss=0.08453, pruned_loss=0.007613, audio_tagging_loss=0.008457, over 15401.00 frames. ], tot_loss[loss=0.06538, simple_loss=0.08963, pruned_loss=0.01197, audio_tagging_loss=0.008596, over 3035398.23 frames. ], batch size: 58, lr: 1.44e-03, grad_scale: 16.0 2023-11-27 03:25:20,657 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=3705073.3333333335, ans=0.125 2023-11-27 03:25:53,525 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3705273.3333333335, ans=0.1 2023-11-27 03:26:02,917 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 555800 2023-11-27 03:26:04,462 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=10.39 vs. limit=15.0 2023-11-27 03:26:06,301 INFO [train_asr.py:1235] (3/4) Epoch 47, batch 2700, loss[loss=0.06249, simple_loss=0.08231, pruned_loss=0.01149, audio_tagging_loss=0.009845, over 14957.00 frames. 
], tot_loss[loss=0.06495, simple_loss=0.0891, pruned_loss=0.01179, audio_tagging_loss=0.008615, over 3038039.99 frames. ], batch size: 56, lr: 1.44e-03, grad_scale: 16.0 2023-11-27 03:26:13,771 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.275e+01 9.070e+01 9.755e+01 1.047e+02 1.495e+02, threshold=1.951e+02, percent-clipped=0.0 2023-11-27 03:26:24,670 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=3705406.6666666665, ans=0.0 2023-11-27 03:26:36,441 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=3705473.3333333335, ans=0.2 2023-11-27 03:26:58,390 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 555850 2023-11-27 03:27:01,515 INFO [train_asr.py:1235] (3/4) Epoch 47, batch 2750, loss[loss=0.06042, simple_loss=0.08889, pruned_loss=0.008236, audio_tagging_loss=0.00774, over 15079.00 frames. ], tot_loss[loss=0.06492, simple_loss=0.08912, pruned_loss=0.01187, audio_tagging_loss=0.0085, over 3039324.35 frames. ], batch size: 56, lr: 1.44e-03, grad_scale: 16.0 2023-11-27 03:27:12,916 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3705740.0, ans=0.125 2023-11-27 03:27:19,785 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3705740.0, ans=0.1 2023-11-27 03:27:22,388 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=3705740.0, ans=0.07 2023-11-27 03:27:49,882 WARNING [train_asr.py:1481] (3/4) Exclude cut with ID unbalanced/IMdT8_tuNp0_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-27 03:27:54,243 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 555900 2023-11-27 03:27:54,392 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=3705940.0, ans=0.2 2023-11-27 03:27:58,011 INFO [train_asr.py:1235] (3/4) Epoch 47, batch 2800, loss[loss=0.08406, simple_loss=0.12, pruned_loss=0.01764, audio_tagging_loss=0.006434, over 14485.00 frames. ], tot_loss[loss=0.06482, simple_loss=0.08895, pruned_loss=0.01181, audio_tagging_loss=0.008542, over 3039720.05 frames. ], batch size: 53, lr: 1.44e-03, grad_scale: 32.0 2023-11-27 03:27:59,333 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=3706006.6666666665, ans=0.0 2023-11-27 03:28:00,382 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=3706006.6666666665, ans=0.2 2023-11-27 03:28:05,912 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.107e+01 8.970e+01 9.604e+01 1.036e+02 1.276e+02, threshold=1.921e+02, percent-clipped=0.0 2023-11-27 03:28:15,484 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module2.whiten, num_groups=1, num_channels=512, metric=9.77 vs. 
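
The optim.py:476 lines print five order statistics (min, 25%, median, 75%, max) of recent whole-model gradient norms alongside Clipping_scale=2.0 and the active threshold. In every such line in this section the threshold equals twice the printed median (just above, 2 x 9.755e+01 = 1.951e+02), consistent with a rule of the form threshold = clipping_scale * median over a trailing window; percent-clipped then reports how often the current norm exceeded it. A minimal sketch of that reading, not ScaledAdam's actual implementation:

    # Hedged sketch: median-based gradient clipping that would produce the
    # "grad-norm quartiles ... threshold ... percent-clipped" lines above.
    import torch
    from collections import deque

    class MedianClipper:
        def __init__(self, clipping_scale: float = 2.0, window: int = 128):
            self.clipping_scale = clipping_scale
            self.norms = deque(maxlen=window)  # trailing history of grad norms

        def clip_(self, params) -> float:
            grads = [p.grad for p in params if p.grad is not None]
            norm = torch.norm(torch.stack([g.norm() for g in grads])).item()
            self.norms.append(norm)
            hist = torch.tensor(list(self.norms))
            quartiles = torch.quantile(hist, torch.tensor([0.0, 0.25, 0.5, 0.75, 1.0]))
            threshold = self.clipping_scale * quartiles[2].item()  # 2 x median
            if norm > threshold:  # such batches are what "percent-clipped" counts
                for g in grads:
                    g.mul_(threshold / norm)
            return threshold

Under this reading, percent-clipped=0.0 almost everywhere simply says the max quartile stayed under twice the median; the single percent-clipped=1.0 later in this section coincides with a max of 2.364e+02 against a threshold of 1.924e+02.
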
limit=15.0 2023-11-27 03:28:15,611 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=17.99 vs. limit=22.5 2023-11-27 03:28:25,698 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=3706140.0, ans=0.0 2023-11-27 03:28:27,884 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3706140.0, ans=0.125 2023-11-27 03:28:35,788 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3706206.6666666665, ans=0.125 2023-11-27 03:28:50,590 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 555950 2023-11-27 03:28:54,315 INFO [train_asr.py:1235] (3/4) Epoch 47, batch 2850, loss[loss=0.06798, simple_loss=0.0948, pruned_loss=0.01576, audio_tagging_loss=0.004824, over 15558.00 frames. ], tot_loss[loss=0.06475, simple_loss=0.0888, pruned_loss=0.01187, audio_tagging_loss=0.008487, over 3051211.90 frames. ], batch size: 59, lr: 1.44e-03, grad_scale: 32.0 2023-11-27 03:28:57,981 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.whiten.whitening_limit, batch_count=3706340.0, ans=15.0 2023-11-27 03:29:19,479 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=3706473.3333333335, ans=0.04949747468305833 2023-11-27 03:29:46,425 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 556000 2023-11-27 03:29:51,988 INFO [train_asr.py:1235] (3/4) Epoch 47, batch 2900, loss[loss=0.07353, simple_loss=0.1002, pruned_loss=0.0126, audio_tagging_loss=0.01081, over 16589.00 frames. ], tot_loss[loss=0.0652, simple_loss=0.08942, pruned_loss=0.01197, audio_tagging_loss=0.00852, over 3051570.66 frames. ], batch size: 61, lr: 1.44e-03, grad_scale: 32.0 2023-11-27 03:29:59,420 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.629e+01 8.856e+01 9.574e+01 1.046e+02 1.351e+02, threshold=1.915e+02, percent-clipped=0.0 2023-11-27 03:30:06,947 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=3706740.0, ans=0.0 2023-11-27 03:30:20,261 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=3706806.6666666665, ans=0.2 2023-11-27 03:30:36,915 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.out_whiten.whitening_limit, batch_count=3706940.0, ans=15.0 2023-11-27 03:30:40,056 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=4.02 vs. limit=15.0 2023-11-27 03:30:43,657 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=12.75 vs. limit=15.0 2023-11-27 03:30:44,355 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 556050 2023-11-27 03:30:47,478 INFO [train_asr.py:1235] (3/4) Epoch 47, batch 2950, loss[loss=0.07177, simple_loss=0.09896, pruned_loss=0.01454, audio_tagging_loss=0.007744, over 15230.00 frames. ], tot_loss[loss=0.06575, simple_loss=0.0903, pruned_loss=0.01213, audio_tagging_loss=0.00847, over 3053950.18 frames. 
], batch size: 56, lr: 1.44e-03, grad_scale: 32.0 2023-11-27 03:30:54,575 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=3707006.6666666665, ans=0.125 2023-11-27 03:31:25,251 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=7.06 vs. limit=15.0 2023-11-27 03:31:40,933 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 556100 2023-11-27 03:31:44,035 INFO [train_asr.py:1235] (3/4) Epoch 47, batch 3000, loss[loss=0.05706, simple_loss=0.07611, pruned_loss=0.009525, audio_tagging_loss=0.009481, over 15596.00 frames. ], tot_loss[loss=0.06554, simple_loss=0.08976, pruned_loss=0.01207, audio_tagging_loss=0.008589, over 3060455.91 frames. ], batch size: 61, lr: 1.44e-03, grad_scale: 16.0 2023-11-27 03:31:44,035 INFO [train_asr.py:1258] (3/4) Computing validation loss 2023-11-27 03:32:16,614 INFO [train_asr.py:1267] (3/4) Epoch 47, validation: loss=0.05735, simple_loss=0.05053, pruned_loss=0.005352, audio_tagging_loss=0.02673, over 4681554.00 frames. 2023-11-27 03:32:16,615 INFO [train_asr.py:1268] (3/4) Maximum memory allocated so far is 24894MB 2023-11-27 03:32:16,812 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=3707340.0, ans=0.125 2023-11-27 03:32:25,492 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.754e+01 9.179e+01 9.770e+01 1.041e+02 1.490e+02, threshold=1.954e+02, percent-clipped=0.0 2023-11-27 03:32:33,862 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.min_positive, batch_count=3707406.6666666665, ans=0.05 2023-11-27 03:32:38,680 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=3707473.3333333335, ans=0.0 2023-11-27 03:33:02,753 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=3707606.6666666665, ans=0.0 2023-11-27 03:33:02,844 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=13.26 vs. limit=15.0 2023-11-27 03:33:09,528 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 556150 2023-11-27 03:33:11,685 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3707673.3333333335, ans=0.1 2023-11-27 03:33:13,153 INFO [train_asr.py:1235] (3/4) Epoch 47, batch 3050, loss[loss=0.08275, simple_loss=0.1167, pruned_loss=0.01756, audio_tagging_loss=0.006847, over 14820.00 frames. ], tot_loss[loss=0.06537, simple_loss=0.08943, pruned_loss=0.01198, audio_tagging_loss=0.008674, over 3061307.39 frames. ], batch size: 54, lr: 1.44e-03, grad_scale: 16.0 2023-11-27 03:33:17,149 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=11.75 vs. limit=15.0 2023-11-27 03:33:19,280 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3707673.3333333335, ans=0.125 2023-11-27 03:33:45,030 WARNING [train_asr.py:1481] (3/4) Exclude cut with ID unbalanced/h0neUGB6j_g_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. 
Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-27 03:33:52,211 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3707873.3333333335, ans=0.1 2023-11-27 03:34:05,886 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 556200 2023-11-27 03:34:09,356 INFO [train_asr.py:1235] (3/4) Epoch 47, batch 3100, loss[loss=0.05122, simple_loss=0.06924, pruned_loss=0.005454, audio_tagging_loss=0.01115, over 14023.00 frames. ], tot_loss[loss=0.06544, simple_loss=0.08952, pruned_loss=0.012, audio_tagging_loss=0.008687, over 3055272.26 frames. ], batch size: 55, lr: 1.44e-03, grad_scale: 16.0 2023-11-27 03:34:17,730 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.478e+01 9.052e+01 9.705e+01 1.059e+02 1.500e+02, threshold=1.941e+02, percent-clipped=0.0 2023-11-27 03:34:20,770 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=3708073.3333333335, ans=0.0 2023-11-27 03:34:21,708 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=3708073.3333333335, ans=0.035 2023-11-27 03:34:25,889 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3708073.3333333335, ans=0.1 2023-11-27 03:34:31,823 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=9.65 vs. limit=15.0 2023-11-27 03:34:35,470 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=5.76 vs. limit=15.0 2023-11-27 03:34:40,885 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=3708140.0, ans=0.0 2023-11-27 03:34:53,291 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=3708273.3333333335, ans=0.125 2023-11-27 03:35:01,574 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 556250 2023-11-27 03:35:05,257 INFO [train_asr.py:1235] (3/4) Epoch 47, batch 3150, loss[loss=0.07849, simple_loss=0.1082, pruned_loss=0.01599, audio_tagging_loss=0.008407, over 15449.00 frames. ], tot_loss[loss=0.06566, simple_loss=0.0898, pruned_loss=0.01206, audio_tagging_loss=0.008702, over 3059144.08 frames. ], batch size: 56, lr: 1.44e-03, grad_scale: 16.0 2023-11-27 03:35:07,573 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=3708340.0, ans=0.04949747468305833 2023-11-27 03:35:29,017 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.88 vs. 
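
The WARNING above (train_asr.py:1481) shows the guard that drops unusable cuts: these one-second AudioSet clips carry a dummy placeholder transcript of 24 BPE tokens, but only 23 frames survive the encoder frontend's subsampling of the original 100, and a transducer loss needs at least as many encoder frames as output tokens. A sketch of such a filter follows; the subsampling arithmetic is one common two-stage conv-frontend formula that happens to reproduce 100 -> 23, an assumption rather than a quote from the recipe:

    # Hedged sketch of the check behind the "Exclude cut" warnings.
    def frames_after_subsampling(num_frames: int) -> int:
        # Assumed frontend arithmetic: trim 7 frames, then two stride-2 stages;
        # reproduces the logged 100 -> 23.
        return ((num_frames - 7) // 2 + 1) // 2

    def keep_cut(num_frames: int, num_tokens: int) -> bool:
        # Transducer alignments need T >= U: at least one encoder frame per token.
        return frames_after_subsampling(num_frames) >= num_tokens

    assert frames_after_subsampling(100) == 23
    assert not keep_cut(100, 24)  # the cuts excluded above: 23 frames < 24 tokens
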
limit=15.0 2023-11-27 03:35:30,859 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=3708473.3333333335, ans=0.0 2023-11-27 03:35:38,178 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3708540.0, ans=0.125 2023-11-27 03:35:40,716 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.whiten, num_groups=1, num_channels=512, metric=9.24 vs. limit=12.0 2023-11-27 03:35:42,372 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3708540.0, ans=0.1 2023-11-27 03:35:58,268 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 556300 2023-11-27 03:36:01,314 INFO [train_asr.py:1235] (3/4) Epoch 47, batch 3200, loss[loss=0.05308, simple_loss=0.07153, pruned_loss=0.005976, audio_tagging_loss=0.01134, over 15532.00 frames. ], tot_loss[loss=0.06581, simple_loss=0.09008, pruned_loss=0.012, audio_tagging_loss=0.008778, over 3055528.34 frames. ], batch size: 59, lr: 1.44e-03, grad_scale: 32.0 2023-11-27 03:36:10,710 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.678e+01 8.966e+01 9.584e+01 1.017e+02 1.282e+02, threshold=1.917e+02, percent-clipped=0.0 2023-11-27 03:36:14,117 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3708740.0, ans=0.125 2023-11-27 03:36:22,983 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=4.71 vs. limit=6.0 2023-11-27 03:36:25,782 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3708806.6666666665, ans=0.1 2023-11-27 03:36:29,316 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=10.32 vs. limit=15.0 2023-11-27 03:36:31,653 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=3708806.6666666665, ans=0.2 2023-11-27 03:36:46,437 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=3708940.0, ans=0.125 2023-11-27 03:36:54,324 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 556350 2023-11-27 03:36:57,423 INFO [train_asr.py:1235] (3/4) Epoch 47, batch 3250, loss[loss=0.0466, simple_loss=0.05564, pruned_loss=0.007196, audio_tagging_loss=0.01158, over 15860.00 frames. ], tot_loss[loss=0.06547, simple_loss=0.08912, pruned_loss=0.01197, audio_tagging_loss=0.008935, over 3057054.52 frames. ], batch size: 62, lr: 1.44e-03, grad_scale: 16.0 2023-11-27 03:37:20,888 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=512, metric=9.52 vs. 
limit=15.0 2023-11-27 03:37:30,039 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=3709206.6666666665, ans=0.2 2023-11-27 03:37:33,121 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=3709206.6666666665, ans=0.125 2023-11-27 03:37:33,351 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3709206.6666666665, ans=0.0 2023-11-27 03:37:35,989 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3709206.6666666665, ans=0.1 2023-11-27 03:37:44,853 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=9.81 vs. limit=15.0 2023-11-27 03:37:49,585 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 556400 2023-11-27 03:37:52,936 INFO [train_asr.py:1235] (3/4) Epoch 47, batch 3300, loss[loss=0.08105, simple_loss=0.1234, pruned_loss=0.01408, audio_tagging_loss=0.005254, over 14952.00 frames. ], tot_loss[loss=0.06507, simple_loss=0.08854, pruned_loss=0.01178, audio_tagging_loss=0.009022, over 3053387.98 frames. ], batch size: 54, lr: 1.44e-03, grad_scale: 16.0 2023-11-27 03:37:57,413 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3709340.0, ans=0.1 2023-11-27 03:37:58,821 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=9.60 vs. limit=15.0 2023-11-27 03:38:01,132 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=3709340.0, ans=0.0 2023-11-27 03:38:02,994 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.558e+01 9.106e+01 9.727e+01 1.041e+02 1.146e+02, threshold=1.945e+02, percent-clipped=0.0 2023-11-27 03:38:14,789 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=3709473.3333333335, ans=0.0 2023-11-27 03:38:21,747 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=3709473.3333333335, ans=0.125 2023-11-27 03:38:30,304 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=3709540.0, ans=0.125 2023-11-27 03:38:35,623 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=3709540.0, ans=0.125 2023-11-27 03:38:42,126 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3709606.6666666665, ans=0.125 2023-11-27 03:38:45,559 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 556450 2023-11-27 03:38:49,252 INFO [train_asr.py:1235] (3/4) Epoch 47, batch 3350, loss[loss=0.05464, simple_loss=0.07276, pruned_loss=0.01089, audio_tagging_loss=0.007373, over 14608.00 frames. ], tot_loss[loss=0.0649, simple_loss=0.08834, pruned_loss=0.01185, audio_tagging_loss=0.008883, over 3060142.89 frames. 
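
The scaling.py:1022 Whitening lines each compare a measured statistic against a scheduled cap, e.g. metric=9.81 vs. limit=15.0 just above. The metric tracks how far a channel group's covariance is from isotropic ("white"): a value of 1.0 means all eigenvalues are equal, and larger values mean a few directions dominate; presumably a corrective gradient applies only while the metric exceeds the limit, which is why most lines report values under it. One way to compute an eigenvalue-dispersion metric with these semantics (a sketch, not the code in scaling.py):

    # Hedged sketch: an eigenvalue-dispersion "whitening metric" per channel
    # group, matching the semantics of the "metric=X vs. limit=Y" lines.
    import torch

    def whitening_metric(x: torch.Tensor, num_groups: int) -> torch.Tensor:
        n, c = x.shape  # (num_frames, num_channels)
        x = x.reshape(n, num_groups, c // num_groups).transpose(0, 1)
        cov = x.transpose(1, 2) @ x / n        # per-group covariance (g, d, d)
        eigs = torch.linalg.eigvalsh(cov)      # per-group eigenvalue spectra
        # E[lambda^2] / E[lambda]^2 is 1.0 for a perfectly flat spectrum
        # and grows as the spectrum spreads out.
        return ((eigs ** 2).mean(dim=1) / eigs.mean(dim=1) ** 2).mean()

    x = torch.randn(1000, 384)                 # near-white input
    print(whitening_metric(x, num_groups=1))   # ~1.4 (sampling noise), << 15.0
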
], batch size: 58, lr: 1.44e-03, grad_scale: 16.0 2023-11-27 03:38:56,436 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=3709673.3333333335, ans=0.125 2023-11-27 03:39:10,811 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3709806.6666666665, ans=0.125 2023-11-27 03:39:35,421 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=3709940.0, ans=0.0 2023-11-27 03:39:37,375 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=512, metric=3.80 vs. limit=15.0 2023-11-27 03:39:41,303 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.min_abs, batch_count=3709940.0, ans=0.5 2023-11-27 03:39:42,788 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 556500 2023-11-27 03:39:44,531 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.1.self_attn1.whiten, num_groups=1, num_channels=192, metric=19.01 vs. limit=22.5 2023-11-27 03:39:45,906 INFO [train_asr.py:1235] (3/4) Epoch 47, batch 3400, loss[loss=0.07038, simple_loss=0.09994, pruned_loss=0.01502, audio_tagging_loss=0.005383, over 14934.00 frames. ], tot_loss[loss=0.06508, simple_loss=0.08887, pruned_loss=0.01188, audio_tagging_loss=0.008769, over 3062895.03 frames. ], batch size: 57, lr: 1.44e-03, grad_scale: 16.0 2023-11-27 03:39:55,328 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.632e+01 9.012e+01 9.564e+01 1.021e+02 1.293e+02, threshold=1.913e+02, percent-clipped=0.0 2023-11-27 03:40:13,833 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=18.38 vs. limit=22.5 2023-11-27 03:40:18,528 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=5.24 vs. limit=12.0 2023-11-27 03:40:38,031 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 556550 2023-11-27 03:40:41,076 INFO [train_asr.py:1235] (3/4) Epoch 47, batch 3450, loss[loss=0.08072, simple_loss=0.1013, pruned_loss=0.02105, audio_tagging_loss=0.009037, over 14792.00 frames. ], tot_loss[loss=0.06556, simple_loss=0.08971, pruned_loss=0.01202, audio_tagging_loss=0.008688, over 3058756.74 frames. ], batch size: 55, lr: 1.44e-03, grad_scale: 16.0 2023-11-27 03:40:50,762 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3710406.6666666665, ans=0.125 2023-11-27 03:40:51,926 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00 2023-11-27 03:41:02,388 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3710473.3333333335, ans=0.1 2023-11-27 03:41:04,894 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=9.22 vs. 
limit=15.0 2023-11-27 03:41:29,546 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3710606.6666666665, ans=0.125 2023-11-27 03:41:32,656 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 556600 2023-11-27 03:41:35,181 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=3710673.3333333335, ans=0.2 2023-11-27 03:41:36,601 INFO [train_asr.py:1235] (3/4) Epoch 47, batch 3500, loss[loss=0.06191, simple_loss=0.09352, pruned_loss=0.008249, audio_tagging_loss=0.006901, over 15269.00 frames. ], tot_loss[loss=0.06549, simple_loss=0.08976, pruned_loss=0.01207, audio_tagging_loss=0.008537, over 3063367.40 frames. ], batch size: 56, lr: 1.44e-03, grad_scale: 16.0 2023-11-27 03:41:47,261 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.834e+01 8.850e+01 9.448e+01 1.007e+02 1.285e+02, threshold=1.890e+02, percent-clipped=0.0 2023-11-27 03:42:05,872 WARNING [train_asr.py:1481] (3/4) Exclude cut with ID unbalanced/DdDpuDqOyrA_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-27 03:42:06,138 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=3710806.6666666665, ans=0.07 2023-11-27 03:42:19,419 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=3710873.3333333335, ans=0.0 2023-11-27 03:42:22,025 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=3710940.0, ans=0.0 2023-11-27 03:42:29,915 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 556650 2023-11-27 03:42:33,579 INFO [train_asr.py:1235] (3/4) Epoch 47, batch 3550, loss[loss=0.07172, simple_loss=0.1056, pruned_loss=0.01429, audio_tagging_loss=0.004625, over 15962.00 frames. ], tot_loss[loss=0.06488, simple_loss=0.0888, pruned_loss=0.01197, audio_tagging_loss=0.008516, over 3059252.95 frames. ], batch size: 58, lr: 1.44e-03, grad_scale: 16.0 2023-11-27 03:42:57,209 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=3711140.0, ans=0.125 2023-11-27 03:43:20,689 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=3711273.3333333335, ans=0.2 2023-11-27 03:43:25,868 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 556700 2023-11-27 03:43:29,005 INFO [train_asr.py:1235] (3/4) Epoch 47, batch 3600, loss[loss=0.06289, simple_loss=0.0821, pruned_loss=0.01324, audio_tagging_loss=0.008596, over 14944.00 frames. ], tot_loss[loss=0.06453, simple_loss=0.08835, pruned_loss=0.01187, audio_tagging_loss=0.008481, over 3059299.26 frames. 
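
The bulk of the scaling.py:213 lines record that regularization knobs are functions of training progress rather than constants: each ScheduledFloat maps the current batch_count to the printed ans. This deep into training (batch_count around 3.7 million) nearly all of them have settled at their terminal values: skip rates at 0.0, balancer probs at 0.125, dropout at 0.1. A schedule with that interface might be piecewise-linear in batch_count, along these lines (a sketch with invented breakpoints, not the class from scaling.py):

    # Hedged sketch: a ScheduledFloat-style value, piecewise-linear in
    # batch_count and clamped at the first/last breakpoints.
    class ScheduledFloat:
        def __init__(self, *points):
            self.points = sorted(points)  # (batch_count, value) pairs

        def __call__(self, batch_count: float) -> float:
            pts = self.points
            if batch_count <= pts[0][0]:
                return pts[0][1]
            if batch_count >= pts[-1][0]:
                return pts[-1][1]
            for (x0, y0), (x1, y1) in zip(pts, pts[1:]):
                if batch_count <= x1:
                    t = (batch_count - x0) / (x1 - x0)
                    return y0 + t * (y1 - y0)

    # Invented breakpoints, for illustration only:
    conv_skip_rate = ScheduledFloat((0.0, 0.2), (20000.0, 0.05), (50000.0, 0.0))
    print(conv_skip_rate(3703273.33))  # -> 0.0, as in the ans=0.0 lines above
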
], batch size: 57, lr: 1.44e-03, grad_scale: 32.0 2023-11-27 03:43:32,387 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3711340.0, ans=0.125 2023-11-27 03:43:38,650 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.654e+01 8.614e+01 9.369e+01 1.008e+02 1.433e+02, threshold=1.874e+02, percent-clipped=0.0 2023-11-27 03:44:04,996 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3711540.0, ans=0.125 2023-11-27 03:44:20,564 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 556750 2023-11-27 03:44:23,827 INFO [train_asr.py:1235] (3/4) Epoch 47, batch 3650, loss[loss=0.07741, simple_loss=0.09946, pruned_loss=0.02041, audio_tagging_loss=0.007275, over 15146.00 frames. ], tot_loss[loss=0.06466, simple_loss=0.08858, pruned_loss=0.01192, audio_tagging_loss=0.008447, over 3048451.95 frames. ], batch size: 57, lr: 1.44e-03, grad_scale: 32.0 2023-11-27 03:44:31,511 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=3711673.3333333335, ans=0.2 2023-11-27 03:44:40,025 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.min_positive, batch_count=3711740.0, ans=0.05 2023-11-27 03:44:43,535 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=7.96 vs. limit=15.0 2023-11-27 03:44:45,741 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.whiten.whitening_limit, batch_count=3711740.0, ans=12.0 2023-11-27 03:44:46,924 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=6.89 vs. limit=15.0 2023-11-27 03:45:00,111 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=3711873.3333333335, ans=0.2 2023-11-27 03:45:17,589 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 556800 2023-11-27 03:45:20,975 INFO [train_asr.py:1235] (3/4) Epoch 47, batch 3700, loss[loss=0.06714, simple_loss=0.08954, pruned_loss=0.01351, audio_tagging_loss=0.008858, over 15925.00 frames. ], tot_loss[loss=0.06462, simple_loss=0.08865, pruned_loss=0.01187, audio_tagging_loss=0.008422, over 3053775.99 frames. 
], batch size: 60, lr: 1.44e-03, grad_scale: 32.0 2023-11-27 03:45:23,848 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3712006.6666666665, ans=0.1 2023-11-27 03:45:31,046 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.710e+01 9.060e+01 9.619e+01 1.026e+02 1.251e+02, threshold=1.924e+02, percent-clipped=0.0 2023-11-27 03:45:36,678 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=3712073.3333333335, ans=0.125 2023-11-27 03:45:38,794 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=3712073.3333333335, ans=0.0 2023-11-27 03:45:48,400 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=3712140.0, ans=0.2 2023-11-27 03:46:02,904 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=14.35 vs. limit=22.5 2023-11-27 03:46:03,680 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=3712206.6666666665, ans=0.0 2023-11-27 03:46:03,712 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=3712206.6666666665, ans=0.0 2023-11-27 03:46:13,712 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 556850 2023-11-27 03:46:16,032 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=3712340.0, ans=0.0 2023-11-27 03:46:16,766 INFO [train_asr.py:1235] (3/4) Epoch 47, batch 3750, loss[loss=0.06499, simple_loss=0.09544, pruned_loss=0.008974, audio_tagging_loss=0.008289, over 16491.00 frames. ], tot_loss[loss=0.06447, simple_loss=0.08823, pruned_loss=0.01184, audio_tagging_loss=0.00851, over 3052245.29 frames. ], batch size: 59, lr: 1.44e-03, grad_scale: 32.0 2023-11-27 03:46:24,315 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3712340.0, ans=0.125 2023-11-27 03:46:38,754 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=7.91 vs. limit=15.0 2023-11-27 03:46:40,314 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=3712473.3333333335, ans=0.0 2023-11-27 03:46:41,709 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=14.01 vs. limit=15.0 2023-11-27 03:46:46,235 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3712473.3333333335, ans=0.1 2023-11-27 03:46:46,504 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=8.63 vs. limit=10.0 2023-11-27 03:46:54,702 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder_embed.out_whiten.whitening_limit, batch_count=3712540.0, ans=8.0 2023-11-27 03:46:54,975 WARNING [train_asr.py:1481] (3/4) Exclude cut with ID unbalanced/ZY_Bsi-RNuk_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. 
Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-27 03:47:04,787 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.max_positive, batch_count=3712606.6666666665, ans=0.95 2023-11-27 03:47:05,180 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=6.53 vs. limit=10.0 2023-11-27 03:47:08,945 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 556900 2023-11-27 03:47:10,095 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=3712606.6666666665, ans=0.125 2023-11-27 03:47:12,066 INFO [train_asr.py:1235] (3/4) Epoch 47, batch 3800, loss[loss=0.05126, simple_loss=0.07027, pruned_loss=0.005829, audio_tagging_loss=0.0103, over 15286.00 frames. ], tot_loss[loss=0.06519, simple_loss=0.08909, pruned_loss=0.01212, audio_tagging_loss=0.008527, over 3053783.74 frames. ], batch size: 57, lr: 1.44e-03, grad_scale: 16.0 2023-11-27 03:47:22,981 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=3712740.0, ans=0.125 2023-11-27 03:47:24,836 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 8.007e+01 9.077e+01 9.693e+01 1.049e+02 1.287e+02, threshold=1.939e+02, percent-clipped=0.0 2023-11-27 03:47:54,498 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=3712873.3333333335, ans=0.0 2023-11-27 03:48:05,197 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 556950 2023-11-27 03:48:08,320 INFO [train_asr.py:1235] (3/4) Epoch 47, batch 3850, loss[loss=0.06734, simple_loss=0.09734, pruned_loss=0.01238, audio_tagging_loss=0.006287, over 14496.00 frames. ], tot_loss[loss=0.06575, simple_loss=0.08975, pruned_loss=0.01226, audio_tagging_loss=0.008612, over 3054513.45 frames. ], batch size: 56, lr: 1.44e-03, grad_scale: 8.0 2023-11-27 03:48:14,884 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=3713006.6666666665, ans=0.125 2023-11-27 03:48:32,909 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3713140.0, ans=0.1 2023-11-27 03:48:36,080 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=3713140.0, ans=0.125 2023-11-27 03:48:40,421 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3713206.6666666665, ans=0.1 2023-11-27 03:48:52,245 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3713273.3333333335, ans=0.125 2023-11-27 03:48:53,845 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=3713273.3333333335, ans=0.0 2023-11-27 03:49:01,539 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 557000 2023-11-27 03:49:05,021 INFO [train_asr.py:1235] (3/4) Epoch 47, batch 3900, loss[loss=0.07832, simple_loss=0.1058, pruned_loss=0.01549, audio_tagging_loss=0.009925, over 14910.00 frames. 
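
The balancer entries in these lines (prob, min_positive, max_positive, min_abs and similar, e.g. max_positive ans=0.95 just above) read like activation balancers: modules that are the identity in the forward pass but, on a random fraction of steps given by the scheduled prob (0.125 here), add small corrective gradients to channels whose statistics drift outside the configured bounds, such as the fraction of positive activations leaving [min_positive, max_positive] or the mean absolute value falling under min_abs. That interpretation is an assumption from the names; a minimal sketch of the forward-identity, backward-nudge idea:

    # Hedged sketch: identity forward, small corrective gradient backward for
    # channels whose positive fraction leaves [min_positive, max_positive].
    import torch

    class BalanceGrad(torch.autograd.Function):
        @staticmethod
        def forward(ctx, x, min_positive=0.05, max_positive=0.95, eps=1e-4):
            ctx.save_for_backward(x)
            ctx.cfg = (min_positive, max_positive, eps)
            return x  # activations pass through unchanged

        @staticmethod
        def backward(ctx, grad_out):
            (x,) = ctx.saved_tensors
            min_p, max_p, eps = ctx.cfg
            frac_pos = (x > 0).float().mean(dim=0)  # per-channel positive fraction
            # Subtracting eps from the loss gradient favors larger activations
            # (for too-negative channels); adding eps favors smaller ones.
            nudge = eps * ((frac_pos < min_p).float() - (frac_pos > max_p).float())
            return grad_out - nudge, None, None, None

    x = torch.randn(64, 256, requires_grad=True)
    y = BalanceGrad.apply(x - 3.0)  # shifted negative: positive fraction ~0
    y.sum().backward()
    print(x.grad.mean())            # slightly under 1.0: the upward nudge
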
], tot_loss[loss=0.06559, simple_loss=0.08942, pruned_loss=0.01218, audio_tagging_loss=0.008697, over 3046682.50 frames. ], batch size: 55, lr: 1.44e-03, grad_scale: 8.0 2023-11-27 03:49:12,667 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=3713340.0, ans=0.2 2023-11-27 03:49:13,637 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=3713340.0, ans=0.125 2023-11-27 03:49:16,693 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.780e+01 9.076e+01 9.575e+01 1.020e+02 1.197e+02, threshold=1.915e+02, percent-clipped=0.0 2023-11-27 03:49:17,872 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=3713406.6666666665, ans=0.125 2023-11-27 03:49:29,669 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3713473.3333333335, ans=0.125 2023-11-27 03:49:44,632 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.max_abs, batch_count=3713540.0, ans=10.0 2023-11-27 03:49:49,830 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3713606.6666666665, ans=0.125 2023-11-27 03:49:57,119 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 557050 2023-11-27 03:50:00,207 INFO [train_asr.py:1235] (3/4) Epoch 47, batch 3950, loss[loss=0.07856, simple_loss=0.1052, pruned_loss=0.01791, audio_tagging_loss=0.008038, over 15977.00 frames. ], tot_loss[loss=0.06545, simple_loss=0.08881, pruned_loss=0.01217, audio_tagging_loss=0.00887, over 3054438.53 frames. ], batch size: 57, lr: 1.44e-03, grad_scale: 8.0 2023-11-27 03:50:15,111 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=3713740.0, ans=0.07 2023-11-27 03:50:41,355 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3713873.3333333335, ans=0.125 2023-11-27 03:50:52,438 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 557100 2023-11-27 03:50:56,122 INFO [train_asr.py:1235] (3/4) Epoch 47, batch 4000, loss[loss=0.05621, simple_loss=0.0773, pruned_loss=0.0108, audio_tagging_loss=0.00675, over 16141.00 frames. ], tot_loss[loss=0.06552, simple_loss=0.08908, pruned_loss=0.01211, audio_tagging_loss=0.008868, over 3052929.30 frames. ], batch size: 60, lr: 1.44e-03, grad_scale: 16.0 2023-11-27 03:51:01,483 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=14.30 vs. limit=22.5 2023-11-27 03:51:01,565 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.52 vs. limit=6.0 2023-11-27 03:51:08,818 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.068e+01 9.327e+01 9.767e+01 1.033e+02 1.414e+02, threshold=1.953e+02, percent-clipped=0.0 2023-11-27 03:51:44,339 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3714273.3333333335, ans=0.125 2023-11-27 03:51:44,818 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=8.95 vs. 
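
grad_scale on the batch lines is the fp16 loss scale, and its trajectory through this stretch (32.0 up to batch 3750, 16.0 at 3800, 8.0 from 3850 through 3950, back to 16.0 at 4000 and 32.0 by 4400) is the usual dynamic-loss-scaling signature: halve whenever a batch produces inf/NaN gradients, then cautiously grow again after a run of clean steps. With PyTorch's stock GradScaler the pattern looks like this; the API is standard torch.cuda.amp, but init_scale and the growth cadence here are assumptions, and the recipe may drive growth differently:

    # Hedged sketch: dynamic fp16 loss scaling, matching the halving and
    # re-doubling of "grad_scale" visible in the lines above.
    import torch

    scaler = torch.cuda.amp.GradScaler(
        init_scale=32.0,       # assumed, to mirror the logged starting value
        backoff_factor=0.5,    # inf/NaN grads: 32 -> 16 -> 8
        growth_factor=2.0,     # after enough clean steps: 8 -> 16 -> 32
        growth_interval=2000,  # PyTorch's default growth cadence
    )

    def train_step(model, batch, optimizer, criterion):
        optimizer.zero_grad()
        with torch.cuda.amp.autocast(dtype=torch.float16):
            loss = criterion(model(batch))
        scaler.scale(loss).backward()  # scaling keeps fp16 grads representable
        scaler.step(optimizer)         # silently skips the step on overflow
        scaler.update()                # adjusts the scale, read via get_scale()
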
limit=15.0 2023-11-27 03:51:48,486 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 557150 2023-11-27 03:51:52,053 INFO [train_asr.py:1235] (3/4) Epoch 47, batch 4050, loss[loss=0.07496, simple_loss=0.0976, pruned_loss=0.01719, audio_tagging_loss=0.008969, over 14690.00 frames. ], tot_loss[loss=0.06608, simple_loss=0.08989, pruned_loss=0.01225, audio_tagging_loss=0.008885, over 3051437.19 frames. ], batch size: 53, lr: 1.44e-03, grad_scale: 16.0 2023-11-27 03:51:54,229 WARNING [train_asr.py:1481] (3/4) Exclude cut with ID unbalanced/-7b0f9TyPFU_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-27 03:51:58,666 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=3714340.0, ans=0.0 2023-11-27 03:52:08,240 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=3714406.6666666665, ans=0.125 2023-11-27 03:52:17,132 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3714473.3333333335, ans=0.125 2023-11-27 03:52:28,689 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3714540.0, ans=0.125 2023-11-27 03:52:44,079 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 557200 2023-11-27 03:52:47,541 INFO [train_asr.py:1235] (3/4) Epoch 47, batch 4100, loss[loss=0.07338, simple_loss=0.1035, pruned_loss=0.01265, audio_tagging_loss=0.008969, over 15629.00 frames. ], tot_loss[loss=0.06612, simple_loss=0.09026, pruned_loss=0.01214, audio_tagging_loss=0.008857, over 3051931.73 frames. ], batch size: 59, lr: 1.44e-03, grad_scale: 16.0 2023-11-27 03:52:49,396 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn2.whiten.whitening_limit, batch_count=3714673.3333333335, ans=22.5 2023-11-27 03:52:52,008 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3714673.3333333335, ans=0.125 2023-11-27 03:52:59,707 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.067e+01 9.062e+01 9.739e+01 1.030e+02 1.331e+02, threshold=1.948e+02, percent-clipped=0.0 2023-11-27 03:53:09,806 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=3714806.6666666665, ans=0.0 2023-11-27 03:53:10,020 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.65 vs. 
limit=6.0 2023-11-27 03:53:13,028 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3714806.6666666665, ans=0.125 2023-11-27 03:53:26,387 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3714873.3333333335, ans=0.0 2023-11-27 03:53:40,363 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 557250 2023-11-27 03:53:43,488 INFO [train_asr.py:1235] (3/4) Epoch 47, batch 4150, loss[loss=0.06878, simple_loss=0.09597, pruned_loss=0.01345, audio_tagging_loss=0.007341, over 14981.00 frames. ], tot_loss[loss=0.06537, simple_loss=0.08959, pruned_loss=0.0119, audio_tagging_loss=0.008677, over 3050076.79 frames. ], batch size: 58, lr: 1.44e-03, grad_scale: 16.0 2023-11-27 03:53:53,306 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=3715006.6666666665, ans=0.125 2023-11-27 03:54:17,764 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=6.43 vs. limit=12.0 2023-11-27 03:54:24,028 WARNING [train_asr.py:1481] (3/4) Exclude cut with ID unbalanced/5BkClLNthIQ_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-27 03:54:24,203 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3715206.6666666665, ans=0.125 2023-11-27 03:54:29,043 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=3715273.3333333335, ans=0.125 2023-11-27 03:54:36,716 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 557300 2023-11-27 03:54:38,004 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=3715273.3333333335, ans=0.125 2023-11-27 03:54:39,857 INFO [train_asr.py:1235] (3/4) Epoch 47, batch 4200, loss[loss=0.05343, simple_loss=0.07941, pruned_loss=0.006885, audio_tagging_loss=0.006838, over 14600.00 frames. ], tot_loss[loss=0.06521, simple_loss=0.08955, pruned_loss=0.01187, audio_tagging_loss=0.008564, over 3048365.40 frames. 
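
lr: 1.44e-03 is identical across all of these lines because Zipformer recipes usually schedule the learning rate with Eden, a function of both step and epoch that is nearly flat half a million steps in. As a consistency check, the formula below reproduces the logged value under assumed settings of base_lr=0.045, lr_batches=7500 and lr_epochs=3.5 (assumptions, not values read from this section), with the epoch apparently counted from zero:

    # Hedged sketch of the Eden LR schedule; all constants are assumptions
    # chosen to reproduce the lr=1.44e-03 printed throughout this section.
    def eden_lr(step, epoch, base_lr=0.045, lr_batches=7500.0, lr_epochs=3.5):
        step_factor = ((step ** 2 + lr_batches ** 2) / lr_batches ** 2) ** -0.25
        epoch_factor = ((epoch ** 2 + lr_epochs ** 2) / lr_epochs ** 2) ** -0.25
        return base_lr * step_factor * epoch_factor

    # Batch idx ~556000 during epoch 47 (0-indexed: 46):
    print(f"{eden_lr(step=556000, epoch=46):.2e}")  # -> 1.44e-03
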
], batch size: 56, lr: 1.44e-03, grad_scale: 16.0 2023-11-27 03:54:44,270 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3715340.0, ans=0.125 2023-11-27 03:54:51,931 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.295e+01 8.957e+01 9.619e+01 1.045e+02 2.364e+02, threshold=1.924e+02, percent-clipped=1.0 2023-11-27 03:55:02,905 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=3715473.3333333335, ans=0.125 2023-11-27 03:55:14,684 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3715540.0, ans=0.1 2023-11-27 03:55:26,924 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=3715606.6666666665, ans=0.0 2023-11-27 03:55:26,991 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=3715606.6666666665, ans=0.125 2023-11-27 03:55:32,778 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 557350 2023-11-27 03:55:34,968 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=3715673.3333333335, ans=0.2 2023-11-27 03:55:35,032 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3715673.3333333335, ans=0.1 2023-11-27 03:55:35,910 INFO [train_asr.py:1235] (3/4) Epoch 47, batch 4250, loss[loss=0.04888, simple_loss=0.06358, pruned_loss=0.006596, audio_tagging_loss=0.0105, over 15122.00 frames. ], tot_loss[loss=0.06556, simple_loss=0.09048, pruned_loss=0.01185, audio_tagging_loss=0.008464, over 3048282.38 frames. ], batch size: 59, lr: 1.44e-03, grad_scale: 16.0 2023-11-27 03:55:43,776 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=3715673.3333333335, ans=0.125 2023-11-27 03:55:44,867 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=3715673.3333333335, ans=0.125 2023-11-27 03:55:48,577 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3715740.0, ans=0.125 2023-11-27 03:56:02,495 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=3715806.6666666665, ans=0.2 2023-11-27 03:56:02,511 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=3715806.6666666665, ans=0.0 2023-11-27 03:56:03,763 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=19.48 vs. limit=22.5 2023-11-27 03:56:05,797 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=6.84 vs. 
limit=15.0 2023-11-27 03:56:07,654 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.min_abs, batch_count=3715806.6666666665, ans=0.5 2023-11-27 03:56:08,632 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3715873.3333333335, ans=0.0 2023-11-27 03:56:20,974 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=3715940.0, ans=0.04949747468305833 2023-11-27 03:56:28,284 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 557400 2023-11-27 03:56:31,993 INFO [train_asr.py:1235] (3/4) Epoch 47, batch 4300, loss[loss=0.05558, simple_loss=0.06896, pruned_loss=0.01031, audio_tagging_loss=0.01079, over 17248.00 frames. ], tot_loss[loss=0.06595, simple_loss=0.09074, pruned_loss=0.01212, audio_tagging_loss=0.008454, over 3047646.76 frames. ], batch size: 65, lr: 1.44e-03, grad_scale: 16.0 2023-11-27 03:56:32,196 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3716006.6666666665, ans=0.125 2023-11-27 03:56:44,663 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 6.985e+01 9.063e+01 9.742e+01 1.048e+02 1.434e+02, threshold=1.948e+02, percent-clipped=0.0 2023-11-27 03:56:46,929 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3716073.3333333335, ans=0.1 2023-11-27 03:56:52,050 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=3716073.3333333335, ans=0.125 2023-11-27 03:57:08,607 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3716206.6666666665, ans=0.125 2023-11-27 03:57:13,293 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=3716206.6666666665, ans=0.2 2023-11-27 03:57:16,577 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=3716273.3333333335, ans=0.125 2023-11-27 03:57:24,955 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 557450 2023-11-27 03:57:28,058 INFO [train_asr.py:1235] (3/4) Epoch 47, batch 4350, loss[loss=0.0771, simple_loss=0.1067, pruned_loss=0.01652, audio_tagging_loss=0.00723, over 15321.00 frames. ], tot_loss[loss=0.06598, simple_loss=0.09079, pruned_loss=0.01221, audio_tagging_loss=0.008371, over 3049144.59 frames. ], batch size: 57, lr: 1.44e-03, grad_scale: 16.0 2023-11-27 03:57:40,008 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=3716406.6666666665, ans=0.0 2023-11-27 03:57:48,158 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=12.36 vs. limit=15.0 2023-11-27 03:58:03,995 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=7.36 vs. limit=15.0 2023-11-27 03:58:20,051 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 557500 2023-11-27 03:58:23,189 INFO [train_asr.py:1235] (3/4) Epoch 47, batch 4400, loss[loss=0.07712, simple_loss=0.1073, pruned_loss=0.01682, audio_tagging_loss=0.006636, over 14998.00 frames. 
], tot_loss[loss=0.06591, simple_loss=0.0904, pruned_loss=0.01228, audio_tagging_loss=0.008428, over 3047828.81 frames. ], batch size: 55, lr: 1.44e-03, grad_scale: 32.0 2023-11-27 03:58:35,347 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 8.054e+01 9.094e+01 9.740e+01 1.025e+02 1.251e+02, threshold=1.948e+02, percent-clipped=0.0 2023-11-27 03:58:49,891 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=3716806.6666666665, ans=0.125 2023-11-27 03:59:01,423 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3716873.3333333335, ans=0.125 2023-11-27 03:59:15,452 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 557550 2023-11-27 03:59:18,573 INFO [train_asr.py:1235] (3/4) Epoch 47, batch 4450, loss[loss=0.07062, simple_loss=0.1033, pruned_loss=0.013, audio_tagging_loss=0.005987, over 15453.00 frames. ], tot_loss[loss=0.06642, simple_loss=0.09135, pruned_loss=0.01237, audio_tagging_loss=0.00837, over 3052387.16 frames. ], batch size: 57, lr: 1.44e-03, grad_scale: 32.0 2023-11-27 03:59:33,020 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-27 03:59:41,521 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=3717140.0, ans=0.07 2023-11-27 03:59:42,533 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-27 04:00:03,606 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=3717273.3333333335, ans=0.125 2023-11-27 04:00:11,613 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 557600 2023-11-27 04:00:15,473 INFO [train_asr.py:1235] (3/4) Epoch 47, batch 4500, loss[loss=0.07766, simple_loss=0.1112, pruned_loss=0.01497, audio_tagging_loss=0.00709, over 15439.00 frames. ], tot_loss[loss=0.06643, simple_loss=0.09135, pruned_loss=0.0124, audio_tagging_loss=0.008355, over 3050930.35 frames. 
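
The scaling.py:1118 WithLoss lines (three in this stretch, all on self_attn_weights, all loss-sum=0.000e+00) look like auxiliary penalties attached directly to a tensor, with the accumulated penalty value logged; a sum of zero would mean the penalty never fired in the logging window. One mechanism with those semantics is a custom autograd function that returns its input unchanged but routes gradient into a side loss built from it, logging the side loss's value on the way through backward; this is a guess at the design, not the code in scaling.py:

    # Hedged sketch: attach an auxiliary penalty to a tensor so it is trained
    # without being added to the main loss, logging its value in backward
    # (cf. the "WithLoss ... loss-sum=..." lines).
    import torch

    class WithLoss(torch.autograd.Function):
        @staticmethod
        def forward(ctx, x, aux_loss, name):
            ctx.save_for_backward(aux_loss)
            ctx.name = name
            return x  # the activations themselves are untouched

        @staticmethod
        def backward(ctx, grad_out):
            (aux_loss,) = ctx.saved_tensors
            print(f"WithLoss: name={ctx.name}, loss-sum={aux_loss.item():.3e}")
            # Returning 1 as the gradient w.r.t. aux_loss backpropagates the
            # penalty with weight 1 alongside the main objective.
            return grad_out, torch.ones_like(aux_loss), None

    attn = torch.rand(8, 50, 50, requires_grad=True)
    penalty = (attn - 0.5).clamp(min=0.0).sum()  # fires only on weights > 0.5
    out = WithLoss.apply(attn, penalty, "demo.self_attn_weights")
    out.sum().backward()                         # prints the loss-sum line
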
], batch size: 57, lr: 1.44e-03, grad_scale: 32.0 2023-11-27 04:00:27,191 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.280e+01 9.128e+01 9.724e+01 1.026e+02 1.221e+02, threshold=1.945e+02, percent-clipped=0.0 2023-11-27 04:00:51,452 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=3717540.0, ans=0.125 2023-11-27 04:00:52,535 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3717540.0, ans=0.125 2023-11-27 04:01:03,729 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3717606.6666666665, ans=0.125 2023-11-27 04:01:04,822 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3717606.6666666665, ans=0.1 2023-11-27 04:01:07,861 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 557650 2023-11-27 04:01:10,112 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3717673.3333333335, ans=0.1 2023-11-27 04:01:11,017 INFO [train_asr.py:1235] (3/4) Epoch 47, batch 4550, loss[loss=0.05868, simple_loss=0.07977, pruned_loss=0.0114, audio_tagging_loss=0.007395, over 15782.00 frames. ], tot_loss[loss=0.06589, simple_loss=0.09049, pruned_loss=0.01227, audio_tagging_loss=0.008379, over 3052796.41 frames. ], batch size: 59, lr: 1.44e-03, grad_scale: 32.0 2023-11-27 04:01:21,819 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3717740.0, ans=0.125 2023-11-27 04:01:53,958 WARNING [train_asr.py:1481] (3/4) Exclude cut with ID unbalanced/_II2Klfnn4Y_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-27 04:01:57,523 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3717940.0, ans=0.125 2023-11-27 04:01:59,017 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=6.85 vs. limit=15.0 2023-11-27 04:02:02,864 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-11-27 04:02:03,750 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 557700 2023-11-27 04:02:03,814 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3717940.0, ans=0.1 2023-11-27 04:02:07,393 INFO [train_asr.py:1235] (3/4) Epoch 47, batch 4600, loss[loss=0.06516, simple_loss=0.08619, pruned_loss=0.01395, audio_tagging_loss=0.008112, over 14982.00 frames. ], tot_loss[loss=0.06539, simple_loss=0.08956, pruned_loss=0.0121, audio_tagging_loss=0.008511, over 3056081.86 frames. 
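The WARNING above shows why short AudioSet cuts are dropped: 100 input frames shrink to 23 after the convolutional front-end, fewer than the 24 BPE tokens, and a transducer cannot emit more symbols than it has frames. The arithmetic below reproduces the logged 100 -> 23; the exact subsampling expression is an assumption that happens to match these numbers.

def frames_after_subsampling(t: int) -> int:
    # One plausible conv front-end arithmetic (factor ~4 overall)
    # that reproduces the logged counts.
    return ((t - 7) // 2 + 1) // 2

num_frames = 100   # before subsampling, from the WARNING above
num_tokens = 24    # BPE tokens of the placeholder text
t_sub = frames_after_subsampling(num_frames)  # -> 23
keep = t_sub >= num_tokens
print(t_sub, keep)  # 23 False -> the cut is excluded from training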
], batch size: 57, lr: 1.44e-03, grad_scale: 32.0 2023-11-27 04:02:12,490 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=3718006.6666666665, ans=0.0 2023-11-27 04:02:20,756 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 6.735e+01 9.058e+01 9.556e+01 1.027e+02 1.489e+02, threshold=1.911e+02, percent-clipped=0.0 2023-11-27 04:02:20,984 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=3718073.3333333335, ans=0.2 2023-11-27 04:02:38,120 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3718140.0, ans=0.125 2023-11-27 04:02:45,447 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3718206.6666666665, ans=0.125 2023-11-27 04:02:45,977 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.1.conv_module2.whiten, num_groups=1, num_channels=192, metric=11.45 vs. limit=15.0 2023-11-27 04:02:55,861 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=3718273.3333333335, ans=0.2 2023-11-27 04:03:00,550 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 557750 2023-11-27 04:03:04,147 INFO [train_asr.py:1235] (3/4) Epoch 47, batch 4650, loss[loss=0.08269, simple_loss=0.1137, pruned_loss=0.01836, audio_tagging_loss=0.007491, over 14859.00 frames. ], tot_loss[loss=0.06444, simple_loss=0.08781, pruned_loss=0.0119, audio_tagging_loss=0.008637, over 3050625.17 frames. ], batch size: 54, lr: 1.44e-03, grad_scale: 32.0 2023-11-27 04:03:14,992 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3718406.6666666665, ans=0.1 2023-11-27 04:03:22,539 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=3718406.6666666665, ans=0.125 2023-11-27 04:03:30,085 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=3718473.3333333335, ans=0.0 2023-11-27 04:03:48,381 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=17.41 vs. limit=22.5 2023-11-27 04:03:56,138 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 557800 2023-11-27 04:03:56,822 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=6.06 vs. limit=10.0 2023-11-27 04:03:57,386 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=3718606.6666666665, ans=0.2 2023-11-27 04:03:59,577 INFO [train_asr.py:1235] (3/4) Epoch 47, batch 4700, loss[loss=0.0511, simple_loss=0.07244, pruned_loss=0.007534, audio_tagging_loss=0.007349, over 16506.00 frames. ], tot_loss[loss=0.06517, simple_loss=0.08909, pruned_loss=0.01202, audio_tagging_loss=0.008607, over 3050187.74 frames. ], batch size: 60, lr: 1.44e-03, grad_scale: 16.0 2023-11-27 04:04:08,840 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=9.96 vs. 
limit=15.0 2023-11-27 04:04:12,170 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 6.864e+01 9.072e+01 9.943e+01 1.043e+02 1.382e+02, threshold=1.989e+02, percent-clipped=0.0 2023-11-27 04:04:14,683 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3718740.0, ans=0.1 2023-11-27 04:04:15,592 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=3718740.0, ans=0.125 2023-11-27 04:04:27,931 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3718806.6666666665, ans=0.125 2023-11-27 04:04:38,983 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=2.97 vs. limit=15.0 2023-11-27 04:04:44,883 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3718940.0, ans=0.125 2023-11-27 04:04:51,001 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 557850 2023-11-27 04:04:53,309 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=3719006.6666666665, ans=0.125 2023-11-27 04:04:54,123 INFO [train_asr.py:1235] (3/4) Epoch 47, batch 4750, loss[loss=0.09142, simple_loss=0.1307, pruned_loss=0.01886, audio_tagging_loss=0.007239, over 17189.00 frames. ], tot_loss[loss=0.06472, simple_loss=0.0881, pruned_loss=0.0119, audio_tagging_loss=0.008766, over 3047393.96 frames. ], batch size: 61, lr: 1.44e-03, grad_scale: 16.0 2023-11-27 04:05:00,336 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3719006.6666666665, ans=0.125 2023-11-27 04:05:23,892 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=3719140.0, ans=0.0 2023-11-27 04:05:47,630 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 557900 2023-11-27 04:05:50,733 INFO [train_asr.py:1235] (3/4) Epoch 47, batch 4800, loss[loss=0.06957, simple_loss=0.1003, pruned_loss=0.01163, audio_tagging_loss=0.007782, over 14738.00 frames. ], tot_loss[loss=0.06489, simple_loss=0.08849, pruned_loss=0.01181, audio_tagging_loss=0.008831, over 3051912.96 frames. ], batch size: 53, lr: 1.44e-03, grad_scale: 32.0 2023-11-27 04:05:59,933 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3719340.0, ans=0.125 2023-11-27 04:06:05,053 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.356e+01 9.052e+01 9.526e+01 1.032e+02 1.738e+02, threshold=1.905e+02, percent-clipped=0.0 2023-11-27 04:06:05,876 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.0.self_attn2.whiten, num_groups=1, num_channels=192, metric=9.87 vs. limit=22.5 2023-11-27 04:06:06,393 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=3719406.6666666665, ans=0.125 2023-11-27 04:06:33,449 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.max_abs, batch_count=3719540.0, ans=10.0 2023-11-27 04:06:38,492 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=512, metric=15.43 vs. 
limit=22.5 2023-11-27 04:06:43,259 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 557950 2023-11-27 04:06:46,390 INFO [train_asr.py:1235] (3/4) Epoch 47, batch 4850, loss[loss=0.05859, simple_loss=0.07578, pruned_loss=0.009418, audio_tagging_loss=0.01128, over 15829.00 frames. ], tot_loss[loss=0.06534, simple_loss=0.08922, pruned_loss=0.01185, audio_tagging_loss=0.00888, over 3049276.96 frames. ], batch size: 59, lr: 1.44e-03, grad_scale: 16.0 2023-11-27 04:07:18,989 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=3719873.3333333335, ans=0.125 2023-11-27 04:07:29,776 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3719940.0, ans=0.1 2023-11-27 04:07:31,871 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=3719940.0, ans=0.125 2023-11-27 04:07:37,135 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3719940.0, ans=0.1 2023-11-27 04:07:38,010 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 558000 2023-11-27 04:07:41,444 INFO [train_asr.py:1235] (3/4) Epoch 47, batch 4900, loss[loss=0.07305, simple_loss=0.1001, pruned_loss=0.01346, audio_tagging_loss=0.009548, over 15095.00 frames. ], tot_loss[loss=0.06541, simple_loss=0.08943, pruned_loss=0.01185, audio_tagging_loss=0.008844, over 3045848.73 frames. ], batch size: 58, lr: 1.44e-03, grad_scale: 16.0 2023-11-27 04:07:56,185 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=11.78 vs. limit=22.5 2023-11-27 04:07:56,762 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.179e+01 8.861e+01 9.533e+01 1.009e+02 1.253e+02, threshold=1.907e+02, percent-clipped=0.0 2023-11-27 04:08:03,936 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=3720140.0, ans=0.125 2023-11-27 04:08:33,870 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 558050 2023-11-27 04:08:37,500 INFO [train_asr.py:1235] (3/4) Epoch 47, batch 4950, loss[loss=0.08342, simple_loss=0.1126, pruned_loss=0.01921, audio_tagging_loss=0.007917, over 15905.00 frames. ], tot_loss[loss=0.06535, simple_loss=0.0895, pruned_loss=0.01194, audio_tagging_loss=0.008662, over 3043727.45 frames. ], batch size: 57, lr: 1.44e-03, grad_scale: 16.0 2023-11-27 04:08:45,604 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.41 vs. 
limit=6.0 2023-11-27 04:08:48,315 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=3720406.6666666665, ans=0.125 2023-11-27 04:08:50,348 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3720406.6666666665, ans=0.0 2023-11-27 04:09:21,506 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3720606.6666666665, ans=0.125 2023-11-27 04:09:31,022 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 558100 2023-11-27 04:09:34,158 INFO [train_asr.py:1235] (3/4) Epoch 47, batch 5000, loss[loss=0.06584, simple_loss=0.09313, pruned_loss=0.01199, audio_tagging_loss=0.007288, over 15416.00 frames. ], tot_loss[loss=0.06539, simple_loss=0.08974, pruned_loss=0.01201, audio_tagging_loss=0.008511, over 3042967.57 frames. ], batch size: 57, lr: 1.44e-03, grad_scale: 16.0 2023-11-27 04:09:47,971 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.742e+01 8.780e+01 9.541e+01 1.005e+02 1.452e+02, threshold=1.908e+02, percent-clipped=0.0 2023-11-27 04:09:53,463 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3720740.0, ans=0.125 2023-11-27 04:10:18,742 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3720940.0, ans=0.125 2023-11-27 04:10:25,994 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 558150 2023-11-27 04:10:28,662 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=7.89 vs. limit=15.0 2023-11-27 04:10:29,095 INFO [train_asr.py:1235] (3/4) Epoch 47, batch 5050, loss[loss=0.05776, simple_loss=0.07318, pruned_loss=0.01327, audio_tagging_loss=0.007897, over 14832.00 frames. ], tot_loss[loss=0.06527, simple_loss=0.08958, pruned_loss=0.01202, audio_tagging_loss=0.008456, over 3037113.63 frames. ], batch size: 56, lr: 1.44e-03, grad_scale: 16.0 2023-11-27 04:10:29,297 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=3721006.6666666665, ans=0.125 2023-11-27 04:10:58,041 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=3721140.0, ans=0.125 2023-11-27 04:11:01,284 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-27 04:11:07,921 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=6.67 vs. limit=15.0 2023-11-27 04:11:21,958 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 558200 2023-11-27 04:11:25,313 INFO [train_asr.py:1235] (3/4) Epoch 47, batch 5100, loss[loss=0.07801, simple_loss=0.1155, pruned_loss=0.01482, audio_tagging_loss=0.005435, over 15553.00 frames. ], tot_loss[loss=0.06553, simple_loss=0.08993, pruned_loss=0.01218, audio_tagging_loss=0.008391, over 3047327.38 frames. 
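The Whitening lines compare a statistic of each module's feature covariance against a limit, and the module only penalizes activations when the metric exceeds that limit. The sketch below is a plausible reconstruction of such a metric, not the icefall code verbatim: a scale-invariant measure that equals 1.0 when the per-group covariance is a multiple of the identity and grows as the features become less "white".

import torch

def whitening_metric(x: torch.Tensor, num_groups: int = 1) -> torch.Tensor:
    # x: (num_frames, num_channels)
    num_frames, num_channels = x.shape
    cpg = num_channels // num_groups                  # channels per group
    x = x.reshape(num_frames, num_groups, cpg).transpose(0, 1)
    covar = torch.matmul(x.transpose(1, 2), x)        # (num_groups, cpg, cpg)
    mean_diag = covar.diagonal(dim1=1, dim2=2).mean()
    mean_sq = (covar ** 2).sum() / (num_groups * cpg)
    # == 1 iff covar is proportional to the identity; > 1 otherwise
    return mean_sq / (mean_diag ** 2 + 1e-20)

x = torch.randn(10000, 64)
print(whitening_metric(x))  # ~1.0 for white Gaussian features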
], batch size: 56, lr: 1.44e-03, grad_scale: 16.0 2023-11-27 04:11:35,672 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=3721340.0, ans=0.2 2023-11-27 04:11:40,666 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.478e+01 8.877e+01 9.521e+01 1.045e+02 1.362e+02, threshold=1.904e+02, percent-clipped=0.0 2023-11-27 04:11:47,050 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=3721473.3333333335, ans=0.015 2023-11-27 04:12:04,862 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=3721540.0, ans=0.0 2023-11-27 04:12:12,760 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=3721606.6666666665, ans=0.035 2023-11-27 04:12:19,056 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 558250 2023-11-27 04:12:22,228 INFO [train_asr.py:1235] (3/4) Epoch 47, batch 5150, loss[loss=0.09105, simple_loss=0.1294, pruned_loss=0.02056, audio_tagging_loss=0.005802, over 15481.00 frames. ], tot_loss[loss=0.06588, simple_loss=0.0903, pruned_loss=0.01233, audio_tagging_loss=0.008395, over 3039753.20 frames. ], batch size: 57, lr: 1.44e-03, grad_scale: 16.0 2023-11-27 04:12:42,525 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=3721806.6666666665, ans=0.0 2023-11-27 04:12:50,392 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=3721806.6666666665, ans=0.09899494936611666 2023-11-27 04:13:09,090 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=3721940.0, ans=0.125 2023-11-27 04:13:14,252 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 558300 2023-11-27 04:13:17,310 INFO [train_asr.py:1235] (3/4) Epoch 47, batch 5200, loss[loss=0.06803, simple_loss=0.09206, pruned_loss=0.01321, audio_tagging_loss=0.008788, over 14981.00 frames. ], tot_loss[loss=0.06581, simple_loss=0.09045, pruned_loss=0.01226, audio_tagging_loss=0.008322, over 3035168.45 frames. ], batch size: 56, lr: 1.44e-03, grad_scale: 32.0 2023-11-27 04:13:29,947 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=4.14 vs. limit=15.0 2023-11-27 04:13:31,586 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.764e+01 8.871e+01 9.548e+01 1.019e+02 1.156e+02, threshold=1.910e+02, percent-clipped=0.0 2023-11-27 04:13:34,912 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=7.81 vs. limit=15.0 2023-11-27 04:13:41,460 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=3722140.0, ans=0.125 2023-11-27 04:13:42,762 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=8.78 vs. 
limit=15.0 2023-11-27 04:14:01,377 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_whiten.whitening_limit, batch_count=3722273.3333333335, ans=15.0 2023-11-27 04:14:09,361 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 558350 2023-11-27 04:14:13,079 INFO [train_asr.py:1235] (3/4) Epoch 47, batch 5250, loss[loss=0.09209, simple_loss=0.1205, pruned_loss=0.0246, audio_tagging_loss=0.007248, over 15439.00 frames. ], tot_loss[loss=0.06615, simple_loss=0.09101, pruned_loss=0.0123, audio_tagging_loss=0.008338, over 3039277.32 frames. ], batch size: 56, lr: 1.44e-03, grad_scale: 32.0 2023-11-27 04:14:15,451 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=3722340.0, ans=0.0 2023-11-27 04:14:34,061 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=3722406.6666666665, ans=0.125 2023-11-27 04:14:35,316 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.10 vs. limit=6.0 2023-11-27 04:14:57,684 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=3722606.6666666665, ans=0.0 2023-11-27 04:15:02,712 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=512, metric=7.96 vs. limit=15.0 2023-11-27 04:15:05,997 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 558400 2023-11-27 04:15:09,435 INFO [train_asr.py:1235] (3/4) Epoch 47, batch 5300, loss[loss=0.06683, simple_loss=0.09171, pruned_loss=0.0123, audio_tagging_loss=0.008682, over 13815.00 frames. ], tot_loss[loss=0.06577, simple_loss=0.09035, pruned_loss=0.01217, audio_tagging_loss=0.008427, over 3040498.74 frames. ], batch size: 55, lr: 1.44e-03, grad_scale: 16.0 2023-11-27 04:15:15,492 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=3722673.3333333335, ans=0.0 2023-11-27 04:15:20,853 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3722740.0, ans=0.125 2023-11-27 04:15:23,379 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=2.31 vs. 
limit=15.0 2023-11-27 04:15:24,823 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.425e+01 9.122e+01 9.674e+01 1.051e+02 1.467e+02, threshold=1.935e+02, percent-clipped=0.0 2023-11-27 04:15:25,172 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=3722740.0, ans=0.0 2023-11-27 04:15:43,393 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3722873.3333333335, ans=0.125 2023-11-27 04:15:45,309 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=3722873.3333333335, ans=0.0 2023-11-27 04:16:01,257 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3722940.0, ans=0.125 2023-11-27 04:16:02,268 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 558450 2023-11-27 04:16:05,438 INFO [train_asr.py:1235] (3/4) Epoch 47, batch 5350, loss[loss=0.05815, simple_loss=0.07723, pruned_loss=0.01, audio_tagging_loss=0.009527, over 14996.00 frames. ], tot_loss[loss=0.06556, simple_loss=0.09005, pruned_loss=0.01203, audio_tagging_loss=0.008498, over 3041144.55 frames. ], batch size: 55, lr: 1.44e-03, grad_scale: 16.0 2023-11-27 04:16:09,873 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.max_abs, batch_count=3723006.6666666665, ans=10.0 2023-11-27 04:16:10,323 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=9.80 vs. limit=15.0 2023-11-27 04:16:33,602 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3723140.0, ans=0.125 2023-11-27 04:16:55,401 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=3723273.3333333335, ans=0.125 2023-11-27 04:16:57,324 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 558500 2023-11-27 04:17:00,458 INFO [train_asr.py:1235] (3/4) Epoch 47, batch 5400, loss[loss=0.07344, simple_loss=0.102, pruned_loss=0.0136, audio_tagging_loss=0.008828, over 14726.00 frames. ], tot_loss[loss=0.06564, simple_loss=0.09016, pruned_loss=0.01207, audio_tagging_loss=0.008491, over 3036415.24 frames. ], batch size: 52, lr: 1.44e-03, grad_scale: 16.0 2023-11-27 04:17:01,755 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3723340.0, ans=0.0 2023-11-27 04:17:11,358 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=3723406.6666666665, ans=0.0 2023-11-27 04:17:16,916 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.704e+01 9.000e+01 9.552e+01 1.029e+02 2.043e+02, threshold=1.910e+02, percent-clipped=1.0 2023-11-27 04:17:20,300 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3723406.6666666665, ans=0.125 2023-11-27 04:17:22,588 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=11.36 vs. limit=15.0 2023-11-27 04:17:32,755 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=14.29 vs. 
limit=22.5 2023-11-27 04:17:46,147 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=16.37 vs. limit=22.5 2023-11-27 04:17:53,674 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 558550 2023-11-27 04:17:54,936 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3723606.6666666665, ans=0.0 2023-11-27 04:17:57,345 INFO [train_asr.py:1235] (3/4) Epoch 47, batch 5450, loss[loss=0.05856, simple_loss=0.07665, pruned_loss=0.01004, audio_tagging_loss=0.01019, over 15420.00 frames. ], tot_loss[loss=0.06546, simple_loss=0.08984, pruned_loss=0.01198, audio_tagging_loss=0.008559, over 3037074.87 frames. ], batch size: 56, lr: 1.44e-03, grad_scale: 16.0 2023-11-27 04:18:31,894 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3723873.3333333335, ans=0.125 2023-11-27 04:18:33,933 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=3723873.3333333335, ans=0.07 2023-11-27 04:18:41,874 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3723940.0, ans=0.1 2023-11-27 04:18:49,210 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 558600 2023-11-27 04:18:50,463 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=3723940.0, ans=0.0 2023-11-27 04:18:50,500 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3723940.0, ans=0.1 2023-11-27 04:18:51,712 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3724006.6666666665, ans=0.125 2023-11-27 04:18:52,634 INFO [train_asr.py:1235] (3/4) Epoch 47, batch 5500, loss[loss=0.06783, simple_loss=0.09255, pruned_loss=0.01358, audio_tagging_loss=0.007978, over 14685.00 frames. ], tot_loss[loss=0.06534, simple_loss=0.08968, pruned_loss=0.01191, audio_tagging_loss=0.008593, over 3036654.10 frames. ], batch size: 55, lr: 1.44e-03, grad_scale: 16.0 2023-11-27 04:18:56,647 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=8.68 vs. limit=15.0 2023-11-27 04:19:07,815 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.611e+01 9.127e+01 9.786e+01 1.057e+02 1.357e+02, threshold=1.957e+02, percent-clipped=0.0 2023-11-27 04:19:23,546 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3724140.0, ans=0.125 2023-11-27 04:19:45,219 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 558650 2023-11-27 04:19:48,351 INFO [train_asr.py:1235] (3/4) Epoch 47, batch 5550, loss[loss=0.06674, simple_loss=0.0773, pruned_loss=0.01603, audio_tagging_loss=0.01206, over 14917.00 frames. ], tot_loss[loss=0.06522, simple_loss=0.08932, pruned_loss=0.01181, audio_tagging_loss=0.008743, over 3043449.12 frames. 
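The optim.py lines log quartiles (min, 25%, median, 75%, max) of recent per-batch gradient norms plus a clipping threshold that, in every record here, equals Clipping_scale times the median (e.g. 2.0 * 9.556e+01 = 1.911e+02 at 04:02:20). The percent-clipped=1.0 in the record just above, where the window max 2.043e+02 exceeded the 1.910e+02 threshold, marks a stretch in which a batch was actually clipped. A small sketch of that bookkeeping (function name and window handling are illustrative, not the optimizer's code):

import torch

def clip_by_median(grad_norms, new_norm, clipping_scale=2.0):
    # grad_norms: recent per-batch gradient norms (a rolling window)
    q = torch.quantile(torch.tensor(grad_norms),
                       torch.tensor([0.0, 0.25, 0.5, 0.75, 1.0]))
    threshold = clipping_scale * q[2].item()          # scale * median
    scale = min(1.0, threshold / (new_norm + 1e-20))  # shrink factor for grads
    return scale, threshold, scale < 1.0

# Quartiles from the 03:56:44 record above:
scale, thr, clipped = clip_by_median([69.9, 90.6, 97.4, 104.8, 143.4], 210.0)
print(thr, clipped)  # threshold ~194.8 (cf. 1.948e+02); 210.0 would be clipped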
], batch size: 57, lr: 1.44e-03, grad_scale: 16.0 2023-11-27 04:19:58,161 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.min_abs, batch_count=3724340.0, ans=0.5 2023-11-27 04:20:09,332 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=3724406.6666666665, ans=0.125 2023-11-27 04:20:13,494 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=3724473.3333333335, ans=0.04949747468305833 2023-11-27 04:20:21,952 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=19.08 vs. limit=22.5 2023-11-27 04:20:37,993 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3724606.6666666665, ans=0.1 2023-11-27 04:20:40,098 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.0.self_attn1.whiten, num_groups=1, num_channels=192, metric=14.74 vs. limit=22.5 2023-11-27 04:20:41,513 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 558700 2023-11-27 04:20:44,633 INFO [train_asr.py:1235] (3/4) Epoch 47, batch 5600, loss[loss=0.06905, simple_loss=0.09927, pruned_loss=0.01222, audio_tagging_loss=0.0072, over 15563.00 frames. ], tot_loss[loss=0.06603, simple_loss=0.09035, pruned_loss=0.012, audio_tagging_loss=0.00885, over 3047829.23 frames. ], batch size: 57, lr: 1.44e-03, grad_scale: 32.0 2023-11-27 04:21:00,061 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.072e+01 9.029e+01 9.775e+01 1.051e+02 1.247e+02, threshold=1.955e+02, percent-clipped=0.0 2023-11-27 04:21:03,425 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=3724740.0, ans=0.0 2023-11-27 04:21:09,232 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=3724806.6666666665, ans=0.125 2023-11-27 04:21:23,804 WARNING [train_asr.py:1481] (3/4) Exclude cut with ID unbalanced/ze0LsBtoDm0_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-27 04:21:29,830 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3724940.0, ans=0.1 2023-11-27 04:21:32,048 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=3724940.0, ans=0.125 2023-11-27 04:21:37,123 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 558750 2023-11-27 04:21:37,321 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3724940.0, ans=0.1 2023-11-27 04:21:40,271 INFO [train_asr.py:1235] (3/4) Epoch 47, batch 5650, loss[loss=0.0838, simple_loss=0.1105, pruned_loss=0.01755, audio_tagging_loss=0.011, over 14521.00 frames. ], tot_loss[loss=0.0659, simple_loss=0.08984, pruned_loss=0.01201, audio_tagging_loss=0.008976, over 3050679.62 frames. 
], batch size: 53, lr: 1.44e-03, grad_scale: 16.0 2023-11-27 04:21:40,521 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00 2023-11-27 04:21:41,439 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3725006.6666666665, ans=0.125 2023-11-27 04:22:19,745 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=3725206.6666666665, ans=0.0 2023-11-27 04:22:33,014 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 558800 2023-11-27 04:22:36,431 INFO [train_asr.py:1235] (3/4) Epoch 47, batch 5700, loss[loss=0.06052, simple_loss=0.07953, pruned_loss=0.01324, audio_tagging_loss=0.007514, over 15090.00 frames. ], tot_loss[loss=0.06591, simple_loss=0.08986, pruned_loss=0.01194, audio_tagging_loss=0.009041, over 3047831.79 frames. ], batch size: 58, lr: 1.44e-03, grad_scale: 16.0 2023-11-27 04:22:53,741 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.360e+01 9.100e+01 9.627e+01 1.012e+02 1.597e+02, threshold=1.925e+02, percent-clipped=0.0 2023-11-27 04:23:06,580 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=3725473.3333333335, ans=0.0 2023-11-27 04:23:16,099 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-27 04:23:28,769 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 558850 2023-11-27 04:23:32,508 INFO [train_asr.py:1235] (3/4) Epoch 47, batch 5750, loss[loss=0.07785, simple_loss=0.1126, pruned_loss=0.01465, audio_tagging_loss=0.006879, over 15157.00 frames. ], tot_loss[loss=0.06527, simple_loss=0.08907, pruned_loss=0.01191, audio_tagging_loss=0.008825, over 3043070.44 frames. ], batch size: 54, lr: 1.44e-03, grad_scale: 16.0 2023-11-27 04:23:32,724 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3725673.3333333335, ans=0.125 2023-11-27 04:23:33,245 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.51 vs. limit=6.0 2023-11-27 04:23:39,677 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3725673.3333333335, ans=0.125 2023-11-27 04:23:39,689 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3725673.3333333335, ans=0.1 2023-11-27 04:23:59,935 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=3725806.6666666665, ans=0.0 2023-11-27 04:24:08,594 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=11.77 vs. limit=15.0 2023-11-27 04:24:25,215 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 558900 2023-11-27 04:24:28,363 INFO [train_asr.py:1235] (3/4) Epoch 47, batch 5800, loss[loss=0.06593, simple_loss=0.08688, pruned_loss=0.01264, audio_tagging_loss=0.009848, over 15303.00 frames. ], tot_loss[loss=0.06526, simple_loss=0.08917, pruned_loss=0.01197, audio_tagging_loss=0.008699, over 3047608.38 frames. 
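The grad_scale value alternating between 16.0 and 32.0 in these records is the dynamic fp16 loss scale: it is halved whenever a batch produces inf/nan gradients and doubled back after a run of clean batches. The standard PyTorch pattern is sketched below (a generic loop requiring a CUDA device, not the recipe's actual training loop):

import torch

model = torch.nn.Linear(80, 500).cuda()
opt = torch.optim.AdamW(model.parameters(), lr=1e-3)
scaler = torch.cuda.amp.GradScaler(init_scale=32.0, growth_interval=2000)

for x, y in [(torch.randn(8, 80).cuda(), torch.randn(8, 500).cuda())]:
    opt.zero_grad()
    with torch.cuda.amp.autocast():
        loss = torch.nn.functional.mse_loss(model(x), y)
    scaler.scale(loss).backward()  # scale loss so fp16 grads stay finite
    scaler.step(opt)               # unscales grads; skips step on inf/nan
    scaler.update()                # halve scale on overflow, grow when stable
    print(scaler.get_scale())      # the value logged as grad_scale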
], batch size: 57, lr: 1.44e-03, grad_scale: 16.0 2023-11-27 04:24:30,012 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.out_whiten.whitening_limit, batch_count=3726006.6666666665, ans=15.0 2023-11-27 04:24:31,662 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=3726006.6666666665, ans=0.0 2023-11-27 04:24:32,792 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=3726006.6666666665, ans=0.125 2023-11-27 04:24:44,055 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.198e+01 8.898e+01 9.503e+01 1.014e+02 1.698e+02, threshold=1.901e+02, percent-clipped=0.0 2023-11-27 04:25:05,547 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3726206.6666666665, ans=0.125 2023-11-27 04:25:13,131 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=3726273.3333333335, ans=0.2 2023-11-27 04:25:20,303 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 558950 2023-11-27 04:25:23,511 INFO [train_asr.py:1235] (3/4) Epoch 47, batch 5850, loss[loss=0.07797, simple_loss=0.1133, pruned_loss=0.01636, audio_tagging_loss=0.004974, over 15186.00 frames. ], tot_loss[loss=0.0652, simple_loss=0.08908, pruned_loss=0.01194, audio_tagging_loss=0.00872, over 3052166.32 frames. ], batch size: 58, lr: 1.44e-03, grad_scale: 16.0 2023-11-27 04:25:31,672 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=3726340.0, ans=0.2 2023-11-27 04:25:36,452 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=3726406.6666666665, ans=0.07 2023-11-27 04:25:57,775 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=3726540.0, ans=0.125 2023-11-27 04:26:14,202 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=3726606.6666666665, ans=0.2 2023-11-27 04:26:16,688 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 559000 2023-11-27 04:26:20,681 INFO [train_asr.py:1235] (3/4) Epoch 47, batch 5900, loss[loss=0.07097, simple_loss=0.0996, pruned_loss=0.01497, audio_tagging_loss=0.006199, over 14216.00 frames. ], tot_loss[loss=0.06535, simple_loss=0.08973, pruned_loss=0.01188, audio_tagging_loss=0.008605, over 3050153.27 frames. ], batch size: 55, lr: 1.44e-03, grad_scale: 16.0 2023-11-27 04:26:21,119 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=11.68 vs. 
limit=22.5 2023-11-27 04:26:37,158 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.616e+01 9.105e+01 9.736e+01 1.055e+02 1.471e+02, threshold=1.947e+02, percent-clipped=0.0 2023-11-27 04:26:37,409 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=3726740.0, ans=0.0 2023-11-27 04:26:48,941 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=3726806.6666666665, ans=0.0 2023-11-27 04:27:00,527 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=3726873.3333333335, ans=0.04949747468305833 2023-11-27 04:27:13,046 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 559050 2023-11-27 04:27:16,211 INFO [train_asr.py:1235] (3/4) Epoch 47, batch 5950, loss[loss=0.07383, simple_loss=0.1029, pruned_loss=0.01446, audio_tagging_loss=0.007934, over 15996.00 frames. ], tot_loss[loss=0.0659, simple_loss=0.09066, pruned_loss=0.01202, audio_tagging_loss=0.008547, over 3056265.06 frames. ], batch size: 59, lr: 1.44e-03, grad_scale: 16.0 2023-11-27 04:27:27,247 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=7.76 vs. limit=15.0 2023-11-27 04:27:28,178 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=3727073.3333333335, ans=0.125 2023-11-27 04:27:50,922 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=3727206.6666666665, ans=0.125 2023-11-27 04:27:50,954 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3727206.6666666665, ans=0.125 2023-11-27 04:28:07,741 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 559100 2023-11-27 04:28:10,858 INFO [train_asr.py:1235] (3/4) Epoch 47, batch 6000, loss[loss=0.04803, simple_loss=0.05971, pruned_loss=0.008056, audio_tagging_loss=0.01011, over 14578.00 frames. ], tot_loss[loss=0.06562, simple_loss=0.09019, pruned_loss=0.01204, audio_tagging_loss=0.00849, over 3050262.66 frames. ], batch size: 58, lr: 1.44e-03, grad_scale: 32.0 2023-11-27 04:28:10,859 INFO [train_asr.py:1258] (3/4) Computing validation loss 2023-11-27 04:28:28,881 INFO [zipformer.py:1877] (3/4) name=encoder.encoders.3.encoder.layers.2.self_attn_weights, attn_weights_entropy = tensor([2.4650, 3.0234, 3.3171, 2.9879, 3.7067, 3.7683, 3.2156, 3.1800], device='cuda:3') 2023-11-27 04:28:43,421 INFO [train_asr.py:1267] (3/4) Epoch 47, validation: loss=0.05733, simple_loss=0.05048, pruned_loss=0.005338, audio_tagging_loss=0.02675, over 4681554.00 frames. 2023-11-27 04:28:43,422 INFO [train_asr.py:1268] (3/4) Maximum memory allocated so far is 24894MB 2023-11-27 04:28:59,632 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.659e+01 9.039e+01 9.599e+01 1.058e+02 1.819e+02, threshold=1.920e+02, percent-clipped=0.0 2023-11-27 04:28:59,932 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-27 04:29:05,128 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3727473.3333333335, ans=0.0 2023-11-27 04:29:21,480 WARNING [train_asr.py:1481] (3/4) Exclude cut with ID unbalanced/NoNxFjwXuuc_0.000_1.000.wav from training. 
Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-27 04:29:35,852 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 559150 2023-11-27 04:29:39,050 INFO [train_asr.py:1235] (3/4) Epoch 47, batch 6050, loss[loss=0.07399, simple_loss=0.1035, pruned_loss=0.01041, audio_tagging_loss=0.01181, over 15531.00 frames. ], tot_loss[loss=0.0661, simple_loss=0.09075, pruned_loss=0.01221, audio_tagging_loss=0.008522, over 3055776.64 frames. ], batch size: 56, lr: 1.44e-03, grad_scale: 32.0 2023-11-27 04:29:47,816 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=3727673.3333333335, ans=0.0 2023-11-27 04:29:49,256 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.34 vs. limit=6.0 2023-11-27 04:29:55,389 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-27 04:30:31,211 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 559200 2023-11-27 04:30:33,816 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=3728006.6666666665, ans=0.2 2023-11-27 04:30:34,601 INFO [train_asr.py:1235] (3/4) Epoch 47, batch 6100, loss[loss=0.08628, simple_loss=0.1204, pruned_loss=0.01817, audio_tagging_loss=0.007895, over 15402.00 frames. ], tot_loss[loss=0.0657, simple_loss=0.09034, pruned_loss=0.01201, audio_tagging_loss=0.008523, over 3046749.27 frames. ], batch size: 57, lr: 1.44e-03, grad_scale: 16.0 2023-11-27 04:30:53,479 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.728e+01 8.839e+01 9.378e+01 9.986e+01 1.390e+02, threshold=1.876e+02, percent-clipped=0.0 2023-11-27 04:30:53,760 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3728073.3333333335, ans=0.125 2023-11-27 04:30:58,893 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3728140.0, ans=0.1 2023-11-27 04:31:05,448 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=14.41 vs. 
limit=22.5 2023-11-27 04:31:07,346 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3728206.6666666665, ans=0.125 2023-11-27 04:31:17,769 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=3728273.3333333335, ans=0.2 2023-11-27 04:31:26,135 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 559250 2023-11-27 04:31:27,304 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=3728273.3333333335, ans=0.0 2023-11-27 04:31:28,910 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3728340.0, ans=0.125 2023-11-27 04:31:30,232 INFO [train_asr.py:1235] (3/4) Epoch 47, batch 6150, loss[loss=0.06724, simple_loss=0.08563, pruned_loss=0.01211, audio_tagging_loss=0.01232, over 15284.00 frames. ], tot_loss[loss=0.06558, simple_loss=0.09035, pruned_loss=0.01182, audio_tagging_loss=0.008585, over 3047609.71 frames. ], batch size: 58, lr: 1.44e-03, grad_scale: 16.0 2023-11-27 04:32:02,039 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3728540.0, ans=0.125 2023-11-27 04:32:22,918 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 559300 2023-11-27 04:32:26,078 INFO [train_asr.py:1235] (3/4) Epoch 47, batch 6200, loss[loss=0.07544, simple_loss=0.1153, pruned_loss=0.01119, audio_tagging_loss=0.006605, over 14738.00 frames. ], tot_loss[loss=0.06545, simple_loss=0.08998, pruned_loss=0.01184, audio_tagging_loss=0.008621, over 3050332.82 frames. ], batch size: 55, lr: 1.44e-03, grad_scale: 16.0 2023-11-27 04:32:27,612 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=12.39 vs. limit=15.0 2023-11-27 04:32:41,049 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=3728740.0, ans=0.07 2023-11-27 04:32:41,111 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-11-27 04:32:42,989 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.527e+01 8.961e+01 9.532e+01 1.029e+02 1.294e+02, threshold=1.906e+02, percent-clipped=0.0 2023-11-27 04:32:58,527 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=3728873.3333333335, ans=0.0 2023-11-27 04:33:00,684 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=3728873.3333333335, ans=0.125 2023-11-27 04:33:17,036 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=3728940.0, ans=0.125 2023-11-27 04:33:17,910 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 559350 2023-11-27 04:33:21,002 INFO [train_asr.py:1235] (3/4) Epoch 47, batch 6250, loss[loss=0.08221, simple_loss=0.1042, pruned_loss=0.02474, audio_tagging_loss=0.005386, over 14325.00 frames. ], tot_loss[loss=0.06569, simple_loss=0.09013, pruned_loss=0.01193, audio_tagging_loss=0.008702, over 3052498.86 frames. 
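tot_loss[...] is a frame-weighted running average of the per-batch losses in which both the numerator and the frame count decay exponentially, which is why the frame totals above are fractional (e.g. "over 3052498.86 frames"). With batches of roughly 15k frames, a memory of about 200 batches converges to the ~3.05M-frame totals seen here; the decay constant below is an assumption chosen to match that, and the class is illustrative only.

class RunningLoss:
    def __init__(self, decay=1.0 - 1.0 / 200):  # ~200-batch memory (assumed)
        self.decay = decay
        self.loss_sum = 0.0
        self.frames = 0.0

    def update(self, batch_loss: float, batch_frames: float):
        # Decay old statistics, then add this batch; batch_loss is per-frame,
        # so weight it by the batch's frame count.
        self.loss_sum = self.loss_sum * self.decay + batch_loss * batch_frames
        self.frames = self.frames * self.decay + batch_frames

    @property
    def value(self) -> float:
        return self.loss_sum / max(self.frames, 1.0)

tracker = RunningLoss()
for loss, frames in [(0.066, 15400), (0.072, 14900), (0.061, 15800)]:
    tracker.update(loss, frames)
print(round(tracker.value, 5))  # frame-weighted running average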
], batch size: 53, lr: 1.44e-03, grad_scale: 16.0 2023-11-27 04:33:24,640 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=5.83 vs. limit=12.0 2023-11-27 04:33:38,740 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.whiten, num_groups=1, num_channels=512, metric=10.45 vs. limit=12.0 2023-11-27 04:33:58,972 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=512, metric=2.69 vs. limit=15.0 2023-11-27 04:33:59,698 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=3729206.6666666665, ans=0.2 2023-11-27 04:34:08,386 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.whiten, num_groups=1, num_channels=512, metric=10.10 vs. limit=12.0 2023-11-27 04:34:11,136 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3729273.3333333335, ans=0.125 2023-11-27 04:34:13,110 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 559400 2023-11-27 04:34:16,465 INFO [train_asr.py:1235] (3/4) Epoch 47, batch 6300, loss[loss=0.07217, simple_loss=0.1018, pruned_loss=0.01184, audio_tagging_loss=0.009439, over 15352.00 frames. ], tot_loss[loss=0.06604, simple_loss=0.09037, pruned_loss=0.01198, audio_tagging_loss=0.008885, over 3053255.88 frames. ], batch size: 56, lr: 1.44e-03, grad_scale: 16.0 2023-11-27 04:34:18,322 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=3729340.0, ans=0.2 2023-11-27 04:34:28,763 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=3729406.6666666665, ans=0.2 2023-11-27 04:34:32,507 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3729406.6666666665, ans=0.1 2023-11-27 04:34:35,449 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.760e+01 8.982e+01 9.657e+01 1.038e+02 1.298e+02, threshold=1.931e+02, percent-clipped=0.0 2023-11-27 04:34:55,182 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=4.53 vs. limit=12.0 2023-11-27 04:35:10,065 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 559450 2023-11-27 04:35:13,206 INFO [train_asr.py:1235] (3/4) Epoch 47, batch 6350, loss[loss=0.09105, simple_loss=0.1136, pruned_loss=0.0254, audio_tagging_loss=0.00887, over 14828.00 frames. ], tot_loss[loss=0.06582, simple_loss=0.08969, pruned_loss=0.012, audio_tagging_loss=0.008972, over 3055346.34 frames. ], batch size: 55, lr: 1.44e-03, grad_scale: 16.0 2023-11-27 04:35:17,124 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=3729673.3333333335, ans=0.2 2023-11-27 04:35:21,208 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3729673.3333333335, ans=0.125 2023-11-27 04:35:25,637 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=10.75 vs. 
limit=15.0 2023-11-27 04:35:35,025 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3729806.6666666665, ans=0.1 2023-11-27 04:35:43,499 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=3729806.6666666665, ans=0.125 2023-11-27 04:35:52,637 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=7.12 vs. limit=12.0 2023-11-27 04:36:02,952 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=21.11 vs. limit=22.5 2023-11-27 04:36:05,615 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 559500 2023-11-27 04:36:07,819 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=3730006.6666666665, ans=0.125 2023-11-27 04:36:07,831 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=3730006.6666666665, ans=0.035 2023-11-27 04:36:08,709 INFO [train_asr.py:1235] (3/4) Epoch 47, batch 6400, loss[loss=0.05516, simple_loss=0.08109, pruned_loss=0.005718, audio_tagging_loss=0.008895, over 14461.00 frames. ], tot_loss[loss=0.06552, simple_loss=0.08928, pruned_loss=0.01188, audio_tagging_loss=0.009001, over 3049773.74 frames. ], batch size: 57, lr: 1.43e-03, grad_scale: 32.0 2023-11-27 04:36:08,969 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=3730006.6666666665, ans=0.125 2023-11-27 04:36:11,033 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3730006.6666666665, ans=0.1 2023-11-27 04:36:14,200 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3730006.6666666665, ans=0.125 2023-11-27 04:36:15,267 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=3730006.6666666665, ans=0.125 2023-11-27 04:36:23,561 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=7.01 vs. limit=12.0 2023-11-27 04:36:24,207 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=3730073.3333333335, ans=0.125 2023-11-27 04:36:25,764 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3730073.3333333335, ans=0.125 2023-11-27 04:36:26,643 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.785e+01 9.070e+01 9.519e+01 1.025e+02 1.551e+02, threshold=1.904e+02, percent-clipped=0.0 2023-11-27 04:36:31,679 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3730140.0, ans=0.125 2023-11-27 04:36:40,836 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=10.62 vs. limit=15.0 2023-11-27 04:36:41,594 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=11.44 vs. 
limit=15.0 2023-11-27 04:36:49,335 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3730206.6666666665, ans=0.125 2023-11-27 04:37:00,653 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 559550 2023-11-27 04:37:03,652 INFO [train_asr.py:1235] (3/4) Epoch 47, batch 6450, loss[loss=0.04546, simple_loss=0.06471, pruned_loss=0.005856, audio_tagging_loss=0.007254, over 14738.00 frames. ], tot_loss[loss=0.06541, simple_loss=0.08917, pruned_loss=0.01183, audio_tagging_loss=0.008995, over 3048689.35 frames. ], batch size: 57, lr: 1.43e-03, grad_scale: 32.0 2023-11-27 04:37:04,260 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=2.84 vs. limit=15.0 2023-11-27 04:37:18,692 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=19.89 vs. limit=22.5 2023-11-27 04:37:48,673 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=3730606.6666666665, ans=0.2 2023-11-27 04:37:48,992 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.whiten, num_groups=1, num_channels=512, metric=10.33 vs. limit=12.0 2023-11-27 04:37:50,477 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=10.40 vs. limit=15.0 2023-11-27 04:37:55,061 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.max_abs, batch_count=3730606.6666666665, ans=10.0 2023-11-27 04:37:56,995 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 559600 2023-11-27 04:38:00,424 INFO [train_asr.py:1235] (3/4) Epoch 47, batch 6500, loss[loss=0.07899, simple_loss=0.1037, pruned_loss=0.01936, audio_tagging_loss=0.007769, over 15065.00 frames. ], tot_loss[loss=0.06508, simple_loss=0.08883, pruned_loss=0.01176, audio_tagging_loss=0.008899, over 3050840.49 frames. ], batch size: 56, lr: 1.43e-03, grad_scale: 32.0 2023-11-27 04:38:05,017 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3730673.3333333335, ans=0.125 2023-11-27 04:38:18,000 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 6.985e+01 8.947e+01 9.565e+01 1.041e+02 1.320e+02, threshold=1.913e+02, percent-clipped=0.0 2023-11-27 04:38:29,827 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=3730806.6666666665, ans=0.0 2023-11-27 04:38:36,835 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=3730873.3333333335, ans=0.125 2023-11-27 04:38:40,242 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=14.56 vs. limit=22.5 2023-11-27 04:38:44,549 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=8.83 vs. 
2023-11-27 04:38:46,274 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=3730940.0, ans=0.2 2023-11-27 04:38:53,118 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 559650 2023-11-27 04:38:54,320 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=3730940.0, ans=0.2 2023-11-27 04:38:56,215 INFO [train_asr.py:1235] (3/4) Epoch 47, batch 6550, loss[loss=0.06261, simple_loss=0.09044, pruned_loss=0.01097, audio_tagging_loss=0.00642, over 15127.00 frames. ], tot_loss[loss=0.06523, simple_loss=0.08926, pruned_loss=0.01182, audio_tagging_loss=0.008778, over 3051358.17 frames. ], batch size: 57, lr: 1.43e-03, grad_scale: 32.0 2023-11-27 04:39:01,616 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3731006.6666666665, ans=0.125 2023-11-27 04:39:29,833 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=7.60 vs. limit=15.0 2023-11-27 04:39:47,931 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 559700 2023-11-27 04:39:51,020 INFO [train_asr.py:1235] (3/4) Epoch 47, batch 6600, loss[loss=0.05419, simple_loss=0.07044, pruned_loss=0.00749, audio_tagging_loss=0.01148, over 15118.00 frames. ], tot_loss[loss=0.06528, simple_loss=0.08948, pruned_loss=0.01192, audio_tagging_loss=0.008615, over 3047535.96 frames. ], batch size: 57, lr: 1.43e-03, grad_scale: 32.0 2023-11-27 04:39:56,023 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=3731340.0, ans=0.2 2023-11-27 04:40:03,404 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=3731406.6666666665, ans=0.125 2023-11-27 04:40:09,656 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.811e+01 8.917e+01 9.565e+01 1.016e+02 1.189e+02, threshold=1.913e+02, percent-clipped=0.0 2023-11-27 04:40:26,471 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=512, metric=3.87 vs. limit=15.0 2023-11-27 04:40:34,277 INFO [scaling.py:1022] (3/4) Whitening: name=encoder_embed.out_whiten, num_groups=1, num_channels=192, metric=6.43 vs. limit=8.0 2023-11-27 04:40:44,520 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 559750 2023-11-27 04:40:47,626 INFO [train_asr.py:1235] (3/4) Epoch 47, batch 6650, loss[loss=0.07066, simple_loss=0.09652, pruned_loss=0.01325, audio_tagging_loss=0.009149, over 14428.00 frames. ], tot_loss[loss=0.06571, simple_loss=0.08998, pruned_loss=0.01212, audio_tagging_loss=0.008604, over 3037959.10 frames. ], batch size: 54, lr: 1.43e-03, grad_scale: 16.0
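Each loss[...] entry in the train_asr.py:1235 lines breaks the per-batch objective into simple_loss, pruned_loss and audio_tagging_loss, and the logged totals are consistent with loss = 0.5 * simple_loss + pruned_loss + 1.0 * audio_tagging_loss; tot_loss[...] is the same quantity as a running average over the roughly 3.0M frames seen so far in the epoch. A small check of that decomposition, where the 0.5 and 1.0 weights are assumptions read off the logged numbers rather than quoted from train_asr.py:

```python
# Sanity check of the loss decomposition suggested by the log lines above.
# The weights 0.5 (simple_loss) and 1.0 (audio_tagging_loss) are inferred
# from the logged values, not taken from the training code.
SIMPLE_LOSS_SCALE = 0.5
AUDIO_TAGGING_LOSS_SCALE = 1.0

def total_loss(simple_loss, pruned_loss, audio_tagging_loss):
    return (SIMPLE_LOSS_SCALE * simple_loss
            + pruned_loss
            + AUDIO_TAGGING_LOSS_SCALE * audio_tagging_loss)

# batch 6650 just above: logged loss=0.07066
assert abs(total_loss(0.09652, 0.01325, 0.009149) - 0.07066) < 5e-5
# batch 6400 earlier in this section: logged loss=0.05516
assert abs(total_loss(0.08109, 0.005718, 0.008895) - 0.05516) < 5e-5
```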
2023-11-27 04:41:07,264 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=3731740.0, ans=0.035 2023-11-27 04:41:15,853 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=3731806.6666666665, ans=0.125 2023-11-27 04:41:17,922 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3731806.6666666665, ans=0.125 2023-11-27 04:41:29,228 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3731873.3333333335, ans=0.1 2023-11-27 04:41:33,322 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3731940.0, ans=0.125 2023-11-27 04:41:39,520 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 559800 2023-11-27 04:41:39,937 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=6.59 vs. limit=15.0 2023-11-27 04:41:42,938 INFO [train_asr.py:1235] (3/4) Epoch 47, batch 6700, loss[loss=0.05301, simple_loss=0.07448, pruned_loss=0.008029, audio_tagging_loss=0.007733, over 14472.00 frames. ], tot_loss[loss=0.0658, simple_loss=0.09007, pruned_loss=0.01222, audio_tagging_loss=0.008545, over 3034567.18 frames. ], batch size: 55, lr: 1.43e-03, grad_scale: 16.0 2023-11-27 04:41:43,178 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3732006.6666666665, ans=0.125 2023-11-27 04:41:47,407 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=3732006.6666666665, ans=0.125 2023-11-27 04:42:03,041 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.591e+01 8.829e+01 9.518e+01 1.003e+02 1.219e+02, threshold=1.904e+02, percent-clipped=0.0 2023-11-27 04:42:34,583 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=3732273.3333333335, ans=0.125 2023-11-27 04:42:35,430 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 559850 2023-11-27 04:42:38,559 INFO [train_asr.py:1235] (3/4) Epoch 47, batch 6750, loss[loss=0.07147, simple_loss=0.1039, pruned_loss=0.01319, audio_tagging_loss=0.006347, over 16182.00 frames. ], tot_loss[loss=0.06542, simple_loss=0.08951, pruned_loss=0.01216, audio_tagging_loss=0.00851, over 3037656.67 frames. ], batch size: 58, lr: 1.43e-03, grad_scale: 8.0
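The optim.py:476 entries summarize recent gradient norms as five quantiles (min, 25%, median, 75%, max). With Clipping_scale=2.0 the logged threshold tracks twice the running median (here 2 x 9.518e+01 = 1.904e+02), and percent-clipped is the share of recent batches whose norm exceeded it. A hedged sketch of that bookkeeping, with the window length an invented detail:

```python
# Sketch of median-based gradient clipping consistent with the optim.py:476
# lines above: threshold = clipping_scale * median of recent grad norms.
# The window length (400) is an assumption for illustration.
from collections import deque
import torch

class GradNormClipper:
    def __init__(self, clipping_scale: float = 2.0, window: int = 400):
        self.scale = clipping_scale
        self.norms = deque(maxlen=window)
        self.num_clipped = 0   # percent-clipped = 100 * num_clipped / num_seen
        self.num_seen = 0

    def clip_(self, params) -> float:
        params = [p for p in params if p.grad is not None]
        norm = torch.sqrt(sum((p.grad ** 2).sum() for p in params)).item()
        self.norms.append(norm)
        q = torch.quantile(torch.tensor(list(self.norms)),
                           torch.tensor([0.0, 0.25, 0.5, 0.75, 1.0]))
        threshold = self.scale * q[2].item()  # 2.0 x running median
        self.num_seen += 1
        if norm > threshold:
            self.num_clipped += 1
            for p in params:
                p.grad.mul_(threshold / norm)  # rescale to the threshold
        return threshold
```

The grad_scale field in the neighboring train_asr.py lines (32.0, 16.0 and 8.0 across this stretch) is a separate mechanism: its behavior is consistent with a dynamic fp16 loss scale that is halved after gradient overflows and raised again once training is stable, and it is unrelated to the clipping threshold.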
2023-11-27 04:42:54,101 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=3732406.6666666665, ans=0.0 2023-11-27 04:43:08,926 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3732473.3333333335, ans=0.125 2023-11-27 04:43:24,027 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3732606.6666666665, ans=0.125 2023-11-27 04:43:31,369 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 559900 2023-11-27 04:43:35,035 INFO [train_asr.py:1235] (3/4) Epoch 47, batch 6800, loss[loss=0.05796, simple_loss=0.07853, pruned_loss=0.01148, audio_tagging_loss=0.007212, over 15281.00 frames. ], tot_loss[loss=0.06554, simple_loss=0.08979, pruned_loss=0.0122, audio_tagging_loss=0.008451, over 3033098.58 frames. ], batch size: 59, lr: 1.43e-03, grad_scale: 16.0 2023-11-27 04:43:41,513 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=3732673.3333333335, ans=0.125 2023-11-27 04:43:42,690 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=3732673.3333333335, ans=0.125 2023-11-27 04:43:43,791 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=3732673.3333333335, ans=0.2 2023-11-27 04:43:50,501 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=2.99 vs. limit=15.0 2023-11-27 04:43:54,088 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.648e+01 9.067e+01 9.528e+01 1.036e+02 1.458e+02, threshold=1.906e+02, percent-clipped=0.0 2023-11-27 04:43:56,345 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3732806.6666666665, ans=0.1 2023-11-27 04:43:59,756 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=3732806.6666666665, ans=0.0 2023-11-27 04:44:10,228 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=3732873.3333333335, ans=0.09899494936611666 2023-11-27 04:44:13,355 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.1.self_attn2.whiten, num_groups=1, num_channels=192, metric=8.47 vs. limit=22.5 2023-11-27 04:44:21,875 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-27 04:44:25,180 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=11.45 vs. limit=15.0 2023-11-27 04:44:26,857 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 559950 2023-11-27 04:44:29,211 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=3733006.6666666665, ans=0.125 2023-11-27 04:44:29,964 INFO [train_asr.py:1235] (3/4) Epoch 47, batch 6850, loss[loss=0.07046, simple_loss=0.1013, pruned_loss=0.0121, audio_tagging_loss=0.007704, over 15033.00 frames. ], tot_loss[loss=0.06522, simple_loss=0.08933, pruned_loss=0.01209, audio_tagging_loss=0.00846, over 3037298.65 frames. ], batch size: 57, lr: 1.43e-03, grad_scale: 16.0
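The scaling.py:1118 WithLoss entries report auxiliary losses attached to named tensors, here the self-attention weights, with loss-sum=0.000e+00 meaning the attached penalty is currently inactive. The general pattern, an identity in the forward pass that injects an extra gradient in the backward pass, can be sketched as below; this is a generic reconstruction of the idea, not the actual scaling.py class:

```python
# Generic "identity with side loss" autograd pattern, sketched after the
# WithLoss entries logged above. Forward passes x through unchanged; backward
# adds the gradient of an auxiliary scalar penalty computed from x.
import torch

class WithAuxLoss(torch.autograd.Function):
    @staticmethod
    def forward(ctx, x: torch.Tensor, aux_grad: torch.Tensor):
        # aux_grad: d(aux_loss)/dx, precomputed so the forward stays cheap
        ctx.save_for_backward(aux_grad)
        return x

    @staticmethod
    def backward(ctx, grad_out: torch.Tensor):
        (aux_grad,) = ctx.saved_tensors
        return grad_out + aux_grad, None

attn = torch.rand(4, 10, 10, requires_grad=True)
# e.g. a flat penalty on attention mass: d(penalty)/d(attn) = 0.01 everywhere
attn2 = WithAuxLoss.apply(attn, torch.full_like(attn, 0.01))
attn2.sum().backward()
print(attn.grad.mean())  # 1.0 from the main loss + 0.01 from the side loss
```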
2023-11-27 04:44:30,206 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3733006.6666666665, ans=0.125 2023-11-27 04:44:32,679 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=2.97 vs. limit=15.0 2023-11-27 04:44:45,542 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=3733073.3333333335, ans=0.0 2023-11-27 04:45:04,133 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=3733206.6666666665, ans=0.0 2023-11-27 04:45:04,491 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=3.24 vs. limit=15.0 2023-11-27 04:45:05,215 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.min_positive, batch_count=3733206.6666666665, ans=0.05 2023-11-27 04:45:13,692 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3733273.3333333335, ans=0.125 2023-11-27 04:45:17,954 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=3733273.3333333335, ans=0.2 2023-11-27 04:45:21,874 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 560000 2023-11-27 04:45:27,241 INFO [train_asr.py:1235] (3/4) Epoch 47, batch 6900, loss[loss=0.07745, simple_loss=0.1136, pruned_loss=0.01475, audio_tagging_loss=0.005915, over 15531.00 frames. ], tot_loss[loss=0.06584, simple_loss=0.09032, pruned_loss=0.01227, audio_tagging_loss=0.008412, over 3038493.04 frames. ], batch size: 55, lr: 1.43e-03, grad_scale: 16.0 2023-11-27 04:45:48,391 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.525e+01 8.976e+01 9.495e+01 1.031e+02 1.745e+02, threshold=1.899e+02, percent-clipped=0.0 2023-11-27 04:46:00,362 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=3733540.0, ans=0.0 2023-11-27 04:46:00,877 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=8.05 vs. limit=15.0 2023-11-27 04:46:09,637 WARNING [train_asr.py:1481] (3/4) Exclude cut with ID unbalanced/Xez1ffAcb0w_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24
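The WARNING above shows the data filter in action: this AudioSet clip carries the placeholder transcript "Dummy text added as a place holder. ...", and after the roughly 4x subsampling frontend a 1-second cut keeps only 23 frames, fewer than its 24 BPE tokens, so the transducer loss has no valid alignment and the cut is excluded. A sketch of that check, where the helper name and the exact frame arithmetic are assumptions that happen to reproduce the logged 100 -> 23 reduction:

```python
# Sketch of the cut filter implied by the WARNING above: a cut is unusable
# for transducer training if it has fewer post-subsampling frames than
# tokens. The name and the frame arithmetic are illustrative assumptions.
def is_trainable(num_frames: int, num_tokens: int) -> bool:
    # Rough model of a conv frontend with ~4x reduction: 100 frames -> 23.
    frames_after = ((num_frames - 7) // 2 + 1) // 2
    return frames_after >= num_tokens

# The excluded cut from the log: 100 frames -> 23 frames, but 24 tokens.
print(is_trainable(100, 24))   # False -> "Exclude cut ... from training."
```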
2023-11-27 04:46:16,797 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3733606.6666666665, ans=0.125 2023-11-27 04:46:19,468 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=3733606.6666666665, ans=0.2 2023-11-27 04:46:20,325 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 560050 2023-11-27 04:46:23,075 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=3733673.3333333335, ans=0.0 2023-11-27 04:46:23,932 INFO [train_asr.py:1235] (3/4) Epoch 47, batch 6950, loss[loss=0.06803, simple_loss=0.0917, pruned_loss=0.01322, audio_tagging_loss=0.008954, over 15943.00 frames. ], tot_loss[loss=0.06547, simple_loss=0.08979, pruned_loss=0.01213, audio_tagging_loss=0.008447, over 3042328.06 frames. ], batch size: 60, lr: 1.43e-03, grad_scale: 16.0 2023-11-27 04:46:28,918 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3733673.3333333335, ans=0.1 2023-11-27 04:46:39,661 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=11.99 vs. limit=22.5 2023-11-27 04:46:40,610 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3733740.0, ans=0.125 2023-11-27 04:46:43,781 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3733740.0, ans=0.1 2023-11-27 04:47:00,750 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3733873.3333333335, ans=0.125 2023-11-27 04:47:16,351 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 560100 2023-11-27 04:47:19,469 INFO [train_asr.py:1235] (3/4) Epoch 47, batch 7000, loss[loss=0.05063, simple_loss=0.06646, pruned_loss=0.007451, audio_tagging_loss=0.00995, over 16376.00 frames. ], tot_loss[loss=0.06588, simple_loss=0.09016, pruned_loss=0.01227, audio_tagging_loss=0.008524, over 3043313.79 frames.
], batch size: 61, lr: 1.43e-03, grad_scale: 8.0 2023-11-27 04:47:28,086 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=3734006.6666666665, ans=0.2 2023-11-27 04:47:39,979 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.504e+01 8.711e+01 9.428e+01 1.006e+02 1.288e+02, threshold=1.886e+02, percent-clipped=0.0 2023-11-27 04:47:47,691 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3734140.0, ans=0.1 2023-11-27 04:47:57,502 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=3734206.6666666665, ans=0.0 2023-11-27 04:48:01,810 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=3734206.6666666665, ans=0.0 2023-11-27 04:48:09,246 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3734273.3333333335, ans=0.125 2023-11-27 04:48:11,189 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 560150 2023-11-27 04:48:13,362 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=3734340.0, ans=0.0 2023-11-27 04:48:14,240 INFO [train_asr.py:1235] (3/4) Epoch 47, batch 7050, loss[loss=0.06425, simple_loss=0.08361, pruned_loss=0.01135, audio_tagging_loss=0.01109, over 15548.00 frames. ], tot_loss[loss=0.06583, simple_loss=0.09018, pruned_loss=0.01218, audio_tagging_loss=0.008562, over 3044100.91 frames. ], batch size: 60, lr: 1.43e-03, grad_scale: 4.0 2023-11-27 04:48:18,797 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.max_abs, batch_count=3734340.0, ans=10.0 2023-11-27 04:48:38,493 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=3734473.3333333335, ans=0.125 2023-11-27 04:48:43,886 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3734473.3333333335, ans=0.125 2023-11-27 04:48:44,523 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.0.self_attn2.whiten, num_groups=1, num_channels=192, metric=10.91 vs. limit=22.5 2023-11-27 04:48:45,017 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=3734473.3333333335, ans=0.0 2023-11-27 04:49:00,765 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=3734606.6666666665, ans=0.0 2023-11-27 04:49:06,576 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 560200 2023-11-27 04:49:10,409 INFO [train_asr.py:1235] (3/4) Epoch 47, batch 7100, loss[loss=0.05947, simple_loss=0.07341, pruned_loss=0.01324, audio_tagging_loss=0.009528, over 14530.00 frames. ], tot_loss[loss=0.06611, simple_loss=0.09043, pruned_loss=0.01225, audio_tagging_loss=0.008646, over 3047687.10 frames. ], batch size: 56, lr: 1.43e-03, grad_scale: 8.0 2023-11-27 04:49:21,001 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=7.27 vs. limit=15.0 2023-11-27 04:49:26,358 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=7.34 vs. 
limit=15.0 2023-11-27 04:49:29,431 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=19.06 vs. limit=22.5 2023-11-27 04:49:32,040 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 8.100e+01 9.003e+01 9.804e+01 1.066e+02 3.214e+02, threshold=1.961e+02, percent-clipped=1.0 2023-11-27 04:50:02,371 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 560250 2023-11-27 04:50:02,594 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3734940.0, ans=0.1 2023-11-27 04:50:04,635 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3735006.6666666665, ans=0.125 2023-11-27 04:50:05,481 INFO [train_asr.py:1235] (3/4) Epoch 47, batch 7150, loss[loss=0.07805, simple_loss=0.1057, pruned_loss=0.01669, audio_tagging_loss=0.008509, over 15638.00 frames. ], tot_loss[loss=0.06582, simple_loss=0.08999, pruned_loss=0.01212, audio_tagging_loss=0.008704, over 3044559.06 frames. ], batch size: 55, lr: 1.43e-03, grad_scale: 8.0 2023-11-27 04:50:13,163 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-27 04:50:21,548 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=3735073.3333333335, ans=0.125 2023-11-27 04:50:56,425 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-27 04:50:57,371 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 560300 2023-11-27 04:51:00,452 INFO [train_asr.py:1235] (3/4) Epoch 47, batch 7200, loss[loss=0.06071, simple_loss=0.0757, pruned_loss=0.01304, audio_tagging_loss=0.009814, over 14747.00 frames. ], tot_loss[loss=0.06542, simple_loss=0.08959, pruned_loss=0.01188, audio_tagging_loss=0.008748, over 3043590.16 frames. ], batch size: 58, lr: 1.43e-03, grad_scale: 16.0 2023-11-27 04:51:05,936 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3735340.0, ans=0.0 2023-11-27 04:51:23,662 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.638e+01 8.928e+01 9.518e+01 1.020e+02 1.389e+02, threshold=1.904e+02, percent-clipped=0.0 2023-11-27 04:51:24,379 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=12.80 vs. limit=15.0 2023-11-27 04:51:24,921 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=3735473.3333333335, ans=0.0 2023-11-27 04:51:52,248 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 560350 2023-11-27 04:51:55,157 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=3735673.3333333335, ans=0.125 2023-11-27 04:51:55,942 INFO [train_asr.py:1235] (3/4) Epoch 47, batch 7250, loss[loss=0.05608, simple_loss=0.07569, pruned_loss=0.009243, audio_tagging_loss=0.008995, over 15127.00 frames. ], tot_loss[loss=0.06521, simple_loss=0.08945, pruned_loss=0.01176, audio_tagging_loss=0.008726, over 3045490.79 frames. 
], batch size: 56, lr: 1.43e-03, grad_scale: 16.0 2023-11-27 04:51:56,124 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=3735673.3333333335, ans=0.2 2023-11-27 04:51:56,268 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=3735673.3333333335, ans=0.2 2023-11-27 04:52:05,237 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=3735673.3333333335, ans=0.0 2023-11-27 04:52:08,338 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=3735740.0, ans=0.0 2023-11-27 04:52:11,771 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=3.33 vs. limit=15.0 2023-11-27 04:52:29,710 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3735873.3333333335, ans=0.1 2023-11-27 04:52:36,821 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=3735873.3333333335, ans=0.125 2023-11-27 04:52:39,782 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=4.21 vs. limit=15.0 2023-11-27 04:52:45,229 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.27 vs. limit=6.0 2023-11-27 04:52:48,920 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 560400 2023-11-27 04:52:52,312 INFO [train_asr.py:1235] (3/4) Epoch 47, batch 7300, loss[loss=0.09251, simple_loss=0.1379, pruned_loss=0.01828, audio_tagging_loss=0.005298, over 14478.00 frames. ], tot_loss[loss=0.06538, simple_loss=0.0897, pruned_loss=0.01187, audio_tagging_loss=0.008655, over 3047048.26 frames. 
], batch size: 54, lr: 1.43e-03, grad_scale: 16.0 2023-11-27 04:52:53,660 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=3736006.6666666665, ans=0.125 2023-11-27 04:53:13,372 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.632e+01 8.819e+01 9.611e+01 1.034e+02 1.337e+02, threshold=1.922e+02, percent-clipped=0.0 2023-11-27 04:53:23,266 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=3736140.0, ans=0.125 2023-11-27 04:53:28,524 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3736206.6666666665, ans=0.125 2023-11-27 04:53:32,739 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.min_abs, batch_count=3736206.6666666665, ans=0.5 2023-11-27 04:53:34,807 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3736206.6666666665, ans=0.125 2023-11-27 04:53:35,824 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=3736273.3333333335, ans=0.2 2023-11-27 04:53:41,037 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=3736273.3333333335, ans=0.125 2023-11-27 04:53:44,152 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 560450 2023-11-27 04:53:47,258 INFO [train_asr.py:1235] (3/4) Epoch 47, batch 7350, loss[loss=0.05864, simple_loss=0.08672, pruned_loss=0.006945, audio_tagging_loss=0.008332, over 15088.00 frames. ], tot_loss[loss=0.06573, simple_loss=0.09055, pruned_loss=0.01196, audio_tagging_loss=0.0085, over 3042690.16 frames. ], batch size: 55, lr: 1.43e-03, grad_scale: 16.0 2023-11-27 04:54:11,790 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3736473.3333333335, ans=0.125 2023-11-27 04:54:27,452 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3736540.0, ans=0.125 2023-11-27 04:54:38,969 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 560500 2023-11-27 04:54:39,089 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3736606.6666666665, ans=0.125 2023-11-27 04:54:39,198 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=3736606.6666666665, ans=0.2 2023-11-27 04:54:42,037 INFO [train_asr.py:1235] (3/4) Epoch 47, batch 7400, loss[loss=0.06792, simple_loss=0.09083, pruned_loss=0.01407, audio_tagging_loss=0.008432, over 17315.00 frames. ], tot_loss[loss=0.06536, simple_loss=0.08977, pruned_loss=0.01206, audio_tagging_loss=0.008421, over 3039110.38 frames. 
], batch size: 66, lr: 1.43e-03, grad_scale: 16.0 2023-11-27 04:54:47,610 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=3736673.3333333335, ans=0.125 2023-11-27 04:54:58,305 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=3736740.0, ans=0.2 2023-11-27 04:55:02,339 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=3736740.0, ans=0.2 2023-11-27 04:55:03,888 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.out_whiten.whitening_limit, batch_count=3736740.0, ans=15.0 2023-11-27 04:55:05,349 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.674e+01 8.988e+01 9.437e+01 1.029e+02 2.461e+02, threshold=1.887e+02, percent-clipped=1.0 2023-11-27 04:55:05,664 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=3736806.6666666665, ans=0.125 2023-11-27 04:55:21,688 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=3736873.3333333335, ans=0.0 2023-11-27 04:55:35,712 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 560550 2023-11-27 04:55:39,273 INFO [train_asr.py:1235] (3/4) Epoch 47, batch 7450, loss[loss=0.07221, simple_loss=0.1109, pruned_loss=0.01312, audio_tagging_loss=0.003629, over 14308.00 frames. ], tot_loss[loss=0.06473, simple_loss=0.08905, pruned_loss=0.0118, audio_tagging_loss=0.008404, over 3039677.63 frames. ], batch size: 52, lr: 1.43e-03, grad_scale: 16.0 2023-11-27 04:55:39,562 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=3737006.6666666665, ans=0.0 2023-11-27 04:55:52,169 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=3737073.3333333335, ans=0.2 2023-11-27 04:55:55,747 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=12.24 vs. limit=15.0 2023-11-27 04:55:58,530 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.min_positive, batch_count=3737073.3333333335, ans=0.05 2023-11-27 04:56:30,297 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3737273.3333333335, ans=0.0 2023-11-27 04:56:31,235 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 560600 2023-11-27 04:56:34,660 INFO [train_asr.py:1235] (3/4) Epoch 47, batch 7500, loss[loss=0.07359, simple_loss=0.101, pruned_loss=0.01376, audio_tagging_loss=0.009337, over 14872.00 frames. ], tot_loss[loss=0.06445, simple_loss=0.08843, pruned_loss=0.01177, audio_tagging_loss=0.008464, over 3047281.24 frames. 
], batch size: 56, lr: 1.43e-03, grad_scale: 16.0 2023-11-27 04:56:56,841 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.600e+01 8.930e+01 9.677e+01 1.022e+02 1.348e+02, threshold=1.935e+02, percent-clipped=0.0 2023-11-27 04:57:17,268 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3737540.0, ans=0.1 2023-11-27 04:57:26,579 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 560650 2023-11-27 04:57:28,810 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3737673.3333333335, ans=0.125 2023-11-27 04:57:29,691 INFO [train_asr.py:1235] (3/4) Epoch 47, batch 7550, loss[loss=0.05563, simple_loss=0.08034, pruned_loss=0.00677, audio_tagging_loss=0.008691, over 15547.00 frames. ], tot_loss[loss=0.06496, simple_loss=0.08899, pruned_loss=0.01199, audio_tagging_loss=0.008478, over 3041006.42 frames. ], batch size: 60, lr: 1.43e-03, grad_scale: 16.0 2023-11-27 04:57:44,690 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=8.31 vs. limit=15.0 2023-11-27 04:57:45,313 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=3737740.0, ans=0.0 2023-11-27 04:58:03,018 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=10.80 vs. limit=15.0 2023-11-27 04:58:04,899 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3737873.3333333335, ans=0.0 2023-11-27 04:58:05,826 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=3737873.3333333335, ans=0.2 2023-11-27 04:58:15,960 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=3737940.0, ans=0.0 2023-11-27 04:58:20,794 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3737940.0, ans=0.1 2023-11-27 04:58:23,199 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 560700 2023-11-27 04:58:26,307 INFO [train_asr.py:1235] (3/4) Epoch 47, batch 7600, loss[loss=0.06844, simple_loss=0.09617, pruned_loss=0.01108, audio_tagging_loss=0.009282, over 15527.00 frames. ], tot_loss[loss=0.06442, simple_loss=0.08853, pruned_loss=0.01166, audio_tagging_loss=0.00849, over 3044360.96 frames. ], batch size: 58, lr: 1.43e-03, grad_scale: 32.0 2023-11-27 04:58:28,068 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=6.44 vs. limit=10.0 2023-11-27 04:58:38,735 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3738073.3333333335, ans=0.1 2023-11-27 04:58:48,031 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.645e+01 8.653e+01 9.351e+01 1.003e+02 1.286e+02, threshold=1.870e+02, percent-clipped=0.0 2023-11-27 04:58:59,563 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=7.44 vs. 
limit=15.0 2023-11-27 04:59:13,539 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=5.85 vs. limit=15.0 2023-11-27 04:59:16,760 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=3738273.3333333335, ans=0.035 2023-11-27 04:59:16,817 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=3738273.3333333335, ans=0.2 2023-11-27 04:59:18,823 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 560750 2023-11-27 04:59:22,043 INFO [train_asr.py:1235] (3/4) Epoch 47, batch 7650, loss[loss=0.06218, simple_loss=0.08524, pruned_loss=0.01009, audio_tagging_loss=0.009466, over 15292.00 frames. ], tot_loss[loss=0.06442, simple_loss=0.08863, pruned_loss=0.01158, audio_tagging_loss=0.008528, over 3037763.52 frames. ], batch size: 58, lr: 1.43e-03, grad_scale: 32.0 2023-11-27 04:59:26,352 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=3738340.0, ans=0.125 2023-11-27 04:59:30,688 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3738340.0, ans=0.125 2023-11-27 04:59:40,706 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3738406.6666666665, ans=0.125 2023-11-27 04:59:41,965 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=14.04 vs. limit=15.0 2023-11-27 05:00:13,759 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 560800 2023-11-27 05:00:17,103 INFO [train_asr.py:1235] (3/4) Epoch 47, batch 7700, loss[loss=0.07713, simple_loss=0.1017, pruned_loss=0.01738, audio_tagging_loss=0.008901, over 14929.00 frames. ], tot_loss[loss=0.06479, simple_loss=0.08917, pruned_loss=0.01176, audio_tagging_loss=0.008445, over 3037477.41 frames. ], batch size: 57, lr: 1.43e-03, grad_scale: 32.0 2023-11-27 05:00:18,413 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3738673.3333333335, ans=0.125 2023-11-27 05:00:26,953 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.min_positive, batch_count=3738673.3333333335, ans=0.025 2023-11-27 05:00:40,057 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.670e+01 9.124e+01 9.752e+01 1.036e+02 1.277e+02, threshold=1.950e+02, percent-clipped=0.0 2023-11-27 05:00:56,541 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3738873.3333333335, ans=0.125 2023-11-27 05:01:09,976 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 560850 2023-11-27 05:01:13,597 INFO [train_asr.py:1235] (3/4) Epoch 47, batch 7750, loss[loss=0.07674, simple_loss=0.1025, pruned_loss=0.01468, audio_tagging_loss=0.01082, over 15537.00 frames. ], tot_loss[loss=0.06488, simple_loss=0.08907, pruned_loss=0.01183, audio_tagging_loss=0.008514, over 3037317.31 frames. 
], batch size: 56, lr: 1.43e-03, grad_scale: 32.0 2023-11-27 05:01:28,185 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=3739073.3333333335, ans=0.09899494936611666 2023-11-27 05:01:35,587 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=3739140.0, ans=0.0 2023-11-27 05:01:38,697 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=3739140.0, ans=0.125 2023-11-27 05:02:00,228 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=3739273.3333333335, ans=0.125 2023-11-27 05:02:05,348 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 560900 2023-11-27 05:02:08,480 INFO [train_asr.py:1235] (3/4) Epoch 47, batch 7800, loss[loss=0.0893, simple_loss=0.121, pruned_loss=0.02143, audio_tagging_loss=0.007346, over 15114.00 frames. ], tot_loss[loss=0.06512, simple_loss=0.08931, pruned_loss=0.01193, audio_tagging_loss=0.008536, over 3038306.88 frames. ], batch size: 54, lr: 1.43e-03, grad_scale: 16.0 2023-11-27 05:02:18,128 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=10.76 vs. limit=15.0 2023-11-27 05:02:30,359 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3739473.3333333335, ans=0.1 2023-11-27 05:02:31,128 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.744e+01 8.958e+01 9.629e+01 1.040e+02 1.238e+02, threshold=1.926e+02, percent-clipped=0.0 2023-11-27 05:02:45,371 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=3739540.0, ans=0.125 2023-11-27 05:02:47,362 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=3739540.0, ans=0.0 2023-11-27 05:03:00,451 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 560950 2023-11-27 05:03:02,765 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=3739673.3333333335, ans=0.0 2023-11-27 05:03:03,538 INFO [train_asr.py:1235] (3/4) Epoch 47, batch 7850, loss[loss=0.06454, simple_loss=0.08714, pruned_loss=0.01209, audio_tagging_loss=0.008878, over 15857.00 frames. ], tot_loss[loss=0.06574, simple_loss=0.09009, pruned_loss=0.01209, audio_tagging_loss=0.008606, over 3042004.69 frames. ], batch size: 59, lr: 1.43e-03, grad_scale: 16.0 2023-11-27 05:03:21,235 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.max_positive, batch_count=3739740.0, ans=0.95 2023-11-27 05:03:43,973 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=3739873.3333333335, ans=0.125 2023-11-27 05:03:50,468 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=3739940.0, ans=0.2 2023-11-27 05:03:51,698 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=4.43 vs. 
limit=15.0 2023-11-27 05:03:56,074 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 561000 2023-11-27 05:03:59,948 INFO [train_asr.py:1235] (3/4) Epoch 47, batch 7900, loss[loss=0.04436, simple_loss=0.05936, pruned_loss=0.005464, audio_tagging_loss=0.009218, over 16970.00 frames. ], tot_loss[loss=0.06565, simple_loss=0.08968, pruned_loss=0.0121, audio_tagging_loss=0.008712, over 3046034.35 frames. ], batch size: 65, lr: 1.43e-03, grad_scale: 16.0 2023-11-27 05:04:22,305 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3740140.0, ans=0.125 2023-11-27 05:04:23,045 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.984e+01 9.150e+01 9.892e+01 1.047e+02 1.288e+02, threshold=1.978e+02, percent-clipped=0.0 2023-11-27 05:04:25,417 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=3740140.0, ans=0.125 2023-11-27 05:04:36,998 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3740206.6666666665, ans=0.1 2023-11-27 05:04:43,428 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3740273.3333333335, ans=0.1 2023-11-27 05:04:47,441 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=13.34 vs. limit=15.0 2023-11-27 05:04:48,220 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=3740273.3333333335, ans=0.0 2023-11-27 05:04:52,370 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 561050 2023-11-27 05:04:55,421 INFO [train_asr.py:1235] (3/4) Epoch 47, batch 7950, loss[loss=0.05719, simple_loss=0.07716, pruned_loss=0.00857, audio_tagging_loss=0.01005, over 15469.00 frames. ], tot_loss[loss=0.0655, simple_loss=0.08945, pruned_loss=0.01197, audio_tagging_loss=0.008799, over 3047137.58 frames. ], batch size: 58, lr: 1.43e-03, grad_scale: 16.0 2023-11-27 05:05:01,990 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-27 05:05:04,101 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=3740340.0, ans=0.0 2023-11-27 05:05:07,867 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=3740406.6666666665, ans=0.07 2023-11-27 05:05:08,745 WARNING [train_asr.py:1481] (3/4) Exclude cut with ID unbalanced/uQjH4tNUZ_g_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. 
Number of tokens: 24 2023-11-27 05:05:08,988 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=3740406.6666666665, ans=0.2 2023-11-27 05:05:30,622 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3740540.0, ans=0.1 2023-11-27 05:05:47,384 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 561100 2023-11-27 05:05:51,026 INFO [train_asr.py:1235] (3/4) Epoch 47, batch 8000, loss[loss=0.06295, simple_loss=0.08491, pruned_loss=0.01168, audio_tagging_loss=0.008813, over 14904.00 frames. ], tot_loss[loss=0.06546, simple_loss=0.08933, pruned_loss=0.01193, audio_tagging_loss=0.008864, over 3049549.10 frames. ], batch size: 57, lr: 1.43e-03, grad_scale: 32.0 2023-11-27 05:06:14,627 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 8.003e+01 8.945e+01 9.427e+01 1.017e+02 1.273e+02, threshold=1.885e+02, percent-clipped=0.0 2023-11-27 05:06:20,078 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3740806.6666666665, ans=0.125 2023-11-27 05:06:21,212 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=3740806.6666666665, ans=0.125 2023-11-27 05:06:32,357 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=3740873.3333333335, ans=0.0 2023-11-27 05:06:32,620 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=7.96 vs. limit=15.0 2023-11-27 05:06:38,600 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=3740940.0, ans=0.125 2023-11-27 05:06:39,206 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=8.55 vs. limit=15.0 2023-11-27 05:06:42,822 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 561150 2023-11-27 05:06:46,431 INFO [train_asr.py:1235] (3/4) Epoch 47, batch 8050, loss[loss=0.07253, simple_loss=0.1002, pruned_loss=0.01646, audio_tagging_loss=0.005964, over 16287.00 frames. ], tot_loss[loss=0.06604, simple_loss=0.09019, pruned_loss=0.01209, audio_tagging_loss=0.008849, over 3053090.17 frames. ], batch size: 58, lr: 1.43e-03, grad_scale: 32.0 2023-11-27 05:06:49,626 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=8.49 vs. 
limit=10.0 2023-11-27 05:06:55,058 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=3741006.6666666665, ans=0.04949747468305833 2023-11-27 05:07:01,487 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3741073.3333333335, ans=0.0 2023-11-27 05:07:31,580 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-11-27 05:07:32,689 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3741273.3333333335, ans=0.1 2023-11-27 05:07:39,358 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 561200 2023-11-27 05:07:42,686 INFO [train_asr.py:1235] (3/4) Epoch 47, batch 8100, loss[loss=0.07247, simple_loss=0.09771, pruned_loss=0.01339, audio_tagging_loss=0.01022, over 15609.00 frames. ], tot_loss[loss=0.0654, simple_loss=0.08926, pruned_loss=0.01188, audio_tagging_loss=0.008887, over 3049678.08 frames. ], batch size: 59, lr: 1.43e-03, grad_scale: 32.0 2023-11-27 05:07:55,523 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=3741406.6666666665, ans=0.125 2023-11-27 05:08:01,911 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.0.whiten, num_groups=1, num_channels=192, metric=3.76 vs. limit=12.0 2023-11-27 05:08:05,307 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.227e+01 8.986e+01 9.924e+01 1.066e+02 1.404e+02, threshold=1.985e+02, percent-clipped=0.0 2023-11-27 05:08:23,712 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.max_abs, batch_count=3741540.0, ans=10.0 2023-11-27 05:08:34,238 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 561250 2023-11-27 05:08:37,356 INFO [train_asr.py:1235] (3/4) Epoch 47, batch 8150, loss[loss=0.05432, simple_loss=0.07242, pruned_loss=0.009232, audio_tagging_loss=0.008875, over 14381.00 frames. ], tot_loss[loss=0.06484, simple_loss=0.08854, pruned_loss=0.01182, audio_tagging_loss=0.008753, over 3053749.36 frames. ], batch size: 56, lr: 1.43e-03, grad_scale: 32.0 2023-11-27 05:09:29,763 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 561300 2023-11-27 05:09:32,792 INFO [train_asr.py:1235] (3/4) Epoch 47, batch 8200, loss[loss=0.05902, simple_loss=0.08529, pruned_loss=0.009565, audio_tagging_loss=0.006811, over 15557.00 frames. ], tot_loss[loss=0.06487, simple_loss=0.08881, pruned_loss=0.01186, audio_tagging_loss=0.008604, over 3056772.56 frames. ], batch size: 58, lr: 1.43e-03, grad_scale: 16.0 2023-11-27 05:09:32,841 WARNING [train_asr.py:1481] (3/4) Exclude cut with ID unbalanced/8C7biyx9TQ4_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. 
Number of tokens: 24 2023-11-27 05:09:40,373 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=3742006.6666666665, ans=0.0 2023-11-27 05:09:44,120 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer_ff3.min_abs, batch_count=3742073.3333333335, ans=0.2 2023-11-27 05:09:45,106 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3742073.3333333335, ans=0.125 2023-11-27 05:09:51,435 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3742073.3333333335, ans=0.125 2023-11-27 05:09:57,531 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.908e+01 9.029e+01 9.641e+01 1.058e+02 1.267e+02, threshold=1.928e+02, percent-clipped=0.0 2023-11-27 05:10:21,422 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=3742273.3333333335, ans=0.0 2023-11-27 05:10:26,214 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 561350 2023-11-27 05:10:29,366 INFO [train_asr.py:1235] (3/4) Epoch 47, batch 8250, loss[loss=0.06176, simple_loss=0.08798, pruned_loss=0.009224, audio_tagging_loss=0.008551, over 14906.00 frames. ], tot_loss[loss=0.06499, simple_loss=0.089, pruned_loss=0.01189, audio_tagging_loss=0.008595, over 3056686.02 frames. ], batch size: 56, lr: 1.43e-03, grad_scale: 16.0 2023-11-27 05:10:29,677 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=3742340.0, ans=0.2 2023-11-27 05:11:05,347 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=12.20 vs. limit=15.0 2023-11-27 05:11:21,048 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 561400 2023-11-27 05:11:24,439 INFO [train_asr.py:1235] (3/4) Epoch 47, batch 8300, loss[loss=0.07253, simple_loss=0.1049, pruned_loss=0.01092, audio_tagging_loss=0.009156, over 14282.00 frames. ], tot_loss[loss=0.06522, simple_loss=0.0895, pruned_loss=0.01195, audio_tagging_loss=0.008521, over 3058864.10 frames. ], batch size: 54, lr: 1.43e-03, grad_scale: 16.0 2023-11-27 05:11:25,007 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=9.07 vs. limit=15.0 2023-11-27 05:11:41,718 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3742740.0, ans=0.125 2023-11-27 05:11:49,841 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.852e+01 8.979e+01 9.695e+01 1.038e+02 1.326e+02, threshold=1.939e+02, percent-clipped=0.0 2023-11-27 05:12:16,295 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 561450 2023-11-27 05:12:19,463 INFO [train_asr.py:1235] (3/4) Epoch 47, batch 8350, loss[loss=0.04401, simple_loss=0.05667, pruned_loss=0.006911, audio_tagging_loss=0.008762, over 15997.00 frames. ], tot_loss[loss=0.06469, simple_loss=0.08892, pruned_loss=0.01178, audio_tagging_loss=0.008451, over 3049024.15 frames. 
], batch size: 61, lr: 1.43e-03, grad_scale: 16.0 2023-11-27 05:12:27,693 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=3743006.6666666665, ans=0.125 2023-11-27 05:12:36,750 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.min_positive, batch_count=3743073.3333333335, ans=0.05 2023-11-27 05:12:42,110 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=3743140.0, ans=0.0 2023-11-27 05:12:52,080 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=8.02 vs. limit=15.0 2023-11-27 05:12:52,507 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=3743206.6666666665, ans=0.125 2023-11-27 05:13:09,620 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=3743273.3333333335, ans=0.0 2023-11-27 05:13:13,216 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 561500 2023-11-27 05:13:16,320 INFO [train_asr.py:1235] (3/4) Epoch 47, batch 8400, loss[loss=0.06493, simple_loss=0.08342, pruned_loss=0.01573, audio_tagging_loss=0.007485, over 15313.00 frames. ], tot_loss[loss=0.06485, simple_loss=0.08887, pruned_loss=0.01193, audio_tagging_loss=0.008493, over 3047403.88 frames. ], batch size: 57, lr: 1.43e-03, grad_scale: 32.0 2023-11-27 05:13:19,766 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=3743340.0, ans=0.0 2023-11-27 05:13:21,978 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=3743340.0, ans=0.2 2023-11-27 05:13:29,296 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3743406.6666666665, ans=0.1 2023-11-27 05:13:39,604 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.675e+01 8.778e+01 9.390e+01 9.920e+01 1.165e+02, threshold=1.878e+02, percent-clipped=0.0 2023-11-27 05:13:41,880 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3743473.3333333335, ans=0.0 2023-11-27 05:13:46,215 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=3743473.3333333335, ans=0.125 2023-11-27 05:14:05,345 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3743606.6666666665, ans=0.125 2023-11-27 05:14:08,327 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 561550 2023-11-27 05:14:11,419 INFO [train_asr.py:1235] (3/4) Epoch 47, batch 8450, loss[loss=0.06352, simple_loss=0.09236, pruned_loss=0.01075, audio_tagging_loss=0.006587, over 15423.00 frames. ], tot_loss[loss=0.06483, simple_loss=0.08877, pruned_loss=0.01187, audio_tagging_loss=0.008567, over 3049331.79 frames. 
], batch size: 56, lr: 1.43e-03, grad_scale: 32.0 2023-11-27 05:14:17,970 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=3743673.3333333335, ans=0.125 2023-11-27 05:14:26,359 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3743740.0, ans=0.125 2023-11-27 05:14:31,191 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3743740.0, ans=0.125 2023-11-27 05:15:02,057 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=3743940.0, ans=0.125 2023-11-27 05:15:02,157 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.min_abs, batch_count=3743940.0, ans=0.5 2023-11-27 05:15:03,055 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 561600 2023-11-27 05:15:06,448 INFO [train_asr.py:1235] (3/4) Epoch 47, batch 8500, loss[loss=0.06372, simple_loss=0.07706, pruned_loss=0.01397, audio_tagging_loss=0.01122, over 14618.00 frames. ], tot_loss[loss=0.06476, simple_loss=0.0887, pruned_loss=0.01179, audio_tagging_loss=0.008621, over 3046807.15 frames. ], batch size: 56, lr: 1.43e-03, grad_scale: 32.0 2023-11-27 05:15:23,288 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3744073.3333333335, ans=0.125 2023-11-27 05:15:26,449 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=3744073.3333333335, ans=0.2 2023-11-27 05:15:31,430 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 8.032e+01 9.244e+01 9.812e+01 1.039e+02 1.357e+02, threshold=1.962e+02, percent-clipped=0.0 2023-11-27 05:15:58,946 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 561650 2023-11-27 05:16:03,180 INFO [train_asr.py:1235] (3/4) Epoch 47, batch 8550, loss[loss=0.08498, simple_loss=0.12, pruned_loss=0.01976, audio_tagging_loss=0.005218, over 15187.00 frames. ], tot_loss[loss=0.06473, simple_loss=0.0887, pruned_loss=0.01176, audio_tagging_loss=0.008619, over 3054575.14 frames. ], batch size: 53, lr: 1.43e-03, grad_scale: 32.0 2023-11-27 05:16:20,356 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3744406.6666666665, ans=0.1 2023-11-27 05:16:35,053 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=3744540.0, ans=0.0 2023-11-27 05:16:38,856 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=3744540.0, ans=0.2 2023-11-27 05:16:54,671 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 561700 2023-11-27 05:16:55,951 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=3744606.6666666665, ans=0.125 2023-11-27 05:16:57,855 INFO [train_asr.py:1235] (3/4) Epoch 47, batch 8600, loss[loss=0.06033, simple_loss=0.08332, pruned_loss=0.00838, audio_tagging_loss=0.01029, over 14674.00 frames. ], tot_loss[loss=0.06408, simple_loss=0.08757, pruned_loss=0.01157, audio_tagging_loss=0.00872, over 3054005.97 frames. 
], batch size: 54, lr: 1.43e-03, grad_scale: 32.0 2023-11-27 05:17:07,812 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=10.26 vs. limit=15.0 2023-11-27 05:17:14,138 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=3744740.0, ans=0.2 2023-11-27 05:17:22,245 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.611e+01 8.987e+01 9.604e+01 1.025e+02 1.300e+02, threshold=1.921e+02, percent-clipped=0.0 2023-11-27 05:17:33,530 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=6.89 vs. limit=10.0 2023-11-27 05:17:38,643 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=3744873.3333333335, ans=0.2 2023-11-27 05:17:47,128 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=3744940.0, ans=0.0 2023-11-27 05:17:50,178 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 561750 2023-11-27 05:17:53,225 INFO [train_asr.py:1235] (3/4) Epoch 47, batch 8650, loss[loss=0.05785, simple_loss=0.08064, pruned_loss=0.007586, audio_tagging_loss=0.009941, over 15755.00 frames. ], tot_loss[loss=0.06504, simple_loss=0.08926, pruned_loss=0.01175, audio_tagging_loss=0.008654, over 3053577.30 frames. ], batch size: 63, lr: 1.43e-03, grad_scale: 16.0 2023-11-27 05:18:04,972 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=3745073.3333333335, ans=0.04949747468305833 2023-11-27 05:18:28,211 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3745206.6666666665, ans=0.1 2023-11-27 05:18:45,594 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 561800 2023-11-27 05:18:47,719 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=8.38 vs. limit=15.0 2023-11-27 05:18:50,082 INFO [train_asr.py:1235] (3/4) Epoch 47, batch 8700, loss[loss=0.08083, simple_loss=0.1118, pruned_loss=0.01623, audio_tagging_loss=0.008697, over 14373.00 frames. ], tot_loss[loss=0.06466, simple_loss=0.08833, pruned_loss=0.0117, audio_tagging_loss=0.008799, over 3062489.37 frames. ], batch size: 55, lr: 1.43e-03, grad_scale: 16.0 2023-11-27 05:19:02,422 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=3745406.6666666665, ans=0.2 2023-11-27 05:19:06,862 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=13.62 vs. 
limit=15.0 2023-11-27 05:19:14,251 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=3745473.3333333335, ans=0.0 2023-11-27 05:19:15,002 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.677e+01 9.080e+01 9.687e+01 1.041e+02 1.884e+02, threshold=1.937e+02, percent-clipped=0.0 2023-11-27 05:19:18,432 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=3745473.3333333335, ans=0.125 2023-11-27 05:19:19,421 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=3745473.3333333335, ans=0.0 2023-11-27 05:19:23,788 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=3745540.0, ans=0.0 2023-11-27 05:19:28,569 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=3745540.0, ans=0.2 2023-11-27 05:19:42,776 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 561850 2023-11-27 05:19:45,866 INFO [train_asr.py:1235] (3/4) Epoch 47, batch 8750, loss[loss=0.06445, simple_loss=0.08606, pruned_loss=0.01202, audio_tagging_loss=0.009408, over 16011.00 frames. ], tot_loss[loss=0.06491, simple_loss=0.08867, pruned_loss=0.0118, audio_tagging_loss=0.008772, over 3061036.76 frames. ], batch size: 60, lr: 1.43e-03, grad_scale: 16.0 2023-11-27 05:19:52,381 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=3745673.3333333335, ans=0.0 2023-11-27 05:20:04,991 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=3745740.0, ans=0.125 2023-11-27 05:20:24,486 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=3745873.3333333335, ans=0.0 2023-11-27 05:20:32,456 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=3745940.0, ans=0.2 2023-11-27 05:20:32,912 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=17.20 vs. limit=22.5 2023-11-27 05:20:37,695 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 561900 2023-11-27 05:20:38,915 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-27 05:20:39,006 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00 2023-11-27 05:20:40,935 INFO [train_asr.py:1235] (3/4) Epoch 47, batch 8800, loss[loss=0.1023, simple_loss=0.1368, pruned_loss=0.02534, audio_tagging_loss=0.008566, over 15402.00 frames. ], tot_loss[loss=0.06567, simple_loss=0.0896, pruned_loss=0.01205, audio_tagging_loss=0.00882, over 3060708.49 frames. 
], batch size: 55, lr: 1.43e-03, grad_scale: 32.0 2023-11-27 05:20:43,166 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3746006.6666666665, ans=0.1 2023-11-27 05:20:51,136 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=3746073.3333333335, ans=0.125 2023-11-27 05:21:06,199 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3746140.0, ans=0.125 2023-11-27 05:21:07,023 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.717e+01 9.208e+01 9.853e+01 1.073e+02 1.310e+02, threshold=1.971e+02, percent-clipped=0.0 2023-11-27 05:21:12,342 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=6.14 vs. limit=10.0 2023-11-27 05:21:31,995 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=3746273.3333333335, ans=0.0 2023-11-27 05:21:33,500 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 561950 2023-11-27 05:21:37,189 INFO [train_asr.py:1235] (3/4) Epoch 47, batch 8850, loss[loss=0.08717, simple_loss=0.1266, pruned_loss=0.01789, audio_tagging_loss=0.005985, over 15748.00 frames. ], tot_loss[loss=0.06515, simple_loss=0.08892, pruned_loss=0.01187, audio_tagging_loss=0.008824, over 3057809.47 frames. ], batch size: 53, lr: 1.43e-03, grad_scale: 32.0 2023-11-27 05:21:41,194 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3746340.0, ans=0.0 2023-11-27 05:21:47,277 WARNING [train_asr.py:1481] (3/4) Exclude cut with ID unbalanced/1Dq7QH61iXQ_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-27 05:21:47,526 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.max_positive, batch_count=3746406.6666666665, ans=0.95 2023-11-27 05:22:00,684 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.min_positive, batch_count=3746473.3333333335, ans=0.025 2023-11-27 05:22:29,708 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 562000 2023-11-27 05:22:33,128 INFO [train_asr.py:1235] (3/4) Epoch 47, batch 8900, loss[loss=0.07228, simple_loss=0.0891, pruned_loss=0.01832, audio_tagging_loss=0.00941, over 15601.00 frames. ], tot_loss[loss=0.06567, simple_loss=0.08952, pruned_loss=0.01216, audio_tagging_loss=0.008742, over 3062923.77 frames. 
], batch size: 59, lr: 1.43e-03, grad_scale: 16.0 2023-11-27 05:22:52,891 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3746740.0, ans=0.1 2023-11-27 05:22:59,602 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.698e+01 9.144e+01 9.662e+01 1.039e+02 1.595e+02, threshold=1.932e+02, percent-clipped=0.0 2023-11-27 05:23:10,578 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=3746873.3333333335, ans=0.125 2023-11-27 05:23:19,875 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=3746940.0, ans=0.0 2023-11-27 05:23:25,550 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 562050 2023-11-27 05:23:28,059 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=7.28 vs. limit=15.0 2023-11-27 05:23:28,620 INFO [train_asr.py:1235] (3/4) Epoch 47, batch 8950, loss[loss=0.05218, simple_loss=0.06727, pruned_loss=0.008253, audio_tagging_loss=0.01029, over 14628.00 frames. ], tot_loss[loss=0.06498, simple_loss=0.08862, pruned_loss=0.01196, audio_tagging_loss=0.008707, over 3058074.81 frames. ], batch size: 56, lr: 1.43e-03, grad_scale: 16.0 2023-11-27 05:23:38,441 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=3747073.3333333335, ans=0.0 2023-11-27 05:23:38,491 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=3747073.3333333335, ans=0.125 2023-11-27 05:23:44,527 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.min_positive, batch_count=3747073.3333333335, ans=0.025 2023-11-27 05:23:49,358 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=3747073.3333333335, ans=0.0 2023-11-27 05:23:55,817 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3747140.0, ans=0.125 2023-11-27 05:23:59,971 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=3747140.0, ans=0.125 2023-11-27 05:24:20,684 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 562100 2023-11-27 05:24:24,291 INFO [train_asr.py:1235] (3/4) Epoch 47, batch 9000, loss[loss=0.06708, simple_loss=0.09734, pruned_loss=0.01177, audio_tagging_loss=0.006636, over 15345.00 frames. ], tot_loss[loss=0.06606, simple_loss=0.09053, pruned_loss=0.01227, audio_tagging_loss=0.00852, over 3054452.88 frames. ], batch size: 57, lr: 1.43e-03, grad_scale: 16.0 2023-11-27 05:24:24,291 INFO [train_asr.py:1258] (3/4) Computing validation loss 2023-11-27 05:24:56,573 INFO [train_asr.py:1267] (3/4) Epoch 47, validation: loss=0.05848, simple_loss=0.05048, pruned_loss=0.005329, audio_tagging_loss=0.02791, over 4681554.00 frames. 2023-11-27 05:24:56,573 INFO [train_asr.py:1268] (3/4) Maximum memory allocated so far is 24894MB 2023-11-27 05:24:58,315 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.1.whiten, num_groups=1, num_channels=192, metric=3.48 vs. limit=12.0 2023-11-27 05:25:08,132 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=14.46 vs. 
limit=22.5 2023-11-27 05:25:13,018 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=3747406.6666666665, ans=0.125 2023-11-27 05:25:19,083 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3747473.3333333335, ans=0.125 2023-11-27 05:25:23,630 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.971e+01 9.071e+01 9.540e+01 1.018e+02 1.204e+02, threshold=1.908e+02, percent-clipped=0.0 2023-11-27 05:25:23,847 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3747473.3333333335, ans=0.125 2023-11-27 05:25:49,084 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 562150 2023-11-27 05:25:49,147 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=3747606.6666666665, ans=0.0 2023-11-27 05:25:52,137 INFO [train_asr.py:1235] (3/4) Epoch 47, batch 9050, loss[loss=0.05995, simple_loss=0.0885, pruned_loss=0.008139, audio_tagging_loss=0.007561, over 15333.00 frames. ], tot_loss[loss=0.06587, simple_loss=0.09023, pruned_loss=0.0122, audio_tagging_loss=0.008549, over 3053808.03 frames. ], batch size: 60, lr: 1.43e-03, grad_scale: 16.0 2023-11-27 05:26:00,541 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=4.69 vs. limit=15.0 2023-11-27 05:26:10,944 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3747740.0, ans=0.125 2023-11-27 05:26:12,144 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=8.60 vs. limit=15.0 2023-11-27 05:26:16,217 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=3747806.6666666665, ans=0.2 2023-11-27 05:26:17,252 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=3747806.6666666665, ans=0.2 2023-11-27 05:26:36,571 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=3747940.0, ans=0.125 2023-11-27 05:26:44,565 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 562200 2023-11-27 05:26:48,187 INFO [train_asr.py:1235] (3/4) Epoch 47, batch 9100, loss[loss=0.06487, simple_loss=0.09308, pruned_loss=0.0106, audio_tagging_loss=0.007725, over 15826.00 frames. ], tot_loss[loss=0.06549, simple_loss=0.08992, pruned_loss=0.01212, audio_tagging_loss=0.008415, over 3048611.41 frames. 
], batch size: 56, lr: 1.43e-03, grad_scale: 8.0 2023-11-27 05:27:06,978 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=3748073.3333333335, ans=0.0 2023-11-27 05:27:12,715 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=3748140.0, ans=0.125 2023-11-27 05:27:15,544 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.643e+01 9.080e+01 9.612e+01 1.021e+02 1.425e+02, threshold=1.922e+02, percent-clipped=0.0 2023-11-27 05:27:21,160 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=3748206.6666666665, ans=0.2 2023-11-27 05:27:40,447 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 562250 2023-11-27 05:27:41,654 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3748273.3333333335, ans=0.125 2023-11-27 05:27:43,542 INFO [train_asr.py:1235] (3/4) Epoch 47, batch 9150, loss[loss=0.05154, simple_loss=0.07034, pruned_loss=0.008306, audio_tagging_loss=0.00806, over 15420.00 frames. ], tot_loss[loss=0.06552, simple_loss=0.09011, pruned_loss=0.01213, audio_tagging_loss=0.008335, over 3055705.95 frames. ], batch size: 59, lr: 1.43e-03, grad_scale: 8.0 2023-11-27 05:27:52,123 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=3748340.0, ans=0.125 2023-11-27 05:28:35,589 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 562300 2023-11-27 05:28:35,714 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3748606.6666666665, ans=0.125 2023-11-27 05:28:39,267 INFO [train_asr.py:1235] (3/4) Epoch 47, batch 9200, loss[loss=0.05546, simple_loss=0.07745, pruned_loss=0.009137, audio_tagging_loss=0.007593, over 16277.00 frames. ], tot_loss[loss=0.06558, simple_loss=0.09017, pruned_loss=0.01211, audio_tagging_loss=0.008383, over 3055903.88 frames. ], batch size: 62, lr: 1.43e-03, grad_scale: 16.0 2023-11-27 05:28:42,478 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=3748673.3333333335, ans=0.125 2023-11-27 05:28:51,579 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=3748740.0, ans=0.0 2023-11-27 05:29:07,169 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.322e+01 9.097e+01 9.873e+01 1.057e+02 1.295e+02, threshold=1.975e+02, percent-clipped=0.0 2023-11-27 05:29:17,982 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=3748873.3333333335, ans=0.0 2023-11-27 05:29:31,601 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 562350 2023-11-27 05:29:35,188 INFO [train_asr.py:1235] (3/4) Epoch 47, batch 9250, loss[loss=0.0553, simple_loss=0.07169, pruned_loss=0.01111, audio_tagging_loss=0.008346, over 15161.00 frames. ], tot_loss[loss=0.06518, simple_loss=0.08961, pruned_loss=0.01197, audio_tagging_loss=0.0084, over 3058977.26 frames. 
], batch size: 58, lr: 1.43e-03, grad_scale: 16.0 2023-11-27 05:29:37,540 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=3749006.6666666665, ans=0.125 2023-11-27 05:29:40,211 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=3749006.6666666665, ans=0.0 2023-11-27 05:30:00,266 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3749140.0, ans=0.1 2023-11-27 05:30:00,272 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=3749140.0, ans=0.2 2023-11-27 05:30:13,975 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=3749206.6666666665, ans=0.0 2023-11-27 05:30:16,140 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3749206.6666666665, ans=0.1 2023-11-27 05:30:17,757 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=3749206.6666666665, ans=0.0 2023-11-27 05:30:27,458 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 562400 2023-11-27 05:30:30,780 INFO [train_asr.py:1235] (3/4) Epoch 47, batch 9300, loss[loss=0.06446, simple_loss=0.09074, pruned_loss=0.01091, audio_tagging_loss=0.008177, over 15129.00 frames. ], tot_loss[loss=0.06509, simple_loss=0.0895, pruned_loss=0.01186, audio_tagging_loss=0.008483, over 3058941.46 frames. ], batch size: 55, lr: 1.43e-03, grad_scale: 16.0 2023-11-27 05:30:35,688 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=5.22 vs. limit=15.0 2023-11-27 05:30:40,806 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=17.25 vs. limit=22.5 2023-11-27 05:30:57,870 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=14.22 vs. limit=22.5 2023-11-27 05:30:58,394 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.309e+01 9.060e+01 9.666e+01 1.045e+02 1.304e+02, threshold=1.933e+02, percent-clipped=0.0 2023-11-27 05:31:02,587 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.whiten, num_groups=1, num_channels=512, metric=8.44 vs. limit=12.0 2023-11-27 05:31:22,693 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 562450 2023-11-27 05:31:25,815 INFO [train_asr.py:1235] (3/4) Epoch 47, batch 9350, loss[loss=0.06658, simple_loss=0.08646, pruned_loss=0.01339, audio_tagging_loss=0.009965, over 13353.00 frames. ], tot_loss[loss=0.06506, simple_loss=0.08916, pruned_loss=0.01194, audio_tagging_loss=0.008549, over 3056164.57 frames. 
], batch size: 54, lr: 1.43e-03, grad_scale: 16.0 2023-11-27 05:31:36,134 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3749740.0, ans=0.1 2023-11-27 05:31:50,364 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=3749806.6666666665, ans=0.2 2023-11-27 05:32:02,548 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=10.80 vs. limit=15.0 2023-11-27 05:32:03,079 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=3749873.3333333335, ans=0.2 2023-11-27 05:32:05,250 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3749873.3333333335, ans=0.125 2023-11-27 05:32:18,291 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 562500 2023-11-27 05:32:21,990 INFO [train_asr.py:1235] (3/4) Epoch 47, batch 9400, loss[loss=0.06776, simple_loss=0.08901, pruned_loss=0.01502, audio_tagging_loss=0.008235, over 14654.00 frames. ], tot_loss[loss=0.06571, simple_loss=0.08988, pruned_loss=0.01214, audio_tagging_loss=0.00863, over 3051462.93 frames. ], batch size: 56, lr: 1.43e-03, grad_scale: 8.0 2023-11-27 05:32:24,286 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=3750006.6666666665, ans=0.0 2023-11-27 05:32:26,976 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=3750006.6666666665, ans=0.2 2023-11-27 05:32:33,918 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=3750073.3333333335, ans=0.2 2023-11-27 05:32:37,068 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3750073.3333333335, ans=0.125 2023-11-27 05:32:44,440 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=3750140.0, ans=0.125 2023-11-27 05:32:50,574 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.875e+01 9.326e+01 9.875e+01 1.073e+02 1.247e+02, threshold=1.975e+02, percent-clipped=0.0 2023-11-27 05:32:54,255 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=17.05 vs. limit=22.5 2023-11-27 05:33:00,948 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=5.12 vs. limit=12.0 2023-11-27 05:33:14,802 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 562550 2023-11-27 05:33:16,810 WARNING [train_asr.py:1481] (3/4) Exclude cut with ID unbalanced/jmSuJWEIizA_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-27 05:33:17,842 INFO [train_asr.py:1235] (3/4) Epoch 47, batch 9450, loss[loss=0.07114, simple_loss=0.09124, pruned_loss=0.01738, audio_tagging_loss=0.008141, over 13976.00 frames. 
], tot_loss[loss=0.06558, simple_loss=0.08962, pruned_loss=0.0121, audio_tagging_loss=0.008675, over 3049862.36 frames. ], batch size: 53, lr: 1.43e-03, grad_scale: 8.0 2023-11-27 05:34:03,007 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=3750606.6666666665, ans=0.0 2023-11-27 05:34:10,313 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 562600 2023-11-27 05:34:10,623 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=12.60 vs. limit=22.5 2023-11-27 05:34:13,600 INFO [train_asr.py:1235] (3/4) Epoch 47, batch 9500, loss[loss=0.05017, simple_loss=0.06406, pruned_loss=0.01067, audio_tagging_loss=0.007467, over 14439.00 frames. ], tot_loss[loss=0.06585, simple_loss=0.09005, pruned_loss=0.01217, audio_tagging_loss=0.008651, over 3044820.83 frames. ], batch size: 55, lr: 1.43e-03, grad_scale: 8.0 2023-11-27 05:34:19,173 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3750673.3333333335, ans=0.125 2023-11-27 05:34:19,570 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=6.30 vs. limit=15.0 2023-11-27 05:34:42,866 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.626e+01 8.981e+01 9.517e+01 1.036e+02 1.547e+02, threshold=1.903e+02, percent-clipped=0.0 2023-11-27 05:34:45,589 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=6.46 vs. limit=15.0 2023-11-27 05:35:00,261 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=3750940.0, ans=0.0 2023-11-27 05:35:05,379 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 562650 2023-11-27 05:35:08,596 INFO [train_asr.py:1235] (3/4) Epoch 47, batch 9550, loss[loss=0.07253, simple_loss=0.1035, pruned_loss=0.01418, audio_tagging_loss=0.006622, over 15563.00 frames. ], tot_loss[loss=0.06588, simple_loss=0.09, pruned_loss=0.01208, audio_tagging_loss=0.0088, over 3050238.20 frames. ], batch size: 57, lr: 1.43e-03, grad_scale: 8.0 2023-11-27 05:35:13,463 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=4.11 vs. limit=15.0 2023-11-27 05:35:15,277 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=3751006.6666666665, ans=0.0 2023-11-27 05:36:02,671 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 562700 2023-11-27 05:36:05,780 INFO [train_asr.py:1235] (3/4) Epoch 47, batch 9600, loss[loss=0.07012, simple_loss=0.09997, pruned_loss=0.01271, audio_tagging_loss=0.007434, over 15246.00 frames. ], tot_loss[loss=0.06591, simple_loss=0.09006, pruned_loss=0.01198, audio_tagging_loss=0.008897, over 3047403.69 frames. ], batch size: 55, lr: 1.43e-03, grad_scale: 16.0 2023-11-27 05:36:10,496 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=7.58 vs. 
limit=15.0 2023-11-27 05:36:13,422 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=3751340.0, ans=0.1 2023-11-27 05:36:33,690 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.325e+01 9.083e+01 9.744e+01 1.053e+02 1.282e+02, threshold=1.949e+02, percent-clipped=0.0 2023-11-27 05:36:48,759 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=9.02 vs. limit=15.0 2023-11-27 05:36:57,616 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 562750 2023-11-27 05:37:00,758 INFO [train_asr.py:1235] (3/4) Epoch 47, batch 9650, loss[loss=0.06395, simple_loss=0.08891, pruned_loss=0.009226, audio_tagging_loss=0.01027, over 16340.00 frames. ], tot_loss[loss=0.06564, simple_loss=0.08946, pruned_loss=0.01194, audio_tagging_loss=0.008969, over 3043742.38 frames. ], batch size: 62, lr: 1.43e-03, grad_scale: 16.0 2023-11-27 05:37:04,458 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=11.45 vs. limit=15.0 2023-11-27 05:37:17,314 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=3751740.0, ans=0.125 2023-11-27 05:37:19,420 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3751740.0, ans=0.125 2023-11-27 05:37:26,339 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=3751806.6666666665, ans=0.0 2023-11-27 05:37:29,577 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.max_abs, batch_count=3751806.6666666665, ans=10.0 2023-11-27 05:37:31,024 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=18.44 vs. limit=22.5 2023-11-27 05:37:31,748 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=3751806.6666666665, ans=0.0 2023-11-27 05:37:34,845 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=3751873.3333333335, ans=0.0 2023-11-27 05:37:49,931 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-27 05:37:51,832 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3751940.0, ans=0.0 2023-11-27 05:37:52,829 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 562800 2023-11-27 05:37:56,163 INFO [train_asr.py:1235] (3/4) Epoch 47, batch 9700, loss[loss=0.06332, simple_loss=0.0853, pruned_loss=0.01286, audio_tagging_loss=0.007813, over 14305.00 frames. ], tot_loss[loss=0.06551, simple_loss=0.08928, pruned_loss=0.01207, audio_tagging_loss=0.008801, over 3041386.64 frames. 
], batch size: 53, lr: 1.43e-03, grad_scale: 16.0 2023-11-27 05:38:22,277 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=3752140.0, ans=0.125 2023-11-27 05:38:25,189 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.533e+01 8.968e+01 9.559e+01 1.024e+02 1.547e+02, threshold=1.912e+02, percent-clipped=0.0 2023-11-27 05:38:32,056 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=512, metric=19.09 vs. limit=22.5 2023-11-27 05:38:42,868 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3752273.3333333335, ans=0.1 2023-11-27 05:38:45,978 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=3752273.3333333335, ans=0.07 2023-11-27 05:38:48,477 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 562850 2023-11-27 05:38:52,138 INFO [train_asr.py:1235] (3/4) Epoch 47, batch 9750, loss[loss=0.05767, simple_loss=0.07827, pruned_loss=0.008668, audio_tagging_loss=0.009868, over 14700.00 frames. ], tot_loss[loss=0.06517, simple_loss=0.0892, pruned_loss=0.01196, audio_tagging_loss=0.008621, over 3046396.75 frames. ], batch size: 56, lr: 1.43e-03, grad_scale: 16.0 2023-11-27 05:39:20,833 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=3752473.3333333335, ans=0.0 2023-11-27 05:39:36,980 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=4.17 vs. limit=15.0 2023-11-27 05:39:44,023 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 562900 2023-11-27 05:39:47,147 INFO [train_asr.py:1235] (3/4) Epoch 47, batch 9800, loss[loss=0.05429, simple_loss=0.07757, pruned_loss=0.007339, audio_tagging_loss=0.008165, over 16217.00 frames. ], tot_loss[loss=0.06504, simple_loss=0.08911, pruned_loss=0.01194, audio_tagging_loss=0.008545, over 3045026.98 frames. ], batch size: 61, lr: 1.43e-03, grad_scale: 8.0 2023-11-27 05:39:57,975 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3752740.0, ans=0.125 2023-11-27 05:39:59,422 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=6.75 vs. limit=15.0 2023-11-27 05:40:02,071 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=3752740.0, ans=0.125 2023-11-27 05:40:02,133 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3752740.0, ans=0.125 2023-11-27 05:40:17,814 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.921e+01 8.828e+01 9.556e+01 1.035e+02 1.179e+02, threshold=1.911e+02, percent-clipped=0.0 2023-11-27 05:40:25,506 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3752873.3333333335, ans=0.125 2023-11-27 05:40:36,819 WARNING [train_asr.py:1481] (3/4) Exclude cut with ID unbalanced/Bo4LcZjitzU_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. 
Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-27 05:40:39,086 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 562950 2023-11-27 05:40:42,209 INFO [train_asr.py:1235] (3/4) Epoch 47, batch 9850, loss[loss=0.06786, simple_loss=0.08776, pruned_loss=0.01379, audio_tagging_loss=0.0102, over 15463.00 frames. ], tot_loss[loss=0.0652, simple_loss=0.08959, pruned_loss=0.012, audio_tagging_loss=0.008402, over 3050410.55 frames. ], batch size: 59, lr: 1.43e-03, grad_scale: 8.0 2023-11-27 05:40:42,411 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=3753006.6666666665, ans=0.0 2023-11-27 05:41:05,796 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=3753140.0, ans=0.0 2023-11-27 05:41:24,438 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=3753206.6666666665, ans=0.0 2023-11-27 05:41:29,257 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-27 05:41:33,784 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=11.12 vs. limit=15.0 2023-11-27 05:41:34,398 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 563000 2023-11-27 05:41:37,274 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3753340.0, ans=0.1 2023-11-27 05:41:38,592 INFO [train_asr.py:1235] (3/4) Epoch 47, batch 9900, loss[loss=0.05264, simple_loss=0.07012, pruned_loss=0.0086, audio_tagging_loss=0.008979, over 16008.00 frames. ], tot_loss[loss=0.06517, simple_loss=0.08936, pruned_loss=0.01206, audio_tagging_loss=0.008431, over 3056183.73 frames. ], batch size: 62, lr: 1.43e-03, grad_scale: 8.0 2023-11-27 05:41:44,139 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=3753340.0, ans=0.07 2023-11-27 05:41:48,882 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3753406.6666666665, ans=0.1 2023-11-27 05:41:50,029 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=3753406.6666666665, ans=0.04949747468305833 2023-11-27 05:41:58,260 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3753406.6666666665, ans=0.125 2023-11-27 05:42:07,506 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.516e+01 9.022e+01 9.450e+01 1.050e+02 2.788e+02, threshold=1.890e+02, percent-clipped=1.0 2023-11-27 05:42:19,738 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.whiten.whitening_limit, batch_count=3753540.0, ans=12.0 2023-11-27 05:42:21,555 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=13.25 vs. 
limit=22.5 2023-11-27 05:42:25,529 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=1.92 vs. limit=6.0 2023-11-27 05:42:31,030 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 563050 2023-11-27 05:42:34,150 INFO [train_asr.py:1235] (3/4) Epoch 47, batch 9950, loss[loss=0.08051, simple_loss=0.1091, pruned_loss=0.01715, audio_tagging_loss=0.008796, over 14859.00 frames. ], tot_loss[loss=0.06509, simple_loss=0.08924, pruned_loss=0.012, audio_tagging_loss=0.008472, over 3052386.92 frames. ], batch size: 54, lr: 1.43e-03, grad_scale: 8.0 2023-11-27 05:42:34,390 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=3753673.3333333335, ans=0.125 2023-11-27 05:42:38,033 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.1.self_attn1.whiten, num_groups=1, num_channels=192, metric=15.34 vs. limit=22.5 2023-11-27 05:43:05,487 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=3753806.6666666665, ans=0.0 2023-11-27 05:43:23,308 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=3753940.0, ans=0.04949747468305833 2023-11-27 05:43:26,186 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 563100 2023-11-27 05:43:26,296 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=3753940.0, ans=0.0 2023-11-27 05:43:29,294 INFO [train_asr.py:1235] (3/4) Epoch 47, batch 10000, loss[loss=0.07126, simple_loss=0.09808, pruned_loss=0.01223, audio_tagging_loss=0.009993, over 15787.00 frames. ], tot_loss[loss=0.0649, simple_loss=0.08883, pruned_loss=0.01198, audio_tagging_loss=0.008505, over 3052629.16 frames. ], batch size: 57, lr: 1.43e-03, grad_scale: 16.0 2023-11-27 05:43:32,743 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3754006.6666666665, ans=0.125 2023-11-27 05:43:39,649 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=3754073.3333333335, ans=0.07 2023-11-27 05:43:58,448 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=512, metric=15.56 vs. limit=22.5 2023-11-27 05:43:59,982 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.550e+01 8.953e+01 9.636e+01 1.038e+02 1.515e+02, threshold=1.927e+02, percent-clipped=0.0 2023-11-27 05:44:03,493 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=3754206.6666666665, ans=0.0 2023-11-27 05:44:14,185 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=3754273.3333333335, ans=0.0 2023-11-27 05:44:21,983 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 563150 2023-11-27 05:44:25,011 INFO [train_asr.py:1235] (3/4) Epoch 47, batch 10050, loss[loss=0.06356, simple_loss=0.09019, pruned_loss=0.0103, audio_tagging_loss=0.008163, over 14885.00 frames. ], tot_loss[loss=0.06556, simple_loss=0.08986, pruned_loss=0.01216, audio_tagging_loss=0.008465, over 3052446.55 frames. 
], batch size: 54, lr: 1.43e-03, grad_scale: 16.0 2023-11-27 05:44:33,683 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3754340.0, ans=0.125 2023-11-27 05:44:39,232 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=3754406.6666666665, ans=0.2 2023-11-27 05:44:40,760 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=3754406.6666666665, ans=0.0 2023-11-27 05:44:51,577 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=7.33 vs. limit=15.0 2023-11-27 05:45:09,296 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=3754606.6666666665, ans=0.125 2023-11-27 05:45:18,011 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 563200 2023-11-27 05:45:21,477 INFO [train_asr.py:1235] (3/4) Epoch 47, batch 10100, loss[loss=0.05765, simple_loss=0.07813, pruned_loss=0.01091, audio_tagging_loss=0.007684, over 15047.00 frames. ], tot_loss[loss=0.06546, simple_loss=0.08967, pruned_loss=0.01208, audio_tagging_loss=0.008542, over 3049575.63 frames. ], batch size: 56, lr: 1.43e-03, grad_scale: 16.0 2023-11-27 05:45:27,909 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=17.14 vs. limit=22.5 2023-11-27 05:45:30,761 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=3754673.3333333335, ans=0.2 2023-11-27 05:45:38,147 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=3754740.0, ans=0.125 2023-11-27 05:45:41,381 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3754740.0, ans=0.125 2023-11-27 05:45:51,639 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.566e+01 8.906e+01 9.588e+01 1.046e+02 1.335e+02, threshold=1.918e+02, percent-clipped=0.0 2023-11-27 05:46:05,981 WARNING [train_asr.py:1481] (3/4) Exclude cut with ID unbalanced/_eq1Ry0UZGU_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-27 05:46:13,952 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 563250 2023-11-27 05:46:14,173 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=3754940.0, ans=0.2 2023-11-27 05:46:17,011 INFO [train_asr.py:1235] (3/4) Epoch 47, batch 10150, loss[loss=0.05596, simple_loss=0.08167, pruned_loss=0.005745, audio_tagging_loss=0.009384, over 15881.00 frames. ], tot_loss[loss=0.0659, simple_loss=0.09046, pruned_loss=0.01212, audio_tagging_loss=0.008554, over 3054622.30 frames. 
], batch size: 58, lr: 1.43e-03, grad_scale: 16.0 2023-11-27 05:46:27,520 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=1.97 vs. limit=6.0 2023-11-27 05:46:38,963 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-27 05:46:42,903 WARNING [train_asr.py:1481] (3/4) Exclude cut with ID unbalanced/cw-21cbk02A_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-27 05:46:57,717 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.whiten, num_groups=1, num_channels=512, metric=8.61 vs. limit=12.0 2023-11-27 05:47:03,729 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3755273.3333333335, ans=0.125 2023-11-27 05:47:09,467 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 563300 2023-11-27 05:47:11,213 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=6.13 vs. limit=10.0 2023-11-27 05:47:12,624 INFO [train_asr.py:1235] (3/4) Epoch 47, batch 10200, loss[loss=0.06149, simple_loss=0.08268, pruned_loss=0.01177, audio_tagging_loss=0.008381, over 15599.00 frames. ], tot_loss[loss=0.06531, simple_loss=0.08974, pruned_loss=0.01181, audio_tagging_loss=0.008635, over 3057270.53 frames. ], batch size: 59, lr: 1.43e-03, grad_scale: 16.0 2023-11-27 05:47:25,408 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=3755406.6666666665, ans=0.2 2023-11-27 05:47:26,611 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=3755406.6666666665, ans=0.125 2023-11-27 05:47:32,876 WARNING [train_asr.py:1481] (3/4) Exclude cut with ID unbalanced/hOT6Yokob90_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-27 05:47:33,040 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3755406.6666666665, ans=0.1 2023-11-27 05:47:37,112 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module1.whiten, num_groups=1, num_channels=512, metric=4.46 vs. limit=15.0 2023-11-27 05:47:38,907 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=3755473.3333333335, ans=0.0 2023-11-27 05:47:39,419 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=6.12 vs. 
limit=15.0 2023-11-27 05:47:42,915 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.968e+01 9.089e+01 9.735e+01 1.032e+02 1.277e+02, threshold=1.947e+02, percent-clipped=0.0 2023-11-27 05:47:45,349 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3755540.0, ans=0.125 2023-11-27 05:47:52,156 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=3755540.0, ans=0.2 2023-11-27 05:47:54,533 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=8.59 vs. limit=15.0 2023-11-27 05:47:55,354 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=3755540.0, ans=0.07 2023-11-27 05:48:05,864 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 563350 2023-11-27 05:48:08,973 INFO [train_asr.py:1235] (3/4) Epoch 47, batch 10250, loss[loss=0.06472, simple_loss=0.07887, pruned_loss=0.01432, audio_tagging_loss=0.01097, over 15068.00 frames. ], tot_loss[loss=0.06498, simple_loss=0.08919, pruned_loss=0.01175, audio_tagging_loss=0.008633, over 3048457.38 frames. ], batch size: 55, lr: 1.43e-03, grad_scale: 16.0 2023-11-27 05:48:24,686 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3755740.0, ans=0.125 2023-11-27 05:48:26,789 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=3755740.0, ans=0.0 2023-11-27 05:48:29,172 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=4.76 vs. limit=6.0 2023-11-27 05:49:00,937 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 563400 2023-11-27 05:49:04,800 INFO [train_asr.py:1235] (3/4) Epoch 47, batch 10300, loss[loss=0.07408, simple_loss=0.1043, pruned_loss=0.01469, audio_tagging_loss=0.007232, over 15457.00 frames. ], tot_loss[loss=0.06487, simple_loss=0.08897, pruned_loss=0.01174, audio_tagging_loss=0.008643, over 3050552.61 frames. ], batch size: 58, lr: 1.43e-03, grad_scale: 16.0 2023-11-27 05:49:21,359 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=3756073.3333333335, ans=0.0 2023-11-27 05:49:26,568 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-27 05:49:34,851 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.807e+01 9.184e+01 9.721e+01 1.033e+02 1.459e+02, threshold=1.944e+02, percent-clipped=0.0 2023-11-27 05:49:35,430 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=512, metric=19.97 vs. limit=22.5 2023-11-27 05:49:38,301 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3756206.6666666665, ans=0.125 2023-11-27 05:49:49,967 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.27 vs. 
limit=6.0 2023-11-27 05:49:53,726 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=3756273.3333333335, ans=0.0 2023-11-27 05:49:56,713 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 563450 2023-11-27 05:49:58,448 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=3756273.3333333335, ans=0.125 2023-11-27 05:50:00,361 INFO [train_asr.py:1235] (3/4) Epoch 47, batch 10350, loss[loss=0.05482, simple_loss=0.07365, pruned_loss=0.008436, audio_tagging_loss=0.009561, over 16113.00 frames. ], tot_loss[loss=0.06541, simple_loss=0.08991, pruned_loss=0.01174, audio_tagging_loss=0.008716, over 3051492.38 frames. ], batch size: 60, lr: 1.43e-03, grad_scale: 16.0 2023-11-27 05:50:00,911 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=512, metric=4.22 vs. limit=15.0 2023-11-27 05:50:09,991 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=3756340.0, ans=0.035 2023-11-27 05:50:15,410 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3756406.6666666665, ans=0.125 2023-11-27 05:50:20,424 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=3756406.6666666665, ans=0.0 2023-11-27 05:50:29,436 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3756473.3333333335, ans=0.1 2023-11-27 05:50:32,694 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3756540.0, ans=0.125 2023-11-27 05:50:35,977 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=3756540.0, ans=0.125 2023-11-27 05:50:40,676 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=3756540.0, ans=0.0 2023-11-27 05:50:42,771 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=3756540.0, ans=0.0 2023-11-27 05:50:53,234 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 563500 2023-11-27 05:50:55,439 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=3756673.3333333335, ans=0.2 2023-11-27 05:50:56,292 INFO [train_asr.py:1235] (3/4) Epoch 47, batch 10400, loss[loss=0.06177, simple_loss=0.08254, pruned_loss=0.01025, audio_tagging_loss=0.01025, over 15779.00 frames. ], tot_loss[loss=0.06541, simple_loss=0.08973, pruned_loss=0.01175, audio_tagging_loss=0.008799, over 3058341.87 frames. ], batch size: 59, lr: 1.43e-03, grad_scale: 32.0 2023-11-27 05:51:07,530 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=2.76 vs. 
limit=15.0 2023-11-27 05:51:12,914 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=3756740.0, ans=0.0 2023-11-27 05:51:18,350 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=3756806.6666666665, ans=0.125 2023-11-27 05:51:26,041 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.687e+01 8.939e+01 9.704e+01 1.060e+02 1.471e+02, threshold=1.941e+02, percent-clipped=0.0 2023-11-27 05:51:28,529 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=3756873.3333333335, ans=0.0 2023-11-27 05:51:32,663 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3756873.3333333335, ans=0.125 2023-11-27 05:51:48,449 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 563550 2023-11-27 05:51:51,584 INFO [train_asr.py:1235] (3/4) Epoch 47, batch 10450, loss[loss=0.04217, simple_loss=0.05371, pruned_loss=0.004408, audio_tagging_loss=0.01091, over 15488.00 frames. ], tot_loss[loss=0.0656, simple_loss=0.08992, pruned_loss=0.01188, audio_tagging_loss=0.008757, over 3054992.28 frames. ], batch size: 61, lr: 1.43e-03, grad_scale: 32.0 2023-11-27 05:52:13,484 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-27 05:52:25,742 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=3757206.6666666665, ans=0.0 2023-11-27 05:52:37,537 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3757273.3333333335, ans=0.1 2023-11-27 05:52:44,129 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 563600 2023-11-27 05:52:48,004 INFO [train_asr.py:1235] (3/4) Epoch 47, batch 10500, loss[loss=0.07507, simple_loss=0.1055, pruned_loss=0.01532, audio_tagging_loss=0.007016, over 14184.00 frames. ], tot_loss[loss=0.06546, simple_loss=0.08974, pruned_loss=0.0119, audio_tagging_loss=0.008698, over 3053268.62 frames. ], batch size: 54, lr: 1.43e-03, grad_scale: 32.0 2023-11-27 05:53:01,883 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=3757406.6666666665, ans=0.0 2023-11-27 05:53:13,563 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-27 05:53:17,364 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.844e+01 9.057e+01 9.549e+01 1.045e+02 1.272e+02, threshold=1.910e+02, percent-clipped=0.0 2023-11-27 05:53:28,751 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=3757540.0, ans=0.125 2023-11-27 05:53:28,866 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=3757540.0, ans=0.125 2023-11-27 05:53:40,867 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 563650 2023-11-27 05:53:43,934 INFO [train_asr.py:1235] (3/4) Epoch 47, batch 10550, loss[loss=0.05958, simple_loss=0.07736, pruned_loss=0.01265, audio_tagging_loss=0.008254, over 14102.00 frames. ], tot_loss[loss=0.06544, simple_loss=0.08989, pruned_loss=0.01197, audio_tagging_loss=0.008532, over 3044827.56 frames. 
], batch size: 56, lr: 1.43e-03, grad_scale: 32.0 2023-11-27 05:54:00,413 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=6.88 vs. limit=15.0 2023-11-27 05:54:10,062 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3757806.6666666665, ans=0.125 2023-11-27 05:54:20,888 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.whiten.whitening_limit, batch_count=3757873.3333333335, ans=15.0 2023-11-27 05:54:25,441 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3757873.3333333335, ans=0.1 2023-11-27 05:54:30,753 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=3757940.0, ans=0.035 2023-11-27 05:54:30,823 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3757940.0, ans=0.1 2023-11-27 05:54:31,964 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=3757940.0, ans=0.0 2023-11-27 05:54:36,085 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 563700 2023-11-27 05:54:39,298 INFO [train_asr.py:1235] (3/4) Epoch 47, batch 10600, loss[loss=0.05191, simple_loss=0.07261, pruned_loss=0.007927, audio_tagging_loss=0.007672, over 14164.00 frames. ], tot_loss[loss=0.06531, simple_loss=0.08989, pruned_loss=0.0119, audio_tagging_loss=0.008464, over 3039656.02 frames. ], batch size: 55, lr: 1.43e-03, grad_scale: 32.0 2023-11-27 05:54:39,549 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3758006.6666666665, ans=0.125 2023-11-27 05:55:10,866 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.549e+01 8.995e+01 9.549e+01 1.050e+02 1.253e+02, threshold=1.910e+02, percent-clipped=0.0 2023-11-27 05:55:11,168 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3758140.0, ans=0.1 2023-11-27 05:55:13,601 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.whiten, num_groups=1, num_channels=512, metric=8.86 vs. limit=12.0 2023-11-27 05:55:21,665 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=3758206.6666666665, ans=0.125 2023-11-27 05:55:31,014 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 563750 2023-11-27 05:55:34,659 INFO [train_asr.py:1235] (3/4) Epoch 47, batch 10650, loss[loss=0.06468, simple_loss=0.09261, pruned_loss=0.009653, audio_tagging_loss=0.008719, over 15265.00 frames. ], tot_loss[loss=0.06535, simple_loss=0.09006, pruned_loss=0.0119, audio_tagging_loss=0.008418, over 3042736.88 frames. ], batch size: 58, lr: 1.43e-03, grad_scale: 16.0 2023-11-27 05:55:39,020 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=8.78 vs. limit=15.0 2023-11-27 05:55:46,726 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=10.07 vs. 
limit=15.0 2023-11-27 05:55:51,247 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=3758406.6666666665, ans=0.0 2023-11-27 05:55:51,646 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=7.83 vs. limit=15.0 2023-11-27 05:56:01,123 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=8.59 vs. limit=10.0 2023-11-27 05:56:04,989 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=3758473.3333333335, ans=0.0 2023-11-27 05:56:10,399 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3758540.0, ans=0.125 2023-11-27 05:56:24,019 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=3758606.6666666665, ans=0.125 2023-11-27 05:56:27,500 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 563800 2023-11-27 05:56:31,190 INFO [train_asr.py:1235] (3/4) Epoch 47, batch 10700, loss[loss=0.06244, simple_loss=0.08068, pruned_loss=0.01257, audio_tagging_loss=0.00954, over 15287.00 frames. ], tot_loss[loss=0.06455, simple_loss=0.08869, pruned_loss=0.0117, audio_tagging_loss=0.0085, over 3042762.96 frames. ], batch size: 59, lr: 1.43e-03, grad_scale: 16.0 2023-11-27 05:56:33,475 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3758673.3333333335, ans=0.125 2023-11-27 05:57:00,755 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=11.09 vs. limit=15.0 2023-11-27 05:57:01,232 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.230e+01 8.877e+01 9.430e+01 1.025e+02 1.516e+02, threshold=1.886e+02, percent-clipped=0.0 2023-11-27 05:57:01,478 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=3758806.6666666665, ans=0.0 2023-11-27 05:57:02,483 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=3758873.3333333335, ans=0.2 2023-11-27 05:57:10,476 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=3758873.3333333335, ans=0.125 2023-11-27 05:57:11,455 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=3758873.3333333335, ans=0.125 2023-11-27 05:57:17,190 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=3758940.0, ans=0.035 2023-11-27 05:57:22,427 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 563850 2023-11-27 05:57:25,453 INFO [train_asr.py:1235] (3/4) Epoch 47, batch 10750, loss[loss=0.06839, simple_loss=0.09429, pruned_loss=0.01321, audio_tagging_loss=0.008034, over 16452.00 frames. ], tot_loss[loss=0.06388, simple_loss=0.08757, pruned_loss=0.01154, audio_tagging_loss=0.008553, over 3042563.63 frames. 
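
The ScheduledFloat records (scaling.py:213) report regularization constants such as skip rates, dropout probabilities and balancer probabilities as functions of the global batch count. Below is a minimal sketch of such a piecewise-linear schedule, with illustrative breakpoints rather than the ones actually configured; at batch counts around 3.75M the schedule is long past its last breakpoint, which would explain why each name keeps logging the same ans value:

    # Piecewise-linear schedule over batch_count, in the spirit of the
    # ScheduledFloat values logged above; the breakpoints are made up.
    def scheduled_float(batch_count, schedule):
        """schedule: (batch_count, value) pairs, sorted by batch_count."""
        if batch_count <= schedule[0][0]:
            return schedule[0][1]
        if batch_count >= schedule[-1][0]:
            return schedule[-1][1]
        for (x0, y0), (x1, y1) in zip(schedule, schedule[1:]):
            if x0 <= batch_count <= x1:
                return y0 + (y1 - y0) * (batch_count - x0) / (x1 - x0)

    print(scheduled_float(3756740.0, [(0.0, 0.3), (20000.0, 0.1)]))  # 0.1, flat tail
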
], batch size: 60, lr: 1.43e-03, grad_scale: 16.0 2023-11-27 05:57:54,026 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=3759140.0, ans=0.0 2023-11-27 05:58:01,300 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.whiten, num_groups=1, num_channels=512, metric=9.23 vs. limit=12.0 2023-11-27 05:58:04,222 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3759206.6666666665, ans=0.1 2023-11-27 05:58:05,125 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=3759206.6666666665, ans=0.125 2023-11-27 05:58:17,681 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 563900 2023-11-27 05:58:20,843 INFO [train_asr.py:1235] (3/4) Epoch 47, batch 10800, loss[loss=0.05031, simple_loss=0.06321, pruned_loss=0.01001, audio_tagging_loss=0.008695, over 16046.00 frames. ], tot_loss[loss=0.06386, simple_loss=0.08757, pruned_loss=0.01161, audio_tagging_loss=0.008463, over 3045960.44 frames. ], batch size: 61, lr: 1.43e-03, grad_scale: 32.0 2023-11-27 05:58:25,888 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3759340.0, ans=0.125 2023-11-27 05:58:33,165 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=3759406.6666666665, ans=0.0 2023-11-27 05:58:41,785 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.0.self_attn2.whiten, num_groups=1, num_channels=192, metric=11.32 vs. limit=22.5 2023-11-27 05:58:52,700 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.507e+01 9.069e+01 9.752e+01 1.037e+02 1.652e+02, threshold=1.950e+02, percent-clipped=0.0 2023-11-27 05:59:14,068 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 563950 2023-11-27 05:59:17,769 INFO [train_asr.py:1235] (3/4) Epoch 47, batch 10850, loss[loss=0.08172, simple_loss=0.1055, pruned_loss=0.01753, audio_tagging_loss=0.01145, over 15884.00 frames. ], tot_loss[loss=0.06397, simple_loss=0.08742, pruned_loss=0.0117, audio_tagging_loss=0.008556, over 3048044.34 frames. ], batch size: 60, lr: 1.43e-03, grad_scale: 32.0 2023-11-27 05:59:35,622 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3759740.0, ans=0.125 2023-11-27 05:59:37,534 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=3759740.0, ans=0.015 2023-11-27 05:59:58,242 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=3759873.3333333335, ans=0.2 2023-11-27 06:00:09,605 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3759940.0, ans=0.0 2023-11-27 06:00:10,335 WARNING [train_asr.py:1481] (3/4) Exclude cut with ID unbalanced/XMxq2pgttuY_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. 
Number of tokens: 24 2023-11-27 06:00:10,398 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 564000 2023-11-27 06:00:14,987 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=3760006.6666666665, ans=0.0 2023-11-27 06:00:15,711 INFO [train_asr.py:1235] (3/4) Epoch 47, batch 10900, loss[loss=0.07514, simple_loss=0.1045, pruned_loss=0.01519, audio_tagging_loss=0.007717, over 15325.00 frames. ], tot_loss[loss=0.06483, simple_loss=0.08849, pruned_loss=0.01205, audio_tagging_loss=0.008531, over 3047473.59 frames. ], batch size: 57, lr: 1.43e-03, grad_scale: 32.0 2023-11-27 06:00:20,037 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=3760006.6666666665, ans=0.125 2023-11-27 06:00:20,043 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3760006.6666666665, ans=0.125 2023-11-27 06:00:21,102 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=3760006.6666666665, ans=0.0 2023-11-27 06:00:24,846 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=6.41 vs. limit=15.0 2023-11-27 06:00:33,502 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3760073.3333333335, ans=0.125 2023-11-27 06:00:40,491 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=8.35 vs. limit=15.0 2023-11-27 06:00:47,456 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.724e+01 8.933e+01 9.704e+01 1.050e+02 1.255e+02, threshold=1.941e+02, percent-clipped=0.0 2023-11-27 06:00:48,776 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3760206.6666666665, ans=0.125 2023-11-27 06:00:58,186 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=3760206.6666666665, ans=0.125 2023-11-27 06:01:07,493 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 564050 2023-11-27 06:01:07,582 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3760273.3333333335, ans=0.125 2023-11-27 06:01:10,566 INFO [train_asr.py:1235] (3/4) Epoch 47, batch 10950, loss[loss=0.06181, simple_loss=0.09045, pruned_loss=0.01066, audio_tagging_loss=0.005924, over 15584.00 frames. ], tot_loss[loss=0.06497, simple_loss=0.08906, pruned_loss=0.01192, audio_tagging_loss=0.008514, over 3056588.81 frames. 
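
The WARNING above drops an AudioSet cut whose placeholder transcript is longer than its acoustic sequence: a 1-second cut has 100 feature frames, which shrink to 23 after the convolutional front-end's roughly 4x subsampling, fewer than its 24 tokens, so no monotonic transducer alignment exists. A sketch of that validity check follows; the exact subsampling formula is an assumption chosen to map 100 frames to the 23 reported above:

    # Reject cuts whose token count exceeds the post-subsampling frame count.
    # ((T - 7) // 2 + 1) // 2 is an assumed model of the conv front-end.
    def is_valid_cut(num_frames: int, num_tokens: int) -> bool:
        t_sub = ((num_frames - 7) // 2 + 1) // 2
        return t_sub >= num_tokens

    print(is_valid_cut(100, 24))  # False -> the cut is excluded from training
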
], batch size: 56, lr: 1.43e-03, grad_scale: 16.0 2023-11-27 06:01:32,057 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3760406.6666666665, ans=0.125 2023-11-27 06:01:55,407 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3760606.6666666665, ans=0.125 2023-11-27 06:02:03,211 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 564100 2023-11-27 06:02:06,792 INFO [train_asr.py:1235] (3/4) Epoch 47, batch 11000, loss[loss=0.0646, simple_loss=0.09129, pruned_loss=0.01174, audio_tagging_loss=0.00722, over 15493.00 frames. ], tot_loss[loss=0.06506, simple_loss=0.08921, pruned_loss=0.01188, audio_tagging_loss=0.008574, over 3051783.13 frames. ], batch size: 56, lr: 1.43e-03, grad_scale: 16.0 2023-11-27 06:02:15,273 WARNING [train_asr.py:1481] (3/4) Exclude cut with ID unbalanced/h6R5rMXN6pY_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-27 06:02:38,495 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.912e+01 8.834e+01 9.840e+01 1.033e+02 1.285e+02, threshold=1.968e+02, percent-clipped=0.0 2023-11-27 06:02:53,089 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=3760940.0, ans=0.0 2023-11-27 06:02:59,674 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 564150 2023-11-27 06:03:02,830 INFO [train_asr.py:1235] (3/4) Epoch 47, batch 11050, loss[loss=0.05771, simple_loss=0.08362, pruned_loss=0.007308, audio_tagging_loss=0.008591, over 15688.00 frames. ], tot_loss[loss=0.06487, simple_loss=0.08895, pruned_loss=0.01172, audio_tagging_loss=0.008679, over 3056637.66 frames. ], batch size: 57, lr: 1.43e-03, grad_scale: 16.0 2023-11-27 06:03:03,169 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3761006.6666666665, ans=0.125 2023-11-27 06:03:11,426 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3761006.6666666665, ans=0.0 2023-11-27 06:03:15,015 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=6.63 vs. limit=12.0 2023-11-27 06:03:17,911 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=3761073.3333333335, ans=0.07 2023-11-27 06:03:49,992 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=3761273.3333333335, ans=0.2 2023-11-27 06:03:54,076 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 564200 2023-11-27 06:03:57,487 INFO [train_asr.py:1235] (3/4) Epoch 47, batch 11100, loss[loss=0.0666, simple_loss=0.08638, pruned_loss=0.0156, audio_tagging_loss=0.007808, over 15292.00 frames. ], tot_loss[loss=0.06504, simple_loss=0.08882, pruned_loss=0.01188, audio_tagging_loss=0.008753, over 3047945.72 frames. 
], batch size: 57, lr: 1.43e-03, grad_scale: 16.0 2023-11-27 06:04:01,867 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=3761340.0, ans=0.2 2023-11-27 06:04:11,708 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3761406.6666666665, ans=0.1 2023-11-27 06:04:30,089 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.469e+01 9.171e+01 9.681e+01 1.044e+02 1.229e+02, threshold=1.936e+02, percent-clipped=0.0 2023-11-27 06:04:30,340 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=3761540.0, ans=0.0 2023-11-27 06:04:44,387 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=14.06 vs. limit=22.5 2023-11-27 06:04:49,834 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 564250 2023-11-27 06:04:52,950 INFO [train_asr.py:1235] (3/4) Epoch 47, batch 11150, loss[loss=0.08199, simple_loss=0.1085, pruned_loss=0.0186, audio_tagging_loss=0.009154, over 15011.00 frames. ], tot_loss[loss=0.06543, simple_loss=0.08924, pruned_loss=0.01195, audio_tagging_loss=0.008862, over 3047883.60 frames. ], batch size: 55, lr: 1.43e-03, grad_scale: 16.0 2023-11-27 06:04:56,934 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3761673.3333333335, ans=0.125 2023-11-27 06:05:02,976 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module2.whiten, num_groups=1, num_channels=512, metric=8.78 vs. limit=15.0 2023-11-27 06:05:45,984 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 564300 2023-11-27 06:05:46,228 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3761940.0, ans=0.1 2023-11-27 06:05:49,605 INFO [train_asr.py:1235] (3/4) Epoch 47, batch 11200, loss[loss=0.07986, simple_loss=0.1098, pruned_loss=0.01653, audio_tagging_loss=0.008421, over 15957.00 frames. ], tot_loss[loss=0.06509, simple_loss=0.08851, pruned_loss=0.01188, audio_tagging_loss=0.008956, over 3044492.95 frames. ], batch size: 59, lr: 1.43e-03, grad_scale: 32.0 2023-11-27 06:05:52,475 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=9.94 vs. 
limit=15.0 2023-11-27 06:06:01,448 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=3762073.3333333335, ans=0.0 2023-11-27 06:06:05,659 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3762073.3333333335, ans=0.125 2023-11-27 06:06:12,761 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-27 06:06:20,544 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3762140.0, ans=0.125 2023-11-27 06:06:22,458 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.923e+01 9.248e+01 9.841e+01 1.044e+02 1.353e+02, threshold=1.968e+02, percent-clipped=0.0 2023-11-27 06:06:41,693 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 564350 2023-11-27 06:06:44,805 INFO [train_asr.py:1235] (3/4) Epoch 47, batch 11250, loss[loss=0.05932, simple_loss=0.07479, pruned_loss=0.01383, audio_tagging_loss=0.008098, over 15159.00 frames. ], tot_loss[loss=0.06441, simple_loss=0.08739, pruned_loss=0.0117, audio_tagging_loss=0.00902, over 3048060.54 frames. ], batch size: 58, lr: 1.43e-03, grad_scale: 16.0 2023-11-27 06:06:50,558 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=9.54 vs. limit=15.0 2023-11-27 06:06:57,209 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3762406.6666666665, ans=0.125 2023-11-27 06:07:00,298 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3762406.6666666665, ans=0.1 2023-11-27 06:07:00,752 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.whiten_keys.whitening_limit, batch_count=3762406.6666666665, ans=6.0 2023-11-27 06:07:01,455 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=3762406.6666666665, ans=0.0 2023-11-27 06:07:04,984 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=3762406.6666666665, ans=0.0 2023-11-27 06:07:07,219 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-27 06:07:14,015 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3762473.3333333335, ans=0.125 2023-11-27 06:07:33,645 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=3762606.6666666665, ans=0.09899494936611666 2023-11-27 06:07:36,583 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 564400 2023-11-27 06:07:36,780 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=3762606.6666666665, ans=0.125 2023-11-27 06:07:40,539 INFO [train_asr.py:1235] (3/4) Epoch 47, batch 11300, loss[loss=0.06253, simple_loss=0.09973, pruned_loss=0.006758, audio_tagging_loss=0.005904, over 16014.00 frames. ], tot_loss[loss=0.0639, simple_loss=0.08704, pruned_loss=0.01153, audio_tagging_loss=0.008853, over 3049988.34 frames. 
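
In the optim.py:476 records, the five grad-norm numbers are the min, 25%, median, 75% and max of recently observed gradient norms, and the printed threshold equals Clipping_scale times the median (2.0 * 9.681e+01 = 1.936e+02 in the record above). A sketch of a clipper with that behavior; the window size is an assumption:

    import torch

    # Track recent gradient norms, report their quartiles, and clip against
    # clipping_scale * median, matching the threshold/median relation visible
    # in the log. The window of 1000 norms is an assumption.
    class GradNormClipper:
        def __init__(self, clipping_scale=2.0, window=1000):
            self.scale, self.window, self.norms = clipping_scale, window, []

        def __call__(self, parameters):
            params = [p for p in parameters if p.grad is not None]
            norm = torch.norm(torch.stack([p.grad.norm() for p in params])).item()
            self.norms = (self.norms + [norm])[-self.window:]
            t = torch.tensor(self.norms)
            q = torch.quantile(t, torch.tensor([0.0, 0.25, 0.5, 0.75, 1.0]))
            threshold = self.scale * q[2].item()
            if norm > threshold:
                for p in params:
                    p.grad.mul_(threshold / norm)
            return q.tolist(), threshold
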
], batch size: 58, lr: 1.43e-03, grad_scale: 16.0 2023-11-27 06:07:47,634 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=3762673.3333333335, ans=0.2 2023-11-27 06:07:48,071 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=12.18 vs. limit=15.0 2023-11-27 06:08:07,613 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=3762806.6666666665, ans=0.125 2023-11-27 06:08:13,714 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.519e+01 9.028e+01 9.588e+01 1.026e+02 1.216e+02, threshold=1.918e+02, percent-clipped=0.0 2023-11-27 06:08:15,008 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.min_abs, batch_count=3762873.3333333335, ans=0.5 2023-11-27 06:08:26,366 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3762940.0, ans=0.0 2023-11-27 06:08:28,206 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=7.63 vs. limit=15.0 2023-11-27 06:08:33,505 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 564450 2023-11-27 06:08:36,656 INFO [train_asr.py:1235] (3/4) Epoch 47, batch 11350, loss[loss=0.05828, simple_loss=0.07879, pruned_loss=0.01024, audio_tagging_loss=0.008645, over 15795.00 frames. ], tot_loss[loss=0.06381, simple_loss=0.08711, pruned_loss=0.01154, audio_tagging_loss=0.00872, over 3046124.96 frames. ], batch size: 61, lr: 1.43e-03, grad_scale: 16.0 2023-11-27 06:08:48,075 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=3763073.3333333335, ans=0.0 2023-11-27 06:09:03,400 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.max_abs, batch_count=3763140.0, ans=10.0 2023-11-27 06:09:24,314 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=3763273.3333333335, ans=0.07 2023-11-27 06:09:24,376 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3763273.3333333335, ans=0.125 2023-11-27 06:09:26,864 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=3763273.3333333335, ans=0.0 2023-11-27 06:09:28,880 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 564500 2023-11-27 06:09:31,967 INFO [train_asr.py:1235] (3/4) Epoch 47, batch 11400, loss[loss=0.04632, simple_loss=0.058, pruned_loss=0.006194, audio_tagging_loss=0.01113, over 16704.00 frames. ], tot_loss[loss=0.06416, simple_loss=0.08754, pruned_loss=0.01175, audio_tagging_loss=0.008645, over 3045669.96 frames. ], batch size: 64, lr: 1.43e-03, grad_scale: 16.0 2023-11-27 06:09:40,014 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=17.12 vs. limit=22.5 2023-11-27 06:09:47,929 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=5.87 vs. 
limit=15.0 2023-11-27 06:09:53,157 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=3763473.3333333335, ans=0.0 2023-11-27 06:09:56,250 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3763473.3333333335, ans=0.125 2023-11-27 06:10:01,070 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=3763473.3333333335, ans=0.2 2023-11-27 06:10:02,174 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=3763473.3333333335, ans=0.125 2023-11-27 06:10:05,121 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.669e+01 9.070e+01 9.706e+01 1.041e+02 1.301e+02, threshold=1.941e+02, percent-clipped=0.0 2023-11-27 06:10:11,156 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3763540.0, ans=0.125 2023-11-27 06:10:13,267 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3763540.0, ans=0.125 2023-11-27 06:10:23,054 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=18.54 vs. limit=22.5 2023-11-27 06:10:23,652 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 564550 2023-11-27 06:10:27,266 INFO [train_asr.py:1235] (3/4) Epoch 47, batch 11450, loss[loss=0.06963, simple_loss=0.08778, pruned_loss=0.0152, audio_tagging_loss=0.01053, over 14466.00 frames. ], tot_loss[loss=0.06413, simple_loss=0.08732, pruned_loss=0.01178, audio_tagging_loss=0.008689, over 3040030.55 frames. ], batch size: 55, lr: 1.43e-03, grad_scale: 16.0 2023-11-27 06:10:38,448 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3763740.0, ans=0.1 2023-11-27 06:10:44,889 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=8.46 vs. limit=15.0 2023-11-27 06:10:46,403 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3763740.0, ans=0.125 2023-11-27 06:11:04,843 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=3763873.3333333335, ans=0.2 2023-11-27 06:11:14,927 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=3763940.0, ans=0.07 2023-11-27 06:11:17,827 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=7.67 vs. limit=15.0 2023-11-27 06:11:19,522 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 564600 2023-11-27 06:11:23,084 INFO [train_asr.py:1235] (3/4) Epoch 47, batch 11500, loss[loss=0.06081, simple_loss=0.07311, pruned_loss=0.01246, audio_tagging_loss=0.0118, over 15197.00 frames. ], tot_loss[loss=0.06422, simple_loss=0.08751, pruned_loss=0.0118, audio_tagging_loss=0.008664, over 3049664.19 frames. 
], batch size: 57, lr: 1.43e-03, grad_scale: 16.0 2023-11-27 06:11:55,857 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.231e+01 8.975e+01 9.589e+01 1.045e+02 1.244e+02, threshold=1.918e+02, percent-clipped=0.0 2023-11-27 06:12:14,885 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 564650 2023-11-27 06:12:18,034 INFO [train_asr.py:1235] (3/4) Epoch 47, batch 11550, loss[loss=0.05208, simple_loss=0.07026, pruned_loss=0.008855, audio_tagging_loss=0.008099, over 15246.00 frames. ], tot_loss[loss=0.0642, simple_loss=0.08761, pruned_loss=0.01185, audio_tagging_loss=0.008549, over 3051806.34 frames. ], batch size: 58, lr: 1.43e-03, grad_scale: 8.0 2023-11-27 06:12:21,916 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3764340.0, ans=0.1 2023-11-27 06:12:21,956 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=3764340.0, ans=0.0 2023-11-27 06:12:26,352 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3764340.0, ans=0.125 2023-11-27 06:12:52,045 WARNING [train_asr.py:1481] (3/4) Exclude cut with ID unbalanced/NeYOsnhOi4k_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-27 06:12:55,376 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3764540.0, ans=0.125 2023-11-27 06:12:56,675 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=7.73 vs. limit=12.0 2023-11-27 06:12:59,617 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=3764540.0, ans=0.0 2023-11-27 06:13:05,297 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=3764606.6666666665, ans=0.125 2023-11-27 06:13:07,790 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.whiten, num_groups=1, num_channels=512, metric=8.34 vs. limit=12.0 2023-11-27 06:13:10,510 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 564700 2023-11-27 06:13:12,929 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=3764673.3333333335, ans=0.2 2023-11-27 06:13:13,696 INFO [train_asr.py:1235] (3/4) Epoch 47, batch 11600, loss[loss=0.09085, simple_loss=0.132, pruned_loss=0.0185, audio_tagging_loss=0.006332, over 16403.00 frames. ], tot_loss[loss=0.06412, simple_loss=0.08768, pruned_loss=0.01177, audio_tagging_loss=0.008506, over 3051839.35 frames. 
], batch size: 57, lr: 1.43e-03, grad_scale: 16.0 2023-11-27 06:13:18,645 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3764673.3333333335, ans=0.1 2023-11-27 06:13:33,670 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=3764740.0, ans=0.0 2023-11-27 06:13:33,976 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=6.86 vs. limit=12.0 2023-11-27 06:13:35,664 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3764806.6666666665, ans=0.1 2023-11-27 06:13:48,657 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.716e+01 9.097e+01 9.719e+01 1.053e+02 1.388e+02, threshold=1.944e+02, percent-clipped=0.0 2023-11-27 06:14:06,623 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 564750 2023-11-27 06:14:06,857 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=3764940.0, ans=0.0 2023-11-27 06:14:07,878 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.min_positive, batch_count=3764940.0, ans=0.05 2023-11-27 06:14:09,723 INFO [train_asr.py:1235] (3/4) Epoch 47, batch 11650, loss[loss=0.04091, simple_loss=0.05342, pruned_loss=0.003563, audio_tagging_loss=0.01063, over 14578.00 frames. ], tot_loss[loss=0.06499, simple_loss=0.08915, pruned_loss=0.01191, audio_tagging_loss=0.008509, over 3046137.33 frames. ], batch size: 57, lr: 1.43e-03, grad_scale: 16.0 2023-11-27 06:14:09,886 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=3765006.6666666665, ans=0.2 2023-11-27 06:14:15,987 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.max_abs, batch_count=3765006.6666666665, ans=10.0 2023-11-27 06:14:35,319 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3765140.0, ans=0.1 2023-11-27 06:15:02,231 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 564800 2023-11-27 06:15:05,659 INFO [train_asr.py:1235] (3/4) Epoch 47, batch 11700, loss[loss=0.06426, simple_loss=0.09127, pruned_loss=0.008137, audio_tagging_loss=0.01049, over 15413.00 frames. ], tot_loss[loss=0.06475, simple_loss=0.08847, pruned_loss=0.01191, audio_tagging_loss=0.008602, over 3053126.10 frames. 
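
The Whitening records (scaling.py:1022) compare a per-module statistic against a limit: the metric is 1.0 when the module's output covariance within each channel group is a multiple of the identity, and it grows as a few directions start to dominate. A rough re-implementation of such a metric; the exact normalization in scaling.py may differ:

    import torch

    # Whitening metric in the spirit of the "metric=M vs. limit=L" records:
    # mean(eig^2) / mean(eig)^2 of the per-group covariance eigenvalues,
    # which equals 1.0 for perfectly white features.
    def whitening_metric(x: torch.Tensor, num_groups: int = 1) -> float:
        n, c = x.shape
        x = x.reshape(n, num_groups, c // num_groups).transpose(0, 1)
        cov = x.transpose(1, 2) @ x / n      # (num_groups, d, d) covariance
        eigs = torch.linalg.eigvalsh(cov)    # eigenvalues per group
        return ((eigs ** 2).mean() / eigs.mean() ** 2).item()

    print(whitening_metric(torch.randn(2000, 384)))  # close to 1, far below limit=12.0
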
], batch size: 58, lr: 1.43e-03, grad_scale: 16.0 2023-11-27 06:15:14,976 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=3765340.0, ans=0.125 2023-11-27 06:15:16,013 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=3765406.6666666665, ans=0.0 2023-11-27 06:15:16,084 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=3765406.6666666665, ans=0.2 2023-11-27 06:15:24,960 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.max_abs, batch_count=3765406.6666666665, ans=10.0 2023-11-27 06:15:27,158 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3765473.3333333335, ans=0.125 2023-11-27 06:15:28,158 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3765473.3333333335, ans=0.125 2023-11-27 06:15:32,998 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3765473.3333333335, ans=0.1 2023-11-27 06:15:34,554 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=8.07 vs. limit=15.0 2023-11-27 06:15:40,831 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.747e+01 8.890e+01 9.642e+01 1.040e+02 1.676e+02, threshold=1.928e+02, percent-clipped=0.0 2023-11-27 06:15:41,116 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=3765540.0, ans=0.125 2023-11-27 06:15:49,522 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=3765606.6666666665, ans=0.125 2023-11-27 06:15:53,674 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=3765606.6666666665, ans=0.0 2023-11-27 06:15:53,751 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3765606.6666666665, ans=0.125 2023-11-27 06:15:58,272 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 564850 2023-11-27 06:16:01,363 INFO [train_asr.py:1235] (3/4) Epoch 47, batch 11750, loss[loss=0.0752, simple_loss=0.1069, pruned_loss=0.01476, audio_tagging_loss=0.006973, over 14297.00 frames. ], tot_loss[loss=0.06521, simple_loss=0.08934, pruned_loss=0.01197, audio_tagging_loss=0.008575, over 3045236.06 frames. 
], batch size: 53, lr: 1.43e-03, grad_scale: 16.0 2023-11-27 06:16:12,016 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3765740.0, ans=0.0 2023-11-27 06:16:18,982 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3765740.0, ans=0.125 2023-11-27 06:16:20,165 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=3765740.0, ans=0.2 2023-11-27 06:16:26,885 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten1.whitening_limit, batch_count=3765806.6666666665, ans=10.0 2023-11-27 06:16:50,582 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=3765940.0, ans=0.1 2023-11-27 06:16:54,070 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 564900 2023-11-27 06:16:57,175 INFO [train_asr.py:1235] (3/4) Epoch 47, batch 11800, loss[loss=0.07356, simple_loss=0.08587, pruned_loss=0.01732, audio_tagging_loss=0.0133, over 14521.00 frames. ], tot_loss[loss=0.06477, simple_loss=0.08851, pruned_loss=0.01188, audio_tagging_loss=0.008635, over 3046289.39 frames. ], batch size: 54, lr: 1.43e-03, grad_scale: 16.0 2023-11-27 06:17:16,942 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=3766073.3333333335, ans=0.2 2023-11-27 06:17:31,474 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 8.000e+01 8.890e+01 9.734e+01 1.045e+02 1.276e+02, threshold=1.947e+02, percent-clipped=0.0 2023-11-27 06:17:37,682 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=3766206.6666666665, ans=0.04949747468305833 2023-11-27 06:17:37,953 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=8.97 vs. limit=10.0 2023-11-27 06:17:49,606 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 564950 2023-11-27 06:17:52,738 INFO [train_asr.py:1235] (3/4) Epoch 47, batch 11850, loss[loss=0.07555, simple_loss=0.1056, pruned_loss=0.01219, audio_tagging_loss=0.01057, over 15146.00 frames. ], tot_loss[loss=0.06475, simple_loss=0.08819, pruned_loss=0.01196, audio_tagging_loss=0.008698, over 3040634.60 frames. ], batch size: 59, lr: 1.43e-03, grad_scale: 16.0 2023-11-27 06:17:54,335 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=12.14 vs. 
limit=15.0 2023-11-27 06:18:03,615 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=3766406.6666666665, ans=0.09899494936611666 2023-11-27 06:18:05,748 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=3766406.6666666665, ans=0.125 2023-11-27 06:18:08,379 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3766406.6666666665, ans=0.1 2023-11-27 06:18:35,267 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=3766540.0, ans=0.0 2023-11-27 06:18:41,812 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.min_positive, batch_count=3766606.6666666665, ans=0.05 2023-11-27 06:18:44,753 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 565000 2023-11-27 06:18:48,173 INFO [train_asr.py:1235] (3/4) Epoch 47, batch 11900, loss[loss=0.05146, simple_loss=0.07375, pruned_loss=0.005182, audio_tagging_loss=0.009405, over 16006.00 frames. ], tot_loss[loss=0.06488, simple_loss=0.08866, pruned_loss=0.01184, audio_tagging_loss=0.008707, over 3041732.09 frames. ], batch size: 59, lr: 1.43e-03, grad_scale: 16.0 2023-11-27 06:18:50,513 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-27 06:19:12,269 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten2.whitening_limit, batch_count=3766806.6666666665, ans=15.0 2023-11-27 06:19:23,224 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.850e+01 8.781e+01 9.449e+01 1.024e+02 1.260e+02, threshold=1.890e+02, percent-clipped=0.0 2023-11-27 06:19:41,259 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 565050 2023-11-27 06:19:44,894 INFO [train_asr.py:1235] (3/4) Epoch 47, batch 11950, loss[loss=0.03353, simple_loss=0.04118, pruned_loss=0.003859, audio_tagging_loss=0.009079, over 16319.00 frames. ], tot_loss[loss=0.06454, simple_loss=0.08789, pruned_loss=0.01169, audio_tagging_loss=0.008909, over 3040954.73 frames. ], batch size: 64, lr: 1.43e-03, grad_scale: 16.0 2023-11-27 06:19:55,104 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3767073.3333333335, ans=0.1 2023-11-27 06:19:59,672 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=13.87 vs. 
limit=22.5 2023-11-27 06:20:04,657 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3767073.3333333335, ans=0.125 2023-11-27 06:20:07,816 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=3767140.0, ans=0.125 2023-11-27 06:20:18,191 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=3767206.6666666665, ans=0.0 2023-11-27 06:20:26,827 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3767273.3333333335, ans=0.125 2023-11-27 06:20:27,894 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.min_abs, batch_count=3767273.3333333335, ans=0.5 2023-11-27 06:20:34,554 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3767273.3333333335, ans=0.1 2023-11-27 06:20:35,351 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 565100 2023-11-27 06:20:36,569 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-11-27 06:20:38,327 INFO [train_asr.py:1235] (3/4) Epoch 47, batch 12000, loss[loss=0.08877, simple_loss=0.1243, pruned_loss=0.0202, audio_tagging_loss=0.0064, over 15742.00 frames. ], tot_loss[loss=0.06483, simple_loss=0.08818, pruned_loss=0.01182, audio_tagging_loss=0.008925, over 3044434.10 frames. ], batch size: 58, lr: 1.43e-03, grad_scale: 32.0 2023-11-27 06:20:38,327 INFO [train_asr.py:1258] (3/4) Computing validation loss 2023-11-27 06:21:10,514 INFO [train_asr.py:1267] (3/4) Epoch 47, validation: loss=0.0578, simple_loss=0.05045, pruned_loss=0.005285, audio_tagging_loss=0.02729, over 4681554.00 frames. 2023-11-27 06:21:10,515 INFO [train_asr.py:1268] (3/4) Maximum memory allocated so far is 24894MB 2023-11-27 06:21:30,185 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=6.29 vs. limit=12.0 2023-11-27 06:22:01,497 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=3767493.3333333335, ans=0.125 2023-11-27 06:22:02,645 INFO [train_asr.py:1235] (3/4) Epoch 48, batch 0, loss[loss=0.06501, simple_loss=0.07069, pruned_loss=0.009055, audio_tagging_loss=0.02061, over 15043.00 frames. ], tot_loss[loss=0.06501, simple_loss=0.07069, pruned_loss=0.009055, audio_tagging_loss=0.02061, over 15043.00 frames. ], batch size: 60, lr: 1.41e-03, grad_scale: 32.0 2023-11-27 06:22:02,646 INFO [train_asr.py:1258] (3/4) Computing validation loss 2023-11-27 06:22:33,981 INFO [train_asr.py:1267] (3/4) Epoch 48, validation: loss=0.05791, simple_loss=0.05045, pruned_loss=0.005281, audio_tagging_loss=0.02741, over 4681554.00 frames. 2023-11-27 06:22:33,982 INFO [train_asr.py:1268] (3/4) Maximum memory allocated so far is 24894MB 2023-11-27 06:22:39,346 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=9.92 vs. 
limit=15.0 2023-11-27 06:22:43,366 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.387e+01 9.223e+01 9.944e+01 1.084e+02 1.467e+02, threshold=1.989e+02, percent-clipped=0.0 2023-11-27 06:22:52,658 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3767560.0, ans=0.125 2023-11-27 06:23:00,872 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 565150 2023-11-27 06:23:08,428 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=3767693.3333333335, ans=0.0 2023-11-27 06:23:22,257 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=3767760.0, ans=0.125 2023-11-27 06:23:29,995 INFO [train_asr.py:1235] (3/4) Epoch 48, batch 50, loss[loss=0.07634, simple_loss=0.1008, pruned_loss=0.0126, audio_tagging_loss=0.01335, over 15226.00 frames. ], tot_loss[loss=0.07391, simple_loss=0.09014, pruned_loss=0.01244, audio_tagging_loss=0.0164, over 683062.57 frames. ], batch size: 56, lr: 1.41e-03, grad_scale: 16.0 2023-11-27 06:23:56,516 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 565200 2023-11-27 06:24:20,909 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.max_abs, batch_count=3768093.3333333335, ans=10.0 2023-11-27 06:24:24,112 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=3768093.3333333335, ans=0.0 2023-11-27 06:24:26,093 INFO [train_asr.py:1235] (3/4) Epoch 48, batch 100, loss[loss=0.06307, simple_loss=0.08225, pruned_loss=0.009946, audio_tagging_loss=0.012, over 16767.00 frames. ], tot_loss[loss=0.07301, simple_loss=0.09085, pruned_loss=0.01196, audio_tagging_loss=0.01563, over 1208161.55 frames. ], batch size: 62, lr: 1.41e-03, grad_scale: 16.0 2023-11-27 06:24:28,358 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=3768160.0, ans=0.125 2023-11-27 06:24:32,659 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3768160.0, ans=0.0 2023-11-27 06:24:35,731 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 8.383e+01 1.012e+02 1.072e+02 1.151e+02 1.382e+02, threshold=2.144e+02, percent-clipped=0.0 2023-11-27 06:24:45,014 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=3768226.6666666665, ans=0.2 2023-11-27 06:24:47,078 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=3768293.3333333335, ans=0.125 2023-11-27 06:24:53,264 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 565250 2023-11-27 06:25:02,598 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3768360.0, ans=0.125 2023-11-27 06:25:12,140 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=3768426.6666666665, ans=0.0 2023-11-27 06:25:14,734 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.1.conv_module2.whiten, num_groups=1, num_channels=192, metric=11.61 vs. 
limit=15.0 2023-11-27 06:25:21,915 INFO [train_asr.py:1235] (3/4) Epoch 48, batch 150, loss[loss=0.08536, simple_loss=0.1075, pruned_loss=0.02276, audio_tagging_loss=0.008838, over 15386.00 frames. ], tot_loss[loss=0.07205, simple_loss=0.09153, pruned_loss=0.01217, audio_tagging_loss=0.01412, over 1617902.29 frames. ], batch size: 56, lr: 1.41e-03, grad_scale: 16.0 2023-11-27 06:25:24,340 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=3768493.3333333335, ans=0.2 2023-11-27 06:25:27,455 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.min_positive, batch_count=3768493.3333333335, ans=0.025 2023-11-27 06:25:32,100 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=3768560.0, ans=0.2 2023-11-27 06:25:35,079 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=14.36 vs. limit=22.5 2023-11-27 06:25:35,859 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=3768560.0, ans=0.125 2023-11-27 06:25:38,336 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=12.37 vs. limit=15.0 2023-11-27 06:25:39,787 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3768560.0, ans=0.125 2023-11-27 06:25:44,990 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3768626.6666666665, ans=0.125 2023-11-27 06:25:47,165 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3768626.6666666665, ans=0.0 2023-11-27 06:25:49,129 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 565300 2023-11-27 06:26:18,089 INFO [train_asr.py:1235] (3/4) Epoch 48, batch 200, loss[loss=0.06226, simple_loss=0.08434, pruned_loss=0.01148, audio_tagging_loss=0.008603, over 16282.00 frames. ], tot_loss[loss=0.06956, simple_loss=0.08997, pruned_loss=0.01206, audio_tagging_loss=0.01252, over 1940456.98 frames. ], batch size: 61, lr: 1.41e-03, grad_scale: 16.0 2023-11-27 06:26:28,100 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.314e+01 9.239e+01 9.831e+01 1.046e+02 1.283e+02, threshold=1.966e+02, percent-clipped=0.0 2023-11-27 06:26:31,660 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=3768893.3333333335, ans=0.125 2023-11-27 06:26:33,792 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=3768893.3333333335, ans=0.0 2023-11-27 06:26:44,275 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 565350 2023-11-27 06:26:44,722 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=512, metric=7.65 vs. limit=15.0 2023-11-27 06:26:50,382 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=13.01 vs. 
limit=15.0 2023-11-27 06:27:02,761 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3769093.3333333335, ans=0.125 2023-11-27 06:27:06,493 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=3769093.3333333335, ans=0.125 2023-11-27 06:27:06,526 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=3769093.3333333335, ans=0.0 2023-11-27 06:27:13,803 INFO [train_asr.py:1235] (3/4) Epoch 48, batch 250, loss[loss=0.07098, simple_loss=0.09386, pruned_loss=0.01363, audio_tagging_loss=0.01041, over 14478.00 frames. ], tot_loss[loss=0.06796, simple_loss=0.08879, pruned_loss=0.01205, audio_tagging_loss=0.01151, over 2190972.07 frames. ], batch size: 55, lr: 1.41e-03, grad_scale: 16.0 2023-11-27 06:27:16,137 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=3769160.0, ans=0.2 2023-11-27 06:27:35,607 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=3769293.3333333335, ans=0.0 2023-11-27 06:27:40,219 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 565400 2023-11-27 06:27:45,505 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-27 06:27:47,234 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3769360.0, ans=0.1 2023-11-27 06:27:48,374 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3769360.0, ans=0.1 2023-11-27 06:27:49,698 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.whiten, num_groups=1, num_channels=512, metric=8.90 vs. limit=12.0 2023-11-27 06:28:09,441 INFO [train_asr.py:1235] (3/4) Epoch 48, batch 300, loss[loss=0.06988, simple_loss=0.09834, pruned_loss=0.01121, audio_tagging_loss=0.009495, over 14407.00 frames. ], tot_loss[loss=0.06758, simple_loss=0.08944, pruned_loss=0.01222, audio_tagging_loss=0.01064, over 2378854.76 frames. ], batch size: 53, lr: 1.41e-03, grad_scale: 8.0 2023-11-27 06:28:20,486 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.290e+01 9.040e+01 9.670e+01 1.035e+02 1.237e+02, threshold=1.934e+02, percent-clipped=0.0 2023-11-27 06:28:37,043 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 565450 2023-11-27 06:28:51,053 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=3769693.3333333335, ans=0.0 2023-11-27 06:29:05,745 INFO [train_asr.py:1235] (3/4) Epoch 48, batch 350, loss[loss=0.04953, simple_loss=0.06912, pruned_loss=0.006055, audio_tagging_loss=0.008916, over 15714.00 frames. ], tot_loss[loss=0.06679, simple_loss=0.08942, pruned_loss=0.01197, audio_tagging_loss=0.01012, over 2528406.44 frames. 
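
The frame counts in the tot_loss fields at the start of epoch 48 grow by about 525k over batches 50-100 but only about 188k over batches 250-300, which is consistent with an exponentially decayed running sum rather than a plain cumulative count (a decayed sum saturates near frames_per_batch / (1 - decay), matching the ~3.0M counts seen late in epoch 47). A sketch follows; the 0.995 decay and the per-batch frame count are estimates fitted to the logged numbers, not taken from the code:

    # Decayed running sum behind "tot_loss[... over N frames]": old statistics
    # are scaled by `decay` each batch before the new batch is added.
    # decay=0.995 and frames_per_batch=15400 are fitted, not from the code.
    def running_frames(frames_per_batch, num_batches, decay=0.995):
        tot = 0.0
        for _ in range(num_batches):
            tot = decay * tot + frames_per_batch
        return tot

    for n in (50, 100, 150):
        print(n, round(running_frames(15400, n)))  # ~683k, ~1.21M, ~1.63M, cf. the log
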
], batch size: 59, lr: 1.41e-03, grad_scale: 8.0 2023-11-27 06:29:05,975 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3769826.6666666665, ans=0.125 2023-11-27 06:29:32,258 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 565500 2023-11-27 06:29:42,413 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.max_abs, batch_count=3770026.6666666665, ans=10.0 2023-11-27 06:29:54,143 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=3770093.3333333335, ans=0.0 2023-11-27 06:29:54,210 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=3770093.3333333335, ans=0.0 2023-11-27 06:30:01,395 INFO [train_asr.py:1235] (3/4) Epoch 48, batch 400, loss[loss=0.07552, simple_loss=0.1079, pruned_loss=0.01246, audio_tagging_loss=0.009122, over 15769.00 frames. ], tot_loss[loss=0.06599, simple_loss=0.08899, pruned_loss=0.01178, audio_tagging_loss=0.009712, over 2646817.10 frames. ], batch size: 57, lr: 1.41e-03, grad_scale: 16.0 2023-11-27 06:30:11,977 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.578e+01 8.851e+01 9.492e+01 1.029e+02 1.198e+02, threshold=1.898e+02, percent-clipped=0.0 2023-11-27 06:30:16,363 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=3770226.6666666665, ans=0.0 2023-11-27 06:30:26,739 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3770293.3333333335, ans=0.125 2023-11-27 06:30:27,500 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 565550 2023-11-27 06:30:56,995 INFO [train_asr.py:1235] (3/4) Epoch 48, batch 450, loss[loss=0.06932, simple_loss=0.0989, pruned_loss=0.01242, audio_tagging_loss=0.007456, over 15272.00 frames. ], tot_loss[loss=0.06609, simple_loss=0.08947, pruned_loss=0.01193, audio_tagging_loss=0.009427, over 2733429.97 frames. ], batch size: 55, lr: 1.41e-03, grad_scale: 16.0 2023-11-27 06:31:05,332 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=11.58 vs. limit=15.0 2023-11-27 06:31:10,408 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=3770560.0, ans=0.0 2023-11-27 06:31:22,963 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=512, metric=7.35 vs. limit=15.0 2023-11-27 06:31:24,672 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 565600 2023-11-27 06:31:24,833 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=3770626.6666666665, ans=0.0 2023-11-27 06:31:31,802 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=9.85 vs. limit=15.0 2023-11-27 06:31:44,328 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=13.25 vs. limit=15.0 2023-11-27 06:31:53,041 INFO [train_asr.py:1235] (3/4) Epoch 48, batch 500, loss[loss=0.07179, simple_loss=0.1006, pruned_loss=0.01237, audio_tagging_loss=0.009128, over 15424.00 frames. 
], tot_loss[loss=0.0655, simple_loss=0.08879, pruned_loss=0.01184, audio_tagging_loss=0.00926, over 2799016.24 frames. ], batch size: 59, lr: 1.41e-03, grad_scale: 16.0 2023-11-27 06:32:00,050 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3770826.6666666665, ans=0.125 2023-11-27 06:32:05,262 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.449e+01 9.003e+01 9.804e+01 1.033e+02 1.335e+02, threshold=1.961e+02, percent-clipped=0.0 2023-11-27 06:32:07,649 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=3770893.3333333335, ans=0.125 2023-11-27 06:32:19,441 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=13.26 vs. limit=15.0 2023-11-27 06:32:20,175 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 565650 2023-11-27 06:32:23,425 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3770960.0, ans=0.1 2023-11-27 06:32:41,761 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3771093.3333333335, ans=0.125 2023-11-27 06:32:50,024 INFO [train_asr.py:1235] (3/4) Epoch 48, batch 550, loss[loss=0.06226, simple_loss=0.08014, pruned_loss=0.009989, audio_tagging_loss=0.0122, over 15634.00 frames. ], tot_loss[loss=0.06574, simple_loss=0.08952, pruned_loss=0.01188, audio_tagging_loss=0.009101, over 2848386.17 frames. ], batch size: 60, lr: 1.41e-03, grad_scale: 16.0 2023-11-27 06:33:16,277 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 565700 2023-11-27 06:33:29,233 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=3771360.0, ans=0.125 2023-11-27 06:33:36,186 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-27 06:33:44,893 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=9.84 vs. limit=15.0 2023-11-27 06:33:45,511 INFO [train_asr.py:1235] (3/4) Epoch 48, batch 600, loss[loss=0.05523, simple_loss=0.07713, pruned_loss=0.009142, audio_tagging_loss=0.007521, over 15066.00 frames. ], tot_loss[loss=0.06559, simple_loss=0.08936, pruned_loss=0.01186, audio_tagging_loss=0.009042, over 2890052.17 frames. ], batch size: 55, lr: 1.41e-03, grad_scale: 16.0 2023-11-27 06:33:51,611 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=3771493.3333333335, ans=0.125 2023-11-27 06:33:56,616 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.401e+01 8.995e+01 9.614e+01 1.020e+02 1.289e+02, threshold=1.923e+02, percent-clipped=0.0 2023-11-27 06:34:09,630 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-27 06:34:12,576 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 565750 2023-11-27 06:34:17,867 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=6.83 vs. 
limit=12.0 2023-11-27 06:34:41,268 INFO [train_asr.py:1235] (3/4) Epoch 48, batch 650, loss[loss=0.0532, simple_loss=0.0679, pruned_loss=0.01078, audio_tagging_loss=0.008469, over 15953.00 frames. ], tot_loss[loss=0.06553, simple_loss=0.08937, pruned_loss=0.0119, audio_tagging_loss=0.008948, over 2925197.36 frames. ], batch size: 58, lr: 1.41e-03, grad_scale: 16.0 2023-11-27 06:34:52,064 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-27 06:34:58,453 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=3771893.3333333335, ans=0.125 2023-11-27 06:35:08,429 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 565800 2023-11-27 06:35:16,766 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3772026.6666666665, ans=0.125 2023-11-27 06:35:26,452 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3772093.3333333335, ans=0.0 2023-11-27 06:35:37,984 INFO [train_asr.py:1235] (3/4) Epoch 48, batch 700, loss[loss=0.04588, simple_loss=0.06399, pruned_loss=0.007727, audio_tagging_loss=0.00616, over 15396.00 frames. ], tot_loss[loss=0.06508, simple_loss=0.08881, pruned_loss=0.01175, audio_tagging_loss=0.008923, over 2948062.88 frames. ], batch size: 58, lr: 1.41e-03, grad_scale: 16.0 2023-11-27 06:35:46,568 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=9.68 vs. limit=15.0 2023-11-27 06:35:49,135 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 6.837e+01 8.967e+01 9.614e+01 1.031e+02 1.404e+02, threshold=1.923e+02, percent-clipped=0.0 2023-11-27 06:35:54,612 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.min_positive, batch_count=3772226.6666666665, ans=0.05 2023-11-27 06:35:56,943 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3772226.6666666665, ans=0.1 2023-11-27 06:36:04,698 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 565850 2023-11-27 06:36:13,356 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=10.50 vs. limit=22.5 2023-11-27 06:36:18,733 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3772360.0, ans=0.1 2023-11-27 06:36:28,946 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3772426.6666666665, ans=0.125 2023-11-27 06:36:33,925 INFO [train_asr.py:1235] (3/4) Epoch 48, batch 750, loss[loss=0.06795, simple_loss=0.09188, pruned_loss=0.01223, audio_tagging_loss=0.009784, over 14388.00 frames. ], tot_loss[loss=0.06495, simple_loss=0.08869, pruned_loss=0.01172, audio_tagging_loss=0.008885, over 2970179.88 frames. 
], batch size: 53, lr: 1.41e-03, grad_scale: 16.0 2023-11-27 06:36:56,240 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=3772626.6666666665, ans=0.125 2023-11-27 06:37:01,094 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 565900 2023-11-27 06:37:16,671 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=3772693.3333333335, ans=0.125 2023-11-27 06:37:18,694 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=3772760.0, ans=0.125 2023-11-27 06:37:19,754 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3772760.0, ans=0.0 2023-11-27 06:37:29,623 INFO [train_asr.py:1235] (3/4) Epoch 48, batch 800, loss[loss=0.06439, simple_loss=0.09182, pruned_loss=0.009691, audio_tagging_loss=0.008793, over 15698.00 frames. ], tot_loss[loss=0.06544, simple_loss=0.08949, pruned_loss=0.01188, audio_tagging_loss=0.008812, over 2991376.92 frames. ], batch size: 58, lr: 1.41e-03, grad_scale: 32.0 2023-11-27 06:37:40,742 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.719e+01 9.165e+01 9.801e+01 1.067e+02 1.276e+02, threshold=1.960e+02, percent-clipped=0.0 2023-11-27 06:37:52,047 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=3772960.0, ans=0.04949747468305833 2023-11-27 06:37:56,674 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 565950 2023-11-27 06:38:05,404 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3773026.6666666665, ans=0.1 2023-11-27 06:38:12,362 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3773026.6666666665, ans=0.1 2023-11-27 06:38:12,429 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=3773026.6666666665, ans=0.125 2023-11-27 06:38:26,119 INFO [train_asr.py:1235] (3/4) Epoch 48, batch 850, loss[loss=0.06325, simple_loss=0.08751, pruned_loss=0.01119, audio_tagging_loss=0.008314, over 15035.00 frames. ], tot_loss[loss=0.06568, simple_loss=0.08997, pruned_loss=0.0119, audio_tagging_loss=0.008797, over 3005814.83 frames. ], batch size: 57, lr: 1.41e-03, grad_scale: 16.0 2023-11-27 06:38:34,076 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=8.53 vs. limit=15.0 2023-11-27 06:38:51,673 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=3773293.3333333335, ans=0.0 2023-11-27 06:38:52,612 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 566000 2023-11-27 06:38:55,095 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=3773293.3333333335, ans=0.07 2023-11-27 06:39:01,121 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=7.34 vs. 
limit=15.0 2023-11-27 06:39:04,191 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.min_positive, batch_count=3773360.0, ans=0.025 2023-11-27 06:39:06,926 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3773360.0, ans=0.125 2023-11-27 06:39:12,286 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=3773426.6666666665, ans=0.0 2023-11-27 06:39:18,741 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=3773426.6666666665, ans=0.0 2023-11-27 06:39:20,865 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3773493.3333333335, ans=0.0 2023-11-27 06:39:21,793 INFO [train_asr.py:1235] (3/4) Epoch 48, batch 900, loss[loss=0.1112, simple_loss=0.1712, pruned_loss=0.02118, audio_tagging_loss=0.004435, over 16182.00 frames. ], tot_loss[loss=0.06578, simple_loss=0.09019, pruned_loss=0.01185, audio_tagging_loss=0.008837, over 3015767.84 frames. ], batch size: 54, lr: 1.41e-03, grad_scale: 16.0 2023-11-27 06:39:24,654 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=3773493.3333333335, ans=0.2 2023-11-27 06:39:34,546 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.653e+01 8.912e+01 9.588e+01 1.041e+02 1.300e+02, threshold=1.918e+02, percent-clipped=0.0 2023-11-27 06:39:49,505 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 566050 2023-11-27 06:39:58,162 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=3773693.3333333335, ans=0.0 2023-11-27 06:39:59,110 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.min_positive, batch_count=3773693.3333333335, ans=0.025 2023-11-27 06:40:11,440 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=3773760.0, ans=0.125 2023-11-27 06:40:18,033 INFO [train_asr.py:1235] (3/4) Epoch 48, batch 950, loss[loss=0.06148, simple_loss=0.09031, pruned_loss=0.008762, audio_tagging_loss=0.00756, over 15687.00 frames. ], tot_loss[loss=0.06547, simple_loss=0.08977, pruned_loss=0.01181, audio_tagging_loss=0.008773, over 3022929.65 frames. ], batch size: 56, lr: 1.41e-03, grad_scale: 16.0 2023-11-27 06:40:30,545 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=3773893.3333333335, ans=0.0 2023-11-27 06:40:32,269 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=3773893.3333333335, ans=0.125 2023-11-27 06:40:43,886 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3773960.0, ans=0.125 2023-11-27 06:40:44,814 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 566100 2023-11-27 06:40:45,296 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=14.46 vs. 
limit=22.5 2023-11-27 06:40:47,132 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3773960.0, ans=0.0 2023-11-27 06:41:02,110 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-27 06:41:09,041 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3774093.3333333335, ans=0.1 2023-11-27 06:41:14,692 INFO [train_asr.py:1235] (3/4) Epoch 48, batch 1000, loss[loss=0.05335, simple_loss=0.06529, pruned_loss=0.01009, audio_tagging_loss=0.01061, over 14971.00 frames. ], tot_loss[loss=0.06467, simple_loss=0.08863, pruned_loss=0.0117, audio_tagging_loss=0.008654, over 3022047.21 frames. ], batch size: 56, lr: 1.41e-03, grad_scale: 16.0 2023-11-27 06:41:23,414 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=3774160.0, ans=0.125 2023-11-27 06:41:26,266 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.762e+01 9.089e+01 9.617e+01 1.045e+02 2.026e+02, threshold=1.923e+02, percent-clipped=1.0 2023-11-27 06:41:29,749 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3774226.6666666665, ans=0.125 2023-11-27 06:41:36,512 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3774293.3333333335, ans=0.125 2023-11-27 06:41:37,414 WARNING [train_asr.py:1481] (3/4) Exclude cut with ID unbalanced/5Y6u9AlD9S0_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-27 06:41:37,888 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=7.76 vs. limit=15.0 2023-11-27 06:41:39,188 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=3774293.3333333335, ans=0.125 2023-11-27 06:41:41,205 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 566150 2023-11-27 06:41:50,092 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=3.62 vs. limit=15.0 2023-11-27 06:42:03,857 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3774426.6666666665, ans=0.1 2023-11-27 06:42:10,052 INFO [train_asr.py:1235] (3/4) Epoch 48, batch 1050, loss[loss=0.06006, simple_loss=0.08001, pruned_loss=0.01209, audio_tagging_loss=0.007968, over 15300.00 frames. ], tot_loss[loss=0.06436, simple_loss=0.08828, pruned_loss=0.01165, audio_tagging_loss=0.00857, over 3036264.21 frames. 
], batch size: 56, lr: 1.41e-03, grad_scale: 16.0 2023-11-27 06:42:10,185 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.max_abs, batch_count=3774493.3333333335, ans=10.0 2023-11-27 06:42:10,224 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=3774493.3333333335, ans=0.04949747468305833 2023-11-27 06:42:20,288 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3774560.0, ans=0.125 2023-11-27 06:42:21,917 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3774560.0, ans=0.1 2023-11-27 06:42:22,051 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=5.32 vs. limit=12.0 2023-11-27 06:42:25,107 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3774560.0, ans=0.0 2023-11-27 06:42:37,070 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 566200 2023-11-27 06:42:42,995 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=3774693.3333333335, ans=0.09899494936611666 2023-11-27 06:42:47,270 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-11-27 06:43:00,043 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=3774760.0, ans=0.07 2023-11-27 06:43:00,122 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=3774760.0, ans=0.0 2023-11-27 06:43:06,157 INFO [train_asr.py:1235] (3/4) Epoch 48, batch 1100, loss[loss=0.0505, simple_loss=0.07305, pruned_loss=0.006002, audio_tagging_loss=0.007979, over 13991.00 frames. ], tot_loss[loss=0.06479, simple_loss=0.0893, pruned_loss=0.01166, audio_tagging_loss=0.008478, over 3036971.65 frames. ], batch size: 54, lr: 1.41e-03, grad_scale: 16.0 2023-11-27 06:43:07,425 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.min_positive, batch_count=3774826.6666666665, ans=0.05 2023-11-27 06:43:08,306 WARNING [train_asr.py:1481] (3/4) Exclude cut with ID unbalanced/AWHnJAqurec_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. 
Number of tokens: 24 2023-11-27 06:43:18,468 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.493e+01 8.812e+01 9.589e+01 1.029e+02 1.833e+02, threshold=1.918e+02, percent-clipped=0.0 2023-11-27 06:43:25,793 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3774893.3333333335, ans=0.0 2023-11-27 06:43:31,043 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3774960.0, ans=0.125 2023-11-27 06:43:33,087 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 566250 2023-11-27 06:43:44,022 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=4.67 vs. limit=15.0 2023-11-27 06:44:02,279 INFO [train_asr.py:1235] (3/4) Epoch 48, batch 1150, loss[loss=0.07109, simple_loss=0.09639, pruned_loss=0.01487, audio_tagging_loss=0.008024, over 15040.00 frames. ], tot_loss[loss=0.06467, simple_loss=0.089, pruned_loss=0.0117, audio_tagging_loss=0.008476, over 3037958.27 frames. ], batch size: 56, lr: 1.41e-03, grad_scale: 16.0 2023-11-27 06:44:05,133 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3775160.0, ans=0.125 2023-11-27 06:44:09,491 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=3775160.0, ans=0.125 2023-11-27 06:44:25,467 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-27 06:44:27,634 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=3775293.3333333335, ans=0.125 2023-11-27 06:44:28,487 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 566300 2023-11-27 06:44:57,787 INFO [train_asr.py:1235] (3/4) Epoch 48, batch 1200, loss[loss=0.05889, simple_loss=0.0759, pruned_loss=0.01145, audio_tagging_loss=0.009494, over 14901.00 frames. ], tot_loss[loss=0.06541, simple_loss=0.08992, pruned_loss=0.01199, audio_tagging_loss=0.008457, over 3037559.91 frames. ], batch size: 56, lr: 1.41e-03, grad_scale: 32.0 2023-11-27 06:44:59,012 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=3775493.3333333335, ans=0.2 2023-11-27 06:45:09,565 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 8.024e+01 8.861e+01 9.720e+01 1.059e+02 1.366e+02, threshold=1.944e+02, percent-clipped=0.0 2023-11-27 06:45:24,964 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 566350 2023-11-27 06:45:35,094 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=3775693.3333333335, ans=0.0 2023-11-27 06:45:37,367 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3775693.3333333335, ans=0.125 2023-11-27 06:45:41,953 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=1.83 vs. 
limit=6.0 2023-11-27 06:45:43,905 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=3775760.0, ans=0.125 2023-11-27 06:45:53,306 INFO [train_asr.py:1235] (3/4) Epoch 48, batch 1250, loss[loss=0.086, simple_loss=0.1296, pruned_loss=0.01389, audio_tagging_loss=0.007322, over 15555.00 frames. ], tot_loss[loss=0.06441, simple_loss=0.08852, pruned_loss=0.01172, audio_tagging_loss=0.008428, over 3030731.97 frames. ], batch size: 55, lr: 1.41e-03, grad_scale: 32.0 2023-11-27 06:46:04,203 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3775893.3333333335, ans=0.125 2023-11-27 06:46:15,204 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=8.03 vs. limit=10.0 2023-11-27 06:46:15,768 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=3775960.0, ans=0.0 2023-11-27 06:46:21,020 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 566400 2023-11-27 06:46:21,204 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-11-27 06:46:23,591 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-11-27 06:46:23,649 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=3775960.0, ans=0.125 2023-11-27 06:46:24,669 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3775960.0, ans=0.1 2023-11-27 06:46:38,916 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3776093.3333333335, ans=0.1 2023-11-27 06:46:50,502 INFO [train_asr.py:1235] (3/4) Epoch 48, batch 1300, loss[loss=0.05402, simple_loss=0.0788, pruned_loss=0.008088, audio_tagging_loss=0.006528, over 15710.00 frames. ], tot_loss[loss=0.06351, simple_loss=0.08705, pruned_loss=0.01148, audio_tagging_loss=0.008503, over 3032272.60 frames. 
], batch size: 60, lr: 1.41e-03, grad_scale: 32.0 2023-11-27 06:46:51,727 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3776160.0, ans=0.125 2023-11-27 06:47:02,754 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.604e+01 8.746e+01 9.407e+01 9.901e+01 1.217e+02, threshold=1.881e+02, percent-clipped=0.0 2023-11-27 06:47:07,368 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=3776226.6666666665, ans=0.2 2023-11-27 06:47:10,415 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=3776226.6666666665, ans=0.125 2023-11-27 06:47:10,483 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3776226.6666666665, ans=0.1 2023-11-27 06:47:16,627 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 566450 2023-11-27 06:47:30,534 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=3776360.0, ans=0.0 2023-11-27 06:47:41,681 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=10.27 vs. limit=15.0 2023-11-27 06:47:45,428 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=3776493.3333333335, ans=0.2 2023-11-27 06:47:46,226 INFO [train_asr.py:1235] (3/4) Epoch 48, batch 1350, loss[loss=0.07193, simple_loss=0.1001, pruned_loss=0.01269, audio_tagging_loss=0.009185, over 14629.00 frames. ], tot_loss[loss=0.06417, simple_loss=0.08801, pruned_loss=0.01169, audio_tagging_loss=0.008472, over 3023098.11 frames. ], batch size: 55, lr: 1.41e-03, grad_scale: 16.0 2023-11-27 06:47:54,182 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=5.50 vs. limit=15.0 2023-11-27 06:47:59,642 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=5.10 vs. limit=12.0 2023-11-27 06:48:11,943 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3776626.6666666665, ans=0.125 2023-11-27 06:48:12,896 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 566500 2023-11-27 06:48:26,727 WARNING [train_asr.py:1481] (3/4) Exclude cut with ID unbalanced/XdmbboqRBmQ_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. 
Number of tokens: 24 2023-11-27 06:48:27,988 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=3776693.3333333335, ans=0.125 2023-11-27 06:48:36,643 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3776760.0, ans=0.0 2023-11-27 06:48:37,631 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=3776760.0, ans=0.0 2023-11-27 06:48:41,663 INFO [train_asr.py:1235] (3/4) Epoch 48, batch 1400, loss[loss=0.05686, simple_loss=0.07111, pruned_loss=0.009665, audio_tagging_loss=0.01164, over 16457.00 frames. ], tot_loss[loss=0.06355, simple_loss=0.08696, pruned_loss=0.0115, audio_tagging_loss=0.008572, over 3028011.30 frames. ], batch size: 61, lr: 1.41e-03, grad_scale: 16.0 2023-11-27 06:48:53,258 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3776893.3333333335, ans=0.125 2023-11-27 06:48:55,604 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.486e+01 8.911e+01 9.491e+01 1.013e+02 1.381e+02, threshold=1.898e+02, percent-clipped=0.0 2023-11-27 06:48:58,539 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=3776893.3333333335, ans=0.0 2023-11-27 06:49:09,732 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 566550 2023-11-27 06:49:17,709 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=12.65 vs. limit=15.0 2023-11-27 06:49:35,847 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=3777093.3333333335, ans=0.125 2023-11-27 06:49:38,317 INFO [train_asr.py:1235] (3/4) Epoch 48, batch 1450, loss[loss=0.05863, simple_loss=0.07787, pruned_loss=0.009739, audio_tagging_loss=0.009958, over 15891.00 frames. ], tot_loss[loss=0.06491, simple_loss=0.08863, pruned_loss=0.01196, audio_tagging_loss=0.008632, over 3025878.06 frames. 
], batch size: 63, lr: 1.41e-03, grad_scale: 16.0 2023-11-27 06:49:41,243 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=3777160.0, ans=0.125 2023-11-27 06:49:54,469 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3777226.6666666665, ans=0.125 2023-11-27 06:49:59,837 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=3777293.3333333335, ans=0.125 2023-11-27 06:50:05,180 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 566600 2023-11-27 06:50:16,630 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3777360.0, ans=0.125 2023-11-27 06:50:17,767 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3777360.0, ans=0.125 2023-11-27 06:50:31,539 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3777426.6666666665, ans=0.125 2023-11-27 06:50:34,477 INFO [train_asr.py:1235] (3/4) Epoch 48, batch 1500, loss[loss=0.04235, simple_loss=0.0538, pruned_loss=0.006245, audio_tagging_loss=0.009199, over 14554.00 frames. ], tot_loss[loss=0.06489, simple_loss=0.08834, pruned_loss=0.01198, audio_tagging_loss=0.008743, over 3030880.51 frames. ], batch size: 57, lr: 1.41e-03, grad_scale: 16.0 2023-11-27 06:50:37,887 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=3777493.3333333335, ans=0.125 2023-11-27 06:50:39,218 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=512, metric=16.25 vs. limit=22.5 2023-11-27 06:50:41,287 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.58 vs. limit=6.0 2023-11-27 06:50:47,153 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.689e+01 9.111e+01 9.628e+01 1.038e+02 1.478e+02, threshold=1.926e+02, percent-clipped=0.0 2023-11-27 06:51:00,539 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 566650 2023-11-27 06:51:01,084 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=6.96 vs. limit=15.0 2023-11-27 06:51:01,224 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.1.conv_module1.whiten, num_groups=1, num_channels=192, metric=13.03 vs. limit=15.0 2023-11-27 06:51:01,744 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3777626.6666666665, ans=0.125 2023-11-27 06:51:10,592 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=512, metric=4.15 vs. 
limit=15.0 2023-11-27 06:51:12,886 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=3777693.3333333335, ans=0.125 2023-11-27 06:51:17,272 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3777693.3333333335, ans=0.125 2023-11-27 06:51:18,289 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=3777760.0, ans=0.125 2023-11-27 06:51:20,490 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=3777760.0, ans=0.0 2023-11-27 06:51:29,862 INFO [train_asr.py:1235] (3/4) Epoch 48, batch 1550, loss[loss=0.08213, simple_loss=0.1165, pruned_loss=0.01786, audio_tagging_loss=0.006002, over 15308.00 frames. ], tot_loss[loss=0.06577, simple_loss=0.08965, pruned_loss=0.01231, audio_tagging_loss=0.008636, over 3036213.35 frames. ], batch size: 55, lr: 1.41e-03, grad_scale: 16.0 2023-11-27 06:51:32,178 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3777826.6666666665, ans=0.125 2023-11-27 06:51:34,299 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3777826.6666666665, ans=0.1 2023-11-27 06:51:38,129 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3777826.6666666665, ans=0.1 2023-11-27 06:51:47,054 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=3777893.3333333335, ans=0.0 2023-11-27 06:51:57,622 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 566700 2023-11-27 06:52:07,449 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=3778026.6666666665, ans=0.0 2023-11-27 06:52:08,605 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.max_abs, batch_count=3778026.6666666665, ans=10.0 2023-11-27 06:52:25,360 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3778160.0, ans=0.1 2023-11-27 06:52:26,214 INFO [train_asr.py:1235] (3/4) Epoch 48, batch 1600, loss[loss=0.07016, simple_loss=0.096, pruned_loss=0.01305, audio_tagging_loss=0.009106, over 15023.00 frames. ], tot_loss[loss=0.0659, simple_loss=0.09, pruned_loss=0.01223, audio_tagging_loss=0.008666, over 3042930.80 frames. 
], batch size: 54, lr: 1.41e-03, grad_scale: 32.0 2023-11-27 06:52:35,111 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3778160.0, ans=0.125 2023-11-27 06:52:40,788 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.447e+01 9.269e+01 9.916e+01 1.064e+02 1.389e+02, threshold=1.983e+02, percent-clipped=0.0 2023-11-27 06:52:53,731 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 566750 2023-11-27 06:52:57,100 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=3778293.3333333335, ans=0.125 2023-11-27 06:53:06,349 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3778360.0, ans=0.125 2023-11-27 06:53:14,323 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3778426.6666666665, ans=0.125 2023-11-27 06:53:23,718 INFO [train_asr.py:1235] (3/4) Epoch 48, batch 1650, loss[loss=0.05205, simple_loss=0.07214, pruned_loss=0.006776, audio_tagging_loss=0.009204, over 15543.00 frames. ], tot_loss[loss=0.06549, simple_loss=0.08947, pruned_loss=0.01202, audio_tagging_loss=0.008732, over 3036131.44 frames. ], batch size: 59, lr: 1.41e-03, grad_scale: 16.0 2023-11-27 06:53:24,017 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=3778493.3333333335, ans=0.125 2023-11-27 06:53:32,388 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3778493.3333333335, ans=0.125 2023-11-27 06:53:43,308 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=3778560.0, ans=0.2 2023-11-27 06:53:50,108 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 566800 2023-11-27 06:54:19,585 INFO [train_asr.py:1235] (3/4) Epoch 48, batch 1700, loss[loss=0.06213, simple_loss=0.08616, pruned_loss=0.01191, audio_tagging_loss=0.00714, over 15591.00 frames. ], tot_loss[loss=0.06551, simple_loss=0.08959, pruned_loss=0.01203, audio_tagging_loss=0.008692, over 3038611.75 frames. ], batch size: 61, lr: 1.41e-03, grad_scale: 16.0 2023-11-27 06:54:27,348 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3778826.6666666665, ans=0.125 2023-11-27 06:54:27,364 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3778826.6666666665, ans=0.125 2023-11-27 06:54:34,161 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.403e+01 8.972e+01 9.443e+01 1.035e+02 1.327e+02, threshold=1.889e+02, percent-clipped=0.0 2023-11-27 06:54:38,209 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3778893.3333333335, ans=0.125 2023-11-27 06:54:40,329 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=3778893.3333333335, ans=0.0 2023-11-27 06:54:42,859 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=8.04 vs. 
limit=10.0 2023-11-27 06:54:47,285 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 566850 2023-11-27 06:54:51,890 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=4.85 vs. limit=15.0 2023-11-27 06:54:54,730 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.min_positive, batch_count=3779026.6666666665, ans=0.05 2023-11-27 06:54:59,139 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=3779026.6666666665, ans=0.0 2023-11-27 06:54:59,195 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=3779026.6666666665, ans=0.125 2023-11-27 06:55:05,622 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=3779093.3333333335, ans=0.04949747468305833 2023-11-27 06:55:15,408 INFO [train_asr.py:1235] (3/4) Epoch 48, batch 1750, loss[loss=0.05777, simple_loss=0.07792, pruned_loss=0.009532, audio_tagging_loss=0.009279, over 15921.00 frames. ], tot_loss[loss=0.06456, simple_loss=0.08803, pruned_loss=0.01182, audio_tagging_loss=0.00873, over 3039106.43 frames. ], batch size: 58, lr: 1.41e-03, grad_scale: 16.0 2023-11-27 06:55:17,697 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3779160.0, ans=0.1 2023-11-27 06:55:20,390 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=3779160.0, ans=0.0 2023-11-27 06:55:21,994 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.1.whiten, num_groups=1, num_channels=192, metric=3.68 vs. limit=12.0 2023-11-27 06:55:29,400 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=3779226.6666666665, ans=0.0 2023-11-27 06:55:42,620 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 566900 2023-11-27 06:56:05,925 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=14.59 vs. limit=22.5 2023-11-27 06:56:12,158 INFO [train_asr.py:1235] (3/4) Epoch 48, batch 1800, loss[loss=0.08485, simple_loss=0.1166, pruned_loss=0.02004, audio_tagging_loss=0.006506, over 15483.00 frames. ], tot_loss[loss=0.06511, simple_loss=0.08895, pruned_loss=0.01205, audio_tagging_loss=0.008589, over 3039742.93 frames. ], batch size: 56, lr: 1.41e-03, grad_scale: 16.0 2023-11-27 06:56:21,430 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=3779493.3333333335, ans=0.2 2023-11-27 06:56:26,453 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.562e+01 9.045e+01 9.662e+01 1.034e+02 1.361e+02, threshold=1.932e+02, percent-clipped=0.0 2023-11-27 06:56:35,914 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.0.self_attn2.whiten, num_groups=1, num_channels=192, metric=11.73 vs. 
limit=22.5 2023-11-27 06:56:38,861 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 566950 2023-11-27 06:56:47,957 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=3779693.3333333335, ans=0.125 2023-11-27 06:57:00,792 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=3779760.0, ans=0.125 2023-11-27 06:57:07,963 INFO [train_asr.py:1235] (3/4) Epoch 48, batch 1850, loss[loss=0.05998, simple_loss=0.06887, pruned_loss=0.016, audio_tagging_loss=0.009549, over 13789.00 frames. ], tot_loss[loss=0.06501, simple_loss=0.0889, pruned_loss=0.01203, audio_tagging_loss=0.008523, over 3033865.35 frames. ], batch size: 54, lr: 1.41e-03, grad_scale: 16.0 2023-11-27 06:57:09,756 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.59 vs. limit=6.0 2023-11-27 06:57:12,420 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=3779826.6666666665, ans=0.125 2023-11-27 06:57:26,736 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3779893.3333333335, ans=0.125 2023-11-27 06:57:29,430 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=3779960.0, ans=0.0 2023-11-27 06:57:33,775 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=3779960.0, ans=0.09899494936611666 2023-11-27 06:57:34,685 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 567000 2023-11-27 06:58:04,542 INFO [train_asr.py:1235] (3/4) Epoch 48, batch 1900, loss[loss=0.06217, simple_loss=0.09343, pruned_loss=0.009025, audio_tagging_loss=0.006432, over 16007.00 frames. ], tot_loss[loss=0.0645, simple_loss=0.08822, pruned_loss=0.0119, audio_tagging_loss=0.008492, over 3040539.77 frames. ], batch size: 58, lr: 1.41e-03, grad_scale: 16.0 2023-11-27 06:58:19,409 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.670e+01 9.174e+01 9.812e+01 1.054e+02 1.527e+02, threshold=1.962e+02, percent-clipped=0.0 2023-11-27 06:58:25,630 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.0.self_attn2.whiten, num_groups=1, num_channels=192, metric=10.21 vs. limit=22.5 2023-11-27 06:58:31,264 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=51.54 vs. limit=22.5 2023-11-27 06:58:31,867 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 567050 2023-11-27 06:58:31,950 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=3780293.3333333335, ans=0.035 2023-11-27 06:58:48,682 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=4.48 vs. 
limit=15.0 2023-11-27 06:58:54,622 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.out_whiten.whitening_limit, batch_count=3780426.6666666665, ans=15.0 2023-11-27 06:58:58,581 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3780426.6666666665, ans=0.125 2023-11-27 06:58:58,833 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=13.25 vs. limit=15.0 2023-11-27 06:59:00,543 INFO [train_asr.py:1235] (3/4) Epoch 48, batch 1950, loss[loss=0.05031, simple_loss=0.06859, pruned_loss=0.007174, audio_tagging_loss=0.00884, over 14436.00 frames. ], tot_loss[loss=0.06437, simple_loss=0.08811, pruned_loss=0.01184, audio_tagging_loss=0.008472, over 3041513.45 frames. ], batch size: 54, lr: 1.41e-03, grad_scale: 16.0 2023-11-27 06:59:02,303 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3780493.3333333335, ans=0.125 2023-11-27 06:59:02,382 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-27 06:59:07,042 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=9.02 vs. limit=15.0 2023-11-27 06:59:20,084 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=3780560.0, ans=0.07 2023-11-27 06:59:23,304 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.min_positive, batch_count=3780626.6666666665, ans=0.025 2023-11-27 06:59:27,379 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 567100 2023-11-27 06:59:40,866 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=7.53 vs. limit=15.0 2023-11-27 06:59:47,648 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=5.67 vs. limit=15.0 2023-11-27 06:59:57,535 INFO [train_asr.py:1235] (3/4) Epoch 48, batch 2000, loss[loss=0.06475, simple_loss=0.09162, pruned_loss=0.01223, audio_tagging_loss=0.006703, over 14404.00 frames. ], tot_loss[loss=0.06462, simple_loss=0.08834, pruned_loss=0.01193, audio_tagging_loss=0.008525, over 3042803.87 frames. ], batch size: 54, lr: 1.41e-03, grad_scale: 32.0 2023-11-27 07:00:07,859 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=9.14 vs. 
limit=15.0 2023-11-27 07:00:11,841 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 6.474e+01 9.102e+01 9.766e+01 1.042e+02 1.467e+02, threshold=1.953e+02, percent-clipped=0.0 2023-11-27 07:00:12,070 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=3780893.3333333335, ans=0.2 2023-11-27 07:00:12,116 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=3780893.3333333335, ans=0.0 2023-11-27 07:00:24,172 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 567150 2023-11-27 07:00:24,174 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=3780960.0, ans=0.125 2023-11-27 07:00:41,730 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn2.whiten.whitening_limit, batch_count=3781093.3333333335, ans=22.5 2023-11-27 07:00:45,628 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3781093.3333333335, ans=0.1 2023-11-27 07:00:52,969 INFO [train_asr.py:1235] (3/4) Epoch 48, batch 2050, loss[loss=0.06637, simple_loss=0.09456, pruned_loss=0.01265, audio_tagging_loss=0.006438, over 15566.00 frames. ], tot_loss[loss=0.06516, simple_loss=0.08921, pruned_loss=0.01206, audio_tagging_loss=0.008497, over 3039717.85 frames. ], batch size: 59, lr: 1.41e-03, grad_scale: 32.0 2023-11-27 07:01:03,645 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=3781226.6666666665, ans=0.015 2023-11-27 07:01:20,010 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 567200 2023-11-27 07:01:24,087 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=3781293.3333333335, ans=0.125 2023-11-27 07:01:30,526 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3781360.0, ans=0.125 2023-11-27 07:01:37,457 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3781426.6666666665, ans=0.1 2023-11-27 07:01:46,495 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3781426.6666666665, ans=0.1 2023-11-27 07:01:49,455 INFO [train_asr.py:1235] (3/4) Epoch 48, batch 2100, loss[loss=0.07851, simple_loss=0.1222, pruned_loss=0.01178, audio_tagging_loss=0.005628, over 15212.00 frames. ], tot_loss[loss=0.06521, simple_loss=0.08941, pruned_loss=0.01206, audio_tagging_loss=0.008447, over 3038985.34 frames. ], batch size: 55, lr: 1.41e-03, grad_scale: 16.0 2023-11-27 07:01:50,828 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3781493.3333333335, ans=0.1 2023-11-27 07:02:04,313 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.23 vs. 
limit=6.0 2023-11-27 07:02:04,945 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.553e+01 8.912e+01 9.473e+01 1.026e+02 1.468e+02, threshold=1.895e+02, percent-clipped=0.0 2023-11-27 07:02:13,299 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=3781626.6666666665, ans=0.2 2023-11-27 07:02:14,259 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3781626.6666666665, ans=0.1 2023-11-27 07:02:16,305 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 567250 2023-11-27 07:02:33,932 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.25 vs. limit=6.0 2023-11-27 07:02:39,437 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3781760.0, ans=0.1 2023-11-27 07:02:39,473 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=3781760.0, ans=0.0 2023-11-27 07:02:45,485 INFO [train_asr.py:1235] (3/4) Epoch 48, batch 2150, loss[loss=0.06962, simple_loss=0.09412, pruned_loss=0.0154, audio_tagging_loss=0.007161, over 14787.00 frames. ], tot_loss[loss=0.06552, simple_loss=0.08972, pruned_loss=0.01218, audio_tagging_loss=0.008476, over 3038870.49 frames. ], batch size: 54, lr: 1.41e-03, grad_scale: 16.0 2023-11-27 07:03:04,011 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=5.37 vs. limit=15.0 2023-11-27 07:03:07,041 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=3781960.0, ans=0.2 2023-11-27 07:03:12,858 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 567300 2023-11-27 07:03:18,760 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=3782026.6666666665, ans=0.04949747468305833 2023-11-27 07:03:18,890 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3782026.6666666665, ans=0.125 2023-11-27 07:03:19,656 WARNING [train_asr.py:1481] (3/4) Exclude cut with ID unbalanced/XkQ8YVd8u38_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-27 07:03:25,198 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=8.92 vs. limit=15.0 2023-11-27 07:03:32,343 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.min_abs, batch_count=3782093.3333333335, ans=0.5 2023-11-27 07:03:40,374 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=3782160.0, ans=0.04949747468305833 2023-11-27 07:03:41,248 INFO [train_asr.py:1235] (3/4) Epoch 48, batch 2200, loss[loss=0.06147, simple_loss=0.08113, pruned_loss=0.009618, audio_tagging_loss=0.01128, over 14226.00 frames. 
], tot_loss[loss=0.06556, simple_loss=0.08955, pruned_loss=0.01219, audio_tagging_loss=0.008589, over 3037393.27 frames. ], batch size: 56, lr: 1.41e-03, grad_scale: 16.0 2023-11-27 07:03:45,254 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=3782160.0, ans=0.125 2023-11-27 07:03:57,267 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.562e+01 8.960e+01 9.671e+01 1.061e+02 1.263e+02, threshold=1.934e+02, percent-clipped=0.0 2023-11-27 07:03:58,011 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.1.whiten, num_groups=1, num_channels=192, metric=3.32 vs. limit=12.0 2023-11-27 07:04:01,396 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=4.69 vs. limit=15.0 2023-11-27 07:04:08,503 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 567350 2023-11-27 07:04:11,880 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=3782293.3333333335, ans=0.0 2023-11-27 07:04:15,135 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=3782360.0, ans=0.2 2023-11-27 07:04:37,779 INFO [train_asr.py:1235] (3/4) Epoch 48, batch 2250, loss[loss=0.05847, simple_loss=0.07051, pruned_loss=0.01157, audio_tagging_loss=0.01164, over 14470.00 frames. ], tot_loss[loss=0.06619, simple_loss=0.09054, pruned_loss=0.01231, audio_tagging_loss=0.008616, over 3041172.57 frames. ], batch size: 56, lr: 1.41e-03, grad_scale: 16.0 2023-11-27 07:04:41,160 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=3782493.3333333335, ans=0.125 2023-11-27 07:04:41,240 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=3782493.3333333335, ans=0.0 2023-11-27 07:04:57,813 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=3782560.0, ans=0.2 2023-11-27 07:04:58,769 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3782626.6666666665, ans=0.125 2023-11-27 07:05:04,596 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 567400 2023-11-27 07:05:07,427 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.38 vs. limit=6.0 2023-11-27 07:05:16,710 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=3782693.3333333335, ans=0.0 2023-11-27 07:05:24,700 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=8.02 vs. limit=15.0 2023-11-27 07:05:28,026 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=3782760.0, ans=0.2 2023-11-27 07:05:33,373 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3782826.6666666665, ans=0.125 2023-11-27 07:05:34,199 INFO [train_asr.py:1235] (3/4) Epoch 48, batch 2300, loss[loss=0.06001, simple_loss=0.08447, pruned_loss=0.008884, audio_tagging_loss=0.008893, over 14459.00 frames. 
], tot_loss[loss=0.0661, simple_loss=0.0904, pruned_loss=0.01226, audio_tagging_loss=0.008636, over 3050708.44 frames. ], batch size: 54, lr: 1.41e-03, grad_scale: 16.0 2023-11-27 07:05:36,686 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3782826.6666666665, ans=0.1 2023-11-27 07:05:49,430 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.463e+01 8.875e+01 9.360e+01 1.027e+02 1.398e+02, threshold=1.872e+02, percent-clipped=0.0 2023-11-27 07:06:01,221 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 567450 2023-11-27 07:06:23,047 WARNING [train_asr.py:1481] (3/4) Exclude cut with ID unbalanced/mx9RcUz8sr0_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-27 07:06:29,272 INFO [train_asr.py:1235] (3/4) Epoch 48, batch 2350, loss[loss=0.08759, simple_loss=0.1212, pruned_loss=0.01869, audio_tagging_loss=0.008272, over 15540.00 frames. ], tot_loss[loss=0.06613, simple_loss=0.09023, pruned_loss=0.01223, audio_tagging_loss=0.008786, over 3051842.60 frames. ], batch size: 55, lr: 1.41e-03, grad_scale: 16.0 2023-11-27 07:06:33,627 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=7.34 vs. limit=10.0 2023-11-27 07:06:37,123 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.max_abs, batch_count=3783160.0, ans=10.0 2023-11-27 07:06:48,413 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=3783226.6666666665, ans=0.0 2023-11-27 07:06:53,328 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=3783293.3333333335, ans=0.125 2023-11-27 07:06:57,337 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 567500 2023-11-27 07:07:21,864 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=3783426.6666666665, ans=0.2 2023-11-27 07:07:22,960 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=3783426.6666666665, ans=0.035 2023-11-27 07:07:26,528 INFO [train_asr.py:1235] (3/4) Epoch 48, batch 2400, loss[loss=0.07578, simple_loss=0.1092, pruned_loss=0.01037, audio_tagging_loss=0.01081, over 15097.00 frames. ], tot_loss[loss=0.06605, simple_loss=0.09016, pruned_loss=0.01209, audio_tagging_loss=0.008884, over 3044315.85 frames. 
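The `grad_scale` field in the batch records drifts between 32.0, 16.0 and 8.0 over this stretch: classic dynamic fp16 loss scaling, which halves the scale when a step produces inf/nan gradients and doubles it back after a run of clean steps (the isolated `percent-clipped=1.0` lines further on coincide with gradient-norm maxima above the logged threshold). A sketch using PyTorch's stock scaler; the `growth_interval` is an assumption, and the training script may manage its scaler differently:

```python
import torch

# Sketch of the fp16 loss-scaling behaviour behind the logged grad_scale
# values (32 -> 16 -> 8 -> 16 -> 32): halve on overflow, double after a
# stretch of finite steps. growth_interval is assumed, not taken from this run.
scaler = torch.cuda.amp.GradScaler(
    init_scale=32.0,       # matches the value logged at batch 2400
    backoff_factor=0.5,    # halve on inf/nan gradients
    growth_factor=2.0,     # double after stable steps
    growth_interval=2000,  # assumed number of clean steps before doubling
)
# Typical step: scaler.scale(loss).backward(); scaler.step(optimizer); scaler.update()
```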
], batch size: 54, lr: 1.41e-03, grad_scale: 32.0 2023-11-27 07:07:42,060 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.653e+01 8.975e+01 9.647e+01 1.056e+02 1.487e+02, threshold=1.929e+02, percent-clipped=0.0 2023-11-27 07:07:43,332 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3783560.0, ans=0.125 2023-11-27 07:07:52,932 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 567550 2023-11-27 07:08:22,608 INFO [train_asr.py:1235] (3/4) Epoch 48, batch 2450, loss[loss=0.08171, simple_loss=0.1142, pruned_loss=0.01714, audio_tagging_loss=0.007494, over 15527.00 frames. ], tot_loss[loss=0.0659, simple_loss=0.09015, pruned_loss=0.01195, audio_tagging_loss=0.008872, over 3045191.89 frames. ], batch size: 55, lr: 1.41e-03, grad_scale: 16.0 2023-11-27 07:08:31,601 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=6.33 vs. limit=15.0 2023-11-27 07:08:49,929 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 567600 2023-11-27 07:09:18,671 INFO [train_asr.py:1235] (3/4) Epoch 48, batch 2500, loss[loss=0.05282, simple_loss=0.0733, pruned_loss=0.009246, audio_tagging_loss=0.006924, over 15134.00 frames. ], tot_loss[loss=0.06552, simple_loss=0.08964, pruned_loss=0.01181, audio_tagging_loss=0.008893, over 3046319.71 frames. ], batch size: 57, lr: 1.41e-03, grad_scale: 16.0 2023-11-27 07:09:35,633 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=3784226.6666666665, ans=0.07 2023-11-27 07:09:36,696 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=3784226.6666666665, ans=0.0 2023-11-27 07:09:37,480 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.405e+01 8.971e+01 9.601e+01 1.047e+02 1.603e+02, threshold=1.920e+02, percent-clipped=0.0 2023-11-27 07:09:46,614 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 567650 2023-11-27 07:09:47,781 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3784293.3333333335, ans=0.0 2023-11-27 07:10:00,674 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=3784360.0, ans=0.0 2023-11-27 07:10:07,020 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=3784426.6666666665, ans=0.0 2023-11-27 07:10:16,021 INFO [train_asr.py:1235] (3/4) Epoch 48, batch 2550, loss[loss=0.05552, simple_loss=0.07614, pruned_loss=0.008928, audio_tagging_loss=0.008518, over 15424.00 frames. ], tot_loss[loss=0.06516, simple_loss=0.08899, pruned_loss=0.01186, audio_tagging_loss=0.008802, over 3044465.17 frames. 
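The dense `ScheduledFloat` records expose hyperparameters (dropout probabilities, skip rates, bypass scale floors) that are functions of `batch_count` rather than constants; each record logs the schedule's current value (`ans=...`), and by this point the schedules have settled onto their final plateaus (e.g. the `dropout_p` entries all report `ans=0.1`). A sketch of such a schedule with illustrative breakpoints; the real schedules are defined per-parameter in scaling.py:

```python
# Sketch of a batch-count-driven scheduled value like the ScheduledFloat
# records above. The breakpoints here are illustrative assumptions.
class ScheduledFloatSketch:
    def __init__(self, *points):
        self.points = sorted(points)  # (batch_count, value) pairs

    def __call__(self, batch_count: float) -> float:
        pts = self.points
        if batch_count <= pts[0][0]:
            return pts[0][1]
        for (x0, y0), (x1, y1) in zip(pts, pts[1:]):
            if batch_count <= x1:  # linear interpolation between breakpoints
                return y0 + (batch_count - x0) / (x1 - x0) * (y1 - y0)
        return pts[-1][1]  # hold the final value forever

dropout_p = ScheduledFloatSketch((0.0, 0.3), (20000.0, 0.1))
print(dropout_p(3784493.33))  # 0.1, the plateau seen in the nearby records
```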
], batch size: 59, lr: 1.41e-03, grad_scale: 8.0 2023-11-27 07:10:21,708 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3784493.3333333335, ans=0.1 2023-11-27 07:10:32,910 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3784560.0, ans=0.125 2023-11-27 07:10:42,470 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 567700 2023-11-27 07:10:52,799 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=3784693.3333333335, ans=0.1 2023-11-27 07:11:05,597 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3784760.0, ans=0.1 2023-11-27 07:11:12,336 INFO [train_asr.py:1235] (3/4) Epoch 48, batch 2600, loss[loss=0.06588, simple_loss=0.09068, pruned_loss=0.01025, audio_tagging_loss=0.0103, over 14940.00 frames. ], tot_loss[loss=0.06482, simple_loss=0.08877, pruned_loss=0.01174, audio_tagging_loss=0.008697, over 3043482.59 frames. ], batch size: 54, lr: 1.41e-03, grad_scale: 8.0 2023-11-27 07:11:22,494 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=512, metric=4.14 vs. limit=15.0 2023-11-27 07:11:24,801 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=19.15 vs. limit=22.5 2023-11-27 07:11:29,536 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.178e+01 8.922e+01 9.501e+01 1.026e+02 1.288e+02, threshold=1.900e+02, percent-clipped=0.0 2023-11-27 07:11:38,610 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 567750 2023-11-27 07:11:48,371 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=3785026.6666666665, ans=0.125 2023-11-27 07:12:07,898 INFO [train_asr.py:1235] (3/4) Epoch 48, batch 2650, loss[loss=0.06498, simple_loss=0.09123, pruned_loss=0.01238, audio_tagging_loss=0.006985, over 14962.00 frames. ], tot_loss[loss=0.06513, simple_loss=0.08922, pruned_loss=0.01191, audio_tagging_loss=0.008606, over 3039222.79 frames. ], batch size: 56, lr: 1.41e-03, grad_scale: 8.0 2023-11-27 07:12:14,405 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=3785160.0, ans=0.125 2023-11-27 07:12:16,629 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.max_abs, batch_count=3785160.0, ans=10.0 2023-11-27 07:12:29,838 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3785293.3333333335, ans=0.1 2023-11-27 07:12:35,621 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 567800 2023-11-27 07:12:43,474 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=3785360.0, ans=0.0 2023-11-27 07:12:54,956 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3785426.6666666665, ans=0.1 2023-11-27 07:13:03,850 INFO [train_asr.py:1235] (3/4) Epoch 48, batch 2700, loss[loss=0.05549, simple_loss=0.07545, pruned_loss=0.008806, audio_tagging_loss=0.00896, over 15450.00 frames. 
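The `Whitening` records compare a per-module statistic against a limit (`metric=4.14 vs. limit=15.0` and similar), a diagnostic of how far a module's output covariance is from white; the module only intervenes when the limit is exceeded. One plausible form of such a metric, offered as a hedged sketch rather than the exact statistic in scaling.py:

```python
import torch

def whitening_metric(x: torch.Tensor, num_groups: int = 1) -> float:
    # Anisotropy of the channel covariance per group: the eigenvalue ratio
    # E[lambda^2] / (E[lambda])^2 is 1.0 for perfectly "white" features and
    # grows when a few directions dominate. This is an assumed proxy for
    # the metric logged above, not the verified implementation.
    n, _ = x.shape
    metrics = []
    for g in x.chunk(num_groups, dim=1):
        g = g - g.mean(dim=0, keepdim=True)
        cov = (g.T @ g) / n
        eigs = torch.linalg.eigvalsh(cov)
        metrics.append((eigs.pow(2).mean() / eigs.mean().pow(2)).item())
    return max(metrics)

print(whitening_metric(torch.randn(1000, 256)))  # near 1 for white noise
```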
], tot_loss[loss=0.06476, simple_loss=0.08867, pruned_loss=0.01179, audio_tagging_loss=0.008635, over 3046949.52 frames. ], batch size: 59, lr: 1.41e-03, grad_scale: 8.0 2023-11-27 07:13:12,593 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=3785493.3333333335, ans=0.0 2023-11-27 07:13:15,186 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=7.13 vs. limit=12.0 2023-11-27 07:13:15,198 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=6.08 vs. limit=15.0 2023-11-27 07:13:18,634 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3785560.0, ans=0.1 2023-11-27 07:13:20,784 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3785560.0, ans=0.125 2023-11-27 07:13:22,682 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.681e+01 9.127e+01 9.631e+01 1.043e+02 1.339e+02, threshold=1.926e+02, percent-clipped=0.0 2023-11-27 07:13:25,318 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=7.82 vs. limit=15.0 2023-11-27 07:13:26,133 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-27 07:13:31,256 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 567850 2023-11-27 07:13:33,595 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3785626.6666666665, ans=0.125 2023-11-27 07:13:37,348 INFO [scaling.py:1022] (3/4) Whitening: name=encoder_embed.out_whiten, num_groups=1, num_channels=192, metric=6.28 vs. limit=8.0 2023-11-27 07:13:46,691 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3785693.3333333335, ans=0.1 2023-11-27 07:13:54,122 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=3785760.0, ans=0.125 2023-11-27 07:14:00,772 INFO [train_asr.py:1235] (3/4) Epoch 48, batch 2750, loss[loss=0.06138, simple_loss=0.08093, pruned_loss=0.01134, audio_tagging_loss=0.009577, over 14425.00 frames. ], tot_loss[loss=0.06494, simple_loss=0.08883, pruned_loss=0.0119, audio_tagging_loss=0.008628, over 3050047.24 frames. ], batch size: 53, lr: 1.41e-03, grad_scale: 8.0 2023-11-27 07:14:11,652 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=3785893.3333333335, ans=0.0 2023-11-27 07:14:26,691 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 567900 2023-11-27 07:14:36,035 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=3786026.6666666665, ans=0.2 2023-11-27 07:14:37,633 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3786026.6666666665, ans=0.125 2023-11-27 07:14:48,530 WARNING [train_asr.py:1481] (3/4) Exclude cut with ID unbalanced/IMdT8_tuNp0_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. 
Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-27 07:14:56,110 INFO [train_asr.py:1235] (3/4) Epoch 48, batch 2800, loss[loss=0.06747, simple_loss=0.09411, pruned_loss=0.01206, audio_tagging_loss=0.008357, over 15016.00 frames. ], tot_loss[loss=0.06492, simple_loss=0.08856, pruned_loss=0.01193, audio_tagging_loss=0.008709, over 3046993.63 frames. ], batch size: 58, lr: 1.41e-03, grad_scale: 16.0 2023-11-27 07:15:00,617 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-27 07:15:07,031 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3786226.6666666665, ans=0.0 2023-11-27 07:15:14,278 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 8.005e+01 9.123e+01 9.680e+01 1.057e+02 2.633e+02, threshold=1.936e+02, percent-clipped=1.0 2023-11-27 07:15:14,479 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=3786226.6666666665, ans=0.2 2023-11-27 07:15:20,334 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=17.63 vs. limit=22.5 2023-11-27 07:15:23,604 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 567950 2023-11-27 07:15:26,606 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3786293.3333333335, ans=0.1 2023-11-27 07:15:36,343 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=3786360.0, ans=0.0 2023-11-27 07:15:40,561 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=3786426.6666666665, ans=0.0 2023-11-27 07:15:50,605 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.1.conv_module1.whiten, num_groups=1, num_channels=192, metric=13.81 vs. limit=15.0 2023-11-27 07:15:52,494 INFO [train_asr.py:1235] (3/4) Epoch 48, batch 2850, loss[loss=0.04785, simple_loss=0.05353, pruned_loss=0.008156, audio_tagging_loss=0.01293, over 14336.00 frames. ], tot_loss[loss=0.06438, simple_loss=0.08775, pruned_loss=0.01178, audio_tagging_loss=0.008721, over 3035024.73 frames. ], batch size: 56, lr: 1.41e-03, grad_scale: 16.0 2023-11-27 07:15:54,913 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=3786493.3333333335, ans=0.125 2023-11-27 07:16:01,828 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=3786493.3333333335, ans=0.2 2023-11-27 07:16:19,133 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 568000 2023-11-27 07:16:33,159 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3786693.3333333335, ans=0.1 2023-11-27 07:16:50,519 INFO [train_asr.py:1235] (3/4) Epoch 48, batch 2900, loss[loss=0.05256, simple_loss=0.06871, pruned_loss=0.007161, audio_tagging_loss=0.01105, over 15599.00 frames. 
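The recurring WARNINGs above all follow one pattern: these AudioSet clips carry a fixed placeholder transcript, and a 1-second clip leaves only 23 frames after subsampling while the placeholder tokenizes to 24 BPE tokens, so the transducer alignment is infeasible and the cut is dropped. A sketch of that admissibility check; the helper name and the exact front-end shrinkage are assumptions:

```python
# Sketch of the filter behind the "Exclude cut" warnings above: a transducer
# needs at least as many encoder frames as output tokens, so cuts whose
# subsampled length falls short are skipped. The -7 front-end shrinkage is
# assumed; it reproduces the logged 100 -> 23 frame count for factor 4.
def keep_cut(num_frames: int, num_tokens: int, subsampling_factor: int = 4) -> bool:
    frames_after = (num_frames - 7) // subsampling_factor
    return frames_after >= num_tokens

print(keep_cut(100, 24))  # False: 23 frames vs. 24 tokens, as in the warnings
```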
], tot_loss[loss=0.06442, simple_loss=0.08805, pruned_loss=0.01176, audio_tagging_loss=0.008638, over 3040260.62 frames. ], batch size: 60, lr: 1.41e-03, grad_scale: 16.0 2023-11-27 07:17:03,391 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=3786893.3333333335, ans=0.125 2023-11-27 07:17:07,510 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.788e+01 8.964e+01 9.454e+01 1.013e+02 1.177e+02, threshold=1.891e+02, percent-clipped=0.0 2023-11-27 07:17:16,016 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 568050 2023-11-27 07:17:45,191 INFO [train_asr.py:1235] (3/4) Epoch 48, batch 2950, loss[loss=0.049, simple_loss=0.06813, pruned_loss=0.006449, audio_tagging_loss=0.00849, over 15478.00 frames. ], tot_loss[loss=0.06462, simple_loss=0.08843, pruned_loss=0.01178, audio_tagging_loss=0.008624, over 3038355.56 frames. ], batch size: 60, lr: 1.41e-03, grad_scale: 16.0 2023-11-27 07:17:50,715 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-11-27 07:17:51,775 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3787160.0, ans=0.1 2023-11-27 07:18:11,738 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 568100 2023-11-27 07:18:29,087 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3787426.6666666665, ans=0.1 2023-11-27 07:18:30,116 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3787426.6666666665, ans=0.0 2023-11-27 07:18:30,208 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=3787426.6666666665, ans=0.125 2023-11-27 07:18:40,619 INFO [train_asr.py:1235] (3/4) Epoch 48, batch 3000, loss[loss=0.06943, simple_loss=0.09984, pruned_loss=0.01213, audio_tagging_loss=0.007384, over 15658.00 frames. ], tot_loss[loss=0.06486, simple_loss=0.08883, pruned_loss=0.01186, audio_tagging_loss=0.008583, over 3039264.76 frames. ], batch size: 58, lr: 1.41e-03, grad_scale: 8.0 2023-11-27 07:18:40,620 INFO [train_asr.py:1258] (3/4) Computing validation loss 2023-11-27 07:18:54,214 INFO [zipformer.py:1877] (3/4) name=encoder.encoders.1.encoder.layers.0.self_attn_weights, attn_weights_entropy = tensor([5.2017, 4.9482, 4.6351, 5.0564], device='cuda:3') 2023-11-27 07:19:02,485 INFO [zipformer.py:1877] (3/4) name=encoder.encoders.2.encoder.layers.1.self_attn_weights, attn_weights_entropy = tensor([4.6154, 3.7379, 3.9603, 3.4790], device='cuda:3') 2023-11-27 07:19:13,013 INFO [train_asr.py:1267] (3/4) Epoch 48, validation: loss=0.05781, simple_loss=0.05047, pruned_loss=0.005352, audio_tagging_loss=0.02722, over 4681554.00 frames. 
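During validation the script also dumps `attn_weights_entropy` tensors: four values per dumped layer, plausibly one averaged attention entropy per head, a quick check that no head has collapsed to a point or gone uniform. A sketch of that diagnostic, assuming `(num_heads, query_len, key_len)` weights:

```python
import torch

def attn_weights_entropy(attn: torch.Tensor) -> torch.Tensor:
    # attn: (num_heads, query_len, key_len), rows summing to 1.
    # Returns one averaged entropy per head; values near log(key_len)
    # would indicate nearly uniform attention.
    ent = -(attn * (attn + 1e-20).log()).sum(dim=-1)  # entropy per query
    return ent.mean(dim=-1)                            # average over queries

attn = torch.softmax(torch.randn(4, 50, 200), dim=-1)
print(attn_weights_entropy(attn))  # four values, one per head
```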
2023-11-27 07:19:13,013 INFO [train_asr.py:1268] (3/4) Maximum memory allocated so far is 24894MB 2023-11-27 07:19:27,064 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3787560.0, ans=0.125 2023-11-27 07:19:31,101 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.941e+01 8.979e+01 9.616e+01 1.040e+02 1.231e+02, threshold=1.923e+02, percent-clipped=0.0 2023-11-27 07:19:39,341 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 568150 2023-11-27 07:20:08,445 INFO [train_asr.py:1235] (3/4) Epoch 48, batch 3050, loss[loss=0.06962, simple_loss=0.09326, pruned_loss=0.01615, audio_tagging_loss=0.006839, over 14143.00 frames. ], tot_loss[loss=0.06479, simple_loss=0.08876, pruned_loss=0.01182, audio_tagging_loss=0.008589, over 3036782.46 frames. ], batch size: 53, lr: 1.41e-03, grad_scale: 8.0 2023-11-27 07:20:17,135 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=3787826.6666666665, ans=0.0 2023-11-27 07:20:18,260 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=3787893.3333333335, ans=0.0 2023-11-27 07:20:28,816 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3787893.3333333335, ans=0.125 2023-11-27 07:20:35,640 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 568200 2023-11-27 07:20:36,888 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=3787960.0, ans=0.07 2023-11-27 07:20:40,696 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=3787960.0, ans=0.125 2023-11-27 07:20:41,568 WARNING [train_asr.py:1481] (3/4) Exclude cut with ID unbalanced/h0neUGB6j_g_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-27 07:20:45,954 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3788026.6666666665, ans=0.1 2023-11-27 07:20:54,496 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3788093.3333333335, ans=0.1 2023-11-27 07:20:54,684 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=3788093.3333333335, ans=0.0 2023-11-27 07:20:57,740 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3788093.3333333335, ans=0.1 2023-11-27 07:21:02,431 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=18.54 vs. limit=22.5 2023-11-27 07:21:04,510 INFO [train_asr.py:1235] (3/4) Epoch 48, batch 3100, loss[loss=0.05126, simple_loss=0.0632, pruned_loss=0.00896, audio_tagging_loss=0.0107, over 14958.00 frames. ], tot_loss[loss=0.06535, simple_loss=0.08925, pruned_loss=0.01205, audio_tagging_loss=0.008679, over 3039780.81 frames. 
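The memory line at the top of this validation block comes from PyTorch's CUDA allocator high-water mark:

```python
import torch

# Sketch of the "Maximum memory allocated so far is 24894MB" record above:
# PyTorch tracks the peak CUDA memory ever allocated on a device.
if torch.cuda.is_available():
    peak_mb = torch.cuda.max_memory_allocated(device=0) // (1024 * 1024)
    print(f"Maximum memory allocated so far is {peak_mb}MB")
```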
], batch size: 57, lr: 1.41e-03, grad_scale: 8.0 2023-11-27 07:21:14,765 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3788226.6666666665, ans=0.1 2023-11-27 07:21:24,096 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.788e+01 9.299e+01 9.781e+01 1.036e+02 1.255e+02, threshold=1.956e+02, percent-clipped=0.0 2023-11-27 07:21:26,475 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=3788293.3333333335, ans=0.125 2023-11-27 07:21:28,759 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=10.04 vs. limit=15.0 2023-11-27 07:21:31,532 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 568250 2023-11-27 07:22:00,753 INFO [train_asr.py:1235] (3/4) Epoch 48, batch 3150, loss[loss=0.06871, simple_loss=0.09669, pruned_loss=0.01155, audio_tagging_loss=0.008817, over 15089.00 frames. ], tot_loss[loss=0.06608, simple_loss=0.09032, pruned_loss=0.01217, audio_tagging_loss=0.008748, over 3050281.06 frames. ], batch size: 57, lr: 1.41e-03, grad_scale: 8.0 2023-11-27 07:22:06,407 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=3788493.3333333335, ans=0.0 2023-11-27 07:22:10,986 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=3788560.0, ans=0.125 2023-11-27 07:22:16,426 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=3788560.0, ans=0.125 2023-11-27 07:22:22,855 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3788626.6666666665, ans=0.125 2023-11-27 07:22:26,909 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 568300 2023-11-27 07:22:56,754 INFO [train_asr.py:1235] (3/4) Epoch 48, batch 3200, loss[loss=0.0775, simple_loss=0.1118, pruned_loss=0.01492, audio_tagging_loss=0.006689, over 15329.00 frames. ], tot_loss[loss=0.0664, simple_loss=0.09048, pruned_loss=0.01233, audio_tagging_loss=0.008839, over 3046816.51 frames. 
], batch size: 56, lr: 1.41e-03, grad_scale: 16.0 2023-11-27 07:23:00,216 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=3788826.6666666665, ans=0.1 2023-11-27 07:23:06,539 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3788893.3333333335, ans=0.1 2023-11-27 07:23:15,180 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 6.870e+01 9.115e+01 9.612e+01 1.036e+02 1.415e+02, threshold=1.922e+02, percent-clipped=0.0 2023-11-27 07:23:22,223 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=3788960.0, ans=0.0 2023-11-27 07:23:23,221 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 568350 2023-11-27 07:23:29,337 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3789026.6666666665, ans=0.125 2023-11-27 07:23:39,333 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.min_positive, batch_count=3789026.6666666665, ans=0.025 2023-11-27 07:23:51,839 INFO [train_asr.py:1235] (3/4) Epoch 48, batch 3250, loss[loss=0.05479, simple_loss=0.06645, pruned_loss=0.00979, audio_tagging_loss=0.01178, over 14097.00 frames. ], tot_loss[loss=0.06621, simple_loss=0.09029, pruned_loss=0.01215, audio_tagging_loss=0.008922, over 3043130.92 frames. ], batch size: 53, lr: 1.41e-03, grad_scale: 16.0 2023-11-27 07:24:03,890 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3789226.6666666665, ans=0.125 2023-11-27 07:24:04,895 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3789226.6666666665, ans=0.125 2023-11-27 07:24:05,049 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=3789226.6666666665, ans=0.125 2023-11-27 07:24:05,980 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=3789226.6666666665, ans=0.2 2023-11-27 07:24:10,255 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=7.74 vs. limit=15.0 2023-11-27 07:24:19,676 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 568400 2023-11-27 07:24:20,818 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3789293.3333333335, ans=0.1 2023-11-27 07:24:48,709 INFO [train_asr.py:1235] (3/4) Epoch 48, batch 3300, loss[loss=0.08626, simple_loss=0.1192, pruned_loss=0.01837, audio_tagging_loss=0.008273, over 15379.00 frames. ], tot_loss[loss=0.06642, simple_loss=0.09046, pruned_loss=0.01212, audio_tagging_loss=0.009076, over 3046067.51 frames. 
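The learning rate has been effectively pinned at 1.41e-03 for thousands of batches, which is consistent with an Eden-style schedule whose batch and epoch decay factors flatten out late in training. A sketch with assumed constants, not values read from this run:

```python
def eden_lr(base_lr: float, batch: int, epoch: float,
            lr_batches: float, lr_epochs: float) -> float:
    # Eden-style schedule: two slowly-decaying power-law factors in the
    # batch and epoch counts. All constants passed below are illustrative
    # assumptions.
    batch_factor = ((batch ** 2 + lr_batches ** 2) / lr_batches ** 2) ** -0.25
    epoch_factor = ((epoch ** 2 + lr_epochs ** 2) / lr_epochs ** 2) ** -0.25
    return base_lr * batch_factor * epoch_factor

print(eden_lr(0.045, 568_000, 48, 7500.0, 3.5))  # ~1.4e-03, near the logged lr
```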
], batch size: 57, lr: 1.41e-03, grad_scale: 16.0 2023-11-27 07:24:52,587 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=3789493.3333333335, ans=0.0 2023-11-27 07:25:08,002 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.162e+01 9.209e+01 1.006e+02 1.091e+02 1.432e+02, threshold=2.012e+02, percent-clipped=0.0 2023-11-27 07:25:15,513 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 568450 2023-11-27 07:25:16,688 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3789626.6666666665, ans=0.1 2023-11-27 07:25:31,632 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=3789693.3333333335, ans=0.0 2023-11-27 07:25:35,518 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=3789760.0, ans=0.0 2023-11-27 07:25:44,984 INFO [train_asr.py:1235] (3/4) Epoch 48, batch 3350, loss[loss=0.07484, simple_loss=0.1075, pruned_loss=0.01568, audio_tagging_loss=0.005429, over 15133.00 frames. ], tot_loss[loss=0.06584, simple_loss=0.08983, pruned_loss=0.01193, audio_tagging_loss=0.009002, over 3048137.48 frames. ], batch size: 57, lr: 1.41e-03, grad_scale: 16.0 2023-11-27 07:25:58,501 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3789893.3333333335, ans=0.125 2023-11-27 07:26:04,290 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=3789893.3333333335, ans=0.125 2023-11-27 07:26:12,270 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 568500 2023-11-27 07:26:33,632 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3790093.3333333335, ans=0.125 2023-11-27 07:26:37,270 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=18.81 vs. limit=22.5 2023-11-27 07:26:40,919 INFO [train_asr.py:1235] (3/4) Epoch 48, batch 3400, loss[loss=0.05148, simple_loss=0.06972, pruned_loss=0.007753, audio_tagging_loss=0.008865, over 14797.00 frames. ], tot_loss[loss=0.0656, simple_loss=0.0897, pruned_loss=0.01186, audio_tagging_loss=0.008894, over 3049427.42 frames. ], batch size: 57, lr: 1.41e-03, grad_scale: 16.0 2023-11-27 07:26:56,128 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3790226.6666666665, ans=0.125 2023-11-27 07:27:00,692 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 6.647e+01 9.061e+01 9.680e+01 1.035e+02 1.293e+02, threshold=1.936e+02, percent-clipped=0.0 2023-11-27 07:27:08,343 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 568550 2023-11-27 07:27:17,918 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.whiten, num_groups=1, num_channels=512, metric=8.26 vs. 
limit=12.0 2023-11-27 07:27:22,813 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3790360.0, ans=0.1 2023-11-27 07:27:24,868 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3790426.6666666665, ans=0.125 2023-11-27 07:27:26,000 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3790426.6666666665, ans=0.1 2023-11-27 07:27:37,468 INFO [train_asr.py:1235] (3/4) Epoch 48, batch 3450, loss[loss=0.06474, simple_loss=0.09275, pruned_loss=0.01228, audio_tagging_loss=0.006082, over 15250.00 frames. ], tot_loss[loss=0.06568, simple_loss=0.09004, pruned_loss=0.01191, audio_tagging_loss=0.008756, over 3050979.50 frames. ], batch size: 58, lr: 1.41e-03, grad_scale: 16.0 2023-11-27 07:27:53,259 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3790560.0, ans=0.125 2023-11-27 07:27:53,570 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=2.84 vs. limit=15.0 2023-11-27 07:28:04,056 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 568600 2023-11-27 07:28:27,320 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=3790760.0, ans=0.0 2023-11-27 07:28:33,475 INFO [train_asr.py:1235] (3/4) Epoch 48, batch 3500, loss[loss=0.07216, simple_loss=0.1027, pruned_loss=0.01105, audio_tagging_loss=0.009771, over 14880.00 frames. ], tot_loss[loss=0.06538, simple_loss=0.0899, pruned_loss=0.01178, audio_tagging_loss=0.008646, over 3047004.35 frames. ], batch size: 54, lr: 1.41e-03, grad_scale: 16.0 2023-11-27 07:28:43,516 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=3790893.3333333335, ans=0.125 2023-11-27 07:28:52,427 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 8.089e+01 9.044e+01 9.629e+01 1.032e+02 1.263e+02, threshold=1.926e+02, percent-clipped=0.0 2023-11-27 07:28:52,585 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=3790893.3333333335, ans=0.125 2023-11-27 07:28:52,671 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=3790893.3333333335, ans=0.0 2023-11-27 07:29:00,459 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 568650 2023-11-27 07:29:02,540 WARNING [train_asr.py:1481] (3/4) Exclude cut with ID unbalanced/DdDpuDqOyrA_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-27 07:29:10,252 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.0.self_attn1.whiten, num_groups=1, num_channels=192, metric=14.26 vs. 
limit=22.5 2023-11-27 07:29:10,766 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=3791026.6666666665, ans=0.09899494936611666 2023-11-27 07:29:15,079 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=3791026.6666666665, ans=0.0 2023-11-27 07:29:23,458 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3791093.3333333335, ans=0.125 2023-11-27 07:29:29,117 INFO [train_asr.py:1235] (3/4) Epoch 48, batch 3550, loss[loss=0.04883, simple_loss=0.06486, pruned_loss=0.005184, audio_tagging_loss=0.01121, over 15133.00 frames. ], tot_loss[loss=0.06492, simple_loss=0.08926, pruned_loss=0.01177, audio_tagging_loss=0.008522, over 3045204.87 frames. ], batch size: 59, lr: 1.41e-03, grad_scale: 16.0 2023-11-27 07:29:30,459 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3791160.0, ans=0.1 2023-11-27 07:29:56,280 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 568700 2023-11-27 07:29:59,598 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=3791293.3333333335, ans=0.125 2023-11-27 07:30:01,987 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=512, metric=20.28 vs. limit=22.5 2023-11-27 07:30:25,385 INFO [train_asr.py:1235] (3/4) Epoch 48, batch 3600, loss[loss=0.07264, simple_loss=0.0976, pruned_loss=0.01539, audio_tagging_loss=0.008459, over 15805.00 frames. ], tot_loss[loss=0.06468, simple_loss=0.08881, pruned_loss=0.01175, audio_tagging_loss=0.008526, over 3047214.74 frames. ], batch size: 57, lr: 1.41e-03, grad_scale: 32.0 2023-11-27 07:30:32,816 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=9.20 vs. limit=10.0 2023-11-27 07:30:43,741 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.448e+01 8.813e+01 9.473e+01 1.021e+02 1.241e+02, threshold=1.895e+02, percent-clipped=0.0 2023-11-27 07:30:51,209 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 568750 2023-11-27 07:31:05,090 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=3791693.3333333335, ans=0.125 2023-11-27 07:31:09,936 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-27 07:31:19,032 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3791760.0, ans=0.0 2023-11-27 07:31:21,014 INFO [train_asr.py:1235] (3/4) Epoch 48, batch 3650, loss[loss=0.06844, simple_loss=0.08593, pruned_loss=0.01583, audio_tagging_loss=0.009648, over 15622.00 frames. ], tot_loss[loss=0.06487, simple_loss=0.08919, pruned_loss=0.01183, audio_tagging_loss=0.008441, over 3047373.52 frames. 
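The `WithLoss: ... loss-sum=0.000e+00` records suggest modules wrapped with an auxiliary penalty whose accumulated value is logged between intervals; a sum of exactly zero means the penalty is currently inactive. A loose sketch of that pattern, where the wrapper API and the penalty itself are assumptions for illustration:

```python
import torch

class WithAuxLossSketch(torch.nn.Module):
    # Assumed pattern behind the "WithLoss ... loss-sum" records: wrap a
    # submodule, accumulate an auxiliary penalty on its output during
    # training, and log the running sum periodically (0.0 when inactive).
    def __init__(self, module: torch.nn.Module):
        super().__init__()
        self.module = module
        self.loss_sum = 0.0

    def forward(self, x):
        y = self.module(x)
        if self.training:
            penalty = torch.zeros(())  # placeholder: no active penalty
            self.loss_sum += float(penalty)
        return y

wrapped = WithAuxLossSketch(torch.nn.Linear(8, 8))
wrapped(torch.randn(2, 8))
print(f"loss-sum={wrapped.loss_sum:.3e}")  # loss-sum=0.000e+00
```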
], batch size: 60, lr: 1.41e-03, grad_scale: 32.0 2023-11-27 07:31:47,526 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 568800 2023-11-27 07:31:58,614 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3792026.6666666665, ans=0.125 2023-11-27 07:32:06,025 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3792093.3333333335, ans=0.0 2023-11-27 07:32:13,597 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3792093.3333333335, ans=0.125 2023-11-27 07:32:16,488 INFO [train_asr.py:1235] (3/4) Epoch 48, batch 3700, loss[loss=0.04822, simple_loss=0.06094, pruned_loss=0.006294, audio_tagging_loss=0.01146, over 15649.00 frames. ], tot_loss[loss=0.06491, simple_loss=0.08916, pruned_loss=0.01186, audio_tagging_loss=0.008466, over 3053960.85 frames. ], batch size: 60, lr: 1.41e-03, grad_scale: 16.0 2023-11-27 07:32:17,049 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=6.38 vs. limit=15.0 2023-11-27 07:32:32,217 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=3792226.6666666665, ans=0.0 2023-11-27 07:32:37,086 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.335e+01 9.068e+01 9.744e+01 1.049e+02 1.191e+02, threshold=1.949e+02, percent-clipped=0.0 2023-11-27 07:32:40,261 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=6.64 vs. limit=15.0 2023-11-27 07:32:43,171 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=3792293.3333333335, ans=0.125 2023-11-27 07:32:44,056 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 568850 2023-11-27 07:32:48,861 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.38 vs. limit=6.0 2023-11-27 07:33:00,314 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=3792426.6666666665, ans=0.2 2023-11-27 07:33:03,476 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=3792426.6666666665, ans=0.0 2023-11-27 07:33:13,054 INFO [train_asr.py:1235] (3/4) Epoch 48, batch 3750, loss[loss=0.07947, simple_loss=0.1112, pruned_loss=0.01481, audio_tagging_loss=0.009047, over 15324.00 frames. ], tot_loss[loss=0.06533, simple_loss=0.0897, pruned_loss=0.01198, audio_tagging_loss=0.008502, over 3051687.02 frames. ], batch size: 57, lr: 1.41e-03, grad_scale: 16.0 2023-11-27 07:33:14,831 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=3792493.3333333335, ans=0.125 2023-11-27 07:33:25,076 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=3792560.0, ans=0.2 2023-11-27 07:33:30,664 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=18.06 vs. 
limit=22.5 2023-11-27 07:33:38,969 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3792626.6666666665, ans=0.125 2023-11-27 07:33:39,743 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 568900 2023-11-27 07:33:39,924 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=3792626.6666666665, ans=0.125 2023-11-27 07:33:39,944 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-27 07:33:48,932 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.37 vs. limit=6.0 2023-11-27 07:33:51,831 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3792693.3333333335, ans=0.1 2023-11-27 07:33:52,291 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn1.whiten.whitening_limit, batch_count=3792693.3333333335, ans=22.5 2023-11-27 07:33:52,682 WARNING [train_asr.py:1481] (3/4) Exclude cut with ID unbalanced/ZY_Bsi-RNuk_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-27 07:33:58,563 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=3792760.0, ans=0.125 2023-11-27 07:34:02,162 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.whiten, num_groups=1, num_channels=512, metric=9.41 vs. limit=12.0 2023-11-27 07:34:03,806 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3792760.0, ans=0.125 2023-11-27 07:34:09,535 INFO [train_asr.py:1235] (3/4) Epoch 48, batch 3800, loss[loss=0.05702, simple_loss=0.08737, pruned_loss=0.006072, audio_tagging_loss=0.007262, over 15508.00 frames. ], tot_loss[loss=0.06574, simple_loss=0.09015, pruned_loss=0.01212, audio_tagging_loss=0.008543, over 3057973.07 frames. ], batch size: 58, lr: 1.41e-03, grad_scale: 16.0 2023-11-27 07:34:28,607 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.900e+01 9.318e+01 9.986e+01 1.074e+02 1.810e+02, threshold=1.997e+02, percent-clipped=0.0 2023-11-27 07:34:30,968 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=3792960.0, ans=0.125 2023-11-27 07:34:35,972 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 568950 2023-11-27 07:34:47,700 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=3793026.6666666665, ans=0.0 2023-11-27 07:35:04,402 INFO [train_asr.py:1235] (3/4) Epoch 48, batch 3850, loss[loss=0.0728, simple_loss=0.1028, pruned_loss=0.01469, audio_tagging_loss=0.006683, over 14302.00 frames. ], tot_loss[loss=0.06534, simple_loss=0.08949, pruned_loss=0.01203, audio_tagging_loss=0.008567, over 3052392.38 frames. 
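Each batch record carries two loss views: a per-batch `loss[...]` over roughly 15k frames and a smoothed `tot_loss[...]` whose trailing frame count hovers near 3.04M. That near-constant window suggests a frame-weighted exponential average rather than a cumulative one; a sketch, with the decay constant assumed:

```python
# Sketch of the tot_loss[..., over N frames] bookkeeping above: a
# frame-weighted exponential moving average. With ~15k frames per batch, an
# assumed decay of 0.995 plateaus at 15000 / (1 - 0.995) = 3.0M frames,
# close to the ~3.04M window the log reports.
class RunningLoss:
    def __init__(self, decay: float = 0.995):
        self.decay = decay
        self.weighted_loss = 0.0
        self.frames = 0.0

    def update(self, batch_loss: float, batch_frames: float) -> float:
        self.weighted_loss = self.decay * self.weighted_loss + batch_loss * batch_frames
        self.frames = self.decay * self.frames + batch_frames
        return self.weighted_loss / self.frames

tracker = RunningLoss()
print(tracker.update(0.0728, 14302.0))  # first update just returns the batch loss
```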
], batch size: 56, lr: 1.41e-03, grad_scale: 16.0 2023-11-27 07:35:16,962 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=4.61 vs. limit=10.0 2023-11-27 07:35:20,439 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3793226.6666666665, ans=0.1 2023-11-27 07:35:26,587 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=13.22 vs. limit=15.0 2023-11-27 07:35:30,130 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=3793293.3333333335, ans=0.2 2023-11-27 07:35:32,177 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 569000 2023-11-27 07:35:40,045 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=3793360.0, ans=0.125 2023-11-27 07:36:00,505 INFO [train_asr.py:1235] (3/4) Epoch 48, batch 3900, loss[loss=0.0696, simple_loss=0.1003, pruned_loss=0.01328, audio_tagging_loss=0.006188, over 15154.00 frames. ], tot_loss[loss=0.0653, simple_loss=0.08937, pruned_loss=0.01203, audio_tagging_loss=0.008593, over 3047386.05 frames. ], batch size: 56, lr: 1.41e-03, grad_scale: 8.0 2023-11-27 07:36:11,966 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-27 07:36:12,228 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=3.38 vs. limit=15.0 2023-11-27 07:36:22,247 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.637e+01 9.249e+01 9.619e+01 1.030e+02 1.289e+02, threshold=1.924e+02, percent-clipped=0.0 2023-11-27 07:36:27,696 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 569050 2023-11-27 07:36:34,223 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3793693.3333333335, ans=0.125 2023-11-27 07:36:46,983 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.1.whiten, num_groups=1, num_channels=192, metric=4.04 vs. limit=12.0 2023-11-27 07:36:56,879 INFO [train_asr.py:1235] (3/4) Epoch 48, batch 3950, loss[loss=0.07642, simple_loss=0.1039, pruned_loss=0.01756, audio_tagging_loss=0.006914, over 16750.00 frames. ], tot_loss[loss=0.06591, simple_loss=0.09012, pruned_loss=0.01215, audio_tagging_loss=0.008694, over 3049963.95 frames. ], batch size: 64, lr: 1.41e-03, grad_scale: 8.0 2023-11-27 07:37:14,467 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=3.82 vs. limit=15.0 2023-11-27 07:37:15,638 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=512, metric=6.96 vs. limit=15.0 2023-11-27 07:37:23,072 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 569100 2023-11-27 07:37:29,555 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=3794026.6666666665, ans=0.0 2023-11-27 07:37:52,206 INFO [train_asr.py:1235] (3/4) Epoch 48, batch 4000, loss[loss=0.06035, simple_loss=0.07984, pruned_loss=0.01089, audio_tagging_loss=0.009542, over 15576.00 frames. 
], tot_loss[loss=0.06648, simple_loss=0.09086, pruned_loss=0.01225, audio_tagging_loss=0.008801, over 3050839.31 frames. ], batch size: 57, lr: 1.41e-03, grad_scale: 16.0 2023-11-27 07:37:57,767 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-27 07:38:06,156 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=12.19 vs. limit=22.5 2023-11-27 07:38:13,569 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 8.003e+01 9.122e+01 1.001e+02 1.075e+02 1.362e+02, threshold=2.001e+02, percent-clipped=0.0 2023-11-27 07:38:17,571 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-11-27 07:38:20,099 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 569150 2023-11-27 07:38:22,431 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=3794293.3333333335, ans=0.04949747468305833 2023-11-27 07:38:40,641 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=12.07 vs. limit=15.0 2023-11-27 07:38:48,314 INFO [train_asr.py:1235] (3/4) Epoch 48, batch 4050, loss[loss=0.07289, simple_loss=0.09904, pruned_loss=0.01117, audio_tagging_loss=0.01221, over 14385.00 frames. ], tot_loss[loss=0.06619, simple_loss=0.09039, pruned_loss=0.01209, audio_tagging_loss=0.008907, over 3044680.44 frames. ], batch size: 52, lr: 1.41e-03, grad_scale: 16.0 2023-11-27 07:38:51,530 WARNING [train_asr.py:1481] (3/4) Exclude cut with ID unbalanced/-7b0f9TyPFU_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-27 07:38:54,957 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=3794493.3333333335, ans=0.125 2023-11-27 07:38:59,584 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=3794560.0, ans=0.2 2023-11-27 07:39:02,865 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3794560.0, ans=0.1 2023-11-27 07:39:03,870 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=3794560.0, ans=0.0 2023-11-27 07:39:12,482 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=3794626.6666666665, ans=0.09899494936611666 2023-11-27 07:39:15,557 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 569200 2023-11-27 07:39:15,755 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=3794626.6666666665, ans=0.125 2023-11-27 07:39:16,208 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.1.conv_module2.whiten, num_groups=1, num_channels=192, metric=13.32 vs. 
limit=15.0 2023-11-27 07:39:21,731 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=1.77 vs. limit=6.0 2023-11-27 07:39:35,786 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=512, metric=2.98 vs. limit=15.0 2023-11-27 07:39:44,873 INFO [train_asr.py:1235] (3/4) Epoch 48, batch 4100, loss[loss=0.07328, simple_loss=0.101, pruned_loss=0.01368, audio_tagging_loss=0.009076, over 16466.00 frames. ], tot_loss[loss=0.06599, simple_loss=0.09031, pruned_loss=0.01202, audio_tagging_loss=0.008813, over 3053491.08 frames. ], batch size: 61, lr: 1.41e-03, grad_scale: 16.0 2023-11-27 07:40:02,740 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=3794893.3333333335, ans=0.125 2023-11-27 07:40:05,684 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.307e+01 9.057e+01 9.659e+01 1.032e+02 2.111e+02, threshold=1.932e+02, percent-clipped=1.0 2023-11-27 07:40:06,310 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=512, metric=2.98 vs. limit=15.0 2023-11-27 07:40:08,145 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=3794960.0, ans=0.0 2023-11-27 07:40:11,107 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 569250 2023-11-27 07:40:20,178 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3795026.6666666665, ans=0.125 2023-11-27 07:40:23,899 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3795026.6666666665, ans=0.125 2023-11-27 07:40:31,235 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3795093.3333333335, ans=0.1 2023-11-27 07:40:38,857 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=7.59 vs. limit=15.0 2023-11-27 07:40:40,542 INFO [train_asr.py:1235] (3/4) Epoch 48, batch 4150, loss[loss=0.07007, simple_loss=0.09015, pruned_loss=0.01356, audio_tagging_loss=0.01143, over 14190.00 frames. ], tot_loss[loss=0.06615, simple_loss=0.09069, pruned_loss=0.01211, audio_tagging_loss=0.008695, over 3049710.76 frames. ], batch size: 54, lr: 1.41e-03, grad_scale: 16.0 2023-11-27 07:40:43,886 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=3795160.0, ans=0.0 2023-11-27 07:41:06,150 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=3795293.3333333335, ans=0.125 2023-11-27 07:41:07,126 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 569300 2023-11-27 07:41:08,891 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=3795293.3333333335, ans=0.2 2023-11-27 07:41:10,952 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=3795293.3333333335, ans=0.125 2023-11-27 07:41:22,182 WARNING [train_asr.py:1481] (3/4) Exclude cut with ID unbalanced/5BkClLNthIQ_0.000_1.000.wav from training. 
Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-27 07:41:24,536 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3795426.6666666665, ans=0.1 2023-11-27 07:41:36,066 INFO [train_asr.py:1235] (3/4) Epoch 48, batch 4200, loss[loss=0.07054, simple_loss=0.1046, pruned_loss=0.01202, audio_tagging_loss=0.006202, over 16109.00 frames. ], tot_loss[loss=0.06586, simple_loss=0.09044, pruned_loss=0.01199, audio_tagging_loss=0.008643, over 3050561.25 frames. ], batch size: 58, lr: 1.41e-03, grad_scale: 16.0 2023-11-27 07:41:37,916 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3795493.3333333335, ans=0.125 2023-11-27 07:41:39,027 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=3795493.3333333335, ans=0.0 2023-11-27 07:41:58,437 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.451e+01 9.145e+01 9.853e+01 1.047e+02 1.662e+02, threshold=1.971e+02, percent-clipped=0.0 2023-11-27 07:41:59,765 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3795626.6666666665, ans=0.1 2023-11-27 07:42:03,840 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 569350 2023-11-27 07:42:19,047 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3795693.3333333335, ans=0.1 2023-11-27 07:42:32,523 INFO [train_asr.py:1235] (3/4) Epoch 48, batch 4250, loss[loss=0.08747, simple_loss=0.1185, pruned_loss=0.02072, audio_tagging_loss=0.007499, over 15359.00 frames. ], tot_loss[loss=0.06593, simple_loss=0.09089, pruned_loss=0.01201, audio_tagging_loss=0.008478, over 3051255.41 frames. ], batch size: 55, lr: 1.41e-03, grad_scale: 16.0 2023-11-27 07:42:44,428 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=3795893.3333333335, ans=0.125 2023-11-27 07:42:59,206 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 569400 2023-11-27 07:43:07,591 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=3796026.6666666665, ans=0.0 2023-11-27 07:43:08,788 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-27 07:43:16,748 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=3796093.3333333335, ans=0.09899494936611666 2023-11-27 07:43:21,389 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=3796093.3333333335, ans=0.2 2023-11-27 07:43:29,339 INFO [train_asr.py:1235] (3/4) Epoch 48, batch 4300, loss[loss=0.06165, simple_loss=0.08974, pruned_loss=0.008812, audio_tagging_loss=0.00797, over 15589.00 frames. ], tot_loss[loss=0.06607, simple_loss=0.09149, pruned_loss=0.01195, audio_tagging_loss=0.008379, over 3056616.97 frames. 
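The `Freeze_encoder: False; Current batch idx: ...` lines tick every 50 batches, a heartbeat confirming the encoder is still trainable at this stage. A trivial sketch; the interval and the names are assumptions inferred from the 50-batch spacing of the records:

```python
LOG_INTERVAL = 50  # assumed from the spacing of the records above

def maybe_log_state(batch_idx: int, freeze_encoder: bool) -> None:
    if batch_idx % LOG_INTERVAL == 0:
        print(f"Freeze_encoder: {freeze_encoder}; Current batch idx: {batch_idx}")

maybe_log_state(569500, False)
```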
], batch size: 60, lr: 1.41e-03, grad_scale: 16.0 2023-11-27 07:43:34,840 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=3796160.0, ans=0.0 2023-11-27 07:43:48,053 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=3796226.6666666665, ans=0.015 2023-11-27 07:43:50,074 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.655e+01 9.134e+01 9.847e+01 1.057e+02 1.328e+02, threshold=1.969e+02, percent-clipped=0.0 2023-11-27 07:43:56,068 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 569450 2023-11-27 07:44:06,471 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3796360.0, ans=0.125 2023-11-27 07:44:24,881 INFO [train_asr.py:1235] (3/4) Epoch 48, batch 4350, loss[loss=0.08602, simple_loss=0.1225, pruned_loss=0.01799, audio_tagging_loss=0.006761, over 15405.00 frames. ], tot_loss[loss=0.06603, simple_loss=0.09104, pruned_loss=0.0121, audio_tagging_loss=0.008408, over 3052670.83 frames. ], batch size: 53, lr: 1.41e-03, grad_scale: 16.0 2023-11-27 07:44:45,918 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=3796560.0, ans=0.0 2023-11-27 07:44:51,345 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3796626.6666666665, ans=0.1 2023-11-27 07:44:52,285 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 569500 2023-11-27 07:44:54,023 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3796626.6666666665, ans=0.1 2023-11-27 07:45:11,123 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3796760.0, ans=0.125 2023-11-27 07:45:13,712 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=3796760.0, ans=0.0 2023-11-27 07:45:20,912 INFO [train_asr.py:1235] (3/4) Epoch 48, batch 4400, loss[loss=0.08264, simple_loss=0.1231, pruned_loss=0.01647, audio_tagging_loss=0.004611, over 16371.00 frames. ], tot_loss[loss=0.06543, simple_loss=0.09013, pruned_loss=0.01193, audio_tagging_loss=0.008431, over 3055072.10 frames. ], batch size: 59, lr: 1.41e-03, grad_scale: 32.0 2023-11-27 07:45:21,195 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=3796826.6666666665, ans=0.0 2023-11-27 07:45:42,614 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.177e+01 8.963e+01 9.534e+01 1.011e+02 1.280e+02, threshold=1.907e+02, percent-clipped=0.0 2023-11-27 07:45:47,991 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 569550 2023-11-27 07:45:55,248 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=11.61 vs. limit=15.0 2023-11-27 07:46:15,182 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3797093.3333333335, ans=0.125 2023-11-27 07:46:17,082 INFO [train_asr.py:1235] (3/4) Epoch 48, batch 4450, loss[loss=0.09118, simple_loss=0.1331, pruned_loss=0.01794, audio_tagging_loss=0.006689, over 16232.00 frames. 
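], tot_loss[loss=0.06515, simple_loss=0.08977, pruned_loss=0.01181, audio_tagging_loss=0.008456, over 3050131.88 frames. ], batch size: 55, lr: 1.41e-03, grad_scale: 16.0

Note: in each of these records, loss[...] is the current batch and tot_loss[...] a running aggregate over recent batches; the fractional frame counts (e.g. "over 3050131.88 frames") suggest geometrically decayed sums rather than hard totals. A sketch of that bookkeeping under that assumption (names and the decay constant are illustrative, in the spirit of icefall's MetricsTracker; the printed value would be the loss-times-frames sum divided by the frames sum):

    def update_tot_loss(tot: dict, cur: dict, reset_interval: int = 200) -> dict:
        # Decayed running sums of each loss component (weighted by frames)
        # and of the frame count itself.
        alpha = 1.0 - 1.0 / reset_interval
        keys = set(tot) | set(cur)
        return {k: alpha * tot.get(k, 0.0) + cur.get(k, 0.0) for k in keys}

    def normed(tot: dict) -> dict:
        # e.g. tot_loss[loss=...] = sum(loss * frames) / sum(frames)
        frames = tot["frames"]
        return {k: v / frames for k, v in tot.items() if k != "frames"}
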
2023-11-27 07:46:30,923 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=9.60 vs. limit=15.0 2023-11-27 07:46:36,197 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=9.23 vs. limit=15.0 2023-11-27 07:46:38,951 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten2.whitening_limit, batch_count=3797293.3333333335, ans=15.0 2023-11-27 07:46:43,697 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 569600 2023-11-27 07:46:58,873 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3797360.0, ans=0.125 2023-11-27 07:47:07,792 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=3797426.6666666665, ans=0.0 2023-11-27 07:47:12,975 INFO [train_asr.py:1235] (3/4) Epoch 48, batch 4500, loss[loss=0.08367, simple_loss=0.1138, pruned_loss=0.01572, audio_tagging_loss=0.01107, over 15691.00 frames. ], tot_loss[loss=0.06534, simple_loss=0.09008, pruned_loss=0.01191, audio_tagging_loss=0.00839, over 3061567.37 frames. ], batch size: 57, lr: 1.41e-03, grad_scale: 16.0 2023-11-27 07:47:31,047 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=3797560.0, ans=0.0 2023-11-27 07:47:32,047 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=3797560.0, ans=0.125 2023-11-27 07:47:35,706 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.963e+01 9.073e+01 9.744e+01 1.047e+02 1.558e+02, threshold=1.949e+02, percent-clipped=0.0 2023-11-27 07:47:39,223 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=3797626.6666666665, ans=0.125 2023-11-27 07:47:40,065 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 569650 2023-11-27 07:47:41,508 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.whiten, num_groups=1, num_channels=512, metric=7.74 vs. limit=12.0 2023-11-27 07:47:46,482 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=3797693.3333333335, ans=0.09899494936611666 2023-11-27 07:47:59,916 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.36 vs. limit=6.0 2023-11-27 07:48:08,488 INFO [train_asr.py:1235] (3/4) Epoch 48, batch 4550, loss[loss=0.06912, simple_loss=0.09319, pruned_loss=0.01443, audio_tagging_loss=0.008092, over 15745.00 frames. ], tot_loss[loss=0.06505, simple_loss=0.08938, pruned_loss=0.01193, audio_tagging_loss=0.00843, over 3058960.30 frames.
], batch size: 59, lr: 1.41e-03, grad_scale: 16.0 2023-11-27 07:48:14,486 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.max_positive, batch_count=3797826.6666666665, ans=0.95 2023-11-27 07:48:34,683 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer_ff3.min_abs, batch_count=3797960.0, ans=0.2 2023-11-27 07:48:35,683 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 569700 2023-11-27 07:48:38,447 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=11.11 vs. limit=15.0 2023-11-27 07:48:43,409 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer_ff2.min_abs, batch_count=3798026.6666666665, ans=0.1 2023-11-27 07:48:44,387 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3798026.6666666665, ans=0.125 2023-11-27 07:48:52,297 WARNING [train_asr.py:1481] (3/4) Exclude cut with ID unbalanced/_II2Klfnn4Y_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-27 07:49:05,212 INFO [train_asr.py:1235] (3/4) Epoch 48, batch 4600, loss[loss=0.05522, simple_loss=0.07423, pruned_loss=0.008106, audio_tagging_loss=0.009996, over 14143.00 frames. ], tot_loss[loss=0.06489, simple_loss=0.08876, pruned_loss=0.01196, audio_tagging_loss=0.008544, over 3054268.32 frames. ], batch size: 56, lr: 1.41e-03, grad_scale: 16.0 2023-11-27 07:49:24,225 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=3798226.6666666665, ans=0.125 2023-11-27 07:49:25,291 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3798226.6666666665, ans=0.1 2023-11-27 07:49:27,117 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.887e+01 8.803e+01 9.390e+01 1.011e+02 1.144e+02, threshold=1.878e+02, percent-clipped=0.0 2023-11-27 07:49:31,994 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 569750 2023-11-27 07:50:00,868 INFO [train_asr.py:1235] (3/4) Epoch 48, batch 4650, loss[loss=0.06233, simple_loss=0.08375, pruned_loss=0.01105, audio_tagging_loss=0.009404, over 15040.00 frames. ], tot_loss[loss=0.06476, simple_loss=0.08834, pruned_loss=0.01191, audio_tagging_loss=0.008679, over 3055161.58 frames. ], batch size: 59, lr: 1.41e-03, grad_scale: 16.0 2023-11-27 07:50:26,067 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=10.32 vs. limit=15.0 2023-11-27 07:50:28,634 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 569800 2023-11-27 07:50:50,928 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=3798760.0, ans=0.0 2023-11-27 07:50:51,233 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=8.91 vs. 
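limit=15.0

Note: the Whitening records compare a per-module anisotropy metric against a scheduled limit (here 8.91 vs. 15.0); the diagnostic is silent while activations stay reasonably "white" and a penalty applies only above the limit. One plausible form of such a metric, sketched as an assumption rather than a copy of scaling.py: for covariance eigenvalues l_1..l_C, the quantity C * sum(l_i^2) / (sum(l_i))^2 equals 1.0 for an isotropic covariance and grows as the spectrum becomes lopsided.

    import torch

    def whitening_metric(x: torch.Tensor, num_groups: int = 1) -> torch.Tensor:
        # x: (num_frames, num_channels), channels split into num_groups groups
        T, C = x.shape
        x = x.reshape(T, num_groups, C // num_groups).transpose(0, 1)
        cov = x.transpose(1, 2) @ x / T  # per-group covariance, (G, C/G, C/G)
        num = (cov ** 2).sum(dim=(1, 2))  # Frobenius^2 = sum of squared eigenvalues
        den = torch.diagonal(cov, dim1=1, dim2=2).sum(dim=1) ** 2  # trace squared
        return ((C // num_groups) * num / den).mean()
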
2023-11-27 07:50:57,446 INFO [train_asr.py:1235] (3/4) Epoch 48, batch 4700, loss[loss=0.06628, simple_loss=0.08697, pruned_loss=0.01275, audio_tagging_loss=0.01004, over 16190.00 frames. ], tot_loss[loss=0.06416, simple_loss=0.08741, pruned_loss=0.01174, audio_tagging_loss=0.00871, over 3055088.56 frames. ], batch size: 59, lr: 1.41e-03, grad_scale: 16.0 2023-11-27 07:51:05,567 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3798826.6666666665, ans=0.125 2023-11-27 07:51:09,314 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=3798893.3333333335, ans=0.2 2023-11-27 07:51:19,709 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.561e+01 8.828e+01 9.714e+01 1.039e+02 1.424e+02, threshold=1.943e+02, percent-clipped=0.0 2023-11-27 07:51:23,963 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 569850 2023-11-27 07:51:33,059 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=3799026.6666666665, ans=0.125 2023-11-27 07:51:51,715 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3799093.3333333335, ans=0.125 2023-11-27 07:51:53,642 INFO [train_asr.py:1235] (3/4) Epoch 48, batch 4750, loss[loss=0.07832, simple_loss=0.1075, pruned_loss=0.0178, audio_tagging_loss=0.006774, over 15544.00 frames. ], tot_loss[loss=0.06482, simple_loss=0.08851, pruned_loss=0.01182, audio_tagging_loss=0.008743, over 3061496.85 frames. ], batch size: 56, lr: 1.41e-03, grad_scale: 16.0 2023-11-27 07:51:55,933 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=3799160.0, ans=0.2 2023-11-27 07:52:08,973 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=13.56 vs. limit=15.0 2023-11-27 07:52:09,868 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=3799226.6666666665, ans=0.125 2023-11-27 07:52:09,928 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.max_positive, batch_count=3799226.6666666665, ans=0.95 2023-11-27 07:52:11,319 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=512, metric=20.14 vs. limit=22.5 2023-11-27 07:52:20,342 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 569900 2023-11-27 07:52:29,389 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3799360.0, ans=0.125 2023-11-27 07:52:35,248 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3799360.0, ans=0.125 2023-11-27 07:52:36,256 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=3799360.0, ans=0.125 2023-11-27 07:52:40,020 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=8.58 vs.
limit=15.0 2023-11-27 07:52:48,850 INFO [train_asr.py:1235] (3/4) Epoch 48, batch 4800, loss[loss=0.07147, simple_loss=0.09716, pruned_loss=0.01651, audio_tagging_loss=0.006372, over 15671.00 frames. ], tot_loss[loss=0.06468, simple_loss=0.08835, pruned_loss=0.01172, audio_tagging_loss=0.008779, over 3053772.71 frames. ], batch size: 56, lr: 1.41e-03, grad_scale: 32.0 2023-11-27 07:52:59,197 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3799560.0, ans=0.125 2023-11-27 07:53:08,128 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=3799560.0, ans=0.0 2023-11-27 07:53:11,650 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.066e+01 9.043e+01 9.728e+01 1.036e+02 1.523e+02, threshold=1.946e+02, percent-clipped=0.0 2023-11-27 07:53:16,505 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 569950 2023-11-27 07:53:18,870 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=8.83 vs. limit=15.0 2023-11-27 07:53:36,098 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=10.20 vs. limit=15.0 2023-11-27 07:53:45,020 INFO [train_asr.py:1235] (3/4) Epoch 48, batch 4850, loss[loss=0.05838, simple_loss=0.07792, pruned_loss=0.01053, audio_tagging_loss=0.008889, over 16154.00 frames. ], tot_loss[loss=0.0651, simple_loss=0.08859, pruned_loss=0.0119, audio_tagging_loss=0.008906, over 3052551.44 frames. ], batch size: 62, lr: 1.41e-03, grad_scale: 32.0 2023-11-27 07:53:48,430 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3799826.6666666665, ans=0.1 2023-11-27 07:54:11,679 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 570000 2023-11-27 07:54:13,202 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module2.whiten, num_groups=1, num_channels=512, metric=8.79 vs. limit=15.0 2023-11-27 07:54:25,546 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=3800026.6666666665, ans=0.125 2023-11-27 07:54:41,097 INFO [train_asr.py:1235] (3/4) Epoch 48, batch 4900, loss[loss=0.08001, simple_loss=0.1095, pruned_loss=0.0173, audio_tagging_loss=0.007979, over 16005.00 frames. ], tot_loss[loss=0.06578, simple_loss=0.08933, pruned_loss=0.0122, audio_tagging_loss=0.008912, over 3051167.54 frames. ], batch size: 59, lr: 1.41e-03, grad_scale: 32.0 2023-11-27 07:54:45,594 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer_na.min_abs, batch_count=3800160.0, ans=0.02 2023-11-27 07:54:56,425 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=512, metric=3.97 vs. 
limit=15.0 2023-11-27 07:54:57,107 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3800226.6666666665, ans=0.1 2023-11-27 07:54:59,252 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=3800226.6666666665, ans=0.2 2023-11-27 07:55:02,212 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.718e+01 9.110e+01 9.715e+01 1.029e+02 1.331e+02, threshold=1.943e+02, percent-clipped=0.0 2023-11-27 07:55:06,622 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 570050 2023-11-27 07:55:13,292 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=3800360.0, ans=0.125 2023-11-27 07:55:18,960 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3800360.0, ans=0.1 2023-11-27 07:55:36,375 INFO [train_asr.py:1235] (3/4) Epoch 48, batch 4950, loss[loss=0.07054, simple_loss=0.09723, pruned_loss=0.01456, audio_tagging_loss=0.007362, over 15576.00 frames. ], tot_loss[loss=0.06559, simple_loss=0.08965, pruned_loss=0.01212, audio_tagging_loss=0.008653, over 3046026.77 frames. ], batch size: 57, lr: 1.41e-03, grad_scale: 32.0 2023-11-27 07:55:36,630 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=3800493.3333333335, ans=0.125 2023-11-27 07:55:48,476 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=3800560.0, ans=0.125 2023-11-27 07:55:53,730 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-27 07:55:55,108 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=14.36 vs. limit=22.5 2023-11-27 07:55:57,489 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.32 vs. limit=6.0 2023-11-27 07:56:04,236 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 570100 2023-11-27 07:56:13,341 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=9.01 vs. limit=15.0 2023-11-27 07:56:28,894 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=3800760.0, ans=0.2 2023-11-27 07:56:29,950 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3800760.0, ans=0.125 2023-11-27 07:56:31,913 INFO [train_asr.py:1235] (3/4) Epoch 48, batch 5000, loss[loss=0.06446, simple_loss=0.09082, pruned_loss=0.01265, audio_tagging_loss=0.006397, over 15304.00 frames. ], tot_loss[loss=0.06538, simple_loss=0.0896, pruned_loss=0.01202, audio_tagging_loss=0.008557, over 3048640.87 frames. 
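], batch size: 57, lr: 1.41e-03, grad_scale: 32.0

Note: the ScheduledFloat records that dominate this log each tie a regularizer knob (a dropout probability, skip rate, balancer prob, scale_min, ...) to the global batch_count, so values like ans=0.125 or ans=0.0 are just the current reading of a piecewise-linear schedule. A minimal sketch of the idea; the class name matches the log, but the implementation and the example breakpoints are assumptions:

    class ScheduledFloat:
        """Piecewise-linear function of batch_count, clamped at both ends."""
        def __init__(self, *points):
            self.points = sorted(points)  # [(batch_count, value), ...]

        def __call__(self, batch_count: float) -> float:
            pts = self.points
            if batch_count <= pts[0][0]:
                return pts[0][1]
            if batch_count >= pts[-1][0]:
                return pts[-1][1]
            for (x0, y0), (x1, y1) in zip(pts, pts[1:]):
                if x0 <= batch_count <= x1:
                    return y0 + (y1 - y0) * (batch_count - x0) / (x1 - x0)

    # e.g. a skip rate annealed to zero early in training, then held flat:
    conv_skip_rate = ScheduledFloat((0.0, 0.2), (20000.0, 0.0))
    conv_skip_rate(3800826.67)  # -> 0.0 this late in training, as in the ans=0.0 lines
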
2023-11-27 07:56:38,392 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=3800826.6666666665, ans=0.0 2023-11-27 07:56:44,213 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=3800893.3333333335, ans=0.125 2023-11-27 07:56:49,437 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_whiten.whitening_limit, batch_count=3800893.3333333335, ans=15.0 2023-11-27 07:56:53,512 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=3800893.3333333335, ans=0.0 2023-11-27 07:56:55,302 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.720e+01 8.884e+01 9.444e+01 1.042e+02 1.203e+02, threshold=1.889e+02, percent-clipped=0.0 2023-11-27 07:56:56,919 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.whiten, num_groups=1, num_channels=512, metric=10.11 vs. limit=12.0 2023-11-27 07:56:57,679 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.min_abs, batch_count=3800960.0, ans=0.5 2023-11-27 07:56:59,621 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 570150 2023-11-27 07:56:59,833 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-11-27 07:57:07,218 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=3801026.6666666665, ans=0.0 2023-11-27 07:57:19,445 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=3801093.3333333335, ans=0.0 2023-11-27 07:57:21,274 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=13.07 vs. limit=15.0 2023-11-27 07:57:22,392 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=16.99 vs. limit=22.5 2023-11-27 07:57:23,137 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=3801093.3333333335, ans=0.0 2023-11-27 07:57:29,507 INFO [train_asr.py:1235] (3/4) Epoch 48, batch 5050, loss[loss=0.05614, simple_loss=0.07594, pruned_loss=0.008674, audio_tagging_loss=0.009496, over 14943.00 frames. ], tot_loss[loss=0.06469, simple_loss=0.08867, pruned_loss=0.01193, audio_tagging_loss=0.008424, over 3039798.00 frames.
], batch size: 55, lr: 1.41e-03, grad_scale: 32.0 2023-11-27 07:57:41,353 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3801226.6666666665, ans=0.125 2023-11-27 07:57:49,881 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=3801293.3333333335, ans=0.0 2023-11-27 07:57:50,942 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3801293.3333333335, ans=0.1 2023-11-27 07:57:54,979 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 570200 2023-11-27 07:58:13,577 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3801426.6666666665, ans=0.125 2023-11-27 07:58:17,616 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=3801426.6666666665, ans=0.125 2023-11-27 07:58:24,954 INFO [train_asr.py:1235] (3/4) Epoch 48, batch 5100, loss[loss=0.06194, simple_loss=0.08616, pruned_loss=0.01006, audio_tagging_loss=0.008802, over 15605.00 frames. ], tot_loss[loss=0.0648, simple_loss=0.08897, pruned_loss=0.01188, audio_tagging_loss=0.008437, over 3046610.03 frames. ], batch size: 58, lr: 1.41e-03, grad_scale: 32.0 2023-11-27 07:58:30,373 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3801493.3333333335, ans=0.0 2023-11-27 07:58:34,554 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=3801560.0, ans=0.125 2023-11-27 07:58:46,395 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.907e+01 8.861e+01 9.486e+01 1.041e+02 1.352e+02, threshold=1.897e+02, percent-clipped=0.0 2023-11-27 07:58:51,775 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 570250 2023-11-27 07:58:52,329 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=7.34 vs. limit=15.0 2023-11-27 07:58:52,993 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-11-27 07:58:54,658 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=3801626.6666666665, ans=0.2 2023-11-27 07:59:00,978 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=3801693.3333333335, ans=0.2 2023-11-27 07:59:05,149 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3801693.3333333335, ans=0.125 2023-11-27 07:59:10,451 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=3801760.0, ans=0.2 2023-11-27 07:59:17,173 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=12.47 vs. 
limit=15.0 2023-11-27 07:59:18,896 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3801826.6666666665, ans=0.125 2023-11-27 07:59:19,741 INFO [train_asr.py:1235] (3/4) Epoch 48, batch 5150, loss[loss=0.05086, simple_loss=0.07301, pruned_loss=0.006699, audio_tagging_loss=0.007656, over 14471.00 frames. ], tot_loss[loss=0.06468, simple_loss=0.0887, pruned_loss=0.01185, audio_tagging_loss=0.008483, over 3048188.62 frames. ], batch size: 53, lr: 1.41e-03, grad_scale: 32.0 2023-11-27 07:59:19,915 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3801826.6666666665, ans=0.125 2023-11-27 07:59:23,653 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=3801826.6666666665, ans=0.2 2023-11-27 07:59:25,968 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten1.whitening_limit, batch_count=3801826.6666666665, ans=10.0 2023-11-27 07:59:32,642 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3801893.3333333335, ans=0.125 2023-11-27 07:59:46,788 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 570300 2023-11-27 08:00:09,192 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=7.83 vs. limit=10.0 2023-11-27 08:00:15,512 INFO [train_asr.py:1235] (3/4) Epoch 48, batch 5200, loss[loss=0.05319, simple_loss=0.07105, pruned_loss=0.009969, audio_tagging_loss=0.007699, over 14394.00 frames. ], tot_loss[loss=0.06487, simple_loss=0.08884, pruned_loss=0.01195, audio_tagging_loss=0.008498, over 3044569.69 frames. ], batch size: 57, lr: 1.41e-03, grad_scale: 32.0 2023-11-27 08:00:28,442 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=3802226.6666666665, ans=0.125 2023-11-27 08:00:36,940 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=3802293.3333333335, ans=0.0 2023-11-27 08:00:39,807 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.659e+01 9.152e+01 9.640e+01 1.026e+02 1.239e+02, threshold=1.928e+02, percent-clipped=0.0 2023-11-27 08:00:42,073 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 570350 2023-11-27 08:01:05,684 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=3802426.6666666665, ans=0.0 2023-11-27 08:01:11,809 INFO [train_asr.py:1235] (3/4) Epoch 48, batch 5250, loss[loss=0.06559, simple_loss=0.0905, pruned_loss=0.008535, audio_tagging_loss=0.01181, over 15313.00 frames. ], tot_loss[loss=0.06553, simple_loss=0.09013, pruned_loss=0.01203, audio_tagging_loss=0.008429, over 3051954.77 frames. 
], batch size: 56, lr: 1.41e-03, grad_scale: 16.0 2023-11-27 08:01:14,183 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=3802493.3333333335, ans=0.1 2023-11-27 08:01:22,809 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=3802560.0, ans=0.2 2023-11-27 08:01:38,162 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 570400 2023-11-27 08:01:49,690 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3802693.3333333335, ans=0.1 2023-11-27 08:01:58,212 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3802760.0, ans=0.0 2023-11-27 08:02:00,316 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3802760.0, ans=0.125 2023-11-27 08:02:07,493 INFO [train_asr.py:1235] (3/4) Epoch 48, batch 5300, loss[loss=0.06892, simple_loss=0.09436, pruned_loss=0.01248, audio_tagging_loss=0.009269, over 14742.00 frames. ], tot_loss[loss=0.06582, simple_loss=0.09074, pruned_loss=0.01213, audio_tagging_loss=0.008316, over 3047352.02 frames. ], batch size: 54, lr: 1.41e-03, grad_scale: 16.0 2023-11-27 08:02:16,849 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3802826.6666666665, ans=0.1 2023-11-27 08:02:23,701 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=3802893.3333333335, ans=0.0 2023-11-27 08:02:33,176 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.488e+01 9.130e+01 9.779e+01 1.044e+02 2.518e+02, threshold=1.956e+02, percent-clipped=1.0 2023-11-27 08:02:35,413 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 570450 2023-11-27 08:02:45,042 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=3803026.6666666665, ans=0.125 2023-11-27 08:03:02,456 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=3803160.0, ans=0.1 2023-11-27 08:03:03,253 INFO [train_asr.py:1235] (3/4) Epoch 48, batch 5350, loss[loss=0.06333, simple_loss=0.08853, pruned_loss=0.01231, audio_tagging_loss=0.006753, over 16087.00 frames. ], tot_loss[loss=0.06571, simple_loss=0.0903, pruned_loss=0.01211, audio_tagging_loss=0.008448, over 3041951.64 frames. ], batch size: 61, lr: 1.41e-03, grad_scale: 8.0 2023-11-27 08:03:23,217 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=3803226.6666666665, ans=0.0 2023-11-27 08:03:30,617 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 570500 2023-11-27 08:03:52,417 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3803426.6666666665, ans=0.125 2023-11-27 08:03:54,626 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=3803426.6666666665, ans=0.125 2023-11-27 08:04:00,237 INFO [train_asr.py:1235] (3/4) Epoch 48, batch 5400, loss[loss=0.04617, simple_loss=0.06167, pruned_loss=0.006656, audio_tagging_loss=0.008675, over 14515.00 frames. 
], tot_loss[loss=0.06672, simple_loss=0.09182, pruned_loss=0.01238, audio_tagging_loss=0.00842, over 3036875.00 frames. ], batch size: 55, lr: 1.41e-03, grad_scale: 8.0 2023-11-27 08:04:04,733 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer_ff3.min_abs, batch_count=3803493.3333333335, ans=0.2 2023-11-27 08:04:07,778 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3803493.3333333335, ans=0.125 2023-11-27 08:04:12,133 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=3803560.0, ans=0.125 2023-11-27 08:04:25,134 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.573e+01 8.928e+01 9.462e+01 1.035e+02 1.260e+02, threshold=1.892e+02, percent-clipped=0.0 2023-11-27 08:04:26,252 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 570550 2023-11-27 08:04:27,476 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3803626.6666666665, ans=0.125 2023-11-27 08:04:45,795 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=3803760.0, ans=0.0 2023-11-27 08:04:48,953 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=3803760.0, ans=0.0 2023-11-27 08:04:55,117 INFO [train_asr.py:1235] (3/4) Epoch 48, batch 5450, loss[loss=0.0742, simple_loss=0.09036, pruned_loss=0.01683, audio_tagging_loss=0.01218, over 14499.00 frames. ], tot_loss[loss=0.0657, simple_loss=0.09025, pruned_loss=0.01207, audio_tagging_loss=0.008508, over 3038583.14 frames. ], batch size: 56, lr: 1.41e-03, grad_scale: 8.0 2023-11-27 08:05:16,001 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3803893.3333333335, ans=0.125 2023-11-27 08:05:22,219 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 570600 2023-11-27 08:05:33,126 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=11.50 vs. limit=15.0 2023-11-27 08:05:42,383 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=3804093.3333333335, ans=0.125 2023-11-27 08:05:51,032 INFO [train_asr.py:1235] (3/4) Epoch 48, batch 5500, loss[loss=0.06933, simple_loss=0.106, pruned_loss=0.009748, audio_tagging_loss=0.006592, over 14467.00 frames. ], tot_loss[loss=0.06547, simple_loss=0.08989, pruned_loss=0.01192, audio_tagging_loss=0.008605, over 3041014.81 frames. 
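], batch size: 55, lr: 1.41e-03, grad_scale: 8.0

Note: the optim.py records summarize recent gradient norms as five quantiles (min, 25%, median, 75%, max) and derive the clipping threshold from them. In every record here the threshold is almost exactly Clipping_scale=2.0 times the printed median (e.g. 2.0 * 9.726e+01 ~= 1.945e+02), so the sketch below assumes a median-based rule; percent-clipped would then be the share of recent steps whose norm exceeded the threshold.

    import torch

    def clipping_stats(grad_norms, clipping_scale: float = 2.0):
        t = torch.tensor(grad_norms)
        q = torch.quantile(t, torch.tensor([0.0, 0.25, 0.5, 0.75, 1.0]))
        threshold = clipping_scale * q[2]             # 2.0 * median
        pct = 100.0 * (t > threshold).float().mean()  # "percent-clipped"
        return q, threshold, pct
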
2023-11-27 08:06:08,631 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3804226.6666666665, ans=0.125 2023-11-27 08:06:16,933 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.810e+01 9.180e+01 9.726e+01 1.043e+02 1.311e+02, threshold=1.945e+02, percent-clipped=0.0 2023-11-27 08:06:18,120 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 570650 2023-11-27 08:06:21,472 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=3804293.3333333335, ans=0.2 2023-11-27 08:06:37,757 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=6.28 vs. limit=15.0 2023-11-27 08:06:47,586 INFO [train_asr.py:1235] (3/4) Epoch 48, batch 5550, loss[loss=0.06592, simple_loss=0.08872, pruned_loss=0.01377, audio_tagging_loss=0.007788, over 15030.00 frames. ], tot_loss[loss=0.06559, simple_loss=0.09006, pruned_loss=0.01195, audio_tagging_loss=0.008622, over 3039949.49 frames. ], batch size: 58, lr: 1.41e-03, grad_scale: 8.0 2023-11-27 08:06:54,707 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3804493.3333333335, ans=0.125 2023-11-27 08:07:00,543 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.0.whiten, num_groups=1, num_channels=192, metric=4.40 vs. limit=12.0 2023-11-27 08:07:14,310 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 570700 2023-11-27 08:07:38,415 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.max_abs, batch_count=3804760.0, ans=10.0 2023-11-27 08:07:38,528 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=3804760.0, ans=0.2 2023-11-27 08:07:43,566 INFO [train_asr.py:1235] (3/4) Epoch 48, batch 5600, loss[loss=0.05758, simple_loss=0.0848, pruned_loss=0.007193, audio_tagging_loss=0.007989, over 15596.00 frames. ], tot_loss[loss=0.06552, simple_loss=0.0899, pruned_loss=0.01181, audio_tagging_loss=0.008768, over 3048697.83 frames. ], batch size: 58, lr: 1.41e-03, grad_scale: 16.0 2023-11-27 08:07:45,153 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=9.28 vs. limit=15.0 2023-11-27 08:07:47,053 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.max_abs, batch_count=3804826.6666666665, ans=10.0 2023-11-27 08:07:50,617 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=13.82 vs. limit=22.5 2023-11-27 08:08:10,532 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.878e+01 8.987e+01 9.756e+01 1.044e+02 1.605e+02, threshold=1.951e+02, percent-clipped=0.0 2023-11-27 08:08:10,629 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 570750 2023-11-27 08:08:12,835 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3804960.0, ans=0.1 2023-11-27 08:08:23,709 WARNING [train_asr.py:1481] (3/4) Exclude cut with ID unbalanced/ze0LsBtoDm0_0.000_1.000.wav from training.
Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-27 08:08:25,960 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=3805026.6666666665, ans=0.0 2023-11-27 08:08:32,529 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=3805093.3333333335, ans=0.125 2023-11-27 08:08:39,245 INFO [train_asr.py:1235] (3/4) Epoch 48, batch 5650, loss[loss=0.06456, simple_loss=0.09359, pruned_loss=0.009858, audio_tagging_loss=0.007912, over 14853.00 frames. ], tot_loss[loss=0.06533, simple_loss=0.08947, pruned_loss=0.01174, audio_tagging_loss=0.00885, over 3055986.65 frames. ], batch size: 54, lr: 1.41e-03, grad_scale: 8.0 2023-11-27 08:08:45,360 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=3805160.0, ans=0.125 2023-11-27 08:09:06,195 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 570800 2023-11-27 08:09:07,429 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=3805293.3333333335, ans=0.0 2023-11-27 08:09:35,197 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=6.15 vs. limit=10.0 2023-11-27 08:09:35,529 INFO [train_asr.py:1235] (3/4) Epoch 48, batch 5700, loss[loss=0.07029, simple_loss=0.09488, pruned_loss=0.01286, audio_tagging_loss=0.009992, over 15415.00 frames. ], tot_loss[loss=0.06553, simple_loss=0.08954, pruned_loss=0.01196, audio_tagging_loss=0.0088, over 3047363.14 frames. ], batch size: 57, lr: 1.41e-03, grad_scale: 8.0 2023-11-27 08:09:48,939 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3805560.0, ans=0.125 2023-11-27 08:09:55,288 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=3805560.0, ans=0.04949747468305833 2023-11-27 08:09:56,258 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3805626.6666666665, ans=0.125 2023-11-27 08:10:01,764 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.519e+01 8.888e+01 9.534e+01 1.037e+02 1.369e+02, threshold=1.907e+02, percent-clipped=0.0 2023-11-27 08:10:01,855 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 570850 2023-11-27 08:10:04,197 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=3805626.6666666665, ans=0.09899494936611666 2023-11-27 08:10:28,395 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3805760.0, ans=0.125 2023-11-27 08:10:30,868 INFO [train_asr.py:1235] (3/4) Epoch 48, batch 5750, loss[loss=0.04953, simple_loss=0.07342, pruned_loss=0.006542, audio_tagging_loss=0.00628, over 14479.00 frames. ], tot_loss[loss=0.06513, simple_loss=0.08913, pruned_loss=0.0119, audio_tagging_loss=0.008665, over 3051042.45 frames. 
], batch size: 55, lr: 1.41e-03, grad_scale: 8.0 2023-11-27 08:10:32,165 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=3805826.6666666665, ans=0.125 2023-11-27 08:10:34,293 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer_na.min_abs, batch_count=3805826.6666666665, ans=0.02 2023-11-27 08:10:36,486 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=3805826.6666666665, ans=0.2 2023-11-27 08:10:58,282 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 570900 2023-11-27 08:11:10,567 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=13.84 vs. limit=15.0 2023-11-27 08:11:17,059 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=3806093.3333333335, ans=0.125 2023-11-27 08:11:27,064 INFO [train_asr.py:1235] (3/4) Epoch 48, batch 5800, loss[loss=0.06576, simple_loss=0.09834, pruned_loss=0.01047, audio_tagging_loss=0.006113, over 14719.00 frames. ], tot_loss[loss=0.06528, simple_loss=0.0897, pruned_loss=0.01192, audio_tagging_loss=0.008507, over 3042163.35 frames. ], batch size: 55, lr: 1.41e-03, grad_scale: 8.0 2023-11-27 08:11:27,259 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3806160.0, ans=0.125 2023-11-27 08:11:34,629 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3806160.0, ans=0.1 2023-11-27 08:11:44,205 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=3806226.6666666665, ans=0.0 2023-11-27 08:11:53,669 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.966e+01 9.206e+01 9.616e+01 1.021e+02 1.551e+02, threshold=1.923e+02, percent-clipped=0.0 2023-11-27 08:11:53,760 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 570950 2023-11-27 08:12:04,561 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=10.61 vs. limit=15.0 2023-11-27 08:12:11,946 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=3806426.6666666665, ans=0.0 2023-11-27 08:12:12,025 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=3806426.6666666665, ans=0.0 2023-11-27 08:12:16,166 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.min_positive, batch_count=3806426.6666666665, ans=0.05 2023-11-27 08:12:18,900 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=3806426.6666666665, ans=0.2 2023-11-27 08:12:23,501 INFO [train_asr.py:1235] (3/4) Epoch 48, batch 5850, loss[loss=0.05594, simple_loss=0.07425, pruned_loss=0.01032, audio_tagging_loss=0.008488, over 15750.00 frames. ], tot_loss[loss=0.06502, simple_loss=0.08924, pruned_loss=0.01187, audio_tagging_loss=0.008527, over 3038946.40 frames. 
], batch size: 58, lr: 1.41e-03, grad_scale: 8.0 2023-11-27 08:12:26,876 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=3806493.3333333335, ans=0.125 2023-11-27 08:12:35,296 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3806560.0, ans=0.1 2023-11-27 08:12:49,885 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 571000 2023-11-27 08:12:51,116 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=3806626.6666666665, ans=0.125 2023-11-27 08:12:57,624 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3806693.3333333335, ans=0.125 2023-11-27 08:13:18,639 INFO [train_asr.py:1235] (3/4) Epoch 48, batch 5900, loss[loss=0.07674, simple_loss=0.09909, pruned_loss=0.01806, audio_tagging_loss=0.009134, over 14885.00 frames. ], tot_loss[loss=0.06525, simple_loss=0.08969, pruned_loss=0.0119, audio_tagging_loss=0.008504, over 3042245.36 frames. ], batch size: 56, lr: 1.41e-03, grad_scale: 8.0 2023-11-27 08:13:37,204 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=10.68 vs. limit=15.0 2023-11-27 08:13:45,529 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.446e+01 9.203e+01 9.720e+01 1.067e+02 1.821e+02, threshold=1.944e+02, percent-clipped=0.0 2023-11-27 08:13:45,625 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 571050 2023-11-27 08:13:48,364 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=3806960.0, ans=0.04949747468305833 2023-11-27 08:13:49,489 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=3806960.0, ans=0.0 2023-11-27 08:13:50,613 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3806960.0, ans=0.1 2023-11-27 08:13:54,045 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=7.27 vs. limit=10.0 2023-11-27 08:13:54,714 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=3807026.6666666665, ans=0.0 2023-11-27 08:13:55,191 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=17.28 vs. limit=22.5 2023-11-27 08:14:01,338 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=12.00 vs. limit=22.5 2023-11-27 08:14:05,820 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=8.46 vs. limit=15.0 2023-11-27 08:14:14,863 INFO [train_asr.py:1235] (3/4) Epoch 48, batch 5950, loss[loss=0.05104, simple_loss=0.06134, pruned_loss=0.005043, audio_tagging_loss=0.01533, over 13681.00 frames. ], tot_loss[loss=0.06518, simple_loss=0.08969, pruned_loss=0.01189, audio_tagging_loss=0.008454, over 3042533.59 frames. 
], batch size: 55, lr: 1.41e-03, grad_scale: 8.0 2023-11-27 08:14:16,111 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3807160.0, ans=0.125 2023-11-27 08:14:20,336 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=3807160.0, ans=0.125 2023-11-27 08:14:38,260 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3807293.3333333335, ans=0.0 2023-11-27 08:14:41,370 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 571100 2023-11-27 08:14:59,350 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=3807426.6666666665, ans=0.0 2023-11-27 08:15:09,540 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=3807493.3333333335, ans=0.0 2023-11-27 08:15:10,286 INFO [train_asr.py:1235] (3/4) Epoch 48, batch 6000, loss[loss=0.07416, simple_loss=0.1115, pruned_loss=0.01118, audio_tagging_loss=0.007228, over 15589.00 frames. ], tot_loss[loss=0.06514, simple_loss=0.08971, pruned_loss=0.0118, audio_tagging_loss=0.008486, over 3041918.30 frames. ], batch size: 55, lr: 1.41e-03, grad_scale: 16.0 2023-11-27 08:15:10,286 INFO [train_asr.py:1258] (3/4) Computing validation loss 2023-11-27 08:15:42,623 INFO [train_asr.py:1267] (3/4) Epoch 48, validation: loss=0.05815, simple_loss=0.05046, pruned_loss=0.005371, audio_tagging_loss=0.02755, over 4681554.00 frames. 2023-11-27 08:15:42,624 INFO [train_asr.py:1268] (3/4) Maximum memory allocated so far is 24894MB 2023-11-27 08:15:48,091 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3807493.3333333335, ans=0.1 2023-11-27 08:15:48,189 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.min_abs, batch_count=3807493.3333333335, ans=0.5 2023-11-27 08:15:48,204 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=3807493.3333333335, ans=0.0 2023-11-27 08:15:54,772 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=7.78 vs. limit=15.0 2023-11-27 08:15:56,666 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3807560.0, ans=0.1 2023-11-27 08:16:00,161 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=4.22 vs. limit=15.0 2023-11-27 08:16:00,887 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=3807560.0, ans=0.125 2023-11-27 08:16:10,243 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.632e+01 8.889e+01 9.644e+01 1.039e+02 1.494e+02, threshold=1.929e+02, percent-clipped=0.0 2023-11-27 08:16:10,352 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 571150 2023-11-27 08:16:15,026 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=8.72 vs. 
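limit=15.0

Note: batch 6000 above is one of the periodic validation passes: the loop pauses training, averages the same loss components over the fixed dev set (the constant 4681554.00-frame figure), and reports the peak GPU allocation ("Maximum memory allocated so far is 24894MB"). A rough sketch of that step with illustrative names; the interval argument and device string are taken from the surrounding log, not from the actual code:

    import torch

    def maybe_validate(model, valid_dl, batch_idx: int, valid_interval: int,
                       device: str = "cuda:3"):
        if batch_idx % valid_interval != 0:
            return
        model.eval()
        with torch.no_grad():
            for batch in valid_dl:
                ...  # accumulate loss, simple_loss, pruned_loss, audio_tagging_loss
        model.train()
        mb = torch.cuda.max_memory_allocated(device) // (1024 * 1024)
        print(f"Maximum memory allocated so far is {mb}MB")
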
2023-11-27 08:16:16,920 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=3807693.3333333335, ans=0.2 2023-11-27 08:16:22,049 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3807693.3333333335, ans=0.125 2023-11-27 08:16:24,050 WARNING [train_asr.py:1481] (3/4) Exclude cut with ID unbalanced/NoNxFjwXuuc_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-27 08:16:37,298 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=13.25 vs. limit=15.0 2023-11-27 08:16:39,039 INFO [train_asr.py:1235] (3/4) Epoch 48, batch 6050, loss[loss=0.04843, simple_loss=0.06271, pruned_loss=0.007284, audio_tagging_loss=0.009798, over 14279.00 frames. ], tot_loss[loss=0.06523, simple_loss=0.08981, pruned_loss=0.01186, audio_tagging_loss=0.008463, over 3048511.76 frames. ], batch size: 55, lr: 1.41e-03, grad_scale: 16.0 2023-11-27 08:16:43,743 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=7.39 vs. limit=15.0 2023-11-27 08:16:56,282 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3807893.3333333335, ans=0.125 2023-11-27 08:17:05,682 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 571200 2023-11-27 08:17:27,483 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3808093.3333333335, ans=0.125 2023-11-27 08:17:31,512 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=14.53 vs. limit=22.5 2023-11-27 08:17:35,805 INFO [train_asr.py:1235] (3/4) Epoch 48, batch 6100, loss[loss=0.05838, simple_loss=0.0784, pruned_loss=0.01142, audio_tagging_loss=0.007752, over 16107.00 frames. ], tot_loss[loss=0.06511, simple_loss=0.08965, pruned_loss=0.01186, audio_tagging_loss=0.008427, over 3059909.75 frames.
], batch size: 64, lr: 1.41e-03, grad_scale: 16.0 2023-11-27 08:17:37,116 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=3808160.0, ans=0.0 2023-11-27 08:17:42,475 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3808160.0, ans=0.1 2023-11-27 08:17:45,505 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=3808226.6666666665, ans=0.125 2023-11-27 08:17:49,837 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=3808226.6666666665, ans=0.2 2023-11-27 08:17:56,319 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3808293.3333333335, ans=0.1 2023-11-27 08:18:01,819 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.766e+01 9.030e+01 9.632e+01 1.027e+02 1.334e+02, threshold=1.926e+02, percent-clipped=0.0 2023-11-27 08:18:01,925 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 571250 2023-11-27 08:18:18,589 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=3808360.0, ans=0.0 2023-11-27 08:18:20,653 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3808426.6666666665, ans=0.125 2023-11-27 08:18:23,953 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=3808426.6666666665, ans=0.07 2023-11-27 08:18:31,161 INFO [train_asr.py:1235] (3/4) Epoch 48, batch 6150, loss[loss=0.06074, simple_loss=0.08269, pruned_loss=0.01078, audio_tagging_loss=0.008615, over 15359.00 frames. ], tot_loss[loss=0.06521, simple_loss=0.08988, pruned_loss=0.01186, audio_tagging_loss=0.008402, over 3056194.53 frames. ], batch size: 57, lr: 1.41e-03, grad_scale: 16.0 2023-11-27 08:18:31,543 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=14.10 vs. limit=22.5 2023-11-27 08:18:50,453 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=3808560.0, ans=0.125 2023-11-27 08:18:58,673 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 571300 2023-11-27 08:19:06,385 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=3808693.3333333335, ans=0.0 2023-11-27 08:19:11,710 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.min_abs, batch_count=3808693.3333333335, ans=0.5 2023-11-27 08:19:26,893 INFO [train_asr.py:1235] (3/4) Epoch 48, batch 6200, loss[loss=0.06466, simple_loss=0.09195, pruned_loss=0.009048, audio_tagging_loss=0.00964, over 15334.00 frames. ], tot_loss[loss=0.06548, simple_loss=0.08994, pruned_loss=0.01194, audio_tagging_loss=0.008572, over 3053293.83 frames. 
], batch size: 54, lr: 1.40e-03, grad_scale: 16.0 2023-11-27 08:19:29,240 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=3808826.6666666665, ans=0.2 2023-11-27 08:19:35,862 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=3808826.6666666665, ans=0.125 2023-11-27 08:19:41,326 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=3808893.3333333335, ans=0.125 2023-11-27 08:19:53,786 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.167e+01 8.915e+01 9.429e+01 1.009e+02 1.347e+02, threshold=1.886e+02, percent-clipped=0.0 2023-11-27 08:19:53,883 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 571350 2023-11-27 08:20:23,690 INFO [train_asr.py:1235] (3/4) Epoch 48, batch 6250, loss[loss=0.05395, simple_loss=0.07673, pruned_loss=0.005928, audio_tagging_loss=0.00966, over 14398.00 frames. ], tot_loss[loss=0.06583, simple_loss=0.0902, pruned_loss=0.01208, audio_tagging_loss=0.008648, over 3051226.08 frames. ], batch size: 55, lr: 1.40e-03, grad_scale: 16.0 2023-11-27 08:20:49,671 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 571400 2023-11-27 08:21:10,913 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=3809426.6666666665, ans=0.125 2023-11-27 08:21:19,184 INFO [train_asr.py:1235] (3/4) Epoch 48, batch 6300, loss[loss=0.06636, simple_loss=0.08742, pruned_loss=0.01486, audio_tagging_loss=0.00779, over 15469.00 frames. ], tot_loss[loss=0.06622, simple_loss=0.09048, pruned_loss=0.01224, audio_tagging_loss=0.008736, over 3056206.74 frames. ], batch size: 56, lr: 1.40e-03, grad_scale: 16.0 2023-11-27 08:21:20,556 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3809493.3333333335, ans=0.1 2023-11-27 08:21:21,886 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.23 vs. limit=15.0 2023-11-27 08:21:33,798 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3809560.0, ans=0.125 2023-11-27 08:21:35,854 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3809560.0, ans=0.125 2023-11-27 08:21:46,284 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.650e+01 8.820e+01 9.366e+01 1.014e+02 1.355e+02, threshold=1.873e+02, percent-clipped=0.0 2023-11-27 08:21:46,386 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 571450 2023-11-27 08:22:15,186 INFO [train_asr.py:1235] (3/4) Epoch 48, batch 6350, loss[loss=0.08757, simple_loss=0.1226, pruned_loss=0.01808, audio_tagging_loss=0.0082, over 14525.00 frames. ], tot_loss[loss=0.06674, simple_loss=0.09132, pruned_loss=0.01234, audio_tagging_loss=0.008741, over 3050525.94 frames. ], batch size: 55, lr: 1.40e-03, grad_scale: 8.0 2023-11-27 08:22:41,861 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 571500 2023-11-27 08:22:44,549 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=512, metric=16.25 vs. 
limit=22.5 2023-11-27 08:22:52,728 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=3810026.6666666665, ans=0.125 2023-11-27 08:22:55,380 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3810026.6666666665, ans=0.125 2023-11-27 08:22:55,471 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3810026.6666666665, ans=0.125 2023-11-27 08:22:58,560 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3810093.3333333335, ans=0.125 2023-11-27 08:23:10,370 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=13.55 vs. limit=22.5 2023-11-27 08:23:11,259 INFO [train_asr.py:1235] (3/4) Epoch 48, batch 6400, loss[loss=0.06688, simple_loss=0.08784, pruned_loss=0.01224, audio_tagging_loss=0.01072, over 14552.00 frames. ], tot_loss[loss=0.06601, simple_loss=0.08984, pruned_loss=0.01222, audio_tagging_loss=0.00887, over 3044926.59 frames. ], batch size: 55, lr: 1.40e-03, grad_scale: 16.0 2023-11-27 08:23:34,618 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=3810293.3333333335, ans=0.0 2023-11-27 08:23:36,671 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=3810293.3333333335, ans=0.0 2023-11-27 08:23:38,163 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 571550 2023-11-27 08:23:39,071 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.509e+01 8.870e+01 9.357e+01 1.034e+02 1.188e+02, threshold=1.871e+02, percent-clipped=0.0 2023-11-27 08:23:59,052 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3810426.6666666665, ans=0.125 2023-11-27 08:24:07,245 INFO [train_asr.py:1235] (3/4) Epoch 48, batch 6450, loss[loss=0.0561, simple_loss=0.07198, pruned_loss=0.01232, audio_tagging_loss=0.00779, over 14892.00 frames. ], tot_loss[loss=0.06545, simple_loss=0.08912, pruned_loss=0.01201, audio_tagging_loss=0.00888, over 3041281.36 frames. ], batch size: 58, lr: 1.40e-03, grad_scale: 16.0 2023-11-27 08:24:16,164 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=3810493.3333333335, ans=0.04949747468305833 2023-11-27 08:24:23,069 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3810560.0, ans=0.125 2023-11-27 08:24:33,671 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 571600 2023-11-27 08:25:02,678 INFO [train_asr.py:1235] (3/4) Epoch 48, batch 6500, loss[loss=0.06158, simple_loss=0.08369, pruned_loss=0.01161, audio_tagging_loss=0.008131, over 14872.00 frames. ], tot_loss[loss=0.06547, simple_loss=0.08927, pruned_loss=0.01202, audio_tagging_loss=0.008818, over 3044779.73 frames. 
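The scaling.py:213 entries report ScheduledFloat values: regularization hyperparameters (dropout probabilities, skip rates, balancer probs, bypass scale minimums) that are evaluated as functions of batch_count instead of being constants, which is why the same parameter name logs different ans values as training progresses. A simplified re-implementation under the assumption that the schedule is piecewise-linear between (batch_count, value) breakpoints; the real scaling.py class carries extra machinery (defaults, arithmetic operators) omitted here.

    class ScheduledFloat:
        """Piecewise-linear schedule over batch_count (simplified sketch)."""

        def __init__(self, *points):
            # points: (batch_count, value) pairs, e.g. (0.0, 0.3), (20000.0, 0.1)
            self.points = sorted(points)

        def value(self, batch_count: float) -> float:
            pts = self.points
            if batch_count <= pts[0][0]:
                return pts[0][1]
            for (x0, y0), (x1, y1) in zip(pts, pts[1:]):
                if batch_count <= x1:
                    t = (batch_count - x0) / (x1 - x0)
                    return y0 + t * (y1 - y0)
            return pts[-1][1]  # past the last breakpoint, hold the final value

    dropout = ScheduledFloat((0.0, 0.3), (20000.0, 0.1))
    print(dropout.value(3810293.33))  # -> 0.1, far beyond the last breakpoint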
], batch size: 56, lr: 1.40e-03, grad_scale: 16.0 2023-11-27 08:25:02,903 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3810826.6666666665, ans=0.1 2023-11-27 08:25:02,948 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=3810826.6666666665, ans=0.1 2023-11-27 08:25:30,471 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 571650 2023-11-27 08:25:31,463 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.977e+01 8.956e+01 9.682e+01 1.036e+02 1.299e+02, threshold=1.936e+02, percent-clipped=0.0 2023-11-27 08:25:48,040 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=3811093.3333333335, ans=0.0 2023-11-27 08:25:57,597 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=3811160.0, ans=0.0 2023-11-27 08:25:58,445 INFO [train_asr.py:1235] (3/4) Epoch 48, batch 6550, loss[loss=0.0532, simple_loss=0.065, pruned_loss=0.009315, audio_tagging_loss=0.01139, over 14329.00 frames. ], tot_loss[loss=0.06552, simple_loss=0.08952, pruned_loss=0.01204, audio_tagging_loss=0.008717, over 3046898.98 frames. ], batch size: 57, lr: 1.40e-03, grad_scale: 8.0 2023-11-27 08:26:03,014 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=3811160.0, ans=0.0 2023-11-27 08:26:19,371 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.min_abs, batch_count=3811226.6666666665, ans=0.5 2023-11-27 08:26:21,566 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3811293.3333333335, ans=0.125 2023-11-27 08:26:25,722 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 571700 2023-11-27 08:26:28,908 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=512, metric=17.62 vs. limit=22.5 2023-11-27 08:26:37,142 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=3811360.0, ans=0.0 2023-11-27 08:26:39,184 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3811360.0, ans=0.125 2023-11-27 08:26:44,001 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=3811426.6666666665, ans=0.125 2023-11-27 08:26:55,411 INFO [train_asr.py:1235] (3/4) Epoch 48, batch 6600, loss[loss=0.04766, simple_loss=0.06059, pruned_loss=0.007276, audio_tagging_loss=0.01009, over 14620.00 frames. ], tot_loss[loss=0.06503, simple_loss=0.08935, pruned_loss=0.01181, audio_tagging_loss=0.008544, over 3043511.39 frames. 
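The scaling.py:1022 "Whitening" entries compare a measured metric against a whitening limit (the limit itself can be scheduled, as the out_whiten.whitening_limit entries further down show); the metric gauges how far a module's output covariance is from isotropic, and a corrective gradient is only applied once it exceeds the limit. One standard way to express such a metric, stated here as an assumption about what is being measured rather than a copy of scaling.py: with eigenvalues lambda_i of the d-channel feature covariance, metric = d * sum(lambda_i^2) / (sum(lambda_i))^2, which is 1.0 for perfectly white features and grows as variance concentrates in a few directions.

    import torch

    def whitening_metric(feats: torch.Tensor) -> float:
        """feats: (num_frames, num_channels); returns 1.0 when perfectly white."""
        x = feats - feats.mean(dim=0, keepdim=True)
        cov = (x.t() @ x) / x.shape[0]          # channel covariance
        eigs = torch.linalg.eigvalsh(cov)       # real eigenvalues, ascending
        d = eigs.numel()
        return (d * (eigs ** 2).sum() / eigs.sum() ** 2).item()

    # A penalty would only activate when whitening_metric(...) > limit (e.g. 15.0).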
], batch size: 57, lr: 1.40e-03, grad_scale: 8.0 2023-11-27 08:27:21,301 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 571750 2023-11-27 08:27:23,322 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 8.192e+01 9.078e+01 9.642e+01 1.016e+02 1.265e+02, threshold=1.928e+02, percent-clipped=0.0 2023-11-27 08:27:34,760 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3811693.3333333335, ans=0.1 2023-11-27 08:27:41,091 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.min_positive, batch_count=3811760.0, ans=0.025 2023-11-27 08:27:43,247 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=3811760.0, ans=0.125 2023-11-27 08:27:50,314 INFO [train_asr.py:1235] (3/4) Epoch 48, batch 6650, loss[loss=0.06892, simple_loss=0.09922, pruned_loss=0.01032, audio_tagging_loss=0.008989, over 15487.00 frames. ], tot_loss[loss=0.0646, simple_loss=0.08859, pruned_loss=0.01167, audio_tagging_loss=0.008632, over 3045792.70 frames. ], batch size: 56, lr: 1.40e-03, grad_scale: 8.0 2023-11-27 08:27:52,479 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=3811826.6666666665, ans=0.0 2023-11-27 08:28:18,129 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 571800 2023-11-27 08:28:18,254 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3811960.0, ans=0.125 2023-11-27 08:28:26,035 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=3812026.6666666665, ans=0.0 2023-11-27 08:28:30,222 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3812026.6666666665, ans=0.125 2023-11-27 08:28:44,429 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3812093.3333333335, ans=0.1 2023-11-27 08:28:46,280 INFO [train_asr.py:1235] (3/4) Epoch 48, batch 6700, loss[loss=0.0639, simple_loss=0.09204, pruned_loss=0.01195, audio_tagging_loss=0.005929, over 15342.00 frames. ], tot_loss[loss=0.06477, simple_loss=0.08894, pruned_loss=0.0118, audio_tagging_loss=0.008506, over 3047552.94 frames. ], batch size: 61, lr: 1.40e-03, grad_scale: 8.0 2023-11-27 08:29:02,815 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=3812226.6666666665, ans=0.125 2023-11-27 08:29:12,763 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.whiten, num_groups=1, num_channels=512, metric=8.45 vs. 
limit=12.0 2023-11-27 08:29:13,401 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 571850 2023-11-27 08:29:15,462 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.479e+01 9.098e+01 9.634e+01 1.039e+02 1.370e+02, threshold=1.927e+02, percent-clipped=0.0 2023-11-27 08:29:15,774 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=3812293.3333333335, ans=0.0 2023-11-27 08:29:17,874 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3812293.3333333335, ans=0.125 2023-11-27 08:29:26,121 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=19.31 vs. limit=22.5 2023-11-27 08:29:27,956 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3812360.0, ans=0.125 2023-11-27 08:29:42,725 INFO [train_asr.py:1235] (3/4) Epoch 48, batch 6750, loss[loss=0.06235, simple_loss=0.09307, pruned_loss=0.007318, audio_tagging_loss=0.008498, over 14598.00 frames. ], tot_loss[loss=0.06493, simple_loss=0.0892, pruned_loss=0.01183, audio_tagging_loss=0.008508, over 3041044.69 frames. ], batch size: 54, lr: 1.40e-03, grad_scale: 8.0 2023-11-27 08:30:09,258 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 571900 2023-11-27 08:30:19,894 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=3812693.3333333335, ans=0.125 2023-11-27 08:30:32,419 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=16.79 vs. limit=22.5 2023-11-27 08:30:38,297 INFO [train_asr.py:1235] (3/4) Epoch 48, batch 6800, loss[loss=0.06293, simple_loss=0.0926, pruned_loss=0.009416, audio_tagging_loss=0.007216, over 15337.00 frames. ], tot_loss[loss=0.06482, simple_loss=0.08913, pruned_loss=0.01183, audio_tagging_loss=0.008425, over 3037513.70 frames. ], batch size: 60, lr: 1.40e-03, grad_scale: 16.0 2023-11-27 08:30:52,784 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.max_abs, batch_count=3812893.3333333335, ans=10.0 2023-11-27 08:30:54,913 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=3812893.3333333335, ans=0.125 2023-11-27 08:31:02,293 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3812960.0, ans=0.125 2023-11-27 08:31:05,389 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 571950 2023-11-27 08:31:07,402 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.719e+01 9.218e+01 9.743e+01 1.054e+02 1.281e+02, threshold=1.949e+02, percent-clipped=0.0 2023-11-27 08:31:11,293 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.1.conv_module1.whiten, num_groups=1, num_channels=192, metric=12.34 vs. limit=15.0 2023-11-27 08:31:17,730 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3813026.6666666665, ans=0.1 2023-11-27 08:31:33,855 INFO [train_asr.py:1235] (3/4) Epoch 48, batch 6850, loss[loss=0.06152, simple_loss=0.08033, pruned_loss=0.01135, audio_tagging_loss=0.01001, over 14551.00 frames. 
], tot_loss[loss=0.06482, simple_loss=0.08929, pruned_loss=0.01173, audio_tagging_loss=0.008448, over 3047024.45 frames. ], batch size: 58, lr: 1.40e-03, grad_scale: 16.0 2023-11-27 08:31:35,589 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=13.72 vs. limit=22.5 2023-11-27 08:31:44,614 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3813226.6666666665, ans=0.1 2023-11-27 08:31:54,659 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=9.50 vs. limit=15.0 2023-11-27 08:32:01,243 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 572000 2023-11-27 08:32:15,691 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=3813360.0, ans=0.125 2023-11-27 08:32:32,569 INFO [train_asr.py:1235] (3/4) Epoch 48, batch 6900, loss[loss=0.06013, simple_loss=0.07936, pruned_loss=0.01129, audio_tagging_loss=0.009157, over 14748.00 frames. ], tot_loss[loss=0.0647, simple_loss=0.08921, pruned_loss=0.01165, audio_tagging_loss=0.008449, over 3048675.37 frames. ], batch size: 58, lr: 1.40e-03, grad_scale: 16.0 2023-11-27 08:32:46,123 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=11.90 vs. limit=15.0 2023-11-27 08:32:52,240 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=3813560.0, ans=0.0 2023-11-27 08:32:58,977 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 572050 2023-11-27 08:33:01,023 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.610e+01 8.797e+01 9.367e+01 1.009e+02 1.933e+02, threshold=1.873e+02, percent-clipped=0.0 2023-11-27 08:33:03,298 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=3813626.6666666665, ans=0.07 2023-11-27 08:33:05,482 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=3813693.3333333335, ans=0.125 2023-11-27 08:33:08,153 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=3813693.3333333335, ans=0.125 2023-11-27 08:33:10,732 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=3813693.3333333335, ans=0.2 2023-11-27 08:33:14,992 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3813693.3333333335, ans=0.125 2023-11-27 08:33:15,883 WARNING [train_asr.py:1481] (3/4) Exclude cut with ID unbalanced/Xez1ffAcb0w_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. 
Number of tokens: 24 2023-11-27 08:33:17,070 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3813760.0, ans=0.1 2023-11-27 08:33:26,039 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3813760.0, ans=0.125 2023-11-27 08:33:28,056 INFO [train_asr.py:1235] (3/4) Epoch 48, batch 6950, loss[loss=0.06864, simple_loss=0.09096, pruned_loss=0.01516, audio_tagging_loss=0.008002, over 15034.00 frames. ], tot_loss[loss=0.0648, simple_loss=0.0893, pruned_loss=0.01168, audio_tagging_loss=0.008465, over 3047233.11 frames. ], batch size: 55, lr: 1.40e-03, grad_scale: 16.0 2023-11-27 08:33:55,164 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 572100 2023-11-27 08:33:57,354 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-27 08:34:13,827 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3814093.3333333335, ans=0.1 2023-11-27 08:34:14,788 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3814093.3333333335, ans=0.125 2023-11-27 08:34:15,879 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3814093.3333333335, ans=0.125 2023-11-27 08:34:20,839 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=5.04 vs. limit=15.0 2023-11-27 08:34:23,584 INFO [train_asr.py:1235] (3/4) Epoch 48, batch 7000, loss[loss=0.06002, simple_loss=0.08924, pruned_loss=0.007996, audio_tagging_loss=0.007404, over 15117.00 frames. ], tot_loss[loss=0.06475, simple_loss=0.08916, pruned_loss=0.01167, audio_tagging_loss=0.008493, over 3040910.95 frames. ], batch size: 54, lr: 1.40e-03, grad_scale: 16.0 2023-11-27 08:34:24,946 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=3814160.0, ans=0.125 2023-11-27 08:34:42,965 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3814226.6666666665, ans=0.125 2023-11-27 08:34:50,161 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 572150 2023-11-27 08:34:52,165 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.553e+01 9.213e+01 9.596e+01 1.029e+02 1.427e+02, threshold=1.919e+02, percent-clipped=0.0 2023-11-27 08:35:13,713 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3814426.6666666665, ans=0.125 2023-11-27 08:35:18,438 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=3814493.3333333335, ans=0.125 2023-11-27 08:35:19,303 INFO [train_asr.py:1235] (3/4) Epoch 48, batch 7050, loss[loss=0.05801, simple_loss=0.07915, pruned_loss=0.01118, audio_tagging_loss=0.007246, over 15384.00 frames. ], tot_loss[loss=0.06518, simple_loss=0.0895, pruned_loss=0.01185, audio_tagging_loss=0.008581, over 3042898.40 frames. 
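The WARNING entries from train_asr.py:1481 (such as the one for unbalanced/Xez1ffAcb0w just above) drop cuts whose encoder output would be shorter than their token sequence: 100 input frames subsample to 23, which cannot be aligned against 24 BPE tokens by the pruned transducer loss. These appear to be audio-tagging cuts carrying a dummy placeholder transcript. A hedged sketch of such a filter; the subsampled-length formula (T - 8) // 4, consistent with the logged 100 -> 23 at subsampling factor 4, and the function name are assumptions rather than the actual code.

    def keep_cut(num_frames: int, num_tokens: int,
                 subsampling_factor: int = 4) -> bool:
        """Mirror the exclusion warning above (illustrative only)."""
        frames_after = (num_frames - 8) // subsampling_factor  # 100 -> 23
        # 23 frames cannot cover 24 tokens, so the cut is excluded.
        return frames_after >= num_tokens

    assert keep_cut(100, 24) is False  # matches the excluded placeholder cuts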
], batch size: 60, lr: 1.40e-03, grad_scale: 16.0 2023-11-27 08:35:45,968 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 572200 2023-11-27 08:35:57,420 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3814693.3333333335, ans=0.1 2023-11-27 08:36:02,084 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=3814693.3333333335, ans=0.2 2023-11-27 08:36:14,651 INFO [train_asr.py:1235] (3/4) Epoch 48, batch 7100, loss[loss=0.09224, simple_loss=0.1294, pruned_loss=0.02172, audio_tagging_loss=0.005818, over 13999.00 frames. ], tot_loss[loss=0.06532, simple_loss=0.08935, pruned_loss=0.01187, audio_tagging_loss=0.008772, over 3049863.07 frames. ], batch size: 54, lr: 1.40e-03, grad_scale: 16.0 2023-11-27 08:36:14,899 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=3814826.6666666665, ans=0.0 2023-11-27 08:36:21,795 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3814826.6666666665, ans=0.1 2023-11-27 08:36:21,940 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-11-27 08:36:28,769 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3814893.3333333335, ans=0.125 2023-11-27 08:36:40,115 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3814960.0, ans=0.125 2023-11-27 08:36:42,600 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 572250 2023-11-27 08:36:44,590 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 8.212e+01 9.022e+01 9.654e+01 1.030e+02 1.274e+02, threshold=1.931e+02, percent-clipped=0.0 2023-11-27 08:36:55,406 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3815026.6666666665, ans=0.0 2023-11-27 08:37:11,091 INFO [train_asr.py:1235] (3/4) Epoch 48, batch 7150, loss[loss=0.05076, simple_loss=0.06283, pruned_loss=0.008543, audio_tagging_loss=0.0108, over 15195.00 frames. ], tot_loss[loss=0.0655, simple_loss=0.08985, pruned_loss=0.01184, audio_tagging_loss=0.008732, over 3064217.18 frames. ], batch size: 60, lr: 1.40e-03, grad_scale: 16.0 2023-11-27 08:37:21,020 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=3815226.6666666665, ans=0.125 2023-11-27 08:37:37,138 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer_ff3.min_abs, batch_count=3815293.3333333335, ans=0.2 2023-11-27 08:37:38,035 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 572300 2023-11-27 08:37:57,803 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.out_whiten.whitening_limit, batch_count=3815426.6666666665, ans=15.0 2023-11-27 08:38:01,681 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=3815426.6666666665, ans=0.125 2023-11-27 08:38:07,754 INFO [train_asr.py:1235] (3/4) Epoch 48, batch 7200, loss[loss=0.05029, simple_loss=0.06691, pruned_loss=0.007477, audio_tagging_loss=0.009363, over 15249.00 frames. 
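The scaling.py:1118 "WithLoss" entries attach an auxiliary loss to a named tensor (various self_attn_weights here) and periodically log its accumulated value; loss-sum=0.000e+00 means the penalty contributed nothing over the interval. The usual mechanism for this pattern is an autograd function that acts as the identity on the activation in forward but feeds a gradient of one into the attached loss in backward, so the loss joins the training objective without having to be threaded back to the caller. A simplified sketch under that assumption, not the actual scaling.py class:

    import torch

    class WithLoss(torch.autograd.Function):
        @staticmethod
        def forward(ctx, x: torch.Tensor, loss: torch.Tensor):
            ctx.loss_shape = loss.shape
            return x  # identity on the main activation

        @staticmethod
        def backward(ctx, x_grad: torch.Tensor):
            # Gradient of 1.0 w.r.t. the attached loss adds it to the objective.
            return x_grad, torch.ones(ctx.loss_shape, dtype=x_grad.dtype,
                                      device=x_grad.device)

    # usage: attn_weights = WithLoss.apply(attn_weights, penalty(attn_weights))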
], tot_loss[loss=0.06542, simple_loss=0.08936, pruned_loss=0.01182, audio_tagging_loss=0.008918, over 3054194.48 frames. ], batch size: 58, lr: 1.40e-03, grad_scale: 32.0 2023-11-27 08:38:09,107 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=3815493.3333333335, ans=0.0 2023-11-27 08:38:33,737 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 572350 2023-11-27 08:38:35,750 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 8.066e+01 9.027e+01 9.481e+01 1.011e+02 1.295e+02, threshold=1.896e+02, percent-clipped=0.0 2023-11-27 08:39:02,503 INFO [train_asr.py:1235] (3/4) Epoch 48, batch 7250, loss[loss=0.06284, simple_loss=0.082, pruned_loss=0.01354, audio_tagging_loss=0.0083, over 15316.00 frames. ], tot_loss[loss=0.06559, simple_loss=0.08969, pruned_loss=0.01185, audio_tagging_loss=0.008895, over 3053143.13 frames. ], batch size: 57, lr: 1.40e-03, grad_scale: 32.0 2023-11-27 08:39:08,051 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3815826.6666666665, ans=0.125 2023-11-27 08:39:09,049 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3815826.6666666665, ans=0.1 2023-11-27 08:39:14,995 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3815893.3333333335, ans=0.125 2023-11-27 08:39:23,923 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=3815960.0, ans=0.0 2023-11-27 08:39:29,495 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 572400 2023-11-27 08:39:30,769 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=3815960.0, ans=0.125 2023-11-27 08:39:36,080 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=12.66 vs. limit=15.0 2023-11-27 08:39:40,960 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3816026.6666666665, ans=0.125 2023-11-27 08:39:58,249 INFO [train_asr.py:1235] (3/4) Epoch 48, batch 7300, loss[loss=0.06025, simple_loss=0.07939, pruned_loss=0.01248, audio_tagging_loss=0.008071, over 15076.00 frames. ], tot_loss[loss=0.06545, simple_loss=0.08942, pruned_loss=0.01193, audio_tagging_loss=0.008811, over 3044369.39 frames. 
], batch size: 58, lr: 1.40e-03, grad_scale: 32.0 2023-11-27 08:40:05,365 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=3816160.0, ans=0.125 2023-11-27 08:40:25,390 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 572450 2023-11-27 08:40:27,407 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.653e+01 9.262e+01 9.740e+01 1.057e+02 1.335e+02, threshold=1.948e+02, percent-clipped=0.0 2023-11-27 08:40:28,646 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3816293.3333333335, ans=0.125 2023-11-27 08:40:37,224 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=3816360.0, ans=0.125 2023-11-27 08:40:48,602 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=3816426.6666666665, ans=0.125 2023-11-27 08:40:54,393 INFO [train_asr.py:1235] (3/4) Epoch 48, batch 7350, loss[loss=0.0689, simple_loss=0.09583, pruned_loss=0.01258, audio_tagging_loss=0.008411, over 15402.00 frames. ], tot_loss[loss=0.06619, simple_loss=0.09068, pruned_loss=0.01216, audio_tagging_loss=0.008682, over 3049655.40 frames. ], batch size: 57, lr: 1.40e-03, grad_scale: 32.0 2023-11-27 08:41:14,526 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=4.61 vs. limit=12.0 2023-11-27 08:41:15,182 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=3816626.6666666665, ans=0.125 2023-11-27 08:41:17,414 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=3816626.6666666665, ans=0.04949747468305833 2023-11-27 08:41:20,313 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 572500 2023-11-27 08:41:40,751 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=8.84 vs. limit=15.0 2023-11-27 08:41:49,718 INFO [train_asr.py:1235] (3/4) Epoch 48, batch 7400, loss[loss=0.07116, simple_loss=0.1053, pruned_loss=0.01013, audio_tagging_loss=0.0084, over 14525.00 frames. ], tot_loss[loss=0.06622, simple_loss=0.09083, pruned_loss=0.01225, audio_tagging_loss=0.008554, over 3051463.51 frames. ], batch size: 54, lr: 1.40e-03, grad_scale: 16.0 2023-11-27 08:42:05,185 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=3816893.3333333335, ans=0.2 2023-11-27 08:42:16,187 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 572550 2023-11-27 08:42:16,318 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=3816960.0, ans=0.0 2023-11-27 08:42:19,727 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.911e+01 9.063e+01 9.701e+01 1.022e+02 1.505e+02, threshold=1.940e+02, percent-clipped=0.0 2023-11-27 08:42:23,054 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=10.58 vs. 
limit=22.5 2023-11-27 08:42:33,242 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3817093.3333333335, ans=0.125 2023-11-27 08:42:44,744 INFO [train_asr.py:1235] (3/4) Epoch 48, batch 7450, loss[loss=0.07738, simple_loss=0.1027, pruned_loss=0.01932, audio_tagging_loss=0.006722, over 14838.00 frames. ], tot_loss[loss=0.06525, simple_loss=0.0894, pruned_loss=0.01194, audio_tagging_loss=0.008611, over 3053654.06 frames. ], batch size: 53, lr: 1.40e-03, grad_scale: 16.0 2023-11-27 08:42:48,625 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3817160.0, ans=0.1 2023-11-27 08:42:49,647 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=3817160.0, ans=0.015 2023-11-27 08:42:49,648 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=3817160.0, ans=0.125 2023-11-27 08:43:04,173 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3817226.6666666665, ans=0.125 2023-11-27 08:43:05,130 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-27 08:43:12,337 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 572600 2023-11-27 08:43:19,071 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=3817360.0, ans=0.125 2023-11-27 08:43:29,263 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=3817426.6666666665, ans=0.2 2023-11-27 08:43:32,473 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3817426.6666666665, ans=0.125 2023-11-27 08:43:35,676 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=3817426.6666666665, ans=0.0 2023-11-27 08:43:41,245 INFO [train_asr.py:1235] (3/4) Epoch 48, batch 7500, loss[loss=0.06436, simple_loss=0.08776, pruned_loss=0.01261, audio_tagging_loss=0.007872, over 14928.00 frames. ], tot_loss[loss=0.06447, simple_loss=0.08838, pruned_loss=0.01166, audio_tagging_loss=0.008625, over 3055095.47 frames. 
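The grad_scale value in the loss lines moves among 8.0, 16.0 and 32.0 across this stretch of the log. That is the dynamic fp16 loss-scaling factor: it is halved after a step whose gradients overflow and grown again after a run of finite steps, matching the standard behavior of torch.cuda.amp.GradScaler. A minimal sketch of the training-step pattern that produces these values (model and optimizer are assumed placeholders):

    import torch

    scaler = torch.cuda.amp.GradScaler(init_scale=32.0, growth_interval=2000)

    def train_step(model, optimizer, batch) -> float:
        optimizer.zero_grad()
        with torch.cuda.amp.autocast():
            loss = model(batch)        # assumed to return the total loss
        scaler.scale(loss).backward()  # backprop through the scaled loss
        scaler.step(optimizer)         # skips the update on inf/NaN grads
        scaler.update()                # halves on overflow, grows when clean
        return scaler.get_scale()      # the grad_scale reported in this log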
], batch size: 59, lr: 1.40e-03, grad_scale: 16.0 2023-11-27 08:43:41,416 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=3817493.3333333335, ans=0.2 2023-11-27 08:43:56,403 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3817560.0, ans=0.1 2023-11-27 08:43:56,444 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3817560.0, ans=0.1 2023-11-27 08:44:07,904 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 572650 2023-11-27 08:44:11,556 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.822e+01 8.822e+01 9.501e+01 1.047e+02 1.367e+02, threshold=1.900e+02, percent-clipped=0.0 2023-11-27 08:44:16,068 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3817693.3333333335, ans=0.125 2023-11-27 08:44:26,617 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3817760.0, ans=0.125 2023-11-27 08:44:26,716 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=3817760.0, ans=0.2 2023-11-27 08:44:37,568 INFO [train_asr.py:1235] (3/4) Epoch 48, batch 7550, loss[loss=0.06283, simple_loss=0.08787, pruned_loss=0.01246, audio_tagging_loss=0.006439, over 15345.00 frames. ], tot_loss[loss=0.06409, simple_loss=0.08782, pruned_loss=0.01163, audio_tagging_loss=0.008547, over 3055374.31 frames. ], batch size: 59, lr: 1.40e-03, grad_scale: 16.0 2023-11-27 08:44:47,402 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=13.85 vs. limit=22.5 2023-11-27 08:44:59,272 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=3817960.0, ans=0.09899494936611666 2023-11-27 08:45:03,385 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 572700 2023-11-27 08:45:12,603 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.min_positive, batch_count=3818026.6666666665, ans=0.05 2023-11-27 08:45:31,772 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3818160.0, ans=0.1 2023-11-27 08:45:32,592 INFO [train_asr.py:1235] (3/4) Epoch 48, batch 7600, loss[loss=0.05482, simple_loss=0.07721, pruned_loss=0.009706, audio_tagging_loss=0.006504, over 14884.00 frames. ], tot_loss[loss=0.0637, simple_loss=0.08735, pruned_loss=0.01153, audio_tagging_loss=0.008502, over 3058194.39 frames. ], batch size: 57, lr: 1.40e-03, grad_scale: 32.0 2023-11-27 08:45:33,160 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=7.49 vs. limit=10.0 2023-11-27 08:45:35,620 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3818160.0, ans=0.125 2023-11-27 08:45:43,237 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.46 vs. 
limit=6.0 2023-11-27 08:45:56,187 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=12.68 vs. limit=15.0 2023-11-27 08:45:59,827 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 572750 2023-11-27 08:46:02,063 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3818293.3333333335, ans=0.125 2023-11-27 08:46:02,836 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.589e+01 8.740e+01 9.501e+01 1.030e+02 1.304e+02, threshold=1.900e+02, percent-clipped=0.0 2023-11-27 08:46:15,995 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3818426.6666666665, ans=0.125 2023-11-27 08:46:22,799 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3818426.6666666665, ans=0.125 2023-11-27 08:46:26,157 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=3818426.6666666665, ans=0.09899494936611666 2023-11-27 08:46:27,240 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3818493.3333333335, ans=0.1 2023-11-27 08:46:28,095 INFO [train_asr.py:1235] (3/4) Epoch 48, batch 7650, loss[loss=0.05947, simple_loss=0.0817, pruned_loss=0.009402, audio_tagging_loss=0.009223, over 14842.00 frames. ], tot_loss[loss=0.06379, simple_loss=0.0872, pruned_loss=0.0116, audio_tagging_loss=0.008596, over 3048885.03 frames. ], batch size: 56, lr: 1.40e-03, grad_scale: 32.0 2023-11-27 08:46:34,709 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3818493.3333333335, ans=0.125 2023-11-27 08:46:36,395 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=3818493.3333333335, ans=0.07 2023-11-27 08:46:55,249 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 572800 2023-11-27 08:47:12,539 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.0.self_attn1.whiten, num_groups=1, num_channels=192, metric=12.50 vs. limit=22.5 2023-11-27 08:47:24,434 INFO [train_asr.py:1235] (3/4) Epoch 48, batch 7700, loss[loss=0.07417, simple_loss=0.1047, pruned_loss=0.01335, audio_tagging_loss=0.008489, over 17512.00 frames. ], tot_loss[loss=0.06363, simple_loss=0.08709, pruned_loss=0.01152, audio_tagging_loss=0.008574, over 3054317.58 frames. ], batch size: 64, lr: 1.40e-03, grad_scale: 32.0 2023-11-27 08:47:35,255 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-27 08:47:50,559 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 572850 2023-11-27 08:47:53,660 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.196e+01 9.058e+01 9.794e+01 1.057e+02 1.473e+02, threshold=1.959e+02, percent-clipped=0.0 2023-11-27 08:47:58,673 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=3819026.6666666665, ans=0.125 2023-11-27 08:48:19,716 INFO [train_asr.py:1235] (3/4) Epoch 48, batch 7750, loss[loss=0.05198, simple_loss=0.07181, pruned_loss=0.007433, audio_tagging_loss=0.008644, over 15226.00 frames. 
], tot_loss[loss=0.06418, simple_loss=0.08784, pruned_loss=0.01163, audio_tagging_loss=0.00863, over 3057049.33 frames. ], batch size: 56, lr: 1.40e-03, grad_scale: 32.0 2023-11-27 08:48:32,441 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module2.whiten, num_groups=1, num_channels=512, metric=12.24 vs. limit=15.0 2023-11-27 08:48:40,082 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3819226.6666666665, ans=0.125 2023-11-27 08:48:47,313 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 572900 2023-11-27 08:48:48,724 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=5.08 vs. limit=15.0 2023-11-27 08:48:50,601 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00 2023-11-27 08:48:53,942 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=6.27 vs. limit=12.0 2023-11-27 08:48:56,933 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=3819360.0, ans=0.0 2023-11-27 08:49:01,164 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=3819360.0, ans=0.09899494936611666 2023-11-27 08:49:04,276 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3819426.6666666665, ans=0.125 2023-11-27 08:49:08,072 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=3819426.6666666665, ans=0.2 2023-11-27 08:49:10,136 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=3819426.6666666665, ans=0.0 2023-11-27 08:49:15,365 INFO [train_asr.py:1235] (3/4) Epoch 48, batch 7800, loss[loss=0.06963, simple_loss=0.0928, pruned_loss=0.0146, audio_tagging_loss=0.008624, over 15492.00 frames. ], tot_loss[loss=0.0643, simple_loss=0.08783, pruned_loss=0.01172, audio_tagging_loss=0.008663, over 3050873.15 frames. ], batch size: 57, lr: 1.40e-03, grad_scale: 32.0 2023-11-27 08:49:42,510 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 572950 2023-11-27 08:49:45,620 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.310e+01 9.181e+01 9.727e+01 1.046e+02 1.272e+02, threshold=1.945e+02, percent-clipped=0.0 2023-11-27 08:49:53,376 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=12.80 vs. limit=15.0 2023-11-27 08:49:59,129 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-27 08:50:01,370 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3819760.0, ans=0.125 2023-11-27 08:50:08,993 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.02 vs. 
limit=6.0 2023-11-27 08:50:10,930 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=3819826.6666666665, ans=0.125 2023-11-27 08:50:11,737 INFO [train_asr.py:1235] (3/4) Epoch 48, batch 7850, loss[loss=0.04638, simple_loss=0.06587, pruned_loss=0.005126, audio_tagging_loss=0.008322, over 13492.00 frames. ], tot_loss[loss=0.06429, simple_loss=0.08782, pruned_loss=0.01174, audio_tagging_loss=0.008646, over 3050585.69 frames. ], batch size: 53, lr: 1.40e-03, grad_scale: 32.0 2023-11-27 08:50:21,226 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=7.87 vs. limit=10.0 2023-11-27 08:50:22,405 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=11.99 vs. limit=15.0 2023-11-27 08:50:38,037 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 573000 2023-11-27 08:51:04,261 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=3820093.3333333335, ans=0.0 2023-11-27 08:51:07,216 INFO [train_asr.py:1235] (3/4) Epoch 48, batch 7900, loss[loss=0.08606, simple_loss=0.1221, pruned_loss=0.01724, audio_tagging_loss=0.007779, over 15524.00 frames. ], tot_loss[loss=0.06478, simple_loss=0.08866, pruned_loss=0.01178, audio_tagging_loss=0.008667, over 3051967.11 frames. ], batch size: 57, lr: 1.40e-03, grad_scale: 32.0 2023-11-27 08:51:31,672 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.min_positive, batch_count=3820293.3333333335, ans=0.05 2023-11-27 08:51:34,173 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 573050 2023-11-27 08:51:37,303 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.827e+01 9.140e+01 9.856e+01 1.052e+02 1.450e+02, threshold=1.971e+02, percent-clipped=0.0 2023-11-27 08:51:37,574 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=3820293.3333333335, ans=0.0 2023-11-27 08:51:45,482 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=3820360.0, ans=0.0 2023-11-27 08:51:46,671 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=3820360.0, ans=0.125 2023-11-27 08:51:47,930 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=4.01 vs. limit=6.0 2023-11-27 08:51:51,408 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=9.44 vs. limit=15.0 2023-11-27 08:51:58,749 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.min_positive, batch_count=3820426.6666666665, ans=0.05 2023-11-27 08:52:00,434 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=10.29 vs. limit=15.0 2023-11-27 08:52:02,773 INFO [train_asr.py:1235] (3/4) Epoch 48, batch 7950, loss[loss=0.08282, simple_loss=0.1141, pruned_loss=0.01814, audio_tagging_loss=0.007653, over 16095.00 frames. 
], tot_loss[loss=0.06474, simple_loss=0.08853, pruned_loss=0.01173, audio_tagging_loss=0.008744, over 3049261.03 frames. ], batch size: 59, lr: 1.40e-03, grad_scale: 32.0 2023-11-27 08:52:04,018 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3820493.3333333335, ans=0.125 2023-11-27 08:52:17,619 WARNING [train_asr.py:1481] (3/4) Exclude cut with ID unbalanced/uQjH4tNUZ_g_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-27 08:52:23,101 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=3820560.0, ans=0.125 2023-11-27 08:52:25,725 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=3820626.6666666665, ans=0.125 2023-11-27 08:52:29,795 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 573100 2023-11-27 08:52:36,431 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3820693.3333333335, ans=0.125 2023-11-27 08:52:38,069 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3820693.3333333335, ans=0.1 2023-11-27 08:52:43,572 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=8.37 vs. limit=10.0 2023-11-27 08:52:44,294 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=3820693.3333333335, ans=0.0 2023-11-27 08:52:59,102 INFO [train_asr.py:1235] (3/4) Epoch 48, batch 8000, loss[loss=0.05043, simple_loss=0.06473, pruned_loss=0.009278, audio_tagging_loss=0.008794, over 15740.00 frames. ], tot_loss[loss=0.06445, simple_loss=0.08794, pruned_loss=0.01168, audio_tagging_loss=0.008806, over 3048331.97 frames. ], batch size: 62, lr: 1.40e-03, grad_scale: 32.0 2023-11-27 08:52:59,351 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00 2023-11-27 08:52:59,417 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-27 08:53:03,675 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=3820826.6666666665, ans=0.125 2023-11-27 08:53:25,489 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 573150 2023-11-27 08:53:28,617 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.317e+01 8.923e+01 9.617e+01 1.018e+02 1.242e+02, threshold=1.923e+02, percent-clipped=0.0 2023-11-27 08:53:54,558 INFO [train_asr.py:1235] (3/4) Epoch 48, batch 8050, loss[loss=0.07686, simple_loss=0.1044, pruned_loss=0.01655, audio_tagging_loss=0.00809, over 13867.00 frames. ], tot_loss[loss=0.06446, simple_loss=0.08786, pruned_loss=0.01171, audio_tagging_loss=0.00882, over 3041372.07 frames. 
], batch size: 54, lr: 1.40e-03, grad_scale: 32.0 2023-11-27 08:53:57,870 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3821160.0, ans=0.125 2023-11-27 08:54:01,192 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3821160.0, ans=0.1 2023-11-27 08:54:21,111 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 573200 2023-11-27 08:54:22,850 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3821293.3333333335, ans=0.125 2023-11-27 08:54:50,349 INFO [train_asr.py:1235] (3/4) Epoch 48, batch 8100, loss[loss=0.07547, simple_loss=0.1061, pruned_loss=0.01436, audio_tagging_loss=0.008074, over 14074.00 frames. ], tot_loss[loss=0.06477, simple_loss=0.0885, pruned_loss=0.01183, audio_tagging_loss=0.008694, over 3043229.69 frames. ], batch size: 53, lr: 1.40e-03, grad_scale: 16.0 2023-11-27 08:55:17,023 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 573250 2023-11-27 08:55:18,361 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.out_whiten.whitening_limit, batch_count=3821626.6666666665, ans=15.0 2023-11-27 08:55:21,704 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.859e+01 8.982e+01 9.731e+01 1.040e+02 1.240e+02, threshold=1.946e+02, percent-clipped=0.0 2023-11-27 08:55:29,840 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=3821693.3333333335, ans=0.2 2023-11-27 08:55:30,175 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=3.68 vs. limit=15.0 2023-11-27 08:55:43,165 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=3821760.0, ans=0.0 2023-11-27 08:55:46,154 INFO [train_asr.py:1235] (3/4) Epoch 48, batch 8150, loss[loss=0.0878, simple_loss=0.1244, pruned_loss=0.01959, audio_tagging_loss=0.006015, over 15552.00 frames. ], tot_loss[loss=0.06473, simple_loss=0.08891, pruned_loss=0.01182, audio_tagging_loss=0.008452, over 3043417.41 frames. ], batch size: 56, lr: 1.40e-03, grad_scale: 16.0 2023-11-27 08:55:55,307 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=3821826.6666666665, ans=0.125 2023-11-27 08:55:55,354 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=3821826.6666666665, ans=0.2 2023-11-27 08:56:13,223 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 573300 2023-11-27 08:56:21,780 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3822026.6666666665, ans=0.125 2023-11-27 08:56:28,850 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=3822026.6666666665, ans=0.0 2023-11-27 08:56:32,610 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=3822093.3333333335, ans=0.0 2023-11-27 08:56:41,845 INFO [train_asr.py:1235] (3/4) Epoch 48, batch 8200, loss[loss=0.05155, simple_loss=0.05585, pruned_loss=0.01024, audio_tagging_loss=0.01338, over 14687.00 frames. 
], tot_loss[loss=0.06444, simple_loss=0.08837, pruned_loss=0.01184, audio_tagging_loss=0.008415, over 3045229.82 frames. ], batch size: 55, lr: 1.40e-03, grad_scale: 16.0 2023-11-27 08:56:42,908 WARNING [train_asr.py:1481] (3/4) Exclude cut with ID unbalanced/8C7biyx9TQ4_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-27 08:56:45,485 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=6.38 vs. limit=10.0 2023-11-27 08:56:48,938 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=3822160.0, ans=0.2 2023-11-27 08:57:08,866 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 573350 2023-11-27 08:57:12,293 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3822293.3333333335, ans=0.125 2023-11-27 08:57:13,629 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.698e+01 9.014e+01 9.648e+01 1.048e+02 1.501e+02, threshold=1.930e+02, percent-clipped=0.0 2023-11-27 08:57:18,117 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.min_positive, batch_count=3822360.0, ans=0.05 2023-11-27 08:57:21,223 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=3822360.0, ans=0.04949747468305833 2023-11-27 08:57:38,037 INFO [train_asr.py:1235] (3/4) Epoch 48, batch 8250, loss[loss=0.06462, simple_loss=0.08863, pruned_loss=0.01322, audio_tagging_loss=0.007085, over 16431.00 frames. ], tot_loss[loss=0.06465, simple_loss=0.08905, pruned_loss=0.01188, audio_tagging_loss=0.008251, over 3054913.70 frames. ], batch size: 59, lr: 1.40e-03, grad_scale: 16.0 2023-11-27 08:57:49,101 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=3822560.0, ans=0.0 2023-11-27 08:57:53,979 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=3822560.0, ans=0.125 2023-11-27 08:58:02,931 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3822626.6666666665, ans=0.0 2023-11-27 08:58:04,950 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 573400 2023-11-27 08:58:20,504 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=3822693.3333333335, ans=0.125 2023-11-27 08:58:34,674 INFO [train_asr.py:1235] (3/4) Epoch 48, batch 8300, loss[loss=0.05918, simple_loss=0.07964, pruned_loss=0.008208, audio_tagging_loss=0.01115, over 14192.00 frames. ], tot_loss[loss=0.06459, simple_loss=0.08902, pruned_loss=0.0118, audio_tagging_loss=0.008279, over 3057179.03 frames. 
], batch size: 56, lr: 1.40e-03, grad_scale: 16.0 2023-11-27 08:58:37,065 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=3822826.6666666665, ans=0.125 2023-11-27 08:58:40,029 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=8.15 vs. limit=15.0 2023-11-27 08:58:45,158 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=5.05 vs. limit=10.0 2023-11-27 08:58:50,215 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3822893.3333333335, ans=0.1 2023-11-27 08:59:01,358 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 573450 2023-11-27 08:59:05,531 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 8.009e+01 8.971e+01 9.543e+01 1.035e+02 1.385e+02, threshold=1.909e+02, percent-clipped=0.0 2023-11-27 08:59:29,744 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=6.74 vs. limit=15.0 2023-11-27 08:59:30,190 INFO [train_asr.py:1235] (3/4) Epoch 48, batch 8350, loss[loss=0.07567, simple_loss=0.1042, pruned_loss=0.01446, audio_tagging_loss=0.009123, over 15589.00 frames. ], tot_loss[loss=0.06467, simple_loss=0.0891, pruned_loss=0.01185, audio_tagging_loss=0.008263, over 3056234.98 frames. ], batch size: 57, lr: 1.40e-03, grad_scale: 16.0 2023-11-27 08:59:30,383 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=3823160.0, ans=0.0 2023-11-27 08:59:32,576 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=3823160.0, ans=0.125 2023-11-27 08:59:56,917 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 573500 2023-11-27 09:00:00,687 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=3823293.3333333335, ans=0.2 2023-11-27 09:00:09,685 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.max_abs, batch_count=3823360.0, ans=10.0 2023-11-27 09:00:16,350 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=8.15 vs. limit=10.0 2023-11-27 09:00:17,081 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3823426.6666666665, ans=0.125 2023-11-27 09:00:18,199 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=3823426.6666666665, ans=0.125 2023-11-27 09:00:19,118 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=3823426.6666666665, ans=0.0 2023-11-27 09:00:25,934 INFO [train_asr.py:1235] (3/4) Epoch 48, batch 8400, loss[loss=0.05668, simple_loss=0.06625, pruned_loss=0.01144, audio_tagging_loss=0.01212, over 14841.00 frames. ], tot_loss[loss=0.06476, simple_loss=0.08933, pruned_loss=0.01186, audio_tagging_loss=0.008239, over 3049089.00 frames. 
], batch size: 58, lr: 1.40e-03, grad_scale: 32.0
2023-11-27 09:00:31,793 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3823493.3333333335, ans=0.125
2023-11-27 09:00:33,286 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=2.28 vs. limit=15.0
2023-11-27 09:00:38,140 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3823560.0, ans=0.125
2023-11-27 09:00:44,963 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=3823560.0, ans=0.0
2023-11-27 09:00:47,571 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=3823626.6666666665, ans=0.0
2023-11-27 09:00:52,710 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 573550
2023-11-27 09:00:56,271 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=4.10 vs. limit=15.0
2023-11-27 09:00:56,847 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.632e+01 8.941e+01 9.645e+01 1.032e+02 1.251e+02, threshold=1.929e+02, percent-clipped=0.0
2023-11-27 09:01:05,382 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=3823693.3333333335, ans=0.125
2023-11-27 09:01:21,022 INFO [train_asr.py:1235] (3/4) Epoch 48, batch 8450, loss[loss=0.07117, simple_loss=0.09771, pruned_loss=0.01509, audio_tagging_loss=0.007224, over 16635.00 frames. ], tot_loss[loss=0.06429, simple_loss=0.08834, pruned_loss=0.01172, audio_tagging_loss=0.008395, over 3045359.10 frames. ], batch size: 63, lr: 1.40e-03, grad_scale: 32.0
2023-11-27 09:01:21,183 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3823826.6666666665, ans=0.125
2023-11-27 09:01:34,060 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=8.75 vs. limit=15.0
2023-11-27 09:01:35,903 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3823893.3333333335, ans=0.125
2023-11-27 09:01:35,962 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=3823893.3333333335, ans=0.0
2023-11-27 09:01:47,445 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 573600
2023-11-27 09:01:53,704 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=7.79 vs. limit=15.0
2023-11-27 09:01:55,465 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3824026.6666666665, ans=0.1
2023-11-27 09:02:05,798 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=3824093.3333333335, ans=0.125
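In every optim.py:476 line in this log, the reported threshold equals Clipping_scale times the middle quartile of the grad norms (2.0 * 9.645e+01 = 1.929e+02 in the entry just above), which suggests gradients are clipped against a scaled running median rather than a fixed constant. A minimal sketch under that assumption; the class, window size, and example norms are invented for illustration, and optim.py may track these statistics differently:

```python
# Median-based gradient clipping consistent with the optim.py lines above:
# threshold = clipping_scale * running median of recent grad norms, and
# "percent-clipped" is the share of batches whose norm exceeded it.
from collections import deque
import statistics

class MedianGradClipper:
    def __init__(self, clipping_scale=2.0, window=1000):
        self.scale = clipping_scale
        self.norms = deque(maxlen=window)  # recent per-batch grad norms
        self.num_clipped = 0
        self.num_steps = 0

    def step(self, grad_norm):
        """Record one batch's grad norm; return the factor to scale grads by."""
        self.norms.append(grad_norm)
        threshold = self.scale * statistics.median(self.norms)
        self.num_steps += 1
        if grad_norm > threshold:
            self.num_clipped += 1
        return min(1.0, threshold / grad_norm)

clipper = MedianGradClipper()
for norm in (89.41, 96.45, 103.2, 125.1):  # in the spirit of the quartiles above
    factor = clipper.step(norm)
print(100.0 * clipper.num_clipped / clipper.num_steps)  # 0.0, as logged
```

Since healthy norms sit around 1e+02 here, the threshold near 1.9e+02 only fires on genuine spikes, which is why percent-clipped stays at 0.0 almost everywhere in this section.

2023-11-27 09:02:16,957 INFO [train_asr.py:1235] (3/4) Epoch 48, batch 8500, loss[loss=0.06556, simple_loss=0.0927, pruned_loss=0.01019, audio_tagging_loss=0.00902, over 14862.00 frames.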
], tot_loss[loss=0.0643, simple_loss=0.08818, pruned_loss=0.01176, audio_tagging_loss=0.00845, over 3049302.92 frames. ], batch size: 56, lr: 1.40e-03, grad_scale: 32.0 2023-11-27 09:02:43,593 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 573650 2023-11-27 09:02:47,398 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=3824293.3333333335, ans=0.07 2023-11-27 09:02:48,207 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.021e+01 9.075e+01 9.563e+01 1.041e+02 1.324e+02, threshold=1.913e+02, percent-clipped=0.0 2023-11-27 09:03:00,740 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3824426.6666666665, ans=0.1 2023-11-27 09:03:12,095 INFO [train_asr.py:1235] (3/4) Epoch 48, batch 8550, loss[loss=0.06108, simple_loss=0.07998, pruned_loss=0.01016, audio_tagging_loss=0.01093, over 15494.00 frames. ], tot_loss[loss=0.06475, simple_loss=0.08879, pruned_loss=0.01175, audio_tagging_loss=0.008603, over 3054638.37 frames. ], batch size: 57, lr: 1.40e-03, grad_scale: 32.0 2023-11-27 09:03:12,693 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=19.36 vs. limit=22.5 2023-11-27 09:03:19,210 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=3824493.3333333335, ans=0.2 2023-11-27 09:03:39,708 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 573700 2023-11-27 09:03:50,599 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3824693.3333333335, ans=0.125 2023-11-27 09:04:08,392 INFO [train_asr.py:1235] (3/4) Epoch 48, batch 8600, loss[loss=0.04606, simple_loss=0.06436, pruned_loss=0.004156, audio_tagging_loss=0.009723, over 16113.00 frames. ], tot_loss[loss=0.06451, simple_loss=0.08818, pruned_loss=0.01173, audio_tagging_loss=0.008681, over 3052590.65 frames. ], batch size: 62, lr: 1.40e-03, grad_scale: 32.0 2023-11-27 09:04:15,652 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.10 vs. 
limit=6.0 2023-11-27 09:04:21,412 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=3824893.3333333335, ans=0.125 2023-11-27 09:04:25,571 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-27 09:04:34,993 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 573750 2023-11-27 09:04:39,602 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.804e+01 9.143e+01 9.907e+01 1.055e+02 1.409e+02, threshold=1.981e+02, percent-clipped=0.0 2023-11-27 09:04:54,688 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.min_abs, batch_count=3825093.3333333335, ans=0.5 2023-11-27 09:04:56,776 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=3825093.3333333335, ans=0.125 2023-11-27 09:04:58,960 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=3825093.3333333335, ans=0.0 2023-11-27 09:05:04,593 INFO [train_asr.py:1235] (3/4) Epoch 48, batch 8650, loss[loss=0.07123, simple_loss=0.1002, pruned_loss=0.01293, audio_tagging_loss=0.008211, over 15636.00 frames. ], tot_loss[loss=0.06494, simple_loss=0.08901, pruned_loss=0.01177, audio_tagging_loss=0.008668, over 3055959.13 frames. ], batch size: 58, lr: 1.40e-03, grad_scale: 16.0 2023-11-27 09:05:11,191 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.min_positive, batch_count=3825160.0, ans=0.05 2023-11-27 09:05:30,639 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 573800 2023-11-27 09:06:00,048 INFO [train_asr.py:1235] (3/4) Epoch 48, batch 8700, loss[loss=0.06849, simple_loss=0.09491, pruned_loss=0.01229, audio_tagging_loss=0.008746, over 16247.00 frames. ], tot_loss[loss=0.06523, simple_loss=0.08936, pruned_loss=0.01196, audio_tagging_loss=0.008584, over 3061266.06 frames. ], batch size: 59, lr: 1.40e-03, grad_scale: 16.0 2023-11-27 09:06:27,597 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 573850 2023-11-27 09:06:32,831 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 8.045e+01 9.211e+01 9.788e+01 1.039e+02 1.317e+02, threshold=1.958e+02, percent-clipped=0.0 2023-11-27 09:06:34,097 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3825693.3333333335, ans=0.125 2023-11-27 09:06:39,419 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3825693.3333333335, ans=0.125 2023-11-27 09:06:40,943 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=11.49 vs. limit=15.0 2023-11-27 09:06:42,698 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=3825693.3333333335, ans=0.0 2023-11-27 09:06:43,711 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.max_abs, batch_count=3825760.0, ans=10.0 2023-11-27 09:06:46,285 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=6.90 vs. 
limit=15.0 2023-11-27 09:06:55,777 INFO [train_asr.py:1235] (3/4) Epoch 48, batch 8750, loss[loss=0.07906, simple_loss=0.1078, pruned_loss=0.01672, audio_tagging_loss=0.008419, over 15749.00 frames. ], tot_loss[loss=0.06535, simple_loss=0.08923, pruned_loss=0.01201, audio_tagging_loss=0.008726, over 3050990.77 frames. ], batch size: 56, lr: 1.40e-03, grad_scale: 16.0 2023-11-27 09:07:01,770 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3825826.6666666665, ans=0.1 2023-11-27 09:07:04,198 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=3.20 vs. limit=15.0 2023-11-27 09:07:22,872 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 573900 2023-11-27 09:07:30,949 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=3826026.6666666665, ans=0.125 2023-11-27 09:07:44,192 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=3826093.3333333335, ans=0.2 2023-11-27 09:07:52,414 INFO [train_asr.py:1235] (3/4) Epoch 48, batch 8800, loss[loss=0.06065, simple_loss=0.082, pruned_loss=0.009383, audio_tagging_loss=0.01027, over 15301.00 frames. ], tot_loss[loss=0.06592, simple_loss=0.09004, pruned_loss=0.01209, audio_tagging_loss=0.008817, over 3050212.83 frames. ], batch size: 57, lr: 1.40e-03, grad_scale: 32.0 2023-11-27 09:08:07,345 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=3826226.6666666665, ans=0.125 2023-11-27 09:08:18,409 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 573950 2023-11-27 09:08:19,571 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3826293.3333333335, ans=0.1 2023-11-27 09:08:23,599 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.888e+01 9.347e+01 1.002e+02 1.077e+02 1.340e+02, threshold=2.003e+02, percent-clipped=0.0 2023-11-27 09:08:23,896 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=3826360.0, ans=0.125 2023-11-27 09:08:47,536 INFO [train_asr.py:1235] (3/4) Epoch 48, batch 8850, loss[loss=0.07115, simple_loss=0.1048, pruned_loss=0.01195, audio_tagging_loss=0.006789, over 14444.00 frames. ], tot_loss[loss=0.06585, simple_loss=0.08968, pruned_loss=0.01218, audio_tagging_loss=0.008833, over 3050826.82 frames. ], batch size: 54, lr: 1.40e-03, grad_scale: 32.0 2023-11-27 09:08:54,890 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.58 vs. limit=6.0 2023-11-27 09:08:58,678 WARNING [train_asr.py:1481] (3/4) Exclude cut with ID unbalanced/1Dq7QH61iXQ_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. 
Number of tokens: 24 2023-11-27 09:09:13,970 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 574000 2023-11-27 09:09:15,076 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3826626.6666666665, ans=0.125 2023-11-27 09:09:42,719 INFO [train_asr.py:1235] (3/4) Epoch 48, batch 8900, loss[loss=0.04879, simple_loss=0.07046, pruned_loss=0.008093, audio_tagging_loss=0.005466, over 14584.00 frames. ], tot_loss[loss=0.06628, simple_loss=0.09047, pruned_loss=0.01233, audio_tagging_loss=0.008708, over 3047603.36 frames. ], batch size: 55, lr: 1.40e-03, grad_scale: 16.0 2023-11-27 09:09:45,057 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=3826826.6666666665, ans=0.125 2023-11-27 09:09:50,910 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3826826.6666666665, ans=0.125 2023-11-27 09:09:57,857 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.max_positive, batch_count=3826893.3333333335, ans=0.95 2023-11-27 09:10:03,696 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3826893.3333333335, ans=0.1 2023-11-27 09:10:04,733 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=3826960.0, ans=0.125 2023-11-27 09:10:09,983 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 574050 2023-11-27 09:10:11,089 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=3826960.0, ans=0.0 2023-11-27 09:10:16,219 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 8.150e+01 9.024e+01 9.616e+01 1.025e+02 1.217e+02, threshold=1.923e+02, percent-clipped=0.0 2023-11-27 09:10:16,997 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.1.conv_module2.whiten, num_groups=1, num_channels=192, metric=14.02 vs. limit=15.0 2023-11-27 09:10:36,504 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3827093.3333333335, ans=0.125 2023-11-27 09:10:38,536 INFO [train_asr.py:1235] (3/4) Epoch 48, batch 8950, loss[loss=0.05948, simple_loss=0.08536, pruned_loss=0.01003, audio_tagging_loss=0.006771, over 14155.00 frames. ], tot_loss[loss=0.06576, simple_loss=0.08999, pruned_loss=0.01222, audio_tagging_loss=0.008543, over 3044054.59 frames. ], batch size: 53, lr: 1.40e-03, grad_scale: 16.0 2023-11-27 09:10:43,063 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3827160.0, ans=0.125 2023-11-27 09:10:51,482 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3827226.6666666665, ans=0.1 2023-11-27 09:10:59,966 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=3827293.3333333335, ans=0.2 2023-11-27 09:11:05,622 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 574100 2023-11-27 09:11:10,361 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=8.67 vs. 
limit=15.0
2023-11-27 09:11:17,925 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3827360.0, ans=0.125
2023-11-27 09:11:22,674 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.min_positive, batch_count=3827426.6666666665, ans=0.025
2023-11-27 09:11:34,704 INFO [train_asr.py:1235] (3/4) Epoch 48, batch 9000, loss[loss=0.06199, simple_loss=0.07979, pruned_loss=0.01338, audio_tagging_loss=0.008716, over 14196.00 frames. ], tot_loss[loss=0.06549, simple_loss=0.08966, pruned_loss=0.01218, audio_tagging_loss=0.008484, over 3047485.02 frames. ], batch size: 53, lr: 1.40e-03, grad_scale: 16.0
2023-11-27 09:11:34,704 INFO [train_asr.py:1258] (3/4) Computing validation loss
2023-11-27 09:12:07,531 INFO [train_asr.py:1267] (3/4) Epoch 48, validation: loss=0.05893, simple_loss=0.05035, pruned_loss=0.005253, audio_tagging_loss=0.0285, over 4681554.00 frames.
2023-11-27 09:12:07,531 INFO [train_asr.py:1268] (3/4) Maximum memory allocated so far is 24894MB
2023-11-27 09:12:26,304 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=3827560.0, ans=0.0
2023-11-27 09:12:30,478 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3827626.6666666665, ans=0.125
2023-11-27 09:12:34,556 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 574150
2023-11-27 09:12:40,126 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=3827693.3333333335, ans=0.0
2023-11-27 09:12:40,852 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.883e+01 9.087e+01 9.718e+01 1.070e+02 1.602e+02, threshold=1.944e+02, percent-clipped=0.0
2023-11-27 09:12:43,711 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=3827693.3333333335, ans=0.0
2023-11-27 09:13:03,711 INFO [train_asr.py:1235] (3/4) Epoch 48, batch 9050, loss[loss=0.07089, simple_loss=0.1014, pruned_loss=0.01279, audio_tagging_loss=0.007397, over 14980.00 frames. ], tot_loss[loss=0.06522, simple_loss=0.0894, pruned_loss=0.01203, audio_tagging_loss=0.008489, over 3044078.07 frames. ], batch size: 54, lr: 1.40e-03, grad_scale: 16.0
2023-11-27 09:13:21,701 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=10.87 vs. limit=15.0
2023-11-27 09:13:30,182 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 574200
2023-11-27 09:13:30,309 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3827960.0, ans=0.125
2023-11-27 09:13:33,152 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.51 vs. limit=6.0
2023-11-27 09:13:34,840 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=3827960.0, ans=0.0
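The Whitening lines each compare a per-module statistic against a fixed limit (6.0, 10.0, 15.0, or 22.5 in this section). The numbers behave like a whiteness measure of the feature covariance: near 1.0 when variance is spread evenly across directions, larger as the spectrum concentrates. One standard proxy with that behaviour is E[lambda^2] / (E[lambda])^2 over the covariance eigenvalues, sketched below per channel group; this definition is an assumption chosen to match the logged ranges, and scaling.py's actual metric may differ in detail:

```python
import torch

# Assumed whiteness proxy: E[l^2] / (E[l])^2 over the eigenvalues l of the
# per-group feature covariance, computed via traces (trace(C^2)/d and
# trace(C)/d) so no eigendecomposition is needed. Equals 1.0 for a
# perfectly white covariance and grows as variance concentrates.
def whitening_metric(x: torch.Tensor, num_groups: int) -> float:
    n, c = x.shape  # (num_frames, num_channels)
    x = x.reshape(n, num_groups, c // num_groups).transpose(0, 1)
    x = x - x.mean(dim=1, keepdim=True)
    cov = x.transpose(1, 2) @ x / n                                 # (groups, d, d)
    mean_eig_sq = (cov @ cov).diagonal(dim1=1, dim2=2).mean(dim=1)  # trace(C^2)/d
    mean_eig = cov.diagonal(dim1=1, dim2=2).mean(dim=1)             # trace(C)/d
    return float((mean_eig_sq / mean_eig.pow(2)).mean())

x = torch.randn(1000, 128)                # near-white random features
print(whitening_metric(x, num_groups=4))  # ~1.03, far below limit=6.0
```

On that reading, the "metric=... vs. limit=..." entries are diagnostics of how close each module's activations sit to the point where the whitening penalty would engage.

2023-11-27 09:13:49,872 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=11.82 vs.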
limit=15.0 2023-11-27 09:13:59,483 INFO [train_asr.py:1235] (3/4) Epoch 48, batch 9100, loss[loss=0.07574, simple_loss=0.1057, pruned_loss=0.01631, audio_tagging_loss=0.006583, over 15482.00 frames. ], tot_loss[loss=0.06581, simple_loss=0.09052, pruned_loss=0.01221, audio_tagging_loss=0.008344, over 3050102.90 frames. ], batch size: 60, lr: 1.40e-03, grad_scale: 16.0 2023-11-27 09:14:18,352 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=3828226.6666666665, ans=0.2 2023-11-27 09:14:24,126 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=3828293.3333333335, ans=0.0 2023-11-27 09:14:26,647 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 574250 2023-11-27 09:14:31,092 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=3828293.3333333335, ans=0.0 2023-11-27 09:14:32,879 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.883e+01 9.030e+01 9.534e+01 1.010e+02 1.225e+02, threshold=1.907e+02, percent-clipped=0.0 2023-11-27 09:14:42,279 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=3828360.0, ans=0.0 2023-11-27 09:14:46,650 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3828426.6666666665, ans=0.125 2023-11-27 09:14:47,677 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3828426.6666666665, ans=0.125 2023-11-27 09:14:55,584 INFO [train_asr.py:1235] (3/4) Epoch 48, batch 9150, loss[loss=0.05142, simple_loss=0.0653, pruned_loss=0.006703, audio_tagging_loss=0.01206, over 14399.00 frames. ], tot_loss[loss=0.0658, simple_loss=0.09073, pruned_loss=0.01217, audio_tagging_loss=0.00826, over 3054692.20 frames. ], batch size: 55, lr: 1.40e-03, grad_scale: 16.0 2023-11-27 09:14:55,802 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=3828493.3333333335, ans=0.0 2023-11-27 09:14:56,904 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3828493.3333333335, ans=0.125 2023-11-27 09:14:59,438 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=19.27 vs. limit=22.5 2023-11-27 09:15:02,790 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=11.29 vs. limit=15.0 2023-11-27 09:15:22,662 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 574300 2023-11-27 09:15:33,001 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3828693.3333333335, ans=0.1 2023-11-27 09:15:51,840 INFO [train_asr.py:1235] (3/4) Epoch 48, batch 9200, loss[loss=0.07386, simple_loss=0.1035, pruned_loss=0.01467, audio_tagging_loss=0.007455, over 15228.00 frames. ], tot_loss[loss=0.06532, simple_loss=0.09009, pruned_loss=0.012, audio_tagging_loss=0.008274, over 3054863.97 frames. 
], batch size: 56, lr: 1.40e-03, grad_scale: 32.0 2023-11-27 09:16:10,677 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=3828893.3333333335, ans=0.0 2023-11-27 09:16:15,661 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3828960.0, ans=0.125 2023-11-27 09:16:16,609 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=3828960.0, ans=0.0 2023-11-27 09:16:18,613 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 574350 2023-11-27 09:16:24,819 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.684e+01 9.042e+01 9.589e+01 1.020e+02 1.357e+02, threshold=1.918e+02, percent-clipped=0.0 2023-11-27 09:16:33,178 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.whiten.whitening_limit, batch_count=3829026.6666666665, ans=12.0 2023-11-27 09:16:34,517 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3829026.6666666665, ans=0.125 2023-11-27 09:16:36,103 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=11.60 vs. limit=15.0 2023-11-27 09:16:40,276 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=2.69 vs. limit=15.0 2023-11-27 09:16:47,642 INFO [train_asr.py:1235] (3/4) Epoch 48, batch 9250, loss[loss=0.08869, simple_loss=0.1313, pruned_loss=0.01648, audio_tagging_loss=0.006554, over 14892.00 frames. ], tot_loss[loss=0.06566, simple_loss=0.09048, pruned_loss=0.01215, audio_tagging_loss=0.008272, over 3054296.62 frames. ], batch size: 53, lr: 1.40e-03, grad_scale: 32.0 2023-11-27 09:16:54,166 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3829160.0, ans=0.125 2023-11-27 09:17:11,740 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=3829293.3333333335, ans=0.09899494936611666 2023-11-27 09:17:14,313 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 574400 2023-11-27 09:17:17,166 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.min_abs, batch_count=3829293.3333333335, ans=0.5 2023-11-27 09:17:32,520 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3829426.6666666665, ans=0.1 2023-11-27 09:17:43,337 INFO [train_asr.py:1235] (3/4) Epoch 48, batch 9300, loss[loss=0.08263, simple_loss=0.117, pruned_loss=0.01618, audio_tagging_loss=0.007972, over 16505.00 frames. ], tot_loss[loss=0.06547, simple_loss=0.09016, pruned_loss=0.01202, audio_tagging_loss=0.008365, over 3064590.10 frames. ], batch size: 60, lr: 1.40e-03, grad_scale: 32.0 2023-11-27 09:17:53,015 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3829560.0, ans=0.1 2023-11-27 09:17:55,917 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=11.22 vs. 
limit=15.0 2023-11-27 09:17:58,154 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=512, metric=17.85 vs. limit=22.5 2023-11-27 09:18:09,926 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 574450 2023-11-27 09:18:16,653 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.064e+01 9.342e+01 9.838e+01 1.063e+02 1.386e+02, threshold=1.968e+02, percent-clipped=0.0 2023-11-27 09:18:21,200 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.min_positive, batch_count=3829693.3333333335, ans=0.025 2023-11-27 09:18:28,088 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3829760.0, ans=0.125 2023-11-27 09:18:30,188 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=3829760.0, ans=0.125 2023-11-27 09:18:38,919 INFO [train_asr.py:1235] (3/4) Epoch 48, batch 9350, loss[loss=0.06485, simple_loss=0.08804, pruned_loss=0.01243, audio_tagging_loss=0.008401, over 15349.00 frames. ], tot_loss[loss=0.06528, simple_loss=0.0898, pruned_loss=0.01196, audio_tagging_loss=0.00842, over 3055860.80 frames. ], batch size: 57, lr: 1.40e-03, grad_scale: 32.0 2023-11-27 09:18:42,915 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3829826.6666666665, ans=0.1 2023-11-27 09:18:42,930 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.min_abs, batch_count=3829826.6666666665, ans=0.5 2023-11-27 09:18:59,428 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3829893.3333333335, ans=0.125 2023-11-27 09:19:06,197 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 574500 2023-11-27 09:19:11,573 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=3830026.6666666665, ans=0.0 2023-11-27 09:19:16,955 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3830026.6666666665, ans=0.125 2023-11-27 09:19:25,306 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=3830093.3333333335, ans=0.2 2023-11-27 09:19:34,666 INFO [train_asr.py:1235] (3/4) Epoch 48, batch 9400, loss[loss=0.05438, simple_loss=0.0694, pruned_loss=0.009434, audio_tagging_loss=0.01024, over 14273.00 frames. ], tot_loss[loss=0.06525, simple_loss=0.08963, pruned_loss=0.01195, audio_tagging_loss=0.008494, over 3053767.09 frames. 
], batch size: 54, lr: 1.40e-03, grad_scale: 32.0
2023-11-27 09:19:35,974 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3830160.0, ans=0.1
2023-11-27 09:19:43,961 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=3830160.0, ans=0.0
2023-11-27 09:19:55,195 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3830226.6666666665, ans=0.125
2023-11-27 09:20:01,306 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 574550
2023-11-27 09:20:09,690 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.781e+01 8.905e+01 9.680e+01 1.031e+02 1.220e+02, threshold=1.936e+02, percent-clipped=0.0
2023-11-27 09:20:13,012 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2023-11-27 09:20:27,288 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=3830426.6666666665, ans=0.125
2023-11-27 09:20:29,215 WARNING [train_asr.py:1481] (3/4) Exclude cut with ID unbalanced/jmSuJWEIizA_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24
2023-11-27 09:20:30,794 INFO [train_asr.py:1235] (3/4) Epoch 48, batch 9450, loss[loss=0.05965, simple_loss=0.07908, pruned_loss=0.01034, audio_tagging_loss=0.009777, over 14567.00 frames. ], tot_loss[loss=0.06538, simple_loss=0.08966, pruned_loss=0.01195, audio_tagging_loss=0.008594, over 3059443.87 frames. ], batch size: 55, lr: 1.40e-03, grad_scale: 16.0
2023-11-27 09:20:33,045 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3830493.3333333335, ans=0.1
2023-11-27 09:20:45,379 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3830560.0, ans=0.1
2023-11-27 09:20:57,569 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 574600
2023-11-27 09:21:00,135 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3830626.6666666665, ans=0.0
2023-11-27 09:21:02,115 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3830626.6666666665, ans=0.125
2023-11-27 09:21:22,360 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=3830760.0, ans=0.0
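The ScheduledFloat lines report the value (ans) that a named hyper-parameter takes at the current batch_count; by this point in training most have settled at their final values (skip rates at 0.0, balancer probabilities at 0.125, and so on). A piecewise-linear schedule over (batch_count, value) breakpoints reproduces that behaviour; the sketch below is such an interpolator with made-up breakpoints, not the recipe's actual per-module schedules:

```python
# Minimal piecewise-linear schedule in the spirit of the ScheduledFloat
# lines: the value is interpolated between (batch_count, value) breakpoints
# and held constant past the last one. Breakpoints here are illustrative.
from bisect import bisect_right

class ScheduledFloat:
    def __init__(self, *points):
        self.xs = [p[0] for p in points]  # batch counts, ascending
        self.ys = [p[1] for p in points]  # values at those counts

    def value(self, batch_count: float) -> float:
        i = bisect_right(self.xs, batch_count)
        if i == 0:
            return self.ys[0]
        if i == len(self.xs):
            return self.ys[-1]
        x0, x1 = self.xs[i - 1], self.xs[i]
        y0, y1 = self.ys[i - 1], self.ys[i]
        return y0 + (y1 - y0) * (batch_count - x0) / (x1 - x0)

skip_rate = ScheduledFloat((0.0, 0.5), (4000.0, 0.05), (16000.0, 0.0))
print(skip_rate.value(3830160.0))  # 0.0, long past the last breakpoint
```

Oddly specific constants such as ans=0.04949747468305833 (which equals 0.07 / sqrt(2)) suggest some schedule endpoints are derived from other constants rather than set directly.

2023-11-27 09:21:26,372 INFO [train_asr.py:1235] (3/4) Epoch 48, batch 9500, loss[loss=0.05938, simple_loss=0.08294, pruned_loss=0.01072, audio_tagging_loss=0.007186, over 14979.00 frames. ], tot_loss[loss=0.06573, simple_loss=0.09032, pruned_loss=0.012, audio_tagging_loss=0.00858, over 3058782.65 frames.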
], batch size: 57, lr: 1.40e-03, grad_scale: 16.0 2023-11-27 09:21:50,776 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=3830960.0, ans=0.2 2023-11-27 09:21:52,352 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.0.conv_module2.whiten, num_groups=1, num_channels=192, metric=13.22 vs. limit=15.0 2023-11-27 09:21:52,666 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 574650 2023-11-27 09:22:00,261 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3831026.6666666665, ans=0.125 2023-11-27 09:22:01,093 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.673e+01 9.168e+01 9.748e+01 1.058e+02 1.599e+02, threshold=1.950e+02, percent-clipped=0.0 2023-11-27 09:22:06,642 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3831026.6666666665, ans=0.125 2023-11-27 09:22:20,224 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3831093.3333333335, ans=0.1 2023-11-27 09:22:21,324 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3831160.0, ans=0.1 2023-11-27 09:22:22,114 INFO [train_asr.py:1235] (3/4) Epoch 48, batch 9550, loss[loss=0.04678, simple_loss=0.06774, pruned_loss=0.003567, audio_tagging_loss=0.009344, over 14672.00 frames. ], tot_loss[loss=0.06603, simple_loss=0.09045, pruned_loss=0.01214, audio_tagging_loss=0.008668, over 3054539.52 frames. ], batch size: 58, lr: 1.40e-03, grad_scale: 16.0 2023-11-27 09:22:33,980 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=3831226.6666666665, ans=0.125 2023-11-27 09:22:36,532 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=3831226.6666666665, ans=0.0 2023-11-27 09:22:40,290 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3831226.6666666665, ans=0.125 2023-11-27 09:22:48,530 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 574700 2023-11-27 09:22:48,701 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=3831293.3333333335, ans=0.2 2023-11-27 09:23:15,025 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3831426.6666666665, ans=0.1 2023-11-27 09:23:16,911 INFO [train_asr.py:1235] (3/4) Epoch 48, batch 9600, loss[loss=0.05378, simple_loss=0.07531, pruned_loss=0.006384, audio_tagging_loss=0.009744, over 12934.00 frames. ], tot_loss[loss=0.06588, simple_loss=0.09011, pruned_loss=0.01207, audio_tagging_loss=0.008749, over 3047370.81 frames. 
], batch size: 51, lr: 1.40e-03, grad_scale: 32.0 2023-11-27 09:23:24,020 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=3831493.3333333335, ans=0.2 2023-11-27 09:23:30,307 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=3831560.0, ans=0.125 2023-11-27 09:23:39,949 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=3831626.6666666665, ans=0.0 2023-11-27 09:23:41,432 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=7.62 vs. limit=10.0 2023-11-27 09:23:43,606 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=18.29 vs. limit=22.5 2023-11-27 09:23:44,156 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 574750 2023-11-27 09:23:51,667 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.142e+01 9.086e+01 9.692e+01 1.047e+02 1.227e+02, threshold=1.938e+02, percent-clipped=0.0 2023-11-27 09:24:04,652 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=3831760.0, ans=0.0 2023-11-27 09:24:07,650 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=3831760.0, ans=0.125 2023-11-27 09:24:12,912 INFO [train_asr.py:1235] (3/4) Epoch 48, batch 9650, loss[loss=0.0631, simple_loss=0.07976, pruned_loss=0.01414, audio_tagging_loss=0.009073, over 15459.00 frames. ], tot_loss[loss=0.06555, simple_loss=0.08964, pruned_loss=0.01204, audio_tagging_loss=0.008684, over 3044267.55 frames. ], batch size: 55, lr: 1.40e-03, grad_scale: 32.0 2023-11-27 09:24:22,739 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=3831826.6666666665, ans=0.1 2023-11-27 09:24:23,809 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3831893.3333333335, ans=0.125 2023-11-27 09:24:30,066 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3831893.3333333335, ans=0.125 2023-11-27 09:24:39,539 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 574800 2023-11-27 09:24:42,457 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=5.43 vs. limit=10.0 2023-11-27 09:24:45,866 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=3832026.6666666665, ans=0.0 2023-11-27 09:24:52,734 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=3832026.6666666665, ans=0.0 2023-11-27 09:25:09,694 INFO [train_asr.py:1235] (3/4) Epoch 48, batch 9700, loss[loss=0.06915, simple_loss=0.1001, pruned_loss=0.00965, audio_tagging_loss=0.009478, over 15261.00 frames. ], tot_loss[loss=0.06495, simple_loss=0.08887, pruned_loss=0.01186, audio_tagging_loss=0.008656, over 3041142.08 frames. 
], batch size: 55, lr: 1.40e-03, grad_scale: 32.0 2023-11-27 09:25:36,505 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 574850 2023-11-27 09:25:44,497 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.839e+01 9.066e+01 9.770e+01 1.059e+02 1.296e+02, threshold=1.954e+02, percent-clipped=0.0 2023-11-27 09:26:05,278 INFO [train_asr.py:1235] (3/4) Epoch 48, batch 9750, loss[loss=0.04721, simple_loss=0.06613, pruned_loss=0.006883, audio_tagging_loss=0.007267, over 15631.00 frames. ], tot_loss[loss=0.06455, simple_loss=0.08814, pruned_loss=0.01188, audio_tagging_loss=0.008593, over 3039778.98 frames. ], batch size: 58, lr: 1.40e-03, grad_scale: 32.0 2023-11-27 09:26:09,137 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3832493.3333333335, ans=0.125 2023-11-27 09:26:22,339 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=3832560.0, ans=0.0 2023-11-27 09:26:30,889 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=3832626.6666666665, ans=0.04949747468305833 2023-11-27 09:26:32,886 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 574900 2023-11-27 09:26:42,021 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.1.self_attn2.whiten, num_groups=1, num_channels=192, metric=10.71 vs. limit=22.5 2023-11-27 09:26:50,037 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-27 09:26:53,838 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=3832760.0, ans=0.125 2023-11-27 09:27:01,003 INFO [train_asr.py:1235] (3/4) Epoch 48, batch 9800, loss[loss=0.05082, simple_loss=0.07263, pruned_loss=0.006595, audio_tagging_loss=0.007909, over 17378.00 frames. ], tot_loss[loss=0.06436, simple_loss=0.088, pruned_loss=0.01189, audio_tagging_loss=0.008465, over 3037326.04 frames. ], batch size: 66, lr: 1.40e-03, grad_scale: 32.0 2023-11-27 09:27:20,899 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3832893.3333333335, ans=0.125 2023-11-27 09:27:20,925 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=3832893.3333333335, ans=0.125 2023-11-27 09:27:23,991 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3832960.0, ans=0.125 2023-11-27 09:27:28,135 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 574950 2023-11-27 09:27:36,087 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.628e+01 9.044e+01 9.762e+01 1.048e+02 1.288e+02, threshold=1.952e+02, percent-clipped=0.0 2023-11-27 09:27:36,283 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=3833026.6666666665, ans=0.0 2023-11-27 09:27:46,401 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3833093.3333333335, ans=0.125 2023-11-27 09:27:51,901 WARNING [train_asr.py:1481] (3/4) Exclude cut with ID unbalanced/Bo4LcZjitzU_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. 
Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-27 09:27:52,067 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=3833093.3333333335, ans=0.125 2023-11-27 09:27:53,471 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module2.whiten, num_groups=1, num_channels=512, metric=8.87 vs. limit=15.0 2023-11-27 09:27:57,774 INFO [train_asr.py:1235] (3/4) Epoch 48, batch 9850, loss[loss=0.06277, simple_loss=0.09357, pruned_loss=0.009272, audio_tagging_loss=0.006715, over 15106.00 frames. ], tot_loss[loss=0.06448, simple_loss=0.08831, pruned_loss=0.01199, audio_tagging_loss=0.008339, over 3036347.19 frames. ], batch size: 58, lr: 1.40e-03, grad_scale: 32.0 2023-11-27 09:28:01,252 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=3833160.0, ans=0.125 2023-11-27 09:28:23,742 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 575000 2023-11-27 09:28:48,361 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=3833426.6666666665, ans=0.04949747468305833 2023-11-27 09:28:48,629 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.30 vs. limit=6.0 2023-11-27 09:28:49,431 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=3833426.6666666665, ans=0.0 2023-11-27 09:28:53,339 INFO [train_asr.py:1235] (3/4) Epoch 48, batch 9900, loss[loss=0.05893, simple_loss=0.0812, pruned_loss=0.009826, audio_tagging_loss=0.008506, over 15167.00 frames. ], tot_loss[loss=0.0645, simple_loss=0.08863, pruned_loss=0.01194, audio_tagging_loss=0.008239, over 3037223.53 frames. ], batch size: 57, lr: 1.40e-03, grad_scale: 32.0 2023-11-27 09:29:04,882 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3833560.0, ans=0.125 2023-11-27 09:29:05,939 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3833560.0, ans=0.125 2023-11-27 09:29:17,999 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=6.92 vs. 
limit=10.0 2023-11-27 09:29:21,180 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 575050 2023-11-27 09:29:22,364 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-27 09:29:28,626 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.548e+01 9.072e+01 9.748e+01 1.058e+02 2.513e+02, threshold=1.950e+02, percent-clipped=1.0 2023-11-27 09:29:28,850 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=3833693.3333333335, ans=0.0 2023-11-27 09:29:36,147 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.min_positive, batch_count=3833693.3333333335, ans=0.025 2023-11-27 09:29:49,345 INFO [train_asr.py:1235] (3/4) Epoch 48, batch 9950, loss[loss=0.08414, simple_loss=0.1243, pruned_loss=0.012, audio_tagging_loss=0.009989, over 16324.00 frames. ], tot_loss[loss=0.06453, simple_loss=0.08864, pruned_loss=0.01191, audio_tagging_loss=0.008298, over 3042439.26 frames. ], batch size: 58, lr: 1.40e-03, grad_scale: 32.0 2023-11-27 09:30:15,995 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 575100 2023-11-27 09:30:34,698 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=3834093.3333333335, ans=0.0 2023-11-27 09:30:45,627 INFO [train_asr.py:1235] (3/4) Epoch 48, batch 10000, loss[loss=0.05166, simple_loss=0.0662, pruned_loss=0.007847, audio_tagging_loss=0.01072, over 14802.00 frames. ], tot_loss[loss=0.06415, simple_loss=0.08816, pruned_loss=0.01176, audio_tagging_loss=0.008307, over 3044804.47 frames. ], batch size: 56, lr: 1.40e-03, grad_scale: 32.0 2023-11-27 09:30:55,470 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3834226.6666666665, ans=0.0 2023-11-27 09:31:02,087 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=8.03 vs. limit=15.0 2023-11-27 09:31:11,718 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 575150 2023-11-27 09:31:21,703 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.471e+01 9.035e+01 9.520e+01 1.022e+02 1.313e+02, threshold=1.904e+02, percent-clipped=0.0 2023-11-27 09:31:24,034 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=3834360.0, ans=0.2 2023-11-27 09:31:26,621 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=3834360.0, ans=0.0 2023-11-27 09:31:38,822 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=3834426.6666666665, ans=0.0 2023-11-27 09:31:39,280 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=9.11 vs. limit=10.0 2023-11-27 09:31:40,771 INFO [train_asr.py:1235] (3/4) Epoch 48, batch 10050, loss[loss=0.08236, simple_loss=0.1101, pruned_loss=0.01754, audio_tagging_loss=0.009786, over 16627.00 frames. ], tot_loss[loss=0.06455, simple_loss=0.08853, pruned_loss=0.01187, audio_tagging_loss=0.008415, over 3049362.92 frames. 
], batch size: 60, lr: 1.40e-03, grad_scale: 16.0 2023-11-27 09:32:02,808 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3834626.6666666665, ans=0.125 2023-11-27 09:32:07,302 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 575200 2023-11-27 09:32:17,579 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.1.conv_module2.whiten, num_groups=1, num_channels=192, metric=11.98 vs. limit=15.0 2023-11-27 09:32:20,336 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=3834693.3333333335, ans=0.0 2023-11-27 09:32:28,691 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=3834760.0, ans=0.0 2023-11-27 09:32:36,495 INFO [train_asr.py:1235] (3/4) Epoch 48, batch 10100, loss[loss=0.06363, simple_loss=0.08995, pruned_loss=0.01179, audio_tagging_loss=0.006868, over 15281.00 frames. ], tot_loss[loss=0.06486, simple_loss=0.08884, pruned_loss=0.01197, audio_tagging_loss=0.008473, over 3046902.81 frames. ], batch size: 55, lr: 1.40e-03, grad_scale: 16.0 2023-11-27 09:32:36,757 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3834826.6666666665, ans=0.125 2023-11-27 09:33:01,335 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=3834960.0, ans=0.125 2023-11-27 09:33:01,850 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=7.36 vs. limit=15.0 2023-11-27 09:33:03,234 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 575250 2023-11-27 09:33:03,453 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=3834960.0, ans=0.125 2023-11-27 09:33:12,689 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.819e+01 8.969e+01 9.511e+01 1.051e+02 1.642e+02, threshold=1.902e+02, percent-clipped=0.0 2023-11-27 09:33:14,999 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=13.56 vs. limit=15.0 2023-11-27 09:33:17,633 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=3835026.6666666665, ans=0.2 2023-11-27 09:33:21,664 WARNING [train_asr.py:1481] (3/4) Exclude cut with ID unbalanced/_eq1Ry0UZGU_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-27 09:33:21,819 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3835093.3333333335, ans=0.125 2023-11-27 09:33:25,370 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=11.73 vs. 
limit=15.0 2023-11-27 09:33:31,808 INFO [train_asr.py:1235] (3/4) Epoch 48, batch 10150, loss[loss=0.05387, simple_loss=0.06785, pruned_loss=0.01084, audio_tagging_loss=0.009106, over 15182.00 frames. ], tot_loss[loss=0.06531, simple_loss=0.08951, pruned_loss=0.01199, audio_tagging_loss=0.008558, over 3049025.73 frames. ], batch size: 57, lr: 1.40e-03, grad_scale: 16.0 2023-11-27 09:33:33,068 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3835160.0, ans=0.125 2023-11-27 09:33:59,254 WARNING [train_asr.py:1481] (3/4) Exclude cut with ID unbalanced/cw-21cbk02A_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-27 09:33:59,296 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 575300 2023-11-27 09:34:27,357 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=3835493.3333333335, ans=0.2 2023-11-27 09:34:28,271 INFO [train_asr.py:1235] (3/4) Epoch 48, batch 10200, loss[loss=0.08753, simple_loss=0.1197, pruned_loss=0.01889, audio_tagging_loss=0.008802, over 15689.00 frames. ], tot_loss[loss=0.06549, simple_loss=0.08961, pruned_loss=0.01205, audio_tagging_loss=0.008632, over 3050340.58 frames. ], batch size: 57, lr: 1.40e-03, grad_scale: 16.0 2023-11-27 09:34:35,887 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=3835493.3333333335, ans=0.0 2023-11-27 09:34:41,690 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=3835560.0, ans=0.2 2023-11-27 09:34:48,952 WARNING [train_asr.py:1481] (3/4) Exclude cut with ID unbalanced/hOT6Yokob90_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. 
Number of tokens: 24 2023-11-27 09:34:49,077 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=3835626.6666666665, ans=0.125 2023-11-27 09:34:52,415 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=3835626.6666666665, ans=0.0 2023-11-27 09:34:54,852 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 575350 2023-11-27 09:34:59,877 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=3835626.6666666665, ans=0.0 2023-11-27 09:35:05,780 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=3835693.3333333335, ans=0.2 2023-11-27 09:35:06,658 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.675e+01 9.149e+01 9.728e+01 1.044e+02 1.552e+02, threshold=1.946e+02, percent-clipped=0.0 2023-11-27 09:35:14,312 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3835760.0, ans=0.125 2023-11-27 09:35:16,623 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=512, metric=8.33 vs. limit=15.0 2023-11-27 09:35:20,612 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3835760.0, ans=0.0 2023-11-27 09:35:23,524 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=6.68 vs. limit=15.0 2023-11-27 09:35:24,065 INFO [train_asr.py:1235] (3/4) Epoch 48, batch 10250, loss[loss=0.04602, simple_loss=0.05827, pruned_loss=0.009067, audio_tagging_loss=0.007821, over 17322.00 frames. ], tot_loss[loss=0.06555, simple_loss=0.08966, pruned_loss=0.0121, audio_tagging_loss=0.008617, over 3049669.25 frames. ], batch size: 68, lr: 1.40e-03, grad_scale: 8.0 2023-11-27 09:35:39,832 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=8.80 vs. limit=15.0 2023-11-27 09:35:45,481 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3835960.0, ans=0.125 2023-11-27 09:35:51,169 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 575400 2023-11-27 09:35:55,736 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=3835960.0, ans=0.0 2023-11-27 09:36:08,413 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=15.79 vs. limit=22.5 2023-11-27 09:36:09,206 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=3836093.3333333335, ans=0.125 2023-11-27 09:36:19,990 INFO [train_asr.py:1235] (3/4) Epoch 48, batch 10300, loss[loss=0.05806, simple_loss=0.07626, pruned_loss=0.008738, audio_tagging_loss=0.0112, over 14877.00 frames. ], tot_loss[loss=0.06513, simple_loss=0.08865, pruned_loss=0.01209, audio_tagging_loss=0.008716, over 3048846.65 frames. ], batch size: 58, lr: 1.40e-03, grad_scale: 8.0 2023-11-27 09:36:20,585 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=512, metric=2.82 vs. 
limit=15.0 2023-11-27 09:36:34,838 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=3836226.6666666665, ans=0.0 2023-11-27 09:36:39,071 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=3836226.6666666665, ans=0.0 2023-11-27 09:36:46,933 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 575450 2023-11-27 09:36:57,292 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.710e+01 8.883e+01 9.810e+01 1.061e+02 1.854e+02, threshold=1.962e+02, percent-clipped=0.0 2023-11-27 09:37:10,406 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-27 09:37:16,036 INFO [train_asr.py:1235] (3/4) Epoch 48, batch 10350, loss[loss=0.06675, simple_loss=0.08967, pruned_loss=0.01213, audio_tagging_loss=0.009777, over 15024.00 frames. ], tot_loss[loss=0.0658, simple_loss=0.08974, pruned_loss=0.01214, audio_tagging_loss=0.008789, over 3051503.26 frames. ], batch size: 56, lr: 1.40e-03, grad_scale: 8.0 2023-11-27 09:37:23,763 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=3836493.3333333335, ans=0.2 2023-11-27 09:37:42,141 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 575500 2023-11-27 09:38:00,827 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=3836760.0, ans=0.125 2023-11-27 09:38:04,003 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=3836760.0, ans=0.0 2023-11-27 09:38:11,097 INFO [train_asr.py:1235] (3/4) Epoch 48, batch 10400, loss[loss=0.05204, simple_loss=0.07132, pruned_loss=0.005411, audio_tagging_loss=0.01097, over 14588.00 frames. ], tot_loss[loss=0.06537, simple_loss=0.08914, pruned_loss=0.0119, audio_tagging_loss=0.008904, over 3051822.66 frames. ], batch size: 56, lr: 1.40e-03, grad_scale: 16.0 2023-11-27 09:38:38,219 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 575550 2023-11-27 09:38:49,402 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.735e+01 9.167e+01 9.734e+01 1.047e+02 2.020e+02, threshold=1.947e+02, percent-clipped=1.0 2023-11-27 09:38:51,988 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=12.49 vs. limit=15.0 2023-11-27 09:38:56,520 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=3837093.3333333335, ans=0.125 2023-11-27 09:39:06,821 INFO [train_asr.py:1235] (3/4) Epoch 48, batch 10450, loss[loss=0.05225, simple_loss=0.06917, pruned_loss=0.009543, audio_tagging_loss=0.008123, over 15224.00 frames. ], tot_loss[loss=0.06538, simple_loss=0.08917, pruned_loss=0.01193, audio_tagging_loss=0.008862, over 3055788.90 frames. 
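The optim.py:476 "Clipping_scale=2.0, grad-norm quartiles ... threshold=... percent-clipped=..." records above summarize the distribution of recently observed gradient norms as five points (min, 25%, 50%, 75%, max). In every such record in this section the threshold equals the clipping scale times the logged median (for the record just above, 2.0 x 9.810e+01 = 1.962e+02), and percent-clipped reports how often recent batches exceeded it. A minimal sketch of that rule, with assumed function and variable names:

    # Hedged sketch: reproduce the "threshold=" value of a grad-norm
    # quartile record. The five logged numbers are (min, 25%, 50%, 75%,
    # max) of recently seen gradient norms; the threshold is
    # clipping_scale times the median. Names are assumptions.
    def clipping_threshold(grad_norm_quartiles, clipping_scale=2.0):
        median = grad_norm_quartiles[2]  # middle of the five summary points
        return clipping_scale * median

    # From the record above: 7.710e+01 8.883e+01 9.810e+01 1.061e+02 1.854e+02
    print(clipping_threshold([77.10, 88.83, 98.10, 106.1, 185.4]))  # 196.2 == 1.962e+02

Tying the threshold to a running median rather than a fixed constant lets the clip level track the natural scale of the gradients as training progresses.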
], batch size: 57, lr: 1.40e-03, grad_scale: 16.0 2023-11-27 09:39:09,742 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3837160.0, ans=0.125 2023-11-27 09:39:11,737 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3837160.0, ans=0.125 2023-11-27 09:39:13,090 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.69 vs. limit=6.0 2023-11-27 09:39:17,806 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=7.10 vs. limit=15.0 2023-11-27 09:39:18,899 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=13.85 vs. limit=22.5 2023-11-27 09:39:20,797 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3837226.6666666665, ans=0.1 2023-11-27 09:39:23,915 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=3837226.6666666665, ans=0.04949747468305833 2023-11-27 09:39:27,856 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=13.23 vs. limit=15.0 2023-11-27 09:39:33,884 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 575600 2023-11-27 09:39:34,061 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=3837293.3333333335, ans=0.125 2023-11-27 09:39:40,148 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=3837360.0, ans=0.0 2023-11-27 09:39:48,692 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=3837360.0, ans=0.0 2023-11-27 09:40:03,183 INFO [train_asr.py:1235] (3/4) Epoch 48, batch 10500, loss[loss=0.05843, simple_loss=0.07805, pruned_loss=0.009677, audio_tagging_loss=0.009726, over 14632.00 frames. ], tot_loss[loss=0.06461, simple_loss=0.08832, pruned_loss=0.01177, audio_tagging_loss=0.008684, over 3053949.06 frames. ], batch size: 55, lr: 1.40e-03, grad_scale: 16.0 2023-11-27 09:40:09,451 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=3837493.3333333335, ans=0.2 2023-11-27 09:40:29,802 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 575650 2023-11-27 09:40:35,766 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3837693.3333333335, ans=0.125 2023-11-27 09:40:36,165 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.58 vs. 
limit=6.0 2023-11-27 09:40:41,389 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.578e+01 8.939e+01 9.707e+01 1.019e+02 1.510e+02, threshold=1.941e+02, percent-clipped=0.0 2023-11-27 09:40:44,747 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.min_positive, batch_count=3837693.3333333335, ans=0.025 2023-11-27 09:40:58,840 INFO [train_asr.py:1235] (3/4) Epoch 48, batch 10550, loss[loss=0.05453, simple_loss=0.0704, pruned_loss=0.00866, audio_tagging_loss=0.01067, over 15851.00 frames. ], tot_loss[loss=0.06453, simple_loss=0.08832, pruned_loss=0.01176, audio_tagging_loss=0.008612, over 3048102.12 frames. ], batch size: 61, lr: 1.40e-03, grad_scale: 16.0 2023-11-27 09:41:08,019 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3837826.6666666665, ans=0.125 2023-11-27 09:41:11,138 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=3837893.3333333335, ans=0.125 2023-11-27 09:41:25,800 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 575700 2023-11-27 09:41:54,067 INFO [train_asr.py:1235] (3/4) Epoch 48, batch 10600, loss[loss=0.06342, simple_loss=0.09556, pruned_loss=0.01051, audio_tagging_loss=0.00513, over 15820.00 frames. ], tot_loss[loss=0.06483, simple_loss=0.08892, pruned_loss=0.0119, audio_tagging_loss=0.008469, over 3045437.37 frames. ], batch size: 56, lr: 1.40e-03, grad_scale: 16.0 2023-11-27 09:42:05,838 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3838226.6666666665, ans=0.125 2023-11-27 09:42:20,792 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 575750 2023-11-27 09:42:27,751 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=3838360.0, ans=0.125 2023-11-27 09:42:31,645 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.836e+01 9.017e+01 9.530e+01 1.017e+02 1.584e+02, threshold=1.906e+02, percent-clipped=0.0 2023-11-27 09:42:36,271 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=4.28 vs. limit=10.0 2023-11-27 09:42:36,329 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=14.31 vs. limit=22.5 2023-11-27 09:42:49,581 INFO [train_asr.py:1235] (3/4) Epoch 48, batch 10650, loss[loss=0.06894, simple_loss=0.09166, pruned_loss=0.0123, audio_tagging_loss=0.01081, over 13831.00 frames. ], tot_loss[loss=0.0648, simple_loss=0.08866, pruned_loss=0.01199, audio_tagging_loss=0.008482, over 3041554.08 frames. ], batch size: 52, lr: 1.40e-03, grad_scale: 16.0 2023-11-27 09:43:15,922 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 575800 2023-11-27 09:43:41,743 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=5.54 vs. limit=12.0 2023-11-27 09:43:44,271 INFO [train_asr.py:1235] (3/4) Epoch 48, batch 10700, loss[loss=0.05633, simple_loss=0.07409, pruned_loss=0.01102, audio_tagging_loss=0.008263, over 15431.00 frames. ], tot_loss[loss=0.0647, simple_loss=0.08832, pruned_loss=0.01197, audio_tagging_loss=0.008569, over 3037786.64 frames. 
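Each tot_loss[...] record carries four numbers, and throughout this section they satisfy loss = 0.5 * simple_loss + pruned_loss + 1.0 * audio_tagging_loss: the transducer's simple (linear-lattice) loss is halved while the pruned RNN-T loss and the audio-tagging distillation term enter at full weight. A sketch with the scales inferred from the logged values rather than read out of the training script:

    # Hedged sketch: the per-batch "loss" field is consistent with this
    # weighted sum of the other three logged terms. The scales below are
    # inferred from the log itself, not taken from train_asr.py.
    SIMPLE_LOSS_SCALE = 0.5
    AUDIO_TAGGING_LOSS_SCALE = 1.0

    def combined_loss(simple_loss, pruned_loss, audio_tagging_loss):
        return (SIMPLE_LOSS_SCALE * simple_loss
                + pruned_loss
                + AUDIO_TAGGING_LOSS_SCALE * audio_tagging_loss)

    # Check against the "batch 10700" running averages above:
    print(round(combined_loss(0.08832, 0.01197, 0.008569), 4))  # 0.0647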
], batch size: 57, lr: 1.40e-03, grad_scale: 16.0 2023-11-27 09:44:09,576 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=3838960.0, ans=0.0 2023-11-27 09:44:11,400 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 575850 2023-11-27 09:44:16,917 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3839026.6666666665, ans=0.125 2023-11-27 09:44:21,917 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.744e+01 9.303e+01 9.868e+01 1.046e+02 1.253e+02, threshold=1.974e+02, percent-clipped=0.0 2023-11-27 09:44:34,717 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=3839093.3333333335, ans=0.07 2023-11-27 09:44:39,750 INFO [train_asr.py:1235] (3/4) Epoch 48, batch 10750, loss[loss=0.05677, simple_loss=0.07476, pruned_loss=0.008426, audio_tagging_loss=0.01097, over 16126.00 frames. ], tot_loss[loss=0.0652, simple_loss=0.08907, pruned_loss=0.01206, audio_tagging_loss=0.008599, over 3044655.50 frames. ], batch size: 63, lr: 1.40e-03, grad_scale: 16.0 2023-11-27 09:44:45,662 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=3839160.0, ans=0.0 2023-11-27 09:44:51,366 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3839226.6666666665, ans=0.1 2023-11-27 09:44:54,794 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=512, metric=18.76 vs. limit=22.5 2023-11-27 09:45:05,856 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 575900 2023-11-27 09:45:31,992 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=3839426.6666666665, ans=0.2 2023-11-27 09:45:34,364 INFO [train_asr.py:1235] (3/4) Epoch 48, batch 10800, loss[loss=0.06699, simple_loss=0.08523, pruned_loss=0.01496, audio_tagging_loss=0.009419, over 15373.00 frames. ], tot_loss[loss=0.06494, simple_loss=0.08878, pruned_loss=0.012, audio_tagging_loss=0.008555, over 3042879.58 frames. ], batch size: 58, lr: 1.40e-03, grad_scale: 32.0 2023-11-27 09:46:00,515 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 575950 2023-11-27 09:46:03,757 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3839626.6666666665, ans=0.1 2023-11-27 09:46:08,953 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=6.75 vs. limit=15.0 2023-11-27 09:46:11,273 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.948e+01 8.958e+01 9.647e+01 1.051e+02 1.313e+02, threshold=1.929e+02, percent-clipped=0.0 2023-11-27 09:46:16,229 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3839693.3333333335, ans=0.125 2023-11-27 09:46:24,538 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=3839760.0, ans=0.125 2023-11-27 09:46:28,739 INFO [train_asr.py:1235] (3/4) Epoch 48, batch 10850, loss[loss=0.05017, simple_loss=0.06943, pruned_loss=0.007324, audio_tagging_loss=0.008133, over 15108.00 frames. 
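The scaling.py:213 ScheduledFloat records each print a hyperparameter (dropout probabilities, skip rates, bypass scale floors) whose value "ans" is a function of the batch counter rather than a constant; by this point in training every schedule has settled at its final value (ans=0.125, 0.0, 0.2 and so on). A minimal piecewise-linear version of the mechanism; the breakpoints below are invented for illustration and are not this run's actual schedule:

    # Hedged sketch of a batch-count-scheduled float: linearly
    # interpolate between (batch_count, value) breakpoints, clamping at
    # the ends. Breakpoints here are illustrative only.
    def scheduled_float(batch_count, schedule=((0.0, 0.3), (20000.0, 0.125))):
        if batch_count <= schedule[0][0]:
            return schedule[0][1]
        if batch_count >= schedule[-1][0]:
            return schedule[-1][1]
        for (x0, y0), (x1, y1) in zip(schedule, schedule[1:]):
            if x0 <= batch_count <= x1:
                return y0 + (y1 - y0) * (batch_count - x0) / (x1 - x0)

    # Far past the last breakpoint the value has settled, matching the
    # repeated "ans=0.125" readings above:
    print(scheduled_float(3838960.0))  # 0.125

The fractional batch_count values advance by a non-integer amount per step, which suggests a counter weighted by batch duration rather than incremented by one; the sketch above does not depend on that detail.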
], tot_loss[loss=0.0648, simple_loss=0.08849, pruned_loss=0.01199, audio_tagging_loss=0.008574, over 3044176.59 frames. ], batch size: 58, lr: 1.40e-03, grad_scale: 32.0 2023-11-27 09:46:55,396 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 576000 2023-11-27 09:47:22,921 WARNING [train_asr.py:1481] (3/4) Exclude cut with ID unbalanced/XMxq2pgttuY_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-27 09:47:26,023 INFO [train_asr.py:1235] (3/4) Epoch 48, batch 10900, loss[loss=0.06118, simple_loss=0.0797, pruned_loss=0.01073, audio_tagging_loss=0.0106, over 15196.00 frames. ], tot_loss[loss=0.0651, simple_loss=0.08905, pruned_loss=0.01206, audio_tagging_loss=0.008519, over 3043636.56 frames. ], batch size: 55, lr: 1.40e-03, grad_scale: 32.0 2023-11-27 09:47:28,387 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=3840160.0, ans=0.0 2023-11-27 09:47:52,323 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 576050 2023-11-27 09:48:03,055 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 8.060e+01 9.173e+01 9.584e+01 1.016e+02 1.234e+02, threshold=1.917e+02, percent-clipped=0.0 2023-11-27 09:48:08,017 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3840360.0, ans=0.1 2023-11-27 09:48:18,984 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=3840426.6666666665, ans=0.0 2023-11-27 09:48:21,470 INFO [train_asr.py:1235] (3/4) Epoch 48, batch 10950, loss[loss=0.07688, simple_loss=0.1087, pruned_loss=0.01404, audio_tagging_loss=0.008477, over 14788.00 frames. ], tot_loss[loss=0.06522, simple_loss=0.08935, pruned_loss=0.01204, audio_tagging_loss=0.008505, over 3039498.74 frames. 
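The WARNING records above all describe the same degenerate case: an AudioSet clip whose transcript is the dataset's dummy placeholder text. After the convolutional front end's roughly 4x subsampling, the 100 input frames become 23 encoder frames, fewer than the 24 BPE tokens, so the transducer loss would be undefined and the cut is excluded. A sketch of such a filter; the exact subsampling arithmetic is an assumption (a common zipformer front-end convention that does reproduce the logged 100 -> 23 figure):

    # Hedged sketch of the filter behind the "Exclude cut" warnings: a
    # transducer needs at least as many encoder frames as text tokens.
    # The frame arithmetic assumes a conv front end with ~4x subsampling.
    def frames_after_subsampling(num_frames: int) -> int:
        return ((num_frames - 7) // 2 + 1) // 2  # 100 -> 23, as logged

    def keep_cut(num_frames: int, num_tokens: int) -> bool:
        return frames_after_subsampling(num_frames) >= num_tokens

    # The excluded cut above: 100 frames vs. 24 tokens -> dropped.
    print(frames_after_subsampling(100), keep_cut(100, 24))  # 23 False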
], batch size: 56, lr: 1.40e-03, grad_scale: 32.0 2023-11-27 09:48:23,776 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3840493.3333333335, ans=0.1 2023-11-27 09:48:27,955 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=3840493.3333333335, ans=0.0 2023-11-27 09:48:47,721 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 576100 2023-11-27 09:48:49,833 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=3840626.6666666665, ans=0.2 2023-11-27 09:48:59,701 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=3840693.3333333335, ans=0.125 2023-11-27 09:49:04,589 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=3840760.0, ans=0.125 2023-11-27 09:49:06,579 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3840760.0, ans=0.0 2023-11-27 09:49:06,670 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-27 09:49:09,780 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3840760.0, ans=0.125 2023-11-27 09:49:15,864 INFO [train_asr.py:1235] (3/4) Epoch 48, batch 11000, loss[loss=0.05955, simple_loss=0.0749, pruned_loss=0.01055, audio_tagging_loss=0.01155, over 14940.00 frames. ], tot_loss[loss=0.06537, simple_loss=0.08967, pruned_loss=0.01195, audio_tagging_loss=0.008585, over 3048226.52 frames. ], batch size: 56, lr: 1.40e-03, grad_scale: 32.0 2023-11-27 09:49:24,809 WARNING [train_asr.py:1481] (3/4) Exclude cut with ID unbalanced/h6R5rMXN6pY_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. 
Number of tokens: 24 2023-11-27 09:49:39,405 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-27 09:49:42,388 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 576150 2023-11-27 09:49:50,324 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=3841026.6666666665, ans=0.125 2023-11-27 09:49:53,300 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.866e+01 8.909e+01 9.397e+01 1.014e+02 1.657e+02, threshold=1.879e+02, percent-clipped=0.0 2023-11-27 09:50:00,884 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3841093.3333333335, ans=0.125 2023-11-27 09:50:08,615 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3841093.3333333335, ans=0.125 2023-11-27 09:50:08,665 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=3841093.3333333335, ans=0.0 2023-11-27 09:50:10,564 INFO [train_asr.py:1235] (3/4) Epoch 48, batch 11050, loss[loss=0.04488, simple_loss=0.0608, pruned_loss=0.005654, audio_tagging_loss=0.008825, over 15332.00 frames. ], tot_loss[loss=0.06481, simple_loss=0.08864, pruned_loss=0.01184, audio_tagging_loss=0.008644, over 3045766.62 frames. ], batch size: 58, lr: 1.40e-03, grad_scale: 32.0 2023-11-27 09:50:23,145 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=3841226.6666666665, ans=0.125 2023-11-27 09:50:32,093 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00 2023-11-27 09:50:36,208 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=3841293.3333333335, ans=0.04949747468305833 2023-11-27 09:50:37,169 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 576200 2023-11-27 09:50:37,275 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=3841293.3333333335, ans=0.04949747468305833 2023-11-27 09:50:40,805 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3841293.3333333335, ans=0.125 2023-11-27 09:51:05,944 INFO [train_asr.py:1235] (3/4) Epoch 48, batch 11100, loss[loss=0.07964, simple_loss=0.1106, pruned_loss=0.01462, audio_tagging_loss=0.009733, over 15096.00 frames. ], tot_loss[loss=0.06542, simple_loss=0.08949, pruned_loss=0.01198, audio_tagging_loss=0.008689, over 3049267.68 frames. ], batch size: 53, lr: 1.40e-03, grad_scale: 32.0 2023-11-27 09:51:32,091 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 576250 2023-11-27 09:51:44,484 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.822e+01 9.157e+01 9.860e+01 1.054e+02 1.486e+02, threshold=1.972e+02, percent-clipped=0.0 2023-11-27 09:51:49,204 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=12.27 vs. 
limit=15.0 2023-11-27 09:51:51,634 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=3841760.0, ans=0.125 2023-11-27 09:52:00,848 INFO [train_asr.py:1235] (3/4) Epoch 48, batch 11150, loss[loss=0.07829, simple_loss=0.1096, pruned_loss=0.01406, audio_tagging_loss=0.009452, over 15883.00 frames. ], tot_loss[loss=0.06558, simple_loss=0.08948, pruned_loss=0.01206, audio_tagging_loss=0.008782, over 3052669.32 frames. ], batch size: 57, lr: 1.40e-03, grad_scale: 16.0 2023-11-27 09:52:01,060 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=3841826.6666666665, ans=0.125 2023-11-27 09:52:06,313 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3841826.6666666665, ans=0.125 2023-11-27 09:52:07,265 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=3841826.6666666665, ans=0.2 2023-11-27 09:52:27,409 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 576300 2023-11-27 09:52:37,799 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=16.77 vs. limit=22.5 2023-11-27 09:52:43,808 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=3842093.3333333335, ans=0.0 2023-11-27 09:52:45,847 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3842093.3333333335, ans=0.125 2023-11-27 09:52:45,942 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=3842093.3333333335, ans=0.125 2023-11-27 09:52:55,657 INFO [train_asr.py:1235] (3/4) Epoch 48, batch 11200, loss[loss=0.06225, simple_loss=0.08631, pruned_loss=0.01006, audio_tagging_loss=0.009034, over 15380.00 frames. ], tot_loss[loss=0.06555, simple_loss=0.08948, pruned_loss=0.01198, audio_tagging_loss=0.008824, over 3048846.51 frames. ], batch size: 57, lr: 1.40e-03, grad_scale: 32.0 2023-11-27 09:52:58,976 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3842160.0, ans=0.1 2023-11-27 09:53:22,335 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 576350 2023-11-27 09:53:26,610 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.min_positive, batch_count=3842293.3333333335, ans=0.025 2023-11-27 09:53:29,860 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=3842360.0, ans=0.0 2023-11-27 09:53:33,751 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.649e+01 9.019e+01 9.427e+01 1.023e+02 1.335e+02, threshold=1.885e+02, percent-clipped=0.0 2023-11-27 09:53:50,655 INFO [train_asr.py:1235] (3/4) Epoch 48, batch 11250, loss[loss=0.07971, simple_loss=0.1101, pruned_loss=0.01647, audio_tagging_loss=0.008171, over 14769.00 frames. ], tot_loss[loss=0.06557, simple_loss=0.08943, pruned_loss=0.01199, audio_tagging_loss=0.008868, over 3049722.61 frames. 
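The scaling.py:1022 Whitening records track how anisotropic a module's output has become. A metric of this kind equals 1.0 when the channel covariance is a multiple of the identity (perfectly "white" features) and grows as variance concentrates in fewer directions; a corrective gradient only engages once the metric crosses the logged limit (15.0, 22.5, and so on), which is why most readings, like the metric=16.77 vs. limit=22.5 record above, sit below their limits. A self-contained sketch, with shapes and names assumed rather than taken from scaling.py:

    # Hedged sketch of a whitening metric: ratio between the mean
    # diagonal of cov^2 and the squared mean diagonal of cov. Equals 1.0
    # for white features, larger when eigenvalues are spread out.
    import torch

    def whitening_metric(x: torch.Tensor) -> float:
        """x: (num_frames, num_channels) activations."""
        x = x - x.mean(dim=0)
        cov = (x.T @ x) / x.shape[0]                 # channel covariance
        mean_diag = cov.diagonal().mean()
        mean_diag_sq = (cov @ cov).diagonal().mean()
        return (mean_diag_sq / mean_diag.clamp(min=1e-20) ** 2).item()

    x = torch.randn(4000, 256)          # near-white activations
    print(whitening_metric(x))          # ~1.0, far below limit=15.0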
], batch size: 56, lr: 1.40e-03, grad_scale: 32.0 2023-11-27 09:54:00,258 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=11.26 vs. limit=15.0 2023-11-27 09:54:04,946 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3842560.0, ans=0.125 2023-11-27 09:54:12,448 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=3842626.6666666665, ans=0.2 2023-11-27 09:54:16,344 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 576400 2023-11-27 09:54:45,845 INFO [train_asr.py:1235] (3/4) Epoch 48, batch 11300, loss[loss=0.0658, simple_loss=0.09502, pruned_loss=0.0111, audio_tagging_loss=0.007188, over 16681.00 frames. ], tot_loss[loss=0.06508, simple_loss=0.08886, pruned_loss=0.01188, audio_tagging_loss=0.00877, over 3053863.68 frames. ], batch size: 59, lr: 1.40e-03, grad_scale: 16.0 2023-11-27 09:54:47,126 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3842826.6666666665, ans=0.1 2023-11-27 09:54:47,372 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.whiten.whitening_limit, batch_count=3842826.6666666665, ans=15.0 2023-11-27 09:54:55,681 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=11.78 vs. limit=15.0 2023-11-27 09:55:00,116 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=3842893.3333333335, ans=0.2 2023-11-27 09:55:11,929 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 576450 2023-11-27 09:55:16,749 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=3842960.0, ans=0.125 2023-11-27 09:55:19,357 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3843026.6666666665, ans=0.125 2023-11-27 09:55:23,554 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3843026.6666666665, ans=0.125 2023-11-27 09:55:25,398 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.239e+01 9.044e+01 9.676e+01 1.062e+02 1.427e+02, threshold=1.935e+02, percent-clipped=0.0 2023-11-27 09:55:40,069 INFO [train_asr.py:1235] (3/4) Epoch 48, batch 11350, loss[loss=0.07123, simple_loss=0.09616, pruned_loss=0.01409, audio_tagging_loss=0.009065, over 15756.00 frames. ], tot_loss[loss=0.06518, simple_loss=0.08892, pruned_loss=0.01206, audio_tagging_loss=0.008662, over 3047129.70 frames. 
], batch size: 58, lr: 1.40e-03, grad_scale: 16.0 2023-11-27 09:55:46,198 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=3843160.0, ans=0.0 2023-11-27 09:56:04,509 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3843293.3333333335, ans=0.125 2023-11-27 09:56:05,398 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=3843293.3333333335, ans=0.125 2023-11-27 09:56:07,301 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 576500 2023-11-27 09:56:11,526 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=3843293.3333333335, ans=0.0 2023-11-27 09:56:17,036 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=3.61 vs. limit=15.0 2023-11-27 09:56:35,307 INFO [train_asr.py:1235] (3/4) Epoch 48, batch 11400, loss[loss=0.06161, simple_loss=0.07867, pruned_loss=0.01064, audio_tagging_loss=0.01164, over 15617.00 frames. ], tot_loss[loss=0.06529, simple_loss=0.08942, pruned_loss=0.01203, audio_tagging_loss=0.008545, over 3042010.50 frames. ], batch size: 62, lr: 1.40e-03, grad_scale: 16.0 2023-11-27 09:56:58,452 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=3843626.6666666665, ans=0.0 2023-11-27 09:57:01,493 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 576550 2023-11-27 09:57:09,504 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3843693.3333333335, ans=0.125 2023-11-27 09:57:15,548 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.588e+01 9.108e+01 9.687e+01 1.045e+02 1.288e+02, threshold=1.937e+02, percent-clipped=0.0 2023-11-27 09:57:30,711 INFO [train_asr.py:1235] (3/4) Epoch 48, batch 11450, loss[loss=0.07367, simple_loss=0.1002, pruned_loss=0.01604, audio_tagging_loss=0.007525, over 15354.00 frames. ], tot_loss[loss=0.06599, simple_loss=0.09059, pruned_loss=0.01223, audio_tagging_loss=0.008472, over 3049990.11 frames. ], batch size: 57, lr: 1.40e-03, grad_scale: 8.0 2023-11-27 09:57:39,347 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=3843826.6666666665, ans=0.125 2023-11-27 09:57:40,334 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=3843893.3333333335, ans=0.125 2023-11-27 09:57:49,153 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=3843893.3333333335, ans=0.0 2023-11-27 09:57:56,245 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 576600 2023-11-27 09:58:08,721 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3844026.6666666665, ans=0.125 2023-11-27 09:58:21,272 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=3844093.3333333335, ans=0.125 2023-11-27 09:58:25,203 INFO [train_asr.py:1235] (3/4) Epoch 48, batch 11500, loss[loss=0.06666, simple_loss=0.09293, pruned_loss=0.01113, audio_tagging_loss=0.009059, over 15193.00 frames. 
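The lr: 1.40e-03 repeated in the training records comes from a schedule that decays with both the step count and the epoch. An Eden-style schedule, lr = base_lr * (1 + (step/lr_batches)^2)^(-1/4) * (1 + (epoch/lr_epochs)^2)^(-1/4), reproduces the logged value around batch idx 576600 under the constants below, which are typical zipformer-recipe settings assumed here, not read out of this run; the epoch argument is taken as one less than the printed epoch number, assuming the scheduler counts completed epochs:

    # Hedged sketch of an Eden-style LR schedule that lands on the
    # logged ~1.40e-03 near batch idx 576600 of (printed) epoch 48.
    # All constants are assumptions.
    def eden_lr(step, epoch, base_lr=0.045, lr_batches=7500, lr_epochs=3.5):
        step_factor = (1.0 + (step / lr_batches) ** 2) ** -0.25
        epoch_factor = (1.0 + (epoch / lr_epochs) ** 2) ** -0.25
        return base_lr * step_factor * epoch_factor

    print(f"{eden_lr(576600, 47):.2e}")  # ~1.40e-03, matching the records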
], tot_loss[loss=0.06578, simple_loss=0.09021, pruned_loss=0.01217, audio_tagging_loss=0.008512, over 3049833.05 frames. ], batch size: 56, lr: 1.40e-03, grad_scale: 8.0 2023-11-27 09:58:25,473 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=3844160.0, ans=0.0 2023-11-27 09:58:33,147 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=3844160.0, ans=0.125 2023-11-27 09:58:52,293 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 576650 2023-11-27 09:58:59,962 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=12.19 vs. limit=15.0 2023-11-27 09:59:02,083 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=11.38 vs. limit=15.0 2023-11-27 09:59:05,732 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.980e+01 9.059e+01 9.598e+01 1.033e+02 1.422e+02, threshold=1.920e+02, percent-clipped=0.0 2023-11-27 09:59:19,831 INFO [train_asr.py:1235] (3/4) Epoch 48, batch 11550, loss[loss=0.04698, simple_loss=0.0622, pruned_loss=0.006455, audio_tagging_loss=0.009422, over 14111.00 frames. ], tot_loss[loss=0.06528, simple_loss=0.08907, pruned_loss=0.01216, audio_tagging_loss=0.008589, over 3047704.47 frames. ], batch size: 53, lr: 1.40e-03, grad_scale: 8.0 2023-11-27 09:59:44,057 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.98 vs. limit=6.0 2023-11-27 09:59:46,701 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 576700 2023-11-27 09:59:54,492 WARNING [train_asr.py:1481] (3/4) Exclude cut with ID unbalanced/NeYOsnhOi4k_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-27 10:00:01,985 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.max_abs, batch_count=3844693.3333333335, ans=10.0 2023-11-27 10:00:05,971 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=7.51 vs. limit=10.0 2023-11-27 10:00:15,295 INFO [train_asr.py:1235] (3/4) Epoch 48, batch 11600, loss[loss=0.06828, simple_loss=0.08932, pruned_loss=0.01345, audio_tagging_loss=0.01017, over 14799.00 frames. ], tot_loss[loss=0.06536, simple_loss=0.08954, pruned_loss=0.01203, audio_tagging_loss=0.008556, over 3045626.11 frames. 
], batch size: 54, lr: 1.40e-03, grad_scale: 16.0 2023-11-27 10:00:23,327 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=3844826.6666666665, ans=0.125 2023-11-27 10:00:26,454 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3844893.3333333335, ans=0.1 2023-11-27 10:00:41,400 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 576750 2023-11-27 10:00:55,896 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.192e+01 9.083e+01 9.816e+01 1.051e+02 1.317e+02, threshold=1.963e+02, percent-clipped=0.0 2023-11-27 10:01:07,093 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=3845093.3333333335, ans=0.125 2023-11-27 10:01:09,976 INFO [train_asr.py:1235] (3/4) Epoch 48, batch 11650, loss[loss=0.06057, simple_loss=0.08813, pruned_loss=0.009439, audio_tagging_loss=0.007064, over 16204.00 frames. ], tot_loss[loss=0.0656, simple_loss=0.08991, pruned_loss=0.01212, audio_tagging_loss=0.008523, over 3047708.33 frames. ], batch size: 60, lr: 1.40e-03, grad_scale: 16.0 2023-11-27 10:01:13,565 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=8.74 vs. limit=15.0 2023-11-27 10:01:31,127 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=3845293.3333333335, ans=0.0 2023-11-27 10:01:36,729 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 576800 2023-11-27 10:01:40,309 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3845293.3333333335, ans=0.125 2023-11-27 10:01:54,626 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=3845426.6666666665, ans=0.09899494936611666 2023-11-27 10:02:02,465 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=3845426.6666666665, ans=0.0 2023-11-27 10:02:05,306 INFO [train_asr.py:1235] (3/4) Epoch 48, batch 11700, loss[loss=0.04355, simple_loss=0.05694, pruned_loss=0.006894, audio_tagging_loss=0.008186, over 15950.00 frames. ], tot_loss[loss=0.06481, simple_loss=0.08866, pruned_loss=0.01185, audio_tagging_loss=0.00863, over 3043687.15 frames. 
], batch size: 61, lr: 1.40e-03, grad_scale: 16.0 2023-11-27 10:02:13,268 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3845493.3333333335, ans=0.1 2023-11-27 10:02:13,339 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3845493.3333333335, ans=0.125 2023-11-27 10:02:26,966 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3845626.6666666665, ans=0.125 2023-11-27 10:02:31,936 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 576850 2023-11-27 10:02:38,352 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3845693.3333333335, ans=0.1 2023-11-27 10:02:45,919 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.976e+01 8.983e+01 9.560e+01 1.031e+02 1.339e+02, threshold=1.912e+02, percent-clipped=0.0 2023-11-27 10:02:49,413 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3845760.0, ans=0.1 2023-11-27 10:02:53,940 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=3845760.0, ans=0.125 2023-11-27 10:03:00,670 INFO [train_asr.py:1235] (3/4) Epoch 48, batch 11750, loss[loss=0.06227, simple_loss=0.08347, pruned_loss=0.01138, audio_tagging_loss=0.009153, over 15468.00 frames. ], tot_loss[loss=0.0648, simple_loss=0.08847, pruned_loss=0.01183, audio_tagging_loss=0.008732, over 3051819.97 frames. ], batch size: 58, lr: 1.40e-03, grad_scale: 16.0 2023-11-27 10:03:04,096 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=3845826.6666666665, ans=0.0 2023-11-27 10:03:08,696 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=3845826.6666666665, ans=0.0 2023-11-27 10:03:21,275 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=3845960.0, ans=0.035 2023-11-27 10:03:26,916 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 576900 2023-11-27 10:03:45,859 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.max_abs, batch_count=3846093.3333333335, ans=10.0 2023-11-27 10:03:55,766 INFO [train_asr.py:1235] (3/4) Epoch 48, batch 11800, loss[loss=0.06437, simple_loss=0.08514, pruned_loss=0.01136, audio_tagging_loss=0.01044, over 15649.00 frames. ], tot_loss[loss=0.06457, simple_loss=0.08789, pruned_loss=0.01183, audio_tagging_loss=0.008801, over 3045655.87 frames. ], batch size: 59, lr: 1.40e-03, grad_scale: 16.0 2023-11-27 10:03:59,042 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=3846160.0, ans=0.0 2023-11-27 10:04:07,030 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=3846226.6666666665, ans=0.2 2023-11-27 10:04:13,128 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=3846226.6666666665, ans=0.0 2023-11-27 10:04:17,288 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=7.21 vs. 
limit=15.0 2023-11-27 10:04:18,302 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.1.whiten, num_groups=1, num_channels=192, metric=3.17 vs. limit=12.0 2023-11-27 10:04:22,314 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 576950 2023-11-27 10:04:29,799 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=3846360.0, ans=0.0 2023-11-27 10:04:32,908 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3846360.0, ans=0.1 2023-11-27 10:04:35,513 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3846360.0, ans=0.125 2023-11-27 10:04:36,323 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.535e+01 9.212e+01 9.788e+01 1.058e+02 1.368e+02, threshold=1.958e+02, percent-clipped=0.0 2023-11-27 10:04:42,814 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-27 10:04:49,623 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=3846493.3333333335, ans=0.2 2023-11-27 10:04:50,442 INFO [train_asr.py:1235] (3/4) Epoch 48, batch 11850, loss[loss=0.0743, simple_loss=0.1053, pruned_loss=0.01415, audio_tagging_loss=0.007484, over 15159.00 frames. ], tot_loss[loss=0.06455, simple_loss=0.08784, pruned_loss=0.01174, audio_tagging_loss=0.00889, over 3050445.21 frames. ], batch size: 55, lr: 1.40e-03, grad_scale: 16.0 2023-11-27 10:05:01,604 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3846560.0, ans=0.125 2023-11-27 10:05:06,210 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=3846560.0, ans=0.0 2023-11-27 10:05:15,574 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3846626.6666666665, ans=0.125 2023-11-27 10:05:17,046 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 577000 2023-11-27 10:05:21,736 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3846626.6666666665, ans=0.125 2023-11-27 10:05:26,129 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=5.30 vs. limit=10.0 2023-11-27 10:05:46,122 INFO [train_asr.py:1235] (3/4) Epoch 48, batch 11900, loss[loss=0.06211, simple_loss=0.08115, pruned_loss=0.01225, audio_tagging_loss=0.009281, over 16468.00 frames. ], tot_loss[loss=0.06497, simple_loss=0.08854, pruned_loss=0.01179, audio_tagging_loss=0.008905, over 3055112.02 frames. 
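The tot_loss[... over N frames] figures are running, frame-weighted averages rather than single-batch numbers. The fractional frame counts (3055112.02 and the like), holding near 200 times a typical ~15k-frame batch, are consistent with an exponentially decayed tally with decay 1 - 1/200. A sketch of that bookkeeping; the decay constant is inferred from the logged frame counts, not from the training script:

    # Hedged sketch: keep exponentially decayed running sums of loss and
    # frames and report their ratio, as the tot_loss records do. The
    # 1/200 decay is inferred from the ~3.05M-frame plateau and is an
    # assumption about this run.
    class RunningLoss:
        def __init__(self, reset_interval=200):
            self.decay = 1.0 - 1.0 / reset_interval
            self.loss_sum = 0.0
            self.frames = 0.0

        def update(self, batch_loss, batch_frames):
            self.loss_sum = self.loss_sum * self.decay + batch_loss * batch_frames
            self.frames = self.frames * self.decay + batch_frames
            return self.loss_sum / self.frames, self.frames

    rl = RunningLoss()
    for _ in range(2000):
        avg, frames = rl.update(0.065, 15300)
    print(round(avg, 4), round(frames))  # 0.065 and ~3.06e6 frames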
], batch size: 62, lr: 1.40e-03, grad_scale: 16.0 2023-11-27 10:05:52,528 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=3846826.6666666665, ans=0.0 2023-11-27 10:06:12,385 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 577050 2023-11-27 10:06:12,500 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=3846960.0, ans=0.0 2023-11-27 10:06:17,745 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=3847026.6666666665, ans=0.125 2023-11-27 10:06:26,331 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.851e+01 8.935e+01 9.684e+01 1.048e+02 1.256e+02, threshold=1.937e+02, percent-clipped=0.0 2023-11-27 10:06:28,201 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.max_abs, batch_count=3847026.6666666665, ans=10.0 2023-11-27 10:06:32,306 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=3847093.3333333335, ans=0.0 2023-11-27 10:06:32,434 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3847093.3333333335, ans=0.125 2023-11-27 10:06:32,577 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=13.78 vs. limit=22.5 2023-11-27 10:06:40,518 INFO [train_asr.py:1235] (3/4) Epoch 48, batch 11950, loss[loss=0.06234, simple_loss=0.08122, pruned_loss=0.01052, audio_tagging_loss=0.01121, over 15407.00 frames. ], tot_loss[loss=0.06526, simple_loss=0.08861, pruned_loss=0.01195, audio_tagging_loss=0.009009, over 3055056.18 frames. ], batch size: 59, lr: 1.40e-03, grad_scale: 16.0 2023-11-27 10:06:50,860 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=15.62 vs. limit=22.5 2023-11-27 10:06:56,816 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=10.17 vs. limit=22.5 2023-11-27 10:07:00,640 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=3847226.6666666665, ans=0.125 2023-11-27 10:07:07,312 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 577100 2023-11-27 10:07:10,276 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=11.90 vs. limit=22.5 2023-11-27 10:07:10,450 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=5.38 vs. limit=12.0 2023-11-27 10:07:12,004 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3847293.3333333335, ans=0.125 2023-11-27 10:07:18,252 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=3847360.0, ans=0.0 2023-11-27 10:07:30,534 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=512, metric=8.29 vs. 
limit=15.0 2023-11-27 10:07:34,049 INFO [train_asr.py:1235] (3/4) Epoch 48, batch 12000, loss[loss=0.06181, simple_loss=0.08076, pruned_loss=0.01034, audio_tagging_loss=0.01109, over 14242.00 frames. ], tot_loss[loss=0.06524, simple_loss=0.08866, pruned_loss=0.01192, audio_tagging_loss=0.008997, over 3053979.39 frames. ], batch size: 55, lr: 1.40e-03, grad_scale: 32.0 2023-11-27 10:07:34,050 INFO [train_asr.py:1258] (3/4) Computing validation loss 2023-11-27 10:07:55,509 INFO [zipformer.py:1877] (3/4) name=encoder.encoders.3.encoder.layers.1.self_attn_weights, attn_weights_entropy = tensor([2.0435, 2.9293, 2.8098, 2.7740, 3.3123, 3.3818, 3.2189, 3.6133], device='cuda:3') 2023-11-27 10:08:06,003 INFO [train_asr.py:1267] (3/4) Epoch 48, validation: loss=0.05797, simple_loss=0.05046, pruned_loss=0.005369, audio_tagging_loss=0.02737, over 4681554.00 frames. 2023-11-27 10:08:06,004 INFO [train_asr.py:1268] (3/4) Maximum memory allocated so far is 24894MB 2023-11-27 10:08:27,684 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=3847626.6666666665, ans=0.125 2023-11-27 10:08:58,285 INFO [train_asr.py:1235] (3/4) Epoch 49, batch 0, loss[loss=0.07208, simple_loss=0.08686, pruned_loss=0.01019, audio_tagging_loss=0.01847, over 16317.00 frames. ], tot_loss[loss=0.07208, simple_loss=0.08686, pruned_loss=0.01019, audio_tagging_loss=0.01847, over 16317.00 frames. ], batch size: 60, lr: 1.38e-03, grad_scale: 32.0 2023-11-27 10:08:58,286 INFO [train_asr.py:1258] (3/4) Computing validation loss 2023-11-27 10:09:12,918 INFO [zipformer.py:1877] (3/4) name=encoder.encoders.0.layers.1.self_attn_weights, attn_weights_entropy = tensor([5.3116, 4.8213, 5.1916, 4.5296], device='cuda:3') 2023-11-27 10:09:21,940 INFO [zipformer.py:1877] (3/4) name=encoder.encoders.2.encoder.layers.0.self_attn_weights, attn_weights_entropy = tensor([4.3359, 4.3142, 4.4834, 4.4758], device='cuda:3') 2023-11-27 10:09:25,731 INFO [zipformer.py:1877] (3/4) name=encoder.encoders.1.encoder.layers.1.self_attn_weights, attn_weights_entropy = tensor([4.8084, 4.9789, 5.0744, 4.9140], device='cuda:3') 2023-11-27 10:09:29,259 INFO [train_asr.py:1267] (3/4) Epoch 49, validation: loss=0.05781, simple_loss=0.05038, pruned_loss=0.005301, audio_tagging_loss=0.02732, over 4681554.00 frames. 2023-11-27 10:09:29,260 INFO [train_asr.py:1268] (3/4) Maximum memory allocated so far is 24894MB 2023-11-27 10:09:29,325 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 577150 2023-11-27 10:09:42,844 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.973e+01 9.407e+01 1.008e+02 1.108e+02 1.423e+02, threshold=2.015e+02, percent-clipped=0.0 2023-11-27 10:09:50,309 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=3847786.6666666665, ans=0.2 2023-11-27 10:10:00,935 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.0.conv_module2.whiten, num_groups=1, num_channels=192, metric=11.56 vs. limit=15.0 2023-11-27 10:10:07,026 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3847853.3333333335, ans=0.125 2023-11-27 10:10:23,728 INFO [train_asr.py:1235] (3/4) Epoch 49, batch 50, loss[loss=0.05472, simple_loss=0.05773, pruned_loss=0.007066, audio_tagging_loss=0.01879, over 16321.00 frames. 
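At each validation pass the zipformer.py:1877 records dump attn_weights_entropy tensors, one entropy value per attention head of the named layer; values near log(seq_len) mean nearly uniform attention, values near zero mean each query fixates on a single key, so a drifting entropy flags a head that is collapsing or diffusing. A sketch of the diagnostic, with shape conventions assumed:

    # Hedged sketch of the per-head attention-entropy diagnostic:
    # average, over query positions, of the Shannon entropy of each
    # head's attention distribution. Shapes and names are assumptions.
    import torch

    def attn_weights_entropy(attn: torch.Tensor) -> torch.Tensor:
        """attn: (num_heads, tgt_len, src_len); rows sum to 1."""
        ent = -(attn * (attn + 1e-20).log()).sum(dim=-1)  # (heads, tgt)
        return ent.mean(dim=-1)                           # one value per head

    attn = torch.softmax(torch.randn(4, 60, 60), dim=-1)
    print(attn_weights_entropy(attn))  # 4 values, like the tensors above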
], tot_loss[loss=0.07301, simple_loss=0.08928, pruned_loss=0.01189, audio_tagging_loss=0.01648, over 691103.02 frames. ], batch size: 63, lr: 1.38e-03, grad_scale: 32.0 2023-11-27 10:10:23,786 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 577200 2023-11-27 10:10:28,994 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3847986.6666666665, ans=0.1 2023-11-27 10:10:30,004 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=3847986.6666666665, ans=0.125 2023-11-27 10:11:01,495 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.0.self_attn2.whiten, num_groups=1, num_channels=192, metric=10.89 vs. limit=22.5 2023-11-27 10:11:16,487 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=3848253.3333333335, ans=0.2 2023-11-27 10:11:19,480 INFO [train_asr.py:1235] (3/4) Epoch 49, batch 100, loss[loss=0.05562, simple_loss=0.06775, pruned_loss=0.009032, audio_tagging_loss=0.01271, over 14596.00 frames. ], tot_loss[loss=0.07255, simple_loss=0.09011, pruned_loss=0.01182, audio_tagging_loss=0.01568, over 1211588.14 frames. ], batch size: 58, lr: 1.38e-03, grad_scale: 32.0 2023-11-27 10:11:19,541 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 577250 2023-11-27 10:11:21,731 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3848320.0, ans=0.125 2023-11-27 10:11:32,622 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=12.90 vs. limit=22.5 2023-11-27 10:11:34,070 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 8.548e+01 9.835e+01 1.039e+02 1.086e+02 1.551e+02, threshold=2.079e+02, percent-clipped=0.0 2023-11-27 10:11:44,790 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=3848453.3333333335, ans=0.125 2023-11-27 10:11:56,210 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=3848520.0, ans=0.0 2023-11-27 10:11:57,187 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=3848520.0, ans=0.125 2023-11-27 10:12:12,220 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=5.28 vs. limit=12.0 2023-11-27 10:12:14,689 INFO [train_asr.py:1235] (3/4) Epoch 49, batch 150, loss[loss=0.08326, simple_loss=0.116, pruned_loss=0.01578, audio_tagging_loss=0.009508, over 15637.00 frames. ], tot_loss[loss=0.07002, simple_loss=0.08853, pruned_loss=0.01169, audio_tagging_loss=0.01407, over 1617376.87 frames. ], batch size: 55, lr: 1.38e-03, grad_scale: 16.0 2023-11-27 10:12:14,753 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 577300 2023-11-27 10:12:15,400 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=14.21 vs. 
limit=22.5 2023-11-27 10:12:20,215 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3848653.3333333335, ans=0.125 2023-11-27 10:12:52,837 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=7.88 vs. limit=15.0 2023-11-27 10:12:53,705 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3848853.3333333335, ans=0.125 2023-11-27 10:12:56,918 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3848853.3333333335, ans=0.1 2023-11-27 10:13:09,314 INFO [train_asr.py:1235] (3/4) Epoch 49, batch 200, loss[loss=0.06219, simple_loss=0.08515, pruned_loss=0.009011, audio_tagging_loss=0.0106, over 16015.00 frames. ], tot_loss[loss=0.068, simple_loss=0.08827, pruned_loss=0.01141, audio_tagging_loss=0.01245, over 1933053.81 frames. ], batch size: 59, lr: 1.38e-03, grad_scale: 16.0 2023-11-27 10:13:09,392 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 577350 2023-11-27 10:13:20,636 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3849053.3333333335, ans=0.125 2023-11-27 10:13:25,070 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.806e+01 9.137e+01 9.838e+01 1.045e+02 1.312e+02, threshold=1.968e+02, percent-clipped=0.0 2023-11-27 10:13:31,072 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3849120.0, ans=0.1 2023-11-27 10:13:38,905 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3849120.0, ans=0.125 2023-11-27 10:14:04,760 INFO [train_asr.py:1235] (3/4) Epoch 49, batch 250, loss[loss=0.07571, simple_loss=0.104, pruned_loss=0.01478, audio_tagging_loss=0.008914, over 16244.00 frames. ], tot_loss[loss=0.06717, simple_loss=0.08833, pruned_loss=0.0116, audio_tagging_loss=0.01141, over 2182775.34 frames. ], batch size: 58, lr: 1.38e-03, grad_scale: 16.0 2023-11-27 10:14:04,825 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 577400 2023-11-27 10:14:05,946 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=3849320.0, ans=0.125 2023-11-27 10:14:13,061 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3849320.0, ans=0.125 2023-11-27 10:14:26,139 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3849453.3333333335, ans=0.1 2023-11-27 10:14:32,425 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.min_positive, batch_count=3849453.3333333335, ans=0.05 2023-11-27 10:15:00,650 INFO [train_asr.py:1235] (3/4) Epoch 49, batch 300, loss[loss=0.07856, simple_loss=0.1107, pruned_loss=0.01558, audio_tagging_loss=0.007627, over 14792.00 frames. ], tot_loss[loss=0.06776, simple_loss=0.0903, pruned_loss=0.01206, audio_tagging_loss=0.01055, over 2371774.92 frames. 
], batch size: 54, lr: 1.38e-03, grad_scale: 16.0 2023-11-27 10:15:00,715 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 577450 2023-11-27 10:15:05,316 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.whiten, num_groups=1, num_channels=512, metric=9.15 vs. limit=12.0 2023-11-27 10:15:15,353 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.597e+01 9.252e+01 9.785e+01 1.052e+02 1.385e+02, threshold=1.957e+02, percent-clipped=0.0 2023-11-27 10:15:35,076 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3849853.3333333335, ans=0.125 2023-11-27 10:15:41,367 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=5.89 vs. limit=10.0 2023-11-27 10:15:55,341 INFO [train_asr.py:1235] (3/4) Epoch 49, batch 350, loss[loss=0.0687, simple_loss=0.09901, pruned_loss=0.01197, audio_tagging_loss=0.007224, over 14795.00 frames. ], tot_loss[loss=0.0662, simple_loss=0.08868, pruned_loss=0.01167, audio_tagging_loss=0.01019, over 2525613.38 frames. ], batch size: 56, lr: 1.38e-03, grad_scale: 16.0 2023-11-27 10:15:55,403 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 577500 2023-11-27 10:16:09,072 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.whiten.whitening_limit, batch_count=3850053.3333333335, ans=12.0 2023-11-27 10:16:21,614 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=3850120.0, ans=0.0 2023-11-27 10:16:27,055 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=3850120.0, ans=0.1 2023-11-27 10:16:40,036 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=5.01 vs. limit=10.0 2023-11-27 10:16:40,239 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.93 vs. limit=6.0 2023-11-27 10:16:50,574 INFO [train_asr.py:1235] (3/4) Epoch 49, batch 400, loss[loss=0.07117, simple_loss=0.104, pruned_loss=0.01307, audio_tagging_loss=0.006122, over 14230.00 frames. ], tot_loss[loss=0.06569, simple_loss=0.08889, pruned_loss=0.01159, audio_tagging_loss=0.009654, over 2638806.62 frames. ], batch size: 56, lr: 1.38e-03, grad_scale: 16.0 2023-11-27 10:16:50,640 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 577550 2023-11-27 10:17:07,272 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.422e+01 8.990e+01 9.603e+01 1.040e+02 1.304e+02, threshold=1.921e+02, percent-clipped=0.0 2023-11-27 10:17:07,440 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=3850386.6666666665, ans=0.125 2023-11-27 10:17:30,641 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=3850520.0, ans=0.125 2023-11-27 10:17:41,609 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3850586.6666666665, ans=0.1 2023-11-27 10:17:46,064 INFO [train_asr.py:1235] (3/4) Epoch 49, batch 450, loss[loss=0.06796, simple_loss=0.09507, pruned_loss=0.01278, audio_tagging_loss=0.007645, over 16614.00 frames. 
], tot_loss[loss=0.06554, simple_loss=0.08889, pruned_loss=0.01171, audio_tagging_loss=0.009388, over 2729923.35 frames. ], batch size: 63, lr: 1.38e-03, grad_scale: 16.0 2023-11-27 10:17:46,133 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 577600 2023-11-27 10:17:48,617 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=3850653.3333333335, ans=0.2 2023-11-27 10:17:55,383 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=8.23 vs. limit=10.0 2023-11-27 10:17:58,122 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3850720.0, ans=0.125 2023-11-27 10:18:03,438 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3850720.0, ans=0.125 2023-11-27 10:18:06,538 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=3850786.6666666665, ans=0.2 2023-11-27 10:18:08,630 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3850786.6666666665, ans=0.125 2023-11-27 10:18:40,539 INFO [train_asr.py:1235] (3/4) Epoch 49, batch 500, loss[loss=0.06455, simple_loss=0.08026, pruned_loss=0.01452, audio_tagging_loss=0.009897, over 14779.00 frames. ], tot_loss[loss=0.06552, simple_loss=0.08905, pruned_loss=0.0118, audio_tagging_loss=0.009188, over 2801435.29 frames. ], batch size: 54, lr: 1.38e-03, grad_scale: 16.0 2023-11-27 10:18:40,603 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 577650 2023-11-27 10:18:41,155 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.45 vs. limit=6.0 2023-11-27 10:18:49,090 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=3850986.6666666665, ans=0.0 2023-11-27 10:18:53,208 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3851053.3333333335, ans=0.0 2023-11-27 10:18:56,633 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.792e+01 9.066e+01 9.728e+01 1.042e+02 1.279e+02, threshold=1.946e+02, percent-clipped=0.0 2023-11-27 10:19:05,928 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=3851120.0, ans=0.0 2023-11-27 10:19:11,219 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=3851120.0, ans=0.125 2023-11-27 10:19:11,621 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=14.11 vs. 
limit=22.5 2023-11-27 10:19:12,107 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=3851120.0, ans=0.0 2023-11-27 10:19:32,200 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3851253.3333333335, ans=0.125 2023-11-27 10:19:35,128 INFO [train_asr.py:1235] (3/4) Epoch 49, batch 550, loss[loss=0.07132, simple_loss=0.1041, pruned_loss=0.01411, audio_tagging_loss=0.005152, over 15399.00 frames. ], tot_loss[loss=0.06515, simple_loss=0.08859, pruned_loss=0.01175, audio_tagging_loss=0.009097, over 2856998.39 frames. ], batch size: 56, lr: 1.38e-03, grad_scale: 16.0 2023-11-27 10:19:35,190 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 577700 2023-11-27 10:19:37,090 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=3851320.0, ans=0.07 2023-11-27 10:19:39,131 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-27 10:20:07,824 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=3.45 vs. limit=15.0 2023-11-27 10:20:09,685 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=3851520.0, ans=0.125 2023-11-27 10:20:12,820 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3851520.0, ans=0.125 2023-11-27 10:20:14,876 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=3851520.0, ans=0.0 2023-11-27 10:20:30,422 INFO [train_asr.py:1235] (3/4) Epoch 49, batch 600, loss[loss=0.05058, simple_loss=0.0699, pruned_loss=0.007075, audio_tagging_loss=0.008558, over 15008.00 frames. ], tot_loss[loss=0.06488, simple_loss=0.08841, pruned_loss=0.01167, audio_tagging_loss=0.009, over 2893563.63 frames. 
], batch size: 57, lr: 1.38e-03, grad_scale: 16.0 2023-11-27 10:20:30,481 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 577750 2023-11-27 10:20:34,840 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3851653.3333333335, ans=0.1 2023-11-27 10:20:37,010 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=3851653.3333333335, ans=0.0 2023-11-27 10:20:42,288 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=3851720.0, ans=0.07 2023-11-27 10:20:42,298 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=3851720.0, ans=0.125 2023-11-27 10:20:43,171 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=3851720.0, ans=0.125 2023-11-27 10:20:47,118 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.817e+01 9.017e+01 9.537e+01 1.031e+02 1.710e+02, threshold=1.907e+02, percent-clipped=0.0 2023-11-27 10:20:48,371 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=3851720.0, ans=0.125 2023-11-27 10:20:57,876 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3851786.6666666665, ans=0.125 2023-11-27 10:21:15,776 INFO [scaling.py:1022] (3/4) Whitening: name=encoder_embed.out_whiten, num_groups=1, num_channels=192, metric=7.00 vs. limit=8.0 2023-11-27 10:21:19,822 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3851920.0, ans=0.125 2023-11-27 10:21:25,964 INFO [train_asr.py:1235] (3/4) Epoch 49, batch 650, loss[loss=0.063, simple_loss=0.08569, pruned_loss=0.01071, audio_tagging_loss=0.009453, over 15401.00 frames. ], tot_loss[loss=0.06503, simple_loss=0.08895, pruned_loss=0.01167, audio_tagging_loss=0.008888, over 2930201.80 frames. 
], batch size: 58, lr: 1.38e-03, grad_scale: 16.0 2023-11-27 10:21:26,029 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 577800 2023-11-27 10:21:29,702 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=3851986.6666666665, ans=0.04949747468305833 2023-11-27 10:21:32,827 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-27 10:21:39,129 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=3852053.3333333335, ans=0.2 2023-11-27 10:21:42,855 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3852053.3333333335, ans=0.125 2023-11-27 10:21:58,373 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=3852186.6666666665, ans=0.015 2023-11-27 10:22:06,333 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=3852186.6666666665, ans=0.09899494936611666 2023-11-27 10:22:15,708 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=3852253.3333333335, ans=0.2 2023-11-27 10:22:20,664 INFO [train_asr.py:1235] (3/4) Epoch 49, batch 700, loss[loss=0.06659, simple_loss=0.08993, pruned_loss=0.01058, audio_tagging_loss=0.01105, over 14874.00 frames. ], tot_loss[loss=0.06504, simple_loss=0.0891, pruned_loss=0.01161, audio_tagging_loss=0.008882, over 2956243.44 frames. ], batch size: 56, lr: 1.38e-03, grad_scale: 16.0 2023-11-27 10:22:20,729 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 577850 2023-11-27 10:22:37,745 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 6.943e+01 9.117e+01 9.739e+01 1.041e+02 1.243e+02, threshold=1.948e+02, percent-clipped=0.0 2023-11-27 10:22:41,198 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3852386.6666666665, ans=0.125 2023-11-27 10:22:46,980 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3852453.3333333335, ans=0.125 2023-11-27 10:22:54,251 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=3852520.0, ans=0.0 2023-11-27 10:23:16,082 INFO [train_asr.py:1235] (3/4) Epoch 49, batch 750, loss[loss=0.07839, simple_loss=0.1156, pruned_loss=0.0146, audio_tagging_loss=0.005982, over 15890.00 frames. ], tot_loss[loss=0.06538, simple_loss=0.08936, pruned_loss=0.01178, audio_tagging_loss=0.008924, over 2982464.72 frames. ], batch size: 58, lr: 1.38e-03, grad_scale: 16.0 2023-11-27 10:23:16,149 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 577900 2023-11-27 10:23:23,539 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.1.self_attn2.whiten, num_groups=1, num_channels=192, metric=8.99 vs. 
limit=22.5 2023-11-27 10:23:31,771 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3852720.0, ans=0.125 2023-11-27 10:23:50,155 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=3852853.3333333335, ans=0.125 2023-11-27 10:24:11,160 INFO [train_asr.py:1235] (3/4) Epoch 49, batch 800, loss[loss=0.06049, simple_loss=0.08909, pruned_loss=0.006497, audio_tagging_loss=0.009451, over 15613.00 frames. ], tot_loss[loss=0.06564, simple_loss=0.08977, pruned_loss=0.01184, audio_tagging_loss=0.008922, over 2998242.17 frames. ], batch size: 59, lr: 1.38e-03, grad_scale: 32.0 2023-11-27 10:24:11,222 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 577950 2023-11-27 10:24:26,904 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.508e+01 9.085e+01 9.807e+01 1.032e+02 1.313e+02, threshold=1.961e+02, percent-clipped=0.0 2023-11-27 10:24:28,196 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=3853053.3333333335, ans=0.0 2023-11-27 10:24:30,735 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=3853053.3333333335, ans=0.2 2023-11-27 10:24:39,653 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.min_positive, batch_count=3853120.0, ans=0.05 2023-11-27 10:25:05,440 INFO [train_asr.py:1235] (3/4) Epoch 49, batch 850, loss[loss=0.06067, simple_loss=0.07256, pruned_loss=0.0146, audio_tagging_loss=0.009788, over 14919.00 frames. ], tot_loss[loss=0.06514, simple_loss=0.08866, pruned_loss=0.01183, audio_tagging_loss=0.008982, over 3007971.87 frames. ], batch size: 59, lr: 1.38e-03, grad_scale: 32.0 2023-11-27 10:25:05,505 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 578000 2023-11-27 10:25:14,708 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=3853320.0, ans=0.2 2023-11-27 10:25:17,965 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=3853386.6666666665, ans=0.125 2023-11-27 10:25:33,069 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=2.75 vs. limit=15.0 2023-11-27 10:26:01,084 INFO [train_asr.py:1235] (3/4) Epoch 49, batch 900, loss[loss=0.0717, simple_loss=0.09725, pruned_loss=0.01438, audio_tagging_loss=0.008695, over 15177.00 frames. ], tot_loss[loss=0.06539, simple_loss=0.08915, pruned_loss=0.01181, audio_tagging_loss=0.00901, over 3025191.57 frames. ], batch size: 57, lr: 1.38e-03, grad_scale: 16.0 2023-11-27 10:26:01,148 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 578050 2023-11-27 10:26:07,086 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=3853653.3333333335, ans=0.2 2023-11-27 10:26:16,287 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=18.92 vs. 
limit=22.5 2023-11-27 10:26:19,315 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.811e+01 9.242e+01 9.846e+01 1.086e+02 1.686e+02, threshold=1.969e+02, percent-clipped=0.0 2023-11-27 10:26:42,710 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=8.59 vs. limit=10.0 2023-11-27 10:26:44,838 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3853920.0, ans=0.1 2023-11-27 10:26:49,783 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=5.33 vs. limit=10.0 2023-11-27 10:26:56,626 INFO [train_asr.py:1235] (3/4) Epoch 49, batch 950, loss[loss=0.07247, simple_loss=0.08826, pruned_loss=0.01983, audio_tagging_loss=0.008508, over 14293.00 frames. ], tot_loss[loss=0.06484, simple_loss=0.08812, pruned_loss=0.01182, audio_tagging_loss=0.008961, over 3026431.84 frames. ], batch size: 54, lr: 1.38e-03, grad_scale: 16.0 2023-11-27 10:26:56,694 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 578100 2023-11-27 10:26:59,552 INFO [scaling.py:1022] (3/4) Whitening: name=encoder_embed.convnext.out_whiten, num_groups=1, num_channels=128, metric=4.47 vs. limit=5.0 2023-11-27 10:27:13,087 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=3854053.3333333335, ans=0.2 2023-11-27 10:27:15,349 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-27 10:27:17,301 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3854120.0, ans=0.125 2023-11-27 10:27:19,897 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=3854120.0, ans=0.125 2023-11-27 10:27:24,122 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3854120.0, ans=0.125 2023-11-27 10:27:29,811 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-27 10:27:44,606 INFO [scaling.py:1022] (3/4) Whitening: name=encoder_embed.convnext.out_whiten, num_groups=1, num_channels=128, metric=4.46 vs. limit=5.0 2023-11-27 10:27:51,636 INFO [train_asr.py:1235] (3/4) Epoch 49, batch 1000, loss[loss=0.05416, simple_loss=0.06867, pruned_loss=0.01189, audio_tagging_loss=0.007938, over 15217.00 frames. ], tot_loss[loss=0.06513, simple_loss=0.08908, pruned_loss=0.01188, audio_tagging_loss=0.008708, over 3035680.02 frames. ], batch size: 58, lr: 1.38e-03, grad_scale: 16.0 2023-11-27 10:27:51,706 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 578150 2023-11-27 10:27:55,151 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=3854320.0, ans=0.0 2023-11-27 10:28:09,320 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.477e+01 9.145e+01 9.757e+01 1.033e+02 1.378e+02, threshold=1.951e+02, percent-clipped=0.0 2023-11-27 10:28:15,064 WARNING [train_asr.py:1481] (3/4) Exclude cut with ID unbalanced/5Y6u9AlD9S0_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. 
Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-27 10:28:18,671 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=4.29 vs. limit=15.0 2023-11-27 10:28:38,622 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3854586.6666666665, ans=0.125 2023-11-27 10:28:46,197 INFO [train_asr.py:1235] (3/4) Epoch 49, batch 1050, loss[loss=0.06041, simple_loss=0.08405, pruned_loss=0.009882, audio_tagging_loss=0.008502, over 15615.00 frames. ], tot_loss[loss=0.06484, simple_loss=0.08904, pruned_loss=0.01181, audio_tagging_loss=0.008506, over 3042863.04 frames. ], batch size: 60, lr: 1.38e-03, grad_scale: 16.0 2023-11-27 10:28:46,257 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 578200 2023-11-27 10:28:52,175 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3854653.3333333335, ans=0.1 2023-11-27 10:28:54,996 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3854653.3333333335, ans=0.1 2023-11-27 10:29:00,134 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3854720.0, ans=0.125 2023-11-27 10:29:06,783 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.out_whiten.whitening_limit, batch_count=3854720.0, ans=15.0 2023-11-27 10:29:06,873 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=6.14 vs. limit=12.0 2023-11-27 10:29:08,467 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3854786.6666666665, ans=0.1 2023-11-27 10:29:24,593 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=5.88 vs. limit=15.0 2023-11-27 10:29:28,426 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=3854853.3333333335, ans=0.125 2023-11-27 10:29:41,679 INFO [train_asr.py:1235] (3/4) Epoch 49, batch 1100, loss[loss=0.06203, simple_loss=0.09024, pruned_loss=0.009378, audio_tagging_loss=0.007529, over 15820.00 frames. ], tot_loss[loss=0.06399, simple_loss=0.08774, pruned_loss=0.01159, audio_tagging_loss=0.00853, over 3043517.60 frames. ], batch size: 57, lr: 1.38e-03, grad_scale: 16.0 2023-11-27 10:29:41,742 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 578250 2023-11-27 10:29:43,855 WARNING [train_asr.py:1481] (3/4) Exclude cut with ID unbalanced/AWHnJAqurec_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. 
Number of tokens: 24 2023-11-27 10:29:58,449 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=8.23 vs. limit=15.0 2023-11-27 10:29:58,964 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.766e+01 8.982e+01 9.681e+01 1.049e+02 1.414e+02, threshold=1.936e+02, percent-clipped=0.0 2023-11-27 10:30:30,228 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3855253.3333333335, ans=0.125 2023-11-27 10:30:36,811 INFO [train_asr.py:1235] (3/4) Epoch 49, batch 1150, loss[loss=0.06189, simple_loss=0.08343, pruned_loss=0.01292, audio_tagging_loss=0.00725, over 14929.00 frames. ], tot_loss[loss=0.06398, simple_loss=0.08772, pruned_loss=0.01152, audio_tagging_loss=0.008598, over 3043657.19 frames. ], batch size: 58, lr: 1.38e-03, grad_scale: 16.0 2023-11-27 10:30:36,885 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 578300 2023-11-27 10:30:36,964 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3855320.0, ans=0.125 2023-11-27 10:31:14,994 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.max_abs, batch_count=3855520.0, ans=10.0 2023-11-27 10:31:31,609 INFO [train_asr.py:1235] (3/4) Epoch 49, batch 1200, loss[loss=0.07399, simple_loss=0.1107, pruned_loss=0.01343, audio_tagging_loss=0.005227, over 16493.00 frames. ], tot_loss[loss=0.06395, simple_loss=0.08763, pruned_loss=0.01153, audio_tagging_loss=0.008607, over 3036662.30 frames. ], batch size: 59, lr: 1.38e-03, grad_scale: 32.0 2023-11-27 10:31:31,671 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 578350 2023-11-27 10:31:37,650 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=3855653.3333333335, ans=0.2 2023-11-27 10:31:43,326 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.min_positive, batch_count=3855720.0, ans=0.05 2023-11-27 10:31:48,017 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=12.66 vs. limit=22.5 2023-11-27 10:31:49,421 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.244e+01 9.092e+01 9.675e+01 1.031e+02 1.166e+02, threshold=1.935e+02, percent-clipped=0.0 2023-11-27 10:31:54,852 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=3855786.6666666665, ans=0.0 2023-11-27 10:32:03,705 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3855853.3333333335, ans=0.1 2023-11-27 10:32:27,225 INFO [train_asr.py:1235] (3/4) Epoch 49, batch 1250, loss[loss=0.05732, simple_loss=0.0719, pruned_loss=0.01203, audio_tagging_loss=0.009341, over 14100.00 frames. ], tot_loss[loss=0.06396, simple_loss=0.08781, pruned_loss=0.01155, audio_tagging_loss=0.008509, over 3038913.74 frames. 
], batch size: 53, lr: 1.38e-03, grad_scale: 32.0 2023-11-27 10:32:27,285 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 578400 2023-11-27 10:32:29,728 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=3855986.6666666665, ans=0.125 2023-11-27 10:32:48,270 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-27 10:32:53,286 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=4.98 vs. limit=6.0 2023-11-27 10:33:02,645 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3856186.6666666665, ans=0.125 2023-11-27 10:33:02,730 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=3856186.6666666665, ans=0.0 2023-11-27 10:33:05,836 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3856186.6666666665, ans=0.0 2023-11-27 10:33:09,481 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.min_positive, batch_count=3856186.6666666665, ans=0.05 2023-11-27 10:33:17,930 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=3856253.3333333335, ans=0.125 2023-11-27 10:33:20,006 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3856253.3333333335, ans=0.125 2023-11-27 10:33:21,802 INFO [train_asr.py:1235] (3/4) Epoch 49, batch 1300, loss[loss=0.0641, simple_loss=0.09221, pruned_loss=0.01137, audio_tagging_loss=0.006634, over 15920.00 frames. ], tot_loss[loss=0.06403, simple_loss=0.08788, pruned_loss=0.01156, audio_tagging_loss=0.008529, over 3037517.04 frames. ], batch size: 60, lr: 1.38e-03, grad_scale: 32.0 2023-11-27 10:33:21,867 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 578450 2023-11-27 10:33:26,195 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=3856320.0, ans=0.1 2023-11-27 10:33:39,549 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.313e+01 9.062e+01 9.714e+01 1.030e+02 1.237e+02, threshold=1.943e+02, percent-clipped=0.0 2023-11-27 10:33:39,886 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3856386.6666666665, ans=0.125 2023-11-27 10:33:41,897 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3856386.6666666665, ans=0.125 2023-11-27 10:33:41,933 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=3856386.6666666665, ans=0.0 2023-11-27 10:34:02,687 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3856520.0, ans=0.0 2023-11-27 10:34:09,484 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.27 vs. 
limit=6.0 2023-11-27 10:34:17,145 INFO [train_asr.py:1235] (3/4) Epoch 49, batch 1350, loss[loss=0.05192, simple_loss=0.0671, pruned_loss=0.01089, audio_tagging_loss=0.007473, over 14064.00 frames. ], tot_loss[loss=0.06379, simple_loss=0.08762, pruned_loss=0.01148, audio_tagging_loss=0.008498, over 3043322.57 frames. ], batch size: 57, lr: 1.38e-03, grad_scale: 32.0 2023-11-27 10:34:17,208 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 578500 2023-11-27 10:34:23,023 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=10.70 vs. limit=15.0 2023-11-27 10:34:37,356 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-27 10:34:43,493 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=3856786.6666666665, ans=0.125 2023-11-27 10:34:49,835 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3856853.3333333335, ans=0.125 2023-11-27 10:34:55,389 WARNING [train_asr.py:1481] (3/4) Exclude cut with ID unbalanced/XdmbboqRBmQ_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-27 10:34:57,620 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=3856853.3333333335, ans=0.125 2023-11-27 10:35:01,154 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=3856920.0, ans=0.125 2023-11-27 10:35:12,563 INFO [train_asr.py:1235] (3/4) Epoch 49, batch 1400, loss[loss=0.05958, simple_loss=0.07649, pruned_loss=0.01299, audio_tagging_loss=0.008344, over 15606.00 frames. ], tot_loss[loss=0.0642, simple_loss=0.08801, pruned_loss=0.01166, audio_tagging_loss=0.008531, over 3044400.22 frames. ], batch size: 61, lr: 1.38e-03, grad_scale: 16.0 2023-11-27 10:35:12,624 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 578550 2023-11-27 10:35:30,482 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.412e+01 9.274e+01 9.843e+01 1.071e+02 1.343e+02, threshold=1.969e+02, percent-clipped=0.0 2023-11-27 10:35:31,785 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=3857053.3333333335, ans=0.125 2023-11-27 10:35:44,473 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=3857186.6666666665, ans=0.2 2023-11-27 10:35:58,149 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=3857253.3333333335, ans=0.0 2023-11-27 10:36:07,263 INFO [train_asr.py:1235] (3/4) Epoch 49, batch 1450, loss[loss=0.07269, simple_loss=0.09937, pruned_loss=0.01517, audio_tagging_loss=0.007835, over 15978.00 frames. ], tot_loss[loss=0.0646, simple_loss=0.08823, pruned_loss=0.01183, audio_tagging_loss=0.008648, over 3047748.73 frames. 
], batch size: 57, lr: 1.38e-03, grad_scale: 16.0 2023-11-27 10:36:07,328 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 578600 2023-11-27 10:36:11,464 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=8.41 vs. limit=15.0 2023-11-27 10:36:24,447 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3857386.6666666665, ans=0.1 2023-11-27 10:36:37,569 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=8.10 vs. limit=15.0 2023-11-27 10:36:48,590 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=3857520.0, ans=0.0 2023-11-27 10:37:00,149 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3857586.6666666665, ans=0.125 2023-11-27 10:37:02,052 INFO [train_asr.py:1235] (3/4) Epoch 49, batch 1500, loss[loss=0.0646, simple_loss=0.08957, pruned_loss=0.009248, audio_tagging_loss=0.01057, over 15118.00 frames. ], tot_loss[loss=0.06491, simple_loss=0.08881, pruned_loss=0.01187, audio_tagging_loss=0.008639, over 3045636.27 frames. ], batch size: 54, lr: 1.38e-03, grad_scale: 16.0 2023-11-27 10:37:02,122 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 578650 2023-11-27 10:37:04,991 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=3857653.3333333335, ans=0.2 2023-11-27 10:37:13,786 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.min_positive, batch_count=3857720.0, ans=0.05 2023-11-27 10:37:21,435 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.975e+01 9.188e+01 9.715e+01 1.038e+02 1.214e+02, threshold=1.943e+02, percent-clipped=0.0 2023-11-27 10:37:52,336 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=4.68 vs. limit=10.0 2023-11-27 10:37:57,649 INFO [train_asr.py:1235] (3/4) Epoch 49, batch 1550, loss[loss=0.07051, simple_loss=0.09441, pruned_loss=0.01453, audio_tagging_loss=0.008772, over 15733.00 frames. ], tot_loss[loss=0.06468, simple_loss=0.08829, pruned_loss=0.01171, audio_tagging_loss=0.008827, over 3049818.38 frames. ], batch size: 59, lr: 1.38e-03, grad_scale: 16.0 2023-11-27 10:37:57,711 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 578700 2023-11-27 10:38:05,278 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.whiten1.whitening_limit, batch_count=3857986.6666666665, ans=10.0 2023-11-27 10:38:13,191 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=3858053.3333333335, ans=0.0 2023-11-27 10:38:32,596 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.69 vs. limit=6.0 2023-11-27 10:38:39,109 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=10.15 vs. 
limit=15.0 2023-11-27 10:38:52,550 INFO [train_asr.py:1235] (3/4) Epoch 49, batch 1600, loss[loss=0.07323, simple_loss=0.1021, pruned_loss=0.01389, audio_tagging_loss=0.008303, over 15721.00 frames. ], tot_loss[loss=0.06493, simple_loss=0.08882, pruned_loss=0.01172, audio_tagging_loss=0.008805, over 3047396.16 frames. ], batch size: 56, lr: 1.38e-03, grad_scale: 32.0 2023-11-27 10:38:52,612 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 578750 2023-11-27 10:39:03,270 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=3858386.6666666665, ans=0.07 2023-11-27 10:39:10,773 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.536e+01 9.050e+01 9.679e+01 1.052e+02 1.346e+02, threshold=1.936e+02, percent-clipped=0.0 2023-11-27 10:39:15,812 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3858453.3333333335, ans=0.1 2023-11-27 10:39:17,757 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=3858453.3333333335, ans=0.0 2023-11-27 10:39:23,530 INFO [scaling.py:1022] (3/4) Whitening: name=encoder_embed.convnext.out_whiten, num_groups=1, num_channels=128, metric=4.07 vs. limit=5.0 2023-11-27 10:39:27,030 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=3858520.0, ans=0.125 2023-11-27 10:39:46,737 INFO [train_asr.py:1235] (3/4) Epoch 49, batch 1650, loss[loss=0.07632, simple_loss=0.105, pruned_loss=0.01602, audio_tagging_loss=0.0078, over 15610.00 frames. ], tot_loss[loss=0.06442, simple_loss=0.08814, pruned_loss=0.0115, audio_tagging_loss=0.008849, over 3055664.65 frames. ], batch size: 58, lr: 1.38e-03, grad_scale: 32.0 2023-11-27 10:39:46,804 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 578800 2023-11-27 10:39:51,466 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.1.self_attn2.whiten, num_groups=1, num_channels=192, metric=9.19 vs. limit=22.5 2023-11-27 10:40:05,433 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.46 vs. limit=6.0 2023-11-27 10:40:19,856 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3858853.3333333335, ans=0.125 2023-11-27 10:40:38,140 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=3858920.0, ans=0.0 2023-11-27 10:40:43,182 INFO [train_asr.py:1235] (3/4) Epoch 49, batch 1700, loss[loss=0.04821, simple_loss=0.05912, pruned_loss=0.006452, audio_tagging_loss=0.01219, over 14723.00 frames. ], tot_loss[loss=0.06408, simple_loss=0.08761, pruned_loss=0.01137, audio_tagging_loss=0.008913, over 3047150.20 frames. 
], batch size: 57, lr: 1.38e-03, grad_scale: 16.0 2023-11-27 10:40:43,247 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 578850 2023-11-27 10:40:59,597 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=3859053.3333333335, ans=0.125 2023-11-27 10:41:02,524 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.708e+01 9.167e+01 9.822e+01 1.054e+02 1.344e+02, threshold=1.964e+02, percent-clipped=0.0 2023-11-27 10:41:06,970 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=3859120.0, ans=0.125 2023-11-27 10:41:17,980 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3859186.6666666665, ans=0.1 2023-11-27 10:41:38,518 INFO [train_asr.py:1235] (3/4) Epoch 49, batch 1750, loss[loss=0.07103, simple_loss=0.1079, pruned_loss=0.01252, audio_tagging_loss=0.004575, over 15408.00 frames. ], tot_loss[loss=0.06355, simple_loss=0.08685, pruned_loss=0.01123, audio_tagging_loss=0.008893, over 3049616.40 frames. ], batch size: 57, lr: 1.38e-03, grad_scale: 16.0 2023-11-27 10:41:38,579 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 578900 2023-11-27 10:41:50,720 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=10.96 vs. limit=15.0 2023-11-27 10:41:53,470 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=3859386.6666666665, ans=0.2 2023-11-27 10:42:17,119 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=3859520.0, ans=0.125 2023-11-27 10:42:23,817 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=512, metric=3.69 vs. limit=15.0 2023-11-27 10:42:32,882 INFO [train_asr.py:1235] (3/4) Epoch 49, batch 1800, loss[loss=0.05372, simple_loss=0.07917, pruned_loss=0.007486, audio_tagging_loss=0.006647, over 15563.00 frames. ], tot_loss[loss=0.06368, simple_loss=0.08734, pruned_loss=0.0113, audio_tagging_loss=0.008716, over 3049609.08 frames. ], batch size: 57, lr: 1.38e-03, grad_scale: 16.0 2023-11-27 10:42:32,947 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 578950 2023-11-27 10:42:36,875 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=8.26 vs. limit=10.0 2023-11-27 10:42:46,163 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=3859720.0, ans=0.0 2023-11-27 10:42:53,803 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.101e+01 8.995e+01 9.639e+01 1.040e+02 1.222e+02, threshold=1.928e+02, percent-clipped=0.0 2023-11-27 10:42:54,365 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=7.04 vs. 
limit=10.0 2023-11-27 10:43:00,439 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=3859786.6666666665, ans=0.2 2023-11-27 10:43:01,504 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3859786.6666666665, ans=0.125 2023-11-27 10:43:10,291 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.61 vs. limit=6.0 2023-11-27 10:43:15,193 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3859853.3333333335, ans=0.125 2023-11-27 10:43:25,185 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=3859920.0, ans=0.0 2023-11-27 10:43:28,236 INFO [train_asr.py:1235] (3/4) Epoch 49, batch 1850, loss[loss=0.06136, simple_loss=0.07553, pruned_loss=0.009957, audio_tagging_loss=0.01364, over 14459.00 frames. ], tot_loss[loss=0.06415, simple_loss=0.08808, pruned_loss=0.0115, audio_tagging_loss=0.008611, over 3051911.10 frames. ], batch size: 55, lr: 1.38e-03, grad_scale: 16.0 2023-11-27 10:43:28,295 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 579000 2023-11-27 10:43:42,533 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3860053.3333333335, ans=0.125 2023-11-27 10:43:45,067 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=7.04 vs. limit=12.0 2023-11-27 10:43:47,818 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3860053.3333333335, ans=0.1 2023-11-27 10:43:53,069 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=3860120.0, ans=0.125 2023-11-27 10:44:23,733 INFO [train_asr.py:1235] (3/4) Epoch 49, batch 1900, loss[loss=0.0721, simple_loss=0.09295, pruned_loss=0.01931, audio_tagging_loss=0.00632, over 14508.00 frames. ], tot_loss[loss=0.0634, simple_loss=0.08681, pruned_loss=0.01142, audio_tagging_loss=0.008575, over 3048458.05 frames. 
], batch size: 56, lr: 1.38e-03, grad_scale: 8.0 2023-11-27 10:44:23,796 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 579050 2023-11-27 10:44:24,991 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3860320.0, ans=0.0 2023-11-27 10:44:32,509 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3860320.0, ans=0.125 2023-11-27 10:44:44,343 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.344e+01 9.131e+01 9.734e+01 1.046e+02 1.295e+02, threshold=1.947e+02, percent-clipped=0.0 2023-11-27 10:44:45,825 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3860453.3333333335, ans=0.0 2023-11-27 10:44:54,634 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3860453.3333333335, ans=0.0 2023-11-27 10:45:09,388 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=3860586.6666666665, ans=0.125 2023-11-27 10:45:18,709 INFO [train_asr.py:1235] (3/4) Epoch 49, batch 1950, loss[loss=0.06873, simple_loss=0.0928, pruned_loss=0.01173, audio_tagging_loss=0.01061, over 15853.00 frames. ], tot_loss[loss=0.06342, simple_loss=0.08648, pruned_loss=0.01155, audio_tagging_loss=0.008635, over 3049009.90 frames. ], batch size: 59, lr: 1.38e-03, grad_scale: 8.0 2023-11-27 10:45:18,775 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 579100 2023-11-27 10:45:43,106 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=3860786.6666666665, ans=0.125 2023-11-27 10:45:49,300 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3860786.6666666665, ans=0.125 2023-11-27 10:45:50,435 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3860786.6666666665, ans=0.125 2023-11-27 10:45:57,642 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-27 10:46:13,736 INFO [train_asr.py:1235] (3/4) Epoch 49, batch 2000, loss[loss=0.05496, simple_loss=0.07812, pruned_loss=0.006831, audio_tagging_loss=0.009069, over 13989.00 frames. ], tot_loss[loss=0.06368, simple_loss=0.08685, pruned_loss=0.01156, audio_tagging_loss=0.008698, over 3043701.71 frames. ], batch size: 55, lr: 1.38e-03, grad_scale: 16.0 2023-11-27 10:46:13,802 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 579150 2023-11-27 10:46:35,550 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 8.005e+01 8.839e+01 9.475e+01 1.022e+02 1.680e+02, threshold=1.895e+02, percent-clipped=0.0 2023-11-27 10:46:56,638 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=13.08 vs. 
limit=15.0 2023-11-27 10:46:59,364 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.min_positive, batch_count=3861253.3333333335, ans=0.05 2023-11-27 10:47:06,749 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=3861253.3333333335, ans=0.0 2023-11-27 10:47:10,264 INFO [train_asr.py:1235] (3/4) Epoch 49, batch 2050, loss[loss=0.07094, simple_loss=0.09836, pruned_loss=0.01423, audio_tagging_loss=0.007529, over 15310.00 frames. ], tot_loss[loss=0.06392, simple_loss=0.08762, pruned_loss=0.01156, audio_tagging_loss=0.008552, over 3050282.81 frames. ], batch size: 57, lr: 1.38e-03, grad_scale: 16.0 2023-11-27 10:47:10,325 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 579200 2023-11-27 10:47:20,286 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=5.92 vs. limit=12.0 2023-11-27 10:47:22,476 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=7.78 vs. limit=15.0 2023-11-27 10:47:23,306 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3861386.6666666665, ans=0.125 2023-11-27 10:47:30,687 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3861453.3333333335, ans=0.125 2023-11-27 10:47:46,564 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.max_positive, batch_count=3861520.0, ans=0.95 2023-11-27 10:47:50,999 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=4.68 vs. limit=6.0 2023-11-27 10:47:56,817 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=3861586.6666666665, ans=0.0 2023-11-27 10:48:07,634 INFO [train_asr.py:1235] (3/4) Epoch 49, batch 2100, loss[loss=0.08937, simple_loss=0.1291, pruned_loss=0.01701, audio_tagging_loss=0.007793, over 15963.00 frames. ], tot_loss[loss=0.06426, simple_loss=0.08829, pruned_loss=0.01159, audio_tagging_loss=0.008522, over 3053269.52 frames. ], batch size: 56, lr: 1.38e-03, grad_scale: 16.0 2023-11-27 10:48:07,711 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 579250 2023-11-27 10:48:23,116 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3861720.0, ans=0.125 2023-11-27 10:48:25,299 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3861720.0, ans=0.1 2023-11-27 10:48:28,804 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.817e+01 8.950e+01 9.629e+01 1.055e+02 1.441e+02, threshold=1.926e+02, percent-clipped=0.0 2023-11-27 10:48:29,102 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=3861786.6666666665, ans=0.2 2023-11-27 10:48:45,218 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=10.48 vs. 
limit=22.5 2023-11-27 10:48:46,879 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=3861853.3333333335, ans=0.0 2023-11-27 10:48:48,176 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.41 vs. limit=6.0 2023-11-27 10:48:53,388 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3861920.0, ans=0.1 2023-11-27 10:48:56,515 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=3861920.0, ans=0.125 2023-11-27 10:49:00,197 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=3861920.0, ans=0.2 2023-11-27 10:49:03,227 INFO [train_asr.py:1235] (3/4) Epoch 49, batch 2150, loss[loss=0.06685, simple_loss=0.09741, pruned_loss=0.01167, audio_tagging_loss=0.00648, over 15356.00 frames. ], tot_loss[loss=0.06443, simple_loss=0.08848, pruned_loss=0.01165, audio_tagging_loss=0.008539, over 3048726.19 frames. ], batch size: 57, lr: 1.38e-03, grad_scale: 16.0 2023-11-27 10:49:03,299 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 579300 2023-11-27 10:49:14,114 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=3862053.3333333335, ans=0.0 2023-11-27 10:49:30,435 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=3862120.0, ans=0.0 2023-11-27 10:49:32,533 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3862120.0, ans=0.1 2023-11-27 10:49:35,560 WARNING [train_asr.py:1481] (3/4) Exclude cut with ID unbalanced/XkQ8YVd8u38_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-27 10:49:50,436 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=18.44 vs. limit=22.5 2023-11-27 10:49:55,173 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=3862253.3333333335, ans=0.09899494936611666 2023-11-27 10:49:59,858 INFO [train_asr.py:1235] (3/4) Epoch 49, batch 2200, loss[loss=0.07278, simple_loss=0.1024, pruned_loss=0.01378, audio_tagging_loss=0.007787, over 15003.00 frames. ], tot_loss[loss=0.06401, simple_loss=0.08773, pruned_loss=0.01154, audio_tagging_loss=0.0086, over 3041522.43 frames. ], batch size: 55, lr: 1.38e-03, grad_scale: 16.0 2023-11-27 10:49:59,978 INFO [model.py:807] (3/4) Freeze_encoder: False; Current batch idx: 579350
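The per-batch loss entries above decompose into three components. The printed totals are consistent with a fixed weighted sum, loss = 0.5 * simple_loss + pruned_loss + 1.0 * audio_tagging_loss; the snippet below is a quick arithmetic check against the Epoch 49, batch 1600 entry, with the weights inferred from the numbers in this excerpt rather than read out of the training script.

```python
# Sanity check of the loss decomposition printed in the log above.
# Values copied from the "Epoch 49, batch 1600" tot_loss entry.
simple_loss, pruned_loss, audio_tagging_loss = 0.08882, 0.01172, 0.008805

# Weights inferred from this excerpt (0.5 on the simple loss, 1.0 on the
# pruned and audio-tagging losses); an assumption, not quoted config.
loss = 0.5 * simple_loss + 1.0 * pruned_loss + 1.0 * audio_tagging_loss
print(round(loss, 4))  # 0.0649, matching the logged loss=0.06493
```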
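The repeated WARNING records ("Exclude cut with ID unbalanced/... Number of frames (after subsampling): 23. ... Number of tokens: 24") reflect a length filter: a cut is dropped when the encoder emits fewer output frames than there are BPE tokens, since the transducer loss cannot align such a pair. Below is a minimal sketch of that filter, not the recipe's exact code; `cut` is assumed to be a lhotse Cut and `sp` a SentencePiece model. The subsampling formula reproduces the logged 100 -> 23 frame count.

```python
def frames_after_subsampling(num_frames: int) -> int:
    # Two conv subsampling stages with an overall factor of 4;
    # reproduces the log: 100 input frames -> 23 output frames.
    return ((num_frames - 7) // 2 + 1) // 2


def keep_cut(cut, sp) -> bool:
    """Return False for cuts that the transducer loss cannot align
    (fewer encoder output frames than BPE tokens)."""
    T = frames_after_subsampling(cut.num_frames)
    tokens = sp.encode(cut.supervisions[0].text, out_type=str)
    if T < len(tokens):
        # e.g. the excluded 1-second AudioSet cuts above: T=23 frames
        # vs. 24 tokens of the dummy placeholder transcript.
        return False
    return True
```

This explains why only the 1-second AudioSet cuts with the dummy transcript trigger the warning: 23 frames against 24 tokens fails the check, while real speech cuts are long enough to pass.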
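The optim.py lines ("Clipping_scale=2.0, grad-norm quartiles ..., threshold=..., percent-clipped=0.0") report adaptive gradient clipping: the five printed values are the min/25%/median/75%/max of recent gradient norms, and the threshold tracks clipping_scale times the median (in the batch 1600 entry, 2.0 x 9.679e+01 = 1.936e+02). The class below is a hedged sketch of that mechanism with an assumed history window; it is not the actual ScaledAdam internals.

```python
from collections import deque

import torch


class QuartileClipper:
    """Clip gradients against a threshold derived from the running
    median of recent gradient norms (sketch, assumed window size)."""

    def __init__(self, clipping_scale: float = 2.0, window: int = 200):
        self.clipping_scale = clipping_scale
        self.norms = deque(maxlen=window)

    def step(self, params) -> None:
        grads = [p.grad.norm() for p in params if p.grad is not None]
        norm = torch.norm(torch.stack(grads)).item()
        self.norms.append(norm)

        hist = sorted(self.norms)
        n = len(hist)
        # min / 25% / median / 75% / max, as printed in the log.
        quartiles = [hist[int(i * (n - 1) / 4)] for i in range(5)]

        threshold = self.clipping_scale * quartiles[2]
        if norm > threshold:
            scale = threshold / norm
            for p in params:
                if p.grad is not None:
                    p.grad.mul_(scale)
```

With percent-clipped=0.0 throughout this excerpt, the observed norms stay below twice their median, so the clipping never actually fires here.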
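Finally, the many scaling.py lines ("ScheduledFloat: name=..., batch_count=..., ans=...") log hyperparameters such as dropout probabilities and skip rates whose value is a function of the batch count, typically piecewise-linear between breakpoints and constant beyond them. The helper below is an illustrative sketch of that evaluation; the breakpoints shown are assumptions, as each module in the recipe carries its own schedule.

```python
def scheduled_float(batch_count: float, schedule) -> float:
    """Evaluate a piecewise-linear schedule (sketch).

    schedule: sorted list of (batch_count, value) breakpoints.
    Values are clamped at both ends and linearly interpolated between.
    """
    if batch_count <= schedule[0][0]:
        return schedule[0][1]
    if batch_count >= schedule[-1][0]:
        return schedule[-1][1]
    for (x0, y0), (x1, y1) in zip(schedule, schedule[1:]):
        if x0 <= batch_count <= x1:
            return y0 + (y1 - y0) * (batch_count - x0) / (x1 - x0)


# Illustrative: a dropout decaying from 0.3 to 0.1 over the first 20k
# batches; far past the last breakpoint it stays at 0.1, which is why
# entries like "ans=0.1" are constant by batch_count ~3.85M above.
print(scheduled_float(3848653.0, [(0.0, 0.3), (20000.0, 0.1)]))  # 0.1
```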