2023-11-19 16:28:29,701 INFO [train_asr.py:1330] (1/4) Training started
2023-11-19 16:28:29,702 INFO [train_asr.py:1340] (1/4) Device: cuda:1
2023-11-19 16:28:29,705 INFO [train_asr.py:1352] (1/4) {'best_train_loss': inf, 'best_valid_loss': inf, 'best_train_epoch': -1, 'best_valid_epoch': -1, 'batch_idx_train': 0, 'log_interval': 50, 'reset_interval': 200, 'valid_interval': 3000, 'feature_dim': 80, 'subsampling_factor': 4, 'warm_step': 2000, 'env_info': {'k2-version': '1.24.3', 'k2-build-type': 'Release', 'k2-with-cuda': True, 'k2-git-sha1': '2b2ac14b326d61d79d04e53fbd69b1ff6d630411', 'k2-git-date': 'Thu Aug 24 05:58:26 2023', 'lhotse-version': '1.16.0', 'torch-version': '2.0.1+cu117', 'torch-cuda-available': True, 'torch-cuda-version': '11.7', 'python-version': '3.1', 'icefall-git-branch': 'multi_KD', 'icefall-git-sha1': 'ae3d64ff-dirty', 'icefall-git-date': 'Sun Nov 19 00:54:09 2023', 'icefall-path': '/star-xy/softwares/icefall_development/icefall_multi_KD', 'k2-path': '/star-xy/softwares/k2_development/k2/k2/python/k2/__init__.py', 'lhotse-path': '/star-xy/softwares/anaconda3/envs/multi_KD/lib/python3.10/site-packages/lhotse/__init__.py', 'hostname': 'de-74279-k2-train-6-0423201309-7c68fd68fb-qfn6b', 'IP address': '10.177.58.19'}, 'world_size': 4, 'master_port': 13490, 'tensorboard': True, 'num_epochs': 40, 'start_epoch': 10, 'start_batch': 0, 'exp_dir': PosixPath('multi_KD/exp_train_asr_full_libri1_do_audio_tagging1_as_unbalanced_scale1.0'), 'bpe_model': 'data/lang_bpe_500/bpe.model', 'base_lr': 0.045, 'lr_batches': 7500, 'lr_epochs': 3.5, 'ref_duration': 600, 'context_size': 2, 'prune_range': 5, 'lm_scale': 0.25, 'am_scale': 0.0, 'simple_loss_scale': 0.5, 'ctc_loss_scale': 0.2, 'audio_tagging_loss_scale': 1.0, 'seed': 42, 'print_diagnostics': False, 'inf_check': False, 'save_every_n': 4000, 'keep_last_k': 30, 'average_period': 200, 'use_fp16': True, 'do_finetune': False, 'init_modules': None, 'freeze_modules': None, 'finetune_ckpt': None, 'num_encoder_layers': '2,2,3,4,3,2', 'downsampling_factor': '1,2,4,8,4,2', 'feedforward_dim': '512,768,1024,1536,1024,768', 'num_heads': '4,4,4,8,4,4', 'encoder_dim': '192,256,384,512,384,256', 'query_head_dim': '32', 'value_head_dim': '12', 'pos_head_dim': '4', 'pos_dim': 48, 'encoder_unmasked_dim': '192,192,256,256,256,192', 'cnn_module_kernel': '31,31,15,15,15,31', 'decoder_dim': 512, 'joiner_dim': 512, 'causal': False, 'chunk_size': '16,32,64,-1', 'left_context_frames': '64,128,256,-1', 'use_transducer': True, 'use_ctc': False, 'do_audio_tagging': True, 'use_encoder_projection': False, 'encoder_projection_dim': -1, 'freeze_encoder': False, 'freezing_encoder_layer_index': '-1', 'freeze_encoder_steps': -1, 'encoder_lr_scale': 1.0, 'full_libri': True, 'mini_libri': False, 'use_vox2': False, 'use_libriheavy': False, 'libriheavy_subset': 'small', 'use_audioset': True, 'audioset_subset': 'unbalanced', 'manifest_dir': PosixPath('data/fbank'), 'max_duration': 1000, 'bucketing_sampler': False, 'num_buckets': 30, 'concatenate_cuts': False, 'duration_factor': 1.0, 'gap': 1.0, 'on_the_fly_feats': False, 'shuffle': True, 'drop_last': True, 'return_cuts': True, 'num_workers': 2, 'enable_spec_aug': True, 'spec_aug_time_warp_factor': 80, 'enable_musan': True, 'enable_audioset': False, 'use_musan_separately': False, 'input_strategy': 'PrecomputedFeatures', 'drop_features': False, 'return_audio': False, 'use_beats': True, 'use_ecapa': True, 'use_whisper': True, 'whisper_mvq': False, 'beats_ckpt': 'data/models/BEATs/BEATs_iter3_plus_AS2M_finetuned_on_AS2M_cpt2.pt', 'whisper_version': 'small.en', 'blank_id': 0, 'vocab_size': 500}
2023-11-19 16:28:29,705 INFO [train_asr.py:1361] (1/4) About to create model
2023-11-19 16:28:30,813 INFO [train_asr.py:1365] (1/4) Number of model parameters: 65819362
2023-11-19 16:28:30,814 INFO [checkpoint.py:112] (1/4) Loading checkpoint from multi_KD/exp_train_asr_full_libri1_do_audio_tagging1_as_unbalanced_scale1.0/epoch-9.pt
2023-11-19 16:28:34,233 INFO [checkpoint.py:112] (1/4) Loading checkpoint from multi_KD/exp_train_asr_full_libri1_do_audio_tagging1_as_unbalanced_scale1.0/epoch-9.pt
2023-11-19 16:28:36,505 INFO [train_asr.py:1396] (1/4) Setting the lr scale of parameters in encoder and encoder_embed to 1.0
2023-11-19 16:28:40,455 INFO [train_asr.py:1405] (1/4) Using DDP
2023-11-19 16:28:40,987 INFO [train_asr.py:1428] (1/4) Loading optimizer state dict
2023-11-19 16:28:41,860 INFO [train_asr.py:1436] (1/4) Loading scheduler state dict
2023-11-19 16:28:41,863 INFO [train_asr.py:1458] (1/4) Getting audioset cuts
2023-11-19 16:28:41,863 INFO [kd_datamodule.py:796] (1/4) About to get the audioset cuts.
2023-11-19 16:28:41,866 INFO [train_asr.py:1464] (1/4) Using mux to combine Librispeech with audioset
2023-11-19 16:28:41,866 INFO [train_asr.py:1474] (1/4) CutSet(len=2748469) [underlying data type: ]
2023-11-19 16:28:57,491 INFO [kd_datamodule.py:396] (1/4) Enable MUSAN
2023-11-19 16:28:57,491 INFO [kd_datamodule.py:397] (1/4) About to get Musan cuts
2023-11-19 16:29:01,085 INFO [kd_datamodule.py:427] (1/4) Enable SpecAugment
2023-11-19 16:29:01,085 INFO [kd_datamodule.py:428] (1/4) Time warp factor: 80
2023-11-19 16:29:01,086 INFO [kd_datamodule.py:438] (1/4) Num frame mask: 10
2023-11-19 16:29:01,086 INFO [kd_datamodule.py:451] (1/4) About to create train dataset
2023-11-19 16:29:01,088 INFO [kd_datamodule.py:487] (1/4) Using SimpleCutSampler
2023-11-19 16:29:01,088 INFO [kd_datamodule.py:495] (1/4) About to create train dataloader
2023-11-19 16:29:01,092 INFO [kd_datamodule.py:814] (1/4) About to get the audioset eval cuts.
2023-11-19 16:29:01,093 INFO [train_asr.py:1538] (1/4) CutSet(len=20681) [underlying data type: ]
2023-11-19 16:29:01,184 INFO [kd_datamodule.py:529] (1/4) About to create dev dataset
2023-11-19 16:29:01,960 INFO [kd_datamodule.py:550] (1/4) About to create dev dataloader
2023-11-19 16:29:01,961 INFO [train_asr.py:1552] (1/4) Loading grad scaler state dict
2023-11-19 16:29:40,850 INFO [train_asr.py:1262] (1/4) Epoch 10, batch 0, loss[loss=0.1042, simple_loss=0.1044, pruned_loss=0.02427, audio_tagging_loss=0.02778, over 16514.00 frames. ], tot_loss[loss=0.1042, simple_loss=0.1044, pruned_loss=0.02427, audio_tagging_loss=0.02778, over 16514.00 frames. ], batch size: 62, lr: 7.12e-03, grad_scale: 32.0
2023-11-19 16:29:40,851 INFO [train_asr.py:1285] (1/4) Computing validation loss
2023-11-19 16:30:13,748 INFO [zipformer.py:1873] (1/4) name=encoder.encoders.2.encoder.layers.1.self_attn_weights, attn_weights_entropy = tensor([4.6604, 3.5153, 3.7115, 3.4137], device='cuda:1')
2023-11-19 16:30:18,309 INFO [train_asr.py:1294] (1/4) Epoch 10, validation: loss=0.06458, simple_loss=0.05578, pruned_loss=0.006608, audio_tagging_loss=0.03008, over 4681554.00 frames.
2023-11-19 16:30:18,310 INFO [train_asr.py:1295] (1/4) Maximum memory allocated so far is 26082MB
2023-11-19 16:30:20,233 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=12.60 vs. limit=15.0
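The kd_datamodule records above correspond, roughly, to standard lhotse calls: CutSet.mux to interleave the LibriSpeech and AudioSet cuts, SpecAugment with the logged time-warp factor 80 and 10 frame masks, and SimpleCutSampler driven by the params dict ('max_duration': 1000, 'shuffle': True, 'drop_last': True). A minimal sketch, assuming lhotse 1.16 as logged; the manifest paths and variable names here are hypothetical, not taken from kd_datamodule.py:

    from lhotse import CutSet
    from lhotse.dataset import SpecAugment, SimpleCutSampler

    # Hypothetical manifests under 'manifest_dir': data/fbank (names are placeholders).
    asr_cuts = CutSet.from_file("data/fbank/librispeech_cuts_train.jsonl.gz")
    audioset_cuts = CutSet.from_file("data/fbank/audioset_cuts_unbalanced.jsonl.gz")

    # "Using mux to combine Librispeech with audioset": lazily interleave the streams.
    train_cuts = CutSet.mux(asr_cuts, audioset_cuts)

    # "Enable SpecAugment", "Time warp factor: 80", "Num frame mask: 10".
    spec_aug = SpecAugment(time_warp_factor=80, num_frame_masks=10)

    # "Using SimpleCutSampler": duration-capped batches, no bucketing.
    sampler = SimpleCutSampler(train_cuts, max_duration=1000, shuffle=True, drop_last=True)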
2023-11-19 16:30:20,281 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=20.03 vs. limit=22.5
2023-11-19 16:30:23,104 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=20.80 vs. limit=22.5
2023-11-19 16:30:27,516 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.852e+01 8.461e+01 9.125e+01 9.697e+01 1.516e+02, threshold=1.825e+02, percent-clipped=0.0
2023-11-19 16:30:44,910 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=721466.6666666666, ans=0.125
2023-11-19 16:31:00,701 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=6.21 vs. limit=15.0
2023-11-19 16:31:03,033 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=15.43 vs. limit=22.5
2023-11-19 16:31:08,965 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=721600.0, ans=0.09899494936611666
2023-11-19 16:31:09,999 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 108250
2023-11-19 16:31:16,309 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=721666.6666666666, ans=0.0
2023-11-19 16:31:19,971 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=721666.6666666666, ans=0.125
2023-11-19 16:31:25,010 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=721733.3333333334, ans=0.125
2023-11-19 16:31:26,667 INFO [train_asr.py:1262] (1/4) Epoch 10, batch 50, loss[loss=0.08758, simple_loss=0.09, pruned_loss=0.02099, audio_tagging_loss=0.02158, over 14808.00 frames. ], tot_loss[loss=0.09978, simple_loss=0.1096, pruned_loss=0.02456, audio_tagging_loss=0.02044, over 686944.35 frames. ], batch size: 57, lr: 7.12e-03, grad_scale: 32.0
2023-11-19 16:31:48,883 INFO [scaling.py:1022] (1/4) Whitening: name=encoder_embed.convnext.out_whiten, num_groups=1, num_channels=128, metric=4.47 vs. limit=5.0
2023-11-19 16:31:54,352 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=721866.6666666666, ans=0.0
2023-11-19 16:32:12,590 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=721933.3333333334, ans=0.125
2023-11-19 16:32:16,145 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 108300
2023-11-19 16:32:18,092 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=722000.0, ans=0.125
2023-11-19 16:32:31,692 INFO [train_asr.py:1262] (1/4) Epoch 10, batch 100, loss[loss=0.1029, simple_loss=0.1147, pruned_loss=0.02568, audio_tagging_loss=0.01987, over 14528.00 frames. ], tot_loss[loss=0.09684, simple_loss=0.1059, pruned_loss=0.02399, audio_tagging_loss=0.0199, over 1215781.95 frames. ], batch size: 54, lr: 7.12e-03, grad_scale: 32.0
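The per-batch loss fields in the train_asr.py:1262 records appear to combine as loss = 0.5 * simple_loss + pruned_loss + 1.0 * audio_tagging_loss, consistent with 'simple_loss_scale': 0.5 and 'audio_tagging_loss_scale': 1.0 from the params dict (with the pruned loss at full weight, 'warm_step': 2000 being long past at batch idx ~108k). This weighting is inferred from the logged numbers, not read out of the training code; the batch 0 record checks out:

    # Values copied from the "Epoch 10, batch 0" record above.
    simple_loss, pruned_loss, at_loss = 0.1044, 0.02427, 0.02778
    loss = 0.5 * simple_loss + pruned_loss + 1.0 * at_loss
    print(f"{loss:.5f}")  # 0.10425, matching the logged loss=0.1042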
2023-11-19 16:32:40,246 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.545e+01 8.782e+01 9.608e+01 1.042e+02 1.365e+02, threshold=1.922e+02, percent-clipped=0.0
2023-11-19 16:32:46,837 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.max_abs, batch_count=722133.3333333334, ans=10.0
2023-11-19 16:33:06,890 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer_ff3.min_abs, batch_count=722200.0, ans=0.2
2023-11-19 16:33:18,104 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=722266.6666666666, ans=0.0
2023-11-19 16:33:20,372 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 108350
2023-11-19 16:33:35,188 INFO [train_asr.py:1262] (1/4) Epoch 10, batch 150, loss[loss=0.103, simple_loss=0.1198, pruned_loss=0.02766, audio_tagging_loss=0.01539, over 17017.00 frames. ], tot_loss[loss=0.09454, simple_loss=0.1064, pruned_loss=0.02382, audio_tagging_loss=0.01753, over 1623415.95 frames. ], batch size: 62, lr: 7.12e-03, grad_scale: 32.0
2023-11-19 16:33:35,521 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=722400.0, ans=0.0
2023-11-19 16:34:09,802 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=722533.3333333334, ans=0.0
2023-11-19 16:34:13,608 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00
2023-11-19 16:34:24,515 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 108400
2023-11-19 16:34:27,818 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=722666.6666666666, ans=0.1
2023-11-19 16:34:27,878 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=722666.6666666666, ans=0.0
2023-11-19 16:34:39,959 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=722733.3333333334, ans=0.1
2023-11-19 16:34:40,686 INFO [train_asr.py:1262] (1/4) Epoch 10, batch 200, loss[loss=0.1035, simple_loss=0.1256, pruned_loss=0.0293, audio_tagging_loss=0.01136, over 14823.00 frames. ], tot_loss[loss=0.09301, simple_loss=0.1073, pruned_loss=0.02399, audio_tagging_loss=0.01538, over 1940399.90 frames. ], batch size: 54, lr: 7.12e-03, grad_scale: 16.0
2023-11-19 16:34:45,496 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=722733.3333333334, ans=0.0
2023-11-19 16:34:51,739 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.543e+01 8.378e+01 9.256e+01 1.031e+02 1.304e+02, threshold=1.851e+02, percent-clipped=0.0
2023-11-19 16:35:17,948 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=722933.3333333334, ans=0.0
2023-11-19 16:35:19,323 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=722933.3333333334, ans=0.1
2023-11-19 16:35:24,591 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=512, metric=21.72 vs. limit=22.5
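In the optim.py:476 records, the reported threshold consistently equals Clipping_scale times the middle of the five grad-norm statistics; reading the five values as min/25%/50%/75%/max (an assumption about the format), the threshold is 2.0 x the median. A quick check against the three records seen so far:

    # Quartile tuples and thresholds copied from the [optim.py:476] lines above;
    # "threshold = Clipping_scale * median" is inferred, not taken from optim.py.
    records = [
        ((6.852e+01, 8.461e+01, 9.125e+01, 9.697e+01, 1.516e+02), 1.825e+02),
        ((7.545e+01, 8.782e+01, 9.608e+01, 1.042e+02, 1.365e+02), 1.922e+02),
        ((6.543e+01, 8.378e+01, 9.256e+01, 1.031e+02, 1.304e+02), 1.851e+02),
    ]
    for quartiles, threshold in records:
        assert abs(2.0 * quartiles[2] - threshold) <= 0.001 * threshold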
2023-11-19 16:35:25,350 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=722933.3333333334, ans=0.04949747468305833
2023-11-19 16:35:26,558 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=722933.3333333334, ans=0.1
2023-11-19 16:35:29,469 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 108450
2023-11-19 16:35:38,304 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=723000.0, ans=0.0
2023-11-19 16:35:39,277 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=723000.0, ans=0.0
2023-11-19 16:35:45,239 INFO [train_asr.py:1262] (1/4) Epoch 10, batch 250, loss[loss=0.09227, simple_loss=0.1114, pruned_loss=0.02638, audio_tagging_loss=0.01016, over 15099.00 frames. ], tot_loss[loss=0.09246, simple_loss=0.109, pruned_loss=0.02422, audio_tagging_loss=0.01375, over 2185201.65 frames. ], batch size: 58, lr: 7.12e-03, grad_scale: 16.0
2023-11-19 16:35:55,098 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=723066.6666666666, ans=0.1
2023-11-19 16:35:55,406 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=2.51 vs. limit=15.0
2023-11-19 16:36:00,083 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=723133.3333333334, ans=0.1
2023-11-19 16:36:05,465 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=8.41 vs. limit=12.0
2023-11-19 16:36:10,007 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=723200.0, ans=0.0
2023-11-19 16:36:11,092 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=723200.0, ans=0.2
2023-11-19 16:36:19,218 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=723200.0, ans=0.0
2023-11-19 16:36:33,697 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 108500
2023-11-19 16:36:37,517 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=723333.3333333334, ans=0.2
2023-11-19 16:36:45,031 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=723333.3333333334, ans=0.0
2023-11-19 16:36:48,250 INFO [train_asr.py:1262] (1/4) Epoch 10, batch 300, loss[loss=0.09509, simple_loss=0.1241, pruned_loss=0.02225, audio_tagging_loss=0.01079, over 15292.00 frames. ], tot_loss[loss=0.09104, simple_loss=0.1086, pruned_loss=0.02408, audio_tagging_loss=0.01268, over 2370340.15 frames. ], batch size: 54, lr: 7.11e-03, grad_scale: 16.0
2023-11-19 16:36:58,049 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.707e+01 8.557e+01 9.217e+01 9.967e+01 1.431e+02, threshold=1.843e+02, percent-clipped=0.0
2023-11-19 16:37:06,280 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=723466.6666666666, ans=0.0
2023-11-19 16:37:11,376 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.conv_module1.whiten, num_groups=1, num_channels=192, metric=13.24 vs. limit=15.0
2023-11-19 16:37:33,830 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=723600.0, ans=0.125
2023-11-19 16:37:37,033 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 108550
2023-11-19 16:37:40,243 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=9.12 vs. limit=15.0
2023-11-19 16:37:51,884 INFO [train_asr.py:1262] (1/4) Epoch 10, batch 350, loss[loss=0.09478, simple_loss=0.1168, pruned_loss=0.02864, audio_tagging_loss=0.007723, over 16527.00 frames. ], tot_loss[loss=0.08975, simple_loss=0.1078, pruned_loss=0.02381, audio_tagging_loss=0.01206, over 2517279.15 frames. ], batch size: 63, lr: 7.11e-03, grad_scale: 16.0
2023-11-19 16:38:13,414 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=723800.0, ans=0.0
2023-11-19 16:38:24,863 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=10.66 vs. limit=15.0
2023-11-19 16:38:27,293 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.12 vs. limit=15.0
2023-11-19 16:38:40,112 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 108600
2023-11-19 16:38:53,543 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=724000.0, ans=0.125
2023-11-19 16:38:55,845 INFO [train_asr.py:1262] (1/4) Epoch 10, batch 400, loss[loss=0.08741, simple_loss=0.1075, pruned_loss=0.02333, audio_tagging_loss=0.01031, over 14511.00 frames. ], tot_loss[loss=0.0893, simple_loss=0.1078, pruned_loss=0.02385, audio_tagging_loss=0.01154, over 2634950.31 frames. ], batch size: 54, lr: 7.11e-03, grad_scale: 32.0
2023-11-19 16:39:00,494 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=724066.6666666666, ans=0.0
2023-11-19 16:39:06,210 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.939e+01 8.916e+01 9.621e+01 1.044e+02 1.431e+02, threshold=1.924e+02, percent-clipped=0.0
2023-11-19 16:39:22,122 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=20.82 vs. limit=22.5
2023-11-19 16:39:44,966 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 108650
2023-11-19 16:39:49,283 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2023-11-19 16:39:59,125 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=724400.0, ans=0.125
2023-11-19 16:39:59,924 INFO [train_asr.py:1262] (1/4) Epoch 10, batch 450, loss[loss=0.09641, simple_loss=0.1145, pruned_loss=0.02667, audio_tagging_loss=0.01247, over 15165.00 frames. ], tot_loss[loss=0.08894, simple_loss=0.1075, pruned_loss=0.02386, audio_tagging_loss=0.01135, over 2725453.96 frames. ], batch size: 57, lr: 7.11e-03, grad_scale: 32.0
2023-11-19 16:40:35,470 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=724533.3333333334, ans=0.125
2023-11-19 16:40:48,032 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 108700
2023-11-19 16:40:48,273 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=724600.0, ans=0.2
2023-11-19 16:41:01,532 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=724733.3333333334, ans=0.125
2023-11-19 16:41:02,577 INFO [train_asr.py:1262] (1/4) Epoch 10, batch 500, loss[loss=0.08679, simple_loss=0.1001, pruned_loss=0.02612, audio_tagging_loss=0.0106, over 15242.00 frames. ], tot_loss[loss=0.08827, simple_loss=0.1068, pruned_loss=0.02375, audio_tagging_loss=0.01111, over 2797031.34 frames. ], batch size: 57, lr: 7.11e-03, grad_scale: 32.0
2023-11-19 16:41:13,946 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.793e+01 8.347e+01 9.313e+01 1.051e+02 1.429e+02, threshold=1.863e+02, percent-clipped=0.0
2023-11-19 16:41:19,335 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=724800.0, ans=0.1
2023-11-19 16:41:22,765 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=724800.0, ans=0.125
2023-11-19 16:41:50,552 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=724933.3333333334, ans=0.0
2023-11-19 16:41:51,535 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 108750
2023-11-19 16:41:57,339 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=725000.0, ans=0.0
2023-11-19 16:42:07,368 INFO [train_asr.py:1262] (1/4) Epoch 10, batch 550, loss[loss=0.08639, simple_loss=0.09989, pruned_loss=0.0219, audio_tagging_loss=0.01455, over 15536.00 frames. ], tot_loss[loss=0.08725, simple_loss=0.1057, pruned_loss=0.02341, audio_tagging_loss=0.01099, over 2862908.91 frames. ], batch size: 57, lr: 7.11e-03, grad_scale: 32.0
2023-11-19 16:42:12,451 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=725066.6666666666, ans=0.125
2023-11-19 16:42:18,180 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=725066.6666666666, ans=0.0
2023-11-19 16:42:23,032 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=725133.3333333334, ans=0.125
2023-11-19 16:42:56,362 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 108800
2023-11-19 16:43:07,679 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=725333.3333333334, ans=0.0
2023-11-19 16:43:12,317 INFO [train_asr.py:1262] (1/4) Epoch 10, batch 600, loss[loss=0.0781, simple_loss=0.09764, pruned_loss=0.01825, audio_tagging_loss=0.01103, over 13933.00 frames. ], tot_loss[loss=0.08736, simple_loss=0.1058, pruned_loss=0.02349, audio_tagging_loss=0.01099, over 2909729.91 frames. ], batch size: 56, lr: 7.10e-03, grad_scale: 32.0
2023-11-19 16:43:21,055 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=725400.0, ans=0.0
2023-11-19 16:43:21,979 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.124e+01 8.186e+01 8.803e+01 9.595e+01 1.577e+02, threshold=1.761e+02, percent-clipped=0.0
2023-11-19 16:43:27,627 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.out_whiten.whitening_limit, batch_count=725466.6666666666, ans=15.0
2023-11-19 16:43:58,795 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=725600.0, ans=0.1
2023-11-19 16:44:01,068 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 108850
2023-11-19 16:44:15,744 INFO [train_asr.py:1262] (1/4) Epoch 10, batch 650, loss[loss=0.09367, simple_loss=0.1215, pruned_loss=0.02768, audio_tagging_loss=0.005241, over 15971.00 frames. ], tot_loss[loss=0.08728, simple_loss=0.1061, pruned_loss=0.02348, audio_tagging_loss=0.01077, over 2942856.01 frames. ], batch size: 58, lr: 7.10e-03, grad_scale: 16.0
2023-11-19 16:44:16,036 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=725733.3333333334, ans=0.125
2023-11-19 16:44:25,260 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2023-11-19 16:44:41,677 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=725866.6666666666, ans=0.125
2023-11-19 16:45:04,121 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 108900
2023-11-19 16:45:19,998 INFO [train_asr.py:1262] (1/4) Epoch 10, batch 700, loss[loss=0.08592, simple_loss=0.1066, pruned_loss=0.02444, audio_tagging_loss=0.00818, over 16047.00 frames. ], tot_loss[loss=0.0867, simple_loss=0.1054, pruned_loss=0.02323, audio_tagging_loss=0.01079, over 2958976.43 frames. ], batch size: 62, lr: 7.10e-03, grad_scale: 16.0
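The scaling.py:213 records print schedule values ('ans') keyed on batch_count; my understanding is that icefall's ScheduledFloat is a piecewise-linear function of the batch count, which is why skip rates and dropout probabilities sit at their final values (0.0, 0.1, 0.125) this deep into training. A minimal re-implementation for intuition (an illustration, not the icefall source):

    from bisect import bisect_right

    def scheduled_float(batch_count: float, points: list[tuple[float, float]]) -> float:
        """Piecewise-linear in batch_count over sorted (batch_count, value) breakpoints,
        clamped to the end values outside the breakpoint range."""
        xs = [x for x, _ in points]
        i = bisect_right(xs, batch_count)
        if i == 0:
            return points[0][1]
        if i == len(points):
            return points[-1][1]
        (x0, y0), (x1, y1) = points[i - 1], points[i]
        return y0 + (y1 - y0) * (batch_count - x0) / (x1 - x0)

    # A hypothetical skip-rate schedule: far past its last breakpoint at batch ~725k,
    # so the value has settled at 0.0, as in the attention_skip_rate records above.
    print(scheduled_float(725066.7, [(0.0, 0.5), (4000.0, 0.05), (16000.0, 0.0)]))  # 0.0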
2023-11-19 16:45:28,773 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=726066.6666666666, ans=0.2
2023-11-19 16:45:30,778 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.186e+01 8.106e+01 8.886e+01 9.595e+01 1.122e+02, threshold=1.777e+02, percent-clipped=0.0
2023-11-19 16:45:32,230 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=726133.3333333334, ans=0.125
2023-11-19 16:45:47,383 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=726200.0, ans=0.125
2023-11-19 16:46:06,876 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=726266.6666666666, ans=0.0
2023-11-19 16:46:07,928 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 108950
2023-11-19 16:46:20,238 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=726333.3333333334, ans=0.125
2023-11-19 16:46:22,480 INFO [train_asr.py:1262] (1/4) Epoch 10, batch 750, loss[loss=0.09898, simple_loss=0.1273, pruned_loss=0.02754, audio_tagging_loss=0.007781, over 14450.00 frames. ], tot_loss[loss=0.0861, simple_loss=0.1047, pruned_loss=0.02295, audio_tagging_loss=0.01082, over 2979861.12 frames. ], batch size: 55, lr: 7.10e-03, grad_scale: 16.0
2023-11-19 16:46:22,892 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.min_positive, batch_count=726400.0, ans=0.025
2023-11-19 16:46:25,247 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=726400.0, ans=0.125
2023-11-19 16:46:26,918 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=6.52 vs. limit=10.0
2023-11-19 16:46:59,310 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=7.31 vs. limit=12.0
2023-11-19 16:47:10,496 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=512, metric=19.68 vs. limit=22.5
2023-11-19 16:47:11,204 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 109000
2023-11-19 16:47:27,215 INFO [train_asr.py:1262] (1/4) Epoch 10, batch 800, loss[loss=0.07231, simple_loss=0.08345, pruned_loss=0.01906, audio_tagging_loss=0.01153, over 14367.00 frames. ], tot_loss[loss=0.0868, simple_loss=0.1057, pruned_loss=0.02324, audio_tagging_loss=0.01074, over 2998349.58 frames. ], batch size: 56, lr: 7.10e-03, grad_scale: 32.0
2023-11-19 16:47:38,633 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.715e+01 8.461e+01 9.150e+01 1.030e+02 1.294e+02, threshold=1.830e+02, percent-clipped=0.0
2023-11-19 16:48:01,771 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=726866.6666666666, ans=0.0
2023-11-19 16:48:02,091 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=18.23 vs. limit=22.5
2023-11-19 16:48:04,254 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=726933.3333333334, ans=0.125
2023-11-19 16:48:15,670 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 109050
2023-11-19 16:48:18,286 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=727000.0, ans=0.125
2023-11-19 16:48:29,171 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=727000.0, ans=0.125
2023-11-19 16:48:31,317 INFO [train_asr.py:1262] (1/4) Epoch 10, batch 850, loss[loss=0.09546, simple_loss=0.1194, pruned_loss=0.02608, audio_tagging_loss=0.009672, over 15758.00 frames. ], tot_loss[loss=0.08681, simple_loss=0.1057, pruned_loss=0.02323, audio_tagging_loss=0.01075, over 3005072.34 frames. ], batch size: 57, lr: 7.10e-03, grad_scale: 32.0
2023-11-19 16:48:59,065 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=727200.0, ans=0.2
2023-11-19 16:49:12,598 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=727266.6666666666, ans=0.0
2023-11-19 16:49:19,640 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 109100
2023-11-19 16:49:19,786 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=727266.6666666666, ans=0.0
2023-11-19 16:49:24,785 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=3.261e-01
2023-11-19 16:49:31,924 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=727333.3333333334, ans=0.1
2023-11-19 16:49:33,834 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.conv_module1.whiten, num_groups=1, num_channels=192, metric=14.65 vs. limit=15.0
2023-11-19 16:49:34,190 INFO [train_asr.py:1262] (1/4) Epoch 10, batch 900, loss[loss=0.07469, simple_loss=0.08138, pruned_loss=0.02172, audio_tagging_loss=0.01228, over 14426.00 frames. ], tot_loss[loss=0.08627, simple_loss=0.1047, pruned_loss=0.02309, audio_tagging_loss=0.01084, over 3013491.59 frames. ], batch size: 55, lr: 7.09e-03, grad_scale: 32.0
2023-11-19 16:49:34,612 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=727400.0, ans=0.125
2023-11-19 16:49:45,191 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.987e+01 8.269e+01 9.055e+01 9.679e+01 1.261e+02, threshold=1.811e+02, percent-clipped=0.0
2023-11-19 16:49:52,649 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=8.80 vs. limit=15.0
2023-11-19 16:50:03,986 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=727533.3333333334, ans=0.1
2023-11-19 16:50:10,979 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.min_positive, batch_count=727533.3333333334, ans=0.025
2023-11-19 16:50:22,852 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 109150
2023-11-19 16:50:24,783 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=7.30 vs. limit=10.0
2023-11-19 16:50:37,251 INFO [train_asr.py:1262] (1/4) Epoch 10, batch 950, loss[loss=0.09127, simple_loss=0.1062, pruned_loss=0.02345, audio_tagging_loss=0.01473, over 14933.00 frames. ], tot_loss[loss=0.08747, simple_loss=0.1066, pruned_loss=0.02347, audio_tagging_loss=0.01072, over 3026117.67 frames. ], batch size: 54, lr: 7.09e-03, grad_scale: 32.0
2023-11-19 16:50:39,280 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00
2023-11-19 16:51:11,440 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=727866.6666666666, ans=0.2
2023-11-19 16:51:11,700 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=20.92 vs. limit=22.5
2023-11-19 16:51:25,981 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 109200
2023-11-19 16:51:42,990 INFO [train_asr.py:1262] (1/4) Epoch 10, batch 1000, loss[loss=0.09059, simple_loss=0.1132, pruned_loss=0.02294, audio_tagging_loss=0.01104, over 16021.00 frames. ], tot_loss[loss=0.08647, simple_loss=0.1054, pruned_loss=0.02317, audio_tagging_loss=0.01059, over 3031284.63 frames. ], batch size: 63, lr: 7.09e-03, grad_scale: 32.0
2023-11-19 16:51:53,748 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.418e+01 8.048e+01 8.889e+01 9.743e+01 1.398e+02, threshold=1.778e+02, percent-clipped=0.0
2023-11-19 16:51:54,050 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=728133.3333333334, ans=0.0
2023-11-19 16:51:54,709 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.35 vs. limit=6.0
2023-11-19 16:52:08,474 WARNING [train_asr.py:1506] (1/4) Exclude cut with ID unbalanced/5Y6u9AlD9S0_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24
2023-11-19 16:52:11,172 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=728200.0, ans=0.0
2023-11-19 16:52:13,689 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=728200.0, ans=0.125
2023-11-19 16:52:21,732 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.min_positive, batch_count=728266.6666666666, ans=0.025
2023-11-19 16:52:31,583 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 109250
2023-11-19 16:52:34,291 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=728333.3333333334, ans=0.2
2023-11-19 16:52:35,389 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=728333.3333333334, ans=0.125
2023-11-19 16:52:36,835 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=728333.3333333334, ans=0.2
2023-11-19 16:52:40,628 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=4.06 vs. limit=10.0
2023-11-19 16:52:46,322 INFO [train_asr.py:1262] (1/4) Epoch 10, batch 1050, loss[loss=0.07418, simple_loss=0.09502, pruned_loss=0.01886, audio_tagging_loss=0.007804, over 14593.00 frames. ], tot_loss[loss=0.08578, simple_loss=0.1046, pruned_loss=0.02301, audio_tagging_loss=0.01044, over 3032980.99 frames. ], batch size: 55, lr: 7.09e-03, grad_scale: 32.0
2023-11-19 16:53:00,235 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=728466.6666666666, ans=0.125
2023-11-19 16:53:06,359 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=728466.6666666666, ans=0.125
2023-11-19 16:53:14,687 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=728533.3333333334, ans=0.1
2023-11-19 16:53:20,892 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=728533.3333333334, ans=0.2
2023-11-19 16:53:25,762 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=728600.0, ans=0.0
2023-11-19 16:53:35,472 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 109300
2023-11-19 16:53:49,976 INFO [train_asr.py:1262] (1/4) Epoch 10, batch 1100, loss[loss=0.08241, simple_loss=0.1017, pruned_loss=0.02083, audio_tagging_loss=0.01075, over 14216.00 frames. ], tot_loss[loss=0.08492, simple_loss=0.1036, pruned_loss=0.02266, audio_tagging_loss=0.01044, over 3028123.65 frames. ], batch size: 52, lr: 7.09e-03, grad_scale: 32.0
2023-11-19 16:53:52,584 WARNING [train_asr.py:1506] (1/4) Exclude cut with ID unbalanced/AWHnJAqurec_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24
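The WARNING records about excluded AudioSet cuts are internally consistent: the dummy transcripts tokenize to 24 BPE tokens, but a 1-second cut (100 feature frames) shrinks to 23 frames after the encoder's 4x subsampling, and a transducer cannot align more tokens than output frames, so the cut is dropped. The frame count matches the usual icefall Conv2dSubsampling arithmetic (treated here as an assumption about this recipe):

    # Reproduces "Number of frames (before subsampling): 100 ... (after subsampling): 23".
    def frames_after_subsampling(t: int) -> int:
        return ((t - 7) // 2 + 1) // 2  # assumed Conv2dSubsampling formula

    t_out = frames_after_subsampling(100)
    print(t_out)        # 23
    print(t_out < 24)   # True -> "Exclude cut with ID unbalanced/..."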
2023-11-19 16:54:01,860 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.245e+01 8.411e+01 9.070e+01 1.020e+02 1.382e+02, threshold=1.814e+02, percent-clipped=0.0
2023-11-19 16:54:07,847 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=728800.0, ans=0.0
2023-11-19 16:54:15,908 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=728866.6666666666, ans=0.0
2023-11-19 16:54:17,221 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=728866.6666666666, ans=0.125
2023-11-19 16:54:31,912 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=11.41 vs. limit=15.0
2023-11-19 16:54:37,663 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=728933.3333333334, ans=0.125
2023-11-19 16:54:38,767 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 109350
2023-11-19 16:54:47,578 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=729000.0, ans=0.0
2023-11-19 16:54:54,102 INFO [train_asr.py:1262] (1/4) Epoch 10, batch 1150, loss[loss=0.1003, simple_loss=0.1324, pruned_loss=0.0256, audio_tagging_loss=0.008467, over 15641.00 frames. ], tot_loss[loss=0.08576, simple_loss=0.1045, pruned_loss=0.02309, audio_tagging_loss=0.01044, over 3031686.17 frames. ], batch size: 55, lr: 7.09e-03, grad_scale: 32.0
2023-11-19 16:55:15,310 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=729133.3333333334, ans=0.125
2023-11-19 16:55:26,134 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00
2023-11-19 16:55:35,979 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=729266.6666666666, ans=0.1
2023-11-19 16:55:40,227 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=729266.6666666666, ans=0.09899494936611666
2023-11-19 16:55:42,508 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 109400
2023-11-19 16:55:53,673 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=11.11 vs. limit=15.0
2023-11-19 16:55:58,773 INFO [train_asr.py:1262] (1/4) Epoch 10, batch 1200, loss[loss=0.09962, simple_loss=0.1264, pruned_loss=0.02435, audio_tagging_loss=0.01206, over 16085.00 frames. ], tot_loss[loss=0.08591, simple_loss=0.1048, pruned_loss=0.02318, audio_tagging_loss=0.01033, over 3037125.87 frames. ], batch size: 59, lr: 7.08e-03, grad_scale: 32.0
2023-11-19 16:56:03,783 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=729400.0, ans=0.1
2023-11-19 16:56:09,729 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.219e+01 8.170e+01 9.038e+01 9.712e+01 1.366e+02, threshold=1.808e+02, percent-clipped=0.0
2023-11-19 16:56:29,928 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=729533.3333333334, ans=0.125
2023-11-19 16:56:47,480 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 109450
2023-11-19 16:56:51,444 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=14.03 vs. limit=15.0
2023-11-19 16:57:02,017 INFO [train_asr.py:1262] (1/4) Epoch 10, batch 1250, loss[loss=0.07458, simple_loss=0.09672, pruned_loss=0.01701, audio_tagging_loss=0.009215, over 14870.00 frames. ], tot_loss[loss=0.08589, simple_loss=0.1049, pruned_loss=0.02314, audio_tagging_loss=0.0103, over 3032234.38 frames. ], batch size: 56, lr: 7.08e-03, grad_scale: 32.0
2023-11-19 16:57:17,012 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=729800.0, ans=0.0
2023-11-19 16:57:22,062 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=729800.0, ans=0.0
2023-11-19 16:57:37,444 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=729866.6666666666, ans=0.0
2023-11-19 16:57:51,008 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 109500
2023-11-19 16:58:02,412 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=730000.0, ans=0.0
2023-11-19 16:58:05,807 INFO [train_asr.py:1262] (1/4) Epoch 10, batch 1300, loss[loss=0.0768, simple_loss=0.09047, pruned_loss=0.01968, audio_tagging_loss=0.01189, over 15642.00 frames. ], tot_loss[loss=0.08578, simple_loss=0.1047, pruned_loss=0.02303, audio_tagging_loss=0.01039, over 3034417.41 frames. ], batch size: 58, lr: 7.08e-03, grad_scale: 32.0
2023-11-19 16:58:12,573 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=730066.6666666666, ans=0.0
2023-11-19 16:58:12,816 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.whiten, num_groups=1, num_channels=512, metric=9.49 vs. limit=12.0
2023-11-19 16:58:18,231 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.587e+01 8.086e+01 8.673e+01 9.719e+01 1.253e+02, threshold=1.735e+02, percent-clipped=0.0
2023-11-19 16:58:27,001 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=730133.3333333334, ans=0.125
2023-11-19 16:58:42,196 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=10.21 vs. limit=15.0
2023-11-19 16:58:43,180 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.min_abs, batch_count=730266.6666666666, ans=0.5
2023-11-19 16:58:46,826 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=730266.6666666666, ans=0.0
2023-11-19 16:58:54,050 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 109550
2023-11-19 16:59:06,204 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=730333.3333333334, ans=0.1
2023-11-19 16:59:10,757 INFO [train_asr.py:1262] (1/4) Epoch 10, batch 1350, loss[loss=0.08578, simple_loss=0.1106, pruned_loss=0.02317, audio_tagging_loss=0.007319, over 16400.00 frames. ], tot_loss[loss=0.08626, simple_loss=0.1051, pruned_loss=0.0233, audio_tagging_loss=0.01039, over 3046604.34 frames. ], batch size: 58, lr: 7.08e-03, grad_scale: 32.0
2023-11-19 16:59:12,320 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=730400.0, ans=0.125
2023-11-19 16:59:18,477 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=730400.0, ans=0.125
2023-11-19 16:59:56,672 WARNING [train_asr.py:1506] (1/4) Exclude cut with ID unbalanced/XdmbboqRBmQ_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24
2023-11-19 16:59:59,120 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 109600
2023-11-19 17:00:14,627 INFO [train_asr.py:1262] (1/4) Epoch 10, batch 1400, loss[loss=0.08265, simple_loss=0.1049, pruned_loss=0.01876, audio_tagging_loss=0.01144, over 14272.00 frames. ], tot_loss[loss=0.08631, simple_loss=0.1049, pruned_loss=0.02342, audio_tagging_loss=0.01043, over 3044769.67 frames. ], batch size: 55, lr: 7.08e-03, grad_scale: 32.0
2023-11-19 17:00:25,879 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.607e+01 8.437e+01 9.173e+01 9.925e+01 1.308e+02, threshold=1.835e+02, percent-clipped=0.0
2023-11-19 17:00:53,831 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=730933.3333333334, ans=0.2
2023-11-19 17:01:03,343 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 109650
2023-11-19 17:01:18,055 INFO [train_asr.py:1262] (1/4) Epoch 10, batch 1450, loss[loss=0.09603, simple_loss=0.1171, pruned_loss=0.02529, audio_tagging_loss=0.01218, over 15245.00 frames. ], tot_loss[loss=0.08617, simple_loss=0.105, pruned_loss=0.02331, audio_tagging_loss=0.01035, over 3044838.21 frames. ], batch size: 58, lr: 7.08e-03, grad_scale: 16.0
2023-11-19 17:01:19,587 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=731066.6666666666, ans=0.0
2023-11-19 17:01:37,580 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=8.18 vs. limit=12.0
2023-11-19 17:01:45,758 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=731200.0, ans=0.125
2023-11-19 17:01:48,383 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=731200.0, ans=0.125
2023-11-19 17:02:06,400 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 109700
2023-11-19 17:02:07,953 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=731333.3333333334, ans=0.0
2023-11-19 17:02:22,242 INFO [train_asr.py:1262] (1/4) Epoch 10, batch 1500, loss[loss=0.1037, simple_loss=0.1269, pruned_loss=0.03171, audio_tagging_loss=0.008494, over 14814.00 frames. ], tot_loss[loss=0.0867, simple_loss=0.1057, pruned_loss=0.0235, audio_tagging_loss=0.01037, over 3046956.17 frames. ], batch size: 53, lr: 7.08e-03, grad_scale: 16.0
2023-11-19 17:02:29,425 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=731400.0, ans=0.0
2023-11-19 17:02:35,056 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.251e+01 8.288e+01 9.153e+01 9.955e+01 1.243e+02, threshold=1.831e+02, percent-clipped=0.0
2023-11-19 17:02:43,858 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=731466.6666666666, ans=0.1
2023-11-19 17:03:05,312 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=731600.0, ans=0.125
2023-11-19 17:03:11,036 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 109750
2023-11-19 17:03:11,292 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=731600.0, ans=0.125
2023-11-19 17:03:22,142 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.self_attn1.whiten, num_groups=1, num_channels=192, metric=14.42 vs. limit=22.5
2023-11-19 17:03:26,123 INFO [train_asr.py:1262] (1/4) Epoch 10, batch 1550, loss[loss=0.0773, simple_loss=0.08907, pruned_loss=0.02245, audio_tagging_loss=0.01031, over 16303.00 frames. ], tot_loss[loss=0.08736, simple_loss=0.1064, pruned_loss=0.02371, audio_tagging_loss=0.01047, over 3051456.57 frames. ], batch size: 64, lr: 7.07e-03, grad_scale: 16.0
2023-11-19 17:03:28,821 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=731733.3333333334, ans=0.125
2023-11-19 17:03:33,665 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=731733.3333333334, ans=0.0
2023-11-19 17:03:38,974 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=8.18 vs. limit=12.0
2023-11-19 17:03:43,685 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=731800.0, ans=0.125
2023-11-19 17:03:48,528 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=731800.0, ans=0.125
2023-11-19 17:03:52,092 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=7.52 vs. limit=15.0
2023-11-19 17:04:03,607 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=731933.3333333334, ans=0.1
2023-11-19 17:04:10,299 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.12 vs. limit=6.0
2023-11-19 17:04:12,853 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=16.29 vs. limit=22.5
2023-11-19 17:04:14,517 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 109800
2023-11-19 17:04:21,577 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=732000.0, ans=0.2
2023-11-19 17:04:29,593 INFO [train_asr.py:1262] (1/4) Epoch 10, batch 1600, loss[loss=0.09621, simple_loss=0.117, pruned_loss=0.02531, audio_tagging_loss=0.01242, over 15799.00 frames. ], tot_loss[loss=0.08769, simple_loss=0.107, pruned_loss=0.02371, audio_tagging_loss=0.01049, over 3052380.54 frames. ], batch size: 58, lr: 7.07e-03, grad_scale: 32.0
2023-11-19 17:04:32,301 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=732066.6666666666, ans=0.0
2023-11-19 17:04:42,924 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.880e+01 8.694e+01 9.571e+01 1.026e+02 1.392e+02, threshold=1.914e+02, percent-clipped=0.0
2023-11-19 17:04:48,170 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=732133.3333333334, ans=0.1
2023-11-19 17:04:50,618 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=732133.3333333334, ans=0.0
2023-11-19 17:04:58,690 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=732200.0, ans=0.2
2023-11-19 17:05:11,062 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=732266.6666666666, ans=0.125
2023-11-19 17:05:11,071 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=732266.6666666666, ans=0.0
2023-11-19 17:05:18,451 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 109850
2023-11-19 17:05:27,739 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=732333.3333333334, ans=0.2
2023-11-19 17:05:34,208 INFO [train_asr.py:1262] (1/4) Epoch 10, batch 1650, loss[loss=0.1155, simple_loss=0.1404, pruned_loss=0.03431, audio_tagging_loss=0.01099, over 15736.00 frames. ], tot_loss[loss=0.08769, simple_loss=0.1071, pruned_loss=0.02361, audio_tagging_loss=0.01052, over 3056862.23 frames. ], batch size: 57, lr: 7.07e-03, grad_scale: 32.0
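The scaling.py:1022 Whitening records compare a per-module statistic ("metric") against a scheduled limit (note the ScheduledFloat record for ...out_whiten.whitening_limit earlier in the log). A plausible reading, in the spirit of the Zipformer's Whiten module, is an eigenvalue-dispersion measure of the feature covariance that equals 1.0 for perfectly white features and grows as the covariance departs from a multiple of the identity; treat the exact statistic and normalization as assumptions:

    import torch

    def whitening_metric(x: torch.Tensor) -> torch.Tensor:
        """x: (num_frames, num_channels). Returns E[lambda^2] / (E[lambda])^2 over the
        eigenvalues of the feature covariance; equals 1.0 iff the covariance is c*I."""
        x = x - x.mean(dim=0)
        cov = (x.T @ x) / x.shape[0]
        eigs = torch.linalg.eigvalsh(cov)
        return (eigs ** 2).mean() / eigs.mean().clamp(min=1e-20) ** 2

    x = torch.randn(1000, 384)     # roughly white input
    print(whitening_metric(x))     # close to 1.0, well below limits like 15.0 or 22.5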
2023-11-19 17:05:35,802 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=732400.0, ans=0.125
2023-11-19 17:05:49,911 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.min_abs, batch_count=732466.6666666666, ans=0.5
2023-11-19 17:05:53,739 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=732466.6666666666, ans=0.0
2023-11-19 17:06:22,810 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 109900
2023-11-19 17:06:34,923 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=732666.6666666666, ans=0.0
2023-11-19 17:06:38,148 INFO [train_asr.py:1262] (1/4) Epoch 10, batch 1700, loss[loss=0.08027, simple_loss=0.1011, pruned_loss=0.01985, audio_tagging_loss=0.009868, over 15314.00 frames. ], tot_loss[loss=0.08727, simple_loss=0.1062, pruned_loss=0.0235, audio_tagging_loss=0.01066, over 3052449.78 frames. ], batch size: 58, lr: 7.07e-03, grad_scale: 32.0
2023-11-19 17:06:49,563 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=732800.0, ans=0.125
2023-11-19 17:06:50,474 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.181e+01 8.219e+01 8.857e+01 9.747e+01 1.189e+02, threshold=1.771e+02, percent-clipped=0.0
2023-11-19 17:06:59,792 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=732800.0, ans=0.125
2023-11-19 17:07:02,143 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.min_abs, batch_count=732866.6666666666, ans=0.5
2023-11-19 17:07:09,002 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=7.37 vs. limit=10.0
2023-11-19 17:07:13,763 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=2.13 vs. limit=15.0
2023-11-19 17:07:21,021 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=12.53 vs. limit=15.0
2023-11-19 17:07:27,173 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 109950
2023-11-19 17:07:37,250 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=733000.0, ans=0.0
2023-11-19 17:07:39,611 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=733000.0, ans=0.0
2023-11-19 17:07:41,725 INFO [train_asr.py:1262] (1/4) Epoch 10, batch 1750, loss[loss=0.05625, simple_loss=0.0615, pruned_loss=0.01192, audio_tagging_loss=0.01357, over 14796.00 frames. ], tot_loss[loss=0.08776, simple_loss=0.1069, pruned_loss=0.02372, audio_tagging_loss=0.01059, over 3055279.89 frames. ], batch size: 58, lr: 7.07e-03, grad_scale: 32.0
2023-11-19 17:07:47,231 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=733066.6666666666, ans=0.125
2023-11-19 17:07:54,540 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=733133.3333333334, ans=0.2
2023-11-19 17:07:56,918 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=733133.3333333334, ans=0.0
2023-11-19 17:08:00,677 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=2.71 vs. limit=15.0
2023-11-19 17:08:30,313 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 110000
2023-11-19 17:08:33,627 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=733333.3333333334, ans=0.0
2023-11-19 17:08:46,848 INFO [train_asr.py:1262] (1/4) Epoch 10, batch 1800, loss[loss=0.08747, simple_loss=0.1076, pruned_loss=0.02419, audio_tagging_loss=0.009491, over 15074.00 frames. ], tot_loss[loss=0.0865, simple_loss=0.1056, pruned_loss=0.02316, audio_tagging_loss=0.01053, over 3056290.48 frames. ], batch size: 55, lr: 7.07e-03, grad_scale: 32.0
2023-11-19 17:08:51,262 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=512, metric=20.54 vs. limit=22.5
2023-11-19 17:08:57,093 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=733400.0, ans=0.125
2023-11-19 17:09:00,175 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.669e+01 8.372e+01 9.088e+01 1.009e+02 1.305e+02, threshold=1.818e+02, percent-clipped=0.0
2023-11-19 17:09:02,869 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=733466.6666666666, ans=0.035
2023-11-19 17:09:06,061 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=733466.6666666666, ans=0.1
2023-11-19 17:09:23,332 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=733600.0, ans=0.125
2023-11-19 17:09:23,486 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=733600.0, ans=0.0
2023-11-19 17:09:35,495 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 110050
2023-11-19 17:09:45,968 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=16.44 vs. limit=22.5
2023-11-19 17:09:49,318 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=5.00 vs. limit=15.0
2023-11-19 17:09:50,022 INFO [train_asr.py:1262] (1/4) Epoch 10, batch 1850, loss[loss=0.06811, simple_loss=0.0877, pruned_loss=0.01575, audio_tagging_loss=0.008516, over 14586.00 frames. ], tot_loss[loss=0.08599, simple_loss=0.1049, pruned_loss=0.02302, audio_tagging_loss=0.01053, over 3055894.77 frames. ], batch size: 58, lr: 7.06e-03, grad_scale: 16.0
2023-11-19 17:10:09,576 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=733800.0, ans=0.0
2023-11-19 17:10:38,851 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 110100
2023-11-19 17:10:54,480 INFO [train_asr.py:1262] (1/4) Epoch 10, batch 1900, loss[loss=0.07249, simple_loss=0.08445, pruned_loss=0.01722, audio_tagging_loss=0.01305, over 14956.00 frames. ], tot_loss[loss=0.08657, simple_loss=0.1058, pruned_loss=0.02326, audio_tagging_loss=0.0104, over 3062295.15 frames. ], batch size: 58, lr: 7.06e-03, grad_scale: 16.0
2023-11-19 17:11:08,887 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.512e+01 8.528e+01 8.978e+01 9.700e+01 1.316e+02, threshold=1.796e+02, percent-clipped=0.0
2023-11-19 17:11:12,916 INFO [scaling.py:1022] (1/4) Whitening: name=encoder_embed.convnext.out_whiten, num_groups=1, num_channels=128, metric=4.21 vs. limit=5.0
2023-11-19 17:11:19,613 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer_ff3.min_abs, batch_count=734200.0, ans=0.2
2023-11-19 17:11:21,307 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=8.61 vs. limit=15.0
2023-11-19 17:11:40,671 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=8.86 vs. limit=15.0
2023-11-19 17:11:43,527 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 110150
2023-11-19 17:11:51,042 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.max_abs, batch_count=734333.3333333334, ans=10.0
2023-11-19 17:11:59,392 INFO [train_asr.py:1262] (1/4) Epoch 10, batch 1950, loss[loss=0.07945, simple_loss=0.09109, pruned_loss=0.02242, audio_tagging_loss=0.01148, over 14679.00 frames. ], tot_loss[loss=0.08629, simple_loss=0.1057, pruned_loss=0.02316, audio_tagging_loss=0.01029, over 3054491.65 frames. ], batch size: 56, lr: 7.06e-03, grad_scale: 16.0
2023-11-19 17:12:02,300 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=4.18 vs. limit=12.0
2023-11-19 17:12:04,520 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2023-11-19 17:12:07,147 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=734400.0, ans=0.2
2023-11-19 17:12:12,134 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=734466.6666666666, ans=0.0
2023-11-19 17:12:38,627 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=734600.0, ans=0.0
2023-11-19 17:12:48,382 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 110200
2023-11-19 17:12:48,938 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=512, metric=21.80 vs.
limit=22.5 2023-11-19 17:12:56,548 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=734666.6666666666, ans=0.125 2023-11-19 17:13:03,587 INFO [train_asr.py:1262] (1/4) Epoch 10, batch 2000, loss[loss=0.08729, simple_loss=0.1063, pruned_loss=0.02377, audio_tagging_loss=0.01036, over 14999.00 frames. ], tot_loss[loss=0.08551, simple_loss=0.1045, pruned_loss=0.02286, audio_tagging_loss=0.0104, over 3053104.90 frames. ], batch size: 57, lr: 7.06e-03, grad_scale: 32.0 2023-11-19 17:13:08,110 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=512, metric=10.05 vs. limit=15.0 2023-11-19 17:13:13,989 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=734733.3333333334, ans=0.1 2023-11-19 17:13:17,068 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=734800.0, ans=0.0 2023-11-19 17:13:17,909 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.098e+01 8.336e+01 8.914e+01 9.433e+01 1.309e+02, threshold=1.783e+02, percent-clipped=0.0 2023-11-19 17:13:38,341 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=734866.6666666666, ans=0.125 2023-11-19 17:13:52,713 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 110250 2023-11-19 17:14:08,076 INFO [train_asr.py:1262] (1/4) Epoch 10, batch 2050, loss[loss=0.07093, simple_loss=0.09197, pruned_loss=0.01704, audio_tagging_loss=0.007904, over 14766.00 frames. ], tot_loss[loss=0.08545, simple_loss=0.1047, pruned_loss=0.02282, audio_tagging_loss=0.01031, over 3051211.33 frames. ], batch size: 55, lr: 7.06e-03, grad_scale: 32.0 2023-11-19 17:14:09,418 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=735066.6666666666, ans=0.0 2023-11-19 17:14:21,772 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=735133.3333333334, ans=0.2 2023-11-19 17:14:46,649 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=8.91 vs. limit=15.0 2023-11-19 17:14:55,486 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 110300 2023-11-19 17:14:55,665 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=735266.6666666666, ans=0.0 2023-11-19 17:15:10,850 INFO [train_asr.py:1262] (1/4) Epoch 10, batch 2100, loss[loss=0.07779, simple_loss=0.09737, pruned_loss=0.01736, audio_tagging_loss=0.01175, over 15589.00 frames. ], tot_loss[loss=0.08571, simple_loss=0.1051, pruned_loss=0.0229, audio_tagging_loss=0.01024, over 3055561.65 frames. 
], batch size: 59, lr: 7.06e-03, grad_scale: 32.0 2023-11-19 17:15:14,860 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=735400.0, ans=0.1 2023-11-19 17:15:18,286 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.max_positive, batch_count=735400.0, ans=0.95 2023-11-19 17:15:18,501 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=735400.0, ans=0.0 2023-11-19 17:15:25,431 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.080e+01 8.159e+01 8.890e+01 9.967e+01 1.434e+02, threshold=1.778e+02, percent-clipped=0.0 2023-11-19 17:15:29,325 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=735466.6666666666, ans=0.125 2023-11-19 17:15:58,644 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=735600.0, ans=0.125 2023-11-19 17:16:00,459 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 110350 2023-11-19 17:16:13,428 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=735666.6666666666, ans=0.1 2023-11-19 17:16:15,519 INFO [train_asr.py:1262] (1/4) Epoch 10, batch 2150, loss[loss=0.08506, simple_loss=0.1163, pruned_loss=0.01978, audio_tagging_loss=0.007143, over 14347.00 frames. ], tot_loss[loss=0.08612, simple_loss=0.1057, pruned_loss=0.02303, audio_tagging_loss=0.01023, over 3054314.16 frames. ], batch size: 53, lr: 7.05e-03, grad_scale: 32.0 2023-11-19 17:16:27,059 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=735800.0, ans=0.1 2023-11-19 17:16:30,801 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=735800.0, ans=0.125 2023-11-19 17:16:55,014 WARNING [train_asr.py:1506] (1/4) Exclude cut with ID unbalanced/XkQ8YVd8u38_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-19 17:16:57,702 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=735933.3333333334, ans=0.2 2023-11-19 17:17:02,792 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=735933.3333333334, ans=0.125 2023-11-19 17:17:04,955 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 110400 2023-11-19 17:17:12,344 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=12.51 vs. limit=15.0 2023-11-19 17:17:20,310 INFO [train_asr.py:1262] (1/4) Epoch 10, batch 2200, loss[loss=0.1107, simple_loss=0.138, pruned_loss=0.03203, audio_tagging_loss=0.009686, over 14954.00 frames. ], tot_loss[loss=0.0862, simple_loss=0.1059, pruned_loss=0.02308, audio_tagging_loss=0.01019, over 3050707.35 frames. 
], batch size: 54, lr: 7.05e-03, grad_scale: 16.0 2023-11-19 17:17:21,852 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=736066.6666666666, ans=0.035 2023-11-19 17:17:35,743 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.097e+01 8.594e+01 9.409e+01 1.055e+02 1.451e+02, threshold=1.882e+02, percent-clipped=0.0 2023-11-19 17:17:44,162 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=736133.3333333334, ans=0.0 2023-11-19 17:18:10,110 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 110450 2023-11-19 17:18:13,839 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=736333.3333333334, ans=0.125 2023-11-19 17:18:16,388 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.max_positive, batch_count=736333.3333333334, ans=0.95 2023-11-19 17:18:18,911 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=736333.3333333334, ans=0.125 2023-11-19 17:18:25,416 INFO [train_asr.py:1262] (1/4) Epoch 10, batch 2250, loss[loss=0.0843, simple_loss=0.1104, pruned_loss=0.01958, audio_tagging_loss=0.009531, over 15420.00 frames. ], tot_loss[loss=0.08677, simple_loss=0.1064, pruned_loss=0.02334, audio_tagging_loss=0.01021, over 3052724.68 frames. ], batch size: 56, lr: 7.05e-03, grad_scale: 16.0 2023-11-19 17:18:47,508 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=736466.6666666666, ans=0.1 2023-11-19 17:19:01,070 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=736533.3333333334, ans=0.0 2023-11-19 17:19:15,053 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 110500 2023-11-19 17:19:31,624 INFO [train_asr.py:1262] (1/4) Epoch 10, batch 2300, loss[loss=0.09427, simple_loss=0.1091, pruned_loss=0.02922, audio_tagging_loss=0.01049, over 14308.00 frames. ], tot_loss[loss=0.08634, simple_loss=0.1052, pruned_loss=0.02329, audio_tagging_loss=0.01043, over 3046367.57 frames. ], batch size: 54, lr: 7.05e-03, grad_scale: 8.0 2023-11-19 17:19:31,848 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=736733.3333333334, ans=0.0 2023-11-19 17:19:47,336 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.852e+01 8.474e+01 9.296e+01 1.022e+02 1.350e+02, threshold=1.859e+02, percent-clipped=0.0 2023-11-19 17:19:53,339 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=6.05 vs. limit=10.0 2023-11-19 17:20:21,203 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 110550 2023-11-19 17:20:28,649 WARNING [train_asr.py:1506] (1/4) Exclude cut with ID unbalanced/mx9RcUz8sr0_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. 
Number of tokens: 24 2023-11-19 17:20:36,089 INFO [train_asr.py:1262] (1/4) Epoch 10, batch 2350, loss[loss=0.1148, simple_loss=0.1348, pruned_loss=0.03604, audio_tagging_loss=0.01135, over 16310.00 frames. ], tot_loss[loss=0.08721, simple_loss=0.1063, pruned_loss=0.02362, audio_tagging_loss=0.01046, over 3045866.84 frames. ], batch size: 62, lr: 7.05e-03, grad_scale: 8.0 2023-11-19 17:20:36,483 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=737066.6666666666, ans=0.125 2023-11-19 17:20:40,119 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=5.270e-03 2023-11-19 17:20:54,666 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=7.03 vs. limit=15.0 2023-11-19 17:21:14,907 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=8.36 vs. limit=15.0 2023-11-19 17:21:15,035 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=7.21 vs. limit=15.0 2023-11-19 17:21:25,375 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 110600 2023-11-19 17:21:40,465 INFO [train_asr.py:1262] (1/4) Epoch 10, batch 2400, loss[loss=0.08586, simple_loss=0.09886, pruned_loss=0.02417, audio_tagging_loss=0.01226, over 14388.00 frames. ], tot_loss[loss=0.08641, simple_loss=0.1051, pruned_loss=0.02317, audio_tagging_loss=0.01069, over 3048197.35 frames. ], batch size: 56, lr: 7.05e-03, grad_scale: 16.0 2023-11-19 17:21:42,309 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=12.26 vs. limit=15.0 2023-11-19 17:21:53,064 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=737400.0, ans=0.125 2023-11-19 17:21:58,849 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.978e+01 8.550e+01 9.088e+01 1.010e+02 1.299e+02, threshold=1.818e+02, percent-clipped=0.0 2023-11-19 17:22:04,169 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=737466.6666666666, ans=0.07 2023-11-19 17:22:07,571 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=737533.3333333334, ans=0.0 2023-11-19 17:22:16,243 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=737533.3333333334, ans=0.125 2023-11-19 17:22:18,676 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=737600.0, ans=0.2 2023-11-19 17:22:23,806 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=13.56 vs. 
limit=22.5 2023-11-19 17:22:29,513 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 110650 2023-11-19 17:22:29,762 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=737600.0, ans=0.125 2023-11-19 17:22:46,859 INFO [train_asr.py:1262] (1/4) Epoch 10, batch 2450, loss[loss=0.09831, simple_loss=0.1298, pruned_loss=0.0227, audio_tagging_loss=0.0107, over 15556.00 frames. ], tot_loss[loss=0.08609, simple_loss=0.1049, pruned_loss=0.02295, audio_tagging_loss=0.01067, over 3050714.66 frames. ], batch size: 57, lr: 7.04e-03, grad_scale: 16.0 2023-11-19 17:22:55,600 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=737733.3333333334, ans=0.0 2023-11-19 17:22:55,606 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=737733.3333333334, ans=0.125 2023-11-19 17:23:18,247 INFO [scaling.py:1022] (1/4) Whitening: name=encoder_embed.out_whiten, num_groups=1, num_channels=192, metric=6.65 vs. limit=8.0 2023-11-19 17:23:35,180 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 110700 2023-11-19 17:23:49,614 INFO [train_asr.py:1262] (1/4) Epoch 10, batch 2500, loss[loss=0.08923, simple_loss=0.1096, pruned_loss=0.02324, audio_tagging_loss=0.01119, over 15330.00 frames. ], tot_loss[loss=0.08605, simple_loss=0.1047, pruned_loss=0.02299, audio_tagging_loss=0.01071, over 3048676.28 frames. ], batch size: 59, lr: 7.04e-03, grad_scale: 16.0 2023-11-19 17:24:05,694 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.741e+01 8.346e+01 8.795e+01 9.751e+01 1.396e+02, threshold=1.759e+02, percent-clipped=0.0 2023-11-19 17:24:11,889 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=512, metric=4.41 vs. limit=15.0 2023-11-19 17:24:12,611 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=738133.3333333334, ans=0.0 2023-11-19 17:24:23,859 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.min_positive, batch_count=738200.0, ans=0.05 2023-11-19 17:24:38,671 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 110750 2023-11-19 17:24:53,074 INFO [train_asr.py:1262] (1/4) Epoch 10, batch 2550, loss[loss=0.07796, simple_loss=0.09994, pruned_loss=0.02068, audio_tagging_loss=0.007314, over 16121.00 frames. ], tot_loss[loss=0.08567, simple_loss=0.1043, pruned_loss=0.02291, audio_tagging_loss=0.01062, over 3043369.46 frames. ], batch size: 62, lr: 7.04e-03, grad_scale: 16.0 2023-11-19 17:24:57,233 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=738400.0, ans=0.125 2023-11-19 17:25:28,120 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=512, metric=9.85 vs. limit=15.0 2023-11-19 17:25:41,435 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=738600.0, ans=0.125 2023-11-19 17:25:42,424 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 110800 2023-11-19 17:25:58,238 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=10.68 vs. 
limit=15.0 2023-11-19 17:26:00,189 INFO [train_asr.py:1262] (1/4) Epoch 10, batch 2600, loss[loss=0.08864, simple_loss=0.1012, pruned_loss=0.02677, audio_tagging_loss=0.01126, over 15053.00 frames. ], tot_loss[loss=0.08585, simple_loss=0.1044, pruned_loss=0.02306, audio_tagging_loss=0.01059, over 3047326.18 frames. ], batch size: 59, lr: 7.04e-03, grad_scale: 16.0 2023-11-19 17:26:16,156 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.092e+01 8.270e+01 8.898e+01 9.575e+01 2.029e+02, threshold=1.780e+02, percent-clipped=1.0 2023-11-19 17:26:17,686 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=738800.0, ans=0.2 2023-11-19 17:26:18,936 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=738800.0, ans=0.125 2023-11-19 17:26:32,930 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=13.01 vs. limit=15.0 2023-11-19 17:26:40,770 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=738933.3333333334, ans=0.125 2023-11-19 17:26:49,064 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 110850 2023-11-19 17:27:03,855 INFO [train_asr.py:1262] (1/4) Epoch 10, batch 2650, loss[loss=0.07551, simple_loss=0.0975, pruned_loss=0.01872, audio_tagging_loss=0.008039, over 14627.00 frames. ], tot_loss[loss=0.08617, simple_loss=0.105, pruned_loss=0.02325, audio_tagging_loss=0.01039, over 3052006.51 frames. ], batch size: 54, lr: 7.04e-03, grad_scale: 16.0 2023-11-19 17:27:25,294 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=18.77 vs. limit=22.5 2023-11-19 17:27:37,342 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=739200.0, ans=0.2 2023-11-19 17:27:42,522 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=12.78 vs. limit=15.0 2023-11-19 17:27:49,477 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=739266.6666666666, ans=0.125 2023-11-19 17:27:53,014 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 110900 2023-11-19 17:28:07,634 INFO [train_asr.py:1262] (1/4) Epoch 10, batch 2700, loss[loss=0.07606, simple_loss=0.08878, pruned_loss=0.02012, audio_tagging_loss=0.01155, over 15119.00 frames. ], tot_loss[loss=0.08552, simple_loss=0.1043, pruned_loss=0.02306, audio_tagging_loss=0.0103, over 3055920.62 frames. ], batch size: 58, lr: 7.04e-03, grad_scale: 16.0 2023-11-19 17:28:25,562 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.580e+01 8.552e+01 9.403e+01 1.042e+02 1.397e+02, threshold=1.881e+02, percent-clipped=0.0 2023-11-19 17:28:30,584 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.82 vs. 
limit=6.0 2023-11-19 17:28:37,407 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=739533.3333333334, ans=0.0 2023-11-19 17:28:49,995 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=739600.0, ans=0.125 2023-11-19 17:28:57,040 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 110950 2023-11-19 17:28:57,718 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=10.93 vs. limit=15.0 2023-11-19 17:28:59,694 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=739666.6666666666, ans=0.1 2023-11-19 17:29:07,969 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=12.69 vs. limit=15.0 2023-11-19 17:29:11,965 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=739733.3333333334, ans=0.0 2023-11-19 17:29:12,987 INFO [train_asr.py:1262] (1/4) Epoch 10, batch 2750, loss[loss=0.08039, simple_loss=0.1004, pruned_loss=0.02064, audio_tagging_loss=0.009527, over 14997.00 frames. ], tot_loss[loss=0.08534, simple_loss=0.1042, pruned_loss=0.02294, audio_tagging_loss=0.01028, over 3052406.51 frames. ], batch size: 55, lr: 7.04e-03, grad_scale: 16.0 2023-11-19 17:29:14,522 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=739733.3333333334, ans=0.125 2023-11-19 17:30:01,060 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 111000 2023-11-19 17:30:06,991 WARNING [train_asr.py:1506] (1/4) Exclude cut with ID unbalanced/IMdT8_tuNp0_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-19 17:30:11,092 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.43 vs. limit=6.0 2023-11-19 17:30:16,674 INFO [train_asr.py:1262] (1/4) Epoch 10, batch 2800, loss[loss=0.08525, simple_loss=0.1081, pruned_loss=0.02102, audio_tagging_loss=0.01018, over 15521.00 frames. ], tot_loss[loss=0.08518, simple_loss=0.1038, pruned_loss=0.02296, audio_tagging_loss=0.01034, over 3050305.73 frames. 
], batch size: 58, lr: 7.03e-03, grad_scale: 32.0 2023-11-19 17:30:32,825 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.580e+01 8.267e+01 8.759e+01 9.728e+01 1.191e+02, threshold=1.752e+02, percent-clipped=0.0 2023-11-19 17:30:35,752 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=740133.3333333334, ans=0.1 2023-11-19 17:30:41,816 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.min_positive, batch_count=740200.0, ans=0.05 2023-11-19 17:30:46,064 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=740200.0, ans=0.125 2023-11-19 17:30:48,134 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=740200.0, ans=0.1 2023-11-19 17:31:05,936 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 111050 2023-11-19 17:31:13,442 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=740333.3333333334, ans=0.125 2023-11-19 17:31:20,587 INFO [train_asr.py:1262] (1/4) Epoch 10, batch 2850, loss[loss=0.07067, simple_loss=0.08653, pruned_loss=0.01591, audio_tagging_loss=0.0115, over 14974.00 frames. ], tot_loss[loss=0.0848, simple_loss=0.1035, pruned_loss=0.02274, audio_tagging_loss=0.01033, over 3056093.28 frames. ], batch size: 61, lr: 7.03e-03, grad_scale: 32.0 2023-11-19 17:31:23,459 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=740400.0, ans=0.125 2023-11-19 17:31:59,941 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=740600.0, ans=0.125 2023-11-19 17:32:03,390 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=740600.0, ans=0.125 2023-11-19 17:32:09,365 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 111100 2023-11-19 17:32:09,468 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=740600.0, ans=0.125 2023-11-19 17:32:16,793 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=740666.6666666666, ans=0.125 2023-11-19 17:32:19,347 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=740666.6666666666, ans=0.0 2023-11-19 17:32:25,193 INFO [train_asr.py:1262] (1/4) Epoch 10, batch 2900, loss[loss=0.09, simple_loss=0.1149, pruned_loss=0.02525, audio_tagging_loss=0.007279, over 15154.00 frames. ], tot_loss[loss=0.08495, simple_loss=0.104, pruned_loss=0.02266, audio_tagging_loss=0.01029, over 3056369.08 frames. 
], batch size: 56, lr: 7.03e-03, grad_scale: 16.0 2023-11-19 17:32:43,300 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.731e+01 8.402e+01 9.299e+01 9.982e+01 1.196e+02, threshold=1.860e+02, percent-clipped=0.0 2023-11-19 17:32:47,497 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=740800.0, ans=0.125 2023-11-19 17:32:49,740 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=740866.6666666666, ans=0.07 2023-11-19 17:32:56,194 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=11.08 vs. limit=15.0 2023-11-19 17:33:07,596 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=12.11 vs. limit=15.0 2023-11-19 17:33:10,356 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=740933.3333333334, ans=0.1 2023-11-19 17:33:14,512 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 111150 2023-11-19 17:33:17,092 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=741000.0, ans=0.125 2023-11-19 17:33:19,632 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=741000.0, ans=0.125 2023-11-19 17:33:21,879 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=741000.0, ans=0.125 2023-11-19 17:33:28,971 INFO [train_asr.py:1262] (1/4) Epoch 10, batch 2950, loss[loss=0.09657, simple_loss=0.1256, pruned_loss=0.02633, audio_tagging_loss=0.007419, over 14810.00 frames. ], tot_loss[loss=0.08447, simple_loss=0.1035, pruned_loss=0.02232, audio_tagging_loss=0.0104, over 3050593.93 frames. ], batch size: 54, lr: 7.03e-03, grad_scale: 16.0 2023-11-19 17:33:31,258 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=741066.6666666666, ans=0.0 2023-11-19 17:33:44,692 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=741133.3333333334, ans=0.07 2023-11-19 17:34:10,176 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=741266.6666666666, ans=0.0 2023-11-19 17:34:17,243 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 111200 2023-11-19 17:34:28,362 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=741333.3333333334, ans=0.0 2023-11-19 17:34:28,547 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=8.05 vs. limit=15.0 2023-11-19 17:34:33,017 INFO [train_asr.py:1262] (1/4) Epoch 10, batch 3000, loss[loss=0.08383, simple_loss=0.09176, pruned_loss=0.02682, audio_tagging_loss=0.01114, over 15197.00 frames. ], tot_loss[loss=0.08525, simple_loss=0.104, pruned_loss=0.02274, audio_tagging_loss=0.01052, over 3048331.93 frames. 
], batch size: 56, lr: 7.03e-03, grad_scale: 16.0 2023-11-19 17:34:33,018 INFO [train_asr.py:1285] (1/4) Computing validation loss 2023-11-19 17:34:52,755 INFO [zipformer.py:1873] (1/4) name=encoder.encoders.1.encoder.layers.0.self_attn_weights, attn_weights_entropy = tensor([5.0515, 4.8211, 4.4027, 4.4791], device='cuda:1') 2023-11-19 17:35:02,762 INFO [zipformer.py:1873] (1/4) name=encoder.encoders.5.encoder.layers.0.self_attn_weights, attn_weights_entropy = tensor([5.1728, 2.3112, 5.0447, 2.4378], device='cuda:1') 2023-11-19 17:35:14,010 INFO [train_asr.py:1294] (1/4) Epoch 10, validation: loss=0.06437, simple_loss=0.0554, pruned_loss=0.006444, audio_tagging_loss=0.03022, over 4681554.00 frames. 2023-11-19 17:35:14,011 INFO [train_asr.py:1295] (1/4) Maximum memory allocated so far is 26082MB 2023-11-19 17:35:16,839 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=741400.0, ans=0.125 2023-11-19 17:35:31,930 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.687e+01 8.390e+01 9.154e+01 1.009e+02 1.642e+02, threshold=1.831e+02, percent-clipped=0.0 2023-11-19 17:35:32,217 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=741466.6666666666, ans=0.125 2023-11-19 17:35:48,370 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=741533.3333333334, ans=0.0 2023-11-19 17:35:54,735 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=4.01 vs. limit=15.0 2023-11-19 17:36:03,177 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 111250 2023-11-19 17:36:03,243 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=741600.0, ans=0.125 2023-11-19 17:36:03,260 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=741600.0, ans=0.0 2023-11-19 17:36:06,014 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=741666.6666666666, ans=0.125 2023-11-19 17:36:17,864 INFO [train_asr.py:1262] (1/4) Epoch 10, batch 3050, loss[loss=0.08995, simple_loss=0.1211, pruned_loss=0.02067, audio_tagging_loss=0.008734, over 14667.00 frames. ], tot_loss[loss=0.08531, simple_loss=0.1042, pruned_loss=0.02267, audio_tagging_loss=0.01056, over 3050465.05 frames. ], batch size: 54, lr: 7.03e-03, grad_scale: 16.0 2023-11-19 17:36:51,573 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=741866.6666666666, ans=0.035 2023-11-19 17:36:55,106 WARNING [train_asr.py:1506] (1/4) Exclude cut with ID unbalanced/h0neUGB6j_g_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. 
Number of tokens: 24 2023-11-19 17:37:05,142 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=741933.3333333334, ans=0.125 2023-11-19 17:37:06,320 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 111300 2023-11-19 17:37:18,602 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=742000.0, ans=0.125 2023-11-19 17:37:21,845 INFO [train_asr.py:1262] (1/4) Epoch 10, batch 3100, loss[loss=0.09779, simple_loss=0.1235, pruned_loss=0.02576, audio_tagging_loss=0.01027, over 16097.00 frames. ], tot_loss[loss=0.08516, simple_loss=0.104, pruned_loss=0.02256, audio_tagging_loss=0.0106, over 3049794.51 frames. ], batch size: 61, lr: 7.02e-03, grad_scale: 16.0 2023-11-19 17:37:24,655 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=742066.6666666666, ans=0.0 2023-11-19 17:37:29,652 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=742066.6666666666, ans=0.125 2023-11-19 17:37:40,858 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.012e+01 8.266e+01 9.120e+01 9.877e+01 1.232e+02, threshold=1.824e+02, percent-clipped=0.0 2023-11-19 17:37:51,187 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=742200.0, ans=0.125 2023-11-19 17:38:02,394 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=742266.6666666666, ans=0.125 2023-11-19 17:38:02,414 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=742266.6666666666, ans=0.125 2023-11-19 17:38:11,332 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 111350 2023-11-19 17:38:18,978 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=742333.3333333334, ans=0.2 2023-11-19 17:38:26,715 INFO [train_asr.py:1262] (1/4) Epoch 10, batch 3150, loss[loss=0.08665, simple_loss=0.1176, pruned_loss=0.01776, audio_tagging_loss=0.01009, over 15254.00 frames. ], tot_loss[loss=0.08585, simple_loss=0.1046, pruned_loss=0.02289, audio_tagging_loss=0.01065, over 3049528.55 frames. 
], batch size: 56, lr: 7.02e-03, grad_scale: 16.0 2023-11-19 17:38:30,183 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=742400.0, ans=0.125 2023-11-19 17:38:47,798 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.max_abs, batch_count=742466.6666666666, ans=10.0 2023-11-19 17:39:13,898 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=742600.0, ans=0.125 2023-11-19 17:39:15,718 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 111400 2023-11-19 17:39:22,732 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=742666.6666666666, ans=0.0 2023-11-19 17:39:27,456 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=742666.6666666666, ans=0.125 2023-11-19 17:39:32,230 INFO [train_asr.py:1262] (1/4) Epoch 10, batch 3200, loss[loss=0.07617, simple_loss=0.09379, pruned_loss=0.02013, audio_tagging_loss=0.009144, over 15200.00 frames. ], tot_loss[loss=0.086, simple_loss=0.1046, pruned_loss=0.02303, audio_tagging_loss=0.01065, over 3040932.86 frames. ], batch size: 59, lr: 7.02e-03, grad_scale: 32.0 2023-11-19 17:39:34,991 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=742733.3333333334, ans=0.125 2023-11-19 17:39:41,136 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=742733.3333333334, ans=0.125 2023-11-19 17:39:50,362 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.737e+01 8.277e+01 9.297e+01 1.012e+02 1.250e+02, threshold=1.859e+02, percent-clipped=0.0 2023-11-19 17:40:02,024 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=742866.6666666666, ans=0.1 2023-11-19 17:40:17,241 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=742933.3333333334, ans=0.0 2023-11-19 17:40:22,048 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 111450 2023-11-19 17:40:25,061 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=512, metric=7.84 vs. limit=15.0 2023-11-19 17:40:25,822 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.min_positive, batch_count=743000.0, ans=0.05 2023-11-19 17:40:28,454 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=3.209e-01 2023-11-19 17:40:29,820 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=3.67 vs. limit=10.0 2023-11-19 17:40:37,264 INFO [train_asr.py:1262] (1/4) Epoch 10, batch 3250, loss[loss=0.06265, simple_loss=0.0747, pruned_loss=0.01286, audio_tagging_loss=0.01244, over 16142.00 frames. ], tot_loss[loss=0.0858, simple_loss=0.1045, pruned_loss=0.0228, audio_tagging_loss=0.01075, over 3045660.41 frames. 
], batch size: 63, lr: 7.02e-03, grad_scale: 32.0 2023-11-19 17:41:12,973 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=743200.0, ans=0.125 2023-11-19 17:41:25,802 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 111500 2023-11-19 17:41:40,587 INFO [train_asr.py:1262] (1/4) Epoch 10, batch 3300, loss[loss=0.1022, simple_loss=0.1245, pruned_loss=0.03101, audio_tagging_loss=0.008969, over 15604.00 frames. ], tot_loss[loss=0.08514, simple_loss=0.1035, pruned_loss=0.02242, audio_tagging_loss=0.01099, over 3042799.95 frames. ], batch size: 58, lr: 7.02e-03, grad_scale: 16.0 2023-11-19 17:42:00,055 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.519e+01 8.050e+01 8.952e+01 9.862e+01 1.284e+02, threshold=1.790e+02, percent-clipped=0.0 2023-11-19 17:42:15,846 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=743533.3333333334, ans=0.125 2023-11-19 17:42:15,858 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=743533.3333333334, ans=0.2 2023-11-19 17:42:20,739 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=743600.0, ans=0.1 2023-11-19 17:42:26,032 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=743600.0, ans=0.0 2023-11-19 17:42:29,489 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 111550 2023-11-19 17:42:36,998 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=743666.6666666666, ans=0.0 2023-11-19 17:42:45,501 INFO [train_asr.py:1262] (1/4) Epoch 10, batch 3350, loss[loss=0.07465, simple_loss=0.08454, pruned_loss=0.02324, audio_tagging_loss=0.009139, over 15240.00 frames. ], tot_loss[loss=0.08562, simple_loss=0.1041, pruned_loss=0.02276, audio_tagging_loss=0.01082, over 3040976.64 frames. ], batch size: 58, lr: 7.02e-03, grad_scale: 16.0 2023-11-19 17:42:45,914 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=743733.3333333334, ans=0.2 2023-11-19 17:42:56,473 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=8.45 vs. limit=10.0 2023-11-19 17:42:57,220 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=743800.0, ans=0.125 2023-11-19 17:43:33,674 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=743933.3333333334, ans=0.125 2023-11-19 17:43:33,892 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=12.57 vs. limit=15.0 2023-11-19 17:43:34,580 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 111600 2023-11-19 17:43:50,273 INFO [train_asr.py:1262] (1/4) Epoch 10, batch 3400, loss[loss=0.09279, simple_loss=0.1131, pruned_loss=0.02842, audio_tagging_loss=0.007839, over 15239.00 frames. ], tot_loss[loss=0.08525, simple_loss=0.1038, pruned_loss=0.02272, audio_tagging_loss=0.01063, over 3041078.14 frames. 
], batch size: 58, lr: 7.01e-03, grad_scale: 16.0 2023-11-19 17:43:50,545 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=744066.6666666666, ans=0.125 2023-11-19 17:43:55,368 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=744066.6666666666, ans=0.1 2023-11-19 17:44:09,437 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.227e+01 8.409e+01 9.014e+01 1.006e+02 1.399e+02, threshold=1.803e+02, percent-clipped=0.0 2023-11-19 17:44:13,728 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=11.95 vs. limit=15.0 2023-11-19 17:44:14,745 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=744133.3333333334, ans=0.125 2023-11-19 17:44:19,881 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=744200.0, ans=0.0 2023-11-19 17:44:27,116 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=744200.0, ans=0.125 2023-11-19 17:44:30,810 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=744266.6666666666, ans=0.125 2023-11-19 17:44:33,248 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=744266.6666666666, ans=0.0 2023-11-19 17:44:39,208 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 111650 2023-11-19 17:44:41,951 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.min_positive, batch_count=744333.3333333334, ans=0.05 2023-11-19 17:44:54,718 INFO [train_asr.py:1262] (1/4) Epoch 10, batch 3450, loss[loss=0.0506, simple_loss=0.06182, pruned_loss=0.009632, audio_tagging_loss=0.01006, over 14243.00 frames. ], tot_loss[loss=0.08515, simple_loss=0.1039, pruned_loss=0.02264, audio_tagging_loss=0.01054, over 3046355.10 frames. ], batch size: 53, lr: 7.01e-03, grad_scale: 16.0 2023-11-19 17:45:13,799 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=744466.6666666666, ans=0.125 2023-11-19 17:45:21,541 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=8.02 vs. limit=15.0 2023-11-19 17:45:21,640 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=5.79 vs. limit=10.0 2023-11-19 17:45:32,136 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=744600.0, ans=0.0 2023-11-19 17:45:43,779 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 111700 2023-11-19 17:45:59,669 INFO [train_asr.py:1262] (1/4) Epoch 10, batch 3500, loss[loss=0.09654, simple_loss=0.1291, pruned_loss=0.02346, audio_tagging_loss=0.00851, over 15128.00 frames. ], tot_loss[loss=0.08531, simple_loss=0.1044, pruned_loss=0.02259, audio_tagging_loss=0.01054, over 3048711.30 frames. 
], batch size: 56, lr: 7.01e-03, grad_scale: 16.0 2023-11-19 17:46:02,907 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=9.25 vs. limit=10.0 2023-11-19 17:46:14,012 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.73 vs. limit=6.0 2023-11-19 17:46:18,196 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.875e+01 8.242e+01 8.864e+01 9.843e+01 1.271e+02, threshold=1.773e+02, percent-clipped=0.0 2023-11-19 17:46:31,138 WARNING [train_asr.py:1506] (1/4) Exclude cut with ID unbalanced/DdDpuDqOyrA_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-19 17:46:46,295 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=744933.3333333334, ans=0.1 2023-11-19 17:46:48,452 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 111750 2023-11-19 17:46:57,438 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=745000.0, ans=0.0 2023-11-19 17:47:00,005 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=745000.0, ans=0.0 2023-11-19 17:47:00,987 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=745000.0, ans=0.125 2023-11-19 17:47:03,410 INFO [train_asr.py:1262] (1/4) Epoch 10, batch 3550, loss[loss=0.09059, simple_loss=0.1152, pruned_loss=0.02397, audio_tagging_loss=0.009012, over 14719.00 frames. ], tot_loss[loss=0.08506, simple_loss=0.1038, pruned_loss=0.02259, audio_tagging_loss=0.01058, over 3038500.32 frames. ], batch size: 56, lr: 7.01e-03, grad_scale: 16.0 2023-11-19 17:47:07,872 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=8.83 vs. limit=10.0 2023-11-19 17:47:37,598 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer_ff3.min_abs, batch_count=745200.0, ans=0.2 2023-11-19 17:47:52,403 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 111800 2023-11-19 17:48:07,834 INFO [train_asr.py:1262] (1/4) Epoch 10, batch 3600, loss[loss=0.0897, simple_loss=0.1159, pruned_loss=0.02163, audio_tagging_loss=0.01012, over 15325.00 frames. ], tot_loss[loss=0.08457, simple_loss=0.1034, pruned_loss=0.02243, audio_tagging_loss=0.01045, over 3043333.68 frames. 
], batch size: 59, lr: 7.01e-03, grad_scale: 32.0 2023-11-19 17:48:28,207 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.581e+01 8.234e+01 9.119e+01 9.988e+01 1.352e+02, threshold=1.824e+02, percent-clipped=0.0 2023-11-19 17:48:31,102 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=745466.6666666666, ans=0.1 2023-11-19 17:48:48,337 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=745600.0, ans=0.0 2023-11-19 17:48:56,760 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 111850 2023-11-19 17:48:59,390 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=745666.6666666666, ans=0.125 2023-11-19 17:49:05,853 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=745666.6666666666, ans=0.09899494936611666 2023-11-19 17:49:07,745 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=745666.6666666666, ans=0.2 2023-11-19 17:49:13,514 INFO [train_asr.py:1262] (1/4) Epoch 10, batch 3650, loss[loss=0.07751, simple_loss=0.0859, pruned_loss=0.02584, audio_tagging_loss=0.008723, over 14113.00 frames. ], tot_loss[loss=0.08538, simple_loss=0.1044, pruned_loss=0.02288, audio_tagging_loss=0.01028, over 3047818.44 frames. ], batch size: 53, lr: 7.01e-03, grad_scale: 32.0 2023-11-19 17:49:13,849 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=745733.3333333334, ans=0.125 2023-11-19 17:49:16,169 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=2.912e-02 2023-11-19 17:49:27,255 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=745800.0, ans=0.07 2023-11-19 17:49:32,201 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=745800.0, ans=0.125 2023-11-19 17:49:42,107 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=745866.6666666666, ans=0.2 2023-11-19 17:50:02,637 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 111900 2023-11-19 17:50:16,596 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=746066.6666666666, ans=0.1 2023-11-19 17:50:17,559 INFO [train_asr.py:1262] (1/4) Epoch 10, batch 3700, loss[loss=0.1334, simple_loss=0.1739, pruned_loss=0.03885, audio_tagging_loss=0.007625, over 15381.00 frames. ], tot_loss[loss=0.08567, simple_loss=0.1051, pruned_loss=0.02287, audio_tagging_loss=0.01025, over 3052688.34 frames. ], batch size: 56, lr: 7.01e-03, grad_scale: 32.0 2023-11-19 17:50:25,030 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=746066.6666666666, ans=0.125 2023-11-19 17:50:29,401 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.whiten, num_groups=1, num_channels=192, metric=4.20 vs. 
limit=12.0 2023-11-19 17:50:31,339 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=746133.3333333334, ans=0.125 2023-11-19 17:50:35,743 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.943e+01 8.609e+01 9.395e+01 1.090e+02 1.567e+02, threshold=1.879e+02, percent-clipped=0.0 2023-11-19 17:50:45,244 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=11.11 vs. limit=15.0 2023-11-19 17:50:45,893 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=746200.0, ans=0.125 2023-11-19 17:51:05,350 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 111950 2023-11-19 17:51:18,308 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.out_whiten.whitening_limit, batch_count=746333.3333333334, ans=15.0 2023-11-19 17:51:20,090 INFO [train_asr.py:1262] (1/4) Epoch 10, batch 3750, loss[loss=0.1099, simple_loss=0.1418, pruned_loss=0.0316, audio_tagging_loss=0.007391, over 15531.00 frames. ], tot_loss[loss=0.08631, simple_loss=0.1057, pruned_loss=0.02313, audio_tagging_loss=0.0103, over 3047215.39 frames. ], batch size: 54, lr: 7.00e-03, grad_scale: 32.0 2023-11-19 17:51:24,446 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=14.83 vs. limit=15.0 2023-11-19 17:51:27,785 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=746400.0, ans=0.125 2023-11-19 17:51:34,412 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=746466.6666666666, ans=0.125 2023-11-19 17:51:47,154 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=10.70 vs. limit=22.5 2023-11-19 17:51:47,988 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=746533.3333333334, ans=0.125 2023-11-19 17:52:03,703 WARNING [train_asr.py:1506] (1/4) Exclude cut with ID unbalanced/ZY_Bsi-RNuk_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-19 17:52:08,664 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 112000 2023-11-19 17:52:17,222 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=746666.6666666666, ans=0.125 2023-11-19 17:52:25,159 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=11.58 vs. limit=15.0 2023-11-19 17:52:28,342 INFO [train_asr.py:1262] (1/4) Epoch 10, batch 3800, loss[loss=0.09852, simple_loss=0.124, pruned_loss=0.02645, audio_tagging_loss=0.01009, over 15977.00 frames. ], tot_loss[loss=0.08535, simple_loss=0.1043, pruned_loss=0.02275, audio_tagging_loss=0.01043, over 3050664.77 frames. 
], batch size: 59, lr: 7.00e-03, grad_scale: 32.0 2023-11-19 17:52:43,182 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=746800.0, ans=0.125 2023-11-19 17:52:46,635 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.263e+01 8.531e+01 9.323e+01 1.047e+02 1.478e+02, threshold=1.865e+02, percent-clipped=0.0 2023-11-19 17:52:49,521 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=746800.0, ans=0.2 2023-11-19 17:53:00,679 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=746866.6666666666, ans=0.125 2023-11-19 17:53:16,982 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 112050 2023-11-19 17:53:29,217 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=747000.0, ans=0.1 2023-11-19 17:53:31,504 INFO [train_asr.py:1262] (1/4) Epoch 10, batch 3850, loss[loss=0.07144, simple_loss=0.08822, pruned_loss=0.01483, audio_tagging_loss=0.01249, over 15340.00 frames. ], tot_loss[loss=0.08571, simple_loss=0.1049, pruned_loss=0.02278, audio_tagging_loss=0.01049, over 3051242.81 frames. ], batch size: 60, lr: 7.00e-03, grad_scale: 32.0 2023-11-19 17:53:34,313 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=747066.6666666666, ans=0.1 2023-11-19 17:53:45,488 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=747133.3333333334, ans=0.2 2023-11-19 17:54:20,375 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 112100 2023-11-19 17:54:34,309 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=747400.0, ans=0.0 2023-11-19 17:54:35,289 INFO [train_asr.py:1262] (1/4) Epoch 10, batch 3900, loss[loss=0.08768, simple_loss=0.1001, pruned_loss=0.02149, audio_tagging_loss=0.01614, over 14468.00 frames. ], tot_loss[loss=0.08647, simple_loss=0.106, pruned_loss=0.02298, audio_tagging_loss=0.01046, over 3050381.21 frames. ], batch size: 53, lr: 7.00e-03, grad_scale: 32.0 2023-11-19 17:54:36,951 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-19 17:54:39,311 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=747400.0, ans=0.2 2023-11-19 17:54:48,771 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=4.95 vs. 
limit=12.0 2023-11-19 17:54:55,654 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.123e+01 8.433e+01 9.481e+01 1.017e+02 1.565e+02, threshold=1.896e+02, percent-clipped=0.0 2023-11-19 17:55:06,965 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=747533.3333333334, ans=0.0 2023-11-19 17:55:18,302 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=747600.0, ans=0.125 2023-11-19 17:55:19,533 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=747600.0, ans=0.0 2023-11-19 17:55:24,407 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 112150 2023-11-19 17:55:40,522 INFO [train_asr.py:1262] (1/4) Epoch 10, batch 3950, loss[loss=0.1136, simple_loss=0.14, pruned_loss=0.03441, audio_tagging_loss=0.009143, over 15852.00 frames. ], tot_loss[loss=0.08656, simple_loss=0.1059, pruned_loss=0.02304, audio_tagging_loss=0.01056, over 3052196.60 frames. ], batch size: 57, lr: 7.00e-03, grad_scale: 32.0 2023-11-19 17:55:40,806 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer_na.min_abs, batch_count=747733.3333333334, ans=0.02 2023-11-19 17:55:50,153 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=747733.3333333334, ans=0.0 2023-11-19 17:55:57,417 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=747800.0, ans=0.0 2023-11-19 17:56:01,133 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=747800.0, ans=0.125 2023-11-19 17:56:29,180 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 112200 2023-11-19 17:56:29,366 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=747933.3333333334, ans=0.125 2023-11-19 17:56:33,458 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=748000.0, ans=0.1 2023-11-19 17:56:33,471 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=748000.0, ans=0.0 2023-11-19 17:56:43,780 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=748066.6666666666, ans=0.125 2023-11-19 17:56:44,797 INFO [train_asr.py:1262] (1/4) Epoch 10, batch 4000, loss[loss=0.07219, simple_loss=0.08326, pruned_loss=0.01751, audio_tagging_loss=0.01305, over 15340.00 frames. ], tot_loss[loss=0.08633, simple_loss=0.1055, pruned_loss=0.02289, audio_tagging_loss=0.01069, over 3045204.18 frames. 
], batch size: 58, lr: 7.00e-03, grad_scale: 32.0 2023-11-19 17:56:49,958 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=748066.6666666666, ans=0.2 2023-11-19 17:56:52,418 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=748066.6666666666, ans=0.2 2023-11-19 17:57:04,930 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.918e+01 8.458e+01 9.188e+01 1.037e+02 1.473e+02, threshold=1.838e+02, percent-clipped=0.0 2023-11-19 17:57:10,683 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.74 vs. limit=6.0 2023-11-19 17:57:21,249 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=748200.0, ans=0.035 2023-11-19 17:57:33,082 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=748266.6666666666, ans=0.07 2023-11-19 17:57:34,115 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 112250 2023-11-19 17:57:40,463 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=748333.3333333334, ans=0.0 2023-11-19 17:57:48,612 INFO [train_asr.py:1262] (1/4) Epoch 10, batch 4050, loss[loss=0.07977, simple_loss=0.1005, pruned_loss=0.01944, audio_tagging_loss=0.01009, over 14735.00 frames. ], tot_loss[loss=0.08659, simple_loss=0.1057, pruned_loss=0.02306, audio_tagging_loss=0.01067, over 3043945.20 frames. ], batch size: 57, lr: 6.99e-03, grad_scale: 32.0 2023-11-19 17:57:51,073 WARNING [train_asr.py:1506] (1/4) Exclude cut with ID unbalanced/-7b0f9TyPFU_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-19 17:57:53,795 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=748400.0, ans=0.125 2023-11-19 17:58:34,969 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=748600.0, ans=0.1 2023-11-19 17:58:35,364 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=11.75 vs. limit=15.0 2023-11-19 17:58:37,377 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 112300 2023-11-19 17:58:42,438 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=748666.6666666666, ans=0.0 2023-11-19 17:58:52,495 INFO [train_asr.py:1262] (1/4) Epoch 10, batch 4100, loss[loss=0.07465, simple_loss=0.08214, pruned_loss=0.02069, audio_tagging_loss=0.01289, over 15157.00 frames. ], tot_loss[loss=0.08643, simple_loss=0.1055, pruned_loss=0.02298, audio_tagging_loss=0.01069, over 3046812.97 frames. 
], batch size: 58, lr: 6.99e-03, grad_scale: 32.0 2023-11-19 17:59:13,895 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.713e+01 8.246e+01 9.038e+01 9.964e+01 1.289e+02, threshold=1.808e+02, percent-clipped=0.0 2023-11-19 17:59:19,498 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=3.88 vs. limit=15.0 2023-11-19 17:59:36,598 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.min_abs, batch_count=748933.3333333334, ans=0.5 2023-11-19 17:59:40,371 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.03 vs. limit=6.0 2023-11-19 17:59:41,075 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=748933.3333333334, ans=0.125 2023-11-19 17:59:42,014 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 112350 2023-11-19 17:59:56,571 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=749066.6666666666, ans=0.2 2023-11-19 17:59:57,408 INFO [train_asr.py:1262] (1/4) Epoch 10, batch 4150, loss[loss=0.06828, simple_loss=0.08602, pruned_loss=0.0152, audio_tagging_loss=0.01007, over 16145.00 frames. ], tot_loss[loss=0.08647, simple_loss=0.1058, pruned_loss=0.02306, audio_tagging_loss=0.01052, over 3045990.25 frames. ], batch size: 60, lr: 6.99e-03, grad_scale: 32.0 2023-11-19 18:00:14,771 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=749133.3333333334, ans=0.1 2023-11-19 18:00:24,416 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=749200.0, ans=0.125 2023-11-19 18:00:25,666 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=749200.0, ans=0.0 2023-11-19 18:00:38,102 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=749266.6666666666, ans=0.1 2023-11-19 18:00:38,169 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=749266.6666666666, ans=0.125 2023-11-19 18:00:38,449 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=2.72 vs. limit=15.0 2023-11-19 18:00:40,550 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=749266.6666666666, ans=0.0 2023-11-19 18:00:43,988 WARNING [train_asr.py:1506] (1/4) Exclude cut with ID unbalanced/5BkClLNthIQ_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. 
Number of tokens: 24 2023-11-19 18:00:45,561 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=749266.6666666666, ans=0.125 2023-11-19 18:00:46,532 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 112400 2023-11-19 18:01:01,632 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=10.79 vs. limit=15.0 2023-11-19 18:01:01,959 INFO [train_asr.py:1262] (1/4) Epoch 10, batch 4200, loss[loss=0.08715, simple_loss=0.1115, pruned_loss=0.02221, audio_tagging_loss=0.009169, over 15641.00 frames. ], tot_loss[loss=0.08678, simple_loss=0.1065, pruned_loss=0.02319, audio_tagging_loss=0.01032, over 3049095.97 frames. ], batch size: 59, lr: 6.99e-03, grad_scale: 32.0 2023-11-19 18:01:04,883 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=749400.0, ans=0.125 2023-11-19 18:01:14,961 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=11.01 vs. limit=15.0 2023-11-19 18:01:23,161 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.124e+01 8.470e+01 8.967e+01 9.932e+01 1.345e+02, threshold=1.793e+02, percent-clipped=0.0 2023-11-19 18:01:50,892 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 112450 2023-11-19 18:02:05,493 INFO [train_asr.py:1262] (1/4) Epoch 10, batch 4250, loss[loss=0.05773, simple_loss=0.06699, pruned_loss=0.01055, audio_tagging_loss=0.01368, over 15103.00 frames. ], tot_loss[loss=0.08634, simple_loss=0.1058, pruned_loss=0.02313, audio_tagging_loss=0.01033, over 3041756.13 frames. ], batch size: 58, lr: 6.99e-03, grad_scale: 16.0 2023-11-19 18:02:30,553 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=749866.6666666666, ans=0.125 2023-11-19 18:02:37,293 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=749866.6666666666, ans=0.1 2023-11-19 18:02:42,461 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=12.18 vs. limit=15.0 2023-11-19 18:02:50,118 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.whiten, num_groups=1, num_channels=512, metric=10.37 vs. limit=12.0 2023-11-19 18:02:54,490 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 112500 2023-11-19 18:02:57,679 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=750000.0, ans=0.0 2023-11-19 18:02:59,675 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=750000.0, ans=0.125 2023-11-19 18:03:08,341 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=750000.0, ans=0.125 2023-11-19 18:03:10,406 INFO [train_asr.py:1262] (1/4) Epoch 10, batch 4300, loss[loss=0.08869, simple_loss=0.1024, pruned_loss=0.02578, audio_tagging_loss=0.01173, over 14243.00 frames. ], tot_loss[loss=0.08663, simple_loss=0.1061, pruned_loss=0.02328, audio_tagging_loss=0.01028, over 3043286.58 frames. 
], batch size: 56, lr: 6.99e-03, grad_scale: 16.0 2023-11-19 18:03:28,848 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module1.whiten, num_groups=1, num_channels=512, metric=4.30 vs. limit=15.0 2023-11-19 18:03:31,885 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.174e+01 8.838e+01 9.432e+01 1.009e+02 1.921e+02, threshold=1.886e+02, percent-clipped=1.0 2023-11-19 18:03:33,457 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=750133.3333333334, ans=0.09899494936611666 2023-11-19 18:03:33,530 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=750133.3333333334, ans=0.0 2023-11-19 18:03:37,098 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=750200.0, ans=0.125 2023-11-19 18:03:40,860 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=750200.0, ans=0.2 2023-11-19 18:03:59,280 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 112550 2023-11-19 18:04:06,858 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=750333.3333333334, ans=0.125 2023-11-19 18:04:14,900 INFO [train_asr.py:1262] (1/4) Epoch 10, batch 4350, loss[loss=0.0957, simple_loss=0.1153, pruned_loss=0.03042, audio_tagging_loss=0.00763, over 14875.00 frames. ], tot_loss[loss=0.08689, simple_loss=0.1069, pruned_loss=0.0233, audio_tagging_loss=0.01016, over 3044532.40 frames. ], batch size: 55, lr: 6.99e-03, grad_scale: 16.0 2023-11-19 18:04:24,230 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=11.05 vs. limit=15.0 2023-11-19 18:04:31,551 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.68 vs. limit=6.0 2023-11-19 18:04:33,409 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=750466.6666666666, ans=0.125 2023-11-19 18:04:46,758 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=750533.3333333334, ans=0.125 2023-11-19 18:04:49,815 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=12.12 vs. limit=15.0 2023-11-19 18:05:03,448 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 112600 2023-11-19 18:05:09,102 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=750666.6666666666, ans=0.125 2023-11-19 18:05:18,580 INFO [train_asr.py:1262] (1/4) Epoch 10, batch 4400, loss[loss=0.1072, simple_loss=0.1374, pruned_loss=0.02688, audio_tagging_loss=0.01164, over 15650.00 frames. ], tot_loss[loss=0.08744, simple_loss=0.1076, pruned_loss=0.02347, audio_tagging_loss=0.01018, over 3039953.23 frames. 
], batch size: 56, lr: 6.98e-03, grad_scale: 32.0 2023-11-19 18:05:21,367 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=750733.3333333334, ans=0.125 2023-11-19 18:05:33,668 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=750800.0, ans=0.125 2023-11-19 18:05:40,669 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.038e+01 8.205e+01 8.716e+01 9.862e+01 1.282e+02, threshold=1.743e+02, percent-clipped=0.0 2023-11-19 18:06:02,462 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=750933.3333333334, ans=0.125 2023-11-19 18:06:06,122 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=750933.3333333334, ans=0.125 2023-11-19 18:06:07,243 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 112650 2023-11-19 18:06:23,246 INFO [train_asr.py:1262] (1/4) Epoch 10, batch 4450, loss[loss=0.0851, simple_loss=0.1053, pruned_loss=0.02078, audio_tagging_loss=0.01168, over 16066.00 frames. ], tot_loss[loss=0.08779, simple_loss=0.1082, pruned_loss=0.02357, audio_tagging_loss=0.01011, over 3047979.31 frames. ], batch size: 59, lr: 6.98e-03, grad_scale: 32.0 2023-11-19 18:06:31,013 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=751066.6666666666, ans=0.0 2023-11-19 18:06:39,335 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=751133.3333333334, ans=0.125 2023-11-19 18:07:12,196 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 112700 2023-11-19 18:07:12,524 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=751266.6666666666, ans=0.0 2023-11-19 18:07:26,860 INFO [train_asr.py:1262] (1/4) Epoch 10, batch 4500, loss[loss=0.07991, simple_loss=0.09974, pruned_loss=0.01768, audio_tagging_loss=0.01237, over 16054.00 frames. ], tot_loss[loss=0.08795, simple_loss=0.1085, pruned_loss=0.02369, audio_tagging_loss=0.009979, over 3055731.93 frames. ], batch size: 58, lr: 6.98e-03, grad_scale: 32.0 2023-11-19 18:07:29,869 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=14.89 vs. limit=22.5 2023-11-19 18:07:35,076 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=751400.0, ans=0.2 2023-11-19 18:07:41,285 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=751466.6666666666, ans=0.125 2023-11-19 18:07:48,352 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.958e+01 8.242e+01 8.889e+01 9.724e+01 1.502e+02, threshold=1.778e+02, percent-clipped=0.0 2023-11-19 18:08:02,693 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=7.69 vs. limit=12.0 2023-11-19 18:08:05,334 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=7.97 vs. 
limit=15.0 2023-11-19 18:08:15,522 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 112750 2023-11-19 18:08:30,949 INFO [train_asr.py:1262] (1/4) Epoch 10, batch 4550, loss[loss=0.08763, simple_loss=0.1018, pruned_loss=0.02397, audio_tagging_loss=0.01275, over 14378.00 frames. ], tot_loss[loss=0.08719, simple_loss=0.1074, pruned_loss=0.02341, audio_tagging_loss=0.01005, over 3050628.94 frames. ], batch size: 55, lr: 6.98e-03, grad_scale: 32.0 2023-11-19 18:08:36,015 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=751733.3333333334, ans=0.0 2023-11-19 18:08:41,110 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=751733.3333333334, ans=0.125 2023-11-19 18:08:44,875 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=751800.0, ans=0.0 2023-11-19 18:08:55,874 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=751866.6666666666, ans=0.125 2023-11-19 18:09:16,268 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=751933.3333333334, ans=0.125 2023-11-19 18:09:17,360 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=751933.3333333334, ans=0.125 2023-11-19 18:09:19,710 WARNING [train_asr.py:1506] (1/4) Exclude cut with ID unbalanced/_II2Klfnn4Y_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-19 18:09:19,793 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 112800 2023-11-19 18:09:34,997 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=752066.6666666666, ans=0.0 2023-11-19 18:09:36,063 INFO [train_asr.py:1262] (1/4) Epoch 10, batch 4600, loss[loss=0.08692, simple_loss=0.1154, pruned_loss=0.01942, audio_tagging_loss=0.009812, over 14937.00 frames. ], tot_loss[loss=0.0859, simple_loss=0.1055, pruned_loss=0.0229, audio_tagging_loss=0.01026, over 3052337.81 frames. ], batch size: 56, lr: 6.98e-03, grad_scale: 32.0 2023-11-19 18:09:52,328 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=752133.3333333334, ans=0.125 2023-11-19 18:09:56,965 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.492e+01 8.205e+01 8.855e+01 9.599e+01 1.553e+02, threshold=1.771e+02, percent-clipped=0.0 2023-11-19 18:10:07,234 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.03 vs. 
limit=6.0 2023-11-19 18:10:24,893 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 112850 2023-11-19 18:10:30,041 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=752333.3333333334, ans=0.0 2023-11-19 18:10:36,524 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=8.39 vs. limit=15.0 2023-11-19 18:10:38,451 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=752400.0, ans=0.1 2023-11-19 18:10:39,526 INFO [train_asr.py:1262] (1/4) Epoch 10, batch 4650, loss[loss=0.06508, simple_loss=0.07652, pruned_loss=0.0164, audio_tagging_loss=0.01042, over 14943.00 frames. ], tot_loss[loss=0.08595, simple_loss=0.1056, pruned_loss=0.02284, audio_tagging_loss=0.0103, over 3048503.56 frames. ], batch size: 58, lr: 6.98e-03, grad_scale: 16.0 2023-11-19 18:10:52,679 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.self_attn1.whiten, num_groups=1, num_channels=192, metric=12.59 vs. limit=22.5 2023-11-19 18:11:05,971 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=752533.3333333334, ans=0.1 2023-11-19 18:11:11,641 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=752533.3333333334, ans=0.07 2023-11-19 18:11:23,918 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=752600.0, ans=0.0 2023-11-19 18:11:28,744 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 112900 2023-11-19 18:11:30,229 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=752666.6666666666, ans=0.125 2023-11-19 18:11:43,072 INFO [train_asr.py:1262] (1/4) Epoch 10, batch 4700, loss[loss=0.0906, simple_loss=0.1147, pruned_loss=0.02422, audio_tagging_loss=0.009033, over 14916.00 frames. ], tot_loss[loss=0.0859, simple_loss=0.1051, pruned_loss=0.02286, audio_tagging_loss=0.0105, over 3046017.31 frames. ], batch size: 54, lr: 6.97e-03, grad_scale: 16.0 2023-11-19 18:11:54,349 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=752733.3333333334, ans=0.125 2023-11-19 18:12:06,753 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.467e+01 8.509e+01 9.287e+01 1.024e+02 1.353e+02, threshold=1.857e+02, percent-clipped=0.0 2023-11-19 18:12:11,448 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.33 vs. limit=6.0 2023-11-19 18:12:13,799 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=8.36 vs. 
limit=10.0 2023-11-19 18:12:19,420 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.min_positive, batch_count=752866.6666666666, ans=0.025 2023-11-19 18:12:31,872 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 112950 2023-11-19 18:12:40,230 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=753000.0, ans=0.125 2023-11-19 18:12:49,179 INFO [train_asr.py:1262] (1/4) Epoch 10, batch 4750, loss[loss=0.1028, simple_loss=0.1335, pruned_loss=0.02383, audio_tagging_loss=0.01218, over 15326.00 frames. ], tot_loss[loss=0.08612, simple_loss=0.1056, pruned_loss=0.02273, audio_tagging_loss=0.01057, over 3055259.51 frames. ], batch size: 55, lr: 6.97e-03, grad_scale: 16.0 2023-11-19 18:12:53,129 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=753066.6666666666, ans=0.125 2023-11-19 18:13:15,434 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=753200.0, ans=0.1 2023-11-19 18:13:17,645 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=753200.0, ans=0.125 2023-11-19 18:13:29,235 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=753266.6666666666, ans=0.125 2023-11-19 18:13:29,588 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=8.12 vs. limit=12.0 2023-11-19 18:13:37,922 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 113000 2023-11-19 18:13:53,001 INFO [train_asr.py:1262] (1/4) Epoch 10, batch 4800, loss[loss=0.07876, simple_loss=0.09762, pruned_loss=0.01898, audio_tagging_loss=0.01097, over 14893.00 frames. ], tot_loss[loss=0.08686, simple_loss=0.1059, pruned_loss=0.02316, audio_tagging_loss=0.01076, over 3060176.52 frames. ], batch size: 56, lr: 6.97e-03, grad_scale: 32.0 2023-11-19 18:13:55,758 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=753400.0, ans=0.2 2023-11-19 18:14:15,227 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.124e+01 8.505e+01 9.415e+01 1.014e+02 1.501e+02, threshold=1.883e+02, percent-clipped=0.0 2023-11-19 18:14:23,548 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=753533.3333333334, ans=0.125 2023-11-19 18:14:36,259 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=6.71 vs. limit=15.0 2023-11-19 18:14:38,019 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=753600.0, ans=0.2 2023-11-19 18:14:41,524 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 113050 2023-11-19 18:14:55,217 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.min_positive, batch_count=753733.3333333334, ans=0.05 2023-11-19 18:14:56,000 INFO [train_asr.py:1262] (1/4) Epoch 10, batch 4850, loss[loss=0.1142, simple_loss=0.1399, pruned_loss=0.03581, audio_tagging_loss=0.00844, over 15361.00 frames. 
], tot_loss[loss=0.08595, simple_loss=0.1048, pruned_loss=0.02271, audio_tagging_loss=0.01083, over 3052164.00 frames. ], batch size: 57, lr: 6.97e-03, grad_scale: 32.0 2023-11-19 18:14:58,995 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.93 vs. limit=6.0 2023-11-19 18:15:01,070 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=753733.3333333334, ans=0.125 2023-11-19 18:15:05,338 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=753733.3333333334, ans=0.07 2023-11-19 18:15:18,258 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=21.38 vs. limit=22.5 2023-11-19 18:15:18,963 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.max_abs, batch_count=753800.0, ans=10.0 2023-11-19 18:15:22,878 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=753866.6666666666, ans=0.125 2023-11-19 18:15:44,951 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 113100 2023-11-19 18:15:50,347 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=10.79 vs. limit=15.0 2023-11-19 18:15:56,791 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=754000.0, ans=0.2 2023-11-19 18:16:01,394 INFO [train_asr.py:1262] (1/4) Epoch 10, batch 4900, loss[loss=0.09388, simple_loss=0.1135, pruned_loss=0.0285, audio_tagging_loss=0.008615, over 15500.00 frames. ], tot_loss[loss=0.08653, simple_loss=0.1058, pruned_loss=0.02291, audio_tagging_loss=0.01073, over 3055020.56 frames. ], batch size: 58, lr: 6.97e-03, grad_scale: 32.0 2023-11-19 18:16:15,055 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.min_abs, batch_count=754133.3333333334, ans=0.5 2023-11-19 18:16:23,310 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.624e+01 8.196e+01 8.687e+01 9.230e+01 1.120e+02, threshold=1.737e+02, percent-clipped=0.0 2023-11-19 18:16:23,944 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=13.30 vs. limit=15.0 2023-11-19 18:16:24,951 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=754200.0, ans=0.2 2023-11-19 18:16:40,932 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=754266.6666666666, ans=0.07 2023-11-19 18:16:49,044 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=754266.6666666666, ans=0.0 2023-11-19 18:16:49,077 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=754266.6666666666, ans=0.0 2023-11-19 18:16:49,906 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 113150 2023-11-19 18:17:04,365 INFO [train_asr.py:1262] (1/4) Epoch 10, batch 4950, loss[loss=0.07527, simple_loss=0.08711, pruned_loss=0.02035, audio_tagging_loss=0.01136, over 15047.00 frames. 
], tot_loss[loss=0.08522, simple_loss=0.1043, pruned_loss=0.02256, audio_tagging_loss=0.01053, over 3044821.42 frames. ], batch size: 56, lr: 6.97e-03, grad_scale: 32.0 2023-11-19 18:17:11,889 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=754400.0, ans=0.0 2023-11-19 18:17:28,029 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=754533.3333333334, ans=0.2 2023-11-19 18:17:52,773 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 113200 2023-11-19 18:17:58,293 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=754666.6666666666, ans=0.1 2023-11-19 18:18:07,778 INFO [train_asr.py:1262] (1/4) Epoch 10, batch 5000, loss[loss=0.09801, simple_loss=0.115, pruned_loss=0.02453, audio_tagging_loss=0.01597, over 15541.00 frames. ], tot_loss[loss=0.08487, simple_loss=0.104, pruned_loss=0.02244, audio_tagging_loss=0.01044, over 3044766.79 frames. ], batch size: 56, lr: 6.97e-03, grad_scale: 32.0 2023-11-19 18:18:08,128 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=754733.3333333334, ans=0.2 2023-11-19 18:18:10,470 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-19 18:18:23,218 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=754800.0, ans=0.0 2023-11-19 18:18:31,439 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.773e+01 8.065e+01 8.852e+01 9.668e+01 1.212e+02, threshold=1.770e+02, percent-clipped=0.0 2023-11-19 18:18:46,406 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.min_abs, batch_count=754933.3333333334, ans=0.5 2023-11-19 18:18:55,830 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 113250 2023-11-19 18:19:11,134 INFO [train_asr.py:1262] (1/4) Epoch 10, batch 5050, loss[loss=0.1142, simple_loss=0.1414, pruned_loss=0.03357, audio_tagging_loss=0.009928, over 15557.00 frames. ], tot_loss[loss=0.08401, simple_loss=0.1028, pruned_loss=0.02223, audio_tagging_loss=0.01039, over 3045781.39 frames. ], batch size: 57, lr: 6.96e-03, grad_scale: 32.0 2023-11-19 18:19:14,091 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=755066.6666666666, ans=0.125 2023-11-19 18:19:21,048 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=14.09 vs. limit=15.0 2023-11-19 18:19:28,403 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=12.48 vs. 
limit=15.0 2023-11-19 18:19:30,503 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=755133.3333333334, ans=0.125 2023-11-19 18:19:41,381 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=755200.0, ans=0.0 2023-11-19 18:19:41,457 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=755200.0, ans=0.125 2023-11-19 18:19:56,034 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=755266.6666666666, ans=0.125 2023-11-19 18:19:59,442 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 113300 2023-11-19 18:20:05,617 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=755333.3333333334, ans=0.0 2023-11-19 18:20:09,083 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=8.34 vs. limit=15.0 2023-11-19 18:20:15,875 INFO [train_asr.py:1262] (1/4) Epoch 10, batch 5100, loss[loss=0.08772, simple_loss=0.09856, pruned_loss=0.0265, audio_tagging_loss=0.01194, over 14824.00 frames. ], tot_loss[loss=0.08289, simple_loss=0.1016, pruned_loss=0.0217, audio_tagging_loss=0.01037, over 3039710.61 frames. ], batch size: 55, lr: 6.96e-03, grad_scale: 32.0 2023-11-19 18:20:25,965 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.min_positive, batch_count=755400.0, ans=0.025 2023-11-19 18:20:30,797 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=755466.6666666666, ans=0.0 2023-11-19 18:20:38,098 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.347e+01 7.917e+01 8.813e+01 9.876e+01 1.323e+02, threshold=1.763e+02, percent-clipped=0.0 2023-11-19 18:20:44,695 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=755533.3333333334, ans=0.0 2023-11-19 18:20:51,123 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=7.59 vs. limit=12.0 2023-11-19 18:20:51,235 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=13.89 vs. limit=15.0 2023-11-19 18:21:04,689 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 113350 2023-11-19 18:21:18,367 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=755733.3333333334, ans=0.125 2023-11-19 18:21:18,433 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=755733.3333333334, ans=0.0 2023-11-19 18:21:19,249 INFO [train_asr.py:1262] (1/4) Epoch 10, batch 5150, loss[loss=0.08562, simple_loss=0.1061, pruned_loss=0.02153, audio_tagging_loss=0.01106, over 14884.00 frames. ], tot_loss[loss=0.08331, simple_loss=0.1021, pruned_loss=0.02186, audio_tagging_loss=0.01041, over 3034566.25 frames. 
], batch size: 58, lr: 6.96e-03, grad_scale: 32.0 2023-11-19 18:21:27,020 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=755733.3333333334, ans=0.04949747468305833 2023-11-19 18:21:40,286 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=755800.0, ans=0.0 2023-11-19 18:21:56,810 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=755933.3333333334, ans=0.1 2023-11-19 18:22:07,689 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 113400 2023-11-19 18:22:11,911 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=756000.0, ans=0.0 2023-11-19 18:22:14,564 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=756000.0, ans=0.2 2023-11-19 18:22:15,680 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=756000.0, ans=0.125 2023-11-19 18:22:19,585 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=756000.0, ans=0.125 2023-11-19 18:22:22,826 INFO [train_asr.py:1262] (1/4) Epoch 10, batch 5200, loss[loss=0.1008, simple_loss=0.1283, pruned_loss=0.0285, audio_tagging_loss=0.008175, over 14354.00 frames. ], tot_loss[loss=0.08442, simple_loss=0.1038, pruned_loss=0.02224, audio_tagging_loss=0.01028, over 3031128.33 frames. ], batch size: 52, lr: 6.96e-03, grad_scale: 32.0 2023-11-19 18:22:34,183 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=756066.6666666666, ans=0.0 2023-11-19 18:22:36,590 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=756133.3333333334, ans=0.1 2023-11-19 18:22:43,269 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=756133.3333333334, ans=0.0 2023-11-19 18:22:46,630 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.196e+01 8.495e+01 9.298e+01 1.017e+02 1.203e+02, threshold=1.860e+02, percent-clipped=0.0 2023-11-19 18:23:11,165 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 113450 2023-11-19 18:23:23,548 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.63 vs. limit=6.0 2023-11-19 18:23:27,301 INFO [train_asr.py:1262] (1/4) Epoch 10, batch 5250, loss[loss=0.1009, simple_loss=0.1283, pruned_loss=0.02603, audio_tagging_loss=0.01071, over 15974.00 frames. ], tot_loss[loss=0.08561, simple_loss=0.1053, pruned_loss=0.02282, audio_tagging_loss=0.01014, over 3032278.42 frames. 
], batch size: 58, lr: 6.96e-03, grad_scale: 32.0 2023-11-19 18:23:33,615 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=756400.0, ans=0.125 2023-11-19 18:23:47,113 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=756466.6666666666, ans=0.0 2023-11-19 18:24:04,436 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=756600.0, ans=0.125 2023-11-19 18:24:14,548 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 113500 2023-11-19 18:24:16,443 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=11.22 vs. limit=15.0 2023-11-19 18:24:22,635 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=756666.6666666666, ans=0.125 2023-11-19 18:24:29,879 INFO [train_asr.py:1262] (1/4) Epoch 10, batch 5300, loss[loss=0.06822, simple_loss=0.08423, pruned_loss=0.01753, audio_tagging_loss=0.008582, over 15059.00 frames. ], tot_loss[loss=0.08553, simple_loss=0.1051, pruned_loss=0.02281, audio_tagging_loss=0.01017, over 3035518.46 frames. ], batch size: 59, lr: 6.96e-03, grad_scale: 32.0 2023-11-19 18:24:45,988 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.min_abs, batch_count=756800.0, ans=0.5 2023-11-19 18:24:53,116 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.177e+01 8.322e+01 9.046e+01 9.978e+01 1.366e+02, threshold=1.809e+02, percent-clipped=0.0 2023-11-19 18:25:19,305 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 113550 2023-11-19 18:25:19,810 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=8.57 vs. limit=15.0 2023-11-19 18:25:22,009 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=757000.0, ans=0.125 2023-11-19 18:25:24,927 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.conv_module2.whiten, num_groups=1, num_channels=192, metric=11.84 vs. limit=15.0 2023-11-19 18:25:29,835 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.conv_module1.whiten, num_groups=1, num_channels=192, metric=14.05 vs. limit=15.0 2023-11-19 18:25:33,920 INFO [train_asr.py:1262] (1/4) Epoch 10, batch 5350, loss[loss=0.06584, simple_loss=0.08427, pruned_loss=0.01299, audio_tagging_loss=0.01071, over 14839.00 frames. ], tot_loss[loss=0.08622, simple_loss=0.1059, pruned_loss=0.0231, audio_tagging_loss=0.01018, over 3037829.04 frames. ], batch size: 54, lr: 6.95e-03, grad_scale: 32.0 2023-11-19 18:25:34,802 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.12 vs. 
limit=6.0 2023-11-19 18:25:44,198 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=757066.6666666666, ans=0.0 2023-11-19 18:25:46,589 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=757133.3333333334, ans=0.125 2023-11-19 18:25:50,211 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=757133.3333333334, ans=0.0 2023-11-19 18:26:09,116 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=757200.0, ans=0.125 2023-11-19 18:26:11,665 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=757266.6666666666, ans=0.0 2023-11-19 18:26:16,693 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=757266.6666666666, ans=0.0 2023-11-19 18:26:22,786 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 113600 2023-11-19 18:26:24,446 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=3.37 vs. limit=15.0 2023-11-19 18:26:39,006 INFO [train_asr.py:1262] (1/4) Epoch 10, batch 5400, loss[loss=0.07683, simple_loss=0.09374, pruned_loss=0.01788, audio_tagging_loss=0.01208, over 16266.00 frames. ], tot_loss[loss=0.08636, simple_loss=0.1061, pruned_loss=0.02308, audio_tagging_loss=0.01025, over 3040267.17 frames. ], batch size: 61, lr: 6.95e-03, grad_scale: 32.0 2023-11-19 18:27:02,054 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.928e+01 8.152e+01 8.655e+01 9.837e+01 1.259e+02, threshold=1.731e+02, percent-clipped=0.0 2023-11-19 18:27:02,409 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=757466.6666666666, ans=0.2 2023-11-19 18:27:04,880 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=757533.3333333334, ans=0.125 2023-11-19 18:27:15,867 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=757600.0, ans=0.0 2023-11-19 18:27:23,279 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=757600.0, ans=0.125 2023-11-19 18:27:27,980 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 113650 2023-11-19 18:27:34,257 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=757666.6666666666, ans=0.0 2023-11-19 18:27:43,252 INFO [train_asr.py:1262] (1/4) Epoch 10, batch 5450, loss[loss=0.09386, simple_loss=0.1172, pruned_loss=0.02255, audio_tagging_loss=0.01272, over 14686.00 frames. ], tot_loss[loss=0.08696, simple_loss=0.1066, pruned_loss=0.02334, audio_tagging_loss=0.01034, over 3032299.12 frames. 
], batch size: 54, lr: 6.95e-03, grad_scale: 32.0 2023-11-19 18:27:47,328 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=757733.3333333334, ans=0.125 2023-11-19 18:27:50,771 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=757733.3333333334, ans=0.125 2023-11-19 18:27:54,702 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=757800.0, ans=0.125 2023-11-19 18:28:13,036 INFO [scaling.py:1022] (1/4) Whitening: name=encoder_embed.convnext.out_whiten, num_groups=1, num_channels=128, metric=4.83 vs. limit=5.0 2023-11-19 18:28:31,998 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 113700 2023-11-19 18:28:46,394 INFO [train_asr.py:1262] (1/4) Epoch 10, batch 5500, loss[loss=0.07084, simple_loss=0.08644, pruned_loss=0.01543, audio_tagging_loss=0.0122, over 14478.00 frames. ], tot_loss[loss=0.08675, simple_loss=0.1065, pruned_loss=0.02324, audio_tagging_loss=0.01028, over 3040348.18 frames. ], batch size: 54, lr: 6.95e-03, grad_scale: 16.0 2023-11-19 18:28:51,447 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=758066.6666666666, ans=0.125 2023-11-19 18:29:00,652 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=758133.3333333334, ans=0.125 2023-11-19 18:29:02,768 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=758133.3333333334, ans=0.125 2023-11-19 18:29:03,429 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.whiten, num_groups=1, num_channels=192, metric=5.18 vs. limit=12.0 2023-11-19 18:29:10,970 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.752e+01 8.267e+01 8.902e+01 9.734e+01 1.914e+02, threshold=1.780e+02, percent-clipped=1.0 2023-11-19 18:29:26,537 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=758266.6666666666, ans=0.1 2023-11-19 18:29:28,152 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=7.82 vs. limit=10.0 2023-11-19 18:29:34,936 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 113750 2023-11-19 18:29:51,179 INFO [train_asr.py:1262] (1/4) Epoch 10, batch 5550, loss[loss=0.09656, simple_loss=0.1079, pruned_loss=0.03146, audio_tagging_loss=0.01117, over 16035.00 frames. ], tot_loss[loss=0.08726, simple_loss=0.1071, pruned_loss=0.02338, audio_tagging_loss=0.01035, over 3042940.65 frames. ], batch size: 63, lr: 6.95e-03, grad_scale: 16.0 2023-11-19 18:29:51,416 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=758400.0, ans=0.0 2023-11-19 18:30:02,552 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=15.62 vs. 
limit=22.5 2023-11-19 18:30:05,891 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=758466.6666666666, ans=0.1 2023-11-19 18:30:11,270 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=758466.6666666666, ans=0.125 2023-11-19 18:30:13,754 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=758466.6666666666, ans=0.2 2023-11-19 18:30:17,666 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=13.04 vs. limit=15.0 2023-11-19 18:30:39,366 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 113800 2023-11-19 18:30:49,878 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=758666.6666666666, ans=0.2 2023-11-19 18:30:54,394 INFO [train_asr.py:1262] (1/4) Epoch 10, batch 5600, loss[loss=0.07212, simple_loss=0.08751, pruned_loss=0.019, audio_tagging_loss=0.009369, over 14635.00 frames. ], tot_loss[loss=0.08672, simple_loss=0.1062, pruned_loss=0.02309, audio_tagging_loss=0.01054, over 3044690.08 frames. ], batch size: 56, lr: 6.95e-03, grad_scale: 32.0 2023-11-19 18:31:17,862 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=758800.0, ans=0.1 2023-11-19 18:31:18,678 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.413e+01 8.485e+01 9.378e+01 1.023e+02 2.129e+02, threshold=1.876e+02, percent-clipped=1.0 2023-11-19 18:31:25,509 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module1.whiten, num_groups=1, num_channels=512, metric=4.78 vs. limit=15.0 2023-11-19 18:31:29,380 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=758866.6666666666, ans=0.0 2023-11-19 18:31:39,014 WARNING [train_asr.py:1506] (1/4) Exclude cut with ID unbalanced/ze0LsBtoDm0_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-19 18:31:41,790 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=758933.3333333334, ans=0.125 2023-11-19 18:31:43,982 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 113850 2023-11-19 18:31:56,446 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=3.32 vs. limit=15.0 2023-11-19 18:31:59,335 INFO [train_asr.py:1262] (1/4) Epoch 10, batch 5650, loss[loss=0.06696, simple_loss=0.07397, pruned_loss=0.01802, audio_tagging_loss=0.01195, over 14624.00 frames. ], tot_loss[loss=0.08561, simple_loss=0.1046, pruned_loss=0.02269, audio_tagging_loss=0.01064, over 3049105.93 frames. 
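
[Editor's note] The train_asr.py:1506 warning above drops a 1-second AudioSet cut: 100 input frames shrink to 23 after the ~4x convolutional frontend, which is fewer than the 24 BPE tokens of the dummy transcript, so the pruned transducer loss cannot align it. A sketch of such a filter follows; the helper names are hypothetical, and the frame arithmetic is just one plausible frontend formula that reproduces the logged 100 -> 23 mapping.

def num_frames_after_subsampling(t: int) -> int:
    # One plausible arithmetic for a ~4x conv frontend; maps 100 -> 23,
    # matching the "before subsampling" / "after subsampling" counts above.
    return ((t - 7) // 2) // 2

def keep_cut(num_input_frames: int, num_tokens: int) -> bool:
    # A transducer needs at least as many encoder frames as output tokens.
    return num_frames_after_subsampling(num_input_frames) >= num_tokens

# keep_cut(100, 24) -> False: 23 encoder frames < 24 tokens, cut is excluded.
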
], batch size: 58, lr: 6.95e-03, grad_scale: 32.0 2023-11-19 18:32:19,902 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=759133.3333333334, ans=0.125 2023-11-19 18:32:44,022 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=2.39 vs. limit=15.0 2023-11-19 18:32:48,341 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 113900 2023-11-19 18:32:51,060 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=759333.3333333334, ans=0.125 2023-11-19 18:33:00,078 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=759333.3333333334, ans=0.0 2023-11-19 18:33:04,049 INFO [train_asr.py:1262] (1/4) Epoch 10, batch 5700, loss[loss=0.1019, simple_loss=0.1277, pruned_loss=0.02722, audio_tagging_loss=0.01085, over 15011.00 frames. ], tot_loss[loss=0.08543, simple_loss=0.1045, pruned_loss=0.02253, audio_tagging_loss=0.01064, over 3051711.36 frames. ], batch size: 54, lr: 6.94e-03, grad_scale: 16.0 2023-11-19 18:33:26,089 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=759466.6666666666, ans=0.125 2023-11-19 18:33:29,968 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.506e+01 8.509e+01 9.324e+01 1.031e+02 1.317e+02, threshold=1.865e+02, percent-clipped=0.0 2023-11-19 18:33:32,162 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=9.78 vs. limit=15.0 2023-11-19 18:33:39,999 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=759533.3333333334, ans=0.125 2023-11-19 18:33:50,983 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=759600.0, ans=0.125 2023-11-19 18:33:53,951 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 113950 2023-11-19 18:34:08,414 INFO [train_asr.py:1262] (1/4) Epoch 10, batch 5750, loss[loss=0.1028, simple_loss=0.1282, pruned_loss=0.02787, audio_tagging_loss=0.01088, over 15842.00 frames. ], tot_loss[loss=0.08502, simple_loss=0.104, pruned_loss=0.02243, audio_tagging_loss=0.01058, over 3047475.96 frames. 
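
[Editor's note] The logged totals are consistent with the combined objective being simple_loss_scale * simple_loss + pruned_loss + audio_tagging_loss_scale * audio_tagging_loss with the configured scales 0.5 and 1.0: for batch 5700 above, 0.5 * 0.1045 + 0.02253 + 0.01064 = 0.08542 ≈ 0.08543. A sketch of that combination (function name illustrative; icefall additionally warm-up-weights the simple vs. pruned terms early in training, which no longer matters at this batch count):

def combined_loss(simple_loss, pruned_loss, audio_tagging_loss,
                  simple_loss_scale=0.5, audio_tagging_loss_scale=1.0):
    # Reproduces the logged tot_loss, e.g. batch 5700 above:
    # 0.5 * 0.1045 + 0.02253 + 1.0 * 0.01064 = 0.08542 ~ 0.08543
    return (simple_loss_scale * simple_loss
            + pruned_loss
            + audio_tagging_loss_scale * audio_tagging_loss)
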
], batch size: 61, lr: 6.94e-03, grad_scale: 16.0 2023-11-19 18:34:08,668 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=759733.3333333334, ans=0.0 2023-11-19 18:34:14,872 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=759733.3333333334, ans=0.0 2023-11-19 18:34:25,479 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=759800.0, ans=0.125 2023-11-19 18:34:33,853 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=759866.6666666666, ans=0.1 2023-11-19 18:34:52,733 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=759933.3333333334, ans=0.0 2023-11-19 18:34:57,384 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 114000 2023-11-19 18:35:12,672 INFO [train_asr.py:1262] (1/4) Epoch 10, batch 5800, loss[loss=0.09054, simple_loss=0.1084, pruned_loss=0.02611, audio_tagging_loss=0.01021, over 15374.00 frames. ], tot_loss[loss=0.08469, simple_loss=0.1038, pruned_loss=0.02228, audio_tagging_loss=0.0105, over 3046728.90 frames. ], batch size: 56, lr: 6.94e-03, grad_scale: 16.0 2023-11-19 18:35:38,825 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.740e+01 8.284e+01 9.012e+01 9.674e+01 1.297e+02, threshold=1.802e+02, percent-clipped=0.0 2023-11-19 18:35:41,526 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=760200.0, ans=0.0 2023-11-19 18:35:42,874 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=760200.0, ans=0.2 2023-11-19 18:35:56,655 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=760266.6666666666, ans=0.2 2023-11-19 18:35:58,432 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=11.43 vs. limit=15.0 2023-11-19 18:36:02,341 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 114050 2023-11-19 18:36:06,341 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.min_abs, batch_count=760333.3333333334, ans=0.5 2023-11-19 18:36:07,647 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=760333.3333333334, ans=0.125 2023-11-19 18:36:13,840 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=760333.3333333334, ans=0.1 2023-11-19 18:36:17,848 INFO [train_asr.py:1262] (1/4) Epoch 10, batch 5850, loss[loss=0.09208, simple_loss=0.1182, pruned_loss=0.02559, audio_tagging_loss=0.0074, over 14411.00 frames. ], tot_loss[loss=0.08501, simple_loss=0.1042, pruned_loss=0.02253, audio_tagging_loss=0.01038, over 3054987.96 frames. ], batch size: 57, lr: 6.94e-03, grad_scale: 16.0 2023-11-19 18:36:30,198 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=10.07 vs. 
limit=15.0 2023-11-19 18:36:33,265 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=760466.6666666666, ans=0.0 2023-11-19 18:36:45,546 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=760533.3333333334, ans=0.0 2023-11-19 18:37:07,058 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 114100 2023-11-19 18:37:16,509 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.08 vs. limit=6.0 2023-11-19 18:37:20,044 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=760666.6666666666, ans=0.1 2023-11-19 18:37:22,286 INFO [train_asr.py:1262] (1/4) Epoch 10, batch 5900, loss[loss=0.06439, simple_loss=0.07707, pruned_loss=0.01363, audio_tagging_loss=0.01223, over 14580.00 frames. ], tot_loss[loss=0.08435, simple_loss=0.1033, pruned_loss=0.02228, audio_tagging_loss=0.01043, over 3053954.84 frames. ], batch size: 60, lr: 6.94e-03, grad_scale: 16.0 2023-11-19 18:37:29,804 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=760733.3333333334, ans=0.125 2023-11-19 18:37:46,872 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=21.13 vs. limit=22.5 2023-11-19 18:37:47,441 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.904e+01 8.348e+01 9.268e+01 1.091e+02 1.395e+02, threshold=1.854e+02, percent-clipped=0.0 2023-11-19 18:37:49,471 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=512, metric=8.33 vs. limit=15.0 2023-11-19 18:37:59,777 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=760866.6666666666, ans=0.125 2023-11-19 18:38:08,627 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=11.62 vs. limit=22.5 2023-11-19 18:38:11,874 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 114150 2023-11-19 18:38:15,084 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.94 vs. limit=6.0 2023-11-19 18:38:20,845 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-11-19 18:38:26,627 INFO [train_asr.py:1262] (1/4) Epoch 10, batch 5950, loss[loss=0.0937, simple_loss=0.1214, pruned_loss=0.02604, audio_tagging_loss=0.006973, over 15481.00 frames. ], tot_loss[loss=0.08433, simple_loss=0.1034, pruned_loss=0.02228, audio_tagging_loss=0.01036, over 3061868.06 frames. 
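
[Editor's note] The optim.py:476 lines report the (min, 25%, 50%, 75%, max) quartiles of recent per-batch gradient norms; with Clipping_scale=2.0 the clipping threshold is consistently twice the median, e.g. 2.0 * 9.268e+01 = 1.854e+02 above, and percent-clipped is the share of batches whose norm exceeded it. A minimal sketch of those statistics (names are illustrative, not the optimizer's internals):

import torch

def clipping_stats(grad_norms: torch.Tensor, clipping_scale: float = 2.0):
    """grad_norms: 1-D float tensor of recent per-batch gradient norms."""
    # Quartiles as logged: min, 25%, median, 75%, max.
    q = torch.quantile(grad_norms,
                       torch.tensor([0.0, 0.25, 0.5, 0.75, 1.0]))
    threshold = clipping_scale * q[2]  # 2.0 * median, matching the log
    percent_clipped = 100.0 * (grad_norms > threshold).float().mean()
    return q, threshold, percent_clipped
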
], batch size: 56, lr: 6.94e-03, grad_scale: 16.0 2023-11-19 18:38:34,940 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=761066.6666666666, ans=0.1 2023-11-19 18:38:35,118 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=761066.6666666666, ans=0.1 2023-11-19 18:38:46,040 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=761133.3333333334, ans=0.1 2023-11-19 18:38:49,235 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=761133.3333333334, ans=0.125 2023-11-19 18:38:49,285 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=761133.3333333334, ans=0.125 2023-11-19 18:38:50,214 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=761133.3333333334, ans=0.0 2023-11-19 18:39:15,895 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 114200 2023-11-19 18:39:31,900 INFO [train_asr.py:1262] (1/4) Epoch 10, batch 6000, loss[loss=0.08334, simple_loss=0.1049, pruned_loss=0.02103, audio_tagging_loss=0.009867, over 15737.00 frames. ], tot_loss[loss=0.08448, simple_loss=0.1038, pruned_loss=0.02233, audio_tagging_loss=0.01025, over 3065177.70 frames. ], batch size: 60, lr: 6.93e-03, grad_scale: 32.0 2023-11-19 18:39:31,900 INFO [train_asr.py:1285] (1/4) Computing validation loss 2023-11-19 18:39:59,377 INFO [zipformer.py:1873] (1/4) name=encoder.encoders.3.encoder.layers.0.self_attn_weights, attn_weights_entropy = tensor([2.6848, 0.2192, 2.8902, 3.0506, 2.6829, 2.6369, 2.9294, 2.7148], device='cuda:1') 2023-11-19 18:40:08,323 INFO [zipformer.py:1873] (1/4) name=encoder.encoders.3.encoder.layers.3.self_attn_weights, attn_weights_entropy = tensor([3.9823, 3.1114, 2.7701, 3.0159, 3.4105, 2.7315, 3.3457, 2.8275], device='cuda:1') 2023-11-19 18:40:10,734 INFO [zipformer.py:1873] (1/4) name=encoder.encoders.4.encoder.layers.1.self_attn_weights, attn_weights_entropy = tensor([5.1330, 4.9535, 3.4341, 4.0618], device='cuda:1') 2023-11-19 18:40:12,635 INFO [train_asr.py:1294] (1/4) Epoch 10, validation: loss=0.06357, simple_loss=0.05534, pruned_loss=0.006382, audio_tagging_loss=0.02952, over 4681554.00 frames. 2023-11-19 18:40:12,635 INFO [train_asr.py:1295] (1/4) Maximum memory allocated so far is 26082MB 2023-11-19 18:40:22,826 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=761400.0, ans=0.125 2023-11-19 18:40:22,844 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=761400.0, ans=0.125 2023-11-19 18:40:23,439 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=4.98 vs. 
limit=12.0 2023-11-19 18:40:32,767 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=761466.6666666666, ans=0.125 2023-11-19 18:40:35,728 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=761466.6666666666, ans=0.125 2023-11-19 18:40:38,558 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.626e+01 8.242e+01 9.055e+01 9.883e+01 1.211e+02, threshold=1.811e+02, percent-clipped=0.0 2023-11-19 18:40:38,806 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=761533.3333333334, ans=0.125 2023-11-19 18:40:56,379 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=12.81 vs. limit=22.5 2023-11-19 18:40:58,302 WARNING [train_asr.py:1506] (1/4) Exclude cut with ID unbalanced/NoNxFjwXuuc_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-19 18:40:59,701 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=761600.0, ans=0.2 2023-11-19 18:41:02,026 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 114250 2023-11-19 18:41:17,059 INFO [train_asr.py:1262] (1/4) Epoch 10, batch 6050, loss[loss=0.08683, simple_loss=0.1049, pruned_loss=0.02567, audio_tagging_loss=0.008693, over 15245.00 frames. ], tot_loss[loss=0.08452, simple_loss=0.1038, pruned_loss=0.02239, audio_tagging_loss=0.01023, over 3064444.74 frames. 
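
[Editor's note] Both the running tot_loss lines and the validation line above ("validation: loss=0.06357 ... over 4681554.00 frames") are frame-weighted averages: each batch contributes loss * frames to a numerator and frames to a denominator, and "over N frames" reports the denominator. The tot_loss denominators hovering near 3.0-3.1M frames suggest the accumulator is reset periodically rather than summed over the whole epoch. A minimal tracker in that spirit (a sketch, not icefall's MetricsTracker):

class FrameWeightedAverage:
    """Running frame-weighted loss, reported as 'loss over N frames'."""

    def __init__(self):
        self.loss_sum = 0.0
        self.frames = 0.0

    def update(self, loss: float, num_frames: float) -> None:
        # Each batch is weighted by how many frames it contains.
        self.loss_sum += loss * num_frames
        self.frames += num_frames

    @property
    def average(self) -> float:
        return self.loss_sum / max(self.frames, 1.0)
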
], batch size: 56, lr: 6.93e-03, grad_scale: 32.0 2023-11-19 18:41:23,548 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=761733.3333333334, ans=0.0 2023-11-19 18:41:24,795 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-19 18:41:31,706 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.min_positive, batch_count=761800.0, ans=0.05 2023-11-19 18:41:35,316 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=761800.0, ans=0.125 2023-11-19 18:41:38,332 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=761800.0, ans=0.125 2023-11-19 18:41:39,561 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=761800.0, ans=0.125 2023-11-19 18:41:57,056 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=761933.3333333334, ans=0.2 2023-11-19 18:42:06,887 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 114300 2023-11-19 18:42:14,641 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=762000.0, ans=0.0 2023-11-19 18:42:23,026 INFO [train_asr.py:1262] (1/4) Epoch 10, batch 6100, loss[loss=0.06539, simple_loss=0.06351, pruned_loss=0.01865, audio_tagging_loss=0.01499, over 14185.00 frames. ], tot_loss[loss=0.08491, simple_loss=0.1041, pruned_loss=0.02256, audio_tagging_loss=0.0103, over 3057134.77 frames. ], batch size: 55, lr: 6.93e-03, grad_scale: 32.0 2023-11-19 18:42:32,899 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=762066.6666666666, ans=0.1 2023-11-19 18:42:34,652 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=9.70 vs. limit=15.0 2023-11-19 18:42:48,708 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.974e+01 8.508e+01 9.449e+01 1.032e+02 1.447e+02, threshold=1.890e+02, percent-clipped=0.0 2023-11-19 18:43:13,165 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 114350 2023-11-19 18:43:28,439 INFO [train_asr.py:1262] (1/4) Epoch 10, batch 6150, loss[loss=0.09202, simple_loss=0.1142, pruned_loss=0.02574, audio_tagging_loss=0.00917, over 15404.00 frames. ], tot_loss[loss=0.08569, simple_loss=0.105, pruned_loss=0.02287, audio_tagging_loss=0.01034, over 3050178.65 frames. ], batch size: 58, lr: 6.93e-03, grad_scale: 32.0 2023-11-19 18:43:50,057 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=762466.6666666666, ans=0.0 2023-11-19 18:43:58,579 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=6.75 vs. 
limit=15.0 2023-11-19 18:44:03,845 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=762533.3333333334, ans=0.0 2023-11-19 18:44:03,896 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=762533.3333333334, ans=0.125 2023-11-19 18:44:05,739 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=762533.3333333334, ans=0.0 2023-11-19 18:44:18,108 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 114400 2023-11-19 18:44:24,827 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=762666.6666666666, ans=0.125 2023-11-19 18:44:29,154 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=5.64 vs. limit=12.0 2023-11-19 18:44:33,329 INFO [train_asr.py:1262] (1/4) Epoch 10, batch 6200, loss[loss=0.08927, simple_loss=0.104, pruned_loss=0.02469, audio_tagging_loss=0.01256, over 16050.00 frames. ], tot_loss[loss=0.08576, simple_loss=0.1051, pruned_loss=0.0228, audio_tagging_loss=0.01041, over 3046215.14 frames. ], batch size: 62, lr: 6.93e-03, grad_scale: 32.0 2023-11-19 18:44:54,234 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=762800.0, ans=0.125 2023-11-19 18:44:54,711 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=10.79 vs. limit=15.0 2023-11-19 18:45:00,749 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.911e+01 8.346e+01 9.010e+01 9.734e+01 1.303e+02, threshold=1.802e+02, percent-clipped=0.0 2023-11-19 18:45:22,090 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=762933.3333333334, ans=0.125 2023-11-19 18:45:23,153 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 114450 2023-11-19 18:45:39,186 INFO [train_asr.py:1262] (1/4) Epoch 10, batch 6250, loss[loss=0.08475, simple_loss=0.1036, pruned_loss=0.0222, audio_tagging_loss=0.01074, over 14427.00 frames. ], tot_loss[loss=0.08625, simple_loss=0.1057, pruned_loss=0.02299, audio_tagging_loss=0.0104, over 3054027.43 frames. 
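
[Editor's note] The slowly decaying learning rate (6.95e-03 -> 6.92e-03 across this span) is consistent with icefall's Eden schedule, lr = base_lr * ((step^2 + lr_batches^2) / lr_batches^2)^-0.25 * ((epoch^2 + lr_epochs^2) / lr_epochs^2)^-0.25, with base_lr=0.045, lr_batches=7500, lr_epochs=3.5. A sketch follows; the assumption that epoch counts completed epochs (9, not the current epoch 10) is mine, chosen because it reproduces the logged values.

def eden_lr(base_lr: float, step: float, epoch: float,
            lr_batches: float = 7500.0, lr_epochs: float = 3.5) -> float:
    # Eden learning-rate schedule as used in icefall Zipformer recipes.
    batch_factor = ((step ** 2 + lr_batches ** 2) / lr_batches ** 2) ** -0.25
    epoch_factor = ((epoch ** 2 + lr_epochs ** 2) / lr_epochs ** 2) ** -0.25
    return base_lr * batch_factor * epoch_factor

# eden_lr(0.045, 114400, 9) -> ~6.93e-03, matching the lr logged above.
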
], batch size: 56, lr: 6.93e-03, grad_scale: 32.0 2023-11-19 18:45:50,774 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=763066.6666666666, ans=0.2 2023-11-19 18:46:14,390 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer_ff3.min_abs, batch_count=763200.0, ans=0.2 2023-11-19 18:46:21,907 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=763266.6666666666, ans=0.1 2023-11-19 18:46:24,111 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=763266.6666666666, ans=0.125 2023-11-19 18:46:28,950 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 114500 2023-11-19 18:46:42,904 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=763333.3333333334, ans=0.1 2023-11-19 18:46:45,253 INFO [train_asr.py:1262] (1/4) Epoch 10, batch 6300, loss[loss=0.08754, simple_loss=0.1043, pruned_loss=0.02296, audio_tagging_loss=0.01243, over 14449.00 frames. ], tot_loss[loss=0.08674, simple_loss=0.106, pruned_loss=0.02322, audio_tagging_loss=0.01053, over 3055418.99 frames. ], batch size: 54, lr: 6.93e-03, grad_scale: 32.0 2023-11-19 18:47:09,917 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.137e+01 8.385e+01 9.179e+01 1.044e+02 1.360e+02, threshold=1.836e+02, percent-clipped=0.0 2023-11-19 18:47:13,971 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-19 18:47:25,444 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=763600.0, ans=0.125 2023-11-19 18:47:29,015 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=763600.0, ans=0.125 2023-11-19 18:47:34,913 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 114550 2023-11-19 18:47:42,898 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=14.02 vs. limit=15.0 2023-11-19 18:47:43,753 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=763666.6666666666, ans=0.0 2023-11-19 18:47:49,853 INFO [train_asr.py:1262] (1/4) Epoch 10, batch 6350, loss[loss=0.09461, simple_loss=0.1102, pruned_loss=0.02943, audio_tagging_loss=0.01009, over 14898.00 frames. ], tot_loss[loss=0.08661, simple_loss=0.106, pruned_loss=0.02302, audio_tagging_loss=0.01058, over 3052064.09 frames. ], batch size: 56, lr: 6.92e-03, grad_scale: 32.0 2023-11-19 18:48:01,588 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=9.50 vs. limit=15.0 2023-11-19 18:48:12,060 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=763800.0, ans=0.0 2023-11-19 18:48:24,036 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-19 18:48:33,319 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=11.81 vs. 
limit=15.0 2023-11-19 18:48:38,865 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 114600 2023-11-19 18:48:40,375 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=764000.0, ans=0.2 2023-11-19 18:48:48,245 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-11-19 18:48:54,862 INFO [train_asr.py:1262] (1/4) Epoch 10, batch 6400, loss[loss=0.09301, simple_loss=0.1092, pruned_loss=0.02949, audio_tagging_loss=0.008913, over 14869.00 frames. ], tot_loss[loss=0.08564, simple_loss=0.1045, pruned_loss=0.02269, audio_tagging_loss=0.01069, over 3048603.18 frames. ], batch size: 56, lr: 6.92e-03, grad_scale: 32.0 2023-11-19 18:49:18,755 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.88 vs. limit=6.0 2023-11-19 18:49:21,745 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.802e+01 8.099e+01 8.680e+01 9.158e+01 1.578e+02, threshold=1.736e+02, percent-clipped=0.0 2023-11-19 18:49:30,785 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=764200.0, ans=0.1 2023-11-19 18:49:41,813 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=764266.6666666666, ans=0.125 2023-11-19 18:49:44,823 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 114650 2023-11-19 18:49:50,648 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=764333.3333333334, ans=0.125 2023-11-19 18:49:57,233 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=764333.3333333334, ans=0.1 2023-11-19 18:50:01,295 INFO [train_asr.py:1262] (1/4) Epoch 10, batch 6450, loss[loss=0.07598, simple_loss=0.1001, pruned_loss=0.01594, audio_tagging_loss=0.009981, over 14559.00 frames. ], tot_loss[loss=0.08642, simple_loss=0.1053, pruned_loss=0.02305, audio_tagging_loss=0.0107, over 3049209.40 frames. ], batch size: 55, lr: 6.92e-03, grad_scale: 32.0 2023-11-19 18:50:05,635 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=20.27 vs. limit=22.5 2023-11-19 18:50:20,278 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=764466.6666666666, ans=0.125 2023-11-19 18:50:21,604 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=764466.6666666666, ans=0.0 2023-11-19 18:50:30,143 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=764533.3333333334, ans=0.125 2023-11-19 18:50:50,152 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 114700 2023-11-19 18:50:54,895 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=764666.6666666666, ans=0.125 2023-11-19 18:51:05,822 INFO [train_asr.py:1262] (1/4) Epoch 10, batch 6500, loss[loss=0.08283, simple_loss=0.1034, pruned_loss=0.0227, audio_tagging_loss=0.008438, over 13758.00 frames. 
], tot_loss[loss=0.08598, simple_loss=0.1051, pruned_loss=0.02276, audio_tagging_loss=0.01067, over 3045208.09 frames. ], batch size: 53, lr: 6.92e-03, grad_scale: 32.0 2023-11-19 18:51:12,246 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=764733.3333333334, ans=0.0 2023-11-19 18:51:22,472 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=764800.0, ans=0.125 2023-11-19 18:51:32,075 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.963e+01 8.435e+01 9.152e+01 1.009e+02 1.379e+02, threshold=1.830e+02, percent-clipped=0.0 2023-11-19 18:51:45,151 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=764933.3333333334, ans=0.125 2023-11-19 18:51:45,570 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.whiten.whitening_limit, batch_count=764933.3333333334, ans=12.0 2023-11-19 18:51:46,301 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=764933.3333333334, ans=0.1 2023-11-19 18:51:48,794 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=764933.3333333334, ans=0.2 2023-11-19 18:51:54,544 INFO [scaling.py:1022] (1/4) Whitening: name=encoder_embed.convnext.out_whiten, num_groups=1, num_channels=128, metric=4.78 vs. limit=5.0 2023-11-19 18:51:56,217 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 114750 2023-11-19 18:51:57,651 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=765000.0, ans=0.2 2023-11-19 18:52:11,130 INFO [train_asr.py:1262] (1/4) Epoch 10, batch 6550, loss[loss=0.1003, simple_loss=0.1311, pruned_loss=0.02738, audio_tagging_loss=0.00742, over 16992.00 frames. ], tot_loss[loss=0.08628, simple_loss=0.1059, pruned_loss=0.02287, audio_tagging_loss=0.01048, over 3048719.96 frames. ], batch size: 61, lr: 6.92e-03, grad_scale: 32.0 2023-11-19 18:52:12,745 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=765066.6666666666, ans=0.125 2023-11-19 18:52:43,638 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-19 18:53:00,872 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 114800 2023-11-19 18:53:02,212 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=765333.3333333334, ans=0.125 2023-11-19 18:53:04,227 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=765333.3333333334, ans=0.0 2023-11-19 18:53:17,400 INFO [train_asr.py:1262] (1/4) Epoch 10, batch 6600, loss[loss=0.06765, simple_loss=0.07806, pruned_loss=0.0162, audio_tagging_loss=0.01242, over 16600.00 frames. ], tot_loss[loss=0.08571, simple_loss=0.105, pruned_loss=0.02277, audio_tagging_loss=0.01043, over 3049469.49 frames. ], batch size: 62, lr: 6.92e-03, grad_scale: 32.0 2023-11-19 18:53:17,994 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=8.23 vs. 
limit=12.0 2023-11-19 18:53:25,799 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=765400.0, ans=0.0 2023-11-19 18:53:42,953 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.143e+01 8.495e+01 9.015e+01 9.763e+01 1.318e+02, threshold=1.803e+02, percent-clipped=0.0 2023-11-19 18:53:55,227 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=765600.0, ans=0.2 2023-11-19 18:54:04,076 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=512, metric=20.72 vs. limit=22.5 2023-11-19 18:54:04,912 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=765600.0, ans=0.125 2023-11-19 18:54:06,190 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=765600.0, ans=0.125 2023-11-19 18:54:07,121 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 114850 2023-11-19 18:54:09,736 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=765666.6666666666, ans=0.1 2023-11-19 18:54:22,824 INFO [train_asr.py:1262] (1/4) Epoch 10, batch 6650, loss[loss=0.09553, simple_loss=0.1165, pruned_loss=0.02859, audio_tagging_loss=0.008702, over 16291.00 frames. ], tot_loss[loss=0.08551, simple_loss=0.1045, pruned_loss=0.02286, audio_tagging_loss=0.0104, over 3045021.01 frames. ], batch size: 61, lr: 6.92e-03, grad_scale: 16.0 2023-11-19 18:54:42,393 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.46 vs. limit=6.0 2023-11-19 18:54:59,830 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=765866.6666666666, ans=0.1 2023-11-19 18:55:03,676 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-19 18:55:12,742 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 114900 2023-11-19 18:55:27,688 INFO [train_asr.py:1262] (1/4) Epoch 10, batch 6700, loss[loss=0.07918, simple_loss=0.0975, pruned_loss=0.02089, audio_tagging_loss=0.009543, over 13939.00 frames. ], tot_loss[loss=0.085, simple_loss=0.104, pruned_loss=0.02267, audio_tagging_loss=0.01032, over 3043363.63 frames. ], batch size: 54, lr: 6.91e-03, grad_scale: 16.0 2023-11-19 18:55:47,701 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=512, metric=20.85 vs. limit=22.5 2023-11-19 18:55:51,202 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.max_positive, batch_count=766133.3333333334, ans=0.95 2023-11-19 18:55:55,864 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.722e+01 8.369e+01 9.025e+01 9.789e+01 1.375e+02, threshold=1.805e+02, percent-clipped=0.0 2023-11-19 18:56:03,520 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.conv_module2.whiten, num_groups=1, num_channels=192, metric=14.14 vs. 
limit=15.0 2023-11-19 18:56:12,840 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=766266.6666666666, ans=0.0 2023-11-19 18:56:17,641 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 114950 2023-11-19 18:56:33,563 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=766400.0, ans=0.125 2023-11-19 18:56:34,552 INFO [train_asr.py:1262] (1/4) Epoch 10, batch 6750, loss[loss=0.05322, simple_loss=0.0589, pruned_loss=0.01084, audio_tagging_loss=0.01293, over 14421.00 frames. ], tot_loss[loss=0.08447, simple_loss=0.1034, pruned_loss=0.02249, audio_tagging_loss=0.01027, over 3039147.33 frames. ], batch size: 59, lr: 6.91e-03, grad_scale: 16.0 2023-11-19 18:57:04,367 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=766533.3333333334, ans=0.125 2023-11-19 18:57:08,267 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=512, metric=5.53 vs. limit=15.0 2023-11-19 18:57:24,342 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 115000 2023-11-19 18:57:29,043 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=8.93 vs. limit=10.0 2023-11-19 18:57:31,221 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=766666.6666666666, ans=0.125 2023-11-19 18:57:39,605 INFO [train_asr.py:1262] (1/4) Epoch 10, batch 6800, loss[loss=0.08856, simple_loss=0.1028, pruned_loss=0.02441, audio_tagging_loss=0.01275, over 16809.00 frames. ], tot_loss[loss=0.08595, simple_loss=0.1055, pruned_loss=0.02306, audio_tagging_loss=0.01013, over 3041082.25 frames. ], batch size: 64, lr: 6.91e-03, grad_scale: 32.0 2023-11-19 18:57:53,962 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=766800.0, ans=0.0 2023-11-19 18:58:07,414 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.734e+01 8.232e+01 9.131e+01 9.667e+01 1.376e+02, threshold=1.826e+02, percent-clipped=0.0 2023-11-19 18:58:20,728 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=766933.3333333334, ans=0.125 2023-11-19 18:58:29,262 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 115050 2023-11-19 18:58:42,568 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=767000.0, ans=0.1 2023-11-19 18:58:43,820 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=767066.6666666666, ans=0.2 2023-11-19 18:58:44,799 INFO [train_asr.py:1262] (1/4) Epoch 10, batch 6850, loss[loss=0.06403, simple_loss=0.07694, pruned_loss=0.01498, audio_tagging_loss=0.01058, over 14494.00 frames. ], tot_loss[loss=0.0853, simple_loss=0.1047, pruned_loss=0.02278, audio_tagging_loss=0.01017, over 3036683.32 frames. 
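
[Editor's note] The scaling.py:1022 Whitening lines compare a covariance statistic of a module's activations against a limit (per group when num_groups > 1); the metric is 1.0 when the channel covariance is a multiple of the identity and grows as the eigenvalue spread widens, and the corrective whitening gradient only engages once the metric exceeds its limit (all the "metric ... vs. limit" pairs above are still below). A plausible formulation of such a metric, offered as a sketch rather than the exact scaling.py code:

import torch

def whitening_metric(x: torch.Tensor) -> torch.Tensor:
    """x: (num_frames, num_channels) activations for one whitening group.

    Returns a value >= 1.0; equals 1.0 iff the channel covariance is a
    multiple of the identity (perfectly 'white')."""
    x = x - x.mean(dim=0)            # zero-mean per channel
    cov = (x.t() @ x) / x.shape[0]   # (C, C) channel covariance
    d = cov.shape[0]
    mean_diag = cov.diagonal().mean()
    # sum of squared eigenvalues over its minimum possible value d * mean^2.
    return (cov ** 2).sum() / (d * mean_diag ** 2)
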
], batch size: 55, lr: 6.91e-03, grad_scale: 32.0 2023-11-19 18:59:34,649 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 115100 2023-11-19 18:59:50,466 INFO [train_asr.py:1262] (1/4) Epoch 10, batch 6900, loss[loss=0.09626, simple_loss=0.1129, pruned_loss=0.02737, audio_tagging_loss=0.01245, over 14539.00 frames. ], tot_loss[loss=0.08485, simple_loss=0.1041, pruned_loss=0.02258, audio_tagging_loss=0.01023, over 3044953.70 frames. ], batch size: 54, lr: 6.91e-03, grad_scale: 32.0 2023-11-19 19:00:01,031 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=767400.0, ans=0.1 2023-11-19 19:00:05,245 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=12.66 vs. limit=15.0 2023-11-19 19:00:17,001 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.004e+01 8.111e+01 8.753e+01 9.460e+01 1.253e+02, threshold=1.751e+02, percent-clipped=0.0 2023-11-19 19:00:40,059 WARNING [train_asr.py:1506] (1/4) Exclude cut with ID unbalanced/Xez1ffAcb0w_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-19 19:00:40,156 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 115150 2023-11-19 19:00:45,903 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=767666.6666666666, ans=0.0 2023-11-19 19:00:47,604 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=18.43 vs. limit=22.5 2023-11-19 19:00:50,973 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=767666.6666666666, ans=0.0 2023-11-19 19:00:55,483 INFO [train_asr.py:1262] (1/4) Epoch 10, batch 6950, loss[loss=0.07308, simple_loss=0.08755, pruned_loss=0.01928, audio_tagging_loss=0.01003, over 15492.00 frames. ], tot_loss[loss=0.08559, simple_loss=0.1052, pruned_loss=0.02285, audio_tagging_loss=0.01013, over 3052305.39 frames. ], batch size: 59, lr: 6.91e-03, grad_scale: 32.0 2023-11-19 19:01:11,670 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=767800.0, ans=0.125 2023-11-19 19:01:17,637 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=767800.0, ans=0.0 2023-11-19 19:01:23,445 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=767866.6666666666, ans=0.125 2023-11-19 19:01:38,523 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=767933.3333333334, ans=0.04949747468305833 2023-11-19 19:01:45,209 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=14.47 vs. 
limit=22.5 2023-11-19 19:01:45,746 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 115200 2023-11-19 19:01:50,488 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=15.65 vs. limit=22.5 2023-11-19 19:01:51,279 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=768000.0, ans=0.1 2023-11-19 19:01:56,734 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=12.33 vs. limit=15.0 2023-11-19 19:02:00,895 INFO [train_asr.py:1262] (1/4) Epoch 10, batch 7000, loss[loss=0.08049, simple_loss=0.1019, pruned_loss=0.01971, audio_tagging_loss=0.009849, over 15044.00 frames. ], tot_loss[loss=0.08541, simple_loss=0.1051, pruned_loss=0.02269, audio_tagging_loss=0.01018, over 3045874.23 frames. ], batch size: 56, lr: 6.90e-03, grad_scale: 32.0 2023-11-19 19:02:01,130 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=768066.6666666666, ans=0.1 2023-11-19 19:02:11,619 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=768066.6666666666, ans=0.2 2023-11-19 19:02:22,286 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=768133.3333333334, ans=0.125 2023-11-19 19:02:23,776 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=13.59 vs. limit=15.0 2023-11-19 19:02:26,217 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=11.74 vs. limit=15.0 2023-11-19 19:02:29,323 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.682e+01 8.339e+01 9.135e+01 1.019e+02 1.398e+02, threshold=1.827e+02, percent-clipped=0.0 2023-11-19 19:02:34,968 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=9.24 vs. limit=15.0 2023-11-19 19:02:43,224 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=768266.6666666666, ans=0.2 2023-11-19 19:02:50,579 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 115250 2023-11-19 19:02:53,908 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=768333.3333333334, ans=0.0 2023-11-19 19:02:56,528 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=768333.3333333334, ans=0.07 2023-11-19 19:03:07,192 INFO [train_asr.py:1262] (1/4) Epoch 10, batch 7050, loss[loss=0.08334, simple_loss=0.1029, pruned_loss=0.02129, audio_tagging_loss=0.0106, over 16064.00 frames. ], tot_loss[loss=0.08519, simple_loss=0.1048, pruned_loss=0.02253, audio_tagging_loss=0.01027, over 3046785.02 frames. 
], batch size: 60, lr: 6.90e-03, grad_scale: 32.0 2023-11-19 19:03:12,907 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.min_positive, batch_count=768400.0, ans=0.05 2023-11-19 19:03:17,874 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=768400.0, ans=0.2 2023-11-19 19:03:32,986 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=768533.3333333334, ans=0.125 2023-11-19 19:03:48,196 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=768600.0, ans=0.2 2023-11-19 19:03:56,501 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 115300 2023-11-19 19:04:09,628 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=768666.6666666666, ans=0.125 2023-11-19 19:04:11,840 INFO [train_asr.py:1262] (1/4) Epoch 10, batch 7100, loss[loss=0.08506, simple_loss=0.1119, pruned_loss=0.01853, audio_tagging_loss=0.0106, over 14297.00 frames. ], tot_loss[loss=0.08538, simple_loss=0.1051, pruned_loss=0.02263, audio_tagging_loss=0.01022, over 3050402.58 frames. ], batch size: 56, lr: 6.90e-03, grad_scale: 32.0 2023-11-19 19:04:23,158 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=768800.0, ans=0.2 2023-11-19 19:04:25,993 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=512, metric=4.86 vs. limit=15.0 2023-11-19 19:04:38,351 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.824e+01 8.390e+01 9.120e+01 9.831e+01 1.700e+02, threshold=1.824e+02, percent-clipped=0.0 2023-11-19 19:04:51,517 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=768933.3333333334, ans=0.125 2023-11-19 19:04:56,514 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=768933.3333333334, ans=0.125 2023-11-19 19:05:01,400 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 115350 2023-11-19 19:05:16,471 INFO [train_asr.py:1262] (1/4) Epoch 10, batch 7150, loss[loss=0.09119, simple_loss=0.1131, pruned_loss=0.02141, audio_tagging_loss=0.01321, over 15900.00 frames. ], tot_loss[loss=0.08613, simple_loss=0.1058, pruned_loss=0.02303, audio_tagging_loss=0.01022, over 3050265.67 frames. ], batch size: 59, lr: 6.90e-03, grad_scale: 32.0 2023-11-19 19:06:01,697 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=769266.6666666666, ans=0.125 2023-11-19 19:06:03,172 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=9.43 vs. limit=15.0 2023-11-19 19:06:06,701 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 115400 2023-11-19 19:06:18,956 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=769333.3333333334, ans=0.1 2023-11-19 19:06:23,035 INFO [train_asr.py:1262] (1/4) Epoch 10, batch 7200, loss[loss=0.0685, simple_loss=0.08278, pruned_loss=0.01579, audio_tagging_loss=0.01132, over 15374.00 frames. 
], tot_loss[loss=0.08586, simple_loss=0.1054, pruned_loss=0.02278, audio_tagging_loss=0.01038, over 3047777.71 frames. ], batch size: 57, lr: 6.90e-03, grad_scale: 32.0 2023-11-19 19:06:23,437 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=769400.0, ans=0.125 2023-11-19 19:06:35,486 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=769466.6666666666, ans=0.035 2023-11-19 19:06:43,245 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=769466.6666666666, ans=0.0 2023-11-19 19:06:50,187 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.178e+01 8.420e+01 9.034e+01 9.720e+01 1.175e+02, threshold=1.807e+02, percent-clipped=0.0 2023-11-19 19:07:07,366 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.self_attn2.whiten, num_groups=1, num_channels=192, metric=11.75 vs. limit=22.5 2023-11-19 19:07:07,907 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=769600.0, ans=0.125 2023-11-19 19:07:13,132 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 115450 2023-11-19 19:07:24,404 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=769666.6666666666, ans=0.2 2023-11-19 19:07:29,121 INFO [train_asr.py:1262] (1/4) Epoch 10, batch 7250, loss[loss=0.08752, simple_loss=0.1077, pruned_loss=0.02239, audio_tagging_loss=0.01128, over 14896.00 frames. ], tot_loss[loss=0.08591, simple_loss=0.1052, pruned_loss=0.02288, audio_tagging_loss=0.01043, over 3046380.82 frames. ], batch size: 55, lr: 6.90e-03, grad_scale: 32.0 2023-11-19 19:08:18,660 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 115500 2023-11-19 19:08:18,954 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer_ff2.min_abs, batch_count=769933.3333333334, ans=0.1 2023-11-19 19:08:33,599 INFO [train_asr.py:1262] (1/4) Epoch 10, batch 7300, loss[loss=0.08738, simple_loss=0.1044, pruned_loss=0.02554, audio_tagging_loss=0.009632, over 14572.00 frames. ], tot_loss[loss=0.08519, simple_loss=0.1046, pruned_loss=0.0226, audio_tagging_loss=0.01031, over 3040834.57 frames. ], batch size: 56, lr: 6.90e-03, grad_scale: 32.0 2023-11-19 19:08:34,353 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=9.97 vs. limit=15.0 2023-11-19 19:08:43,697 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=770066.6666666666, ans=0.1 2023-11-19 19:08:52,178 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=770133.3333333334, ans=0.0 2023-11-19 19:09:02,040 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.990e+01 8.445e+01 8.971e+01 9.866e+01 1.829e+02, threshold=1.794e+02, percent-clipped=1.0 2023-11-19 19:09:20,031 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=3.24 vs. 
limit=15.0 2023-11-19 19:09:23,267 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 115550 2023-11-19 19:09:27,839 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=9.31 vs. limit=15.0 2023-11-19 19:09:37,897 INFO [train_asr.py:1262] (1/4) Epoch 10, batch 7350, loss[loss=0.05521, simple_loss=0.05401, pruned_loss=0.01027, audio_tagging_loss=0.01793, over 16105.00 frames. ], tot_loss[loss=0.08463, simple_loss=0.1037, pruned_loss=0.02254, audio_tagging_loss=0.01023, over 3046875.42 frames. ], batch size: 64, lr: 6.89e-03, grad_scale: 32.0 2023-11-19 19:09:44,299 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=770400.0, ans=0.125 2023-11-19 19:09:48,751 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=770400.0, ans=0.0 2023-11-19 19:10:02,014 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=770466.6666666666, ans=0.125 2023-11-19 19:10:08,449 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=770533.3333333334, ans=0.125 2023-11-19 19:10:16,968 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=770600.0, ans=0.0 2023-11-19 19:10:24,563 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.min_positive, batch_count=770600.0, ans=0.025 2023-11-19 19:10:24,621 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=770600.0, ans=0.1 2023-11-19 19:10:26,809 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 115600 2023-11-19 19:10:44,514 INFO [train_asr.py:1262] (1/4) Epoch 10, batch 7400, loss[loss=0.08109, simple_loss=0.1034, pruned_loss=0.02112, audio_tagging_loss=0.008287, over 15565.00 frames. ], tot_loss[loss=0.08465, simple_loss=0.1041, pruned_loss=0.02247, audio_tagging_loss=0.01012, over 3042302.64 frames. 
], batch size: 59, lr: 6.89e-03, grad_scale: 16.0 2023-11-19 19:10:59,813 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=770800.0, ans=0.125 2023-11-19 19:11:07,146 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=770800.0, ans=0.0 2023-11-19 19:11:10,961 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=770866.6666666666, ans=0.1 2023-11-19 19:11:11,827 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.244e+01 8.500e+01 9.123e+01 1.022e+02 1.403e+02, threshold=1.825e+02, percent-clipped=0.0 2023-11-19 19:11:19,181 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=770866.6666666666, ans=0.0 2023-11-19 19:11:27,064 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=770933.3333333334, ans=0.1 2023-11-19 19:11:28,148 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=770933.3333333334, ans=0.125 2023-11-19 19:11:28,300 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=770933.3333333334, ans=0.125 2023-11-19 19:11:34,147 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 115650 2023-11-19 19:11:49,063 INFO [train_asr.py:1262] (1/4) Epoch 10, batch 7450, loss[loss=0.08364, simple_loss=0.113, pruned_loss=0.01846, audio_tagging_loss=0.008699, over 15689.00 frames. ], tot_loss[loss=0.08487, simple_loss=0.1044, pruned_loss=0.02258, audio_tagging_loss=0.01008, over 3039433.34 frames. ], batch size: 57, lr: 6.89e-03, grad_scale: 16.0 2023-11-19 19:11:50,453 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=771066.6666666666, ans=0.125 2023-11-19 19:12:06,460 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-19 19:12:06,558 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=771133.3333333334, ans=0.125 2023-11-19 19:12:37,745 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 115700 2023-11-19 19:12:37,824 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=771266.6666666666, ans=0.0 2023-11-19 19:12:44,793 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=10.37 vs. limit=15.0 2023-11-19 19:12:48,411 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.63 vs. limit=6.0 2023-11-19 19:12:52,569 INFO [train_asr.py:1262] (1/4) Epoch 10, batch 7500, loss[loss=0.08386, simple_loss=0.1137, pruned_loss=0.02017, audio_tagging_loss=0.006848, over 14689.00 frames. ], tot_loss[loss=0.08463, simple_loss=0.1042, pruned_loss=0.02237, audio_tagging_loss=0.01014, over 3045366.07 frames. 
2023-11-19 19:12:58,351 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.min_positive, batch_count=771400.0, ans=0.025
2023-11-19 19:12:58,837 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=8.66 vs. limit=10.0
2023-11-19 19:13:14,999 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=771466.6666666666, ans=0.1
2023-11-19 19:13:22,182 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.878e+01 8.310e+01 8.971e+01 9.844e+01 3.516e+02, threshold=1.794e+02, percent-clipped=1.0
2023-11-19 19:13:41,888 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 115750
2023-11-19 19:13:59,048 INFO [train_asr.py:1262] (1/4) Epoch 10, batch 7550, loss[loss=0.09101, simple_loss=0.1134, pruned_loss=0.02286, audio_tagging_loss=0.01145, over 15132.00 frames. ], tot_loss[loss=0.08501, simple_loss=0.1047, pruned_loss=0.02247, audio_tagging_loss=0.01018, over 3047298.43 frames. ], batch size: 55, lr: 6.89e-03, grad_scale: 16.0
2023-11-19 19:14:16,621 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=771800.0, ans=0.125
2023-11-19 19:14:48,794 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 115800
2023-11-19 19:14:51,924 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=772000.0, ans=0.125
2023-11-19 19:14:59,519 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=772000.0, ans=0.125
2023-11-19 19:15:04,028 INFO [train_asr.py:1262] (1/4) Epoch 10, batch 7600, loss[loss=0.07879, simple_loss=0.08987, pruned_loss=0.02046, audio_tagging_loss=0.0134, over 15004.00 frames. ], tot_loss[loss=0.08464, simple_loss=0.1042, pruned_loss=0.02232, audio_tagging_loss=0.0102, over 3045001.90 frames. ], batch size: 56, lr: 6.89e-03, grad_scale: 32.0
2023-11-19 19:15:32,978 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.786e+01 8.395e+01 8.967e+01 1.032e+02 1.336e+02, threshold=1.793e+02, percent-clipped=0.0
2023-11-19 19:15:33,703 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=12.89 vs. limit=15.0
2023-11-19 19:15:43,968 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.min_positive, batch_count=772266.6666666666, ans=0.05
2023-11-19 19:15:51,188 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=772266.6666666666, ans=0.2
2023-11-19 19:15:53,519 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 115850
2023-11-19 19:16:08,664 INFO [train_asr.py:1262] (1/4) Epoch 10, batch 7650, loss[loss=0.06058, simple_loss=0.08126, pruned_loss=0.0124, audio_tagging_loss=0.007558, over 15862.00 frames. ], tot_loss[loss=0.08438, simple_loss=0.1038, pruned_loss=0.02227, audio_tagging_loss=0.01021, over 3047894.19 frames. ], batch size: 63, lr: 6.89e-03, grad_scale: 32.0
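The [scaling.py:213] entries are periodic dumps of ScheduledFloat hyperparameters: balancer probabilities, skip rates, and dropout values that are functions of batch_count rather than constants. By batch_count ≈ 7.7e5 nearly all of them have settled at their final values (ans=0.125, ans=0.1, and so on). A minimal sketch of such a schedule; the breakpoints here are placeholder assumptions, not the ones in scaling.py:

```python
# Hedged sketch of a batch-count-keyed schedule like ScheduledFloat.
def scheduled_float(batch_count: float,
                    points=((0.0, 0.3), (20000.0, 0.125))) -> float:
    """Piecewise-linear in batch_count, clamped at the endpoint values."""
    (x0, y0), (x1, y1) = points
    if batch_count <= x0:
        return y0
    if batch_count >= x1:
        return y1
    t = (batch_count - x0) / (x1 - x0)
    return y0 + t * (y1 - y0)

# At batch_count=770600.0 a schedule like this sits at its final value,
# which is why so many of the probs above read exactly 0.125.
```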
2023-11-19 19:16:40,212 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.min_positive, batch_count=772533.3333333334, ans=0.05
2023-11-19 19:16:46,425 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=772533.3333333334, ans=0.125
2023-11-19 19:16:58,681 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 115900
2023-11-19 19:17:10,340 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=8.24 vs. limit=15.0
2023-11-19 19:17:12,876 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=772666.6666666666, ans=0.0
2023-11-19 19:17:14,212 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=772733.3333333334, ans=0.0
2023-11-19 19:17:15,206 INFO [train_asr.py:1262] (1/4) Epoch 10, batch 7700, loss[loss=0.09279, simple_loss=0.1122, pruned_loss=0.02451, audio_tagging_loss=0.01216, over 15603.00 frames. ], tot_loss[loss=0.08493, simple_loss=0.1042, pruned_loss=0.02254, audio_tagging_loss=0.01028, over 3049713.21 frames. ], batch size: 61, lr: 6.88e-03, grad_scale: 32.0
2023-11-19 19:17:16,758 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=772733.3333333334, ans=0.0
2023-11-19 19:17:32,260 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=772800.0, ans=0.0
2023-11-19 19:17:43,110 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.843e+01 8.063e+01 8.419e+01 9.189e+01 1.330e+02, threshold=1.684e+02, percent-clipped=0.0
2023-11-19 19:17:53,699 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=772933.3333333334, ans=0.2
2023-11-19 19:18:04,548 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 115950
2023-11-19 19:18:20,131 INFO [train_asr.py:1262] (1/4) Epoch 10, batch 7750, loss[loss=0.09773, simple_loss=0.1245, pruned_loss=0.02489, audio_tagging_loss=0.0106, over 14909.00 frames. ], tot_loss[loss=0.0855, simple_loss=0.105, pruned_loss=0.02271, audio_tagging_loss=0.01027, over 3041225.22 frames. ], batch size: 56, lr: 6.88e-03, grad_scale: 32.0
2023-11-19 19:18:53,444 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=773200.0, ans=0.0
2023-11-19 19:19:09,797 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 116000
2023-11-19 19:19:17,273 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=773333.3333333334, ans=0.0
2023-11-19 19:19:22,183 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=773333.3333333334, ans=0.125
2023-11-19 19:19:28,182 INFO [train_asr.py:1262] (1/4) Epoch 10, batch 7800, loss[loss=0.07027, simple_loss=0.08034, pruned_loss=0.01738, audio_tagging_loss=0.01273, over 14350.00 frames. ], tot_loss[loss=0.08581, simple_loss=0.1053, pruned_loss=0.02273, audio_tagging_loss=0.01042, over 3043765.30 frames. ], batch size: 55, lr: 6.88e-03, grad_scale: 32.0
2023-11-19 19:19:47,513 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=773466.6666666666, ans=0.0
2023-11-19 19:19:51,186 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=773466.6666666666, ans=0.125
2023-11-19 19:19:58,890 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.943e+01 8.729e+01 9.223e+01 9.822e+01 1.484e+02, threshold=1.845e+02, percent-clipped=0.0
2023-11-19 19:20:17,761 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 116050
2023-11-19 19:20:24,303 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=8.19 vs. limit=15.0
2023-11-19 19:20:33,407 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=773733.3333333334, ans=0.04949747468305833
2023-11-19 19:20:34,242 INFO [train_asr.py:1262] (1/4) Epoch 10, batch 7850, loss[loss=0.07924, simple_loss=0.1043, pruned_loss=0.01829, audio_tagging_loss=0.008808, over 14377.00 frames. ], tot_loss[loss=0.08522, simple_loss=0.1044, pruned_loss=0.02259, audio_tagging_loss=0.01042, over 3036035.98 frames. ], batch size: 55, lr: 6.88e-03, grad_scale: 16.0
2023-11-19 19:20:34,589 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=773733.3333333334, ans=0.0
2023-11-19 19:20:36,202 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=7.98 vs. limit=12.0
2023-11-19 19:20:38,060 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=773733.3333333334, ans=0.1
2023-11-19 19:20:47,194 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=8.79 vs. limit=10.0
2023-11-19 19:20:51,235 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=773800.0, ans=0.2
2023-11-19 19:21:23,695 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 116100
2023-11-19 19:21:23,837 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2023-11-19 19:21:27,446 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=774000.0, ans=0.125
2023-11-19 19:21:38,920 INFO [train_asr.py:1262] (1/4) Epoch 10, batch 7900, loss[loss=0.0755, simple_loss=0.0914, pruned_loss=0.01508, audio_tagging_loss=0.01472, over 16260.00 frames. ], tot_loss[loss=0.08545, simple_loss=0.1045, pruned_loss=0.02273, audio_tagging_loss=0.01048, over 3038983.19 frames. ], batch size: 62, lr: 6.88e-03, grad_scale: 16.0
2023-11-19 19:21:50,218 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=774133.3333333334, ans=0.0
2023-11-19 19:21:51,456 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=774133.3333333334, ans=0.125
2023-11-19 19:21:55,652 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=19.25 vs. limit=22.5
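The [scaling.py:1022] entries compare a per-module "whitening" metric against a fixed limit; the metric measures how far the module's output covariance is from a multiple of the identity (about 1 when already white, larger as the eigenvalue spread grows), and a correction only kicks in when it exceeds the limit. A sketch of one such metric under that assumption; the exact formula in scaling.py may differ:

```python
import torch

def whitening_metric(x: torch.Tensor, num_groups: int = 1) -> float:
    """x: (num_frames, num_channels). Returns ~1.0 iff covariance ~ c * I."""
    n, c = x.shape
    d = c // num_groups
    metrics = []
    for g in range(num_groups):
        xg = x[:, g * d:(g + 1) * d]
        cov = (xg.T @ xg) / n
        # d * tr(cov @ cov) / tr(cov)**2 equals 1 exactly when cov = c * I
        metrics.append(d * (cov @ cov).trace() / cov.trace() ** 2)
    return float(torch.stack(metrics).mean())
```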
2023-11-19 19:22:06,788 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=774200.0, ans=0.0
2023-11-19 19:22:08,451 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.972e+01 8.408e+01 9.159e+01 1.027e+02 1.292e+02, threshold=1.832e+02, percent-clipped=0.0
2023-11-19 19:22:27,810 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 116150
2023-11-19 19:22:43,199 INFO [train_asr.py:1262] (1/4) Epoch 10, batch 7950, loss[loss=0.07852, simple_loss=0.09545, pruned_loss=0.01958, audio_tagging_loss=0.01121, over 16783.00 frames. ], tot_loss[loss=0.08537, simple_loss=0.1043, pruned_loss=0.02265, audio_tagging_loss=0.01059, over 3043608.49 frames. ], batch size: 63, lr: 6.88e-03, grad_scale: 16.0
2023-11-19 19:22:57,978 WARNING [train_asr.py:1506] (1/4) Exclude cut with ID unbalanced/uQjH4tNUZ_g_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24
2023-11-19 19:23:06,652 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=10.32 vs. limit=15.0
2023-11-19 19:23:15,085 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=774533.3333333334, ans=0.125
2023-11-19 19:23:32,853 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 116200
2023-11-19 19:23:49,426 INFO [train_asr.py:1262] (1/4) Epoch 10, batch 8000, loss[loss=0.09641, simple_loss=0.121, pruned_loss=0.02647, audio_tagging_loss=0.009427, over 15330.00 frames. ], tot_loss[loss=0.08454, simple_loss=0.103, pruned_loss=0.02229, audio_tagging_loss=0.01073, over 3048459.91 frames. ], batch size: 58, lr: 6.88e-03, grad_scale: 32.0
2023-11-19 19:23:57,739 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=774733.3333333334, ans=0.125
2023-11-19 19:24:19,082 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.333e+01 8.445e+01 9.319e+01 1.021e+02 1.426e+02, threshold=1.864e+02, percent-clipped=0.0
2023-11-19 19:24:20,665 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=774866.6666666666, ans=0.125
2023-11-19 19:24:39,005 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 116250
2023-11-19 19:24:51,233 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=10.46 vs. limit=15.0
2023-11-19 19:24:54,250 INFO [train_asr.py:1262] (1/4) Epoch 10, batch 8050, loss[loss=0.07418, simple_loss=0.07965, pruned_loss=0.02136, audio_tagging_loss=0.01299, over 14738.00 frames. ], tot_loss[loss=0.08484, simple_loss=0.1032, pruned_loss=0.02238, audio_tagging_loss=0.01084, over 3041527.68 frames. ], batch size: 57, lr: 6.87e-03, grad_scale: 32.0
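The WARNING above is the data filter at train_asr.py:1506 in action: these AudioSet cuts carry a dummy transcript, and a one-second cut keeps only 23 frames after the roughly 4x subsampling, fewer than its 24 BPE tokens, so no transducer alignment exists and the cut is skipped. A sketch of the check under that reading; the exact frame arithmetic of the convolutional front-end is an assumption here, not read from the code:

```python
def is_trainable(num_frames: int, num_tokens: int,
                 subsampling_factor: int = 4) -> bool:
    """A cut must keep at least as many subsampled frames as tokens."""
    # assumed front-end shrinkage; (100 - 7) // 4 == 23 matches the log
    frames_after = (num_frames - 7) // subsampling_factor
    return frames_after >= num_tokens

# The warned cut: 100 frames -> 23 after subsampling < 24 tokens -> excluded.
```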
2023-11-19 19:24:54,573 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=775066.6666666666, ans=0.125
2023-11-19 19:24:56,862 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=775066.6666666666, ans=0.125
2023-11-19 19:25:06,442 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=775133.3333333334, ans=0.125
2023-11-19 19:25:12,617 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=775133.3333333334, ans=0.09899494936611666
2023-11-19 19:25:14,904 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=775133.3333333334, ans=0.1
2023-11-19 19:25:19,066 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=10.30 vs. limit=15.0
2023-11-19 19:25:28,583 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=775200.0, ans=0.125
2023-11-19 19:25:29,226 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=8.79 vs. limit=15.0
2023-11-19 19:25:43,115 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=775266.6666666666, ans=0.2
2023-11-19 19:25:44,135 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 116300
2023-11-19 19:25:59,365 INFO [train_asr.py:1262] (1/4) Epoch 10, batch 8100, loss[loss=0.08021, simple_loss=0.1029, pruned_loss=0.01969, audio_tagging_loss=0.009064, over 14910.00 frames. ], tot_loss[loss=0.08428, simple_loss=0.1027, pruned_loss=0.02219, audio_tagging_loss=0.01075, over 3047415.36 frames. ], batch size: 57, lr: 6.87e-03, grad_scale: 32.0
2023-11-19 19:26:29,906 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.744e+01 8.355e+01 8.902e+01 9.485e+01 1.238e+02, threshold=1.780e+02, percent-clipped=0.0
2023-11-19 19:26:31,494 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=775533.3333333334, ans=0.125
2023-11-19 19:26:33,986 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=775533.3333333334, ans=0.1
2023-11-19 19:26:35,444 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=775533.3333333334, ans=0.2
2023-11-19 19:26:49,908 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 116350
2023-11-19 19:26:52,411 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=775666.6666666666, ans=0.2
2023-11-19 19:26:58,838 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.min_abs, batch_count=775666.6666666666, ans=0.5
2023-11-19 19:27:04,405 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=775733.3333333334, ans=0.125
2023-11-19 19:27:05,400 INFO [train_asr.py:1262] (1/4) Epoch 10, batch 8150, loss[loss=0.1127, simple_loss=0.1391, pruned_loss=0.0353, audio_tagging_loss=0.007889, over 14954.00 frames. ], tot_loss[loss=0.08367, simple_loss=0.1018, pruned_loss=0.02215, audio_tagging_loss=0.01061, over 3039886.24 frames. ], batch size: 54, lr: 6.87e-03, grad_scale: 32.0
2023-11-19 19:27:09,999 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=775733.3333333334, ans=0.125
2023-11-19 19:27:47,701 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=775933.3333333334, ans=0.0
2023-11-19 19:27:48,976 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=775933.3333333334, ans=0.0
2023-11-19 19:27:55,522 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 116400
2023-11-19 19:28:10,284 WARNING [train_asr.py:1506] (1/4) Exclude cut with ID unbalanced/8C7biyx9TQ4_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24
2023-11-19 19:28:11,456 INFO [train_asr.py:1262] (1/4) Epoch 10, batch 8200, loss[loss=0.1064, simple_loss=0.114, pruned_loss=0.03876, audio_tagging_loss=0.01068, over 14988.00 frames. ], tot_loss[loss=0.08384, simple_loss=0.1021, pruned_loss=0.02224, audio_tagging_loss=0.01053, over 3041399.36 frames. ], batch size: 57, lr: 6.87e-03, grad_scale: 32.0
2023-11-19 19:28:11,831 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=776066.6666666666, ans=0.125
2023-11-19 19:28:36,716 INFO [scaling.py:1022] (1/4) Whitening: name=encoder_embed.out_whiten, num_groups=1, num_channels=192, metric=6.53 vs. limit=8.0
2023-11-19 19:28:41,261 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.260e+01 8.595e+01 9.342e+01 1.031e+02 1.321e+02, threshold=1.868e+02, percent-clipped=0.0
2023-11-19 19:28:49,557 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00
2023-11-19 19:29:01,282 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 116450
2023-11-19 19:29:05,171 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=776333.3333333334, ans=0.125
2023-11-19 19:29:16,457 INFO [train_asr.py:1262] (1/4) Epoch 10, batch 8250, loss[loss=0.08272, simple_loss=0.1023, pruned_loss=0.02275, audio_tagging_loss=0.008814, over 14209.00 frames. ], tot_loss[loss=0.08418, simple_loss=0.1029, pruned_loss=0.02239, audio_tagging_loss=0.01032, over 3039535.77 frames. ], batch size: 54, lr: 6.87e-03, grad_scale: 32.0
2023-11-19 19:29:16,787 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=776400.0, ans=0.09899494936611666
2023-11-19 19:29:22,287 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=776400.0, ans=0.125
2023-11-19 19:29:38,517 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=776466.6666666666, ans=0.0
2023-11-19 19:29:43,940 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=776533.3333333334, ans=0.125
2023-11-19 19:30:06,003 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 116500
2023-11-19 19:30:22,186 INFO [train_asr.py:1262] (1/4) Epoch 10, batch 8300, loss[loss=0.06664, simple_loss=0.08021, pruned_loss=0.01544, audio_tagging_loss=0.01109, over 15440.00 frames. ], tot_loss[loss=0.08337, simple_loss=0.102, pruned_loss=0.02208, audio_tagging_loss=0.0103, over 3043896.86 frames. ], batch size: 59, lr: 6.87e-03, grad_scale: 32.0
2023-11-19 19:30:22,495 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=776733.3333333334, ans=0.125
2023-11-19 19:30:51,980 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.779e+01 8.250e+01 8.993e+01 9.595e+01 1.317e+02, threshold=1.799e+02, percent-clipped=0.0
2023-11-19 19:31:11,740 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 116550
2023-11-19 19:31:12,087 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=776933.3333333334, ans=0.04949747468305833
2023-11-19 19:31:13,434 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=3.53 vs. limit=15.0
2023-11-19 19:31:24,527 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=8.18 vs. limit=15.0
2023-11-19 19:31:25,546 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=777000.0, ans=0.0
2023-11-19 19:31:27,856 INFO [train_asr.py:1262] (1/4) Epoch 10, batch 8350, loss[loss=0.07798, simple_loss=0.09014, pruned_loss=0.02036, audio_tagging_loss=0.01255, over 15774.00 frames. ], tot_loss[loss=0.08356, simple_loss=0.1024, pruned_loss=0.02207, audio_tagging_loss=0.01027, over 3052679.19 frames. ], batch size: 60, lr: 6.86e-03, grad_scale: 32.0
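The [scaling.py:1118] WithLoss entries report an auxiliary penalty attached to the self-attention weights; loss-sum=0.000e+00 means the activations stayed inside the allowed range, so the penalty contributed nothing over this interval. One way such a hook can be wired without changing the forward value, shown as a hedged sketch (the trigger condition and scale are assumptions, not the scaling.py implementation):

```python
import torch

def attach_penalty(x: torch.Tensor, limit: float = 25.0,
                   scale: float = 0.01) -> torch.Tensor:
    """Route the gradient of a range penalty into x, value unchanged."""
    penalty = scale * torch.relu(x.abs() - limit).sum()  # the logged loss-sum
    # (penalty - penalty.detach()) is 0.0 in the forward pass but carries
    # the penalty's gradient back into x during backprop.
    return x + (penalty - penalty.detach())
```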
2023-11-19 19:31:43,067 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=777133.3333333334, ans=0.2
2023-11-19 19:31:45,577 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=777133.3333333334, ans=0.125
2023-11-19 19:31:49,060 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=777133.3333333334, ans=0.2
2023-11-19 19:31:55,933 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=777200.0, ans=0.0
2023-11-19 19:32:05,304 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=777266.6666666666, ans=0.07
2023-11-19 19:32:16,506 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.max_abs, batch_count=777266.6666666666, ans=10.0
2023-11-19 19:32:17,478 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 116600
2023-11-19 19:32:29,449 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=777333.3333333334, ans=0.2
2023-11-19 19:32:32,798 INFO [train_asr.py:1262] (1/4) Epoch 10, batch 8400, loss[loss=0.1035, simple_loss=0.1199, pruned_loss=0.03305, audio_tagging_loss=0.01051, over 14922.00 frames. ], tot_loss[loss=0.08368, simple_loss=0.1023, pruned_loss=0.02216, audio_tagging_loss=0.01036, over 3049733.85 frames. ], batch size: 58, lr: 6.86e-03, grad_scale: 32.0
2023-11-19 19:33:03,761 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.528e+01 8.258e+01 9.022e+01 9.929e+01 1.314e+02, threshold=1.804e+02, percent-clipped=0.0
2023-11-19 19:33:08,763 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=777533.3333333334, ans=0.015
2023-11-19 19:33:16,541 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=7.32 vs. limit=15.0
2023-11-19 19:33:21,232 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2023-11-19 19:33:22,301 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 116650
2023-11-19 19:33:26,036 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=777666.6666666666, ans=0.1
2023-11-19 19:33:27,499 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer_ff2.min_abs, batch_count=777666.6666666666, ans=0.1
2023-11-19 19:33:30,459 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.self_attn1.whiten, num_groups=1, num_channels=192, metric=13.20 vs. limit=22.5
2023-11-19 19:33:33,798 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=2.54 vs. limit=15.0
2023-11-19 19:33:37,654 INFO [train_asr.py:1262] (1/4) Epoch 10, batch 8450, loss[loss=0.1005, simple_loss=0.1262, pruned_loss=0.02506, audio_tagging_loss=0.01231, over 14577.00 frames. ], tot_loss[loss=0.08431, simple_loss=0.103, pruned_loss=0.02247, audio_tagging_loss=0.01034, over 3038392.90 frames. ], batch size: 54, lr: 6.86e-03, grad_scale: 32.0
2023-11-19 19:33:50,009 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=9.68 vs. limit=15.0
2023-11-19 19:33:50,752 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=777800.0, ans=0.125
2023-11-19 19:34:00,782 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=777800.0, ans=0.1
2023-11-19 19:34:27,526 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 116700
2023-11-19 19:34:44,130 INFO [train_asr.py:1262] (1/4) Epoch 10, batch 8500, loss[loss=0.09808, simple_loss=0.1226, pruned_loss=0.02759, audio_tagging_loss=0.009184, over 14741.00 frames. ], tot_loss[loss=0.08466, simple_loss=0.1035, pruned_loss=0.02254, audio_tagging_loss=0.01036, over 3034754.71 frames. ], batch size: 56, lr: 6.86e-03, grad_scale: 32.0
2023-11-19 19:34:49,293 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=778066.6666666666, ans=0.125
2023-11-19 19:35:13,934 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.558e+01 8.293e+01 8.958e+01 9.904e+01 1.302e+02, threshold=1.792e+02, percent-clipped=0.0
2023-11-19 19:35:14,400 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=778200.0, ans=0.125
2023-11-19 19:35:33,057 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 116750
2023-11-19 19:35:48,120 INFO [train_asr.py:1262] (1/4) Epoch 10, batch 8550, loss[loss=0.06716, simple_loss=0.07479, pruned_loss=0.01695, audio_tagging_loss=0.01281, over 15592.00 frames. ], tot_loss[loss=0.08438, simple_loss=0.1032, pruned_loss=0.02236, audio_tagging_loss=0.0104, over 3033752.02 frames. ], batch size: 61, lr: 6.86e-03, grad_scale: 16.0
2023-11-19 19:35:51,886 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=778400.0, ans=0.015
2023-11-19 19:35:51,991 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=778400.0, ans=0.0
2023-11-19 19:35:57,070 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=778400.0, ans=0.2
2023-11-19 19:36:01,210 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=8.93 vs. limit=10.0
2023-11-19 19:36:22,025 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=778533.3333333334, ans=0.2
2023-11-19 19:36:23,256 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=778533.3333333334, ans=0.125
2023-11-19 19:36:28,095 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=778600.0, ans=0.125
2023-11-19 19:36:37,893 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 116800
2023-11-19 19:36:44,635 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=778666.6666666666, ans=0.125
2023-11-19 19:36:51,986 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=778733.3333333334, ans=0.0
2023-11-19 19:36:52,888 INFO [train_asr.py:1262] (1/4) Epoch 10, batch 8600, loss[loss=0.07428, simple_loss=0.08687, pruned_loss=0.02073, audio_tagging_loss=0.01011, over 14151.00 frames. ], tot_loss[loss=0.08482, simple_loss=0.1038, pruned_loss=0.02256, audio_tagging_loss=0.01038, over 3037793.09 frames. ], batch size: 58, lr: 6.86e-03, grad_scale: 16.0
2023-11-19 19:36:58,603 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=778733.3333333334, ans=0.0
2023-11-19 19:37:02,943 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=778733.3333333334, ans=0.07
2023-11-19 19:37:24,749 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.251e+01 8.439e+01 9.027e+01 1.024e+02 1.472e+02, threshold=1.805e+02, percent-clipped=0.0
2023-11-19 19:37:42,131 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 116850
2023-11-19 19:37:59,585 INFO [train_asr.py:1262] (1/4) Epoch 10, batch 8650, loss[loss=0.09022, simple_loss=0.1039, pruned_loss=0.02612, audio_tagging_loss=0.01218, over 15250.00 frames. ], tot_loss[loss=0.08537, simple_loss=0.1044, pruned_loss=0.02266, audio_tagging_loss=0.01051, over 3043397.81 frames. ], batch size: 58, lr: 6.86e-03, grad_scale: 16.0
2023-11-19 19:38:04,943 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=779066.6666666666, ans=0.125
2023-11-19 19:38:35,349 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=779200.0, ans=0.125
2023-11-19 19:38:48,816 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 116900
2023-11-19 19:39:03,568 INFO [train_asr.py:1262] (1/4) Epoch 10, batch 8700, loss[loss=0.07953, simple_loss=0.09832, pruned_loss=0.01986, audio_tagging_loss=0.01051, over 16393.00 frames. ], tot_loss[loss=0.08562, simple_loss=0.1047, pruned_loss=0.02279, audio_tagging_loss=0.01048, over 3041289.57 frames. ], batch size: 63, lr: 6.85e-03, grad_scale: 16.0
2023-11-19 19:39:03,932 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=779400.0, ans=0.125
2023-11-19 19:39:19,826 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=779466.6666666666, ans=0.125
2023-11-19 19:39:28,072 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=779533.3333333334, ans=0.125
2023-11-19 19:39:35,919 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.849e+01 8.405e+01 9.122e+01 9.859e+01 1.308e+02, threshold=1.824e+02, percent-clipped=0.0
2023-11-19 19:39:42,844 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=12.83 vs. limit=15.0
2023-11-19 19:39:53,677 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 116950
2023-11-19 19:39:53,903 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=779600.0, ans=0.0
2023-11-19 19:40:05,069 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=779666.6666666666, ans=0.125
2023-11-19 19:40:08,886 INFO [train_asr.py:1262] (1/4) Epoch 10, batch 8750, loss[loss=0.07181, simple_loss=0.09151, pruned_loss=0.01707, audio_tagging_loss=0.008989, over 15485.00 frames. ], tot_loss[loss=0.08614, simple_loss=0.1056, pruned_loss=0.02289, audio_tagging_loss=0.01047, over 3045998.57 frames. ], batch size: 57, lr: 6.85e-03, grad_scale: 16.0
2023-11-19 19:40:14,638 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=779733.3333333334, ans=0.0
2023-11-19 19:40:16,339 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=6.95 vs. limit=10.0
2023-11-19 19:40:22,999 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=779800.0, ans=0.1
2023-11-19 19:40:47,664 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=779933.3333333334, ans=0.125
2023-11-19 19:40:58,634 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 117000
2023-11-19 19:41:01,703 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=780000.0, ans=0.125
2023-11-19 19:41:07,265 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=780000.0, ans=0.125
2023-11-19 19:41:11,081 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2023-11-19 19:41:15,638 INFO [train_asr.py:1262] (1/4) Epoch 10, batch 8800, loss[loss=0.1095, simple_loss=0.1421, pruned_loss=0.02953, audio_tagging_loss=0.008912, over 15716.00 frames. ], tot_loss[loss=0.08641, simple_loss=0.1056, pruned_loss=0.0229, audio_tagging_loss=0.0107, over 3049568.19 frames. ], batch size: 56, lr: 6.85e-03, grad_scale: 32.0
2023-11-19 19:41:22,731 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=780066.6666666666, ans=0.125
2023-11-19 19:41:30,333 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2023-11-19 19:41:31,650 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=780133.3333333334, ans=0.125
2023-11-19 19:41:46,212 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.896e+01 8.517e+01 9.316e+01 1.005e+02 1.428e+02, threshold=1.863e+02, percent-clipped=0.0
2023-11-19 19:41:47,165 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.conv_module1.whiten, num_groups=1, num_channels=192, metric=12.89 vs. limit=15.0
2023-11-19 19:42:05,450 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 117050
2023-11-19 19:42:19,887 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.min_abs, batch_count=780400.0, ans=0.5
2023-11-19 19:42:20,907 INFO [train_asr.py:1262] (1/4) Epoch 10, batch 8850, loss[loss=0.08159, simple_loss=0.09105, pruned_loss=0.02266, audio_tagging_loss=0.01341, over 15170.00 frames. ], tot_loss[loss=0.08692, simple_loss=0.106, pruned_loss=0.02313, audio_tagging_loss=0.01078, over 3051927.90 frames. ], batch size: 60, lr: 6.85e-03, grad_scale: 32.0
2023-11-19 19:42:31,029 WARNING [train_asr.py:1506] (1/4) Exclude cut with ID unbalanced/1Dq7QH61iXQ_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24
2023-11-19 19:42:40,360 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=20.51 vs. limit=22.5
2023-11-19 19:42:46,732 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=780533.3333333334, ans=0.125
2023-11-19 19:42:48,138 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=780533.3333333334, ans=0.09899494936611666
2023-11-19 19:43:09,022 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=11.22 vs. limit=15.0
2023-11-19 19:43:10,888 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 117100
2023-11-19 19:43:24,656 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.max_abs, batch_count=780733.3333333334, ans=10.0
2023-11-19 19:43:25,649 INFO [train_asr.py:1262] (1/4) Epoch 10, batch 8900, loss[loss=0.06846, simple_loss=0.07602, pruned_loss=0.01816, audio_tagging_loss=0.01229, over 14168.00 frames. ], tot_loss[loss=0.08734, simple_loss=0.107, pruned_loss=0.02334, audio_tagging_loss=0.01047, over 3053044.04 frames. ], batch size: 58, lr: 6.85e-03, grad_scale: 32.0
2023-11-19 19:43:30,106 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=7.25 vs. limit=15.0
2023-11-19 19:43:39,015 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=780800.0, ans=0.125
2023-11-19 19:43:57,709 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.896e+01 8.524e+01 9.164e+01 1.008e+02 2.519e+02, threshold=1.833e+02, percent-clipped=1.0
2023-11-19 19:44:06,689 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=780933.3333333334, ans=0.125
2023-11-19 19:44:15,128 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 117150
2023-11-19 19:44:23,668 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=5.39 vs. limit=10.0
2023-11-19 19:44:25,808 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=781000.0, ans=0.0
2023-11-19 19:44:30,939 INFO [train_asr.py:1262] (1/4) Epoch 10, batch 8950, loss[loss=0.1005, simple_loss=0.1313, pruned_loss=0.02903, audio_tagging_loss=0.005801, over 15018.00 frames. ], tot_loss[loss=0.08661, simple_loss=0.1063, pruned_loss=0.02323, audio_tagging_loss=0.01021, over 3054709.52 frames. ], batch size: 55, lr: 6.85e-03, grad_scale: 32.0
2023-11-19 19:44:43,444 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=781133.3333333334, ans=0.125
2023-11-19 19:44:52,430 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=781133.3333333334, ans=0.125
2023-11-19 19:44:53,589 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=781133.3333333334, ans=0.2
2023-11-19 19:44:54,917 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=781133.3333333334, ans=0.09899494936611666
2023-11-19 19:45:20,405 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 117200
2023-11-19 19:45:36,440 INFO [train_asr.py:1262] (1/4) Epoch 10, batch 9000, loss[loss=0.07699, simple_loss=0.09692, pruned_loss=0.01793, audio_tagging_loss=0.0106, over 15895.00 frames. ], tot_loss[loss=0.08615, simple_loss=0.1058, pruned_loss=0.02306, audio_tagging_loss=0.01019, over 3059323.54 frames. ], batch size: 59, lr: 6.85e-03, grad_scale: 16.0
2023-11-19 19:45:36,441 INFO [train_asr.py:1285] (1/4) Computing validation loss
2023-11-19 19:46:18,836 INFO [train_asr.py:1294] (1/4) Epoch 10, validation: loss=0.06518, simple_loss=0.05524, pruned_loss=0.006372, audio_tagging_loss=0.03119, over 4681554.00 frames.
2023-11-19 19:46:18,837 INFO [train_asr.py:1295] (1/4) Maximum memory allocated so far is 26082MB
2023-11-19 19:46:19,416 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=2.87 vs. limit=15.0
2023-11-19 19:46:19,428 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=5.80 vs. limit=10.0
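The validation block above (train_asr.py:1285-1295) pauses training at batch 9000, evaluates the full validation set (hence "over 4681554.00 frames"), and reports peak GPU memory. Both the validation summary and the running tot_loss are frame-weighted averages, sketched here under that assumption; the class below is illustrative, not the tracker used in train_asr.py:

```python
# Hedged sketch of the frame-weighted pooling behind "over N frames".
class LossPool:
    def __init__(self):
        self.frames = 0.0
        self.sums: dict[str, float] = {}

    def update(self, batch_frames: float, **losses: float) -> None:
        """Accumulate per-batch losses weighted by their frame counts."""
        self.frames += batch_frames
        for name, value in losses.items():
            self.sums[name] = self.sums.get(name, 0.0) + value * batch_frames

    def averages(self) -> dict[str, float]:
        return {name: s / self.frames for name, s in self.sums.items()}

pool = LossPool()
pool.update(15895.0, loss=0.07699, simple_loss=0.09692)  # one logged batch
```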
limit=10.0 2023-11-19 19:46:23,804 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=781400.0, ans=0.125 2023-11-19 19:46:49,443 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=512, metric=4.24 vs. limit=15.0 2023-11-19 19:46:52,440 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.260e+01 8.536e+01 8.923e+01 9.790e+01 1.498e+02, threshold=1.785e+02, percent-clipped=0.0 2023-11-19 19:47:05,352 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=781600.0, ans=0.1 2023-11-19 19:47:08,913 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 117250 2023-11-19 19:47:10,970 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.conv_module1.whiten, num_groups=1, num_channels=192, metric=15.11 vs. limit=15.0 2023-11-19 19:47:15,779 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=781666.6666666666, ans=0.125 2023-11-19 19:47:25,440 INFO [train_asr.py:1262] (1/4) Epoch 10, batch 9050, loss[loss=0.09195, simple_loss=0.1173, pruned_loss=0.02495, audio_tagging_loss=0.008332, over 15710.00 frames. ], tot_loss[loss=0.08645, simple_loss=0.1062, pruned_loss=0.02328, audio_tagging_loss=0.01008, over 3059068.76 frames. ], batch size: 57, lr: 6.84e-03, grad_scale: 16.0 2023-11-19 19:47:28,439 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=781733.3333333334, ans=0.0 2023-11-19 19:47:39,905 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=781800.0, ans=0.0 2023-11-19 19:48:05,191 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=781933.3333333334, ans=0.0 2023-11-19 19:48:15,119 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 117300 2023-11-19 19:48:17,723 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=782000.0, ans=0.125 2023-11-19 19:48:30,606 INFO [train_asr.py:1262] (1/4) Epoch 10, batch 9100, loss[loss=0.1024, simple_loss=0.1322, pruned_loss=0.02709, audio_tagging_loss=0.009259, over 15596.00 frames. ], tot_loss[loss=0.08582, simple_loss=0.1057, pruned_loss=0.02301, audio_tagging_loss=0.009953, over 3051971.08 frames. ], batch size: 56, lr: 6.84e-03, grad_scale: 16.0 2023-11-19 19:48:39,841 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.06 vs. 
limit=6.0 2023-11-19 19:48:47,077 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=782133.3333333334, ans=0.1 2023-11-19 19:48:48,392 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=782133.3333333334, ans=0.0 2023-11-19 19:49:03,198 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.816e+01 8.111e+01 8.734e+01 9.488e+01 1.224e+02, threshold=1.747e+02, percent-clipped=0.0 2023-11-19 19:49:21,003 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 117350 2023-11-19 19:49:30,568 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=7.05 vs. limit=12.0 2023-11-19 19:49:36,000 INFO [train_asr.py:1262] (1/4) Epoch 10, batch 9150, loss[loss=0.09417, simple_loss=0.124, pruned_loss=0.02419, audio_tagging_loss=0.007997, over 14796.00 frames. ], tot_loss[loss=0.08491, simple_loss=0.1047, pruned_loss=0.02257, audio_tagging_loss=0.01, over 3050721.56 frames. ], batch size: 56, lr: 6.84e-03, grad_scale: 16.0 2023-11-19 19:50:01,647 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=782533.3333333334, ans=0.0 2023-11-19 19:50:26,042 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 117400 2023-11-19 19:50:36,315 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=7.47 vs. limit=15.0 2023-11-19 19:50:42,570 INFO [train_asr.py:1262] (1/4) Epoch 10, batch 9200, loss[loss=0.13, simple_loss=0.1636, pruned_loss=0.04082, audio_tagging_loss=0.007424, over 14905.00 frames. ], tot_loss[loss=0.08555, simple_loss=0.1052, pruned_loss=0.02286, audio_tagging_loss=0.0101, over 3045191.96 frames. ], batch size: 54, lr: 6.84e-03, grad_scale: 32.0 2023-11-19 19:50:53,238 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=782733.3333333334, ans=0.125 2023-11-19 19:51:03,995 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=782800.0, ans=0.125 2023-11-19 19:51:15,175 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.014e+01 8.317e+01 9.062e+01 1.049e+02 1.537e+02, threshold=1.812e+02, percent-clipped=0.0 2023-11-19 19:51:33,133 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 117450 2023-11-19 19:51:33,720 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=14.23 vs. 
limit=15.0 2023-11-19 19:51:39,174 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.min_positive, batch_count=783000.0, ans=0.05 2023-11-19 19:51:42,862 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=783000.0, ans=0.2 2023-11-19 19:51:47,849 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=783066.6666666666, ans=0.125 2023-11-19 19:51:48,014 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=783066.6666666666, ans=0.0 2023-11-19 19:51:48,835 INFO [train_asr.py:1262] (1/4) Epoch 10, batch 9250, loss[loss=0.08421, simple_loss=0.1051, pruned_loss=0.02031, audio_tagging_loss=0.01136, over 15356.00 frames. ], tot_loss[loss=0.08474, simple_loss=0.1042, pruned_loss=0.02251, audio_tagging_loss=0.01015, over 3053821.90 frames. ], batch size: 56, lr: 6.84e-03, grad_scale: 32.0 2023-11-19 19:51:49,195 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=783066.6666666666, ans=0.2 2023-11-19 19:52:05,622 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.max_abs, batch_count=783133.3333333334, ans=10.0 2023-11-19 19:52:08,626 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=7.72 vs. limit=15.0 2023-11-19 19:52:16,798 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=18.64 vs. limit=22.5 2023-11-19 19:52:23,476 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.max_abs, batch_count=783200.0, ans=10.0 2023-11-19 19:52:38,897 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 117500 2023-11-19 19:52:45,137 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=21.49 vs. limit=22.5 2023-11-19 19:52:48,335 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=783333.3333333334, ans=0.125 2023-11-19 19:52:49,797 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=783333.3333333334, ans=0.04949747468305833 2023-11-19 19:52:54,383 INFO [train_asr.py:1262] (1/4) Epoch 10, batch 9300, loss[loss=0.1048, simple_loss=0.1311, pruned_loss=0.03309, audio_tagging_loss=0.006177, over 14427.00 frames. ], tot_loss[loss=0.08419, simple_loss=0.1035, pruned_loss=0.02225, audio_tagging_loss=0.01018, over 3052445.59 frames. ], batch size: 56, lr: 6.84e-03, grad_scale: 32.0 2023-11-19 19:52:56,568 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.conv_module2.whiten, num_groups=1, num_channels=192, metric=11.36 vs. 
limit=15.0 2023-11-19 19:53:00,760 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=783400.0, ans=0.125 2023-11-19 19:53:26,981 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.639e+01 8.312e+01 9.036e+01 9.787e+01 1.162e+02, threshold=1.807e+02, percent-clipped=0.0 2023-11-19 19:53:38,005 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=783600.0, ans=0.0 2023-11-19 19:53:44,126 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 117550 2023-11-19 19:53:44,623 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=18.16 vs. limit=22.5 2023-11-19 19:53:59,586 INFO [train_asr.py:1262] (1/4) Epoch 10, batch 9350, loss[loss=0.07144, simple_loss=0.07639, pruned_loss=0.0178, audio_tagging_loss=0.01544, over 14751.00 frames. ], tot_loss[loss=0.08477, simple_loss=0.1043, pruned_loss=0.02245, audio_tagging_loss=0.01018, over 3064808.87 frames. ], batch size: 58, lr: 6.84e-03, grad_scale: 32.0 2023-11-19 19:54:08,326 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=783733.3333333334, ans=0.1 2023-11-19 19:54:24,869 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=14.33 vs. limit=15.0 2023-11-19 19:54:27,755 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=783866.6666666666, ans=0.1 2023-11-19 19:54:49,352 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 117600 2023-11-19 19:54:58,740 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=784000.0, ans=0.0 2023-11-19 19:55:05,779 INFO [train_asr.py:1262] (1/4) Epoch 10, batch 9400, loss[loss=0.06812, simple_loss=0.08216, pruned_loss=0.01674, audio_tagging_loss=0.0103, over 16083.00 frames. ], tot_loss[loss=0.08536, simple_loss=0.105, pruned_loss=0.02263, audio_tagging_loss=0.01024, over 3055185.90 frames. ], batch size: 62, lr: 6.83e-03, grad_scale: 16.0 2023-11-19 19:55:06,182 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=784066.6666666666, ans=0.1 2023-11-19 19:55:35,819 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=784200.0, ans=0.0 2023-11-19 19:55:39,192 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.194e+01 8.476e+01 9.081e+01 1.030e+02 1.355e+02, threshold=1.816e+02, percent-clipped=0.0 2023-11-19 19:55:55,248 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 117650 2023-11-19 19:56:06,407 WARNING [train_asr.py:1506] (1/4) Exclude cut with ID unbalanced/jmSuJWEIizA_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. 
Number of tokens: 24 2023-11-19 19:56:11,001 INFO [train_asr.py:1262] (1/4) Epoch 10, batch 9450, loss[loss=0.0785, simple_loss=0.0956, pruned_loss=0.01876, audio_tagging_loss=0.01194, over 15311.00 frames. ], tot_loss[loss=0.08579, simple_loss=0.105, pruned_loss=0.02292, audio_tagging_loss=0.01039, over 3052661.34 frames. ], batch size: 59, lr: 6.83e-03, grad_scale: 16.0 2023-11-19 19:56:26,756 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=784466.6666666666, ans=0.125 2023-11-19 19:56:27,994 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=784466.6666666666, ans=0.125 2023-11-19 19:56:37,007 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=6.34 vs. limit=12.0 2023-11-19 19:56:44,175 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=784533.3333333334, ans=0.125 2023-11-19 19:56:46,531 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=784533.3333333334, ans=0.125 2023-11-19 19:56:47,901 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=3.04 vs. limit=15.0 2023-11-19 19:56:51,590 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=784600.0, ans=0.125 2023-11-19 19:57:00,648 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 117700 2023-11-19 19:57:08,800 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=6.47 vs. limit=15.0 2023-11-19 19:57:16,045 INFO [train_asr.py:1262] (1/4) Epoch 10, batch 9500, loss[loss=0.07019, simple_loss=0.09373, pruned_loss=0.01432, audio_tagging_loss=0.009009, over 14431.00 frames. ], tot_loss[loss=0.08639, simple_loss=0.106, pruned_loss=0.02304, audio_tagging_loss=0.01037, over 3045311.26 frames. ], batch size: 53, lr: 6.83e-03, grad_scale: 16.0 2023-11-19 19:57:23,889 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=784733.3333333334, ans=0.07 2023-11-19 19:57:25,145 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=784733.3333333334, ans=0.2 2023-11-19 19:57:26,268 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=784733.3333333334, ans=0.0 2023-11-19 19:57:46,849 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=6.31 vs. limit=15.0 2023-11-19 19:57:46,876 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=16.42 vs. 
limit=22.5 2023-11-19 19:57:49,738 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.106e+01 8.346e+01 9.084e+01 9.988e+01 1.421e+02, threshold=1.817e+02, percent-clipped=0.0 2023-11-19 19:57:50,163 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=784866.6666666666, ans=0.125 2023-11-19 19:57:59,655 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=784933.3333333334, ans=0.0 2023-11-19 19:58:06,207 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 117750 2023-11-19 19:58:13,706 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=785000.0, ans=0.125 2023-11-19 19:58:21,916 INFO [train_asr.py:1262] (1/4) Epoch 10, batch 9550, loss[loss=0.1052, simple_loss=0.1298, pruned_loss=0.03152, audio_tagging_loss=0.008778, over 15606.00 frames. ], tot_loss[loss=0.08702, simple_loss=0.1066, pruned_loss=0.02325, audio_tagging_loss=0.01046, over 3042711.33 frames. ], batch size: 55, lr: 6.83e-03, grad_scale: 16.0 2023-11-19 19:58:27,129 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=785066.6666666666, ans=0.125 2023-11-19 19:58:47,444 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=785200.0, ans=0.1 2023-11-19 19:59:01,808 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=785266.6666666666, ans=0.125 2023-11-19 19:59:02,852 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=785266.6666666666, ans=0.125 2023-11-19 19:59:11,457 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 117800 2023-11-19 19:59:26,813 INFO [train_asr.py:1262] (1/4) Epoch 10, batch 9600, loss[loss=0.09268, simple_loss=0.117, pruned_loss=0.02153, audio_tagging_loss=0.01267, over 14951.00 frames. ], tot_loss[loss=0.08712, simple_loss=0.1067, pruned_loss=0.02317, audio_tagging_loss=0.01059, over 3045914.64 frames. ], batch size: 54, lr: 6.83e-03, grad_scale: 32.0 2023-11-19 19:59:57,945 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=785533.3333333334, ans=0.125 2023-11-19 20:00:01,477 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.979e+01 8.524e+01 9.148e+01 9.988e+01 1.418e+02, threshold=1.830e+02, percent-clipped=0.0 2023-11-19 20:00:05,544 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=785600.0, ans=0.0 2023-11-19 20:00:10,894 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.23 vs. 
limit=6.0 2023-11-19 20:00:15,378 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=785600.0, ans=0.125 2023-11-19 20:00:16,401 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 117850 2023-11-19 20:00:20,354 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=785666.6666666666, ans=0.0 2023-11-19 20:00:28,112 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=785666.6666666666, ans=0.125 2023-11-19 20:00:32,106 INFO [train_asr.py:1262] (1/4) Epoch 10, batch 9650, loss[loss=0.07109, simple_loss=0.09245, pruned_loss=0.01315, audio_tagging_loss=0.01171, over 15866.00 frames. ], tot_loss[loss=0.0862, simple_loss=0.1057, pruned_loss=0.0228, audio_tagging_loss=0.01056, over 3051859.09 frames. ], batch size: 61, lr: 6.83e-03, grad_scale: 32.0 2023-11-19 20:00:39,760 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=11.52 vs. limit=15.0 2023-11-19 20:00:48,900 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=10.43 vs. limit=15.0 2023-11-19 20:01:03,530 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=785866.6666666666, ans=0.0 2023-11-19 20:01:13,524 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=9.07 vs. limit=15.0 2023-11-19 20:01:22,108 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 117900 2023-11-19 20:01:22,374 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=785933.3333333334, ans=0.1 2023-11-19 20:01:33,173 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=12.53 vs. limit=15.0 2023-11-19 20:01:34,292 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=786000.0, ans=0.125 2023-11-19 20:01:37,758 INFO [train_asr.py:1262] (1/4) Epoch 10, batch 9700, loss[loss=0.07462, simple_loss=0.09656, pruned_loss=0.01399, audio_tagging_loss=0.01235, over 15218.00 frames. ], tot_loss[loss=0.08652, simple_loss=0.1066, pruned_loss=0.02299, audio_tagging_loss=0.01026, over 3052227.63 frames. ], batch size: 59, lr: 6.83e-03, grad_scale: 32.0 2023-11-19 20:02:11,050 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.874e+01 8.357e+01 9.241e+01 1.009e+02 1.482e+02, threshold=1.848e+02, percent-clipped=0.0 2023-11-19 20:02:26,868 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 117950 2023-11-19 20:02:28,498 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=786333.3333333334, ans=0.0 2023-11-19 20:02:39,595 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=786333.3333333334, ans=0.125 2023-11-19 20:02:41,769 INFO [train_asr.py:1262] (1/4) Epoch 10, batch 9750, loss[loss=0.04376, simple_loss=0.04706, pruned_loss=0.00733, audio_tagging_loss=0.0129, over 14821.00 frames. 
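A note on the loss fields in these train_asr.py lines: with the configured simple_loss_scale of 0.5 and audio_tagging_loss_scale of 1.0, every logged total decomposes as loss = 0.5 * simple_loss + pruned_loss + 1.0 * audio_tagging_loss. This combination rule is inferred from the logged numbers, not quoted from the training code; checking it against the "batch 9750" per-batch entry just above:

# Values copied from the "Epoch 10, batch 9750" entry above.
simple_loss = 0.04706
pruned_loss = 0.00733
audio_tagging_loss = 0.0129

simple_loss_scale = 0.5        # from the config dump at the start of the log
audio_tagging_loss_scale = 1.0

loss = (simple_loss_scale * simple_loss
        + pruned_loss
        + audio_tagging_loss_scale * audio_tagging_loss)
print(f"{loss:.5f}")           # -> 0.04376, the logged loss for this batch

The same identity holds, up to rounding, for the running tot_loss fields (0.5 * 0.1055 + 0.02275 + 0.01027 = 0.08577 vs. the logged 0.08576).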
], tot_loss[loss=0.08576, simple_loss=0.1055, pruned_loss=0.02275, audio_tagging_loss=0.01027, over 3045507.92 frames. ], batch size: 58, lr: 6.82e-03, grad_scale: 32.0 2023-11-19 20:03:11,115 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=786533.3333333334, ans=0.125 2023-11-19 20:03:19,737 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=786600.0, ans=0.125 2023-11-19 20:03:30,744 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 118000 2023-11-19 20:03:45,113 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=786733.3333333334, ans=0.0 2023-11-19 20:03:46,726 INFO [train_asr.py:1262] (1/4) Epoch 10, batch 9800, loss[loss=0.07074, simple_loss=0.08246, pruned_loss=0.01926, audio_tagging_loss=0.01025, over 15831.00 frames. ], tot_loss[loss=0.08523, simple_loss=0.1048, pruned_loss=0.02248, audio_tagging_loss=0.01032, over 3040987.22 frames. ], batch size: 61, lr: 6.82e-03, grad_scale: 32.0 2023-11-19 20:03:48,237 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=786733.3333333334, ans=0.125 2023-11-19 20:03:52,711 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=786733.3333333334, ans=0.0 2023-11-19 20:04:20,501 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.556e+01 8.585e+01 9.415e+01 1.041e+02 1.362e+02, threshold=1.883e+02, percent-clipped=0.0 2023-11-19 20:04:24,445 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=786933.3333333334, ans=0.125 2023-11-19 20:04:25,925 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=786933.3333333334, ans=0.0 2023-11-19 20:04:36,289 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 118050 2023-11-19 20:04:36,538 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=786933.3333333334, ans=0.07 2023-11-19 20:04:36,739 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=2.48 vs. limit=15.0 2023-11-19 20:04:43,118 WARNING [train_asr.py:1506] (1/4) Exclude cut with ID unbalanced/Bo4LcZjitzU_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-19 20:04:52,878 INFO [train_asr.py:1262] (1/4) Epoch 10, batch 9850, loss[loss=0.08661, simple_loss=0.09678, pruned_loss=0.02429, audio_tagging_loss=0.01393, over 14802.00 frames. ], tot_loss[loss=0.08542, simple_loss=0.1051, pruned_loss=0.02264, audio_tagging_loss=0.01023, over 3045890.55 frames. 
], batch size: 58, lr: 6.82e-03, grad_scale: 32.0 2023-11-19 20:05:01,589 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=787066.6666666666, ans=0.1 2023-11-19 20:05:05,373 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=787133.3333333334, ans=0.09899494936611666 2023-11-19 20:05:09,217 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=787133.3333333334, ans=0.2 2023-11-19 20:05:12,857 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=787133.3333333334, ans=0.125 2023-11-19 20:05:14,500 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=7.70 vs. limit=12.0 2023-11-19 20:05:30,693 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=787266.6666666666, ans=0.2 2023-11-19 20:05:33,802 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=787266.6666666666, ans=0.1 2023-11-19 20:05:34,365 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=12.80 vs. limit=15.0 2023-11-19 20:05:35,932 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=787266.6666666666, ans=0.125 2023-11-19 20:05:40,758 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=787266.6666666666, ans=0.0 2023-11-19 20:05:41,640 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 118100 2023-11-19 20:05:52,216 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=22.38 vs. limit=22.5 2023-11-19 20:05:54,389 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=787333.3333333334, ans=0.125 2023-11-19 20:05:56,655 INFO [train_asr.py:1262] (1/4) Epoch 10, batch 9900, loss[loss=0.1046, simple_loss=0.1301, pruned_loss=0.02925, audio_tagging_loss=0.01024, over 16989.00 frames. ], tot_loss[loss=0.08572, simple_loss=0.1056, pruned_loss=0.02271, audio_tagging_loss=0.01021, over 3050787.43 frames. ], batch size: 62, lr: 6.82e-03, grad_scale: 32.0 2023-11-19 20:06:02,304 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=12.16 vs. 
limit=15.0 2023-11-19 20:06:17,596 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.min_abs, batch_count=787466.6666666666, ans=0.5 2023-11-19 20:06:17,685 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=787466.6666666666, ans=0.125 2023-11-19 20:06:30,690 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.320e+01 8.268e+01 9.019e+01 9.736e+01 1.319e+02, threshold=1.804e+02, percent-clipped=0.0 2023-11-19 20:06:45,595 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 118150 2023-11-19 20:06:56,773 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=787666.6666666666, ans=0.125 2023-11-19 20:07:00,287 INFO [train_asr.py:1262] (1/4) Epoch 10, batch 9950, loss[loss=0.07855, simple_loss=0.09927, pruned_loss=0.0209, audio_tagging_loss=0.008011, over 15396.00 frames. ], tot_loss[loss=0.08521, simple_loss=0.1049, pruned_loss=0.02248, audio_tagging_loss=0.01028, over 3055081.97 frames. ], batch size: 59, lr: 6.82e-03, grad_scale: 32.0 2023-11-19 20:07:26,744 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.min_positive, batch_count=787866.6666666666, ans=0.05 2023-11-19 20:07:48,745 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 118200 2023-11-19 20:08:05,800 INFO [train_asr.py:1262] (1/4) Epoch 10, batch 10000, loss[loss=0.08247, simple_loss=0.09927, pruned_loss=0.02064, audio_tagging_loss=0.01219, over 14936.00 frames. ], tot_loss[loss=0.08449, simple_loss=0.1039, pruned_loss=0.02219, audio_tagging_loss=0.01037, over 3051242.27 frames. ], batch size: 56, lr: 6.82e-03, grad_scale: 32.0 2023-11-19 20:08:34,458 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.min_positive, batch_count=788200.0, ans=0.025 2023-11-19 20:08:36,939 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=788200.0, ans=0.0 2023-11-19 20:08:37,910 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.424e+01 7.892e+01 8.582e+01 9.307e+01 3.708e+02, threshold=1.716e+02, percent-clipped=1.0 2023-11-19 20:08:39,358 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=788200.0, ans=0.1 2023-11-19 20:08:48,737 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=788266.6666666666, ans=0.125 2023-11-19 20:08:54,209 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 118250 2023-11-19 20:08:57,729 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=10.96 vs. limit=22.5 2023-11-19 20:08:57,833 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.32 vs. limit=6.0 2023-11-19 20:08:58,027 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=10.90 vs. 
limit=15.0 2023-11-19 20:09:02,514 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-19 20:09:09,681 INFO [train_asr.py:1262] (1/4) Epoch 10, batch 10050, loss[loss=0.063, simple_loss=0.07677, pruned_loss=0.01244, audio_tagging_loss=0.01217, over 15657.00 frames. ], tot_loss[loss=0.08471, simple_loss=0.1041, pruned_loss=0.02228, audio_tagging_loss=0.01037, over 3055735.85 frames. ], batch size: 61, lr: 6.82e-03, grad_scale: 32.0 2023-11-19 20:09:32,081 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=7.42 vs. limit=15.0 2023-11-19 20:09:36,564 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-19 20:09:46,023 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=788533.3333333334, ans=0.125 2023-11-19 20:09:50,248 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=788600.0, ans=0.2 2023-11-19 20:09:58,747 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 118300 2023-11-19 20:10:13,505 INFO [train_asr.py:1262] (1/4) Epoch 10, batch 10100, loss[loss=0.06183, simple_loss=0.08054, pruned_loss=0.01193, audio_tagging_loss=0.009631, over 16002.00 frames. ], tot_loss[loss=0.08376, simple_loss=0.1029, pruned_loss=0.02188, audio_tagging_loss=0.01043, over 3051255.53 frames. ], batch size: 59, lr: 6.81e-03, grad_scale: 32.0 2023-11-19 20:10:16,814 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=9.61 vs. limit=15.0 2023-11-19 20:10:29,870 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=788800.0, ans=0.125 2023-11-19 20:10:47,500 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.102e+01 8.462e+01 9.399e+01 1.049e+02 1.408e+02, threshold=1.880e+02, percent-clipped=0.0 2023-11-19 20:11:02,488 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 118350 2023-11-19 20:11:03,600 WARNING [train_asr.py:1506] (1/4) Exclude cut with ID unbalanced/_eq1Ry0UZGU_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-19 20:11:15,380 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=789000.0, ans=0.1 2023-11-19 20:11:18,272 INFO [train_asr.py:1262] (1/4) Epoch 10, batch 10150, loss[loss=0.09198, simple_loss=0.1192, pruned_loss=0.02561, audio_tagging_loss=0.006784, over 14459.00 frames. ], tot_loss[loss=0.08407, simple_loss=0.1032, pruned_loss=0.02204, audio_tagging_loss=0.01041, over 3050032.94 frames. 
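The very frequent ScheduledFloat lines report hyperparameters (dropout probabilities, skip rates, balancer probabilities) that are annealed as a function of a batch counter rather than fixed; ans is the value in effect at the logged batch_count. Note the counter is not the raw batch index: it advances by about 6.67 per training batch (e.g. batch idx 118300 alongside batch_count near 788600 above), consistent with batch_idx_train scaled by world_size * max_duration / ref_duration = 4 * 1000 / 600. A piecewise-linear scheduler of this kind might look like the sketch below; the real scaling.py implementation differs in details.

from bisect import bisect_right

class ScheduledFloatSketch:
    """Piecewise-linear float schedule keyed on a global batch count.

    E.g. ScheduledFloatSketch((0.0, 0.3), (20000.0, 0.1)) decays 0.3 -> 0.1
    over the first 20k (adjusted) batches, then stays at 0.1.
    """

    def __init__(self, *points):
        self.xs = [p[0] for p in points]
        self.ys = [p[1] for p in points]
        self.batch_count = 0.0  # updated by the training loop

    def value(self) -> float:
        x = self.batch_count
        if x <= self.xs[0]:
            return self.ys[0]
        if x >= self.xs[-1]:
            return self.ys[-1]
        i = bisect_right(self.xs, x)
        x0, x1 = self.xs[i - 1], self.xs[i]
        y0, y1 = self.ys[i - 1], self.ys[i]
        return y0 + (y1 - y0) * (x - x0) / (x1 - x0)

sched = ScheduledFloatSketch((0.0, 0.3), (20000.0, 0.1))
sched.batch_count = 788600.0   # far past the last breakpoint
print(sched.value())           # 0.1, analogous to the ans=... values above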
], batch size: 54, lr: 6.81e-03, grad_scale: 32.0 2023-11-19 20:11:35,802 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=789133.3333333334, ans=0.1 2023-11-19 20:11:36,111 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=10.53 vs. limit=15.0 2023-11-19 20:11:46,491 WARNING [train_asr.py:1506] (1/4) Exclude cut with ID unbalanced/cw-21cbk02A_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-19 20:12:07,326 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 118400 2023-11-19 20:12:10,486 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=789333.3333333334, ans=0.125 2023-11-19 20:12:17,961 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=789333.3333333334, ans=0.1 2023-11-19 20:12:21,060 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=789333.3333333334, ans=0.2 2023-11-19 20:12:23,156 INFO [train_asr.py:1262] (1/4) Epoch 10, batch 10200, loss[loss=0.07597, simple_loss=0.08696, pruned_loss=0.02077, audio_tagging_loss=0.01172, over 15674.00 frames. ], tot_loss[loss=0.084, simple_loss=0.1032, pruned_loss=0.02198, audio_tagging_loss=0.01041, over 3049901.90 frames. ], batch size: 61, lr: 6.81e-03, grad_scale: 32.0 2023-11-19 20:12:27,406 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=2.502e-03 2023-11-19 20:12:34,553 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.min_abs, batch_count=789466.6666666666, ans=0.5 2023-11-19 20:12:44,401 WARNING [train_asr.py:1506] (1/4) Exclude cut with ID unbalanced/hOT6Yokob90_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-19 20:12:50,200 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=789533.3333333334, ans=0.2 2023-11-19 20:12:57,883 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.646e+01 8.394e+01 8.968e+01 9.888e+01 1.322e+02, threshold=1.794e+02, percent-clipped=0.0 2023-11-19 20:13:12,253 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 118450 2023-11-19 20:13:27,000 INFO [train_asr.py:1262] (1/4) Epoch 10, batch 10250, loss[loss=0.08565, simple_loss=0.09872, pruned_loss=0.02254, audio_tagging_loss=0.01375, over 15863.00 frames. ], tot_loss[loss=0.08405, simple_loss=0.1032, pruned_loss=0.02194, audio_tagging_loss=0.0105, over 3048415.65 frames. 
], batch size: 57, lr: 6.81e-03, grad_scale: 16.0 2023-11-19 20:13:28,491 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=789733.3333333334, ans=0.125 2023-11-19 20:13:30,842 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=789733.3333333334, ans=0.125 2023-11-19 20:14:06,744 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=789933.3333333334, ans=0.0 2023-11-19 20:14:09,721 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=7.73 vs. limit=12.0 2023-11-19 20:14:16,516 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 118500 2023-11-19 20:14:19,303 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=790000.0, ans=0.2 2023-11-19 20:14:31,737 INFO [train_asr.py:1262] (1/4) Epoch 10, batch 10300, loss[loss=0.06571, simple_loss=0.06755, pruned_loss=0.01833, audio_tagging_loss=0.0136, over 15688.00 frames. ], tot_loss[loss=0.08412, simple_loss=0.1032, pruned_loss=0.02198, audio_tagging_loss=0.01052, over 3050646.18 frames. ], batch size: 62, lr: 6.81e-03, grad_scale: 16.0 2023-11-19 20:14:39,369 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=790066.6666666666, ans=0.0 2023-11-19 20:14:41,873 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=790066.6666666666, ans=0.2 2023-11-19 20:15:06,833 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.556e+01 8.163e+01 8.831e+01 9.880e+01 1.200e+02, threshold=1.766e+02, percent-clipped=0.0 2023-11-19 20:15:21,128 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 118550 2023-11-19 20:15:37,163 INFO [train_asr.py:1262] (1/4) Epoch 10, batch 10350, loss[loss=0.08186, simple_loss=0.1024, pruned_loss=0.01852, audio_tagging_loss=0.01213, over 15384.00 frames. ], tot_loss[loss=0.08513, simple_loss=0.1045, pruned_loss=0.02226, audio_tagging_loss=0.01063, over 3046314.78 frames. ], batch size: 56, lr: 6.81e-03, grad_scale: 16.0 2023-11-19 20:15:46,722 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=790400.0, ans=0.125 2023-11-19 20:16:20,004 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=790600.0, ans=0.1 2023-11-19 20:16:25,913 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 118600 2023-11-19 20:16:31,569 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=790666.6666666666, ans=0.125 2023-11-19 20:16:41,676 INFO [train_asr.py:1262] (1/4) Epoch 10, batch 10400, loss[loss=0.08317, simple_loss=0.1041, pruned_loss=0.02159, audio_tagging_loss=0.009531, over 15253.00 frames. ], tot_loss[loss=0.08499, simple_loss=0.1044, pruned_loss=0.02213, audio_tagging_loss=0.01064, over 3047667.40 frames. 
], batch size: 59, lr: 6.81e-03, grad_scale: 32.0 2023-11-19 20:16:45,672 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=790733.3333333334, ans=0.0 2023-11-19 20:16:52,623 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-11-19 20:16:56,291 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=790800.0, ans=0.125 2023-11-19 20:17:06,355 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=790800.0, ans=0.1 2023-11-19 20:17:17,260 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.085e+01 8.510e+01 9.248e+01 1.057e+02 2.087e+02, threshold=1.850e+02, percent-clipped=1.0 2023-11-19 20:17:26,993 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=790933.3333333334, ans=0.125 2023-11-19 20:17:31,915 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 118650 2023-11-19 20:17:42,551 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=791000.0, ans=0.125 2023-11-19 20:17:43,833 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=791000.0, ans=0.0 2023-11-19 20:17:46,446 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=791066.6666666666, ans=0.125 2023-11-19 20:17:47,270 INFO [train_asr.py:1262] (1/4) Epoch 10, batch 10450, loss[loss=0.09635, simple_loss=0.1247, pruned_loss=0.02577, audio_tagging_loss=0.008251, over 16389.00 frames. ], tot_loss[loss=0.08545, simple_loss=0.1046, pruned_loss=0.02258, audio_tagging_loss=0.01059, over 3047566.09 frames. ], batch size: 62, lr: 6.80e-03, grad_scale: 16.0 2023-11-19 20:18:19,041 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=791200.0, ans=0.0 2023-11-19 20:18:21,433 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=791200.0, ans=0.2 2023-11-19 20:18:36,717 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 118700 2023-11-19 20:18:42,336 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=791333.3333333334, ans=0.125 2023-11-19 20:18:52,550 INFO [train_asr.py:1262] (1/4) Epoch 10, batch 10500, loss[loss=0.09244, simple_loss=0.1153, pruned_loss=0.02696, audio_tagging_loss=0.00782, over 15765.00 frames. ], tot_loss[loss=0.08516, simple_loss=0.1046, pruned_loss=0.0224, audio_tagging_loss=0.01044, over 3045961.43 frames. 
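The optim.py lines summarize ScaledAdam's gradient-norm clipping: over each update period the optimizer records per-batch gradient norms and logs their min/25%/50%/75%/max. The threshold evidently tracks Clipping_scale times the running median (in the entry just above, 2.0 * 9.248e+01 = 1.850e+02), and percent-clipped is the share of batches whose norm exceeded it (here 1.0%, which fits the period's max of 2.087e+02 being over threshold). A self-contained sketch of that bookkeeping, under those reverse-engineered assumptions:

import numpy as np

def clipping_stats(grad_norms, clipping_scale=2.0):
    """Summarize one update period of gradient norms.

    Assumed behaviour (inferred from the log, not copied from icefall's
    optim.py): threshold = clipping_scale * median norm, and
    percent-clipped = share of batches whose norm exceeded the threshold.
    """
    norms = np.asarray(grad_norms, dtype=np.float64)
    quartiles = np.quantile(norms, [0.0, 0.25, 0.5, 0.75, 1.0])
    threshold = clipping_scale * quartiles[2]
    percent_clipped = 100.0 * np.mean(norms > threshold)
    print(
        "Clipping_scale=%.1f, grad-norm quartiles %s, "
        "threshold=%.3e, percent-clipped=%.1f"
        % (clipping_scale,
           " ".join(f"{q:.3e}" for q in quartiles),
           threshold,
           percent_clipped)
    )
    return threshold

# A gradient whose norm exceeds the returned threshold would then be
# rescaled by threshold / norm before the optimizer step.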
], batch size: 58, lr: 6.80e-03, grad_scale: 16.0 2023-11-19 20:18:52,790 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=791400.0, ans=0.2 2023-11-19 20:18:52,807 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=791400.0, ans=0.0 2023-11-19 20:18:58,807 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=791400.0, ans=0.125 2023-11-19 20:19:01,344 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=791400.0, ans=0.125 2023-11-19 20:19:04,052 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=791466.6666666666, ans=0.1 2023-11-19 20:19:28,548 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.886e+01 8.049e+01 8.549e+01 9.577e+01 1.136e+02, threshold=1.710e+02, percent-clipped=0.0 2023-11-19 20:19:30,219 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.max_abs, batch_count=791600.0, ans=10.0 2023-11-19 20:19:42,029 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 118750 2023-11-19 20:19:56,824 INFO [train_asr.py:1262] (1/4) Epoch 10, batch 10550, loss[loss=0.1123, simple_loss=0.1412, pruned_loss=0.03334, audio_tagging_loss=0.008336, over 15217.00 frames. ], tot_loss[loss=0.08527, simple_loss=0.1048, pruned_loss=0.02257, audio_tagging_loss=0.01031, over 3045731.12 frames. ], batch size: 55, lr: 6.80e-03, grad_scale: 16.0 2023-11-19 20:19:59,042 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=791733.3333333334, ans=0.2 2023-11-19 20:20:18,391 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=9.75 vs. limit=15.0 2023-11-19 20:20:46,392 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 118800 2023-11-19 20:20:57,115 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=792000.0, ans=0.125 2023-11-19 20:20:57,617 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=4.32 vs. limit=15.0 2023-11-19 20:21:01,985 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=7.83 vs. limit=12.0 2023-11-19 20:21:02,566 INFO [train_asr.py:1262] (1/4) Epoch 10, batch 10600, loss[loss=0.08963, simple_loss=0.113, pruned_loss=0.02217, audio_tagging_loss=0.01098, over 14767.00 frames. ], tot_loss[loss=0.0846, simple_loss=0.1039, pruned_loss=0.02226, audio_tagging_loss=0.01039, over 3039958.71 frames. ], batch size: 55, lr: 6.80e-03, grad_scale: 16.0 2023-11-19 20:21:06,514 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=792066.6666666666, ans=0.125 2023-11-19 20:21:09,571 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.10 vs. 
limit=6.0 2023-11-19 20:21:12,673 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=792066.6666666666, ans=0.2 2023-11-19 20:21:16,568 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=792133.3333333334, ans=0.0 2023-11-19 20:21:23,871 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=792133.3333333334, ans=0.125 2023-11-19 20:21:24,991 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=792133.3333333334, ans=0.125 2023-11-19 20:21:27,623 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=792200.0, ans=0.125 2023-11-19 20:21:38,215 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.354e+01 8.305e+01 8.737e+01 9.475e+01 2.195e+02, threshold=1.747e+02, percent-clipped=1.0 2023-11-19 20:21:51,983 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 118850 2023-11-19 20:21:57,271 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=3.69 vs. limit=15.0 2023-11-19 20:22:07,729 INFO [train_asr.py:1262] (1/4) Epoch 10, batch 10650, loss[loss=0.09676, simple_loss=0.1115, pruned_loss=0.03209, audio_tagging_loss=0.00894, over 16126.00 frames. ], tot_loss[loss=0.0848, simple_loss=0.1042, pruned_loss=0.0224, audio_tagging_loss=0.01028, over 3040948.67 frames. ], batch size: 59, lr: 6.80e-03, grad_scale: 16.0 2023-11-19 20:22:15,377 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=792400.0, ans=0.125 2023-11-19 20:22:21,324 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=792466.6666666666, ans=0.125 2023-11-19 20:22:25,085 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=792466.6666666666, ans=0.2 2023-11-19 20:22:45,041 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=792600.0, ans=0.1 2023-11-19 20:22:45,080 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=792600.0, ans=0.125 2023-11-19 20:22:52,361 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=792600.0, ans=0.04949747468305833 2023-11-19 20:22:52,372 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=792600.0, ans=0.125 2023-11-19 20:22:55,527 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.whiten, num_groups=1, num_channels=192, metric=3.43 vs. limit=12.0 2023-11-19 20:22:57,327 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 118900 2023-11-19 20:23:12,221 INFO [train_asr.py:1262] (1/4) Epoch 10, batch 10700, loss[loss=0.09098, simple_loss=0.1265, pruned_loss=0.02154, audio_tagging_loss=0.006183, over 15596.00 frames. ], tot_loss[loss=0.08574, simple_loss=0.1055, pruned_loss=0.02278, audio_tagging_loss=0.01022, over 3044284.67 frames. 
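Many of the ScheduledFloat names above end in balancer*.prob: these are the application probabilities of Balancer modules, which occasionally nudge gradients so that per-channel activation statistics (fraction of positive values, mean absolute value) stay inside configured bounds, rather than constraining the forward pass itself. The following is a deliberately simplified, illustrative stand-in, not the scaling.py implementation:

import torch

class BalancerSketch(torch.nn.Module):
    """Illustrative stand-in for a Balancer.

    With probability `prob` per forward pass (the scheduled value the log
    reports as ans=...), it adds a small per-channel gradient term pushing
    the fraction of positive activations back into
    [min_positive, max_positive]. Simplified; signs only, no real tuning.
    """
    def __init__(self, min_positive=0.05, max_positive=0.95,
                 prob=0.125, grad_scale=0.01):
        super().__init__()
        self.min_positive = min_positive
        self.max_positive = max_positive
        self.prob = prob
        self.grad_scale = grad_scale

    def forward(self, x):
        if not (self.training and torch.rand(()) < self.prob):
            return x
        def hook(grad):
            # fraction of positive values per channel (last dim)
            pos = (x > 0).float().mean(dim=tuple(range(x.ndim - 1)))
            too_neg = (pos < self.min_positive).float()  # want more positives
            too_pos = (pos > self.max_positive).float()  # want fewer
            push = (too_pos - too_neg) * self.grad_scale * grad.abs().mean()
            return grad + push  # descent then moves x down/up per channel
        if x.requires_grad:
            x.register_hook(hook)
        return x

balancer = BalancerSketch(prob=0.125)  # cf. ans=0.125 in the lines above
y = balancer(torch.randn(8, 256, requires_grad=True))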
], batch size: 59, lr: 6.80e-03, grad_scale: 16.0 2023-11-19 20:23:25,251 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=792800.0, ans=0.125 2023-11-19 20:23:30,500 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=792800.0, ans=0.2 2023-11-19 20:23:32,225 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=10.46 vs. limit=15.0 2023-11-19 20:23:48,371 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.590e+01 8.344e+01 9.124e+01 9.874e+01 1.194e+02, threshold=1.825e+02, percent-clipped=0.0 2023-11-19 20:23:55,049 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=3.22 vs. limit=15.0 2023-11-19 20:24:00,569 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 118950 2023-11-19 20:24:00,870 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=792933.3333333334, ans=0.125 2023-11-19 20:24:07,153 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=793000.0, ans=0.07 2023-11-19 20:24:10,698 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=793000.0, ans=0.125 2023-11-19 20:24:16,642 INFO [train_asr.py:1262] (1/4) Epoch 10, batch 10750, loss[loss=0.08551, simple_loss=0.1097, pruned_loss=0.01989, audio_tagging_loss=0.01078, over 14258.00 frames. ], tot_loss[loss=0.08582, simple_loss=0.1057, pruned_loss=0.02278, audio_tagging_loss=0.01021, over 3044952.30 frames. ], batch size: 55, lr: 6.80e-03, grad_scale: 16.0 2023-11-19 20:24:19,390 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=793066.6666666666, ans=0.0 2023-11-19 20:24:22,966 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=793066.6666666666, ans=0.125 2023-11-19 20:24:43,784 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=793200.0, ans=0.125 2023-11-19 20:25:04,921 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 119000 2023-11-19 20:25:18,860 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=793333.3333333334, ans=0.0 2023-11-19 20:25:19,392 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=13.02 vs. limit=15.0 2023-11-19 20:25:21,035 INFO [train_asr.py:1262] (1/4) Epoch 10, batch 10800, loss[loss=0.08015, simple_loss=0.09539, pruned_loss=0.02498, audio_tagging_loss=0.007483, over 15023.00 frames. ], tot_loss[loss=0.08564, simple_loss=0.1055, pruned_loss=0.02269, audio_tagging_loss=0.01018, over 3043261.84 frames. 
], batch size: 57, lr: 6.79e-03, grad_scale: 32.0 2023-11-19 20:25:21,269 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=793400.0, ans=0.2 2023-11-19 20:25:37,363 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=793466.6666666666, ans=0.0 2023-11-19 20:25:43,552 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=793466.6666666666, ans=0.125 2023-11-19 20:25:45,829 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=793533.3333333334, ans=0.035 2023-11-19 20:25:47,102 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=793533.3333333334, ans=0.125 2023-11-19 20:25:56,698 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.855e+01 8.324e+01 9.413e+01 1.037e+02 1.353e+02, threshold=1.883e+02, percent-clipped=0.0 2023-11-19 20:26:09,187 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=793600.0, ans=0.09899494936611666 2023-11-19 20:26:10,132 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 119050 2023-11-19 20:26:24,893 INFO [train_asr.py:1262] (1/4) Epoch 10, batch 10850, loss[loss=0.09479, simple_loss=0.1172, pruned_loss=0.02524, audio_tagging_loss=0.01095, over 14888.00 frames. ], tot_loss[loss=0.08494, simple_loss=0.1043, pruned_loss=0.0225, audio_tagging_loss=0.01027, over 3038508.55 frames. ], batch size: 56, lr: 6.79e-03, grad_scale: 32.0 2023-11-19 20:26:25,194 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-11-19 20:26:43,184 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=8.57 vs. limit=15.0 2023-11-19 20:26:49,391 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=793866.6666666666, ans=0.125 2023-11-19 20:27:03,119 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=6.61 vs. limit=10.0 2023-11-19 20:27:13,722 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 119100 2023-11-19 20:27:21,436 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.min_abs, batch_count=794000.0, ans=0.5 2023-11-19 20:27:22,332 WARNING [train_asr.py:1506] (1/4) Exclude cut with ID unbalanced/XMxq2pgttuY_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-19 20:27:28,418 INFO [train_asr.py:1262] (1/4) Epoch 10, batch 10900, loss[loss=0.0654, simple_loss=0.07936, pruned_loss=0.01612, audio_tagging_loss=0.009599, over 15066.00 frames. ], tot_loss[loss=0.08507, simple_loss=0.1045, pruned_loss=0.02249, audio_tagging_loss=0.01031, over 3039506.73 frames. 
], batch size: 56, lr: 6.79e-03, grad_scale: 16.0 2023-11-19 20:27:44,224 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=794133.3333333334, ans=0.0 2023-11-19 20:27:45,433 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=794133.3333333334, ans=0.0 2023-11-19 20:27:50,518 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=794133.3333333334, ans=0.125 2023-11-19 20:28:05,858 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.516e+01 8.198e+01 8.697e+01 9.317e+01 1.364e+02, threshold=1.739e+02, percent-clipped=0.0 2023-11-19 20:28:07,477 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=794266.6666666666, ans=0.2 2023-11-19 20:28:16,971 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 119150 2023-11-19 20:28:18,565 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.min_positive, batch_count=794333.3333333334, ans=0.05 2023-11-19 20:28:25,418 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=9.81 vs. limit=15.0 2023-11-19 20:28:33,881 INFO [train_asr.py:1262] (1/4) Epoch 10, batch 10950, loss[loss=0.09208, simple_loss=0.1181, pruned_loss=0.02356, audio_tagging_loss=0.009486, over 16459.00 frames. ], tot_loss[loss=0.08523, simple_loss=0.1047, pruned_loss=0.02257, audio_tagging_loss=0.01031, over 3040380.90 frames. ], batch size: 60, lr: 6.79e-03, grad_scale: 16.0 2023-11-19 20:28:35,648 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=4.25 vs. limit=10.0 2023-11-19 20:29:03,690 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=794533.3333333334, ans=0.1 2023-11-19 20:29:18,475 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=794600.0, ans=0.0 2023-11-19 20:29:23,128 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 119200 2023-11-19 20:29:30,847 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=794666.6666666666, ans=0.125 2023-11-19 20:29:37,870 INFO [train_asr.py:1262] (1/4) Epoch 10, batch 11000, loss[loss=0.0822, simple_loss=0.1079, pruned_loss=0.01682, audio_tagging_loss=0.01142, over 16440.00 frames. ], tot_loss[loss=0.08463, simple_loss=0.1039, pruned_loss=0.02225, audio_tagging_loss=0.01041, over 3042031.13 frames. ], batch size: 59, lr: 6.79e-03, grad_scale: 16.0 2023-11-19 20:29:45,377 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.max_positive, batch_count=794733.3333333334, ans=0.95 2023-11-19 20:29:45,533 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=794733.3333333334, ans=0.2 2023-11-19 20:29:46,434 WARNING [train_asr.py:1506] (1/4) Exclude cut with ID unbalanced/h6R5rMXN6pY_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. 
Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-19 20:30:15,771 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.738e+01 8.172e+01 9.020e+01 9.721e+01 1.365e+02, threshold=1.804e+02, percent-clipped=0.0 2023-11-19 20:30:16,161 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=794933.3333333334, ans=0.125 2023-11-19 20:30:21,219 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=794933.3333333334, ans=0.125 2023-11-19 20:30:21,317 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.02 vs. limit=6.0 2023-11-19 20:30:27,046 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 119250 2023-11-19 20:30:41,674 INFO [train_asr.py:1262] (1/4) Epoch 10, batch 11050, loss[loss=0.1037, simple_loss=0.1388, pruned_loss=0.02545, audio_tagging_loss=0.008831, over 15647.00 frames. ], tot_loss[loss=0.08441, simple_loss=0.1034, pruned_loss=0.02213, audio_tagging_loss=0.01058, over 3039638.70 frames. ], batch size: 55, lr: 6.79e-03, grad_scale: 16.0 2023-11-19 20:30:45,517 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=795066.6666666666, ans=0.1 2023-11-19 20:30:53,240 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=795133.3333333334, ans=0.125 2023-11-19 20:31:11,787 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=795200.0, ans=0.0 2023-11-19 20:31:18,960 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=795266.6666666666, ans=0.035 2023-11-19 20:31:29,781 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 119300 2023-11-19 20:31:42,632 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=795333.3333333334, ans=0.125 2023-11-19 20:31:44,855 INFO [train_asr.py:1262] (1/4) Epoch 10, batch 11100, loss[loss=0.09567, simple_loss=0.1159, pruned_loss=0.0282, audio_tagging_loss=0.009533, over 14335.00 frames. ], tot_loss[loss=0.08513, simple_loss=0.1042, pruned_loss=0.02237, audio_tagging_loss=0.01069, over 3042583.20 frames. 
], batch size: 54, lr: 6.79e-03, grad_scale: 16.0 2023-11-19 20:32:01,997 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=795466.6666666666, ans=0.1 2023-11-19 20:32:21,492 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.097e+01 8.553e+01 9.407e+01 1.079e+02 1.400e+02, threshold=1.881e+02, percent-clipped=0.0 2023-11-19 20:32:23,648 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=795600.0, ans=0.125 2023-11-19 20:32:33,125 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 119350 2023-11-19 20:32:48,455 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=795733.3333333334, ans=0.125 2023-11-19 20:32:49,463 INFO [train_asr.py:1262] (1/4) Epoch 10, batch 11150, loss[loss=0.08411, simple_loss=0.1075, pruned_loss=0.02101, audio_tagging_loss=0.00936, over 14493.00 frames. ], tot_loss[loss=0.08504, simple_loss=0.104, pruned_loss=0.02227, audio_tagging_loss=0.01076, over 3042954.46 frames. ], batch size: 54, lr: 6.78e-03, grad_scale: 16.0 2023-11-19 20:32:50,790 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=795733.3333333334, ans=0.1 2023-11-19 20:33:10,153 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=795800.0, ans=0.125 2023-11-19 20:33:24,978 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=795866.6666666666, ans=0.2 2023-11-19 20:33:37,732 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 119400 2023-11-19 20:33:52,392 INFO [train_asr.py:1262] (1/4) Epoch 10, batch 11200, loss[loss=0.0985, simple_loss=0.1164, pruned_loss=0.0271, audio_tagging_loss=0.01321, over 15050.00 frames. ], tot_loss[loss=0.08515, simple_loss=0.1042, pruned_loss=0.02224, audio_tagging_loss=0.01083, over 3042020.33 frames. ], batch size: 55, lr: 6.78e-03, grad_scale: 32.0 2023-11-19 20:33:53,960 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=796066.6666666666, ans=0.2 2023-11-19 20:34:15,378 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=796133.3333333334, ans=0.125 2023-11-19 20:34:22,240 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=796200.0, ans=0.125 2023-11-19 20:34:27,132 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=796200.0, ans=0.1 2023-11-19 20:34:28,354 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=796200.0, ans=0.125 2023-11-19 20:34:30,416 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.715e+01 8.063e+01 8.813e+01 9.513e+01 1.484e+02, threshold=1.763e+02, percent-clipped=0.0 2023-11-19 20:34:41,316 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 119450 2023-11-19 20:34:56,398 INFO [train_asr.py:1262] (1/4) Epoch 10, batch 11250, loss[loss=0.05397, simple_loss=0.05892, pruned_loss=0.01262, audio_tagging_loss=0.01189, over 15128.00 frames. 
], tot_loss[loss=0.08459, simple_loss=0.1032, pruned_loss=0.02213, audio_tagging_loss=0.01086, over 3040484.76 frames. ], batch size: 59, lr: 6.78e-03, grad_scale: 32.0 2023-11-19 20:35:06,356 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=796400.0, ans=0.125 2023-11-19 20:35:13,512 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.whiten, num_groups=1, num_channels=512, metric=10.76 vs. limit=12.0 2023-11-19 20:35:41,827 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=796600.0, ans=0.125 2023-11-19 20:35:42,929 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.min_positive, batch_count=796600.0, ans=0.05 2023-11-19 20:35:45,352 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 119500 2023-11-19 20:35:56,312 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=796666.6666666666, ans=0.125 2023-11-19 20:36:01,683 INFO [train_asr.py:1262] (1/4) Epoch 10, batch 11300, loss[loss=0.07004, simple_loss=0.08872, pruned_loss=0.01576, audio_tagging_loss=0.009917, over 15243.00 frames. ], tot_loss[loss=0.08472, simple_loss=0.1036, pruned_loss=0.02228, audio_tagging_loss=0.01062, over 3037592.65 frames. ], batch size: 57, lr: 6.78e-03, grad_scale: 32.0 2023-11-19 20:36:28,764 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=10.09 vs. limit=15.0 2023-11-19 20:36:33,609 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=7.04 vs. limit=15.0 2023-11-19 20:36:38,569 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.085e+01 8.176e+01 8.864e+01 9.663e+01 1.255e+02, threshold=1.773e+02, percent-clipped=0.0 2023-11-19 20:36:50,729 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 119550 2023-11-19 20:36:57,094 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=797000.0, ans=0.0 2023-11-19 20:36:58,226 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=797000.0, ans=0.0 2023-11-19 20:37:05,284 INFO [train_asr.py:1262] (1/4) Epoch 10, batch 11350, loss[loss=0.07563, simple_loss=0.09311, pruned_loss=0.02062, audio_tagging_loss=0.008459, over 15111.00 frames. ], tot_loss[loss=0.08494, simple_loss=0.1044, pruned_loss=0.02229, audio_tagging_loss=0.01046, over 3044281.33 frames. ], batch size: 56, lr: 6.78e-03, grad_scale: 32.0 2023-11-19 20:37:27,232 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=797133.3333333334, ans=0.125 2023-11-19 20:37:54,003 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 119600 2023-11-19 20:38:08,551 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=797400.0, ans=0.0 2023-11-19 20:38:09,491 INFO [train_asr.py:1262] (1/4) Epoch 10, batch 11400, loss[loss=0.06664, simple_loss=0.07921, pruned_loss=0.01417, audio_tagging_loss=0.01287, over 15751.00 frames. 
], tot_loss[loss=0.08401, simple_loss=0.1031, pruned_loss=0.02206, audio_tagging_loss=0.01038, over 3036379.09 frames. ], batch size: 63, lr: 6.78e-03, grad_scale: 32.0 2023-11-19 20:38:30,914 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=797466.6666666666, ans=0.125 2023-11-19 20:38:34,876 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.02 vs. limit=6.0 2023-11-19 20:38:39,482 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=797533.3333333334, ans=0.125 2023-11-19 20:38:46,566 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.427e+01 8.335e+01 9.100e+01 1.007e+02 1.269e+02, threshold=1.820e+02, percent-clipped=0.0 2023-11-19 20:38:56,473 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=512, metric=20.78 vs. limit=22.5 2023-11-19 20:38:58,338 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 119650 2023-11-19 20:39:14,311 INFO [train_asr.py:1262] (1/4) Epoch 10, batch 11450, loss[loss=0.1009, simple_loss=0.1128, pruned_loss=0.03269, audio_tagging_loss=0.01183, over 15013.00 frames. ], tot_loss[loss=0.08453, simple_loss=0.1034, pruned_loss=0.02241, audio_tagging_loss=0.01043, over 3038566.71 frames. ], batch size: 58, lr: 6.78e-03, grad_scale: 32.0 2023-11-19 20:39:23,489 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=797733.3333333334, ans=0.1 2023-11-19 20:39:48,644 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=797866.6666666666, ans=0.07 2023-11-19 20:39:54,893 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=797933.3333333334, ans=0.04949747468305833 2023-11-19 20:40:03,128 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 119700 2023-11-19 20:40:11,129 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=798000.0, ans=0.125 2023-11-19 20:40:15,975 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=798000.0, ans=0.125 2023-11-19 20:40:18,267 INFO [train_asr.py:1262] (1/4) Epoch 10, batch 11500, loss[loss=0.1212, simple_loss=0.1459, pruned_loss=0.04027, audio_tagging_loss=0.007976, over 15014.00 frames. ], tot_loss[loss=0.08495, simple_loss=0.1041, pruned_loss=0.02255, audio_tagging_loss=0.01034, over 3038982.89 frames. ], batch size: 56, lr: 6.77e-03, grad_scale: 32.0 2023-11-19 20:40:19,772 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=798066.6666666666, ans=0.125 2023-11-19 20:40:53,945 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=16.91 vs. 
limit=22.5 2023-11-19 20:40:55,529 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.898e+01 8.252e+01 9.046e+01 9.695e+01 1.384e+02, threshold=1.809e+02, percent-clipped=0.0 2023-11-19 20:41:07,301 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 119750 2023-11-19 20:41:08,603 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=798333.3333333334, ans=0.1 2023-11-19 20:41:16,098 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=798333.3333333334, ans=0.2 2023-11-19 20:41:20,372 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=798333.3333333334, ans=0.2 2023-11-19 20:41:22,502 INFO [train_asr.py:1262] (1/4) Epoch 10, batch 11550, loss[loss=0.07816, simple_loss=0.08871, pruned_loss=0.02237, audio_tagging_loss=0.01143, over 15289.00 frames. ], tot_loss[loss=0.08546, simple_loss=0.105, pruned_loss=0.02274, audio_tagging_loss=0.01022, over 3043470.72 frames. ], batch size: 58, lr: 6.77e-03, grad_scale: 32.0 2023-11-19 20:41:57,486 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=798533.3333333334, ans=0.125 2023-11-19 20:41:58,359 WARNING [train_asr.py:1506] (1/4) Exclude cut with ID unbalanced/NeYOsnhOi4k_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-19 20:42:11,172 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 119800 2023-11-19 20:42:15,465 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=798666.6666666666, ans=0.125 2023-11-19 20:42:27,331 INFO [train_asr.py:1262] (1/4) Epoch 10, batch 11600, loss[loss=0.08978, simple_loss=0.1148, pruned_loss=0.02301, audio_tagging_loss=0.009384, over 15501.00 frames. ], tot_loss[loss=0.08571, simple_loss=0.1055, pruned_loss=0.02275, audio_tagging_loss=0.01021, over 3044829.42 frames. ], batch size: 58, lr: 6.77e-03, grad_scale: 32.0 2023-11-19 20:42:27,670 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=798733.3333333334, ans=0.125 2023-11-19 20:42:44,686 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=6.34 vs. 
limit=10.0 2023-11-19 20:43:04,737 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.840e+01 8.658e+01 9.480e+01 1.096e+02 1.560e+02, threshold=1.896e+02, percent-clipped=0.0 2023-11-19 20:43:16,048 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 119850 2023-11-19 20:43:16,255 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=798933.3333333334, ans=0.0 2023-11-19 20:43:27,030 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-19 20:43:31,195 INFO [train_asr.py:1262] (1/4) Epoch 10, batch 11650, loss[loss=0.06819, simple_loss=0.06688, pruned_loss=0.01832, audio_tagging_loss=0.01643, over 16071.00 frames. ], tot_loss[loss=0.08496, simple_loss=0.1043, pruned_loss=0.02242, audio_tagging_loss=0.01037, over 3045400.32 frames. ], batch size: 63, lr: 6.77e-03, grad_scale: 32.0 2023-11-19 20:43:41,467 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=8.13 vs. limit=15.0 2023-11-19 20:43:59,208 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=799200.0, ans=0.0 2023-11-19 20:44:00,439 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=799200.0, ans=0.1 2023-11-19 20:44:13,255 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.53 vs. limit=6.0 2023-11-19 20:44:16,465 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=799266.6666666666, ans=0.2 2023-11-19 20:44:19,803 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 119900 2023-11-19 20:44:35,570 INFO [train_asr.py:1262] (1/4) Epoch 10, batch 11700, loss[loss=0.07864, simple_loss=0.09538, pruned_loss=0.01799, audio_tagging_loss=0.01296, over 17011.00 frames. ], tot_loss[loss=0.08427, simple_loss=0.1031, pruned_loss=0.02223, audio_tagging_loss=0.01047, over 3050608.09 frames. ], batch size: 66, lr: 6.77e-03, grad_scale: 32.0 2023-11-19 20:45:07,911 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=799533.3333333334, ans=0.2 2023-11-19 20:45:14,282 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.866e+01 8.190e+01 9.042e+01 9.907e+01 1.390e+02, threshold=1.808e+02, percent-clipped=0.0 2023-11-19 20:45:17,013 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=799600.0, ans=0.0 2023-11-19 20:45:24,732 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 119950 2023-11-19 20:45:40,592 INFO [train_asr.py:1262] (1/4) Epoch 10, batch 11750, loss[loss=0.08854, simple_loss=0.1076, pruned_loss=0.02197, audio_tagging_loss=0.01278, over 14386.00 frames. ], tot_loss[loss=0.08501, simple_loss=0.1041, pruned_loss=0.02254, audio_tagging_loss=0.01043, over 3047042.67 frames. 
], batch size: 57, lr: 6.77e-03, grad_scale: 32.0 2023-11-19 20:45:56,899 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=799800.0, ans=0.125 2023-11-19 20:46:04,780 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=799866.6666666666, ans=0.125 2023-11-19 20:46:08,422 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=799866.6666666666, ans=0.0 2023-11-19 20:46:19,120 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=799933.3333333334, ans=0.125 2023-11-19 20:46:29,870 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 120000 2023-11-19 20:46:47,827 INFO [train_asr.py:1262] (1/4) Epoch 10, batch 11800, loss[loss=0.07251, simple_loss=0.08225, pruned_loss=0.01857, audio_tagging_loss=0.01282, over 15384.00 frames. ], tot_loss[loss=0.08556, simple_loss=0.1048, pruned_loss=0.02268, audio_tagging_loss=0.01046, over 3046804.38 frames. ], batch size: 59, lr: 6.77e-03, grad_scale: 32.0 2023-11-19 20:47:18,825 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=13.17 vs. limit=15.0 2023-11-19 20:47:24,747 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-19 20:47:26,910 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.227e+01 8.456e+01 9.093e+01 9.839e+01 1.192e+02, threshold=1.819e+02, percent-clipped=0.0 2023-11-19 20:47:36,772 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 120050 2023-11-19 20:47:43,198 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=800333.3333333334, ans=0.1 2023-11-19 20:47:44,920 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=800333.3333333334, ans=0.1 2023-11-19 20:47:51,928 INFO [train_asr.py:1262] (1/4) Epoch 10, batch 11850, loss[loss=0.08641, simple_loss=0.09971, pruned_loss=0.02438, audio_tagging_loss=0.01217, over 14228.00 frames. ], tot_loss[loss=0.08565, simple_loss=0.1051, pruned_loss=0.02263, audio_tagging_loss=0.01049, over 3044545.56 frames. ], batch size: 54, lr: 6.76e-03, grad_scale: 32.0 2023-11-19 20:48:00,494 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.whiten, num_groups=1, num_channels=512, metric=8.44 vs. limit=12.0 2023-11-19 20:48:03,210 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.whiten, num_groups=1, num_channels=192, metric=4.06 vs. limit=12.0 2023-11-19 20:48:22,628 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=6.05 vs. limit=10.0 2023-11-19 20:48:23,477 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-19 20:48:32,565 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=12.40 vs. 
limit=15.0 2023-11-19 20:48:37,553 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=800600.0, ans=0.1 2023-11-19 20:48:37,673 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=800600.0, ans=0.1 2023-11-19 20:48:40,953 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 120100 2023-11-19 20:48:57,225 INFO [train_asr.py:1262] (1/4) Epoch 10, batch 11900, loss[loss=0.1005, simple_loss=0.1322, pruned_loss=0.0264, audio_tagging_loss=0.007958, over 15450.00 frames. ], tot_loss[loss=0.08634, simple_loss=0.1063, pruned_loss=0.02268, audio_tagging_loss=0.0105, over 3045017.40 frames. ], batch size: 57, lr: 6.76e-03, grad_scale: 32.0 2023-11-19 20:48:59,066 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=512, metric=8.80 vs. limit=15.0 2023-11-19 20:49:06,988 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=800733.3333333334, ans=0.1 2023-11-19 20:49:31,606 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=13.52 vs. limit=15.0 2023-11-19 20:49:33,657 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=800933.3333333334, ans=0.125 2023-11-19 20:49:35,949 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.865e+01 8.251e+01 9.063e+01 9.820e+01 1.973e+02, threshold=1.813e+02, percent-clipped=1.0 2023-11-19 20:49:42,288 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=800933.3333333334, ans=0.0 2023-11-19 20:49:45,852 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 120150 2023-11-19 20:50:00,550 INFO [train_asr.py:1262] (1/4) Epoch 10, batch 11950, loss[loss=0.09232, simple_loss=0.1103, pruned_loss=0.02637, audio_tagging_loss=0.01082, over 15057.00 frames. ], tot_loss[loss=0.08599, simple_loss=0.1059, pruned_loss=0.02253, audio_tagging_loss=0.01052, over 3051301.39 frames. ], batch size: 56, lr: 6.76e-03, grad_scale: 16.0 2023-11-19 20:50:18,611 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer_ff2.min_abs, batch_count=801133.3333333334, ans=0.1 2023-11-19 20:50:43,001 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=801266.6666666666, ans=0.1 2023-11-19 20:50:44,015 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=801266.6666666666, ans=0.1 2023-11-19 20:50:48,464 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 120200 2023-11-19 20:50:49,691 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=801333.3333333334, ans=0.1 2023-11-19 20:50:55,975 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=801333.3333333334, ans=0.125 2023-11-19 20:50:59,701 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=13.47 vs. 
limit=15.0 2023-11-19 20:51:02,527 INFO [train_asr.py:1262] (1/4) Epoch 10, batch 12000, loss[loss=0.08737, simple_loss=0.1015, pruned_loss=0.02392, audio_tagging_loss=0.01271, over 14319.00 frames. ], tot_loss[loss=0.08561, simple_loss=0.105, pruned_loss=0.02241, audio_tagging_loss=0.0107, over 3044255.84 frames. ], batch size: 54, lr: 6.76e-03, grad_scale: 32.0 2023-11-19 20:51:02,528 INFO [train_asr.py:1285] (1/4) Computing validation loss 2023-11-19 20:51:22,890 INFO [zipformer.py:1873] (1/4) name=encoder.encoders.2.encoder.layers.2.self_attn_weights, attn_weights_entropy = tensor([5.3204, 4.5086, 4.7169, 4.7150], device='cuda:1') 2023-11-19 20:51:23,842 INFO [zipformer.py:1873] (1/4) name=encoder.encoders.2.encoder.layers.1.self_attn_weights, attn_weights_entropy = tensor([4.6575, 3.5833, 3.7166, 3.2700], device='cuda:1') 2023-11-19 20:51:41,786 INFO [train_asr.py:1294] (1/4) Epoch 10, validation: loss=0.06456, simple_loss=0.05518, pruned_loss=0.006322, audio_tagging_loss=0.03065, over 4681554.00 frames. 2023-11-19 20:51:41,787 INFO [train_asr.py:1295] (1/4) Maximum memory allocated so far is 26082MB 2023-11-19 20:51:54,747 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=801466.6666666666, ans=0.125 2023-11-19 20:51:59,335 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=801466.6666666666, ans=0.125 2023-11-19 20:52:44,688 INFO [train_asr.py:1262] (1/4) Epoch 11, batch 0, loss[loss=0.08895, simple_loss=0.08547, pruned_loss=0.0182, audio_tagging_loss=0.02801, over 15918.00 frames. ], tot_loss[loss=0.08895, simple_loss=0.08547, pruned_loss=0.0182, audio_tagging_loss=0.02801, over 15918.00 frames. ], batch size: 60, lr: 6.45e-03, grad_scale: 32.0 2023-11-19 20:52:44,693 INFO [train_asr.py:1285] (1/4) Computing validation loss 2023-11-19 20:53:06,949 INFO [zipformer.py:1873] (1/4) name=encoder.encoders.2.encoder.layers.2.self_attn_weights, attn_weights_entropy = tensor([4.5345, 3.6317, 4.3094, 3.1829], device='cuda:1') 2023-11-19 20:53:19,992 INFO [train_asr.py:1294] (1/4) Epoch 11, validation: loss=0.06409, simple_loss=0.05518, pruned_loss=0.006264, audio_tagging_loss=0.03024, over 4681554.00 frames. 2023-11-19 20:53:19,993 INFO [train_asr.py:1295] (1/4) Maximum memory allocated so far is 26082MB 2023-11-19 20:53:24,372 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=13.52 vs. limit=15.0 2023-11-19 20:53:32,239 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.455e+01 8.493e+01 9.059e+01 9.664e+01 1.642e+02, threshold=1.812e+02, percent-clipped=0.0 2023-11-19 20:53:35,279 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=13.86 vs. 
limit=15.0 2023-11-19 20:53:41,116 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 120250 2023-11-19 20:53:42,447 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=801606.6666666666, ans=0.0 2023-11-19 20:53:56,437 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.min_positive, batch_count=801673.3333333334, ans=0.025 2023-11-19 20:54:06,878 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.self_attn2.whiten, num_groups=1, num_channels=192, metric=13.16 vs. limit=22.5 2023-11-19 20:54:14,810 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=801806.6666666666, ans=0.125 2023-11-19 20:54:18,555 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=801806.6666666666, ans=0.125 2023-11-19 20:54:24,288 INFO [train_asr.py:1262] (1/4) Epoch 11, batch 50, loss[loss=0.1101, simple_loss=0.1322, pruned_loss=0.02522, audio_tagging_loss=0.01883, over 14870.00 frames. ], tot_loss[loss=0.09771, simple_loss=0.1108, pruned_loss=0.02303, audio_tagging_loss=0.01928, over 691338.80 frames. ], batch size: 56, lr: 6.45e-03, grad_scale: 32.0 2023-11-19 20:54:30,720 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=801873.3333333334, ans=0.125 2023-11-19 20:54:46,571 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 120300 2023-11-19 20:54:55,706 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-11-19 20:55:01,085 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=802006.6666666666, ans=0.2 2023-11-19 20:55:02,088 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=802006.6666666666, ans=0.0 2023-11-19 20:55:17,491 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=802140.0, ans=0.0 2023-11-19 20:55:30,003 INFO [train_asr.py:1262] (1/4) Epoch 11, batch 100, loss[loss=0.08121, simple_loss=0.0926, pruned_loss=0.01908, audio_tagging_loss=0.01583, over 15109.00 frames. ], tot_loss[loss=0.0951, simple_loss=0.1073, pruned_loss=0.0225, audio_tagging_loss=0.01895, over 1220833.04 frames. ], batch size: 56, lr: 6.45e-03, grad_scale: 32.0 2023-11-19 20:55:44,013 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.071e+01 8.908e+01 9.605e+01 1.032e+02 1.207e+02, threshold=1.921e+02, percent-clipped=0.0 2023-11-19 20:55:52,791 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 120350 2023-11-19 20:56:19,333 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=802406.6666666666, ans=0.125 2023-11-19 20:56:35,879 INFO [train_asr.py:1262] (1/4) Epoch 11, batch 150, loss[loss=0.07698, simple_loss=0.09512, pruned_loss=0.0188, audio_tagging_loss=0.01062, over 15761.00 frames. ], tot_loss[loss=0.09274, simple_loss=0.1063, pruned_loss=0.02238, audio_tagging_loss=0.01721, over 1621018.63 frames. 
], batch size: 60, lr: 6.45e-03, grad_scale: 32.0 2023-11-19 20:56:38,711 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=802540.0, ans=0.1 2023-11-19 20:56:57,464 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 120400 2023-11-19 20:57:02,926 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=802673.3333333334, ans=0.0 2023-11-19 20:57:07,179 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=802673.3333333334, ans=0.0 2023-11-19 20:57:12,212 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=802673.3333333334, ans=0.1 2023-11-19 20:57:16,754 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=11.98 vs. limit=22.5 2023-11-19 20:57:19,698 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=802740.0, ans=0.2 2023-11-19 20:57:21,632 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=802740.0, ans=0.0 2023-11-19 20:57:36,669 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.whiten, num_groups=1, num_channels=512, metric=11.43 vs. limit=12.0 2023-11-19 20:57:39,307 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.self_attn2.whiten, num_groups=1, num_channels=192, metric=12.78 vs. limit=22.5 2023-11-19 20:57:40,922 INFO [train_asr.py:1262] (1/4) Epoch 11, batch 200, loss[loss=0.08625, simple_loss=0.1106, pruned_loss=0.02287, audio_tagging_loss=0.008048, over 14513.00 frames. ], tot_loss[loss=0.08939, simple_loss=0.1043, pruned_loss=0.02198, audio_tagging_loss=0.01524, over 1932180.76 frames. ], batch size: 55, lr: 6.45e-03, grad_scale: 32.0 2023-11-19 20:57:54,012 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.513e+01 8.370e+01 8.919e+01 1.001e+02 1.772e+02, threshold=1.784e+02, percent-clipped=0.0 2023-11-19 20:57:54,250 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=802940.0, ans=0.2 2023-11-19 20:58:01,803 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=802940.0, ans=0.0 2023-11-19 20:58:02,722 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 120450 2023-11-19 20:58:16,542 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=9.21 vs. limit=15.0 2023-11-19 20:58:29,620 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=803073.3333333334, ans=0.1 2023-11-19 20:58:35,709 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=803140.0, ans=0.04949747468305833 2023-11-19 20:58:46,009 INFO [train_asr.py:1262] (1/4) Epoch 11, batch 250, loss[loss=0.09721, simple_loss=0.1246, pruned_loss=0.0275, audio_tagging_loss=0.007404, over 15181.00 frames. ], tot_loss[loss=0.08872, simple_loss=0.1056, pruned_loss=0.02222, audio_tagging_loss=0.0137, over 2187607.85 frames. 
], batch size: 57, lr: 6.45e-03, grad_scale: 16.0 2023-11-19 20:58:49,963 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=803206.6666666666, ans=0.125 2023-11-19 20:58:53,825 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=803206.6666666666, ans=0.0 2023-11-19 20:59:08,389 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=803273.3333333334, ans=0.125 2023-11-19 20:59:09,454 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 120500 2023-11-19 20:59:23,996 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=9.60 vs. limit=10.0 2023-11-19 20:59:30,928 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=803406.6666666666, ans=0.1 2023-11-19 20:59:35,186 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=803406.6666666666, ans=0.0 2023-11-19 20:59:47,352 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=6.60 vs. limit=15.0 2023-11-19 20:59:52,069 INFO [train_asr.py:1262] (1/4) Epoch 11, batch 300, loss[loss=0.07487, simple_loss=0.09367, pruned_loss=0.01785, audio_tagging_loss=0.01018, over 14893.00 frames. ], tot_loss[loss=0.08864, simple_loss=0.1064, pruned_loss=0.02268, audio_tagging_loss=0.01275, over 2379308.70 frames. ], batch size: 56, lr: 6.45e-03, grad_scale: 16.0 2023-11-19 20:59:54,817 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=803540.0, ans=0.125 2023-11-19 20:59:57,675 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=11.07 vs. limit=15.0 2023-11-19 21:00:05,454 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.conv_module1.whiten, num_groups=1, num_channels=192, metric=12.76 vs. limit=15.0 2023-11-19 21:00:05,710 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.754e+01 8.387e+01 8.949e+01 9.814e+01 1.274e+02, threshold=1.790e+02, percent-clipped=0.0 2023-11-19 21:00:13,390 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 120550 2023-11-19 21:00:46,865 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.54 vs. limit=6.0 2023-11-19 21:00:52,514 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=803806.6666666666, ans=0.0 2023-11-19 21:00:56,043 INFO [train_asr.py:1262] (1/4) Epoch 11, batch 350, loss[loss=0.05971, simple_loss=0.06234, pruned_loss=0.01424, audio_tagging_loss=0.0143, over 14773.00 frames. ], tot_loss[loss=0.08743, simple_loss=0.1057, pruned_loss=0.02258, audio_tagging_loss=0.01202, over 2530970.71 frames. 
], batch size: 56, lr: 6.44e-03, grad_scale: 16.0 2023-11-19 21:01:03,728 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=803873.3333333334, ans=0.125 2023-11-19 21:01:17,856 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 120600 2023-11-19 21:01:19,112 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=803940.0, ans=0.125 2023-11-19 21:01:44,988 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=804073.3333333334, ans=0.1 2023-11-19 21:01:55,033 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=804140.0, ans=0.07 2023-11-19 21:02:01,513 INFO [train_asr.py:1262] (1/4) Epoch 11, batch 400, loss[loss=0.09158, simple_loss=0.1178, pruned_loss=0.02008, audio_tagging_loss=0.01261, over 14843.00 frames. ], tot_loss[loss=0.08732, simple_loss=0.1058, pruned_loss=0.02281, audio_tagging_loss=0.01161, over 2636863.22 frames. ], batch size: 54, lr: 6.44e-03, grad_scale: 32.0 2023-11-19 21:02:06,071 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=5.86 vs. limit=10.0 2023-11-19 21:02:15,713 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.819e+01 8.262e+01 8.810e+01 9.660e+01 1.540e+02, threshold=1.762e+02, percent-clipped=0.0 2023-11-19 21:02:24,503 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 120650 2023-11-19 21:02:29,714 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=804340.0, ans=0.0 2023-11-19 21:02:35,747 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=804340.0, ans=0.125 2023-11-19 21:02:57,338 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.min_positive, batch_count=804473.3333333334, ans=0.05 2023-11-19 21:03:06,996 INFO [train_asr.py:1262] (1/4) Epoch 11, batch 450, loss[loss=0.09423, simple_loss=0.1305, pruned_loss=0.02037, audio_tagging_loss=0.008593, over 15418.00 frames. ], tot_loss[loss=0.08624, simple_loss=0.1051, pruned_loss=0.02241, audio_tagging_loss=0.01129, over 2728929.59 frames. ], batch size: 59, lr: 6.44e-03, grad_scale: 32.0 2023-11-19 21:03:13,897 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=804540.0, ans=0.125 2023-11-19 21:03:15,726 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=12.88 vs. limit=15.0 2023-11-19 21:03:25,816 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=9.83 vs. limit=10.0 2023-11-19 21:03:27,828 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=804606.6666666666, ans=0.0 2023-11-19 21:03:28,955 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 120700 2023-11-19 21:03:29,386 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=9.43 vs. 
limit=15.0 2023-11-19 21:03:39,221 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=804673.3333333334, ans=0.0 2023-11-19 21:03:50,942 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer_ff2.min_abs, batch_count=804740.0, ans=0.1 2023-11-19 21:04:12,481 INFO [train_asr.py:1262] (1/4) Epoch 11, batch 500, loss[loss=0.06431, simple_loss=0.07996, pruned_loss=0.01411, audio_tagging_loss=0.01022, over 14262.00 frames. ], tot_loss[loss=0.08568, simple_loss=0.1047, pruned_loss=0.02231, audio_tagging_loss=0.01103, over 2793622.85 frames. ], batch size: 53, lr: 6.44e-03, grad_scale: 32.0 2023-11-19 21:04:26,059 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.811e+01 8.498e+01 9.308e+01 1.026e+02 1.855e+02, threshold=1.862e+02, percent-clipped=1.0 2023-11-19 21:04:34,112 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 120750 2023-11-19 21:04:47,787 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=3.93 vs. limit=12.0 2023-11-19 21:04:48,730 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=805006.6666666666, ans=0.2 2023-11-19 21:04:59,183 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=805073.3333333334, ans=0.125 2023-11-19 21:05:03,026 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-11-19 21:05:10,386 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=805140.0, ans=0.0 2023-11-19 21:05:16,336 INFO [train_asr.py:1262] (1/4) Epoch 11, batch 550, loss[loss=0.0731, simple_loss=0.08909, pruned_loss=0.02016, audio_tagging_loss=0.008395, over 17116.00 frames. ], tot_loss[loss=0.08485, simple_loss=0.1039, pruned_loss=0.02201, audio_tagging_loss=0.01089, over 2849683.98 frames. ], batch size: 68, lr: 6.44e-03, grad_scale: 32.0 2023-11-19 21:05:16,749 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=805206.6666666666, ans=0.0 2023-11-19 21:05:22,053 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=805206.6666666666, ans=0.1 2023-11-19 21:05:33,750 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=805273.3333333334, ans=0.1 2023-11-19 21:05:37,969 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=805273.3333333334, ans=0.125 2023-11-19 21:05:39,029 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 120800 2023-11-19 21:05:44,461 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=805340.0, ans=0.0 2023-11-19 21:05:57,478 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=805406.6666666666, ans=0.0 2023-11-19 21:06:21,516 INFO [train_asr.py:1262] (1/4) Epoch 11, batch 600, loss[loss=0.08555, simple_loss=0.0983, pruned_loss=0.02504, audio_tagging_loss=0.01136, over 15246.00 frames. ], tot_loss[loss=0.08494, simple_loss=0.1042, pruned_loss=0.02209, audio_tagging_loss=0.01075, over 2889873.01 frames. 
], batch size: 59, lr: 6.44e-03, grad_scale: 32.0 2023-11-19 21:06:36,954 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.892e+01 8.276e+01 9.038e+01 9.770e+01 1.365e+02, threshold=1.808e+02, percent-clipped=0.0 2023-11-19 21:06:44,468 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 120850 2023-11-19 21:06:57,262 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.26 vs. limit=6.0 2023-11-19 21:07:02,462 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=805740.0, ans=0.2 2023-11-19 21:07:25,691 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=805873.3333333334, ans=0.2 2023-11-19 21:07:27,204 INFO [train_asr.py:1262] (1/4) Epoch 11, batch 650, loss[loss=0.07729, simple_loss=0.09278, pruned_loss=0.018, audio_tagging_loss=0.0129, over 15529.00 frames. ], tot_loss[loss=0.08517, simple_loss=0.1044, pruned_loss=0.02228, audio_tagging_loss=0.01067, over 2926549.95 frames. ], batch size: 57, lr: 6.44e-03, grad_scale: 32.0 2023-11-19 21:07:27,498 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.min_abs, batch_count=805873.3333333334, ans=0.5 2023-11-19 21:07:48,672 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 120900 2023-11-19 21:07:56,052 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=806006.6666666666, ans=0.125 2023-11-19 21:08:30,880 INFO [train_asr.py:1262] (1/4) Epoch 11, batch 700, loss[loss=0.06128, simple_loss=0.07467, pruned_loss=0.01313, audio_tagging_loss=0.01082, over 16016.00 frames. ], tot_loss[loss=0.08513, simple_loss=0.1043, pruned_loss=0.02218, audio_tagging_loss=0.01078, over 2957604.62 frames. ], batch size: 62, lr: 6.43e-03, grad_scale: 32.0 2023-11-19 21:08:34,926 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=806206.6666666666, ans=0.2 2023-11-19 21:08:41,506 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=806206.6666666666, ans=0.125 2023-11-19 21:08:42,735 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=806273.3333333334, ans=0.125 2023-11-19 21:08:44,808 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.115e+01 8.069e+01 8.585e+01 9.544e+01 1.162e+02, threshold=1.717e+02, percent-clipped=0.0 2023-11-19 21:08:49,507 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.min_abs, batch_count=806273.3333333334, ans=0.5 2023-11-19 21:08:53,533 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 120950 2023-11-19 21:09:04,243 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.conv_module1.whiten, num_groups=1, num_channels=192, metric=12.59 vs. limit=15.0 2023-11-19 21:09:35,850 INFO [train_asr.py:1262] (1/4) Epoch 11, batch 750, loss[loss=0.06084, simple_loss=0.05643, pruned_loss=0.01494, audio_tagging_loss=0.01768, over 15037.00 frames. ], tot_loss[loss=0.08459, simple_loss=0.1039, pruned_loss=0.02193, audio_tagging_loss=0.0107, over 2978191.43 frames. 
], batch size: 58, lr: 6.43e-03, grad_scale: 32.0 2023-11-19 21:09:37,347 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=806540.0, ans=0.1 2023-11-19 21:09:47,149 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=806540.0, ans=0.125 2023-11-19 21:09:58,084 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 121000 2023-11-19 21:10:00,525 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn1.whiten.whitening_limit, batch_count=806606.6666666666, ans=22.5 2023-11-19 21:10:04,164 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=806673.3333333334, ans=0.125 2023-11-19 21:10:19,514 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=806740.0, ans=0.125 2023-11-19 21:10:22,082 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=806740.0, ans=0.125 2023-11-19 21:10:24,975 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=12.52 vs. limit=15.0 2023-11-19 21:10:34,444 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=806806.6666666666, ans=0.125 2023-11-19 21:10:40,549 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=806873.3333333334, ans=0.0 2023-11-19 21:10:40,722 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=23.96 vs. limit=22.5 2023-11-19 21:10:41,368 INFO [train_asr.py:1262] (1/4) Epoch 11, batch 800, loss[loss=0.09093, simple_loss=0.1166, pruned_loss=0.02411, audio_tagging_loss=0.008491, over 15428.00 frames. ], tot_loss[loss=0.08454, simple_loss=0.1037, pruned_loss=0.02191, audio_tagging_loss=0.01076, over 2994445.97 frames. ], batch size: 56, lr: 6.43e-03, grad_scale: 32.0 2023-11-19 21:10:55,545 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.953e+01 8.303e+01 9.154e+01 9.871e+01 1.410e+02, threshold=1.831e+02, percent-clipped=0.0 2023-11-19 21:11:02,902 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 121050 2023-11-19 21:11:34,385 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=807140.0, ans=0.125 2023-11-19 21:11:45,778 INFO [train_asr.py:1262] (1/4) Epoch 11, batch 850, loss[loss=0.1208, simple_loss=0.1544, pruned_loss=0.03426, audio_tagging_loss=0.009379, over 14653.00 frames. ], tot_loss[loss=0.0848, simple_loss=0.1037, pruned_loss=0.02204, audio_tagging_loss=0.01091, over 3005181.56 frames. ], batch size: 52, lr: 6.43e-03, grad_scale: 32.0 2023-11-19 21:11:50,955 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=807206.6666666666, ans=0.125 2023-11-19 21:11:51,313 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module1.whiten, num_groups=1, num_channels=512, metric=5.70 vs. 
limit=15.0 2023-11-19 21:12:01,320 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=807273.3333333334, ans=0.0 2023-11-19 21:12:01,439 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=807273.3333333334, ans=0.0 2023-11-19 21:12:02,678 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.min_positive, batch_count=807273.3333333334, ans=0.05 2023-11-19 21:12:07,856 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 121100 2023-11-19 21:12:09,717 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=807273.3333333334, ans=0.125 2023-11-19 21:12:15,258 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.72 vs. limit=6.0 2023-11-19 21:12:21,041 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.min_positive, batch_count=807340.0, ans=0.05 2023-11-19 21:12:45,877 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=807473.3333333334, ans=0.1 2023-11-19 21:12:46,400 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=20.24 vs. limit=22.5 2023-11-19 21:12:50,507 INFO [train_asr.py:1262] (1/4) Epoch 11, batch 900, loss[loss=0.0804, simple_loss=0.09796, pruned_loss=0.01996, audio_tagging_loss=0.01146, over 16023.00 frames. ], tot_loss[loss=0.0849, simple_loss=0.1038, pruned_loss=0.02209, audio_tagging_loss=0.01089, over 3018290.11 frames. ], batch size: 62, lr: 6.43e-03, grad_scale: 32.0 2023-11-19 21:13:05,614 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.647e+01 8.167e+01 8.792e+01 9.769e+01 1.364e+02, threshold=1.758e+02, percent-clipped=0.0 2023-11-19 21:13:13,265 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 121150 2023-11-19 21:13:19,903 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=807673.3333333334, ans=0.125 2023-11-19 21:13:21,160 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=807673.3333333334, ans=0.0 2023-11-19 21:13:28,914 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.min_positive, batch_count=807740.0, ans=0.05 2023-11-19 21:13:31,673 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=807740.0, ans=0.0 2023-11-19 21:13:35,856 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=807740.0, ans=0.125 2023-11-19 21:13:56,691 INFO [train_asr.py:1262] (1/4) Epoch 11, batch 950, loss[loss=0.1057, simple_loss=0.1331, pruned_loss=0.02933, audio_tagging_loss=0.00985, over 15539.00 frames. ], tot_loss[loss=0.08531, simple_loss=0.1048, pruned_loss=0.02233, audio_tagging_loss=0.01061, over 3025680.08 frames. 
], batch size: 56, lr: 6.43e-03, grad_scale: 32.0 2023-11-19 21:14:04,437 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=807873.3333333334, ans=0.0 2023-11-19 21:14:18,439 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 121200 2023-11-19 21:14:30,694 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=808006.6666666666, ans=0.07 2023-11-19 21:14:49,279 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=808140.0, ans=0.0 2023-11-19 21:15:01,039 INFO [train_asr.py:1262] (1/4) Epoch 11, batch 1000, loss[loss=0.07874, simple_loss=0.1023, pruned_loss=0.02057, audio_tagging_loss=0.007, over 14917.00 frames. ], tot_loss[loss=0.08457, simple_loss=0.104, pruned_loss=0.02212, audio_tagging_loss=0.01045, over 3028046.41 frames. ], batch size: 56, lr: 6.43e-03, grad_scale: 32.0 2023-11-19 21:15:07,979 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.max_abs, batch_count=808206.6666666666, ans=10.0 2023-11-19 21:15:15,836 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.437e+01 8.149e+01 8.966e+01 9.862e+01 1.248e+02, threshold=1.793e+02, percent-clipped=0.0 2023-11-19 21:15:23,333 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 121250 2023-11-19 21:15:29,315 WARNING [train_asr.py:1506] (1/4) Exclude cut with ID unbalanced/5Y6u9AlD9S0_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-19 21:15:33,257 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=808340.0, ans=0.0 2023-11-19 21:15:40,768 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=14.90 vs. limit=15.0 2023-11-19 21:15:57,114 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=808473.3333333334, ans=0.0 2023-11-19 21:16:02,529 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=808473.3333333334, ans=0.125 2023-11-19 21:16:05,940 INFO [train_asr.py:1262] (1/4) Epoch 11, batch 1050, loss[loss=0.08012, simple_loss=0.09108, pruned_loss=0.02575, audio_tagging_loss=0.008829, over 14181.00 frames. ], tot_loss[loss=0.08467, simple_loss=0.1042, pruned_loss=0.02223, audio_tagging_loss=0.01033, over 3034535.57 frames. 
], batch size: 54, lr: 6.43e-03, grad_scale: 32.0 2023-11-19 21:16:14,170 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=808540.0, ans=0.125 2023-11-19 21:16:17,300 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=808540.0, ans=0.2 2023-11-19 21:16:26,097 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=808606.6666666666, ans=0.2 2023-11-19 21:16:28,215 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 121300 2023-11-19 21:16:54,295 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=808740.0, ans=0.1 2023-11-19 21:17:08,013 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=808806.6666666666, ans=0.125 2023-11-19 21:17:09,749 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=8.49 vs. limit=15.0 2023-11-19 21:17:11,265 INFO [train_asr.py:1262] (1/4) Epoch 11, batch 1100, loss[loss=0.0842, simple_loss=0.1047, pruned_loss=0.02035, audio_tagging_loss=0.01149, over 16158.00 frames. ], tot_loss[loss=0.08455, simple_loss=0.1042, pruned_loss=0.02218, audio_tagging_loss=0.01029, over 3037111.82 frames. ], batch size: 61, lr: 6.42e-03, grad_scale: 16.0 2023-11-19 21:17:13,662 WARNING [train_asr.py:1506] (1/4) Exclude cut with ID unbalanced/AWHnJAqurec_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-19 21:17:20,051 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-19 21:17:24,855 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=808940.0, ans=0.0 2023-11-19 21:17:25,896 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.793e+01 8.226e+01 9.036e+01 9.842e+01 1.440e+02, threshold=1.807e+02, percent-clipped=0.0 2023-11-19 21:17:32,120 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 121350 2023-11-19 21:17:46,479 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=809006.6666666666, ans=0.1 2023-11-19 21:17:47,663 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=809073.3333333334, ans=0.0 2023-11-19 21:18:11,088 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=809140.0, ans=0.0 2023-11-19 21:18:14,495 INFO [train_asr.py:1262] (1/4) Epoch 11, batch 1150, loss[loss=0.09279, simple_loss=0.1155, pruned_loss=0.02747, audio_tagging_loss=0.007585, over 15418.00 frames. ], tot_loss[loss=0.08504, simple_loss=0.1051, pruned_loss=0.02231, audio_tagging_loss=0.01017, over 3039326.15 frames. 
], batch size: 58, lr: 6.42e-03, grad_scale: 16.0 2023-11-19 21:18:29,816 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=809273.3333333334, ans=0.125 2023-11-19 21:18:37,141 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 121400 2023-11-19 21:19:02,046 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=809406.6666666666, ans=0.125 2023-11-19 21:19:05,218 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=7.37 vs. limit=15.0 2023-11-19 21:19:05,856 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=809473.3333333334, ans=0.1 2023-11-19 21:19:08,443 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=809473.3333333334, ans=0.0 2023-11-19 21:19:20,005 INFO [train_asr.py:1262] (1/4) Epoch 11, batch 1200, loss[loss=0.09169, simple_loss=0.1097, pruned_loss=0.02628, audio_tagging_loss=0.01057, over 15166.00 frames. ], tot_loss[loss=0.085, simple_loss=0.1053, pruned_loss=0.0223, audio_tagging_loss=0.01008, over 3040823.29 frames. ], batch size: 58, lr: 6.42e-03, grad_scale: 32.0 2023-11-19 21:19:22,209 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-19 21:19:24,634 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=809540.0, ans=0.1 2023-11-19 21:19:34,249 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=809606.6666666666, ans=0.125 2023-11-19 21:19:36,420 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.659e+01 8.249e+01 9.079e+01 9.946e+01 1.270e+02, threshold=1.816e+02, percent-clipped=0.0 2023-11-19 21:19:42,791 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 121450 2023-11-19 21:19:49,347 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=809673.3333333334, ans=0.0 2023-11-19 21:19:54,125 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=809673.3333333334, ans=0.125 2023-11-19 21:20:03,911 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-19 21:20:16,605 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=18.84 vs. limit=22.5 2023-11-19 21:20:17,366 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=809806.6666666666, ans=0.0 2023-11-19 21:20:20,347 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=809806.6666666666, ans=0.125 2023-11-19 21:20:25,698 INFO [train_asr.py:1262] (1/4) Epoch 11, batch 1250, loss[loss=0.05973, simple_loss=0.06635, pruned_loss=0.0146, audio_tagging_loss=0.01196, over 12926.00 frames. ], tot_loss[loss=0.08545, simple_loss=0.1056, pruned_loss=0.02264, audio_tagging_loss=0.01002, over 3042791.24 frames. 
], batch size: 53, lr: 6.42e-03, grad_scale: 16.0 2023-11-19 21:20:44,171 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer_ff2.min_abs, batch_count=809940.0, ans=0.1 2023-11-19 21:20:46,392 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 121500 2023-11-19 21:20:50,992 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=13.45 vs. limit=15.0 2023-11-19 21:21:07,558 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.max_abs, batch_count=810073.3333333334, ans=10.0 2023-11-19 21:21:07,843 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=10.77 vs. limit=15.0 2023-11-19 21:21:12,897 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=810073.3333333334, ans=0.1 2023-11-19 21:21:25,048 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=810140.0, ans=0.09899494936611666 2023-11-19 21:21:28,525 INFO [train_asr.py:1262] (1/4) Epoch 11, batch 1300, loss[loss=0.07532, simple_loss=0.09204, pruned_loss=0.01909, audio_tagging_loss=0.01021, over 14995.00 frames. ], tot_loss[loss=0.08467, simple_loss=0.1047, pruned_loss=0.02227, audio_tagging_loss=0.01008, over 3042364.39 frames. ], batch size: 55, lr: 6.42e-03, grad_scale: 16.0 2023-11-19 21:21:45,102 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.756e+01 8.695e+01 9.311e+01 1.032e+02 1.222e+02, threshold=1.862e+02, percent-clipped=0.0 2023-11-19 21:21:50,146 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 121550 2023-11-19 21:21:58,786 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=810340.0, ans=0.1 2023-11-19 21:22:00,584 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=2.82 vs. limit=15.0 2023-11-19 21:22:02,423 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=810340.0, ans=0.035 2023-11-19 21:22:23,626 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=810473.3333333334, ans=0.1 2023-11-19 21:22:32,531 INFO [train_asr.py:1262] (1/4) Epoch 11, batch 1350, loss[loss=0.08634, simple_loss=0.1039, pruned_loss=0.02265, audio_tagging_loss=0.01172, over 14725.00 frames. ], tot_loss[loss=0.08428, simple_loss=0.1041, pruned_loss=0.02206, audio_tagging_loss=0.01017, over 3041065.46 frames. ], batch size: 56, lr: 6.42e-03, grad_scale: 16.0 2023-11-19 21:22:35,317 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=810540.0, ans=0.125 2023-11-19 21:22:35,610 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=810540.0, ans=0.1 2023-11-19 21:22:50,822 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=6.77 vs. 
limit=10.0 2023-11-19 21:22:54,948 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 121600 2023-11-19 21:22:56,333 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=810606.6666666666, ans=0.07 2023-11-19 21:23:05,366 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=810673.3333333334, ans=0.125 2023-11-19 21:23:07,861 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=810673.3333333334, ans=0.0 2023-11-19 21:23:18,715 WARNING [train_asr.py:1506] (1/4) Exclude cut with ID unbalanced/XdmbboqRBmQ_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-19 21:23:25,678 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=810806.6666666666, ans=0.125 2023-11-19 21:23:25,716 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=810806.6666666666, ans=0.0 2023-11-19 21:23:26,897 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.min_positive, batch_count=810806.6666666666, ans=0.05 2023-11-19 21:23:28,104 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=810806.6666666666, ans=0.125 2023-11-19 21:23:37,201 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=9.49 vs. limit=15.0 2023-11-19 21:23:37,724 INFO [train_asr.py:1262] (1/4) Epoch 11, batch 1400, loss[loss=0.08089, simple_loss=0.0968, pruned_loss=0.01747, audio_tagging_loss=0.01502, over 14794.00 frames. ], tot_loss[loss=0.08457, simple_loss=0.1042, pruned_loss=0.02219, audio_tagging_loss=0.01025, over 3045950.55 frames. ], batch size: 55, lr: 6.42e-03, grad_scale: 16.0 2023-11-19 21:23:53,612 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.902e+01 8.352e+01 8.972e+01 9.668e+01 1.251e+02, threshold=1.794e+02, percent-clipped=0.0 2023-11-19 21:23:58,578 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 121650 2023-11-19 21:23:58,682 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=810940.0, ans=0.0 2023-11-19 21:24:16,781 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=4.60 vs. limit=12.0 2023-11-19 21:24:20,326 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=811073.3333333334, ans=0.0 2023-11-19 21:24:40,437 INFO [train_asr.py:1262] (1/4) Epoch 11, batch 1450, loss[loss=0.08645, simple_loss=0.1153, pruned_loss=0.02038, audio_tagging_loss=0.008423, over 15126.00 frames. ], tot_loss[loss=0.08464, simple_loss=0.1043, pruned_loss=0.02216, audio_tagging_loss=0.01032, over 3049335.98 frames. 
], batch size: 55, lr: 6.42e-03, grad_scale: 16.0 2023-11-19 21:24:50,484 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=811206.6666666666, ans=0.125 2023-11-19 21:24:58,004 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=811273.3333333334, ans=0.125 2023-11-19 21:25:00,175 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=20.13 vs. limit=22.5 2023-11-19 21:25:01,964 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 121700 2023-11-19 21:25:23,663 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.05 vs. limit=6.0 2023-11-19 21:25:28,202 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=811406.6666666666, ans=0.0 2023-11-19 21:25:44,124 INFO [train_asr.py:1262] (1/4) Epoch 11, batch 1500, loss[loss=0.09967, simple_loss=0.1299, pruned_loss=0.02465, audio_tagging_loss=0.01008, over 15042.00 frames. ], tot_loss[loss=0.08421, simple_loss=0.1038, pruned_loss=0.02195, audio_tagging_loss=0.01037, over 3046248.45 frames. ], batch size: 54, lr: 6.41e-03, grad_scale: 16.0 2023-11-19 21:26:01,427 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.708e+01 8.227e+01 9.077e+01 1.029e+02 1.490e+02, threshold=1.815e+02, percent-clipped=0.0 2023-11-19 21:26:03,324 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=10.18 vs. limit=15.0 2023-11-19 21:26:07,357 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 121750 2023-11-19 21:26:13,677 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=811673.3333333334, ans=0.1 2023-11-19 21:26:15,002 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.min_positive, batch_count=811673.3333333334, ans=0.05 2023-11-19 21:26:38,645 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=811806.6666666666, ans=0.0 2023-11-19 21:26:41,069 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=811806.6666666666, ans=0.0 2023-11-19 21:26:48,077 INFO [train_asr.py:1262] (1/4) Epoch 11, batch 1550, loss[loss=0.06717, simple_loss=0.07426, pruned_loss=0.0185, audio_tagging_loss=0.01155, over 14507.00 frames. ], tot_loss[loss=0.08464, simple_loss=0.1041, pruned_loss=0.02217, audio_tagging_loss=0.01042, over 3042138.27 frames. ], batch size: 58, lr: 6.41e-03, grad_scale: 16.0 2023-11-19 21:26:54,523 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=811873.3333333334, ans=0.125 2023-11-19 21:27:08,121 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=6.52 vs. limit=12.0 2023-11-19 21:27:11,378 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 121800 2023-11-19 21:27:11,887 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=22.76 vs. 
limit=22.5 2023-11-19 21:27:17,981 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=812006.6666666666, ans=0.2 2023-11-19 21:27:23,330 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.whiten, num_groups=1, num_channels=512, metric=11.05 vs. limit=12.0 2023-11-19 21:27:54,384 INFO [train_asr.py:1262] (1/4) Epoch 11, batch 1600, loss[loss=0.102, simple_loss=0.1218, pruned_loss=0.03201, audio_tagging_loss=0.009094, over 14490.00 frames. ], tot_loss[loss=0.08424, simple_loss=0.1035, pruned_loss=0.022, audio_tagging_loss=0.01049, over 3042879.23 frames. ], batch size: 56, lr: 6.41e-03, grad_scale: 32.0 2023-11-19 21:28:10,250 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.741e+01 8.357e+01 8.989e+01 9.853e+01 1.199e+02, threshold=1.798e+02, percent-clipped=0.0 2023-11-19 21:28:13,007 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=812273.3333333334, ans=0.125 2023-11-19 21:28:15,892 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 121850 2023-11-19 21:28:28,277 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.conv_module2.whiten, num_groups=1, num_channels=192, metric=12.39 vs. limit=15.0 2023-11-19 21:28:36,857 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=812406.6666666666, ans=0.1 2023-11-19 21:28:38,062 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=812406.6666666666, ans=0.125 2023-11-19 21:28:57,469 INFO [train_asr.py:1262] (1/4) Epoch 11, batch 1650, loss[loss=0.08554, simple_loss=0.1101, pruned_loss=0.02082, audio_tagging_loss=0.009683, over 15084.00 frames. ], tot_loss[loss=0.08425, simple_loss=0.104, pruned_loss=0.02184, audio_tagging_loss=0.01042, over 3043912.90 frames. ], batch size: 55, lr: 6.41e-03, grad_scale: 32.0 2023-11-19 21:29:01,483 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=812540.0, ans=0.1 2023-11-19 21:29:03,239 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=812540.0, ans=0.125 2023-11-19 21:29:10,723 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=812606.6666666666, ans=0.0 2023-11-19 21:29:20,407 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 121900 2023-11-19 21:30:01,912 INFO [train_asr.py:1262] (1/4) Epoch 11, batch 1700, loss[loss=0.08682, simple_loss=0.1056, pruned_loss=0.02196, audio_tagging_loss=0.01206, over 16097.00 frames. ], tot_loss[loss=0.08422, simple_loss=0.1039, pruned_loss=0.02181, audio_tagging_loss=0.01045, over 3045136.69 frames. ], batch size: 59, lr: 6.41e-03, grad_scale: 32.0 2023-11-19 21:30:03,524 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=812873.3333333334, ans=0.1 2023-11-19 21:30:11,023 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=812873.3333333334, ans=0.125 2023-11-19 21:30:19,258 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=13.83 vs. 
limit=15.0 2023-11-19 21:30:19,795 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.291e+01 8.451e+01 9.168e+01 1.014e+02 1.661e+02, threshold=1.834e+02, percent-clipped=0.0 2023-11-19 21:30:22,641 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=812940.0, ans=0.125 2023-11-19 21:30:24,725 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 121950 2023-11-19 21:30:30,917 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=813006.6666666666, ans=0.125 2023-11-19 21:30:50,186 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=813073.3333333334, ans=0.125 2023-11-19 21:31:01,452 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=13.36 vs. limit=15.0 2023-11-19 21:31:07,577 INFO [train_asr.py:1262] (1/4) Epoch 11, batch 1750, loss[loss=0.07221, simple_loss=0.09662, pruned_loss=0.01598, audio_tagging_loss=0.007926, over 15660.00 frames. ], tot_loss[loss=0.08451, simple_loss=0.1046, pruned_loss=0.0219, audio_tagging_loss=0.01032, over 3044947.60 frames. ], batch size: 59, lr: 6.41e-03, grad_scale: 32.0 2023-11-19 21:31:08,447 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.whiten, num_groups=1, num_channels=192, metric=3.59 vs. limit=12.0 2023-11-19 21:31:27,243 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=813273.3333333334, ans=0.125 2023-11-19 21:31:28,402 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 122000 2023-11-19 21:31:33,488 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=813340.0, ans=0.125 2023-11-19 21:31:50,713 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=813406.6666666666, ans=0.125 2023-11-19 21:31:56,916 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=813406.6666666666, ans=0.2 2023-11-19 21:32:12,114 INFO [train_asr.py:1262] (1/4) Epoch 11, batch 1800, loss[loss=0.08155, simple_loss=0.1051, pruned_loss=0.01809, audio_tagging_loss=0.0109, over 16243.00 frames. ], tot_loss[loss=0.08458, simple_loss=0.1047, pruned_loss=0.02194, audio_tagging_loss=0.01029, over 3042984.43 frames. 
], batch size: 62, lr: 6.41e-03, grad_scale: 32.0 2023-11-19 21:32:28,682 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.860e+01 8.061e+01 8.839e+01 9.659e+01 3.662e+02, threshold=1.768e+02, percent-clipped=1.0 2023-11-19 21:32:34,300 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 122050 2023-11-19 21:32:38,896 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=813673.3333333334, ans=0.125 2023-11-19 21:32:55,690 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=813740.0, ans=0.2 2023-11-19 21:33:14,602 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=8.045e-01 2023-11-19 21:33:16,786 INFO [train_asr.py:1262] (1/4) Epoch 11, batch 1850, loss[loss=0.09624, simple_loss=0.1104, pruned_loss=0.02814, audio_tagging_loss=0.0129, over 14331.00 frames. ], tot_loss[loss=0.0837, simple_loss=0.1034, pruned_loss=0.02164, audio_tagging_loss=0.01035, over 3038573.07 frames. ], batch size: 54, lr: 6.40e-03, grad_scale: 16.0 2023-11-19 21:33:38,886 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 122100 2023-11-19 21:33:39,005 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-19 21:33:47,062 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=814006.6666666666, ans=0.125 2023-11-19 21:33:54,799 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=3.39 vs. limit=15.0 2023-11-19 21:34:00,261 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=814073.3333333334, ans=0.125 2023-11-19 21:34:10,607 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=814140.0, ans=0.0 2023-11-19 21:34:21,905 INFO [train_asr.py:1262] (1/4) Epoch 11, batch 1900, loss[loss=0.09898, simple_loss=0.1149, pruned_loss=0.02862, audio_tagging_loss=0.01289, over 15382.00 frames. ], tot_loss[loss=0.08346, simple_loss=0.1032, pruned_loss=0.02153, audio_tagging_loss=0.01031, over 3035684.13 frames. 
], batch size: 58, lr: 6.40e-03, grad_scale: 16.0 2023-11-19 21:34:36,310 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=814273.3333333334, ans=0.1 2023-11-19 21:34:39,685 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.976e+01 8.118e+01 8.678e+01 9.738e+01 1.673e+02, threshold=1.736e+02, percent-clipped=0.0 2023-11-19 21:34:43,640 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 122150 2023-11-19 21:35:01,033 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=814406.6666666666, ans=0.0 2023-11-19 21:35:01,154 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=814406.6666666666, ans=0.125 2023-11-19 21:35:09,098 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=814406.6666666666, ans=0.04949747468305833 2023-11-19 21:35:26,496 INFO [train_asr.py:1262] (1/4) Epoch 11, batch 1950, loss[loss=0.09052, simple_loss=0.114, pruned_loss=0.02432, audio_tagging_loss=0.009194, over 15183.00 frames. ], tot_loss[loss=0.08346, simple_loss=0.1031, pruned_loss=0.0216, audio_tagging_loss=0.01032, over 3038250.03 frames. ], batch size: 60, lr: 6.40e-03, grad_scale: 16.0 2023-11-19 21:35:48,051 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 122200 2023-11-19 21:36:19,939 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=814806.6666666666, ans=0.0 2023-11-19 21:36:27,919 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=814806.6666666666, ans=0.125 2023-11-19 21:36:31,166 INFO [train_asr.py:1262] (1/4) Epoch 11, batch 2000, loss[loss=0.1041, simple_loss=0.1301, pruned_loss=0.03103, audio_tagging_loss=0.007986, over 16121.00 frames. ], tot_loss[loss=0.08332, simple_loss=0.1028, pruned_loss=0.0216, audio_tagging_loss=0.01032, over 3037887.38 frames. ], batch size: 56, lr: 6.40e-03, grad_scale: 32.0 2023-11-19 21:36:40,596 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-19 21:36:49,640 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.135e+01 8.499e+01 9.542e+01 1.090e+02 1.717e+02, threshold=1.908e+02, percent-clipped=0.0 2023-11-19 21:36:53,401 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 122250 2023-11-19 21:37:02,803 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=6.77 vs. limit=10.0 2023-11-19 21:37:16,185 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=815073.3333333334, ans=0.125 2023-11-19 21:37:22,655 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.min_positive, batch_count=815140.0, ans=0.025 2023-11-19 21:37:30,414 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=815140.0, ans=0.0 2023-11-19 21:37:30,835 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.47 vs. 
limit=6.0 2023-11-19 21:37:36,786 INFO [train_asr.py:1262] (1/4) Epoch 11, batch 2050, loss[loss=0.09083, simple_loss=0.1136, pruned_loss=0.02272, audio_tagging_loss=0.01129, over 15131.00 frames. ], tot_loss[loss=0.08297, simple_loss=0.1022, pruned_loss=0.02151, audio_tagging_loss=0.01036, over 3038859.50 frames. ], batch size: 58, lr: 6.40e-03, grad_scale: 32.0 2023-11-19 21:37:37,156 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=815206.6666666666, ans=0.125 2023-11-19 21:37:41,028 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=815206.6666666666, ans=0.125 2023-11-19 21:37:41,362 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=8.77 vs. limit=15.0 2023-11-19 21:37:42,678 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=9.48 vs. limit=10.0 2023-11-19 21:37:58,502 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 122300 2023-11-19 21:38:02,780 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=7.71 vs. limit=12.0 2023-11-19 21:38:03,439 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=815340.0, ans=0.125 2023-11-19 21:38:05,307 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=815340.0, ans=0.2 2023-11-19 21:38:08,169 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=6.10 vs. limit=15.0 2023-11-19 21:38:19,549 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=815406.6666666666, ans=0.125 2023-11-19 21:38:23,790 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=815406.6666666666, ans=0.125 2023-11-19 21:38:26,640 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=14.65 vs. limit=15.0 2023-11-19 21:38:26,853 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.90 vs. limit=6.0 2023-11-19 21:38:30,508 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=9.03 vs. limit=15.0 2023-11-19 21:38:36,173 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-19 21:38:38,998 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=9.69 vs. 
limit=15.0 2023-11-19 21:38:39,894 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=815540.0, ans=0.1 2023-11-19 21:38:40,308 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn1.whiten.whitening_limit, batch_count=815540.0, ans=22.5 2023-11-19 21:38:40,859 INFO [train_asr.py:1262] (1/4) Epoch 11, batch 2100, loss[loss=0.09113, simple_loss=0.1086, pruned_loss=0.02689, audio_tagging_loss=0.00992, over 16125.00 frames. ], tot_loss[loss=0.08258, simple_loss=0.1017, pruned_loss=0.02139, audio_tagging_loss=0.01032, over 3038392.63 frames. ], batch size: 61, lr: 6.40e-03, grad_scale: 32.0 2023-11-19 21:38:59,054 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.154e+01 8.161e+01 9.375e+01 1.016e+02 1.346e+02, threshold=1.875e+02, percent-clipped=0.0 2023-11-19 21:39:02,937 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 122350 2023-11-19 21:39:04,382 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=815606.6666666666, ans=0.0 2023-11-19 21:39:07,497 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=815673.3333333334, ans=0.125 2023-11-19 21:39:11,832 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=815673.3333333334, ans=0.125 2023-11-19 21:39:15,832 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.62 vs. limit=6.0 2023-11-19 21:39:19,014 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=815740.0, ans=0.2 2023-11-19 21:39:21,599 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=815740.0, ans=0.125 2023-11-19 21:39:37,000 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=815806.6666666666, ans=0.125 2023-11-19 21:39:45,696 INFO [train_asr.py:1262] (1/4) Epoch 11, batch 2150, loss[loss=0.07948, simple_loss=0.09273, pruned_loss=0.02245, audio_tagging_loss=0.01067, over 15353.00 frames. ], tot_loss[loss=0.08255, simple_loss=0.1018, pruned_loss=0.02137, audio_tagging_loss=0.01028, over 3040012.43 frames. ], batch size: 58, lr: 6.40e-03, grad_scale: 32.0 2023-11-19 21:40:08,132 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 122400 2023-11-19 21:40:09,732 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=12.29 vs. limit=22.5 2023-11-19 21:40:09,803 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=6.57 vs. limit=12.0 2023-11-19 21:40:24,497 WARNING [train_asr.py:1506] (1/4) Exclude cut with ID unbalanced/XkQ8YVd8u38_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. 
Number of tokens: 24 2023-11-19 21:40:47,536 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=816140.0, ans=0.125 2023-11-19 21:40:51,633 INFO [train_asr.py:1262] (1/4) Epoch 11, batch 2200, loss[loss=0.08065, simple_loss=0.1024, pruned_loss=0.02081, audio_tagging_loss=0.008629, over 15262.00 frames. ], tot_loss[loss=0.08345, simple_loss=0.1029, pruned_loss=0.02169, audio_tagging_loss=0.01029, over 3043884.15 frames. ], batch size: 56, lr: 6.40e-03, grad_scale: 32.0 2023-11-19 21:40:58,179 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=816206.6666666666, ans=0.125 2023-11-19 21:41:08,856 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.683e+01 8.220e+01 9.086e+01 1.022e+02 1.678e+02, threshold=1.817e+02, percent-clipped=0.0 2023-11-19 21:41:10,362 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=816273.3333333334, ans=0.0 2023-11-19 21:41:12,636 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 122450 2023-11-19 21:41:16,541 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=816340.0, ans=0.2 2023-11-19 21:41:16,770 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.70 vs. limit=6.0 2023-11-19 21:41:39,659 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=816406.6666666666, ans=0.0 2023-11-19 21:41:40,974 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=816406.6666666666, ans=0.09899494936611666 2023-11-19 21:41:48,624 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=816473.3333333334, ans=0.125 2023-11-19 21:41:55,714 INFO [train_asr.py:1262] (1/4) Epoch 11, batch 2250, loss[loss=0.09258, simple_loss=0.1233, pruned_loss=0.02319, audio_tagging_loss=0.007741, over 15397.00 frames. ], tot_loss[loss=0.08393, simple_loss=0.1035, pruned_loss=0.02185, audio_tagging_loss=0.01032, over 3039744.39 frames. ], batch size: 56, lr: 6.39e-03, grad_scale: 32.0 2023-11-19 21:42:17,983 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 122500 2023-11-19 21:42:47,236 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.self_attn1.whiten, num_groups=1, num_channels=192, metric=16.78 vs. limit=22.5 2023-11-19 21:42:47,766 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=816806.6666666666, ans=0.2 2023-11-19 21:42:56,629 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=816806.6666666666, ans=0.1 2023-11-19 21:43:00,642 INFO [train_asr.py:1262] (1/4) Epoch 11, batch 2300, loss[loss=0.08158, simple_loss=0.1065, pruned_loss=0.01929, audio_tagging_loss=0.009044, over 15316.00 frames. ], tot_loss[loss=0.08376, simple_loss=0.1036, pruned_loss=0.02168, audio_tagging_loss=0.01029, over 3049346.45 frames. 
], batch size: 56, lr: 6.39e-03, grad_scale: 32.0 2023-11-19 21:43:04,551 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=20.22 vs. limit=22.5 2023-11-19 21:43:09,198 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=816873.3333333334, ans=0.04949747468305833 2023-11-19 21:43:19,727 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=8.33 vs. limit=15.0 2023-11-19 21:43:19,984 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.000e+01 8.330e+01 8.975e+01 9.637e+01 1.370e+02, threshold=1.795e+02, percent-clipped=0.0 2023-11-19 21:43:23,786 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 122550 2023-11-19 21:43:38,209 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.conv_module1.whiten, num_groups=1, num_channels=192, metric=14.52 vs. limit=15.0 2023-11-19 21:43:38,857 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=817073.3333333334, ans=0.0 2023-11-19 21:43:47,789 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=817073.3333333334, ans=0.2 2023-11-19 21:43:58,423 WARNING [train_asr.py:1506] (1/4) Exclude cut with ID unbalanced/mx9RcUz8sr0_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-19 21:44:01,023 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=817140.0, ans=0.0 2023-11-19 21:44:07,009 INFO [train_asr.py:1262] (1/4) Epoch 11, batch 2350, loss[loss=0.0888, simple_loss=0.1082, pruned_loss=0.02406, audio_tagging_loss=0.01066, over 15130.00 frames. ], tot_loss[loss=0.0839, simple_loss=0.1034, pruned_loss=0.0218, audio_tagging_loss=0.01042, over 3054417.46 frames. 
], batch size: 57, lr: 6.39e-03, grad_scale: 16.0 2023-11-19 21:44:12,250 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=817206.6666666666, ans=0.2 2023-11-19 21:44:19,763 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=817273.3333333334, ans=0.1 2023-11-19 21:44:28,148 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 122600 2023-11-19 21:44:32,254 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=817340.0, ans=0.125 2023-11-19 21:44:35,962 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=817340.0, ans=0.125 2023-11-19 21:44:40,212 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=817340.0, ans=0.0 2023-11-19 21:44:41,573 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=817340.0, ans=0.125 2023-11-19 21:44:57,160 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=11.70 vs. limit=15.0 2023-11-19 21:45:02,045 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.whiten, num_groups=1, num_channels=512, metric=9.44 vs. limit=12.0 2023-11-19 21:45:11,202 INFO [train_asr.py:1262] (1/4) Epoch 11, batch 2400, loss[loss=0.09539, simple_loss=0.1152, pruned_loss=0.02517, audio_tagging_loss=0.01259, over 15267.00 frames. ], tot_loss[loss=0.08447, simple_loss=0.104, pruned_loss=0.02203, audio_tagging_loss=0.01041, over 3055867.56 frames. ], batch size: 59, lr: 6.39e-03, grad_scale: 32.0 2023-11-19 21:45:12,753 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=817540.0, ans=0.125 2023-11-19 21:45:26,312 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=817606.6666666666, ans=0.2 2023-11-19 21:45:26,817 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=8.27 vs. limit=15.0 2023-11-19 21:45:30,244 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.655e+01 8.060e+01 8.978e+01 9.886e+01 1.686e+02, threshold=1.796e+02, percent-clipped=0.0 2023-11-19 21:45:32,809 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 122650 2023-11-19 21:45:36,296 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=817673.3333333334, ans=0.125 2023-11-19 21:46:07,063 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=817806.6666666666, ans=0.125 2023-11-19 21:46:14,550 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=817873.3333333334, ans=0.125 2023-11-19 21:46:15,441 INFO [train_asr.py:1262] (1/4) Epoch 11, batch 2450, loss[loss=0.09994, simple_loss=0.1225, pruned_loss=0.028, audio_tagging_loss=0.01068, over 15363.00 frames. ], tot_loss[loss=0.08463, simple_loss=0.1042, pruned_loss=0.02207, audio_tagging_loss=0.01046, over 3047194.04 frames. 
], batch size: 56, lr: 6.39e-03, grad_scale: 32.0 2023-11-19 21:46:32,496 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=8.71 vs. limit=10.0 2023-11-19 21:46:38,548 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 122700 2023-11-19 21:46:44,797 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=818006.6666666666, ans=0.125 2023-11-19 21:46:55,144 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.whiten, num_groups=1, num_channels=512, metric=9.00 vs. limit=12.0 2023-11-19 21:46:56,135 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=818073.3333333334, ans=0.0 2023-11-19 21:47:10,620 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=13.84 vs. limit=15.0 2023-11-19 21:47:21,086 INFO [train_asr.py:1262] (1/4) Epoch 11, batch 2500, loss[loss=0.07402, simple_loss=0.08484, pruned_loss=0.0174, audio_tagging_loss=0.0142, over 16062.00 frames. ], tot_loss[loss=0.08492, simple_loss=0.1047, pruned_loss=0.02217, audio_tagging_loss=0.01042, over 3048764.64 frames. ], batch size: 62, lr: 6.39e-03, grad_scale: 32.0 2023-11-19 21:47:36,716 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=818273.3333333334, ans=0.2 2023-11-19 21:47:40,051 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.459e+01 8.244e+01 8.954e+01 9.755e+01 1.221e+02, threshold=1.791e+02, percent-clipped=0.0 2023-11-19 21:47:42,574 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 122750 2023-11-19 21:48:10,632 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=818406.6666666666, ans=0.0 2023-11-19 21:48:10,913 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=11.67 vs. limit=15.0 2023-11-19 21:48:21,181 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=818473.3333333334, ans=0.125 2023-11-19 21:48:25,703 INFO [train_asr.py:1262] (1/4) Epoch 11, batch 2550, loss[loss=0.0785, simple_loss=0.09398, pruned_loss=0.02281, audio_tagging_loss=0.008701, over 14436.00 frames. ], tot_loss[loss=0.08475, simple_loss=0.1042, pruned_loss=0.02222, audio_tagging_loss=0.01043, over 3050566.57 frames. ], batch size: 56, lr: 6.39e-03, grad_scale: 32.0 2023-11-19 21:48:31,247 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=512, metric=20.24 vs. limit=22.5 2023-11-19 21:48:46,682 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.11 vs. 
limit=15.0 2023-11-19 21:48:47,235 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 122800 2023-11-19 21:48:59,699 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=818673.3333333334, ans=0.0 2023-11-19 21:49:01,580 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=818673.3333333334, ans=0.07 2023-11-19 21:49:09,392 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=818740.0, ans=0.125 2023-11-19 21:49:27,996 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=818806.6666666666, ans=0.125 2023-11-19 21:49:30,321 INFO [train_asr.py:1262] (1/4) Epoch 11, batch 2600, loss[loss=0.0663, simple_loss=0.07792, pruned_loss=0.01673, audio_tagging_loss=0.01061, over 14647.00 frames. ], tot_loss[loss=0.08417, simple_loss=0.1036, pruned_loss=0.02202, audio_tagging_loss=0.01036, over 3049050.75 frames. ], batch size: 54, lr: 6.39e-03, grad_scale: 16.0 2023-11-19 21:49:32,986 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=818873.3333333334, ans=0.125 2023-11-19 21:49:33,027 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=818873.3333333334, ans=0.125 2023-11-19 21:49:44,102 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=14.05 vs. limit=22.5 2023-11-19 21:49:52,163 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.938e+01 8.517e+01 9.235e+01 1.002e+02 1.405e+02, threshold=1.847e+02, percent-clipped=0.0 2023-11-19 21:49:53,589 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 122850 2023-11-19 21:49:54,381 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=17.36 vs. limit=22.5 2023-11-19 21:49:55,139 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=818940.0, ans=0.125 2023-11-19 21:50:01,569 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=11.24 vs. limit=15.0 2023-11-19 21:50:19,879 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=819073.3333333334, ans=0.125 2023-11-19 21:50:35,314 INFO [train_asr.py:1262] (1/4) Epoch 11, batch 2650, loss[loss=0.1002, simple_loss=0.1239, pruned_loss=0.02712, audio_tagging_loss=0.01115, over 14991.00 frames. ], tot_loss[loss=0.08451, simple_loss=0.1042, pruned_loss=0.02211, audio_tagging_loss=0.01029, over 3050612.22 frames. 
], batch size: 57, lr: 6.38e-03, grad_scale: 16.0 2023-11-19 21:50:48,355 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=819273.3333333334, ans=0.125 2023-11-19 21:50:52,153 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=819273.3333333334, ans=0.125 2023-11-19 21:50:58,165 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 122900 2023-11-19 21:51:22,697 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=12.27 vs. limit=15.0 2023-11-19 21:51:32,329 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=819473.3333333334, ans=0.0 2023-11-19 21:51:41,531 INFO [train_asr.py:1262] (1/4) Epoch 11, batch 2700, loss[loss=0.08738, simple_loss=0.1095, pruned_loss=0.02429, audio_tagging_loss=0.008334, over 16521.00 frames. ], tot_loss[loss=0.08501, simple_loss=0.105, pruned_loss=0.02229, audio_tagging_loss=0.0102, over 3056361.86 frames. ], batch size: 62, lr: 6.38e-03, grad_scale: 16.0 2023-11-19 21:51:41,791 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=819540.0, ans=0.1 2023-11-19 21:52:01,455 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.391e+01 8.510e+01 9.186e+01 1.041e+02 2.301e+02, threshold=1.837e+02, percent-clipped=1.0 2023-11-19 21:52:03,509 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 122950 2023-11-19 21:52:16,258 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=819673.3333333334, ans=0.1 2023-11-19 21:52:18,229 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=819673.3333333334, ans=0.2 2023-11-19 21:52:42,895 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=819806.6666666666, ans=0.125 2023-11-19 21:52:46,246 INFO [train_asr.py:1262] (1/4) Epoch 11, batch 2750, loss[loss=0.08988, simple_loss=0.1104, pruned_loss=0.02478, audio_tagging_loss=0.009909, over 15178.00 frames. ], tot_loss[loss=0.08551, simple_loss=0.1059, pruned_loss=0.02243, audio_tagging_loss=0.01011, over 3056331.54 frames. ], batch size: 55, lr: 6.38e-03, grad_scale: 16.0 2023-11-19 21:53:00,676 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=819940.0, ans=0.0 2023-11-19 21:53:08,472 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 123000 2023-11-19 21:53:20,251 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=18.74 vs. 
limit=22.5 2023-11-19 21:53:26,308 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=820073.3333333334, ans=0.125 2023-11-19 21:53:32,448 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=820073.3333333334, ans=0.1 2023-11-19 21:53:33,527 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=820073.3333333334, ans=0.125 2023-11-19 21:53:34,891 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=820073.3333333334, ans=0.0 2023-11-19 21:53:38,652 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-11-19 21:53:40,860 WARNING [train_asr.py:1506] (1/4) Exclude cut with ID unbalanced/IMdT8_tuNp0_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-19 21:53:51,406 INFO [train_asr.py:1262] (1/4) Epoch 11, batch 2800, loss[loss=0.06527, simple_loss=0.07378, pruned_loss=0.01568, audio_tagging_loss=0.0127, over 16779.00 frames. ], tot_loss[loss=0.08517, simple_loss=0.1054, pruned_loss=0.02236, audio_tagging_loss=0.01012, over 3055687.64 frames. ], batch size: 64, lr: 6.38e-03, grad_scale: 16.0 2023-11-19 21:54:05,247 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=820273.3333333334, ans=0.125 2023-11-19 21:54:14,036 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.993e+01 8.139e+01 8.815e+01 9.780e+01 1.679e+02, threshold=1.763e+02, percent-clipped=0.0 2023-11-19 21:54:14,179 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 123050 2023-11-19 21:54:14,459 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=820273.3333333334, ans=0.0 2023-11-19 21:54:47,798 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=820473.3333333334, ans=0.125 2023-11-19 21:54:47,895 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=820473.3333333334, ans=0.125 2023-11-19 21:54:55,817 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=820540.0, ans=0.125 2023-11-19 21:54:56,820 INFO [train_asr.py:1262] (1/4) Epoch 11, batch 2850, loss[loss=0.08047, simple_loss=0.09638, pruned_loss=0.02279, audio_tagging_loss=0.009488, over 15707.00 frames. ], tot_loss[loss=0.08458, simple_loss=0.1048, pruned_loss=0.02216, audio_tagging_loss=0.01002, over 3046462.99 frames. 
], batch size: 61, lr: 6.38e-03, grad_scale: 16.0 2023-11-19 21:55:03,926 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=820540.0, ans=0.0 2023-11-19 21:55:08,816 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=820606.6666666666, ans=0.125 2023-11-19 21:55:18,660 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 123100 2023-11-19 21:55:23,488 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=512, metric=4.12 vs. limit=15.0 2023-11-19 21:55:43,082 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=820740.0, ans=0.1 2023-11-19 21:56:02,091 INFO [train_asr.py:1262] (1/4) Epoch 11, batch 2900, loss[loss=0.06711, simple_loss=0.07789, pruned_loss=0.01896, audio_tagging_loss=0.009207, over 15673.00 frames. ], tot_loss[loss=0.08442, simple_loss=0.1046, pruned_loss=0.02207, audio_tagging_loss=0.01006, over 3051029.20 frames. ], batch size: 59, lr: 6.38e-03, grad_scale: 16.0 2023-11-19 21:56:03,698 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=820873.3333333334, ans=0.125 2023-11-19 21:56:13,096 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=820873.3333333334, ans=0.0 2023-11-19 21:56:23,810 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.958e+01 8.184e+01 8.854e+01 9.488e+01 1.292e+02, threshold=1.771e+02, percent-clipped=0.0 2023-11-19 21:56:23,953 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 123150 2023-11-19 21:56:25,941 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.min_positive, batch_count=820940.0, ans=0.05 2023-11-19 21:56:27,246 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=821006.6666666666, ans=0.1 2023-11-19 21:56:36,345 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=6.08 vs. limit=10.0 2023-11-19 21:56:56,768 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=10.70 vs. limit=15.0 2023-11-19 21:57:05,634 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=821206.6666666666, ans=0.2 2023-11-19 21:57:06,584 INFO [train_asr.py:1262] (1/4) Epoch 11, batch 2950, loss[loss=0.07781, simple_loss=0.09608, pruned_loss=0.02297, audio_tagging_loss=0.006795, over 14594.00 frames. ], tot_loss[loss=0.08538, simple_loss=0.1057, pruned_loss=0.02235, audio_tagging_loss=0.01019, over 3056454.45 frames. 
], batch size: 55, lr: 6.38e-03, grad_scale: 16.0 2023-11-19 21:57:20,633 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=821273.3333333334, ans=0.125 2023-11-19 21:57:24,068 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=821273.3333333334, ans=0.04949747468305833 2023-11-19 21:57:27,935 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.min_positive, batch_count=821273.3333333334, ans=0.05 2023-11-19 21:57:28,825 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 123200 2023-11-19 21:57:41,086 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=821340.0, ans=0.125 2023-11-19 21:57:41,145 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=821340.0, ans=0.125 2023-11-19 21:57:51,678 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=821406.6666666666, ans=0.04949747468305833 2023-11-19 21:57:55,055 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=821406.6666666666, ans=0.125 2023-11-19 21:57:55,108 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=821406.6666666666, ans=0.1 2023-11-19 21:58:04,508 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=821473.3333333334, ans=0.125 2023-11-19 21:58:04,566 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=821473.3333333334, ans=0.125 2023-11-19 21:58:05,938 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=9.06 vs. limit=15.0 2023-11-19 21:58:11,557 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=512, metric=21.64 vs. limit=22.5 2023-11-19 21:58:12,084 INFO [train_asr.py:1262] (1/4) Epoch 11, batch 3000, loss[loss=0.08335, simple_loss=0.1017, pruned_loss=0.02306, audio_tagging_loss=0.009454, over 15291.00 frames. ], tot_loss[loss=0.08549, simple_loss=0.1056, pruned_loss=0.02246, audio_tagging_loss=0.01023, over 3045124.48 frames. ], batch size: 56, lr: 6.37e-03, grad_scale: 16.0 2023-11-19 21:58:12,085 INFO [train_asr.py:1285] (1/4) Computing validation loss 2023-11-19 21:58:38,387 INFO [zipformer.py:1873] (1/4) name=encoder.encoders.1.encoder.layers.0.self_attn_weights, attn_weights_entropy = tensor([5.3553, 5.0018, 4.7776, 5.1854], device='cuda:1') 2023-11-19 21:58:52,210 INFO [train_asr.py:1294] (1/4) Epoch 11, validation: loss=0.06441, simple_loss=0.05497, pruned_loss=0.006219, audio_tagging_loss=0.03071, over 4681554.00 frames. 
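The tot_loss entries in this log are consistent with a weighted sum of the logged components: each total equals 0.5 * simple_loss + pruned_loss + audio_tagging_loss (matching the run's simple_loss_scale of 0.5 and audio_tagging_loss_scale of 1.0, with the CTC term disabled). A minimal Python sketch of that decomposition follows; it is a reconstruction from the logged numbers, not the train_asr.py code itself:

    def combined_loss(simple_loss: float,
                      pruned_loss: float,
                      audio_tagging_loss: float,
                      simple_loss_scale: float = 0.5,
                      audio_tagging_loss_scale: float = 1.0) -> float:
        # Weighted sum matching the tot_loss entries in this log; the CTC
        # term is omitted because CTC is disabled in this run.
        return (simple_loss_scale * simple_loss
                + pruned_loss
                + audio_tagging_loss_scale * audio_tagging_loss)

    # "Epoch 11, batch 3000" tot_loss above reproduces exactly:
    # 0.5 * 0.1056 + 0.02246 + 0.01023 = 0.08549
    assert abs(combined_loss(0.1056, 0.02246, 0.01023) - 0.08549) < 1e-6

Similarly, the "Exclude cut" warnings are consistent with a minimum-length check applied after the 4x subsampling: a cut is dropped when it ends up with fewer encoder frames than BPE tokens to align, and the logged dummy AudioSet cuts (100 input frames -> 23 subsampled frames vs. 24 tokens) fail exactly that test. A hedged sketch of the check, assuming the ((T - 7) // 2 + 1) // 2 output-length formula for the convolutional front-end (an assumption; the exact expression lives in train_asr.py):

    def is_cut_usable(num_frames: int, num_tokens: int) -> bool:
        # Approximate encoder output length at subsampling factor 4;
        # 100 input frames -> 23, matching the warnings in this log.
        t = ((num_frames - 7) // 2 + 1) // 2
        # A transducer loss cannot align more tokens than output frames.
        return t >= num_tokens

    assert is_cut_usable(100, 24) is False  # the excluded 1-second dummy cuts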
2023-11-19 21:58:52,211 INFO [train_asr.py:1295] (1/4) Maximum memory allocated so far is 26082MB 2023-11-19 21:58:52,464 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=821540.0, ans=0.1 2023-11-19 21:58:58,075 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=821540.0, ans=0.04949747468305833 2023-11-19 21:59:00,326 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.min_abs, batch_count=821540.0, ans=0.5 2023-11-19 21:59:14,429 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.751e+01 8.445e+01 9.049e+01 1.018e+02 1.456e+02, threshold=1.810e+02, percent-clipped=0.0 2023-11-19 21:59:14,593 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 123250 2023-11-19 21:59:18,331 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=821673.3333333334, ans=0.2 2023-11-19 21:59:38,404 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=821740.0, ans=0.0 2023-11-19 21:59:41,942 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=821806.6666666666, ans=0.125 2023-11-19 21:59:52,913 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.conv_module1.whiten, num_groups=1, num_channels=192, metric=13.82 vs. limit=15.0 2023-11-19 21:59:55,785 INFO [train_asr.py:1262] (1/4) Epoch 11, batch 3050, loss[loss=0.1133, simple_loss=0.1533, pruned_loss=0.02907, audio_tagging_loss=0.007543, over 16194.00 frames. ], tot_loss[loss=0.08616, simple_loss=0.1064, pruned_loss=0.02268, audio_tagging_loss=0.01026, over 3042376.64 frames. ], batch size: 59, lr: 6.37e-03, grad_scale: 16.0 2023-11-19 22:00:13,155 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=821940.0, ans=0.125 2023-11-19 22:00:16,860 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=821940.0, ans=0.125 2023-11-19 22:00:18,120 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 123300 2023-11-19 22:00:21,897 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=822006.6666666666, ans=0.2 2023-11-19 22:00:23,357 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=822006.6666666666, ans=0.125 2023-11-19 22:00:33,955 WARNING [train_asr.py:1506] (1/4) Exclude cut with ID unbalanced/h0neUGB6j_g_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-19 22:00:37,966 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=822073.3333333334, ans=0.125 2023-11-19 22:01:01,018 INFO [train_asr.py:1262] (1/4) Epoch 11, batch 3100, loss[loss=0.08112, simple_loss=0.1083, pruned_loss=0.01996, audio_tagging_loss=0.007014, over 14368.00 frames. 
], tot_loss[loss=0.08584, simple_loss=0.1061, pruned_loss=0.02251, audio_tagging_loss=0.01029, over 3042916.81 frames. ], batch size: 55, lr: 6.37e-03, grad_scale: 16.0 2023-11-19 22:01:07,363 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=822206.6666666666, ans=0.1 2023-11-19 22:01:22,954 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.275e+01 8.148e+01 8.871e+01 9.460e+01 1.235e+02, threshold=1.774e+02, percent-clipped=0.0 2023-11-19 22:01:23,105 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 123350 2023-11-19 22:01:23,380 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=822273.3333333334, ans=0.1 2023-11-19 22:01:39,875 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=822406.6666666666, ans=0.125 2023-11-19 22:01:43,009 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=822406.6666666666, ans=0.125 2023-11-19 22:02:01,477 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=822473.3333333334, ans=0.0 2023-11-19 22:02:02,843 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=822473.3333333334, ans=0.0 2023-11-19 22:02:05,582 INFO [train_asr.py:1262] (1/4) Epoch 11, batch 3150, loss[loss=0.1017, simple_loss=0.1301, pruned_loss=0.02572, audio_tagging_loss=0.01094, over 14850.00 frames. ], tot_loss[loss=0.08541, simple_loss=0.1055, pruned_loss=0.02231, audio_tagging_loss=0.01035, over 3040115.85 frames. ], batch size: 54, lr: 6.37e-03, grad_scale: 16.0 2023-11-19 22:02:12,641 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=822540.0, ans=0.125 2023-11-19 22:02:16,397 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=822540.0, ans=0.2 2023-11-19 22:02:27,744 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 123400 2023-11-19 22:02:33,525 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=822673.3333333334, ans=0.0 2023-11-19 22:02:44,959 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=822740.0, ans=0.125 2023-11-19 22:03:08,167 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=822806.6666666666, ans=0.125 2023-11-19 22:03:10,309 INFO [train_asr.py:1262] (1/4) Epoch 11, batch 3200, loss[loss=0.08938, simple_loss=0.1156, pruned_loss=0.02192, audio_tagging_loss=0.009664, over 15718.00 frames. ], tot_loss[loss=0.08537, simple_loss=0.1051, pruned_loss=0.02217, audio_tagging_loss=0.01064, over 3044420.89 frames. ], batch size: 56, lr: 6.37e-03, grad_scale: 32.0 2023-11-19 22:03:13,270 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=13.65 vs. 
limit=15.0 2023-11-19 22:03:20,230 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=822873.3333333334, ans=0.125 2023-11-19 22:03:20,450 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=822873.3333333334, ans=0.125 2023-11-19 22:03:26,790 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=822940.0, ans=0.125 2023-11-19 22:03:32,362 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.734e+01 8.258e+01 8.832e+01 9.801e+01 1.591e+02, threshold=1.766e+02, percent-clipped=0.0 2023-11-19 22:03:32,500 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 123450 2023-11-19 22:03:32,633 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=822940.0, ans=0.2 2023-11-19 22:03:32,729 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.min_positive, batch_count=822940.0, ans=0.025 2023-11-19 22:03:55,193 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=823073.3333333334, ans=0.0 2023-11-19 22:04:15,944 INFO [train_asr.py:1262] (1/4) Epoch 11, batch 3250, loss[loss=0.08688, simple_loss=0.1011, pruned_loss=0.02196, audio_tagging_loss=0.01438, over 15238.00 frames. ], tot_loss[loss=0.08442, simple_loss=0.1039, pruned_loss=0.02179, audio_tagging_loss=0.01068, over 3047782.50 frames. ], batch size: 56, lr: 6.37e-03, grad_scale: 32.0 2023-11-19 22:04:17,465 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=823206.6666666666, ans=0.125 2023-11-19 22:04:23,641 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=823206.6666666666, ans=0.125 2023-11-19 22:04:32,122 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=823273.3333333334, ans=0.2 2023-11-19 22:04:36,927 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 123500 2023-11-19 22:05:02,000 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=823406.6666666666, ans=0.09899494936611666 2023-11-19 22:05:05,630 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=823473.3333333334, ans=0.125 2023-11-19 22:05:09,410 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=823473.3333333334, ans=0.2 2023-11-19 22:05:18,908 INFO [train_asr.py:1262] (1/4) Epoch 11, batch 3300, loss[loss=0.09033, simple_loss=0.1175, pruned_loss=0.02304, audio_tagging_loss=0.00852, over 15106.00 frames. ], tot_loss[loss=0.08503, simple_loss=0.1045, pruned_loss=0.02203, audio_tagging_loss=0.01073, over 3053199.36 frames. 
], batch size: 54, lr: 6.37e-03, grad_scale: 32.0 2023-11-19 22:05:40,808 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.210e+01 8.419e+01 8.972e+01 9.610e+01 1.838e+02, threshold=1.794e+02, percent-clipped=1.0 2023-11-19 22:05:40,948 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 123550 2023-11-19 22:05:43,062 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=823606.6666666666, ans=0.125 2023-11-19 22:05:45,502 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=823673.3333333334, ans=0.125 2023-11-19 22:05:59,719 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=823740.0, ans=0.1 2023-11-19 22:06:04,475 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.min_positive, batch_count=823740.0, ans=0.05 2023-11-19 22:06:08,257 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-19 22:06:18,866 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=823806.6666666666, ans=0.1 2023-11-19 22:06:23,920 INFO [train_asr.py:1262] (1/4) Epoch 11, batch 3350, loss[loss=0.07746, simple_loss=0.09515, pruned_loss=0.01913, audio_tagging_loss=0.01075, over 15595.00 frames. ], tot_loss[loss=0.08439, simple_loss=0.1038, pruned_loss=0.02186, audio_tagging_loss=0.01063, over 3054297.27 frames. ], batch size: 59, lr: 6.37e-03, grad_scale: 32.0 2023-11-19 22:06:46,360 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 123600 2023-11-19 22:06:55,723 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=824006.6666666666, ans=0.0 2023-11-19 22:07:14,238 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=824073.3333333334, ans=0.125 2023-11-19 22:07:29,856 INFO [train_asr.py:1262] (1/4) Epoch 11, batch 3400, loss[loss=0.1008, simple_loss=0.1318, pruned_loss=0.02578, audio_tagging_loss=0.009106, over 15783.00 frames. ], tot_loss[loss=0.08471, simple_loss=0.1048, pruned_loss=0.02193, audio_tagging_loss=0.01039, over 3058850.37 frames. ], batch size: 56, lr: 6.36e-03, grad_scale: 32.0 2023-11-19 22:07:34,254 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=6.28 vs. limit=10.0 2023-11-19 22:07:35,564 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=22.33 vs. 
limit=22.5 2023-11-19 22:07:40,019 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=824206.6666666666, ans=0.1 2023-11-19 22:07:47,552 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=824273.3333333334, ans=0.0 2023-11-19 22:07:50,883 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.806e+01 8.550e+01 9.235e+01 1.051e+02 1.197e+02, threshold=1.847e+02, percent-clipped=0.0 2023-11-19 22:07:51,025 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 123650 2023-11-19 22:08:06,161 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=824340.0, ans=0.2 2023-11-19 22:08:33,941 INFO [train_asr.py:1262] (1/4) Epoch 11, batch 3450, loss[loss=0.08786, simple_loss=0.1021, pruned_loss=0.02426, audio_tagging_loss=0.01256, over 15532.00 frames. ], tot_loss[loss=0.08507, simple_loss=0.1051, pruned_loss=0.0222, audio_tagging_loss=0.01034, over 3058710.15 frames. ], batch size: 59, lr: 6.36e-03, grad_scale: 32.0 2023-11-19 22:08:38,024 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=824540.0, ans=0.125 2023-11-19 22:08:42,109 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=9.24 vs. limit=10.0 2023-11-19 22:08:55,317 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=824606.6666666666, ans=0.0 2023-11-19 22:08:56,252 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 123700 2023-11-19 22:09:06,415 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=824673.3333333334, ans=0.0 2023-11-19 22:09:11,208 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=824673.3333333334, ans=0.125 2023-11-19 22:09:11,505 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=12.81 vs. limit=15.0 2023-11-19 22:09:14,677 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=824740.0, ans=0.125 2023-11-19 22:09:26,497 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=4.77 vs. limit=12.0 2023-11-19 22:09:38,739 INFO [train_asr.py:1262] (1/4) Epoch 11, batch 3500, loss[loss=0.0553, simple_loss=0.05727, pruned_loss=0.01255, audio_tagging_loss=0.01411, over 14730.00 frames. ], tot_loss[loss=0.08505, simple_loss=0.1051, pruned_loss=0.02221, audio_tagging_loss=0.01029, over 3048360.06 frames. 
], batch size: 57, lr: 6.36e-03, grad_scale: 32.0 2023-11-19 22:09:51,354 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=824940.0, ans=0.125 2023-11-19 22:09:52,413 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=824940.0, ans=0.125 2023-11-19 22:09:52,555 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=824940.0, ans=0.125 2023-11-19 22:10:01,197 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.383e+01 8.617e+01 9.360e+01 1.039e+02 1.365e+02, threshold=1.872e+02, percent-clipped=0.0 2023-11-19 22:10:01,341 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 123750 2023-11-19 22:10:12,142 WARNING [train_asr.py:1506] (1/4) Exclude cut with ID unbalanced/DdDpuDqOyrA_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-19 22:10:16,192 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=825073.3333333334, ans=0.1 2023-11-19 22:10:23,313 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=8.49 vs. limit=10.0 2023-11-19 22:10:31,231 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=825140.0, ans=0.0 2023-11-19 22:10:43,981 INFO [train_asr.py:1262] (1/4) Epoch 11, batch 3550, loss[loss=0.07262, simple_loss=0.08576, pruned_loss=0.01845, audio_tagging_loss=0.01129, over 14739.00 frames. ], tot_loss[loss=0.08432, simple_loss=0.1043, pruned_loss=0.02196, audio_tagging_loss=0.0102, over 3051354.94 frames. ], batch size: 57, lr: 6.36e-03, grad_scale: 32.0 2023-11-19 22:10:50,527 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=825206.6666666666, ans=0.125 2023-11-19 22:10:51,630 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=825206.6666666666, ans=0.125 2023-11-19 22:10:59,473 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=10.81 vs. limit=15.0 2023-11-19 22:11:04,767 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 123800 2023-11-19 22:11:16,765 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=825340.0, ans=0.125 2023-11-19 22:11:16,816 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=825340.0, ans=0.2 2023-11-19 22:11:31,734 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=6.59 vs. 
limit=15.0 2023-11-19 22:11:41,396 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=825473.3333333334, ans=0.125 2023-11-19 22:11:43,893 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=825473.3333333334, ans=0.125 2023-11-19 22:11:47,376 INFO [train_asr.py:1262] (1/4) Epoch 11, batch 3600, loss[loss=0.07299, simple_loss=0.09257, pruned_loss=0.01807, audio_tagging_loss=0.008639, over 14688.00 frames. ], tot_loss[loss=0.0834, simple_loss=0.103, pruned_loss=0.02176, audio_tagging_loss=0.01014, over 3042690.08 frames. ], batch size: 59, lr: 6.36e-03, grad_scale: 32.0 2023-11-19 22:12:09,000 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.358e+01 8.243e+01 9.327e+01 1.037e+02 1.432e+02, threshold=1.865e+02, percent-clipped=0.0 2023-11-19 22:12:09,145 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 123850 2023-11-19 22:12:10,604 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=825606.6666666666, ans=0.125 2023-11-19 22:12:25,312 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.max_positive, batch_count=825740.0, ans=0.95 2023-11-19 22:12:45,398 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=825806.6666666666, ans=0.125 2023-11-19 22:12:52,068 INFO [train_asr.py:1262] (1/4) Epoch 11, batch 3650, loss[loss=0.05872, simple_loss=0.07215, pruned_loss=0.008992, audio_tagging_loss=0.01365, over 15882.00 frames. ], tot_loss[loss=0.08298, simple_loss=0.1024, pruned_loss=0.02162, audio_tagging_loss=0.01014, over 3040879.05 frames. ], batch size: 60, lr: 6.36e-03, grad_scale: 32.0 2023-11-19 22:13:14,469 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=825940.0, ans=0.0 2023-11-19 22:13:15,355 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 123900 2023-11-19 22:13:20,587 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=826006.6666666666, ans=0.07 2023-11-19 22:13:42,133 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=13.18 vs. limit=15.0 2023-11-19 22:13:42,978 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=826140.0, ans=0.125 2023-11-19 22:13:57,507 INFO [train_asr.py:1262] (1/4) Epoch 11, batch 3700, loss[loss=0.1042, simple_loss=0.1385, pruned_loss=0.02664, audio_tagging_loss=0.008367, over 15507.00 frames. ], tot_loss[loss=0.08282, simple_loss=0.1023, pruned_loss=0.02153, audio_tagging_loss=0.01012, over 3048484.54 frames. ], batch size: 56, lr: 6.36e-03, grad_scale: 32.0 2023-11-19 22:14:08,314 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=826206.6666666666, ans=0.1 2023-11-19 22:14:14,153 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=826273.3333333334, ans=0.04949747468305833 2023-11-19 22:14:14,616 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=12.22 vs. 
limit=15.0 2023-11-19 22:14:15,343 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=826273.3333333334, ans=0.0 2023-11-19 22:14:18,770 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.848e+01 8.345e+01 9.061e+01 9.917e+01 1.388e+02, threshold=1.812e+02, percent-clipped=0.0 2023-11-19 22:14:18,907 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 123950 2023-11-19 22:14:51,178 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=826473.3333333334, ans=0.0 2023-11-19 22:15:01,860 INFO [train_asr.py:1262] (1/4) Epoch 11, batch 3750, loss[loss=0.08867, simple_loss=0.1133, pruned_loss=0.02432, audio_tagging_loss=0.007715, over 15679.00 frames. ], tot_loss[loss=0.08377, simple_loss=0.1033, pruned_loss=0.02182, audio_tagging_loss=0.01028, over 3050559.97 frames. ], batch size: 59, lr: 6.36e-03, grad_scale: 32.0 2023-11-19 22:15:08,750 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=12.57 vs. limit=15.0 2023-11-19 22:15:23,421 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 124000 2023-11-19 22:15:34,921 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=9.16 vs. limit=15.0 2023-11-19 22:15:50,594 WARNING [train_asr.py:1506] (1/4) Exclude cut with ID unbalanced/ZY_Bsi-RNuk_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-19 22:15:59,510 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=826806.6666666666, ans=0.125 2023-11-19 22:16:06,779 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=826806.6666666666, ans=0.0 2023-11-19 22:16:09,640 INFO [train_asr.py:1262] (1/4) Epoch 11, batch 3800, loss[loss=0.1016, simple_loss=0.133, pruned_loss=0.02881, audio_tagging_loss=0.006262, over 16193.00 frames. ], tot_loss[loss=0.08422, simple_loss=0.104, pruned_loss=0.02196, audio_tagging_loss=0.01027, over 3052031.40 frames. ], batch size: 58, lr: 6.35e-03, grad_scale: 32.0 2023-11-19 22:16:21,536 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=4.10 vs. 
limit=12.0 2023-11-19 22:16:31,459 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=826940.0, ans=0.0 2023-11-19 22:16:31,553 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=826940.0, ans=0.125 2023-11-19 22:16:32,418 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.048e+01 8.419e+01 9.148e+01 9.794e+01 1.250e+02, threshold=1.830e+02, percent-clipped=0.0 2023-11-19 22:16:32,561 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 124050 2023-11-19 22:16:38,065 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.63 vs. limit=6.0 2023-11-19 22:16:48,447 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=827073.3333333334, ans=0.1 2023-11-19 22:16:53,679 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=8.82 vs. limit=10.0 2023-11-19 22:16:54,375 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=827073.3333333334, ans=0.125 2023-11-19 22:17:00,567 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.15 vs. limit=6.0 2023-11-19 22:17:09,834 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=827140.0, ans=0.0 2023-11-19 22:17:13,884 INFO [train_asr.py:1262] (1/4) Epoch 11, batch 3850, loss[loss=0.07905, simple_loss=0.1007, pruned_loss=0.02068, audio_tagging_loss=0.008004, over 15368.00 frames. ], tot_loss[loss=0.08506, simple_loss=0.1052, pruned_loss=0.02213, audio_tagging_loss=0.01034, over 3058368.05 frames. 
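
Note: the ScheduledFloat lines report the current value (ans) of a float, typically a dropout probability or skip rate, that is scheduled as a function of batch_count. A sketch of such a schedule, assuming piecewise-linear interpolation between (batch_count, value) breakpoints; the actual scaling.py API may differ:

class ScheduledFloatSketch:
    # Piecewise-linear schedule keyed on batch count; a sketch of what the
    # "ScheduledFloat: name=..., batch_count=..., ans=..." lines report.
    def __init__(self, *points):
        self.points = sorted(points)  # (batch_count, value) pairs

    def value(self, batch_count):
        pts = self.points
        if batch_count <= pts[0][0]:
            return pts[0][1]
        if batch_count >= pts[-1][0]:
            return pts[-1][1]
        for (x0, y0), (x1, y1) in zip(pts, pts[1:]):
            if x0 <= batch_count <= x1:
                frac = (batch_count - x0) / (x1 - x0)
                return y0 + frac * (y1 - y0)

# e.g. a dropout_p decaying from 0.3 to 0.1 over the first 20k batches has
# long since reached its floor at the batch_count ~8.2e5 values seen here:
sched = ScheduledFloatSketch((0.0, 0.3), (20000.0, 0.1))
assert abs(sched.value(822206.67) - 0.1) < 1e-9
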
], batch size: 56, lr: 6.35e-03, grad_scale: 32.0 2023-11-19 22:17:25,731 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=827273.3333333334, ans=0.0 2023-11-19 22:17:30,724 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=827273.3333333334, ans=0.1 2023-11-19 22:17:35,532 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 124100 2023-11-19 22:17:37,092 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=827273.3333333334, ans=0.0 2023-11-19 22:17:38,230 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=827340.0, ans=0.2 2023-11-19 22:17:55,940 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=827406.6666666666, ans=0.0 2023-11-19 22:17:58,311 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=827406.6666666666, ans=0.2 2023-11-19 22:18:11,659 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_whiten.whitening_limit, batch_count=827473.3333333334, ans=15.0 2023-11-19 22:18:18,124 INFO [train_asr.py:1262] (1/4) Epoch 11, batch 3900, loss[loss=0.08655, simple_loss=0.09873, pruned_loss=0.02358, audio_tagging_loss=0.0136, over 15748.00 frames. ], tot_loss[loss=0.0844, simple_loss=0.1042, pruned_loss=0.02187, audio_tagging_loss=0.01043, over 3048098.98 frames. ], batch size: 59, lr: 6.35e-03, grad_scale: 32.0 2023-11-19 22:18:39,679 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.879e+01 8.442e+01 9.156e+01 1.005e+02 1.586e+02, threshold=1.831e+02, percent-clipped=0.0 2023-11-19 22:18:39,812 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 124150 2023-11-19 22:18:46,513 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=7.23 vs. limit=10.0 2023-11-19 22:18:59,581 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=827740.0, ans=0.125 2023-11-19 22:19:16,288 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=827806.6666666666, ans=0.125 2023-11-19 22:19:22,047 INFO [train_asr.py:1262] (1/4) Epoch 11, batch 3950, loss[loss=0.09527, simple_loss=0.1215, pruned_loss=0.02591, audio_tagging_loss=0.008627, over 15221.00 frames. ], tot_loss[loss=0.0845, simple_loss=0.1044, pruned_loss=0.02185, audio_tagging_loss=0.01044, over 3045309.65 frames. 
], batch size: 57, lr: 6.35e-03, grad_scale: 32.0 2023-11-19 22:19:25,348 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=827873.3333333334, ans=0.1 2023-11-19 22:19:32,757 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=827873.3333333334, ans=0.0 2023-11-19 22:19:44,113 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 124200 2023-11-19 22:19:55,650 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=828006.6666666666, ans=0.125 2023-11-19 22:20:17,323 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=828140.0, ans=0.0 2023-11-19 22:20:27,598 INFO [train_asr.py:1262] (1/4) Epoch 11, batch 4000, loss[loss=0.07872, simple_loss=0.09289, pruned_loss=0.02157, audio_tagging_loss=0.0107, over 16510.00 frames. ], tot_loss[loss=0.08428, simple_loss=0.1042, pruned_loss=0.0217, audio_tagging_loss=0.01047, over 3047152.05 frames. ], batch size: 63, lr: 6.35e-03, grad_scale: 32.0 2023-11-19 22:20:38,370 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=828206.6666666666, ans=0.125 2023-11-19 22:20:42,074 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=828273.3333333334, ans=0.1 2023-11-19 22:20:49,237 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=14.86 vs. limit=22.5 2023-11-19 22:20:49,748 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.155e+01 8.189e+01 8.890e+01 9.727e+01 1.231e+02, threshold=1.778e+02, percent-clipped=0.0 2023-11-19 22:20:49,894 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 124250 2023-11-19 22:20:59,861 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=828340.0, ans=0.1 2023-11-19 22:21:02,963 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=828340.0, ans=0.1 2023-11-19 22:21:31,152 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=12.88 vs. limit=22.5 2023-11-19 22:21:31,904 INFO [train_asr.py:1262] (1/4) Epoch 11, batch 4050, loss[loss=0.08356, simple_loss=0.1071, pruned_loss=0.01969, audio_tagging_loss=0.01031, over 16735.00 frames. ], tot_loss[loss=0.08462, simple_loss=0.1045, pruned_loss=0.02181, audio_tagging_loss=0.01054, over 3051379.01 frames. ], batch size: 62, lr: 6.35e-03, grad_scale: 32.0 2023-11-19 22:21:36,206 WARNING [train_asr.py:1506] (1/4) Exclude cut with ID unbalanced/-7b0f9TyPFU_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. 
Number of tokens: 24 2023-11-19 22:21:53,941 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 124300 2023-11-19 22:21:59,149 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=828673.3333333334, ans=0.125 2023-11-19 22:22:05,755 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=828673.3333333334, ans=0.125 2023-11-19 22:22:08,326 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=8.43 vs. limit=15.0 2023-11-19 22:22:26,766 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=828806.6666666666, ans=0.0 2023-11-19 22:22:34,038 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=828806.6666666666, ans=0.125 2023-11-19 22:22:36,363 INFO [train_asr.py:1262] (1/4) Epoch 11, batch 4100, loss[loss=0.08172, simple_loss=0.1033, pruned_loss=0.0229, audio_tagging_loss=0.007143, over 14919.00 frames. ], tot_loss[loss=0.08531, simple_loss=0.1056, pruned_loss=0.02215, audio_tagging_loss=0.01038, over 3054069.86 frames. ], batch size: 58, lr: 6.35e-03, grad_scale: 32.0 2023-11-19 22:22:45,861 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=828873.3333333334, ans=0.1 2023-11-19 22:22:53,619 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=828940.0, ans=0.125 2023-11-19 22:22:58,272 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.257e+01 8.196e+01 8.855e+01 9.661e+01 1.383e+02, threshold=1.771e+02, percent-clipped=0.0 2023-11-19 22:22:58,496 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 124350 2023-11-19 22:23:09,270 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=829006.6666666666, ans=0.125 2023-11-19 22:23:15,533 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=829073.3333333334, ans=0.125 2023-11-19 22:23:37,455 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=829140.0, ans=0.2 2023-11-19 22:23:40,902 INFO [train_asr.py:1262] (1/4) Epoch 11, batch 4150, loss[loss=0.1034, simple_loss=0.1239, pruned_loss=0.03341, audio_tagging_loss=0.008021, over 16065.00 frames. ], tot_loss[loss=0.08514, simple_loss=0.1057, pruned_loss=0.02213, audio_tagging_loss=0.01017, over 3058648.59 frames. ], batch size: 58, lr: 6.35e-03, grad_scale: 32.0 2023-11-19 22:23:47,942 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=12.90 vs. 
limit=15.0 2023-11-19 22:23:48,845 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=829206.6666666666, ans=0.0 2023-11-19 22:23:50,119 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=829206.6666666666, ans=0.0 2023-11-19 22:24:02,797 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 124400 2023-11-19 22:24:09,984 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=829340.0, ans=0.0 2023-11-19 22:24:10,504 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=19.71 vs. limit=22.5 2023-11-19 22:24:19,197 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=829406.6666666666, ans=0.2 2023-11-19 22:24:27,471 WARNING [train_asr.py:1506] (1/4) Exclude cut with ID unbalanced/5BkClLNthIQ_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-19 22:24:34,428 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.min_positive, batch_count=829473.3333333334, ans=0.025 2023-11-19 22:24:45,664 INFO [train_asr.py:1262] (1/4) Epoch 11, batch 4200, loss[loss=0.1029, simple_loss=0.1292, pruned_loss=0.02791, audio_tagging_loss=0.01039, over 16759.00 frames. ], tot_loss[loss=0.08425, simple_loss=0.1044, pruned_loss=0.02196, audio_tagging_loss=0.01009, over 3051875.13 frames. 
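
Note: the "Exclude cut" WARNING lines drop AudioSet placeholder cuts whose frame count after subsampling (23) is smaller than their token count (24); the pruned transducer loss needs at least one encoder frame per emitted token, so such cuts cannot be aligned. A sketch of the implied filter; the exact subsampling formula is an assumption chosen to reproduce the logged 100 -> 23 mapping:

def frames_after_subsampling(num_frames):
    # Assumed conv front-end with two stride-2 stages; reproduces the
    # logged mapping 100 frames before subsampling -> 23 after.
    return ((num_frames - 7) // 2) // 2

def keep_cut(num_frames_before, num_tokens):
    # A cut is kept only if it has at least one post-subsampling frame
    # per token, so the 100-frame / 24-token placeholders are excluded.
    return frames_after_subsampling(num_frames_before) >= num_tokens

assert frames_after_subsampling(100) == 23
assert not keep_cut(100, 24)
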
], batch size: 59, lr: 6.34e-03, grad_scale: 32.0 2023-11-19 22:24:51,943 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=829540.0, ans=0.125 2023-11-19 22:24:53,365 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=829540.0, ans=0.125 2023-11-19 22:25:06,240 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=829606.6666666666, ans=0.125 2023-11-19 22:25:07,644 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.364e+01 8.135e+01 8.898e+01 9.525e+01 1.896e+02, threshold=1.780e+02, percent-clipped=1.0 2023-11-19 22:25:07,788 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 124450 2023-11-19 22:25:08,442 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_whiten.whitening_limit, batch_count=829606.6666666666, ans=15.0 2023-11-19 22:25:11,702 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=829673.3333333334, ans=0.0 2023-11-19 22:25:16,716 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.min_abs, batch_count=829673.3333333334, ans=0.5 2023-11-19 22:25:31,896 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=829740.0, ans=0.125 2023-11-19 22:25:45,260 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module1.whiten, num_groups=1, num_channels=512, metric=4.34 vs. limit=15.0 2023-11-19 22:25:45,984 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=829806.6666666666, ans=0.1 2023-11-19 22:25:50,204 INFO [train_asr.py:1262] (1/4) Epoch 11, batch 4250, loss[loss=0.07018, simple_loss=0.08446, pruned_loss=0.01848, audio_tagging_loss=0.009475, over 16585.00 frames. ], tot_loss[loss=0.08418, simple_loss=0.1044, pruned_loss=0.0219, audio_tagging_loss=0.01007, over 3051498.15 frames. ], batch size: 62, lr: 6.34e-03, grad_scale: 16.0 2023-11-19 22:26:12,426 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 124500 2023-11-19 22:26:32,951 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=830073.3333333334, ans=0.125 2023-11-19 22:26:34,204 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=830073.3333333334, ans=0.1 2023-11-19 22:26:55,217 INFO [train_asr.py:1262] (1/4) Epoch 11, batch 4300, loss[loss=0.08655, simple_loss=0.1133, pruned_loss=0.02147, audio_tagging_loss=0.008423, over 15957.00 frames. ], tot_loss[loss=0.08503, simple_loss=0.1056, pruned_loss=0.02223, audio_tagging_loss=0.009976, over 3047057.08 frames. 
], batch size: 59, lr: 6.34e-03, grad_scale: 16.0 2023-11-19 22:26:57,960 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=830206.6666666666, ans=0.0 2023-11-19 22:27:09,550 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=830273.3333333334, ans=0.1 2023-11-19 22:27:13,787 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=830273.3333333334, ans=0.0 2023-11-19 22:27:17,365 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 124550 2023-11-19 22:27:18,440 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.753e+01 8.093e+01 8.901e+01 9.811e+01 2.323e+02, threshold=1.780e+02, percent-clipped=1.0 2023-11-19 22:27:26,063 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=830340.0, ans=0.2 2023-11-19 22:27:32,359 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-19 22:27:46,608 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=830473.3333333334, ans=0.125 2023-11-19 22:27:58,310 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=830540.0, ans=0.0 2023-11-19 22:27:59,294 INFO [train_asr.py:1262] (1/4) Epoch 11, batch 4350, loss[loss=0.09947, simple_loss=0.1194, pruned_loss=0.03051, audio_tagging_loss=0.009261, over 13794.00 frames. ], tot_loss[loss=0.08466, simple_loss=0.105, pruned_loss=0.02207, audio_tagging_loss=0.01011, over 3040738.21 frames. ], batch size: 52, lr: 6.34e-03, grad_scale: 16.0 2023-11-19 22:28:05,191 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=830540.0, ans=0.125 2023-11-19 22:28:16,604 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=7.77 vs. limit=15.0 2023-11-19 22:28:20,723 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 124600 2023-11-19 22:28:27,073 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=12.66 vs. limit=15.0 2023-11-19 22:28:44,735 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=830740.0, ans=0.125 2023-11-19 22:28:50,639 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=830806.6666666666, ans=0.0 2023-11-19 22:29:00,400 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=830806.6666666666, ans=0.0 2023-11-19 22:29:00,771 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=11.75 vs. limit=15.0 2023-11-19 22:29:03,779 INFO [train_asr.py:1262] (1/4) Epoch 11, batch 4400, loss[loss=0.1119, simple_loss=0.1324, pruned_loss=0.0394, audio_tagging_loss=0.0063, over 15770.00 frames. ], tot_loss[loss=0.08433, simple_loss=0.1044, pruned_loss=0.02198, audio_tagging_loss=0.01014, over 3037102.31 frames. 
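
Note: each Clipping_scale line prints five grad-norm quantiles (min, 25%, 50%, 75%, max over a recent window) followed by the active clipping threshold. In every pair in this excerpt the threshold is exactly 2.0 times the middle quantile, i.e. clipping_scale times the median (just above: 2.0 * 8.901e+01 = 1.780e+02). A sketch of that rule; the window size and bookkeeping details are assumptions:

import statistics

def clipping_threshold(recent_grad_norms, clipping_scale=2.0):
    # threshold = clipping_scale * median of recently observed grad norms,
    # matching every quartiles/threshold pair printed in this excerpt.
    return clipping_scale * statistics.median(recent_grad_norms)

def clip_factor(grad_norm, threshold):
    # Gradients are rescaled only when their norm exceeds the threshold;
    # percent-clipped reports how often that happened (mostly 0.0 here).
    return min(1.0, threshold / max(grad_norm, 1e-20))
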
], batch size: 56, lr: 6.34e-03, grad_scale: 32.0 2023-11-19 22:29:04,376 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=2.45 vs. limit=15.0 2023-11-19 22:29:18,019 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=16.98 vs. limit=22.5 2023-11-19 22:29:26,141 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 124650 2023-11-19 22:29:27,903 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.389e+01 8.577e+01 9.158e+01 9.839e+01 1.465e+02, threshold=1.832e+02, percent-clipped=0.0 2023-11-19 22:30:08,344 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=831206.6666666666, ans=0.0 2023-11-19 22:30:09,317 INFO [train_asr.py:1262] (1/4) Epoch 11, batch 4450, loss[loss=0.07456, simple_loss=0.08616, pruned_loss=0.0196, audio_tagging_loss=0.01188, over 14960.00 frames. ], tot_loss[loss=0.08474, simple_loss=0.105, pruned_loss=0.02209, audio_tagging_loss=0.01015, over 3042823.59 frames. ], batch size: 57, lr: 6.34e-03, grad_scale: 32.0 2023-11-19 22:30:31,370 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 124700 2023-11-19 22:31:06,496 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=831473.3333333334, ans=0.125 2023-11-19 22:31:13,687 INFO [train_asr.py:1262] (1/4) Epoch 11, batch 4500, loss[loss=0.1019, simple_loss=0.1328, pruned_loss=0.02553, audio_tagging_loss=0.009976, over 15378.00 frames. ], tot_loss[loss=0.08508, simple_loss=0.1056, pruned_loss=0.02222, audio_tagging_loss=0.01007, over 3045543.27 frames. ], batch size: 56, lr: 6.34e-03, grad_scale: 32.0 2023-11-19 22:31:35,407 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 124750 2023-11-19 22:31:36,469 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.020e+01 8.327e+01 8.964e+01 9.884e+01 1.189e+02, threshold=1.793e+02, percent-clipped=0.0 2023-11-19 22:32:03,820 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=7.91 vs. limit=10.0 2023-11-19 22:32:18,449 INFO [train_asr.py:1262] (1/4) Epoch 11, batch 4550, loss[loss=0.06606, simple_loss=0.0707, pruned_loss=0.01603, audio_tagging_loss=0.01468, over 15161.00 frames. ], tot_loss[loss=0.08477, simple_loss=0.1048, pruned_loss=0.0222, audio_tagging_loss=0.01014, over 3040907.93 frames. ], batch size: 59, lr: 6.34e-03, grad_scale: 32.0 2023-11-19 22:32:27,355 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=831873.3333333334, ans=0.125 2023-11-19 22:32:40,902 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 124800 2023-11-19 22:32:43,960 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.max_abs, batch_count=832006.6666666666, ans=10.0 2023-11-19 22:32:44,183 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=2.59 vs. 
limit=15.0 2023-11-19 22:32:53,030 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=832006.6666666666, ans=0.0 2023-11-19 22:33:08,648 WARNING [train_asr.py:1506] (1/4) Exclude cut with ID unbalanced/_II2Klfnn4Y_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-19 22:33:10,069 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.min_abs, batch_count=832140.0, ans=0.5 2023-11-19 22:33:24,152 INFO [train_asr.py:1262] (1/4) Epoch 11, batch 4600, loss[loss=0.1072, simple_loss=0.1486, pruned_loss=0.02599, audio_tagging_loss=0.006906, over 15449.00 frames. ], tot_loss[loss=0.08383, simple_loss=0.1035, pruned_loss=0.02184, audio_tagging_loss=0.01026, over 3040026.08 frames. ], batch size: 56, lr: 6.33e-03, grad_scale: 32.0 2023-11-19 22:33:24,482 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=832206.6666666666, ans=0.125 2023-11-19 22:33:33,649 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-19 22:33:46,734 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 124850 2023-11-19 22:33:47,808 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.870e+01 8.306e+01 8.909e+01 9.778e+01 1.815e+02, threshold=1.782e+02, percent-clipped=2.0 2023-11-19 22:34:04,920 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.67 vs. limit=6.0 2023-11-19 22:34:12,422 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=832406.6666666666, ans=0.125 2023-11-19 22:34:12,629 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=832406.6666666666, ans=0.1 2023-11-19 22:34:20,007 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=832473.3333333334, ans=0.0 2023-11-19 22:34:26,057 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=832473.3333333334, ans=0.1 2023-11-19 22:34:27,436 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=832473.3333333334, ans=0.0 2023-11-19 22:34:29,656 INFO [train_asr.py:1262] (1/4) Epoch 11, batch 4650, loss[loss=0.06932, simple_loss=0.07875, pruned_loss=0.01707, audio_tagging_loss=0.01287, over 13677.00 frames. ], tot_loss[loss=0.08357, simple_loss=0.1031, pruned_loss=0.02169, audio_tagging_loss=0.01033, over 3036285.92 frames. ], batch size: 53, lr: 6.33e-03, grad_scale: 32.0 2023-11-19 22:34:48,324 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=7.67 vs. 
limit=15.0 2023-11-19 22:34:51,367 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 124900 2023-11-19 22:34:55,416 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=832673.3333333334, ans=0.1 2023-11-19 22:34:58,333 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=14.70 vs. limit=15.0 2023-11-19 22:35:00,149 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=2.45 vs. limit=15.0 2023-11-19 22:35:17,381 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=7.96 vs. limit=15.0 2023-11-19 22:35:18,268 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=832740.0, ans=0.125 2023-11-19 22:35:34,360 INFO [train_asr.py:1262] (1/4) Epoch 11, batch 4700, loss[loss=0.0831, simple_loss=0.09696, pruned_loss=0.02514, audio_tagging_loss=0.00948, over 15411.00 frames. ], tot_loss[loss=0.08296, simple_loss=0.1021, pruned_loss=0.02136, audio_tagging_loss=0.01053, over 3040915.13 frames. ], batch size: 58, lr: 6.33e-03, grad_scale: 32.0 2023-11-19 22:35:47,376 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=832940.0, ans=0.1 2023-11-19 22:35:51,317 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.whiten, num_groups=1, num_channels=512, metric=10.54 vs. limit=12.0 2023-11-19 22:35:51,640 INFO [scaling.py:1022] (1/4) Whitening: name=encoder_embed.out_whiten, num_groups=1, num_channels=192, metric=7.27 vs. limit=8.0 2023-11-19 22:35:55,640 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 124950 2023-11-19 22:35:56,664 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.758e+01 8.340e+01 9.099e+01 9.695e+01 1.346e+02, threshold=1.820e+02, percent-clipped=0.0 2023-11-19 22:35:59,549 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=833006.6666666666, ans=0.0 2023-11-19 22:36:12,305 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=19.20 vs. limit=22.5 2023-11-19 22:36:24,929 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=833140.0, ans=0.125 2023-11-19 22:36:31,084 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=3.94 vs. limit=15.0 2023-11-19 22:36:33,641 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=19.32 vs. limit=22.5 2023-11-19 22:36:38,650 INFO [train_asr.py:1262] (1/4) Epoch 11, batch 4750, loss[loss=0.06489, simple_loss=0.07544, pruned_loss=0.01311, audio_tagging_loss=0.01406, over 14275.00 frames. ], tot_loss[loss=0.08253, simple_loss=0.1014, pruned_loss=0.0212, audio_tagging_loss=0.01061, over 3044142.43 frames. 
], batch size: 55, lr: 6.33e-03, grad_scale: 32.0 2023-11-19 22:36:48,661 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=833206.6666666666, ans=0.0 2023-11-19 22:36:55,907 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten1.whitening_limit, batch_count=833273.3333333334, ans=10.0 2023-11-19 22:37:00,288 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 125000 2023-11-19 22:37:01,671 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=833273.3333333334, ans=0.0 2023-11-19 22:37:15,601 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=833406.6666666666, ans=0.0 2023-11-19 22:37:28,292 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=833473.3333333334, ans=0.125 2023-11-19 22:37:29,578 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=833473.3333333334, ans=0.125 2023-11-19 22:37:42,759 INFO [train_asr.py:1262] (1/4) Epoch 11, batch 4800, loss[loss=0.07502, simple_loss=0.0961, pruned_loss=0.01719, audio_tagging_loss=0.009786, over 16193.00 frames. ], tot_loss[loss=0.08334, simple_loss=0.1025, pruned_loss=0.02145, audio_tagging_loss=0.01062, over 3044016.44 frames. ], batch size: 60, lr: 6.33e-03, grad_scale: 32.0 2023-11-19 22:38:04,932 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 125050 2023-11-19 22:38:07,373 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.959e+01 8.107e+01 9.125e+01 1.005e+02 1.234e+02, threshold=1.825e+02, percent-clipped=0.0 2023-11-19 22:38:07,778 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=833673.3333333334, ans=0.1 2023-11-19 22:38:37,411 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=833806.6666666666, ans=0.0 2023-11-19 22:38:39,910 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=833806.6666666666, ans=0.0 2023-11-19 22:38:47,587 INFO [train_asr.py:1262] (1/4) Epoch 11, batch 4850, loss[loss=0.06326, simple_loss=0.07393, pruned_loss=0.01336, audio_tagging_loss=0.01294, over 16246.00 frames. ], tot_loss[loss=0.08387, simple_loss=0.103, pruned_loss=0.02168, audio_tagging_loss=0.01068, over 3039074.61 frames. ], batch size: 62, lr: 6.33e-03, grad_scale: 32.0 2023-11-19 22:38:52,823 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=833873.3333333334, ans=0.125 2023-11-19 22:39:09,118 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 125100 2023-11-19 22:39:09,722 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=21.11 vs. limit=22.5 2023-11-19 22:39:13,010 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=834006.6666666666, ans=0.125 2023-11-19 22:39:25,743 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=4.81 vs. 
limit=12.0 2023-11-19 22:39:33,106 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=834073.3333333334, ans=0.125 2023-11-19 22:39:39,210 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=834140.0, ans=0.125 2023-11-19 22:39:51,850 INFO [train_asr.py:1262] (1/4) Epoch 11, batch 4900, loss[loss=0.08558, simple_loss=0.111, pruned_loss=0.01931, audio_tagging_loss=0.01078, over 15185.00 frames. ], tot_loss[loss=0.08489, simple_loss=0.1049, pruned_loss=0.02195, audio_tagging_loss=0.01051, over 3047032.44 frames. ], batch size: 56, lr: 6.33e-03, grad_scale: 32.0 2023-11-19 22:40:13,349 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 125150 2023-11-19 22:40:16,265 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.135e+01 8.432e+01 8.847e+01 9.512e+01 1.221e+02, threshold=1.769e+02, percent-clipped=0.0 2023-11-19 22:40:21,002 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=834340.0, ans=0.2 2023-11-19 22:40:23,207 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=834340.0, ans=0.125 2023-11-19 22:40:33,422 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=5.19 vs. limit=12.0 2023-11-19 22:40:46,922 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=5.973e-01 2023-11-19 22:40:51,773 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=834473.3333333334, ans=0.125 2023-11-19 22:40:54,046 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=834540.0, ans=0.2 2023-11-19 22:40:55,047 INFO [train_asr.py:1262] (1/4) Epoch 11, batch 4950, loss[loss=0.08062, simple_loss=0.09835, pruned_loss=0.0236, audio_tagging_loss=0.007841, over 14771.00 frames. ], tot_loss[loss=0.08492, simple_loss=0.1052, pruned_loss=0.02207, audio_tagging_loss=0.01023, over 3046036.57 frames. ], batch size: 57, lr: 6.33e-03, grad_scale: 32.0 2023-11-19 22:40:59,851 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=8.55 vs. limit=15.0 2023-11-19 22:41:17,916 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 125200 2023-11-19 22:41:18,035 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=834606.6666666666, ans=0.0 2023-11-19 22:41:38,703 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=512, metric=19.80 vs. limit=22.5 2023-11-19 22:41:48,603 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=834806.6666666666, ans=0.1 2023-11-19 22:42:00,387 INFO [train_asr.py:1262] (1/4) Epoch 11, batch 5000, loss[loss=0.08922, simple_loss=0.1092, pruned_loss=0.02139, audio_tagging_loss=0.01322, over 15972.00 frames. ], tot_loss[loss=0.08414, simple_loss=0.1045, pruned_loss=0.02179, audio_tagging_loss=0.01009, over 3045564.17 frames. 
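
Note: the Whitening lines compare a per-module statistic of the feature covariance against a (possibly scheduled) whitening_limit, printed as metric=X vs. limit=Y. A sketch of one such statistic, assuming a ratio that equals 1.0 for perfectly isotropic (white) features and grows with covariance anisotropy; the real scaling.py metric and its normalization may differ:

import torch

def whitening_metric(x):
    # x: (..., channels). For white features (cov ~ sigma^2 * I) this
    # ratio is 1.0; it grows as the covariance becomes less isotropic.
    # The log's num_groups > 1 cases would apply this per channel-group.
    x = x.reshape(-1, x.shape[-1])
    x = x - x.mean(dim=0, keepdim=True)
    cov = (x.T @ x) / x.shape[0]
    d = cov.shape[0]
    return (d * (cov * cov).sum() / cov.trace() ** 2).item()

torch.manual_seed(0)
# White noise stays near 1.0, far below limits like 15.0 or 22.5:
print(whitening_metric(torch.randn(10000, 256)))
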
], batch size: 61, lr: 6.32e-03, grad_scale: 32.0 2023-11-19 22:42:01,098 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=17.94 vs. limit=22.5 2023-11-19 22:42:02,372 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=9.72 vs. limit=15.0 2023-11-19 22:42:03,096 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=834873.3333333334, ans=0.1 2023-11-19 22:42:10,597 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=834873.3333333334, ans=0.0 2023-11-19 22:42:19,719 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=834940.0, ans=0.0 2023-11-19 22:42:22,128 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 125250 2023-11-19 22:42:24,392 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.897e+01 8.190e+01 8.915e+01 9.653e+01 1.690e+02, threshold=1.783e+02, percent-clipped=0.0 2023-11-19 22:42:42,493 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=835073.3333333334, ans=0.125 2023-11-19 22:42:43,822 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=835073.3333333334, ans=0.125 2023-11-19 22:43:00,930 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=835140.0, ans=0.125 2023-11-19 22:43:04,414 INFO [train_asr.py:1262] (1/4) Epoch 11, batch 5050, loss[loss=0.09074, simple_loss=0.1117, pruned_loss=0.0273, audio_tagging_loss=0.0076, over 15490.00 frames. ], tot_loss[loss=0.08417, simple_loss=0.1044, pruned_loss=0.02195, audio_tagging_loss=0.01003, over 3038314.35 frames. ], batch size: 57, lr: 6.32e-03, grad_scale: 32.0 2023-11-19 22:43:22,680 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=14.44 vs. limit=15.0 2023-11-19 22:43:26,319 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 125300 2023-11-19 22:43:26,580 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=835273.3333333334, ans=0.125 2023-11-19 22:43:35,824 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=835340.0, ans=0.0 2023-11-19 22:44:01,389 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=835473.3333333334, ans=0.07 2023-11-19 22:44:02,634 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=835473.3333333334, ans=0.125 2023-11-19 22:44:06,339 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=835473.3333333334, ans=0.0 2023-11-19 22:44:08,446 INFO [train_asr.py:1262] (1/4) Epoch 11, batch 5100, loss[loss=0.08732, simple_loss=0.1169, pruned_loss=0.02129, audio_tagging_loss=0.007559, over 14797.00 frames. ], tot_loss[loss=0.08452, simple_loss=0.1048, pruned_loss=0.02206, audio_tagging_loss=0.01006, over 3036506.18 frames. 
], batch size: 56, lr: 6.32e-03, grad_scale: 32.0 2023-11-19 22:44:13,091 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=835540.0, ans=0.125 2023-11-19 22:44:31,268 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 125350 2023-11-19 22:44:33,596 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.822e+01 8.119e+01 8.827e+01 9.463e+01 1.199e+02, threshold=1.765e+02, percent-clipped=0.0 2023-11-19 22:44:44,461 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=12.91 vs. limit=15.0 2023-11-19 22:44:53,174 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=835740.0, ans=0.125 2023-11-19 22:44:54,481 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=835740.0, ans=0.125 2023-11-19 22:44:54,584 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=835740.0, ans=0.0 2023-11-19 22:44:56,857 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=835740.0, ans=0.0 2023-11-19 22:44:56,991 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=835740.0, ans=0.0 2023-11-19 22:45:14,076 INFO [train_asr.py:1262] (1/4) Epoch 11, batch 5150, loss[loss=0.1044, simple_loss=0.1183, pruned_loss=0.03396, audio_tagging_loss=0.01129, over 14726.00 frames. ], tot_loss[loss=0.08349, simple_loss=0.1032, pruned_loss=0.02175, audio_tagging_loss=0.01012, over 3034447.40 frames. ], batch size: 54, lr: 6.32e-03, grad_scale: 32.0 2023-11-19 22:45:26,118 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=835940.0, ans=0.125 2023-11-19 22:45:36,417 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 125400 2023-11-19 22:46:02,285 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=836073.3333333334, ans=0.0 2023-11-19 22:46:19,177 INFO [train_asr.py:1262] (1/4) Epoch 11, batch 5200, loss[loss=0.09789, simple_loss=0.1244, pruned_loss=0.02746, audio_tagging_loss=0.008227, over 14804.00 frames. ], tot_loss[loss=0.08392, simple_loss=0.1038, pruned_loss=0.02186, audio_tagging_loss=0.01018, over 3038635.84 frames. ], batch size: 53, lr: 6.32e-03, grad_scale: 32.0 2023-11-19 22:46:23,497 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=20.10 vs. 
2023-11-19 22:46:40,485 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 125450 2023-11-19 22:46:40,699 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=836273.3333333334, ans=0.125 2023-11-19 22:46:42,467 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=836273.3333333334, ans=0.2 2023-11-19 22:46:43,467 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.938e+01 8.292e+01 9.057e+01 9.966e+01 1.254e+02, threshold=1.811e+02, percent-clipped=0.0 2023-11-19 22:46:55,329 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=836340.0, ans=0.125 2023-11-19 22:46:55,334 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=836340.0, ans=0.125 2023-11-19 22:47:23,273 INFO [train_asr.py:1262] (1/4) Epoch 11, batch 5250, loss[loss=0.1211, simple_loss=0.1546, pruned_loss=0.03773, audio_tagging_loss=0.006105, over 15539.00 frames. ], tot_loss[loss=0.08493, simple_loss=0.105, pruned_loss=0.02237, audio_tagging_loss=0.01007, over 3048986.52 frames. ], batch size: 56, lr: 6.32e-03, grad_scale: 32.0 2023-11-19 22:47:33,052 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=836540.0, ans=0.2 2023-11-19 22:47:35,516 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=836606.6666666666, ans=0.0 2023-11-19 22:47:45,647 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 125500 2023-11-19 22:48:02,983 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=836740.0, ans=0.125 2023-11-19 22:48:28,092 INFO [train_asr.py:1262] (1/4) Epoch 11, batch 5300, loss[loss=0.07246, simple_loss=0.09368, pruned_loss=0.01665, audio_tagging_loss=0.008971, over 15355.00 frames. ], tot_loss[loss=0.08461, simple_loss=0.1048, pruned_loss=0.02215, audio_tagging_loss=0.01008, over 3044917.64 frames. ], batch size: 57, lr: 6.32e-03, grad_scale: 32.0 2023-11-19 22:48:50,345 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 125550 2023-11-19 22:48:53,336 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.677e+01 8.460e+01 9.151e+01 1.090e+02 1.553e+02, threshold=1.830e+02, percent-clipped=0.0 2023-11-19 22:49:03,871 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=8.00 vs. limit=15.0 2023-11-19 22:49:07,510 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=837073.3333333334, ans=0.125 2023-11-19 22:49:16,622 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=837073.3333333334, ans=0.0 2023-11-19 22:49:33,677 INFO [train_asr.py:1262] (1/4) Epoch 11, batch 5350, loss[loss=0.07163, simple_loss=0.09066, pruned_loss=0.01609, audio_tagging_loss=0.01021, over 15234.00 frames. ], tot_loss[loss=0.08398, simple_loss=0.104, pruned_loss=0.02191, audio_tagging_loss=0.01008, over 3045167.98 frames. ], batch size: 57, lr: 6.32e-03, grad_scale: 32.0
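Note on the `[scaling.py:1022] Whitening` entries: each compares a per-module statistic against a limit, and a penalty applies only while the metric exceeds that limit, so lines like `metric=8.00 vs. limit=15.0` indicate no intervention. The sketch below shows one plausible form of such a metric for the single-group case, under the assumption that it measures how far the feature covariance is from a multiple of the identity; this is an inference about the statistic, not icefall's verbatim code.

```python
import torch

def whitening_metric(x: torch.Tensor) -> torch.Tensor:
    """Sketch (assumption) of a whitening metric like those logged above:
    num_channels * ||C||_F^2 / trace(C)^2 for feature covariance C.
    It is 1.0 for perfectly 'white' (isotropic) features and grows with
    the spread of covariance eigenvalues. Handles num_groups=1 only."""
    x = x.reshape(-1, x.shape[-1])            # (frames, channels)
    x = x - x.mean(dim=0, keepdim=True)
    cov = (x.t() @ x) / x.shape[0]            # (channels, channels)
    num_channels = cov.shape[0]
    return num_channels * (cov ** 2).sum() / cov.diagonal().sum() ** 2

white = torch.randn(1000, 384)                # isotropic input
print(whitening_metric(white))                # close to 1, far below 22.5
```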
2023-11-19 22:49:41,804 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=837206.6666666666, ans=0.0 2023-11-19 22:49:45,584 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=837273.3333333334, ans=0.0 2023-11-19 22:49:49,073 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=837273.3333333334, ans=0.125 2023-11-19 22:49:54,923 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 125600 2023-11-19 22:49:55,158 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=837273.3333333334, ans=0.2 2023-11-19 22:49:56,320 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=837273.3333333334, ans=0.125 2023-11-19 22:50:03,193 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.min_positive, batch_count=837340.0, ans=0.05 2023-11-19 22:50:34,514 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=837473.3333333334, ans=0.125 2023-11-19 22:50:38,026 INFO [train_asr.py:1262] (1/4) Epoch 11, batch 5400, loss[loss=0.08141, simple_loss=0.08959, pruned_loss=0.02293, audio_tagging_loss=0.01368, over 15058.00 frames. ], tot_loss[loss=0.0841, simple_loss=0.104, pruned_loss=0.02196, audio_tagging_loss=0.01016, over 3035755.22 frames. ], batch size: 57, lr: 6.31e-03, grad_scale: 16.0 2023-11-19 22:50:55,120 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=837606.6666666666, ans=0.0 2023-11-19 22:51:00,248 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 125650 2023-11-19 22:51:03,914 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.803e+01 8.135e+01 8.658e+01 9.508e+01 1.289e+02, threshold=1.732e+02, percent-clipped=0.0 2023-11-19 22:51:25,109 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=837740.0, ans=0.0 2023-11-19 22:51:29,377 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.08 vs. limit=6.0 2023-11-19 22:51:32,848 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=10.94 vs. limit=15.0 2023-11-19 22:51:41,669 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=837873.3333333334, ans=0.125 2023-11-19 22:51:41,840 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=837873.3333333334, ans=0.0 2023-11-19 22:51:42,667 INFO [train_asr.py:1262] (1/4) Epoch 11, batch 5450, loss[loss=0.06888, simple_loss=0.08772, pruned_loss=0.01383, audio_tagging_loss=0.01119, over 14027.00 frames. ], tot_loss[loss=0.08388, simple_loss=0.1036, pruned_loss=0.02182, audio_tagging_loss=0.01027, over 3029624.24 frames. ], batch size: 52, lr: 6.31e-03, grad_scale: 16.0
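Note on the `tot_loss[...]` entries: the logged components reproduce the logged total under the configured scales (0.5 for the simple loss, 1.0 for the audio-tagging loss, with the pruned loss unscaled). For batch 5450 above: 0.5 x 0.1036 + 0.02182 + 1.0 x 0.01027 ~ 0.08388, matching `loss=0.08388`. A small sketch of that combination; the helper name is illustrative, not the training script's function.

```python
# Sketch of how the logged total appears to be composed from its parts,
# using the scales recorded in this run's configuration. Verified against
# the batch 5450 record above.
def combined_loss(simple_loss, pruned_loss, audio_tagging_loss,
                  simple_loss_scale=0.5, audio_tagging_loss_scale=1.0):
    return (simple_loss_scale * simple_loss
            + pruned_loss
            + audio_tagging_loss_scale * audio_tagging_loss)

print(combined_loss(0.1036, 0.02182, 0.01027))  # ~0.08388, as logged
```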
2023-11-19 22:51:46,535 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=837873.3333333334, ans=0.0 2023-11-19 22:51:49,793 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=837873.3333333334, ans=0.125 2023-11-19 22:51:53,564 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=837873.3333333334, ans=0.1 2023-11-19 22:52:04,545 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 125700 2023-11-19 22:52:10,999 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=838006.6666666666, ans=0.125 2023-11-19 22:52:16,402 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=2.33 vs. limit=15.0 2023-11-19 22:52:38,329 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=838140.0, ans=0.1 2023-11-19 22:52:47,681 INFO [train_asr.py:1262] (1/4) Epoch 11, batch 5500, loss[loss=0.06494, simple_loss=0.08532, pruned_loss=0.01094, audio_tagging_loss=0.01134, over 14891.00 frames. ], tot_loss[loss=0.08286, simple_loss=0.1021, pruned_loss=0.0214, audio_tagging_loss=0.01042, over 3029056.97 frames. ], batch size: 56, lr: 6.31e-03, grad_scale: 16.0 2023-11-19 22:52:47,866 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=838206.6666666666, ans=0.0 2023-11-19 22:52:58,311 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=838206.6666666666, ans=0.125 2023-11-19 22:53:00,800 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=838273.3333333334, ans=0.1 2023-11-19 22:53:09,223 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 125750 2023-11-19 22:53:12,826 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.838e+01 8.496e+01 8.990e+01 9.633e+01 1.229e+02, threshold=1.798e+02, percent-clipped=0.0 2023-11-19 22:53:34,092 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=838406.6666666666, ans=0.0 2023-11-19 22:53:35,122 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=838406.6666666666, ans=0.1 2023-11-19 22:53:39,251 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=838473.3333333334, ans=0.0 2023-11-19 22:53:51,892 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=512, metric=21.85 vs. limit=22.5 2023-11-19 22:53:52,458 INFO [train_asr.py:1262] (1/4) Epoch 11, batch 5550, loss[loss=0.09415, simple_loss=0.1157, pruned_loss=0.02641, audio_tagging_loss=0.009877, over 14622.00 frames. ], tot_loss[loss=0.08256, simple_loss=0.1016, pruned_loss=0.02124, audio_tagging_loss=0.01054, over 3029230.08 frames. ], batch size: 56, lr: 6.31e-03, grad_scale: 16.0
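Note on the `grad_scale` field: with fp16 training enabled, this is the dynamic loss-scaling factor, which is halved after overflowing steps and grown back after a run of stable ones; that is why it moves 32.0 -> 16.0 here and returns to 32.0 a few hundred batches later. The sketch below uses stock `torch.cuda.amp`; icefall ships its own scaler inside its optimizer, so treat this as an analogy under that assumption rather than the training script's code. It requires a CUDA device, matching the `cuda:1` device of this run.

```python
import torch

# Analogy for the fp16 loss-scaling behavior behind the logged grad_scale
# values, using PyTorch's built-in GradScaler. Model sizes and init_scale
# are illustrative assumptions.
model = torch.nn.Linear(80, 500).cuda()
optimizer = torch.optim.SGD(model.parameters(), lr=0.045)
scaler = torch.cuda.amp.GradScaler(init_scale=32.0, growth_interval=2000)

def train_step(features, targets):
    optimizer.zero_grad()
    with torch.cuda.amp.autocast(dtype=torch.float16):
        loss = torch.nn.functional.mse_loss(model(features), targets)
    scaler.scale(loss).backward()   # backprop on the scaled loss
    scaler.step(optimizer)          # unscales grads; skips step on inf/nan
    scaler.update()                 # halves on overflow, grows when stable
    return scaler.get_scale()       # the analogue of the logged "grad_scale"
```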
2023-11-19 22:54:08,312 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=838606.6666666666, ans=0.0 2023-11-19 22:54:14,215 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 125800 2023-11-19 22:54:41,523 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=838740.0, ans=0.125 2023-11-19 22:54:46,514 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=838806.6666666666, ans=0.125 2023-11-19 22:54:57,848 INFO [train_asr.py:1262] (1/4) Epoch 11, batch 5600, loss[loss=0.07012, simple_loss=0.08708, pruned_loss=0.01835, audio_tagging_loss=0.008227, over 14220.00 frames. ], tot_loss[loss=0.08375, simple_loss=0.1032, pruned_loss=0.02161, audio_tagging_loss=0.01056, over 3037087.88 frames. ], batch size: 55, lr: 6.31e-03, grad_scale: 32.0 2023-11-19 22:55:12,376 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=838940.0, ans=0.2 2023-11-19 22:55:19,546 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 125850 2023-11-19 22:55:22,501 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=839006.6666666666, ans=0.0 2023-11-19 22:55:23,385 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.434e+01 8.075e+01 8.698e+01 9.694e+01 1.274e+02, threshold=1.740e+02, percent-clipped=0.0 2023-11-19 22:55:24,188 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=19.85 vs. limit=22.5 2023-11-19 22:55:32,261 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=839006.6666666666, ans=0.1 2023-11-19 22:55:34,628 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=839006.6666666666, ans=0.2 2023-11-19 22:55:35,868 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-19 22:55:46,055 WARNING [train_asr.py:1506] (1/4) Exclude cut with ID unbalanced/ze0LsBtoDm0_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-19 22:55:46,229 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=839073.3333333334, ans=0.125 2023-11-19 22:55:58,292 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=11.88 vs. limit=15.0 2023-11-19 22:55:59,425 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.92 vs. limit=6.0 2023-11-19 22:56:02,499 INFO [train_asr.py:1262] (1/4) Epoch 11, batch 5650, loss[loss=0.07735, simple_loss=0.09491, pruned_loss=0.01882, audio_tagging_loss=0.01107, over 15085.00 frames. ], tot_loss[loss=0.08359, simple_loss=0.1026, pruned_loss=0.02156, audio_tagging_loss=0.01074, over 3044416.25 frames. ], batch size: 56, lr: 6.31e-03, grad_scale: 32.0
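Note on the `WARNING [train_asr.py:1506]` entry just above: it shows the guard that drops cuts whose post-subsampling length cannot cover their token sequence. Here 100 input frames shrink to 23 frames, fewer than the 24 BPE tokens of the dummy AudioSet transcript, so the cut cannot be aligned by the transducer loss and is excluded. A sketch of that check follows; the frame-count formula is an assumption inferred from the logged 100 -> 23 mapping (consistent with the usual two-stage convolutional subsampling by a factor of about 4).

```python
# Sketch of the exclude-cut check behind the WARNING above. The subsampling
# arithmetic is an assumption that reproduces the logged numbers.
def frames_after_subsampling(num_frames: int) -> int:
    return ((num_frames - 7) // 2 + 1) // 2

def keep_cut(num_frames: int, num_tokens: int) -> bool:
    # A transducer needs at least one output frame per emitted token.
    return frames_after_subsampling(num_frames) >= num_tokens

print(frames_after_subsampling(100))  # -> 23, as in the WARNING
print(keep_cut(100, 24))              # -> False: excluded from training
```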
2023-11-19 22:56:08,822 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=839206.6666666666, ans=0.125 2023-11-19 22:56:23,822 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=839273.3333333334, ans=0.125 2023-11-19 22:56:24,934 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 125900 2023-11-19 22:56:25,134 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=839273.3333333334, ans=0.1 2023-11-19 22:56:29,888 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=839340.0, ans=0.0 2023-11-19 22:56:36,663 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=839340.0, ans=0.1 2023-11-19 22:56:40,671 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=7.53 vs. limit=15.0 2023-11-19 22:56:41,545 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=839406.6666666666, ans=0.125 2023-11-19 22:56:45,823 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-19 22:56:46,519 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=14.70 vs. limit=15.0 2023-11-19 22:57:03,562 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=839473.3333333334, ans=0.125 2023-11-19 22:57:06,839 INFO [train_asr.py:1262] (1/4) Epoch 11, batch 5700, loss[loss=0.1086, simple_loss=0.1384, pruned_loss=0.03227, audio_tagging_loss=0.007096, over 15749.00 frames. ], tot_loss[loss=0.08392, simple_loss=0.1034, pruned_loss=0.02168, audio_tagging_loss=0.01056, over 3047052.09 frames. ], batch size: 57, lr: 6.31e-03, grad_scale: 32.0 2023-11-19 22:57:08,296 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=839540.0, ans=0.0 2023-11-19 22:57:26,003 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=12.53 vs.
limit=15.0 2023-11-19 22:57:29,115 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 125950 2023-11-19 22:57:30,520 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=839606.6666666666, ans=0.0 2023-11-19 22:57:34,474 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.315e+01 7.815e+01 8.868e+01 9.889e+01 1.263e+02, threshold=1.774e+02, percent-clipped=0.0 2023-11-19 22:57:45,825 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=839740.0, ans=0.125 2023-11-19 22:57:55,202 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=839740.0, ans=0.0 2023-11-19 22:58:11,319 INFO [train_asr.py:1262] (1/4) Epoch 11, batch 5750, loss[loss=0.06914, simple_loss=0.07953, pruned_loss=0.01776, audio_tagging_loss=0.01161, over 14587.00 frames. ], tot_loss[loss=0.08364, simple_loss=0.1031, pruned_loss=0.02168, audio_tagging_loss=0.01042, over 3051425.00 frames. ], batch size: 55, lr: 6.31e-03, grad_scale: 16.0 2023-11-19 22:58:14,988 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=839873.3333333334, ans=0.125 2023-11-19 22:58:15,341 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=2.68 vs. limit=15.0 2023-11-19 22:58:25,289 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=839940.0, ans=0.1 2023-11-19 22:58:33,765 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 126000 2023-11-19 22:58:40,834 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=13.12 vs. limit=15.0 2023-11-19 22:59:00,361 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=840073.3333333334, ans=0.2 2023-11-19 22:59:17,498 INFO [train_asr.py:1262] (1/4) Epoch 11, batch 5800, loss[loss=0.0902, simple_loss=0.109, pruned_loss=0.02499, audio_tagging_loss=0.01069, over 13639.00 frames. ], tot_loss[loss=0.08311, simple_loss=0.1025, pruned_loss=0.02147, audio_tagging_loss=0.01038, over 3045020.37 frames. ], batch size: 54, lr: 6.30e-03, grad_scale: 16.0 2023-11-19 22:59:24,043 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=840206.6666666666, ans=0.0 2023-11-19 22:59:29,947 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=840273.3333333334, ans=0.0 2023-11-19 22:59:39,031 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 126050 2023-11-19 22:59:43,816 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.582e+01 8.258e+01 8.850e+01 9.872e+01 1.564e+02, threshold=1.770e+02, percent-clipped=0.0 2023-11-19 22:59:44,101 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=840340.0, ans=0.125 2023-11-19 22:59:53,214 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=11.17 vs. 
limit=22.5 2023-11-19 23:00:00,387 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=840406.6666666666, ans=0.0 2023-11-19 23:00:10,744 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=840473.3333333334, ans=0.0 2023-11-19 23:00:12,209 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.max_abs, batch_count=840473.3333333334, ans=10.0 2023-11-19 23:00:13,188 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=840473.3333333334, ans=0.0 2023-11-19 23:00:22,552 INFO [train_asr.py:1262] (1/4) Epoch 11, batch 5850, loss[loss=0.08096, simple_loss=0.1047, pruned_loss=0.01939, audio_tagging_loss=0.009197, over 15405.00 frames. ], tot_loss[loss=0.08286, simple_loss=0.1021, pruned_loss=0.02152, audio_tagging_loss=0.01027, over 3045015.64 frames. ], batch size: 57, lr: 6.30e-03, grad_scale: 16.0 2023-11-19 23:00:23,113 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=11.46 vs. limit=22.5 2023-11-19 23:00:39,971 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=9.09 vs. limit=15.0 2023-11-19 23:00:40,713 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-11-19 23:00:44,888 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 126100 2023-11-19 23:00:48,953 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=840673.3333333334, ans=0.0 2023-11-19 23:00:53,269 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-19 23:00:58,455 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=8.50 vs. limit=10.0 2023-11-19 23:01:00,765 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=1.99 vs. limit=6.0 2023-11-19 23:01:06,714 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=840740.0, ans=0.125 2023-11-19 23:01:18,363 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=840806.6666666666, ans=0.07 2023-11-19 23:01:23,326 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=840806.6666666666, ans=0.125 2023-11-19 23:01:27,264 INFO [train_asr.py:1262] (1/4) Epoch 11, batch 5900, loss[loss=0.08628, simple_loss=0.1054, pruned_loss=0.02549, audio_tagging_loss=0.008103, over 14988.00 frames. ], tot_loss[loss=0.08281, simple_loss=0.102, pruned_loss=0.02151, audio_tagging_loss=0.01028, over 3044128.88 frames. ], batch size: 56, lr: 6.30e-03, grad_scale: 16.0 2023-11-19 23:01:30,416 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.85 vs. 
limit=6.0 2023-11-19 23:01:43,502 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=840940.0, ans=0.125 2023-11-19 23:01:47,538 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=7.10 vs. limit=12.0 2023-11-19 23:01:49,500 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 126150 2023-11-19 23:01:54,162 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.488e+01 8.354e+01 9.139e+01 9.974e+01 1.416e+02, threshold=1.828e+02, percent-clipped=0.0 2023-11-19 23:02:32,586 INFO [train_asr.py:1262] (1/4) Epoch 11, batch 5950, loss[loss=0.07063, simple_loss=0.08597, pruned_loss=0.01596, audio_tagging_loss=0.01169, over 15466.00 frames. ], tot_loss[loss=0.08326, simple_loss=0.103, pruned_loss=0.02154, audio_tagging_loss=0.01023, over 3047439.32 frames. ], batch size: 57, lr: 6.30e-03, grad_scale: 16.0 2023-11-19 23:02:53,879 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 126200 2023-11-19 23:03:05,642 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.max_positive, batch_count=841340.0, ans=0.95 2023-11-19 23:03:11,875 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=841406.6666666666, ans=0.2 2023-11-19 23:03:15,211 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.max_positive, batch_count=841406.6666666666, ans=0.95 2023-11-19 23:03:36,319 INFO [train_asr.py:1262] (1/4) Epoch 11, batch 6000, loss[loss=0.08889, simple_loss=0.1142, pruned_loss=0.0215, audio_tagging_loss=0.01028, over 15342.00 frames. ], tot_loss[loss=0.08459, simple_loss=0.1048, pruned_loss=0.02211, audio_tagging_loss=0.0101, over 3044082.73 frames. ], batch size: 59, lr: 6.30e-03, grad_scale: 32.0 2023-11-19 23:03:36,320 INFO [train_asr.py:1285] (1/4) Computing validation loss 2023-11-19 23:04:18,344 INFO [train_asr.py:1294] (1/4) Epoch 11, validation: loss=0.06364, simple_loss=0.05477, pruned_loss=0.006179, audio_tagging_loss=0.03008, over 4681554.00 frames. 2023-11-19 23:04:18,345 INFO [train_asr.py:1295] (1/4) Maximum memory allocated so far is 26082MB 2023-11-19 23:04:39,402 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=841606.6666666666, ans=0.0 2023-11-19 23:04:40,331 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 126250 2023-11-19 23:04:45,165 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.220e+01 8.140e+01 8.896e+01 9.801e+01 1.425e+02, threshold=1.779e+02, percent-clipped=0.0 2023-11-19 23:05:06,357 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=841740.0, ans=0.0 2023-11-19 23:05:07,210 WARNING [train_asr.py:1506] (1/4) Exclude cut with ID unbalanced/NoNxFjwXuuc_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. 
Number of tokens: 24 2023-11-19 23:05:12,027 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=841806.6666666666, ans=0.125 2023-11-19 23:05:19,001 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=6.45 vs. limit=15.0 2023-11-19 23:05:23,635 INFO [train_asr.py:1262] (1/4) Epoch 11, batch 6050, loss[loss=0.09505, simple_loss=0.1211, pruned_loss=0.0227, audio_tagging_loss=0.01181, over 14891.00 frames. ], tot_loss[loss=0.08492, simple_loss=0.1059, pruned_loss=0.02207, audio_tagging_loss=0.009911, over 3041649.95 frames. ], batch size: 54, lr: 6.30e-03, grad_scale: 32.0 2023-11-19 23:05:38,493 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=7.47 vs. limit=12.0 2023-11-19 23:05:45,488 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 126300 2023-11-19 23:05:49,520 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=842006.6666666666, ans=0.04949747468305833 2023-11-19 23:06:13,537 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=842073.3333333334, ans=0.125 2023-11-19 23:06:16,179 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=842140.0, ans=0.125 2023-11-19 23:06:28,645 INFO [train_asr.py:1262] (1/4) Epoch 11, batch 6100, loss[loss=0.09391, simple_loss=0.1295, pruned_loss=0.02156, audio_tagging_loss=0.007584, over 14869.00 frames. ], tot_loss[loss=0.08473, simple_loss=0.1054, pruned_loss=0.02207, audio_tagging_loss=0.009966, over 3045799.80 frames. ], batch size: 52, lr: 6.30e-03, grad_scale: 32.0 2023-11-19 23:06:41,413 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=11.89 vs. limit=15.0 2023-11-19 23:06:50,072 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 126350 2023-11-19 23:06:54,629 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=842340.0, ans=0.125 2023-11-19 23:06:55,415 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.987e+01 8.655e+01 9.472e+01 1.043e+02 1.487e+02, threshold=1.894e+02, percent-clipped=0.0 2023-11-19 23:06:56,062 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=20.16 vs. 
limit=22.5 2023-11-19 23:07:01,220 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=842340.0, ans=0.0 2023-11-19 23:07:04,953 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=842340.0, ans=0.2 2023-11-19 23:07:16,803 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=842406.6666666666, ans=0.125 2023-11-19 23:07:16,938 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=842406.6666666666, ans=0.125 2023-11-19 23:07:24,283 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=842473.3333333334, ans=0.0 2023-11-19 23:07:30,299 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=842473.3333333334, ans=0.0 2023-11-19 23:07:32,559 INFO [train_asr.py:1262] (1/4) Epoch 11, batch 6150, loss[loss=0.09543, simple_loss=0.1108, pruned_loss=0.0291, audio_tagging_loss=0.01093, over 15245.00 frames. ], tot_loss[loss=0.08457, simple_loss=0.1051, pruned_loss=0.02195, audio_tagging_loss=0.01008, over 3038500.59 frames. ], batch size: 57, lr: 6.30e-03, grad_scale: 32.0 2023-11-19 23:07:34,600 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=842540.0, ans=0.125 2023-11-19 23:07:55,441 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 126400 2023-11-19 23:08:39,210 INFO [train_asr.py:1262] (1/4) Epoch 11, batch 6200, loss[loss=0.06979, simple_loss=0.08057, pruned_loss=0.01809, audio_tagging_loss=0.01142, over 15591.00 frames. ], tot_loss[loss=0.0841, simple_loss=0.1043, pruned_loss=0.02176, audio_tagging_loss=0.01019, over 3037440.38 frames. ], batch size: 61, lr: 6.29e-03, grad_scale: 32.0 2023-11-19 23:08:40,775 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=842873.3333333334, ans=0.1 2023-11-19 23:09:00,690 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 126450 2023-11-19 23:09:05,393 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.030e+01 8.350e+01 8.922e+01 9.859e+01 1.209e+02, threshold=1.784e+02, percent-clipped=0.0 2023-11-19 23:09:18,656 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=843073.3333333334, ans=0.0 2023-11-19 23:09:30,499 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-19 23:09:42,309 INFO [train_asr.py:1262] (1/4) Epoch 11, batch 6250, loss[loss=0.1069, simple_loss=0.1392, pruned_loss=0.02913, audio_tagging_loss=0.008172, over 14970.00 frames. ], tot_loss[loss=0.08407, simple_loss=0.1041, pruned_loss=0.02176, audio_tagging_loss=0.01028, over 3046534.79 frames. ], batch size: 56, lr: 6.29e-03, grad_scale: 32.0 2023-11-19 23:10:04,672 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 126500 2023-11-19 23:10:26,359 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=6.77 vs. 
limit=15.0 2023-11-19 23:10:33,924 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=843473.3333333334, ans=0.125 2023-11-19 23:10:47,069 INFO [train_asr.py:1262] (1/4) Epoch 11, batch 6300, loss[loss=0.09262, simple_loss=0.1215, pruned_loss=0.02369, audio_tagging_loss=0.008196, over 15122.00 frames. ], tot_loss[loss=0.08393, simple_loss=0.1036, pruned_loss=0.02173, audio_tagging_loss=0.01039, over 3040832.24 frames. ], batch size: 55, lr: 6.29e-03, grad_scale: 32.0 2023-11-19 23:10:49,679 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=843540.0, ans=0.035 2023-11-19 23:11:09,955 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 126550 2023-11-19 23:11:14,951 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.441e+01 8.231e+01 8.988e+01 9.749e+01 1.273e+02, threshold=1.798e+02, percent-clipped=0.0 2023-11-19 23:11:17,208 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=6.30 vs. limit=15.0 2023-11-19 23:11:50,040 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=843806.6666666666, ans=0.125 2023-11-19 23:11:52,578 INFO [train_asr.py:1262] (1/4) Epoch 11, batch 6350, loss[loss=0.07571, simple_loss=0.09648, pruned_loss=0.01741, audio_tagging_loss=0.01006, over 13447.00 frames. ], tot_loss[loss=0.08419, simple_loss=0.1037, pruned_loss=0.02186, audio_tagging_loss=0.01047, over 3039853.48 frames. ], batch size: 52, lr: 6.29e-03, grad_scale: 32.0 2023-11-19 23:12:04,609 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=843940.0, ans=0.125 2023-11-19 23:12:06,253 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.whiten, num_groups=1, num_channels=512, metric=9.29 vs. limit=12.0 2023-11-19 23:12:08,312 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=843940.0, ans=0.2 2023-11-19 23:12:14,902 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 126600 2023-11-19 23:12:15,123 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.min_positive, batch_count=843940.0, ans=0.025 2023-11-19 23:12:28,183 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=10.69 vs. limit=15.0 2023-11-19 23:12:30,832 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=844073.3333333334, ans=0.125 2023-11-19 23:12:47,246 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=844140.0, ans=0.125 2023-11-19 23:12:47,750 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=512, metric=20.86 vs. limit=22.5 2023-11-19 23:12:57,576 INFO [train_asr.py:1262] (1/4) Epoch 11, batch 6400, loss[loss=0.09942, simple_loss=0.1208, pruned_loss=0.03105, audio_tagging_loss=0.00798, over 14624.00 frames. ], tot_loss[loss=0.08435, simple_loss=0.104, pruned_loss=0.02187, audio_tagging_loss=0.01051, over 3039751.27 frames. 
], batch size: 55, lr: 6.29e-03, grad_scale: 32.0 2023-11-19 23:13:03,426 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=844206.6666666666, ans=0.125 2023-11-19 23:13:14,177 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=844273.3333333334, ans=0.1 2023-11-19 23:13:17,940 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=844273.3333333334, ans=0.125 2023-11-19 23:13:19,733 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 126650 2023-11-19 23:13:23,603 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=844340.0, ans=0.2 2023-11-19 23:13:25,599 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.681e+01 8.306e+01 8.849e+01 9.605e+01 1.260e+02, threshold=1.770e+02, percent-clipped=0.0 2023-11-19 23:13:34,272 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.whiten.whitening_limit, batch_count=844340.0, ans=12.0 2023-11-19 23:13:35,221 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=844406.6666666666, ans=0.0 2023-11-19 23:14:01,784 INFO [train_asr.py:1262] (1/4) Epoch 11, batch 6450, loss[loss=0.09473, simple_loss=0.104, pruned_loss=0.02765, audio_tagging_loss=0.01509, over 15214.00 frames. ], tot_loss[loss=0.08451, simple_loss=0.1035, pruned_loss=0.02204, audio_tagging_loss=0.0107, over 3036913.81 frames. ], batch size: 57, lr: 6.29e-03, grad_scale: 32.0 2023-11-19 23:14:12,607 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=844540.0, ans=0.2 2023-11-19 23:14:12,640 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=844540.0, ans=0.0 2023-11-19 23:14:12,761 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=844540.0, ans=0.1 2023-11-19 23:14:24,062 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 126700 2023-11-19 23:14:34,641 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=844673.3333333334, ans=0.125 2023-11-19 23:14:46,297 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=844740.0, ans=0.125 2023-11-19 23:14:54,002 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=844806.6666666666, ans=0.125 2023-11-19 23:15:06,438 INFO [train_asr.py:1262] (1/4) Epoch 11, batch 6500, loss[loss=0.09643, simple_loss=0.1097, pruned_loss=0.03101, audio_tagging_loss=0.01056, over 13993.00 frames. ], tot_loss[loss=0.08503, simple_loss=0.1046, pruned_loss=0.02223, audio_tagging_loss=0.0105, over 3043757.12 frames. ], batch size: 53, lr: 6.29e-03, grad_scale: 32.0 2023-11-19 23:15:29,792 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 126750 2023-11-19 23:15:32,681 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=8.67 vs. 
limit=10.0 2023-11-19 23:15:35,916 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.834e+01 8.115e+01 8.787e+01 9.556e+01 1.431e+02, threshold=1.757e+02, percent-clipped=0.0 2023-11-19 23:15:44,916 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=845073.3333333334, ans=0.0 2023-11-19 23:16:07,634 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=5.95 vs. limit=15.0 2023-11-19 23:16:12,391 INFO [train_asr.py:1262] (1/4) Epoch 11, batch 6550, loss[loss=0.1374, simple_loss=0.1722, pruned_loss=0.04551, audio_tagging_loss=0.005775, over 15737.00 frames. ], tot_loss[loss=0.08491, simple_loss=0.1045, pruned_loss=0.02231, audio_tagging_loss=0.01034, over 3044470.88 frames. ], batch size: 54, lr: 6.29e-03, grad_scale: 32.0 2023-11-19 23:16:16,734 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=8.94 vs. limit=15.0 2023-11-19 23:16:30,458 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=845273.3333333334, ans=0.0 2023-11-19 23:16:34,035 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 126800 2023-11-19 23:16:41,426 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=512, metric=5.97 vs. limit=15.0 2023-11-19 23:16:50,292 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=845406.6666666666, ans=0.0 2023-11-19 23:17:01,575 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=845406.6666666666, ans=0.125 2023-11-19 23:17:17,390 INFO [train_asr.py:1262] (1/4) Epoch 11, batch 6600, loss[loss=0.09488, simple_loss=0.1116, pruned_loss=0.0291, audio_tagging_loss=0.009973, over 15482.00 frames. ], tot_loss[loss=0.08466, simple_loss=0.1042, pruned_loss=0.0223, audio_tagging_loss=0.01025, over 3047846.14 frames. ], batch size: 58, lr: 6.28e-03, grad_scale: 32.0 2023-11-19 23:17:30,795 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=845606.6666666666, ans=0.125 2023-11-19 23:17:40,140 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 126850 2023-11-19 23:17:42,690 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=845673.3333333334, ans=0.0 2023-11-19 23:17:46,102 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.644e+01 8.323e+01 8.980e+01 9.690e+01 1.359e+02, threshold=1.796e+02, percent-clipped=0.0 2023-11-19 23:17:56,013 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=3.73 vs. 
limit=15.0 2023-11-19 23:18:08,371 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=845806.6666666666, ans=0.07 2023-11-19 23:18:09,605 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.min_positive, batch_count=845806.6666666666, ans=0.025 2023-11-19 23:18:16,501 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=845806.6666666666, ans=0.0 2023-11-19 23:18:22,259 INFO [train_asr.py:1262] (1/4) Epoch 11, batch 6650, loss[loss=0.08261, simple_loss=0.09019, pruned_loss=0.02557, audio_tagging_loss=0.01195, over 14957.00 frames. ], tot_loss[loss=0.08418, simple_loss=0.1036, pruned_loss=0.02214, audio_tagging_loss=0.01024, over 3037897.98 frames. ], batch size: 59, lr: 6.28e-03, grad_scale: 32.0 2023-11-19 23:18:35,540 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=845940.0, ans=0.09899494936611666 2023-11-19 23:18:38,041 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=845940.0, ans=0.0 2023-11-19 23:18:43,956 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 126900 2023-11-19 23:18:48,330 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=5.65 vs. limit=10.0 2023-11-19 23:18:48,969 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=846006.6666666666, ans=0.1 2023-11-19 23:18:53,781 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=846006.6666666666, ans=0.1 2023-11-19 23:19:23,521 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=846140.0, ans=0.0 2023-11-19 23:19:26,935 INFO [train_asr.py:1262] (1/4) Epoch 11, batch 6700, loss[loss=0.0732, simple_loss=0.08605, pruned_loss=0.0169, audio_tagging_loss=0.01327, over 15570.00 frames. ], tot_loss[loss=0.08423, simple_loss=0.1039, pruned_loss=0.02212, audio_tagging_loss=0.01018, over 3039119.40 frames. ], batch size: 58, lr: 6.28e-03, grad_scale: 16.0 2023-11-19 23:19:49,126 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 126950 2023-11-19 23:19:56,111 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.min_abs, batch_count=846340.0, ans=0.5 2023-11-19 23:19:57,072 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.564e+01 8.084e+01 8.667e+01 9.126e+01 1.226e+02, threshold=1.733e+02, percent-clipped=0.0 2023-11-19 23:19:58,792 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=846340.0, ans=0.125 2023-11-19 23:20:32,105 INFO [train_asr.py:1262] (1/4) Epoch 11, batch 6750, loss[loss=0.0576, simple_loss=0.06597, pruned_loss=0.01377, audio_tagging_loss=0.01085, over 13835.00 frames. ], tot_loss[loss=0.08396, simple_loss=0.1036, pruned_loss=0.02201, audio_tagging_loss=0.01016, over 3041015.65 frames. ], batch size: 55, lr: 6.28e-03, grad_scale: 16.0 2023-11-19 23:20:48,930 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=3.19 vs. 
limit=15.0 2023-11-19 23:20:53,342 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 127000 2023-11-19 23:20:59,547 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.49 vs. limit=6.0 2023-11-19 23:21:12,748 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=846740.0, ans=0.0 2023-11-19 23:21:17,031 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=846740.0, ans=0.125 2023-11-19 23:21:33,206 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=9.70 vs. limit=15.0 2023-11-19 23:21:36,259 INFO [train_asr.py:1262] (1/4) Epoch 11, batch 6800, loss[loss=0.09333, simple_loss=0.1306, pruned_loss=0.02211, audio_tagging_loss=0.005918, over 15303.00 frames. ], tot_loss[loss=0.08328, simple_loss=0.1028, pruned_loss=0.02171, audio_tagging_loss=0.01015, over 3040565.20 frames. ], batch size: 55, lr: 6.28e-03, grad_scale: 32.0 2023-11-19 23:21:41,600 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=846873.3333333334, ans=0.2 2023-11-19 23:21:55,457 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=846940.0, ans=0.125 2023-11-19 23:21:57,892 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 127050 2023-11-19 23:22:05,920 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.069e+01 8.284e+01 8.995e+01 1.009e+02 1.556e+02, threshold=1.799e+02, percent-clipped=0.0 2023-11-19 23:22:21,176 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=7.02 vs. limit=15.0 2023-11-19 23:22:35,352 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=847140.0, ans=0.125 2023-11-19 23:22:41,146 INFO [train_asr.py:1262] (1/4) Epoch 11, batch 6850, loss[loss=0.09028, simple_loss=0.1151, pruned_loss=0.02235, audio_tagging_loss=0.01037, over 15145.00 frames. ], tot_loss[loss=0.08267, simple_loss=0.1023, pruned_loss=0.02145, audio_tagging_loss=0.0101, over 3039829.42 frames. ], batch size: 57, lr: 6.28e-03, grad_scale: 32.0 2023-11-19 23:22:50,000 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=847206.6666666666, ans=0.1 2023-11-19 23:23:03,154 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 127100 2023-11-19 23:23:25,649 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=847406.6666666666, ans=0.125 2023-11-19 23:23:38,998 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=847473.3333333334, ans=0.0 2023-11-19 23:23:45,642 INFO [train_asr.py:1262] (1/4) Epoch 11, batch 6900, loss[loss=0.1051, simple_loss=0.1349, pruned_loss=0.02763, audio_tagging_loss=0.01007, over 15473.00 frames. ], tot_loss[loss=0.08337, simple_loss=0.1035, pruned_loss=0.02158, audio_tagging_loss=0.01002, over 3043234.08 frames. 
], batch size: 58, lr: 6.28e-03, grad_scale: 32.0 2023-11-19 23:24:04,486 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=847606.6666666666, ans=0.125 2023-11-19 23:24:07,949 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 127150 2023-11-19 23:24:16,950 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.980e+01 8.238e+01 8.922e+01 9.730e+01 1.552e+02, threshold=1.784e+02, percent-clipped=0.0 2023-11-19 23:24:32,705 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=847740.0, ans=0.2 2023-11-19 23:24:37,222 WARNING [train_asr.py:1506] (1/4) Exclude cut with ID unbalanced/Xez1ffAcb0w_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-19 23:24:39,881 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=847806.6666666666, ans=0.0 2023-11-19 23:24:44,589 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.whiten, num_groups=1, num_channels=512, metric=12.27 vs. limit=12.0 2023-11-19 23:24:50,544 INFO [train_asr.py:1262] (1/4) Epoch 11, batch 6950, loss[loss=0.07606, simple_loss=0.08592, pruned_loss=0.02073, audio_tagging_loss=0.01237, over 15134.00 frames. ], tot_loss[loss=0.08355, simple_loss=0.1035, pruned_loss=0.02173, audio_tagging_loss=0.01008, over 3041107.23 frames. ], batch size: 58, lr: 6.28e-03, grad_scale: 16.0 2023-11-19 23:24:54,872 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=13.05 vs. limit=15.0 2023-11-19 23:25:05,062 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=847940.0, ans=0.0 2023-11-19 23:25:12,270 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 127200 2023-11-19 23:25:13,654 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=847940.0, ans=0.125 2023-11-19 23:25:18,041 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=848006.6666666666, ans=0.125 2023-11-19 23:25:20,908 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=848006.6666666666, ans=0.0 2023-11-19 23:25:26,223 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=8.95 vs. limit=15.0 2023-11-19 23:25:55,821 INFO [train_asr.py:1262] (1/4) Epoch 11, batch 7000, loss[loss=0.07831, simple_loss=0.08732, pruned_loss=0.01906, audio_tagging_loss=0.01559, over 15071.00 frames. ], tot_loss[loss=0.0828, simple_loss=0.1023, pruned_loss=0.02142, audio_tagging_loss=0.01024, over 3041779.52 frames. 
], batch size: 57, lr: 6.27e-03, grad_scale: 16.0 2023-11-19 23:26:00,972 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=848206.6666666666, ans=0.125 2023-11-19 23:26:04,904 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=9.15 vs. limit=10.0 2023-11-19 23:26:17,194 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 127250 2023-11-19 23:26:18,716 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=848273.3333333334, ans=0.1 2023-11-19 23:26:26,374 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.925e+01 8.193e+01 9.050e+01 1.000e+02 1.255e+02, threshold=1.810e+02, percent-clipped=0.0 2023-11-19 23:26:44,001 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=848406.6666666666, ans=0.125 2023-11-19 23:26:49,986 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=848473.3333333334, ans=0.0 2023-11-19 23:27:00,066 INFO [train_asr.py:1262] (1/4) Epoch 11, batch 7050, loss[loss=0.07915, simple_loss=0.09989, pruned_loss=0.02006, audio_tagging_loss=0.009142, over 14806.00 frames. ], tot_loss[loss=0.08272, simple_loss=0.1021, pruned_loss=0.02133, audio_tagging_loss=0.01035, over 3036087.94 frames. ], batch size: 57, lr: 6.27e-03, grad_scale: 16.0 2023-11-19 23:27:15,286 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=848606.6666666666, ans=0.125 2023-11-19 23:27:19,952 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=848606.6666666666, ans=0.2 2023-11-19 23:27:21,200 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=848606.6666666666, ans=0.2 2023-11-19 23:27:22,375 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 127300 2023-11-19 23:28:00,391 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=848806.6666666666, ans=0.0 2023-11-19 23:28:03,951 INFO [train_asr.py:1262] (1/4) Epoch 11, batch 7100, loss[loss=0.07175, simple_loss=0.08574, pruned_loss=0.01617, audio_tagging_loss=0.01271, over 15366.00 frames. ], tot_loss[loss=0.0836, simple_loss=0.1031, pruned_loss=0.0216, audio_tagging_loss=0.01044, over 3039277.83 frames. 
], batch size: 59, lr: 6.27e-03, grad_scale: 16.0 2023-11-19 23:28:26,302 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 127350 2023-11-19 23:28:30,326 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=849006.6666666666, ans=0.125 2023-11-19 23:28:34,691 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.672e+01 8.042e+01 8.769e+01 9.719e+01 1.215e+02, threshold=1.754e+02, percent-clipped=0.0 2023-11-19 23:29:08,429 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=849206.6666666666, ans=0.0 2023-11-19 23:29:08,529 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=849206.6666666666, ans=0.0 2023-11-19 23:29:09,328 INFO [train_asr.py:1262] (1/4) Epoch 11, batch 7150, loss[loss=0.07684, simple_loss=0.09344, pruned_loss=0.0176, audio_tagging_loss=0.01252, over 16097.00 frames. ], tot_loss[loss=0.08348, simple_loss=0.1031, pruned_loss=0.02147, audio_tagging_loss=0.01047, over 3042558.51 frames. ], batch size: 59, lr: 6.27e-03, grad_scale: 16.0 2023-11-19 23:29:20,556 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=849273.3333333334, ans=0.1 2023-11-19 23:29:24,157 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=849273.3333333334, ans=0.1 2023-11-19 23:29:30,721 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 127400 2023-11-19 23:29:31,280 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=20.80 vs. limit=22.5 2023-11-19 23:29:42,248 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=849340.0, ans=0.125 2023-11-19 23:29:56,099 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=849406.6666666666, ans=0.0 2023-11-19 23:30:13,348 INFO [train_asr.py:1262] (1/4) Epoch 11, batch 7200, loss[loss=0.09501, simple_loss=0.0979, pruned_loss=0.03142, audio_tagging_loss=0.01464, over 13279.00 frames. ], tot_loss[loss=0.0837, simple_loss=0.1031, pruned_loss=0.02162, audio_tagging_loss=0.01051, over 3040906.42 frames. ], batch size: 52, lr: 6.27e-03, grad_scale: 32.0 2023-11-19 23:30:27,978 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=849606.6666666666, ans=0.125 2023-11-19 23:30:35,608 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 127450 2023-11-19 23:30:45,309 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.244e+01 8.591e+01 9.847e+01 1.111e+02 1.455e+02, threshold=1.969e+02, percent-clipped=0.0 2023-11-19 23:30:52,277 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=8.52 vs. limit=15.0 2023-11-19 23:31:18,085 INFO [train_asr.py:1262] (1/4) Epoch 11, batch 7250, loss[loss=0.0668, simple_loss=0.07788, pruned_loss=0.01584, audio_tagging_loss=0.01203, over 14549.00 frames. ], tot_loss[loss=0.08362, simple_loss=0.1031, pruned_loss=0.02151, audio_tagging_loss=0.01054, over 3046622.50 frames. 
], batch size: 56, lr: 6.27e-03, grad_scale: 32.0 2023-11-19 23:31:20,849 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=849873.3333333334, ans=0.125 2023-11-19 23:31:22,833 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=849873.3333333334, ans=0.1 2023-11-19 23:31:29,844 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=849940.0, ans=0.2 2023-11-19 23:31:40,789 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 127500 2023-11-19 23:32:02,412 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=850073.3333333334, ans=0.125 2023-11-19 23:32:07,729 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=512, metric=23.28 vs. limit=22.5 2023-11-19 23:32:08,931 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.whiten, num_groups=1, num_channels=512, metric=10.44 vs. limit=12.0 2023-11-19 23:32:09,749 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=850140.0, ans=0.0 2023-11-19 23:32:12,943 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=850140.0, ans=0.1 2023-11-19 23:32:13,501 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.self_attn2.whiten, num_groups=1, num_channels=192, metric=11.44 vs. limit=22.5 2023-11-19 23:32:20,186 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=850140.0, ans=0.0 2023-11-19 23:32:23,469 INFO [train_asr.py:1262] (1/4) Epoch 11, batch 7300, loss[loss=0.06784, simple_loss=0.07188, pruned_loss=0.01846, audio_tagging_loss=0.01344, over 15989.00 frames. ], tot_loss[loss=0.08342, simple_loss=0.1029, pruned_loss=0.02151, audio_tagging_loss=0.01045, over 3044117.55 frames. ], batch size: 61, lr: 6.27e-03, grad_scale: 32.0 2023-11-19 23:32:45,007 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 127550 2023-11-19 23:32:53,440 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.382e+01 8.309e+01 8.798e+01 9.625e+01 1.232e+02, threshold=1.760e+02, percent-clipped=0.0 2023-11-19 23:32:56,381 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=850340.0, ans=0.125 2023-11-19 23:33:01,743 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=850406.6666666666, ans=0.0 2023-11-19 23:33:22,908 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.51 vs. limit=6.0 2023-11-19 23:33:26,572 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=850540.0, ans=0.2 2023-11-19 23:33:26,611 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=850540.0, ans=0.125 2023-11-19 23:33:27,447 INFO [train_asr.py:1262] (1/4) Epoch 11, batch 7350, loss[loss=0.08446, simple_loss=0.1156, pruned_loss=0.01936, audio_tagging_loss=0.007275, over 15193.00 frames. 
], tot_loss[loss=0.08316, simple_loss=0.1029, pruned_loss=0.02143, audio_tagging_loss=0.01027, over 3037409.78 frames. ], batch size: 57, lr: 6.27e-03, grad_scale: 32.0 2023-11-19 23:33:29,120 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-11-19 23:33:38,121 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=850540.0, ans=0.125 2023-11-19 23:33:46,672 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=2.642e-03 2023-11-19 23:33:48,896 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 127600 2023-11-19 23:33:52,066 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=850673.3333333334, ans=0.0 2023-11-19 23:33:58,476 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=12.60 vs. limit=15.0 2023-11-19 23:34:00,264 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=11.49 vs. limit=15.0 2023-11-19 23:34:25,464 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=850806.6666666666, ans=0.125 2023-11-19 23:34:31,287 INFO [train_asr.py:1262] (1/4) Epoch 11, batch 7400, loss[loss=0.1061, simple_loss=0.1337, pruned_loss=0.03171, audio_tagging_loss=0.0075, over 15916.00 frames. ], tot_loss[loss=0.08372, simple_loss=0.1038, pruned_loss=0.02166, audio_tagging_loss=0.01014, over 3036262.91 frames. ], batch size: 57, lr: 6.26e-03, grad_scale: 32.0 2023-11-19 23:34:53,556 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 127650 2023-11-19 23:34:57,909 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=851006.6666666666, ans=0.125 2023-11-19 23:34:58,353 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=11.96 vs. limit=15.0 2023-11-19 23:35:02,494 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.769e+01 8.438e+01 8.975e+01 9.970e+01 1.315e+02, threshold=1.795e+02, percent-clipped=0.0 2023-11-19 23:35:10,671 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=851073.3333333334, ans=0.125 2023-11-19 23:35:15,867 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=851073.3333333334, ans=0.2 2023-11-19 23:35:35,668 INFO [train_asr.py:1262] (1/4) Epoch 11, batch 7450, loss[loss=0.08817, simple_loss=0.1192, pruned_loss=0.02003, audio_tagging_loss=0.008551, over 15936.00 frames. ], tot_loss[loss=0.08349, simple_loss=0.1034, pruned_loss=0.02162, audio_tagging_loss=0.01019, over 3040704.56 frames. 
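The very frequent scaling.py:213 records each track a ScheduledFloat, a scalar hyperparameter (dropout_p, skip rates, balancer probs, bypass scale_min and so on) whose current value, printed as ans=..., is a function of batch_count. That is why, this deep into the run (batch_count around 850k), the skip rates all read 0.0 while the balancer probs all read 0.125. A minimal stand-in, assuming a piecewise-linear schedule clamped at its endpoints; the breakpoints below are illustrative, not taken from the recipe:

class ScheduledFloatSketch:
    # Piecewise-linear value as a function of batch_count, held constant
    # outside the first and last breakpoints.
    def __init__(self, *points):
        self.points = sorted(points)   # (batch_count, value) pairs

    def __call__(self, batch_count):
        pts = self.points
        if batch_count <= pts[0][0]:
            return pts[0][1]
        for (x0, y0), (x1, y1) in zip(pts, pts[1:]):
            if batch_count <= x1:
                return y0 + (batch_count - x0) / (x1 - x0) * (y1 - y0)
        return pts[-1][1]

# Hypothetical conv_skip_rate: decays from 0.1 to 0.0 over the first 20k
# batches, then stays there, so at batch_count=848806.67 it logs ans=0.0.
conv_skip_rate = ScheduledFloatSketch((0.0, 0.1), (20000.0, 0.0))
print(conv_skip_rate(848806.67))   # 0.0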
], batch size: 57, lr: 6.26e-03, grad_scale: 32.0 2023-11-19 23:35:57,640 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 127700 2023-11-19 23:36:09,401 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=851340.0, ans=0.0 2023-11-19 23:36:36,252 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=512, metric=4.17 vs. limit=15.0 2023-11-19 23:36:38,356 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=851473.3333333334, ans=0.1 2023-11-19 23:36:40,529 INFO [train_asr.py:1262] (1/4) Epoch 11, batch 7500, loss[loss=0.07008, simple_loss=0.09417, pruned_loss=0.01349, audio_tagging_loss=0.0095, over 15149.00 frames. ], tot_loss[loss=0.08409, simple_loss=0.1042, pruned_loss=0.02186, audio_tagging_loss=0.01014, over 3051399.21 frames. ], batch size: 56, lr: 6.26e-03, grad_scale: 32.0 2023-11-19 23:36:46,992 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=851540.0, ans=0.0 2023-11-19 23:36:47,286 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=1.98 vs. limit=6.0 2023-11-19 23:37:02,028 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 127750 2023-11-19 23:37:11,663 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=512, metric=4.49 vs. limit=15.0 2023-11-19 23:37:12,212 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.986e+01 8.268e+01 8.982e+01 9.702e+01 1.380e+02, threshold=1.796e+02, percent-clipped=0.0 2023-11-19 23:37:39,751 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=851806.6666666666, ans=0.125 2023-11-19 23:37:42,470 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=851806.6666666666, ans=0.0 2023-11-19 23:37:44,455 INFO [train_asr.py:1262] (1/4) Epoch 11, batch 7550, loss[loss=0.08544, simple_loss=0.108, pruned_loss=0.02029, audio_tagging_loss=0.01116, over 13977.00 frames. ], tot_loss[loss=0.08372, simple_loss=0.1038, pruned_loss=0.0217, audio_tagging_loss=0.0101, over 3054511.62 frames. ], batch size: 53, lr: 6.26e-03, grad_scale: 16.0 2023-11-19 23:37:50,629 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=851873.3333333334, ans=0.125 2023-11-19 23:38:06,328 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 127800 2023-11-19 23:38:10,823 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.whiten, num_groups=1, num_channels=512, metric=11.00 vs. 
limit=12.0 2023-11-19 23:38:26,442 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.min_abs, batch_count=852073.3333333334, ans=0.5 2023-11-19 23:38:41,026 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=852140.0, ans=0.125 2023-11-19 23:38:41,096 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=852140.0, ans=0.0 2023-11-19 23:38:42,639 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=9.82 vs. limit=15.0 2023-11-19 23:38:48,576 INFO [train_asr.py:1262] (1/4) Epoch 11, batch 7600, loss[loss=0.05502, simple_loss=0.06241, pruned_loss=0.01302, audio_tagging_loss=0.0108, over 16136.00 frames. ], tot_loss[loss=0.08332, simple_loss=0.1028, pruned_loss=0.02173, audio_tagging_loss=0.01018, over 3059592.42 frames. ], batch size: 62, lr: 6.26e-03, grad_scale: 32.0 2023-11-19 23:38:50,454 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.66 vs. limit=6.0 2023-11-19 23:38:56,867 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=852206.6666666666, ans=0.0 2023-11-19 23:39:02,779 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=852273.3333333334, ans=0.125 2023-11-19 23:39:10,679 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 127850 2023-11-19 23:39:19,827 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=852340.0, ans=0.0 2023-11-19 23:39:20,827 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.143e+01 8.303e+01 8.868e+01 9.604e+01 1.243e+02, threshold=1.774e+02, percent-clipped=0.0 2023-11-19 23:39:49,094 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=852473.3333333334, ans=0.2 2023-11-19 23:39:50,466 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=852473.3333333334, ans=0.0 2023-11-19 23:39:52,477 INFO [train_asr.py:1262] (1/4) Epoch 11, batch 7650, loss[loss=0.07616, simple_loss=0.1003, pruned_loss=0.01607, audio_tagging_loss=0.00995, over 14667.00 frames. ], tot_loss[loss=0.0837, simple_loss=0.1034, pruned_loss=0.02184, audio_tagging_loss=0.01015, over 3052078.18 frames. ], batch size: 53, lr: 6.26e-03, grad_scale: 32.0 2023-11-19 23:40:14,495 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 127900 2023-11-19 23:40:18,291 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=852673.3333333334, ans=0.0 2023-11-19 23:40:41,278 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=852740.0, ans=0.125 2023-11-19 23:40:57,029 INFO [train_asr.py:1262] (1/4) Epoch 11, batch 7700, loss[loss=0.06094, simple_loss=0.0749, pruned_loss=0.01517, audio_tagging_loss=0.008317, over 15314.00 frames. ], tot_loss[loss=0.08379, simple_loss=0.1037, pruned_loss=0.0218, audio_tagging_loss=0.01016, over 3055221.78 frames. 
], batch size: 58, lr: 6.26e-03, grad_scale: 16.0 2023-11-19 23:41:19,328 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 127950 2023-11-19 23:41:22,334 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=6.57 vs. limit=12.0 2023-11-19 23:41:31,369 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.499e+01 8.407e+01 9.041e+01 9.727e+01 1.362e+02, threshold=1.808e+02, percent-clipped=0.0 2023-11-19 23:41:37,054 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=853073.3333333334, ans=0.015 2023-11-19 23:41:51,247 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=853140.0, ans=0.1 2023-11-19 23:42:01,276 INFO [train_asr.py:1262] (1/4) Epoch 11, batch 7750, loss[loss=0.1098, simple_loss=0.1434, pruned_loss=0.02919, audio_tagging_loss=0.008893, over 14768.00 frames. ], tot_loss[loss=0.08399, simple_loss=0.104, pruned_loss=0.02179, audio_tagging_loss=0.0102, over 3055350.99 frames. ], batch size: 52, lr: 6.26e-03, grad_scale: 8.0 2023-11-19 23:42:22,896 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 128000 2023-11-19 23:42:39,099 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=853340.0, ans=0.0 2023-11-19 23:42:47,969 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=853406.6666666666, ans=0.2 2023-11-19 23:42:54,472 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=853406.6666666666, ans=0.125 2023-11-19 23:43:06,577 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=14.84 vs. limit=22.5 2023-11-19 23:43:09,387 INFO [train_asr.py:1262] (1/4) Epoch 11, batch 7800, loss[loss=0.09725, simple_loss=0.1181, pruned_loss=0.02901, audio_tagging_loss=0.00919, over 15875.00 frames. ], tot_loss[loss=0.08396, simple_loss=0.1038, pruned_loss=0.0218, audio_tagging_loss=0.01026, over 3048061.72 frames. ], batch size: 57, lr: 6.25e-03, grad_scale: 8.0 2023-11-19 23:43:10,303 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=8.98 vs. limit=10.0 2023-11-19 23:43:31,540 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 128050 2023-11-19 23:43:36,551 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=853673.3333333334, ans=0.125 2023-11-19 23:43:41,952 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=853673.3333333334, ans=0.125 2023-11-19 23:43:44,191 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.125e+01 8.217e+01 8.897e+01 9.655e+01 1.501e+02, threshold=1.779e+02, percent-clipped=0.0 2023-11-19 23:43:56,274 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.42 vs. limit=15.0 2023-11-19 23:44:14,101 INFO [train_asr.py:1262] (1/4) Epoch 11, batch 7850, loss[loss=0.06556, simple_loss=0.08392, pruned_loss=0.01453, audio_tagging_loss=0.009061, over 14803.00 frames. 
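The grad_scale field at the end of each tot_loss record comes from fp16 mixed-precision training: it is the dynamic loss scale, which grows after a long enough run of overflow-free steps and is cut back when gradients overflow. That is visible in this stretch of the log as it steps between 32.0, 16.0 and 8.0 before recovering. A generic sketch with torch's GradScaler showing where such a number comes from (standard AMP usage, not the recipe's exact training loop):

import torch

scaler = torch.cuda.amp.GradScaler(init_scale=16.0)

def train_step(model, optimizer, features, loss_fn):
    optimizer.zero_grad()
    with torch.cuda.amp.autocast():
        loss = loss_fn(model(features))
    scaler.scale(loss).backward()   # backward on the scaled loss
    scaler.step(optimizer)          # unscales grads; skips the step on inf/nan
    scaler.update()                 # shrinks the scale on overflow, grows it otherwise
    return scaler.get_scale()       # the value these records log as grad_scale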
], tot_loss[loss=0.0838, simple_loss=0.1034, pruned_loss=0.0217, audio_tagging_loss=0.01038, over 3045938.21 frames. ], batch size: 56, lr: 6.25e-03, grad_scale: 8.0 2023-11-19 23:44:25,452 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=853940.0, ans=0.2 2023-11-19 23:44:29,049 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-19 23:44:33,307 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=853940.0, ans=0.0 2023-11-19 23:44:35,538 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 128100 2023-11-19 23:45:13,507 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.conv_module1.whiten, num_groups=1, num_channels=192, metric=12.79 vs. limit=15.0 2023-11-19 23:45:17,512 INFO [train_asr.py:1262] (1/4) Epoch 11, batch 7900, loss[loss=0.0842, simple_loss=0.1041, pruned_loss=0.02093, audio_tagging_loss=0.01123, over 16034.00 frames. ], tot_loss[loss=0.08408, simple_loss=0.1036, pruned_loss=0.02181, audio_tagging_loss=0.01048, over 3044113.49 frames. ], batch size: 59, lr: 6.25e-03, grad_scale: 8.0 2023-11-19 23:45:30,037 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=854273.3333333334, ans=0.125 2023-11-19 23:45:39,338 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 128150 2023-11-19 23:45:42,136 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=854340.0, ans=0.0 2023-11-19 23:45:42,694 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=11.30 vs. limit=15.0 2023-11-19 23:45:51,686 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=512, metric=3.70 vs. limit=15.0 2023-11-19 23:45:52,895 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.807e+01 8.209e+01 8.987e+01 9.607e+01 1.593e+02, threshold=1.797e+02, percent-clipped=0.0 2023-11-19 23:46:13,129 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=854473.3333333334, ans=0.125 2023-11-19 23:46:14,441 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=854473.3333333334, ans=0.0 2023-11-19 23:46:20,323 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=512, metric=20.42 vs. limit=22.5 2023-11-19 23:46:22,318 INFO [train_asr.py:1262] (1/4) Epoch 11, batch 7950, loss[loss=0.07161, simple_loss=0.09069, pruned_loss=0.01354, audio_tagging_loss=0.01272, over 15777.00 frames. ], tot_loss[loss=0.08422, simple_loss=0.1033, pruned_loss=0.02203, audio_tagging_loss=0.01056, over 3042733.71 frames. ], batch size: 59, lr: 6.25e-03, grad_scale: 8.0 2023-11-19 23:46:38,894 WARNING [train_asr.py:1506] (1/4) Exclude cut with ID unbalanced/uQjH4tNUZ_g_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. 
Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-19 23:46:39,043 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=854606.6666666666, ans=0.125 2023-11-19 23:46:44,291 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 128200 2023-11-19 23:46:48,413 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=854673.3333333334, ans=0.125 2023-11-19 23:47:18,044 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=854806.6666666666, ans=0.125 2023-11-19 23:47:19,461 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=854806.6666666666, ans=0.125 2023-11-19 23:47:21,664 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=854806.6666666666, ans=0.125 2023-11-19 23:47:22,897 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=854806.6666666666, ans=0.1 2023-11-19 23:47:26,473 INFO [train_asr.py:1262] (1/4) Epoch 11, batch 8000, loss[loss=0.1014, simple_loss=0.1215, pruned_loss=0.02855, audio_tagging_loss=0.01216, over 14847.00 frames. ], tot_loss[loss=0.08434, simple_loss=0.1034, pruned_loss=0.02203, audio_tagging_loss=0.01061, over 3042024.35 frames. ], batch size: 54, lr: 6.25e-03, grad_scale: 16.0 2023-11-19 23:47:44,347 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=13.32 vs. limit=15.0 2023-11-19 23:47:45,771 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=854940.0, ans=0.0 2023-11-19 23:47:46,944 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=854940.0, ans=0.2 2023-11-19 23:47:49,104 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 128250 2023-11-19 23:47:54,618 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=855006.6666666666, ans=0.0 2023-11-19 23:48:01,615 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.413e+01 8.250e+01 9.015e+01 9.647e+01 1.325e+02, threshold=1.803e+02, percent-clipped=0.0 2023-11-19 23:48:11,633 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=855073.3333333334, ans=0.0 2023-11-19 23:48:14,029 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=855073.3333333334, ans=0.125 2023-11-19 23:48:17,010 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module2.whiten, num_groups=1, num_channels=512, metric=7.49 vs. 
limit=15.0 2023-11-19 23:48:21,209 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=855140.0, ans=0.1 2023-11-19 23:48:31,453 INFO [train_asr.py:1262] (1/4) Epoch 11, batch 8050, loss[loss=0.08695, simple_loss=0.1079, pruned_loss=0.02331, audio_tagging_loss=0.009679, over 15109.00 frames. ], tot_loss[loss=0.08349, simple_loss=0.1024, pruned_loss=0.02163, audio_tagging_loss=0.01065, over 3038111.25 frames. ], batch size: 56, lr: 6.25e-03, grad_scale: 16.0 2023-11-19 23:48:34,014 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.min_abs, batch_count=855206.6666666666, ans=0.5 2023-11-19 23:48:53,368 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 128300 2023-11-19 23:48:53,469 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=855273.3333333334, ans=0.125 2023-11-19 23:49:00,906 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=855340.0, ans=0.125 2023-11-19 23:49:04,342 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=855340.0, ans=0.1 2023-11-19 23:49:05,586 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=855340.0, ans=0.0 2023-11-19 23:49:07,752 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=13.37 vs. limit=15.0 2023-11-19 23:49:08,742 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=855406.6666666666, ans=0.125 2023-11-19 23:49:31,559 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=855473.3333333334, ans=0.0 2023-11-19 23:49:35,451 INFO [train_asr.py:1262] (1/4) Epoch 11, batch 8100, loss[loss=0.0862, simple_loss=0.1045, pruned_loss=0.0201, audio_tagging_loss=0.01383, over 15319.00 frames. ], tot_loss[loss=0.08348, simple_loss=0.1023, pruned_loss=0.02169, audio_tagging_loss=0.01065, over 3041876.86 frames. ], batch size: 55, lr: 6.25e-03, grad_scale: 16.0 2023-11-19 23:49:56,756 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 128350 2023-11-19 23:50:09,347 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.585e+01 8.405e+01 9.063e+01 9.959e+01 1.355e+02, threshold=1.813e+02, percent-clipped=0.0 2023-11-19 23:50:16,162 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=855740.0, ans=0.125 2023-11-19 23:50:25,249 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=855806.6666666666, ans=0.0 2023-11-19 23:50:37,053 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=855873.3333333334, ans=0.1 2023-11-19 23:50:38,125 INFO [train_asr.py:1262] (1/4) Epoch 11, batch 8150, loss[loss=0.06221, simple_loss=0.07124, pruned_loss=0.01478, audio_tagging_loss=0.01181, over 15971.00 frames. ], tot_loss[loss=0.08419, simple_loss=0.1037, pruned_loss=0.02191, audio_tagging_loss=0.01042, over 3051953.46 frames. 
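The WARNING above (and the identical ones elsewhere in this log) filters out one-second AudioSet clips that carry only a dummy transcript: 100 feature frames shrink to 23 after the encoder's roughly 4x convolutional subsampling, and a transducer loss cannot align 24 BPE tokens to 23 frames, so the cut is excluded rather than allowed to produce a degenerate loss. A sketch of that check; the subsampling arithmetic ((T - 7) // 2 + 1) // 2 is an assumption that happens to reproduce the logged 100 -> 23:

def frames_after_subsampling(num_frames):
    # Assumed Conv2dSubsampling-style arithmetic, about a 4x reduction.
    return ((num_frames - 7) // 2 + 1) // 2

def keep_cut(num_frames, num_tokens):
    # The transducer needs at least one output frame per token.
    return frames_after_subsampling(num_frames) >= num_tokens

print(frames_after_subsampling(100))   # 23, as in the warning
print(keep_cut(100, 24))               # False -> "Exclude cut ..."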
], batch size: 65, lr: 6.25e-03, grad_scale: 16.0 2023-11-19 23:50:38,439 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=855873.3333333334, ans=0.0 2023-11-19 23:50:50,437 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=512, metric=20.76 vs. limit=22.5 2023-11-19 23:51:00,901 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 128400 2023-11-19 23:51:09,986 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=856006.6666666666, ans=0.125 2023-11-19 23:51:23,952 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=856073.3333333334, ans=0.2 2023-11-19 23:51:42,598 INFO [train_asr.py:1262] (1/4) Epoch 11, batch 8200, loss[loss=0.06748, simple_loss=0.08013, pruned_loss=0.01489, audio_tagging_loss=0.01253, over 15122.00 frames. ], tot_loss[loss=0.08361, simple_loss=0.1032, pruned_loss=0.02165, audio_tagging_loss=0.01035, over 3050687.80 frames. ], batch size: 57, lr: 6.24e-03, grad_scale: 16.0 2023-11-19 23:51:45,067 WARNING [train_asr.py:1506] (1/4) Exclude cut with ID unbalanced/8C7biyx9TQ4_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-19 23:52:05,181 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 128450 2023-11-19 23:52:12,870 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=856340.0, ans=0.125 2023-11-19 23:52:17,387 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.991e+01 8.248e+01 8.899e+01 9.586e+01 1.451e+02, threshold=1.780e+02, percent-clipped=0.0 2023-11-19 23:52:42,027 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=856473.3333333334, ans=0.125 2023-11-19 23:52:48,581 INFO [train_asr.py:1262] (1/4) Epoch 11, batch 8250, loss[loss=0.07842, simple_loss=0.09966, pruned_loss=0.01988, audio_tagging_loss=0.008708, over 15210.00 frames. ], tot_loss[loss=0.08367, simple_loss=0.1037, pruned_loss=0.02165, audio_tagging_loss=0.01018, over 3052397.89 frames. ], batch size: 55, lr: 6.24e-03, grad_scale: 16.0 2023-11-19 23:52:59,621 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=856606.6666666666, ans=0.0 2023-11-19 23:53:01,265 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=10.01 vs. limit=15.0 2023-11-19 23:53:06,871 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=856606.6666666666, ans=0.07 2023-11-19 23:53:07,331 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=13.74 vs. 
limit=15.0 2023-11-19 23:53:09,892 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 128500 2023-11-19 23:53:27,935 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=856740.0, ans=0.1 2023-11-19 23:53:33,851 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=856740.0, ans=0.125 2023-11-19 23:53:35,289 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=8.83 vs. limit=15.0 2023-11-19 23:53:44,332 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=856806.6666666666, ans=0.125 2023-11-19 23:53:51,365 INFO [train_asr.py:1262] (1/4) Epoch 11, batch 8300, loss[loss=0.07731, simple_loss=0.09246, pruned_loss=0.02013, audio_tagging_loss=0.01095, over 13649.00 frames. ], tot_loss[loss=0.08319, simple_loss=0.1032, pruned_loss=0.02141, audio_tagging_loss=0.01019, over 3062306.88 frames. ], batch size: 52, lr: 6.24e-03, grad_scale: 16.0 2023-11-19 23:53:59,596 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=856873.3333333334, ans=0.0 2023-11-19 23:54:00,921 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=11.75 vs. limit=15.0 2023-11-19 23:54:00,986 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.26 vs. limit=6.0 2023-11-19 23:54:12,773 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 128550 2023-11-19 23:54:13,246 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=4.75 vs. limit=15.0 2023-11-19 23:54:20,486 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=2.99 vs. limit=15.0 2023-11-19 23:54:27,919 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.671e+01 8.283e+01 8.806e+01 9.666e+01 1.225e+02, threshold=1.761e+02, percent-clipped=0.0 2023-11-19 23:54:29,362 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=857073.3333333334, ans=0.125 2023-11-19 23:54:31,776 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=857073.3333333334, ans=0.125 2023-11-19 23:54:35,984 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=8.66 vs. limit=15.0 2023-11-19 23:54:47,379 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=2.71 vs. limit=15.0 2023-11-19 23:54:55,407 INFO [train_asr.py:1262] (1/4) Epoch 11, batch 8350, loss[loss=0.0769, simple_loss=0.09514, pruned_loss=0.01802, audio_tagging_loss=0.01131, over 14935.00 frames. ], tot_loss[loss=0.08302, simple_loss=0.1034, pruned_loss=0.0213, audio_tagging_loss=0.01, over 3062220.24 frames. 
], batch size: 56, lr: 6.24e-03, grad_scale: 8.0 2023-11-19 23:54:55,659 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=857206.6666666666, ans=0.125 2023-11-19 23:55:04,979 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=857206.6666666666, ans=0.125 2023-11-19 23:55:18,364 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 128600 2023-11-19 23:55:29,741 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.min_abs, batch_count=857340.0, ans=0.5 2023-11-19 23:55:29,843 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=857340.0, ans=0.2 2023-11-19 23:55:51,551 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=15.21 vs. limit=22.5 2023-11-19 23:55:54,441 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=857473.3333333334, ans=0.125 2023-11-19 23:55:56,710 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=857473.3333333334, ans=0.125 2023-11-19 23:56:00,793 INFO [train_asr.py:1262] (1/4) Epoch 11, batch 8400, loss[loss=0.08143, simple_loss=0.1011, pruned_loss=0.0208, audio_tagging_loss=0.01007, over 15218.00 frames. ], tot_loss[loss=0.08344, simple_loss=0.1039, pruned_loss=0.02158, audio_tagging_loss=0.009898, over 3056514.34 frames. ], batch size: 57, lr: 6.24e-03, grad_scale: 16.0 2023-11-19 23:56:22,357 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 128650 2023-11-19 23:56:35,290 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=857673.3333333334, ans=0.0 2023-11-19 23:56:36,148 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.566e+01 8.256e+01 8.921e+01 9.764e+01 1.880e+02, threshold=1.784e+02, percent-clipped=1.0 2023-11-19 23:56:49,719 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=857740.0, ans=0.125 2023-11-19 23:56:58,542 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=857806.6666666666, ans=0.125 2023-11-19 23:57:04,612 INFO [train_asr.py:1262] (1/4) Epoch 11, batch 8450, loss[loss=0.07366, simple_loss=0.09022, pruned_loss=0.01631, audio_tagging_loss=0.01224, over 14893.00 frames. ], tot_loss[loss=0.08326, simple_loss=0.1034, pruned_loss=0.02156, audio_tagging_loss=0.01002, over 3057592.01 frames. ], batch size: 56, lr: 6.24e-03, grad_scale: 16.0 2023-11-19 23:57:23,898 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=857940.0, ans=0.1 2023-11-19 23:57:26,132 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 128700 2023-11-19 23:57:42,514 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=10.68 vs. 
limit=15.0 2023-11-19 23:57:43,298 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=858073.3333333334, ans=0.1 2023-11-19 23:57:46,885 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=858073.3333333334, ans=0.2 2023-11-19 23:58:07,211 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=858206.6666666666, ans=0.1 2023-11-19 23:58:08,039 INFO [train_asr.py:1262] (1/4) Epoch 11, batch 8500, loss[loss=0.09238, simple_loss=0.1209, pruned_loss=0.02521, audio_tagging_loss=0.00671, over 15203.00 frames. ], tot_loss[loss=0.08367, simple_loss=0.1036, pruned_loss=0.02183, audio_tagging_loss=0.01002, over 3052265.42 frames. ], batch size: 57, lr: 6.24e-03, grad_scale: 16.0 2023-11-19 23:58:25,902 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=858273.3333333334, ans=0.125 2023-11-19 23:58:29,600 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=1.979e-01 2023-11-19 23:58:30,648 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 128750 2023-11-19 23:58:43,810 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.768e+01 8.248e+01 9.044e+01 1.008e+02 1.243e+02, threshold=1.809e+02, percent-clipped=0.0 2023-11-19 23:58:44,081 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=858340.0, ans=0.0 2023-11-19 23:58:55,001 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=858406.6666666666, ans=0.125 2023-11-19 23:59:12,537 INFO [train_asr.py:1262] (1/4) Epoch 11, batch 8550, loss[loss=0.08808, simple_loss=0.1119, pruned_loss=0.02165, audio_tagging_loss=0.01047, over 16290.00 frames. ], tot_loss[loss=0.08324, simple_loss=0.1031, pruned_loss=0.02157, audio_tagging_loss=0.0101, over 3051969.15 frames. 
], batch size: 57, lr: 6.24e-03, grad_scale: 16.0 2023-11-19 23:59:24,907 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.min_positive, batch_count=858606.6666666666, ans=0.05 2023-11-19 23:59:34,430 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 128800 2023-11-19 23:59:35,664 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-19 23:59:46,573 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=858673.3333333334, ans=0.0 2023-11-19 23:59:55,078 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=858740.0, ans=0.125 2023-11-19 23:59:56,243 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=858740.0, ans=0.09899494936611666 2023-11-20 00:00:04,130 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=858806.6666666666, ans=0.0 2023-11-20 00:00:14,981 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=858806.6666666666, ans=0.125 2023-11-20 00:00:17,216 INFO [train_asr.py:1262] (1/4) Epoch 11, batch 8600, loss[loss=0.1061, simple_loss=0.1301, pruned_loss=0.03355, audio_tagging_loss=0.007504, over 15409.00 frames. ], tot_loss[loss=0.08317, simple_loss=0.103, pruned_loss=0.02153, audio_tagging_loss=0.01017, over 3050102.25 frames. ], batch size: 58, lr: 6.24e-03, grad_scale: 16.0 2023-11-20 00:00:25,986 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=858873.3333333334, ans=0.125 2023-11-20 00:00:28,474 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=858940.0, ans=0.125 2023-11-20 00:00:28,952 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=19.45 vs. limit=22.5 2023-11-20 00:00:34,027 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=858940.0, ans=0.125 2023-11-20 00:00:38,575 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 128850 2023-11-20 00:00:42,684 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=859006.6666666666, ans=0.0 2023-11-20 00:00:52,607 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.469e+01 8.239e+01 8.842e+01 9.457e+01 1.153e+02, threshold=1.768e+02, percent-clipped=0.0 2023-11-20 00:01:12,883 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=512, metric=21.05 vs. limit=22.5 2023-11-20 00:01:13,037 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=5.12 vs. limit=12.0 2023-11-20 00:01:21,495 INFO [train_asr.py:1262] (1/4) Epoch 11, batch 8650, loss[loss=0.08135, simple_loss=0.09569, pruned_loss=0.02325, audio_tagging_loss=0.01025, over 15005.00 frames. ], tot_loss[loss=0.08395, simple_loss=0.1043, pruned_loss=0.02165, audio_tagging_loss=0.01016, over 3053385.50 frames. 
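The scaling.py:1022 Whitening records compare a per-module whiteness metric against a fixed limit; the module only pushes back (adds a gradient penalty) when the metric exceeds the limit, which is why most records sit below it and the occasional "23.28 vs. limit=22.5" marks a constraint actually firing. A plausible form of the metric, equal to 1.0 for an isotropic (fully white) covariance and approaching num_channels when a single direction dominates, is num_channels * trace(C^2) / trace(C)^2; take this as a hedged reconstruction of what scaling.py computes, not a quote of it:

import torch

def whitening_metric(x):
    # x: (num_frames, num_channels) activations for one whitening group.
    x = x - x.mean(dim=0, keepdim=True)
    cov = (x.T @ x) / x.shape[0]                      # centered covariance
    d = x.shape[1]
    return (d * (cov @ cov).diagonal().sum() / cov.diagonal().sum() ** 2).item()

x = torch.randn(2000, 288)        # roughly white activations, 288 channels
print(whitening_metric(x))        # near 1 (a bit above, from sampling noise),
                                  # comfortably under a limit like 10.0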
], batch size: 54, lr: 6.23e-03, grad_scale: 16.0 2023-11-20 00:01:30,203 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=859206.6666666666, ans=0.125 2023-11-20 00:01:34,926 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=859273.3333333334, ans=0.0 2023-11-20 00:01:43,231 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 128900 2023-11-20 00:02:02,261 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=859406.6666666666, ans=0.1 2023-11-20 00:02:15,381 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=859473.3333333334, ans=0.125 2023-11-20 00:02:24,865 INFO [train_asr.py:1262] (1/4) Epoch 11, batch 8700, loss[loss=0.06216, simple_loss=0.08346, pruned_loss=0.01076, audio_tagging_loss=0.009661, over 14647.00 frames. ], tot_loss[loss=0.08422, simple_loss=0.1042, pruned_loss=0.0218, audio_tagging_loss=0.01029, over 3054126.05 frames. ], batch size: 54, lr: 6.23e-03, grad_scale: 16.0 2023-11-20 00:02:29,505 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=859540.0, ans=0.125 2023-11-20 00:02:47,775 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 128950 2023-11-20 00:02:47,961 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=859606.6666666666, ans=0.125 2023-11-20 00:03:00,696 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=859673.3333333334, ans=0.125 2023-11-20 00:03:01,598 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.745e+01 8.243e+01 8.970e+01 9.872e+01 1.298e+02, threshold=1.794e+02, percent-clipped=0.0 2023-11-20 00:03:12,773 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=859740.0, ans=0.0 2023-11-20 00:03:19,191 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=859806.6666666666, ans=0.0 2023-11-20 00:03:29,211 INFO [train_asr.py:1262] (1/4) Epoch 11, batch 8750, loss[loss=0.09932, simple_loss=0.1222, pruned_loss=0.0278, audio_tagging_loss=0.01041, over 15052.00 frames. ], tot_loss[loss=0.0835, simple_loss=0.1028, pruned_loss=0.02172, audio_tagging_loss=0.01039, over 3052327.33 frames. ], batch size: 56, lr: 6.23e-03, grad_scale: 16.0 2023-11-20 00:03:33,787 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.min_abs, batch_count=859873.3333333334, ans=0.5 2023-11-20 00:03:39,596 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=859873.3333333334, ans=0.125 2023-11-20 00:03:41,024 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=859940.0, ans=0.125 2023-11-20 00:03:51,140 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 129000 2023-11-20 00:03:54,548 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=21.77 vs. 
limit=22.5 2023-11-20 00:04:00,091 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=860006.6666666666, ans=0.125 2023-11-20 00:04:00,148 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=860006.6666666666, ans=0.125 2023-11-20 00:04:08,729 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=6.54 vs. limit=15.0 2023-11-20 00:04:13,793 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=860073.3333333334, ans=0.0 2023-11-20 00:04:30,375 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=11.68 vs. limit=15.0 2023-11-20 00:04:33,983 INFO [train_asr.py:1262] (1/4) Epoch 11, batch 8800, loss[loss=0.0958, simple_loss=0.1105, pruned_loss=0.02825, audio_tagging_loss=0.01232, over 15335.00 frames. ], tot_loss[loss=0.08438, simple_loss=0.1039, pruned_loss=0.02196, audio_tagging_loss=0.01049, over 3050797.57 frames. ], batch size: 58, lr: 6.23e-03, grad_scale: 32.0 2023-11-20 00:04:43,992 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=860206.6666666666, ans=0.1 2023-11-20 00:04:51,271 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=860273.3333333334, ans=0.125 2023-11-20 00:04:55,349 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 129050 2023-11-20 00:05:03,450 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=860340.0, ans=0.125 2023-11-20 00:05:09,093 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.191e+01 8.422e+01 9.194e+01 1.008e+02 1.237e+02, threshold=1.839e+02, percent-clipped=0.0 2023-11-20 00:05:12,167 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=9.36 vs. limit=15.0 2023-11-20 00:05:14,014 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn1.whiten.whitening_limit, batch_count=860406.6666666666, ans=22.5 2023-11-20 00:05:26,691 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=860473.3333333334, ans=0.2 2023-11-20 00:05:37,380 INFO [train_asr.py:1262] (1/4) Epoch 11, batch 8850, loss[loss=0.08658, simple_loss=0.1061, pruned_loss=0.02284, audio_tagging_loss=0.01069, over 15114.00 frames. ], tot_loss[loss=0.08396, simple_loss=0.1033, pruned_loss=0.02177, audio_tagging_loss=0.01055, over 3040517.33 frames. ], batch size: 55, lr: 6.23e-03, grad_scale: 32.0 2023-11-20 00:05:44,372 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=860540.0, ans=0.0 2023-11-20 00:05:52,303 WARNING [train_asr.py:1506] (1/4) Exclude cut with ID unbalanced/1Dq7QH61iXQ_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. 
Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-20 00:05:56,329 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=860606.6666666666, ans=0.0 2023-11-20 00:05:59,868 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 129100 2023-11-20 00:06:03,087 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=10.41 vs. limit=15.0 2023-11-20 00:06:43,099 INFO [train_asr.py:1262] (1/4) Epoch 11, batch 8900, loss[loss=0.07877, simple_loss=0.09694, pruned_loss=0.02164, audio_tagging_loss=0.008665, over 14865.00 frames. ], tot_loss[loss=0.08389, simple_loss=0.1032, pruned_loss=0.02186, audio_tagging_loss=0.01042, over 3041712.47 frames. ], batch size: 56, lr: 6.23e-03, grad_scale: 32.0 2023-11-20 00:06:47,323 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=11.29 vs. limit=15.0 2023-11-20 00:06:57,388 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=860940.0, ans=0.125 2023-11-20 00:07:01,481 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=512, metric=20.68 vs. limit=22.5 2023-11-20 00:07:05,285 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 129150 2023-11-20 00:07:18,685 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.416e+01 8.249e+01 8.942e+01 1.024e+02 1.298e+02, threshold=1.788e+02, percent-clipped=0.0 2023-11-20 00:07:28,038 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=512, metric=8.95 vs. limit=15.0 2023-11-20 00:07:28,885 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=861073.3333333334, ans=0.0 2023-11-20 00:07:35,232 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=861140.0, ans=0.125 2023-11-20 00:07:47,621 INFO [train_asr.py:1262] (1/4) Epoch 11, batch 8950, loss[loss=0.08712, simple_loss=0.1025, pruned_loss=0.02324, audio_tagging_loss=0.01265, over 14120.00 frames. ], tot_loss[loss=0.08405, simple_loss=0.1036, pruned_loss=0.02197, audio_tagging_loss=0.01031, over 3046926.93 frames. ], batch size: 55, lr: 6.23e-03, grad_scale: 32.0 2023-11-20 00:07:48,184 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=10.53 vs. limit=15.0 2023-11-20 00:07:54,531 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=861206.6666666666, ans=0.125 2023-11-20 00:08:09,168 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 129200 2023-11-20 00:08:42,819 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=2.78 vs. 
limit=15.0 2023-11-20 00:08:52,272 INFO [train_asr.py:1262] (1/4) Epoch 11, batch 9000, loss[loss=0.1007, simple_loss=0.1307, pruned_loss=0.02669, audio_tagging_loss=0.0087, over 15526.00 frames. ], tot_loss[loss=0.08488, simple_loss=0.1048, pruned_loss=0.02225, audio_tagging_loss=0.01024, over 3054557.01 frames. ], batch size: 57, lr: 6.23e-03, grad_scale: 32.0 2023-11-20 00:08:52,273 INFO [train_asr.py:1285] (1/4) Computing validation loss 2023-11-20 00:09:27,410 INFO [zipformer.py:1873] (1/4) name=encoder.encoders.3.encoder.layers.3.self_attn_weights, attn_weights_entropy = tensor([3.9636, 3.2759, 2.8008, 2.9918, 3.4645, 2.7138, 3.3098, 2.8217], device='cuda:1') 2023-11-20 00:09:31,828 INFO [train_asr.py:1294] (1/4) Epoch 11, validation: loss=0.06425, simple_loss=0.05461, pruned_loss=0.006061, audio_tagging_loss=0.03088, over 4681554.00 frames. 2023-11-20 00:09:31,828 INFO [train_asr.py:1295] (1/4) Maximum memory allocated so far is 26082MB 2023-11-20 00:09:38,465 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=8.34 vs. limit=15.0 2023-11-20 00:09:53,909 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 129250 2023-11-20 00:10:08,227 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.624e+01 8.204e+01 8.877e+01 9.469e+01 1.301e+02, threshold=1.775e+02, percent-clipped=0.0 2023-11-20 00:10:11,148 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=861740.0, ans=0.1 2023-11-20 00:10:21,570 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=861740.0, ans=0.0 2023-11-20 00:10:35,647 INFO [train_asr.py:1262] (1/4) Epoch 11, batch 9050, loss[loss=0.07576, simple_loss=0.1004, pruned_loss=0.01601, audio_tagging_loss=0.009569, over 16102.00 frames. ], tot_loss[loss=0.08406, simple_loss=0.1036, pruned_loss=0.02204, audio_tagging_loss=0.01021, over 3053063.66 frames. ], batch size: 59, lr: 6.22e-03, grad_scale: 32.0 2023-11-20 00:10:35,850 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=861873.3333333334, ans=0.125 2023-11-20 00:10:39,034 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=861873.3333333334, ans=0.125 2023-11-20 00:10:50,118 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-20 00:10:52,601 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=861940.0, ans=0.125 2023-11-20 00:10:55,823 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=861940.0, ans=0.125 2023-11-20 00:10:57,867 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 129300 2023-11-20 00:11:10,904 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=862006.6666666666, ans=0.0 2023-11-20 00:11:13,647 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=12.56 vs. 
limit=15.0 2023-11-20 00:11:16,943 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=862073.3333333334, ans=0.2 2023-11-20 00:11:28,004 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=862140.0, ans=0.2 2023-11-20 00:11:39,798 INFO [train_asr.py:1262] (1/4) Epoch 11, batch 9100, loss[loss=0.09756, simple_loss=0.1055, pruned_loss=0.03344, audio_tagging_loss=0.01138, over 15068.00 frames. ], tot_loss[loss=0.08403, simple_loss=0.1041, pruned_loss=0.02188, audio_tagging_loss=0.01009, over 3062558.05 frames. ], batch size: 55, lr: 6.22e-03, grad_scale: 16.0 2023-11-20 00:12:02,104 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 129350 2023-11-20 00:12:17,840 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.021e+01 8.243e+01 9.006e+01 9.571e+01 1.391e+02, threshold=1.801e+02, percent-clipped=0.0 2023-11-20 00:12:25,733 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=862406.6666666666, ans=0.125 2023-11-20 00:12:28,067 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=862406.6666666666, ans=0.125 2023-11-20 00:12:41,470 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=862473.3333333334, ans=0.125 2023-11-20 00:12:45,019 INFO [train_asr.py:1262] (1/4) Epoch 11, batch 9150, loss[loss=0.07117, simple_loss=0.08629, pruned_loss=0.01756, audio_tagging_loss=0.01046, over 14939.00 frames. ], tot_loss[loss=0.08465, simple_loss=0.105, pruned_loss=0.02212, audio_tagging_loss=0.01004, over 3061297.03 frames. ], batch size: 57, lr: 6.22e-03, grad_scale: 16.0 2023-11-20 00:12:50,185 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=862540.0, ans=0.125 2023-11-20 00:12:52,920 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=19.77 vs. limit=22.5 2023-11-20 00:13:06,326 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 129400 2023-11-20 00:13:19,064 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.83 vs. limit=6.0 2023-11-20 00:13:21,095 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=862673.3333333334, ans=0.125 2023-11-20 00:13:29,104 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=862740.0, ans=0.0 2023-11-20 00:13:39,812 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=862806.6666666666, ans=0.0 2023-11-20 00:13:49,200 INFO [train_asr.py:1262] (1/4) Epoch 11, batch 9200, loss[loss=0.09064, simple_loss=0.1164, pruned_loss=0.02276, audio_tagging_loss=0.009682, over 16616.00 frames. ], tot_loss[loss=0.08448, simple_loss=0.1048, pruned_loss=0.02195, audio_tagging_loss=0.0101, over 3067744.27 frames. 
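The batch 9000 block above shows a validation pass interleaved with training: train_asr.py:1285 announces it, the full validation set (about 4.68M frames) is swept, and train_asr.py:1294/1295 report the validation loss and peak GPU memory. The zipformer.py:1873 line alongside it dumps the entropy of one layer's self-attention weight distributions as a health diagnostic; values well above zero, like the 2.7 to 4.0 printed there, suggest the heads still spread attention over many frames rather than collapsing onto one. A generic sketch of such a diagnostic, an assumed computation inferred from the record's name:

import torch

def attn_weights_entropy(attn):
    # attn: (num_heads, num_queries, num_keys), rows are softmax distributions.
    ent = -(attn * (attn + 1e-20).log()).sum(dim=-1)   # entropy per query
    return ent.mean(dim=-1)                            # averaged over queries

attn = torch.softmax(torch.randn(8, 100, 100), dim=-1)
print(attn_weights_entropy(attn))   # one entropy per head; near-zero values
                                    # would flag attention collapse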
], batch size: 61, lr: 6.22e-03, grad_scale: 32.0 2023-11-20 00:13:55,133 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=862873.3333333334, ans=0.0 2023-11-20 00:14:04,225 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=862940.0, ans=0.125 2023-11-20 00:14:06,995 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=862940.0, ans=0.04949747468305833 2023-11-20 00:14:10,679 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=862940.0, ans=0.125 2023-11-20 00:14:11,535 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 129450 2023-11-20 00:14:14,782 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=863006.6666666666, ans=0.0 2023-11-20 00:14:15,868 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=863006.6666666666, ans=0.125 2023-11-20 00:14:17,140 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=863006.6666666666, ans=0.0 2023-11-20 00:14:22,759 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=863006.6666666666, ans=0.125 2023-11-20 00:14:27,344 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.764e+01 8.355e+01 9.081e+01 9.853e+01 1.317e+02, threshold=1.816e+02, percent-clipped=0.0 2023-11-20 00:14:40,484 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=8.41 vs. limit=15.0 2023-11-20 00:14:41,341 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=863140.0, ans=0.125 2023-11-20 00:14:51,400 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=863140.0, ans=0.0 2023-11-20 00:14:54,720 INFO [train_asr.py:1262] (1/4) Epoch 11, batch 9250, loss[loss=0.07321, simple_loss=0.08218, pruned_loss=0.02007, audio_tagging_loss=0.01205, over 15121.00 frames. ], tot_loss[loss=0.08408, simple_loss=0.1044, pruned_loss=0.02184, audio_tagging_loss=0.01007, over 3059464.51 frames. ], batch size: 56, lr: 6.22e-03, grad_scale: 32.0 2023-11-20 00:15:02,829 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=863206.6666666666, ans=0.125 2023-11-20 00:15:10,971 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=863273.3333333334, ans=0.125 2023-11-20 00:15:13,323 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=863273.3333333334, ans=0.1 2023-11-20 00:15:15,073 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=10.67 vs. 
limit=15.0 2023-11-20 00:15:16,856 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 129500 2023-11-20 00:15:25,580 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=863340.0, ans=0.1 2023-11-20 00:15:40,109 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=2.37 vs. limit=15.0 2023-11-20 00:15:53,475 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=4.38 vs. limit=12.0 2023-11-20 00:15:59,590 INFO [train_asr.py:1262] (1/4) Epoch 11, batch 9300, loss[loss=0.09758, simple_loss=0.1264, pruned_loss=0.0283, audio_tagging_loss=0.006068, over 15572.00 frames. ], tot_loss[loss=0.08331, simple_loss=0.1031, pruned_loss=0.02165, audio_tagging_loss=0.0101, over 3060708.27 frames. ], batch size: 56, lr: 6.22e-03, grad_scale: 32.0 2023-11-20 00:16:12,370 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=863606.6666666666, ans=0.125 2023-11-20 00:16:17,256 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=863606.6666666666, ans=0.0 2023-11-20 00:16:21,308 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 129550 2023-11-20 00:16:22,684 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=863606.6666666666, ans=0.0 2023-11-20 00:16:23,899 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=863673.3333333334, ans=0.125 2023-11-20 00:16:25,267 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=863673.3333333334, ans=0.125 2023-11-20 00:16:26,490 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=863673.3333333334, ans=0.0 2023-11-20 00:16:27,622 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer_na.min_abs, batch_count=863673.3333333334, ans=0.02 2023-11-20 00:16:30,229 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=863673.3333333334, ans=0.125 2023-11-20 00:16:36,560 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.891e+01 8.083e+01 9.100e+01 9.829e+01 1.304e+02, threshold=1.820e+02, percent-clipped=0.0 2023-11-20 00:16:40,651 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=13.87 vs. limit=22.5 2023-11-20 00:16:50,355 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=863806.6666666666, ans=0.09899494936611666 2023-11-20 00:16:57,666 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=863806.6666666666, ans=0.125 2023-11-20 00:17:03,647 INFO [train_asr.py:1262] (1/4) Epoch 11, batch 9350, loss[loss=0.08125, simple_loss=0.09844, pruned_loss=0.01829, audio_tagging_loss=0.01373, over 15022.00 frames. ], tot_loss[loss=0.08278, simple_loss=0.1022, pruned_loss=0.02146, audio_tagging_loss=0.0102, over 3062072.67 frames. 
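Note on the optim.py lines: the reported threshold equals Clipping_scale times the median of the recent gradient norms — 2.0 × 9.100e+01 = 1.820e+02 here, and 2.0 × 8.877e+01 ≈ 1.775e+02 at batch 9000. A sketch of that bookkeeping under this reading; the window length is illustrative, not taken from optim.py:

    import collections
    import statistics

    # Sketch: keep a window of recent grad norms and clip at
    # clipping_scale * median, matching the logged quartiles/threshold.
    class GradNormTracker:
        def __init__(self, clipping_scale: float = 2.0, window: int = 100):
            self.clipping_scale = clipping_scale
            self.norms = collections.deque(maxlen=window)  # recent grad norms

        def update(self, grad_norm: float) -> None:
            self.norms.append(grad_norm)

        def threshold(self) -> float:
            return self.clipping_scale * statistics.median(self.norms)

    tracker = GradNormTracker(clipping_scale=2.0)
    for n in (66.24, 82.04, 88.77, 94.69, 130.1):  # batch-9000-like norms
        tracker.update(n)
    print(tracker.threshold())  # 2.0 * 88.77 = 177.54, ~ the logged 1.775e+02
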
], batch size: 56, lr: 6.22e-03, grad_scale: 32.0 2023-11-20 00:17:04,006 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.min_positive, batch_count=863873.3333333334, ans=0.025 2023-11-20 00:17:24,020 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=863940.0, ans=0.1 2023-11-20 00:17:26,558 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 129600 2023-11-20 00:17:34,910 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=864006.6666666666, ans=0.125 2023-11-20 00:17:40,378 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=864006.6666666666, ans=0.0 2023-11-20 00:17:53,917 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=864073.3333333334, ans=0.0 2023-11-20 00:17:58,380 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.conv_module1.whiten, num_groups=1, num_channels=192, metric=14.25 vs. limit=15.0 2023-11-20 00:18:05,763 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=864140.0, ans=0.07 2023-11-20 00:18:09,196 INFO [train_asr.py:1262] (1/4) Epoch 11, batch 9400, loss[loss=0.1026, simple_loss=0.1199, pruned_loss=0.03095, audio_tagging_loss=0.01176, over 15066.00 frames. ], tot_loss[loss=0.08352, simple_loss=0.1031, pruned_loss=0.02174, audio_tagging_loss=0.01022, over 3064750.20 frames. ], batch size: 55, lr: 6.22e-03, grad_scale: 32.0 2023-11-20 00:18:28,616 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=864273.3333333334, ans=0.125 2023-11-20 00:18:30,332 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=9.78 vs. limit=15.0 2023-11-20 00:18:32,239 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 129650 2023-11-20 00:18:32,336 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=864273.3333333334, ans=0.1 2023-11-20 00:18:42,202 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=864340.0, ans=0.0 2023-11-20 00:18:46,739 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.773e+01 8.451e+01 9.430e+01 1.021e+02 1.598e+02, threshold=1.886e+02, percent-clipped=0.0 2023-11-20 00:19:02,276 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=864473.3333333334, ans=0.0 2023-11-20 00:19:14,287 INFO [train_asr.py:1262] (1/4) Epoch 11, batch 9450, loss[loss=0.09145, simple_loss=0.1276, pruned_loss=0.01988, audio_tagging_loss=0.007741, over 16137.00 frames. ], tot_loss[loss=0.08406, simple_loss=0.1041, pruned_loss=0.02174, audio_tagging_loss=0.01029, over 3059158.11 frames. ], batch size: 56, lr: 6.21e-03, grad_scale: 32.0 2023-11-20 00:19:14,335 WARNING [train_asr.py:1506] (1/4) Exclude cut with ID unbalanced/jmSuJWEIizA_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. 
Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-20 00:19:14,491 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-20 00:19:22,559 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=864540.0, ans=0.125 2023-11-20 00:19:35,881 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 129700 2023-11-20 00:20:03,773 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=864740.0, ans=0.125 2023-11-20 00:20:18,800 INFO [train_asr.py:1262] (1/4) Epoch 11, batch 9500, loss[loss=0.08781, simple_loss=0.1123, pruned_loss=0.01976, audio_tagging_loss=0.0119, over 15049.00 frames. ], tot_loss[loss=0.0838, simple_loss=0.1037, pruned_loss=0.02158, audio_tagging_loss=0.01035, over 3055500.74 frames. ], batch size: 55, lr: 6.21e-03, grad_scale: 32.0 2023-11-20 00:20:40,542 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 129750 2023-11-20 00:20:50,708 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=865006.6666666666, ans=0.125 2023-11-20 00:20:57,182 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.873e+01 8.143e+01 9.051e+01 9.932e+01 1.802e+02, threshold=1.810e+02, percent-clipped=0.0 2023-11-20 00:21:12,379 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=865140.0, ans=0.0 2023-11-20 00:21:13,753 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=865140.0, ans=0.125 2023-11-20 00:21:21,942 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.55 vs. limit=6.0 2023-11-20 00:21:23,866 INFO [train_asr.py:1262] (1/4) Epoch 11, batch 9550, loss[loss=0.0854, simple_loss=0.1016, pruned_loss=0.02243, audio_tagging_loss=0.01215, over 14173.00 frames. ], tot_loss[loss=0.08432, simple_loss=0.1043, pruned_loss=0.02177, audio_tagging_loss=0.01039, over 3049626.95 frames. ], batch size: 55, lr: 6.21e-03, grad_scale: 32.0 2023-11-20 00:21:32,517 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=865206.6666666666, ans=0.125 2023-11-20 00:21:46,581 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 129800 2023-11-20 00:21:54,343 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=865340.0, ans=0.125 2023-11-20 00:22:11,497 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.self_attn2.whiten, num_groups=1, num_channels=192, metric=11.66 vs. limit=22.5 2023-11-20 00:22:17,102 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=865473.3333333334, ans=0.125 2023-11-20 00:22:29,119 INFO [train_asr.py:1262] (1/4) Epoch 11, batch 9600, loss[loss=0.08288, simple_loss=0.09688, pruned_loss=0.02138, audio_tagging_loss=0.01306, over 16042.00 frames. 
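Note on the WARNING above: the AudioSet cut is dropped because its 100 input frames shrink to 23 after convolutional subsampling, fewer than its 24 BPE tokens, so no transducer alignment exists. The logged 100 → 23 is consistent with the formula below, which is an assumption about the front end rather than a quote from the code:

    # Sketch: the cut filter implied by the WARNING above.
    # ((100 - 7) // 2 + 1) // 2 == 23, matching the logged frame counts;
    # the exact subsampling arithmetic is assumed, not copied from icefall.
    def frames_after_subsampling(num_frames: int) -> int:
        return ((num_frames - 7) // 2 + 1) // 2

    def keep_cut(num_frames: int, num_tokens: int) -> bool:
        # A transducer needs at least one encoder frame per output token.
        return frames_after_subsampling(num_frames) >= num_tokens

    assert frames_after_subsampling(100) == 23
    assert not keep_cut(100, 24)  # the excluded dummy-text cut
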
], tot_loss[loss=0.08377, simple_loss=0.1035, pruned_loss=0.02155, audio_tagging_loss=0.01048, over 3048702.66 frames. ], batch size: 61, lr: 6.21e-03, grad_scale: 32.0 2023-11-20 00:22:29,323 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=865540.0, ans=0.125 2023-11-20 00:22:50,664 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 129850 2023-11-20 00:23:02,554 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=865673.3333333334, ans=0.0 2023-11-20 00:23:05,962 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.678e+01 8.225e+01 8.966e+01 9.703e+01 1.238e+02, threshold=1.793e+02, percent-clipped=0.0 2023-11-20 00:23:08,820 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=865740.0, ans=0.1 2023-11-20 00:23:26,752 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=9.58 vs. limit=15.0 2023-11-20 00:23:33,784 INFO [train_asr.py:1262] (1/4) Epoch 11, batch 9650, loss[loss=0.08434, simple_loss=0.1144, pruned_loss=0.0187, audio_tagging_loss=0.008432, over 15157.00 frames. ], tot_loss[loss=0.08428, simple_loss=0.104, pruned_loss=0.02188, audio_tagging_loss=0.01042, over 3044859.90 frames. ], batch size: 54, lr: 6.21e-03, grad_scale: 32.0 2023-11-20 00:23:55,427 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 129900 2023-11-20 00:24:08,422 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=866006.6666666666, ans=0.125 2023-11-20 00:24:15,008 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=866073.3333333334, ans=0.125 2023-11-20 00:24:22,645 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=866073.3333333334, ans=0.2 2023-11-20 00:24:37,642 INFO [train_asr.py:1262] (1/4) Epoch 11, batch 9700, loss[loss=0.1011, simple_loss=0.132, pruned_loss=0.02311, audio_tagging_loss=0.01203, over 15032.00 frames. ], tot_loss[loss=0.08393, simple_loss=0.1038, pruned_loss=0.02173, audio_tagging_loss=0.01028, over 3037689.14 frames. 
], batch size: 54, lr: 6.21e-03, grad_scale: 32.0 2023-11-20 00:24:56,615 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=866273.3333333334, ans=0.035 2023-11-20 00:24:57,978 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=866273.3333333334, ans=0.1 2023-11-20 00:24:59,576 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 129950 2023-11-20 00:24:59,707 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=866273.3333333334, ans=0.125 2023-11-20 00:25:14,932 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.210e+01 8.274e+01 8.934e+01 1.009e+02 1.297e+02, threshold=1.787e+02, percent-clipped=0.0 2023-11-20 00:25:17,694 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=866406.6666666666, ans=0.125 2023-11-20 00:25:33,125 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=866473.3333333334, ans=0.0 2023-11-20 00:25:36,898 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=866473.3333333334, ans=0.125 2023-11-20 00:25:41,625 INFO [train_asr.py:1262] (1/4) Epoch 11, batch 9750, loss[loss=0.07787, simple_loss=0.09499, pruned_loss=0.01946, audio_tagging_loss=0.01091, over 14655.00 frames. ], tot_loss[loss=0.08351, simple_loss=0.1035, pruned_loss=0.02158, audio_tagging_loss=0.0102, over 3036973.38 frames. ], batch size: 56, lr: 6.21e-03, grad_scale: 32.0 2023-11-20 00:26:04,583 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 130000 2023-11-20 00:26:47,953 INFO [train_asr.py:1262] (1/4) Epoch 11, batch 9800, loss[loss=0.1058, simple_loss=0.1437, pruned_loss=0.02474, audio_tagging_loss=0.009182, over 16334.00 frames. ], tot_loss[loss=0.08347, simple_loss=0.1033, pruned_loss=0.02164, audio_tagging_loss=0.01016, over 3038411.46 frames. ], batch size: 58, lr: 6.21e-03, grad_scale: 16.0 2023-11-20 00:26:49,496 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=866873.3333333334, ans=0.0 2023-11-20 00:27:05,007 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=866940.0, ans=0.125 2023-11-20 00:27:09,863 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 130050 2023-11-20 00:27:26,359 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.248e+01 8.368e+01 8.974e+01 9.697e+01 1.703e+02, threshold=1.795e+02, percent-clipped=0.0 2023-11-20 00:27:37,854 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.min_positive, batch_count=867073.3333333334, ans=0.05 2023-11-20 00:27:47,321 WARNING [train_asr.py:1506] (1/4) Exclude cut with ID unbalanced/Bo4LcZjitzU_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. 
Number of tokens: 24 2023-11-20 00:27:49,212 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.06 vs. limit=15.0 2023-11-20 00:27:52,248 INFO [train_asr.py:1262] (1/4) Epoch 11, batch 9850, loss[loss=0.1027, simple_loss=0.132, pruned_loss=0.02765, audio_tagging_loss=0.009017, over 16218.00 frames. ], tot_loss[loss=0.08394, simple_loss=0.1039, pruned_loss=0.02188, audio_tagging_loss=0.01013, over 3041111.91 frames. ], batch size: 58, lr: 6.21e-03, grad_scale: 16.0 2023-11-20 00:27:54,420 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=867206.6666666666, ans=0.125 2023-11-20 00:28:06,788 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=867273.3333333334, ans=0.125 2023-11-20 00:28:09,279 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer_ff3.min_abs, batch_count=867273.3333333334, ans=0.2 2023-11-20 00:28:14,571 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 130100 2023-11-20 00:28:19,407 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=23.10 vs. limit=22.5 2023-11-20 00:28:39,316 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=867406.6666666666, ans=0.0 2023-11-20 00:28:41,768 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=867406.6666666666, ans=0.125 2023-11-20 00:28:46,318 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=867473.3333333334, ans=0.0 2023-11-20 00:28:51,113 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=867473.3333333334, ans=0.2 2023-11-20 00:28:57,199 INFO [train_asr.py:1262] (1/4) Epoch 11, batch 9900, loss[loss=0.113, simple_loss=0.1427, pruned_loss=0.03322, audio_tagging_loss=0.008402, over 14761.00 frames. ], tot_loss[loss=0.08442, simple_loss=0.1047, pruned_loss=0.02202, audio_tagging_loss=0.01007, over 3042710.65 frames. 
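Note on grad_scale: the field tracks dynamic fp16 loss scaling — it halves after an overflowing step (32 → 16 before batch 9800 above, and 16 → 8 later in this epoch) and grows back after a run of clean steps (16 → 32 between batches 9100 and 9200). This halve-on-overflow / grow-after-interval policy is what torch.cuda.amp.GradScaler implements; a minimal usage sketch with placeholder model and optimizer:

    import torch

    # Sketch: fp16 loop with dynamic loss scaling, the mechanism behind
    # the grad_scale values in these lines. The model, optimizer, and
    # loss are placeholders, not the actual train_asr.py objects.
    model = torch.nn.Linear(80, 500).cuda()
    optimizer = torch.optim.AdamW(model.parameters(), lr=0.045)
    scaler = torch.cuda.amp.GradScaler(enabled=True)

    for features in (torch.randn(8, 80, device="cuda") for _ in range(3)):
        optimizer.zero_grad()
        with torch.cuda.amp.autocast():
            loss = model(features).square().mean()
        scaler.scale(loss).backward()  # backward on the scaled loss
        scaler.step(optimizer)         # skips the update if grads overflowed
        scaler.update()                # halves the scale on overflow, else grows it
        print(scaler.get_scale())      # the quantity logged as grad_scale
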
], batch size: 53, lr: 6.20e-03, grad_scale: 16.0 2023-11-20 00:29:01,200 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.max_abs, batch_count=867540.0, ans=10.0 2023-11-20 00:29:07,267 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=867540.0, ans=0.0 2023-11-20 00:29:07,477 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.max_positive, batch_count=867540.0, ans=0.95 2023-11-20 00:29:20,363 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 130150 2023-11-20 00:29:24,319 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=867673.3333333334, ans=0.125 2023-11-20 00:29:36,775 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.726e+01 8.221e+01 8.937e+01 9.593e+01 1.338e+02, threshold=1.787e+02, percent-clipped=0.0 2023-11-20 00:29:54,735 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=867806.6666666666, ans=0.2 2023-11-20 00:29:56,407 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=867806.6666666666, ans=0.0 2023-11-20 00:30:02,309 INFO [train_asr.py:1262] (1/4) Epoch 11, batch 9950, loss[loss=0.1048, simple_loss=0.1253, pruned_loss=0.03275, audio_tagging_loss=0.009351, over 16422.00 frames. ], tot_loss[loss=0.08449, simple_loss=0.1049, pruned_loss=0.02209, audio_tagging_loss=0.009979, over 3051775.91 frames. ], batch size: 58, lr: 6.20e-03, grad_scale: 16.0 2023-11-20 00:30:24,643 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 130200 2023-11-20 00:30:42,743 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=868073.3333333334, ans=0.0 2023-11-20 00:30:59,048 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=14.58 vs. limit=15.0 2023-11-20 00:31:01,754 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.self_attn2.whiten, num_groups=1, num_channels=192, metric=11.20 vs. limit=22.5 2023-11-20 00:31:03,703 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=868140.0, ans=0.125 2023-11-20 00:31:07,010 INFO [train_asr.py:1262] (1/4) Epoch 11, batch 10000, loss[loss=0.07487, simple_loss=0.09101, pruned_loss=0.01485, audio_tagging_loss=0.01452, over 14756.00 frames. ], tot_loss[loss=0.08406, simple_loss=0.1046, pruned_loss=0.02176, audio_tagging_loss=0.009973, over 3051659.89 frames. ], batch size: 55, lr: 6.20e-03, grad_scale: 32.0 2023-11-20 00:31:12,874 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=868206.6666666666, ans=0.1 2023-11-20 00:31:21,824 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=2.58 vs. limit=15.0 2023-11-20 00:31:27,761 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=7.19 vs. 
limit=15.0 2023-11-20 00:31:28,607 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 130250 2023-11-20 00:31:36,288 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=868340.0, ans=0.1 2023-11-20 00:31:45,673 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.042e+01 8.141e+01 8.733e+01 9.527e+01 1.222e+02, threshold=1.747e+02, percent-clipped=0.0 2023-11-20 00:31:47,747 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=14.07 vs. limit=15.0 2023-11-20 00:31:51,697 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=868406.6666666666, ans=0.125 2023-11-20 00:32:11,054 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.min_abs, batch_count=868540.0, ans=0.5 2023-11-20 00:32:11,106 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=868540.0, ans=0.125 2023-11-20 00:32:11,980 INFO [train_asr.py:1262] (1/4) Epoch 11, batch 10050, loss[loss=0.06929, simple_loss=0.08169, pruned_loss=0.01801, audio_tagging_loss=0.01044, over 14320.00 frames. ], tot_loss[loss=0.08384, simple_loss=0.1042, pruned_loss=0.02176, audio_tagging_loss=0.009966, over 3050461.54 frames. ], batch size: 55, lr: 6.20e-03, grad_scale: 32.0 2023-11-20 00:32:27,983 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=868606.6666666666, ans=0.1 2023-11-20 00:32:28,142 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=868606.6666666666, ans=0.2 2023-11-20 00:32:33,897 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 130300 2023-11-20 00:32:40,794 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=868673.3333333334, ans=0.0 2023-11-20 00:32:46,876 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=868673.3333333334, ans=0.2 2023-11-20 00:32:46,978 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=868673.3333333334, ans=0.125 2023-11-20 00:32:58,732 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=868740.0, ans=0.2 2023-11-20 00:33:17,426 INFO [train_asr.py:1262] (1/4) Epoch 11, batch 10100, loss[loss=0.07134, simple_loss=0.07961, pruned_loss=0.01899, audio_tagging_loss=0.01255, over 13788.00 frames. ], tot_loss[loss=0.08374, simple_loss=0.1039, pruned_loss=0.02171, audio_tagging_loss=0.01006, over 3053123.21 frames. ], batch size: 53, lr: 6.20e-03, grad_scale: 32.0 2023-11-20 00:33:21,362 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=868873.3333333334, ans=0.125 2023-11-20 00:33:24,958 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=868873.3333333334, ans=0.125 2023-11-20 00:33:33,514 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=20.75 vs. 
limit=22.5 2023-11-20 00:33:36,653 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=868940.0, ans=0.125 2023-11-20 00:33:36,795 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=868940.0, ans=0.1 2023-11-20 00:33:39,568 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 130350 2023-11-20 00:33:40,896 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=868940.0, ans=0.125 2023-11-20 00:33:42,540 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=6.91 vs. limit=15.0 2023-11-20 00:33:49,673 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=869006.6666666666, ans=0.125 2023-11-20 00:33:55,281 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.691e+01 8.143e+01 8.992e+01 9.764e+01 1.668e+02, threshold=1.798e+02, percent-clipped=0.0 2023-11-20 00:34:01,185 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=869073.3333333334, ans=0.2 2023-11-20 00:34:07,983 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=869140.0, ans=0.125 2023-11-20 00:34:11,668 WARNING [train_asr.py:1506] (1/4) Exclude cut with ID unbalanced/_eq1Ry0UZGU_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-20 00:34:21,583 INFO [train_asr.py:1262] (1/4) Epoch 11, batch 10150, loss[loss=0.108, simple_loss=0.139, pruned_loss=0.02924, audio_tagging_loss=0.00928, over 15452.00 frames. ], tot_loss[loss=0.08416, simple_loss=0.1045, pruned_loss=0.0218, audio_tagging_loss=0.0101, over 3060506.03 frames. 
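Note on the Whitening lines: they measure how far a module's activations are from a white (isotropic) covariance — a metric of 1.0 means all covariance eigenvalues are equal, and the constraint only engages once the metric exceeds the printed limit (15.0, 22.5, ...). One plausible reading of the metric as an eigenvalue ratio, sketched below; the exact definition in scaling.py may differ:

    import torch

    # Sketch: a whitening metric over feature covariance eigenvalues.
    # mean(eig**2) / mean(eig)**2 equals 1.0 for perfectly whitened
    # features and grows as the spectrum becomes lopsided; compare it
    # against `limit` as in the "metric=... vs. limit=..." lines.
    def whitening_metric(x: torch.Tensor) -> float:
        x = x - x.mean(dim=0, keepdim=True)
        cov = (x.T @ x) / x.shape[0]
        eigs = torch.linalg.eigvalsh(cov)
        return (eigs.square().mean() / eigs.mean().square()).item()

    white = torch.randn(10000, 256)
    print(whitening_metric(white))   # close to 1.0
    skewed = white * torch.linspace(0.1, 3.0, 256)
    print(whitening_metric(skewed))  # well above 1.0
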
], batch size: 55, lr: 6.20e-03, grad_scale: 32.0 2023-11-20 00:34:21,794 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=869206.6666666666, ans=0.0 2023-11-20 00:34:26,104 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=869206.6666666666, ans=0.0 2023-11-20 00:34:35,014 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=869273.3333333334, ans=0.125 2023-11-20 00:34:36,391 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=869273.3333333334, ans=0.07 2023-11-20 00:34:37,554 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=869273.3333333334, ans=0.125 2023-11-20 00:34:42,370 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=869273.3333333334, ans=0.2 2023-11-20 00:34:43,459 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 130400 2023-11-20 00:34:49,940 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=12.19 vs. limit=15.0 2023-11-20 00:34:54,942 WARNING [train_asr.py:1506] (1/4) Exclude cut with ID unbalanced/cw-21cbk02A_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-20 00:35:03,959 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=869406.6666666666, ans=0.125 2023-11-20 00:35:11,452 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=869406.6666666666, ans=0.125 2023-11-20 00:35:23,061 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=11.91 vs. limit=15.0 2023-11-20 00:35:27,260 INFO [train_asr.py:1262] (1/4) Epoch 11, batch 10200, loss[loss=0.07828, simple_loss=0.08741, pruned_loss=0.02138, audio_tagging_loss=0.01319, over 15708.00 frames. ], tot_loss[loss=0.08397, simple_loss=0.1038, pruned_loss=0.02183, audio_tagging_loss=0.01023, over 3061183.03 frames. ], batch size: 59, lr: 6.20e-03, grad_scale: 32.0 2023-11-20 00:35:43,437 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=869606.6666666666, ans=0.125 2023-11-20 00:35:49,322 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 130450 2023-11-20 00:35:50,894 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=869606.6666666666, ans=0.0 2023-11-20 00:35:54,260 WARNING [train_asr.py:1506] (1/4) Exclude cut with ID unbalanced/hOT6Yokob90_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. 
Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-20 00:36:06,668 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.786e+01 8.250e+01 8.852e+01 1.003e+02 1.443e+02, threshold=1.770e+02, percent-clipped=0.0 2023-11-20 00:36:06,909 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.min_positive, batch_count=869740.0, ans=0.05 2023-11-20 00:36:10,715 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=869740.0, ans=0.0 2023-11-20 00:36:25,966 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=869806.6666666666, ans=0.125 2023-11-20 00:36:32,652 INFO [train_asr.py:1262] (1/4) Epoch 11, batch 10250, loss[loss=0.06357, simple_loss=0.08033, pruned_loss=0.01069, audio_tagging_loss=0.01271, over 15276.00 frames. ], tot_loss[loss=0.08294, simple_loss=0.1026, pruned_loss=0.02135, audio_tagging_loss=0.0103, over 3058146.79 frames. ], batch size: 59, lr: 6.20e-03, grad_scale: 32.0 2023-11-20 00:36:53,515 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 130500 2023-11-20 00:37:15,442 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.self_attn1.whiten, num_groups=1, num_channels=192, metric=13.09 vs. limit=22.5 2023-11-20 00:37:17,187 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=870073.3333333334, ans=0.125 2023-11-20 00:37:18,577 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.max_positive, batch_count=870073.3333333334, ans=0.95 2023-11-20 00:37:36,721 INFO [train_asr.py:1262] (1/4) Epoch 11, batch 10300, loss[loss=0.06965, simple_loss=0.08121, pruned_loss=0.01663, audio_tagging_loss=0.01241, over 14779.00 frames. ], tot_loss[loss=0.08239, simple_loss=0.1017, pruned_loss=0.02113, audio_tagging_loss=0.01042, over 3054741.35 frames. ], batch size: 56, lr: 6.19e-03, grad_scale: 32.0 2023-11-20 00:37:45,421 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=7.41 vs. limit=15.0 2023-11-20 00:37:50,743 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=870273.3333333334, ans=0.04949747468305833 2023-11-20 00:37:59,505 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 130550 2023-11-20 00:38:16,961 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.312e+01 8.334e+01 9.071e+01 9.729e+01 1.396e+02, threshold=1.814e+02, percent-clipped=0.0 2023-11-20 00:38:26,277 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.91 vs. limit=6.0 2023-11-20 00:38:40,611 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=870473.3333333334, ans=0.125 2023-11-20 00:38:42,744 INFO [train_asr.py:1262] (1/4) Epoch 11, batch 10350, loss[loss=0.09143, simple_loss=0.1072, pruned_loss=0.02625, audio_tagging_loss=0.01159, over 14690.00 frames. 
], tot_loss[loss=0.08235, simple_loss=0.1016, pruned_loss=0.02105, audio_tagging_loss=0.01051, over 3054413.47 frames. ], batch size: 54, lr: 6.19e-03, grad_scale: 32.0 2023-11-20 00:38:46,844 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=870540.0, ans=0.0 2023-11-20 00:38:49,151 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=870540.0, ans=0.125 2023-11-20 00:39:05,162 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 130600 2023-11-20 00:39:06,543 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=870606.6666666666, ans=0.1 2023-11-20 00:39:47,656 INFO [train_asr.py:1262] (1/4) Epoch 11, batch 10400, loss[loss=0.103, simple_loss=0.1313, pruned_loss=0.02706, audio_tagging_loss=0.01025, over 15779.00 frames. ], tot_loss[loss=0.08335, simple_loss=0.1028, pruned_loss=0.02141, audio_tagging_loss=0.01055, over 3050311.37 frames. ], batch size: 57, lr: 6.19e-03, grad_scale: 32.0 2023-11-20 00:39:52,884 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.16 vs. limit=6.0 2023-11-20 00:40:02,707 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.conv_module1.whiten, num_groups=1, num_channels=192, metric=13.73 vs. limit=15.0 2023-11-20 00:40:09,380 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 130650 2023-11-20 00:40:09,646 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=870940.0, ans=0.07 2023-11-20 00:40:19,022 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.69 vs. limit=6.0 2023-11-20 00:40:26,387 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.869e+01 8.319e+01 9.012e+01 9.651e+01 1.388e+02, threshold=1.802e+02, percent-clipped=0.0 2023-11-20 00:40:28,033 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=871073.3333333334, ans=0.125 2023-11-20 00:40:36,867 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=871073.3333333334, ans=0.125 2023-11-20 00:40:47,532 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=871140.0, ans=0.125 2023-11-20 00:40:52,083 INFO [train_asr.py:1262] (1/4) Epoch 11, batch 10450, loss[loss=0.07403, simple_loss=0.09176, pruned_loss=0.01871, audio_tagging_loss=0.009442, over 14224.00 frames. ], tot_loss[loss=0.08307, simple_loss=0.1025, pruned_loss=0.02131, audio_tagging_loss=0.01051, over 3046176.24 frames. 
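Note on the ScheduledFloat lines: they print module hyper-parameters (dropout probabilities, skip rates, balancer bounds) that are functions of batch_count rather than constants; each ans=... is the schedule evaluated at the current batch count. A minimal sketch, assuming piecewise-linear interpolation between (batch_count, value) breakpoints with the endpoint values held flat outside the range; the breakpoints below are invented for illustration, not taken from scaling.py:

    from bisect import bisect_right

    # Sketch: a batch-count-indexed piecewise-linear schedule, the shape
    # of thing the ScheduledFloat lines report.
    class ScheduledValue:
        def __init__(self, *points: tuple[float, float]):
            self.points = sorted(points)  # (batch_count, value) pairs

        def __call__(self, batch_count: float) -> float:
            xs = [x for x, _ in self.points]
            i = bisect_right(xs, batch_count)
            if i == 0:
                return self.points[0][1]   # before the first breakpoint
            if i == len(self.points):
                return self.points[-1][1]  # past the last breakpoint
            (x0, y0), (x1, y1) = self.points[i - 1], self.points[i]
            return y0 + (y1 - y0) * (batch_count - x0) / (x1 - x0)

    dropout_p = ScheduledValue((0.0, 0.3), (20000.0, 0.1))
    print(dropout_p(10000.0))   # 0.2, halfway between the breakpoints
    print(dropout_p(875000.0))  # 0.1, held flat as in the ans=0.1 lines
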
], batch size: 55, lr: 6.19e-03, grad_scale: 32.0 2023-11-20 00:41:14,132 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 130700 2023-11-20 00:41:20,575 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=871340.0, ans=0.2 2023-11-20 00:41:42,973 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=871473.3333333334, ans=0.0 2023-11-20 00:41:45,456 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=871473.3333333334, ans=0.125 2023-11-20 00:41:56,750 INFO [train_asr.py:1262] (1/4) Epoch 11, batch 10500, loss[loss=0.07327, simple_loss=0.09127, pruned_loss=0.01717, audio_tagging_loss=0.01047, over 15231.00 frames. ], tot_loss[loss=0.08296, simple_loss=0.1024, pruned_loss=0.0213, audio_tagging_loss=0.01043, over 3046513.57 frames. ], batch size: 60, lr: 6.19e-03, grad_scale: 32.0 2023-11-20 00:42:19,556 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 130750 2023-11-20 00:42:35,578 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.944e+01 8.359e+01 9.035e+01 1.000e+02 1.181e+02, threshold=1.807e+02, percent-clipped=0.0 2023-11-20 00:42:37,381 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=7.37 vs. limit=15.0 2023-11-20 00:42:53,044 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=871806.6666666666, ans=0.0 2023-11-20 00:43:01,874 INFO [train_asr.py:1262] (1/4) Epoch 11, batch 10550, loss[loss=0.08201, simple_loss=0.1054, pruned_loss=0.02256, audio_tagging_loss=0.006752, over 15870.00 frames. ], tot_loss[loss=0.08209, simple_loss=0.1011, pruned_loss=0.02121, audio_tagging_loss=0.01034, over 3048656.02 frames. ], batch size: 57, lr: 6.19e-03, grad_scale: 32.0 2023-11-20 00:43:07,542 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=871873.3333333334, ans=0.125 2023-11-20 00:43:23,339 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 130800 2023-11-20 00:43:43,938 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=2.51 vs. limit=15.0 2023-11-20 00:44:03,195 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=872140.0, ans=0.125 2023-11-20 00:44:06,377 INFO [train_asr.py:1262] (1/4) Epoch 11, batch 10600, loss[loss=0.09285, simple_loss=0.1153, pruned_loss=0.02634, audio_tagging_loss=0.00884, over 14966.00 frames. ], tot_loss[loss=0.0814, simple_loss=0.1002, pruned_loss=0.02099, audio_tagging_loss=0.01031, over 3043649.56 frames. ], batch size: 58, lr: 6.19e-03, grad_scale: 16.0 2023-11-20 00:44:12,152 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=9.03 vs. limit=15.0 2023-11-20 00:44:14,061 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-11-20 00:44:21,207 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=11.12 vs. 
limit=15.0 2023-11-20 00:44:27,752 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 130850 2023-11-20 00:44:36,063 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=872340.0, ans=0.125 2023-11-20 00:44:39,364 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=4.68 vs. limit=12.0 2023-11-20 00:44:46,877 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.153e+01 8.252e+01 9.029e+01 9.863e+01 1.267e+02, threshold=1.806e+02, percent-clipped=0.0 2023-11-20 00:45:04,306 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.min_abs, batch_count=872473.3333333334, ans=0.5 2023-11-20 00:45:09,856 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=872540.0, ans=0.2 2023-11-20 00:45:10,719 INFO [train_asr.py:1262] (1/4) Epoch 11, batch 10650, loss[loss=0.07883, simple_loss=0.09493, pruned_loss=0.01759, audio_tagging_loss=0.01377, over 14740.00 frames. ], tot_loss[loss=0.08235, simple_loss=0.1015, pruned_loss=0.02135, audio_tagging_loss=0.01024, over 3038011.30 frames. ], batch size: 56, lr: 6.19e-03, grad_scale: 16.0 2023-11-20 00:45:32,913 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 130900 2023-11-20 00:45:53,468 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=19.99 vs. limit=22.5 2023-11-20 00:46:03,806 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.97 vs. limit=6.0 2023-11-20 00:46:05,692 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=872806.6666666666, ans=0.125 2023-11-20 00:46:05,794 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=872806.6666666666, ans=0.0 2023-11-20 00:46:15,578 INFO [train_asr.py:1262] (1/4) Epoch 11, batch 10700, loss[loss=0.08496, simple_loss=0.09915, pruned_loss=0.02268, audio_tagging_loss=0.0127, over 14585.00 frames. ], tot_loss[loss=0.08317, simple_loss=0.1027, pruned_loss=0.02164, audio_tagging_loss=0.0102, over 3033575.52 frames. ], batch size: 54, lr: 6.19e-03, grad_scale: 16.0 2023-11-20 00:46:18,521 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=872873.3333333334, ans=0.04949747468305833 2023-11-20 00:46:19,454 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=872873.3333333334, ans=0.0 2023-11-20 00:46:35,397 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=512, metric=9.34 vs. 
limit=15.0 2023-11-20 00:46:37,377 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 130950 2023-11-20 00:46:37,518 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=872940.0, ans=0.125 2023-11-20 00:46:55,060 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.497e+01 8.315e+01 9.053e+01 9.865e+01 1.273e+02, threshold=1.811e+02, percent-clipped=0.0 2023-11-20 00:47:03,716 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=8.99 vs. limit=15.0 2023-11-20 00:47:06,766 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=2.64 vs. limit=15.0 2023-11-20 00:47:20,602 INFO [train_asr.py:1262] (1/4) Epoch 11, batch 10750, loss[loss=0.07143, simple_loss=0.09289, pruned_loss=0.01577, audio_tagging_loss=0.009217, over 15509.00 frames. ], tot_loss[loss=0.0815, simple_loss=0.1007, pruned_loss=0.02094, audio_tagging_loss=0.0102, over 3032720.06 frames. ], batch size: 57, lr: 6.18e-03, grad_scale: 16.0 2023-11-20 00:47:41,956 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 131000 2023-11-20 00:48:02,602 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=873406.6666666666, ans=0.125 2023-11-20 00:48:06,305 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=873406.6666666666, ans=0.125 2023-11-20 00:48:23,227 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=873540.0, ans=0.125 2023-11-20 00:48:24,192 INFO [train_asr.py:1262] (1/4) Epoch 11, batch 10800, loss[loss=0.08579, simple_loss=0.1082, pruned_loss=0.02214, audio_tagging_loss=0.00956, over 14656.00 frames. ], tot_loss[loss=0.08174, simple_loss=0.101, pruned_loss=0.02108, audio_tagging_loss=0.01017, over 3031844.39 frames. ], batch size: 54, lr: 6.18e-03, grad_scale: 16.0 2023-11-20 00:48:38,676 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=2.77 vs. limit=15.0 2023-11-20 00:48:40,669 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=873606.6666666666, ans=0.015 2023-11-20 00:48:46,047 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 131050 2023-11-20 00:49:03,137 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=873740.0, ans=0.125 2023-11-20 00:49:05,230 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.697e+01 8.209e+01 8.933e+01 9.655e+01 1.364e+02, threshold=1.787e+02, percent-clipped=0.0 2023-11-20 00:49:21,002 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=873806.6666666666, ans=0.125 2023-11-20 00:49:27,989 INFO [train_asr.py:1262] (1/4) Epoch 11, batch 10850, loss[loss=0.07974, simple_loss=0.08976, pruned_loss=0.02114, audio_tagging_loss=0.01371, over 15449.00 frames. ], tot_loss[loss=0.08249, simple_loss=0.102, pruned_loss=0.02135, audio_tagging_loss=0.01012, over 3035681.22 frames. 
], batch size: 58, lr: 6.18e-03, grad_scale: 16.0 2023-11-20 00:49:31,361 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=873873.3333333334, ans=0.1 2023-11-20 00:49:45,913 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=873940.0, ans=0.125 2023-11-20 00:49:50,564 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 131100 2023-11-20 00:50:01,835 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-20 00:50:04,886 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=874006.6666666666, ans=0.0 2023-11-20 00:50:12,288 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=874073.3333333334, ans=0.2 2023-11-20 00:50:32,919 INFO [train_asr.py:1262] (1/4) Epoch 11, batch 10900, loss[loss=0.06034, simple_loss=0.07604, pruned_loss=0.01188, audio_tagging_loss=0.01044, over 14913.00 frames. ], tot_loss[loss=0.08274, simple_loss=0.1025, pruned_loss=0.02127, audio_tagging_loss=0.01022, over 3040794.11 frames. ], batch size: 56, lr: 6.18e-03, grad_scale: 16.0 2023-11-20 00:50:32,956 WARNING [train_asr.py:1506] (1/4) Exclude cut with ID unbalanced/XMxq2pgttuY_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. 
Number of tokens: 24 2023-11-20 00:50:34,540 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=874206.6666666666, ans=0.0 2023-11-20 00:50:37,093 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=874206.6666666666, ans=0.125 2023-11-20 00:50:39,374 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=874206.6666666666, ans=0.0 2023-11-20 00:50:48,359 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=874273.3333333334, ans=0.125 2023-11-20 00:50:50,852 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=874273.3333333334, ans=0.125 2023-11-20 00:50:54,860 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 131150 2023-11-20 00:51:06,986 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=874340.0, ans=0.125 2023-11-20 00:51:13,344 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.101e+01 8.251e+01 8.753e+01 9.767e+01 1.236e+02, threshold=1.751e+02, percent-clipped=0.0 2023-11-20 00:51:34,118 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=874473.3333333334, ans=0.125 2023-11-20 00:51:35,349 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=874540.0, ans=0.2 2023-11-20 00:51:36,262 INFO [train_asr.py:1262] (1/4) Epoch 11, batch 10950, loss[loss=0.04875, simple_loss=0.05938, pruned_loss=0.007222, audio_tagging_loss=0.01184, over 14654.00 frames. ], tot_loss[loss=0.08365, simple_loss=0.1038, pruned_loss=0.02159, audio_tagging_loss=0.01017, over 3038035.59 frames. ], batch size: 59, lr: 6.18e-03, grad_scale: 16.0 2023-11-20 00:51:40,868 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=874540.0, ans=0.125 2023-11-20 00:51:43,763 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=10.36 vs. limit=15.0 2023-11-20 00:51:58,410 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 131200 2023-11-20 00:52:08,223 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=874673.3333333334, ans=0.125 2023-11-20 00:52:16,518 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=874740.0, ans=0.125 2023-11-20 00:52:18,878 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=874740.0, ans=0.125 2023-11-20 00:52:32,456 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=6.49 vs. limit=10.0 2023-11-20 00:52:41,622 INFO [train_asr.py:1262] (1/4) Epoch 11, batch 11000, loss[loss=0.0756, simple_loss=0.1011, pruned_loss=0.01829, audio_tagging_loss=0.006775, over 15341.00 frames. ], tot_loss[loss=0.08425, simple_loss=0.1046, pruned_loss=0.02182, audio_tagging_loss=0.01015, over 3046855.86 frames. 
], batch size: 57, lr: 6.18e-03, grad_scale: 16.0 2023-11-20 00:52:42,239 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.85 vs. limit=6.0 2023-11-20 00:52:56,456 WARNING [train_asr.py:1506] (1/4) Exclude cut with ID unbalanced/h6R5rMXN6pY_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-20 00:53:00,331 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=874940.0, ans=0.125 2023-11-20 00:53:04,385 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 131250 2023-11-20 00:53:04,449 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=874940.0, ans=0.125 2023-11-20 00:53:05,864 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=874940.0, ans=0.0 2023-11-20 00:53:10,813 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=875006.6666666666, ans=0.125 2023-11-20 00:53:11,995 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=875006.6666666666, ans=0.0 2023-11-20 00:53:22,561 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=875073.3333333334, ans=0.125 2023-11-20 00:53:23,522 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.815e+01 8.049e+01 8.869e+01 9.421e+01 1.178e+02, threshold=1.774e+02, percent-clipped=0.0 2023-11-20 00:53:23,840 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=875073.3333333334, ans=0.0 2023-11-20 00:53:26,036 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=875073.3333333334, ans=0.125 2023-11-20 00:53:32,038 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=875140.0, ans=0.035 2023-11-20 00:53:43,855 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=875140.0, ans=0.125 2023-11-20 00:53:46,581 INFO [train_asr.py:1262] (1/4) Epoch 11, batch 11050, loss[loss=0.07947, simple_loss=0.09181, pruned_loss=0.01981, audio_tagging_loss=0.01376, over 14573.00 frames. ], tot_loss[loss=0.08349, simple_loss=0.1032, pruned_loss=0.02155, audio_tagging_loss=0.01033, over 3048221.98 frames. ], batch size: 55, lr: 6.18e-03, grad_scale: 8.0 2023-11-20 00:54:00,745 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=875273.3333333334, ans=0.1 2023-11-20 00:54:08,805 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 131300 2023-11-20 00:54:10,721 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=512, metric=21.02 vs. 
limit=22.5 2023-11-20 00:54:43,995 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=875473.3333333334, ans=0.125 2023-11-20 00:54:51,018 INFO [train_asr.py:1262] (1/4) Epoch 11, batch 11100, loss[loss=0.06237, simple_loss=0.07487, pruned_loss=0.01463, audio_tagging_loss=0.0103, over 16154.00 frames. ], tot_loss[loss=0.08388, simple_loss=0.1036, pruned_loss=0.02169, audio_tagging_loss=0.0104, over 3048108.67 frames. ], batch size: 63, lr: 6.18e-03, grad_scale: 8.0 2023-11-20 00:54:53,803 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=875540.0, ans=0.125 2023-11-20 00:55:12,446 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 131350 2023-11-20 00:55:21,881 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=875673.3333333334, ans=0.125 2023-11-20 00:55:25,089 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=875673.3333333334, ans=0.125 2023-11-20 00:55:33,840 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.519e+01 8.391e+01 9.002e+01 9.833e+01 1.655e+02, threshold=1.800e+02, percent-clipped=0.0 2023-11-20 00:55:41,516 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=875806.6666666666, ans=0.0 2023-11-20 00:55:44,017 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=875806.6666666666, ans=0.0 2023-11-20 00:55:55,325 INFO [train_asr.py:1262] (1/4) Epoch 11, batch 11150, loss[loss=0.0583, simple_loss=0.0674, pruned_loss=0.01125, audio_tagging_loss=0.01335, over 15020.00 frames. ], tot_loss[loss=0.08326, simple_loss=0.1025, pruned_loss=0.02142, audio_tagging_loss=0.01057, over 3046444.57 frames. ], batch size: 59, lr: 6.17e-03, grad_scale: 8.0 2023-11-20 00:56:06,013 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=875873.3333333334, ans=0.07 2023-11-20 00:56:13,903 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=875940.0, ans=0.0 2023-11-20 00:56:17,347 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 131400 2023-11-20 00:56:20,606 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=6.80 vs. limit=12.0 2023-11-20 00:56:26,011 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=876006.6666666666, ans=0.125 2023-11-20 00:57:00,050 INFO [train_asr.py:1262] (1/4) Epoch 11, batch 11200, loss[loss=0.06781, simple_loss=0.07859, pruned_loss=0.01627, audio_tagging_loss=0.01225, over 13518.00 frames. ], tot_loss[loss=0.08269, simple_loss=0.1018, pruned_loss=0.02121, audio_tagging_loss=0.01061, over 3043562.83 frames. 
], batch size: 53, lr: 6.17e-03, grad_scale: 16.0 2023-11-20 00:57:17,266 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=876273.3333333334, ans=0.1 2023-11-20 00:57:22,029 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 131450 2023-11-20 00:57:26,860 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.whiten, num_groups=1, num_channels=512, metric=10.55 vs. limit=12.0 2023-11-20 00:57:42,708 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.046e+01 8.062e+01 8.697e+01 9.573e+01 1.140e+02, threshold=1.739e+02, percent-clipped=0.0 2023-11-20 00:57:46,904 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=876406.6666666666, ans=0.0 2023-11-20 00:58:04,925 INFO [train_asr.py:1262] (1/4) Epoch 11, batch 11250, loss[loss=0.07757, simple_loss=0.093, pruned_loss=0.02015, audio_tagging_loss=0.01092, over 14547.00 frames. ], tot_loss[loss=0.08291, simple_loss=0.1023, pruned_loss=0.02123, audio_tagging_loss=0.01054, over 3043540.55 frames. ], batch size: 56, lr: 6.17e-03, grad_scale: 16.0 2023-11-20 00:58:18,266 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=876606.6666666666, ans=0.125 2023-11-20 00:58:26,618 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 131500 2023-11-20 00:58:59,329 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=876806.6666666666, ans=0.125 2023-11-20 00:59:09,623 INFO [train_asr.py:1262] (1/4) Epoch 11, batch 11300, loss[loss=0.08436, simple_loss=0.1081, pruned_loss=0.01746, audio_tagging_loss=0.01283, over 14308.00 frames. ], tot_loss[loss=0.08231, simple_loss=0.1018, pruned_loss=0.02104, audio_tagging_loss=0.01035, over 3043501.93 frames. ], batch size: 54, lr: 6.17e-03, grad_scale: 16.0 2023-11-20 00:59:14,593 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=876873.3333333334, ans=0.125 2023-11-20 00:59:14,651 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=876873.3333333334, ans=0.1 2023-11-20 00:59:23,898 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=876940.0, ans=0.0 2023-11-20 00:59:31,656 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 131550 2023-11-20 00:59:53,567 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.175e+01 8.098e+01 8.659e+01 9.613e+01 1.705e+02, threshold=1.732e+02, percent-clipped=0.0 2023-11-20 01:00:13,580 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=2.16 vs. limit=15.0 2023-11-20 01:00:14,039 INFO [train_asr.py:1262] (1/4) Epoch 11, batch 11350, loss[loss=0.08193, simple_loss=0.1062, pruned_loss=0.02136, audio_tagging_loss=0.007485, over 16362.00 frames. ], tot_loss[loss=0.08268, simple_loss=0.1023, pruned_loss=0.0212, audio_tagging_loss=0.01032, over 3050188.18 frames. 
], batch size: 59, lr: 6.17e-03, grad_scale: 8.0 2023-11-20 01:00:35,819 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 131600 2023-11-20 01:00:39,409 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=877340.0, ans=0.125 2023-11-20 01:00:39,705 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=12.89 vs. limit=15.0 2023-11-20 01:00:46,247 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=877340.0, ans=0.2 2023-11-20 01:01:00,310 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=877406.6666666666, ans=0.125 2023-11-20 01:01:18,976 INFO [train_asr.py:1262] (1/4) Epoch 11, batch 11400, loss[loss=0.07238, simple_loss=0.08482, pruned_loss=0.01986, audio_tagging_loss=0.0101, over 15398.00 frames. ], tot_loss[loss=0.08355, simple_loss=0.1037, pruned_loss=0.02153, audio_tagging_loss=0.01017, over 3053964.55 frames. ], batch size: 58, lr: 6.17e-03, grad_scale: 8.0 2023-11-20 01:01:29,164 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=877540.0, ans=0.0 2023-11-20 01:01:33,651 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=877606.6666666666, ans=0.0 2023-11-20 01:01:39,007 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.conv_module1.whiten, num_groups=1, num_channels=192, metric=14.15 vs. limit=15.0 2023-11-20 01:01:40,733 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 131650 2023-11-20 01:01:55,791 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=877673.3333333334, ans=0.0 2023-11-20 01:01:58,624 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=7.34 vs. limit=10.0 2023-11-20 01:01:59,541 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=877740.0, ans=0.1 2023-11-20 01:02:02,814 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.051e+01 8.241e+01 9.056e+01 1.011e+02 3.989e+02, threshold=1.811e+02, percent-clipped=1.0 2023-11-20 01:02:09,610 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.26 vs. limit=15.0 2023-11-20 01:02:11,986 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=2.77 vs. limit=15.0 2023-11-20 01:02:21,534 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=877806.6666666666, ans=0.1 2023-11-20 01:02:23,685 INFO [train_asr.py:1262] (1/4) Epoch 11, batch 11450, loss[loss=0.07568, simple_loss=0.08808, pruned_loss=0.02108, audio_tagging_loss=0.01056, over 15256.00 frames. ], tot_loss[loss=0.08323, simple_loss=0.1033, pruned_loss=0.02142, audio_tagging_loss=0.01016, over 3055887.94 frames. 
], batch size: 57, lr: 6.17e-03, grad_scale: 8.0 2023-11-20 01:02:23,982 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.max_positive, batch_count=877873.3333333334, ans=0.95 2023-11-20 01:02:39,478 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=877940.0, ans=0.125 2023-11-20 01:02:45,826 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 131700 2023-11-20 01:03:06,617 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=878073.3333333334, ans=0.0 2023-11-20 01:03:27,749 INFO [train_asr.py:1262] (1/4) Epoch 11, batch 11500, loss[loss=0.08542, simple_loss=0.09875, pruned_loss=0.02481, audio_tagging_loss=0.01123, over 15260.00 frames. ], tot_loss[loss=0.08319, simple_loss=0.1033, pruned_loss=0.02142, audio_tagging_loss=0.0101, over 3051653.18 frames. ], batch size: 56, lr: 6.17e-03, grad_scale: 8.0 2023-11-20 01:03:30,390 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=878206.6666666666, ans=0.0 2023-11-20 01:03:49,221 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 131750 2023-11-20 01:04:01,033 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=878340.0, ans=0.125 2023-11-20 01:04:05,524 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-20 01:04:11,078 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.409e+01 8.478e+01 9.308e+01 1.005e+02 1.725e+02, threshold=1.862e+02, percent-clipped=0.0 2023-11-20 01:04:18,040 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=878473.3333333334, ans=0.1 2023-11-20 01:04:24,726 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=878473.3333333334, ans=0.1 2023-11-20 01:04:25,303 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.self_attn1.whiten, num_groups=1, num_channels=192, metric=11.27 vs. limit=22.5 2023-11-20 01:04:31,784 INFO [train_asr.py:1262] (1/4) Epoch 11, batch 11550, loss[loss=0.08238, simple_loss=0.09764, pruned_loss=0.02021, audio_tagging_loss=0.01335, over 15373.00 frames. ], tot_loss[loss=0.08328, simple_loss=0.1033, pruned_loss=0.0215, audio_tagging_loss=0.01012, over 3051567.62 frames. ], batch size: 58, lr: 6.17e-03, grad_scale: 8.0 2023-11-20 01:04:54,012 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 131800 2023-11-20 01:04:57,107 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=878673.3333333334, ans=0.07 2023-11-20 01:05:01,836 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=878673.3333333334, ans=0.125 2023-11-20 01:05:09,917 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=878740.0, ans=0.0 2023-11-20 01:05:15,159 WARNING [train_asr.py:1506] (1/4) Exclude cut with ID unbalanced/NeYOsnhOi4k_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. 
Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-20 01:05:21,462 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=878740.0, ans=0.1 2023-11-20 01:05:28,741 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=878806.6666666666, ans=0.125 2023-11-20 01:05:36,285 INFO [train_asr.py:1262] (1/4) Epoch 11, batch 11600, loss[loss=0.08882, simple_loss=0.1131, pruned_loss=0.02236, audio_tagging_loss=0.0099, over 15102.00 frames. ], tot_loss[loss=0.08289, simple_loss=0.1029, pruned_loss=0.02138, audio_tagging_loss=0.01007, over 3055501.78 frames. ], batch size: 55, lr: 6.16e-03, grad_scale: 16.0 2023-11-20 01:05:43,438 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=878873.3333333334, ans=0.125 2023-11-20 01:05:57,462 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=878940.0, ans=0.0 2023-11-20 01:05:58,512 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 131850 2023-11-20 01:06:01,568 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.max_abs, batch_count=879006.6666666666, ans=10.0 2023-11-20 01:06:19,909 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.853e+01 8.090e+01 8.633e+01 9.438e+01 1.426e+02, threshold=1.727e+02, percent-clipped=0.0 2023-11-20 01:06:27,005 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=879140.0, ans=0.0 2023-11-20 01:06:40,879 INFO [train_asr.py:1262] (1/4) Epoch 11, batch 11650, loss[loss=0.1054, simple_loss=0.134, pruned_loss=0.03103, audio_tagging_loss=0.007316, over 15057.00 frames. ], tot_loss[loss=0.0833, simple_loss=0.1037, pruned_loss=0.02134, audio_tagging_loss=0.01009, over 3048653.99 frames. ], batch size: 59, lr: 6.16e-03, grad_scale: 16.0 2023-11-20 01:06:45,659 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=12.09 vs. limit=22.5 2023-11-20 01:07:02,955 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 131900 2023-11-20 01:07:06,967 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=879340.0, ans=0.07 2023-11-20 01:07:11,899 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=879340.0, ans=0.0 2023-11-20 01:07:16,298 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.min_positive, batch_count=879340.0, ans=0.05 2023-11-20 01:07:22,284 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=879406.6666666666, ans=0.2 2023-11-20 01:07:29,366 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=2.59 vs. 
limit=15.0 2023-11-20 01:07:30,138 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=879406.6666666666, ans=0.125 2023-11-20 01:07:45,862 INFO [train_asr.py:1262] (1/4) Epoch 11, batch 11700, loss[loss=0.07536, simple_loss=0.09152, pruned_loss=0.02053, audio_tagging_loss=0.009072, over 14996.00 frames. ], tot_loss[loss=0.08337, simple_loss=0.1037, pruned_loss=0.02137, audio_tagging_loss=0.01017, over 3054477.32 frames. ], batch size: 58, lr: 6.16e-03, grad_scale: 16.0 2023-11-20 01:07:53,433 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=879540.0, ans=0.0 2023-11-20 01:07:57,473 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=7.38 vs. limit=10.0 2023-11-20 01:08:07,439 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 131950 2023-11-20 01:08:07,916 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=7.57 vs. limit=15.0 2023-11-20 01:08:30,044 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.705e+01 8.385e+01 9.019e+01 1.002e+02 1.324e+02, threshold=1.804e+02, percent-clipped=0.0 2023-11-20 01:08:34,038 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=879740.0, ans=0.0 2023-11-20 01:08:46,263 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=879806.6666666666, ans=0.0 2023-11-20 01:08:49,734 INFO [train_asr.py:1262] (1/4) Epoch 11, batch 11750, loss[loss=0.0858, simple_loss=0.1041, pruned_loss=0.02423, audio_tagging_loss=0.009497, over 16218.00 frames. ], tot_loss[loss=0.08295, simple_loss=0.1031, pruned_loss=0.02122, audio_tagging_loss=0.01017, over 3051503.06 frames. ], batch size: 64, lr: 6.16e-03, grad_scale: 16.0 2023-11-20 01:09:12,281 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 132000 2023-11-20 01:09:26,473 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=4.30 vs. limit=12.0 2023-11-20 01:09:28,683 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=880006.6666666666, ans=0.0 2023-11-20 01:09:45,485 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=880140.0, ans=0.125 2023-11-20 01:09:58,223 INFO [train_asr.py:1262] (1/4) Epoch 11, batch 11800, loss[loss=0.07545, simple_loss=0.08112, pruned_loss=0.02079, audio_tagging_loss=0.0141, over 15510.00 frames. ], tot_loss[loss=0.0821, simple_loss=0.1021, pruned_loss=0.02088, audio_tagging_loss=0.01017, over 3048127.05 frames. ], batch size: 59, lr: 6.16e-03, grad_scale: 16.0 2023-11-20 01:10:04,594 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=10.01 vs. 
limit=15.0 2023-11-20 01:10:11,992 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=880273.3333333334, ans=0.0 2023-11-20 01:10:18,097 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=880273.3333333334, ans=0.1 2023-11-20 01:10:20,395 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 132050 2023-11-20 01:10:20,573 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=880273.3333333334, ans=0.2 2023-11-20 01:10:41,790 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.087e+01 8.587e+01 9.267e+01 9.920e+01 1.513e+02, threshold=1.853e+02, percent-clipped=0.0 2023-11-20 01:10:44,469 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=880406.6666666666, ans=0.035 2023-11-20 01:10:45,122 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=9.22 vs. limit=15.0 2023-11-20 01:10:47,071 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=880406.6666666666, ans=0.125 2023-11-20 01:10:47,181 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=880406.6666666666, ans=0.125 2023-11-20 01:11:02,673 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=880540.0, ans=0.125 2023-11-20 01:11:03,508 INFO [train_asr.py:1262] (1/4) Epoch 11, batch 11850, loss[loss=0.06497, simple_loss=0.07266, pruned_loss=0.01673, audio_tagging_loss=0.01192, over 14800.00 frames. ], tot_loss[loss=0.08229, simple_loss=0.1019, pruned_loss=0.02104, audio_tagging_loss=0.01032, over 3029123.64 frames. ], batch size: 58, lr: 6.16e-03, grad_scale: 16.0 2023-11-20 01:11:07,434 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.max_abs, batch_count=880540.0, ans=10.0 2023-11-20 01:11:07,567 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=880540.0, ans=0.0 2023-11-20 01:11:24,954 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 132100 2023-11-20 01:11:31,406 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=880673.3333333334, ans=0.125 2023-11-20 01:11:33,841 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=880673.3333333334, ans=0.125 2023-11-20 01:11:43,137 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=880740.0, ans=0.0 2023-11-20 01:11:49,744 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=880740.0, ans=0.125 2023-11-20 01:11:50,313 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=6.65 vs. 
limit=15.0 2023-11-20 01:11:52,344 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=880740.0, ans=0.2 2023-11-20 01:11:59,636 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=880806.6666666666, ans=0.1 2023-11-20 01:12:06,493 INFO [train_asr.py:1262] (1/4) Epoch 11, batch 11900, loss[loss=0.09164, simple_loss=0.1193, pruned_loss=0.02348, audio_tagging_loss=0.008514, over 15866.00 frames. ], tot_loss[loss=0.08306, simple_loss=0.1031, pruned_loss=0.02119, audio_tagging_loss=0.01035, over 3044820.90 frames. ], batch size: 58, lr: 6.16e-03, grad_scale: 16.0 2023-11-20 01:12:18,439 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=880940.0, ans=0.0 2023-11-20 01:12:23,246 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=880940.0, ans=0.125 2023-11-20 01:12:28,384 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 132150 2023-11-20 01:12:34,003 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=881006.6666666666, ans=0.0 2023-11-20 01:12:39,490 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=881006.6666666666, ans=0.125 2023-11-20 01:12:39,626 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=881006.6666666666, ans=0.125 2023-11-20 01:12:47,925 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=881073.3333333334, ans=0.1 2023-11-20 01:12:50,268 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.260e+01 8.333e+01 8.992e+01 9.854e+01 1.328e+02, threshold=1.798e+02, percent-clipped=0.0 2023-11-20 01:13:02,470 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=881140.0, ans=0.125 2023-11-20 01:13:10,597 INFO [train_asr.py:1262] (1/4) Epoch 11, batch 11950, loss[loss=0.09219, simple_loss=0.112, pruned_loss=0.02566, audio_tagging_loss=0.01051, over 14237.00 frames. ], tot_loss[loss=0.08329, simple_loss=0.1031, pruned_loss=0.02128, audio_tagging_loss=0.01046, over 3046190.22 frames. ], batch size: 57, lr: 6.16e-03, grad_scale: 16.0 2023-11-20 01:13:15,848 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.self_attn2.whiten, num_groups=1, num_channels=192, metric=12.17 vs. limit=22.5 2023-11-20 01:13:16,277 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=881206.6666666666, ans=0.125 2023-11-20 01:13:29,863 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=881273.3333333334, ans=0.125 2023-11-20 01:13:33,296 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 132200 2023-11-20 01:13:50,365 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=881406.6666666666, ans=0.2 2023-11-20 01:14:07,435 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=10.67 vs. 
limit=15.0 2023-11-20 01:14:10,368 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=881473.3333333334, ans=0.125 2023-11-20 01:14:13,593 INFO [train_asr.py:1262] (1/4) Epoch 11, batch 12000, loss[loss=0.09456, simple_loss=0.1174, pruned_loss=0.02693, audio_tagging_loss=0.008948, over 14978.00 frames. ], tot_loss[loss=0.08384, simple_loss=0.1037, pruned_loss=0.02151, audio_tagging_loss=0.01048, over 3050581.85 frames. ], batch size: 56, lr: 6.15e-03, grad_scale: 32.0 2023-11-20 01:14:13,594 INFO [train_asr.py:1285] (1/4) Computing validation loss 2023-11-20 01:14:57,665 INFO [train_asr.py:1294] (1/4) Epoch 11, validation: loss=0.06362, simple_loss=0.05468, pruned_loss=0.006127, audio_tagging_loss=0.03015, over 4681554.00 frames. 2023-11-20 01:14:57,666 INFO [train_asr.py:1295] (1/4) Maximum memory allocated so far is 26082MB 2023-11-20 01:15:17,680 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 132250 2023-11-20 01:15:20,408 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=881673.3333333334, ans=0.0 2023-11-20 01:15:24,004 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=881673.3333333334, ans=0.07 2023-11-20 01:15:26,800 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.conv_module2.whiten, num_groups=1, num_channels=192, metric=12.26 vs. limit=15.0 2023-11-20 01:16:05,085 INFO [train_asr.py:1262] (1/4) Epoch 12, batch 0, loss[loss=0.1126, simple_loss=0.1368, pruned_loss=0.02642, audio_tagging_loss=0.01778, over 15368.00 frames. ], tot_loss[loss=0.1126, simple_loss=0.1368, pruned_loss=0.02642, audio_tagging_loss=0.01778, over 15368.00 frames. ], batch size: 56, lr: 5.90e-03, grad_scale: 32.0 2023-11-20 01:16:05,085 INFO [train_asr.py:1285] (1/4) Computing validation loss 2023-11-20 01:16:33,937 INFO [zipformer.py:1873] (1/4) name=encoder.encoders.4.encoder.layers.0.self_attn_weights, attn_weights_entropy = tensor([4.5015, 2.6650, 3.7672, 3.1881], device='cuda:1') 2023-11-20 01:16:42,302 INFO [train_asr.py:1294] (1/4) Epoch 12, validation: loss=0.06246, simple_loss=0.05467, pruned_loss=0.006079, audio_tagging_loss=0.02904, over 4681554.00 frames. 2023-11-20 01:16:42,303 INFO [train_asr.py:1295] (1/4) Maximum memory allocated so far is 26082MB 2023-11-20 01:16:42,725 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=881720.0, ans=0.0 2023-11-20 01:16:51,495 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.342e+01 8.202e+01 8.941e+01 9.888e+01 1.289e+02, threshold=1.788e+02, percent-clipped=0.0 2023-11-20 01:16:55,704 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=881786.6666666666, ans=0.125 2023-11-20 01:17:18,951 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.97 vs. 
limit=6.0 2023-11-20 01:17:34,395 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 132300 2023-11-20 01:17:38,969 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=881986.6666666666, ans=0.0 2023-11-20 01:17:41,348 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=881986.6666666666, ans=0.125 2023-11-20 01:17:43,844 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=881986.6666666666, ans=0.125 2023-11-20 01:17:47,171 INFO [train_asr.py:1262] (1/4) Epoch 12, batch 50, loss[loss=0.08882, simple_loss=0.1082, pruned_loss=0.01712, audio_tagging_loss=0.0176, over 14466.00 frames. ], tot_loss[loss=0.09343, simple_loss=0.1061, pruned_loss=0.02124, audio_tagging_loss=0.01912, over 682583.20 frames. ], batch size: 55, lr: 5.89e-03, grad_scale: 32.0 2023-11-20 01:18:00,176 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=10.48 vs. limit=15.0 2023-11-20 01:18:10,224 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=882120.0, ans=0.125 2023-11-20 01:18:30,597 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer_ff2.min_abs, batch_count=882253.3333333334, ans=0.1 2023-11-20 01:18:35,637 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=882253.3333333334, ans=0.2 2023-11-20 01:18:39,742 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 132350 2023-11-20 01:18:41,012 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=882320.0, ans=0.2 2023-11-20 01:18:47,372 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=882320.0, ans=0.125 2023-11-20 01:18:52,566 INFO [train_asr.py:1262] (1/4) Epoch 12, batch 100, loss[loss=0.08573, simple_loss=0.1005, pruned_loss=0.0166, audio_tagging_loss=0.01889, over 15857.00 frames. ], tot_loss[loss=0.09112, simple_loss=0.1032, pruned_loss=0.02086, audio_tagging_loss=0.01865, over 1210114.90 frames. ], batch size: 60, lr: 5.89e-03, grad_scale: 32.0 2023-11-20 01:18:55,148 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=9.34 vs. limit=15.0 2023-11-20 01:19:01,655 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.733e+01 8.794e+01 9.349e+01 1.020e+02 1.692e+02, threshold=1.870e+02, percent-clipped=0.0 2023-11-20 01:19:13,274 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.min_positive, batch_count=882453.3333333334, ans=0.05 2023-11-20 01:19:24,154 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=9.43 vs. 
limit=15.0 2023-11-20 01:19:28,627 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=882520.0, ans=0.2 2023-11-20 01:19:44,807 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 132400 2023-11-20 01:19:57,296 INFO [train_asr.py:1262] (1/4) Epoch 12, batch 150, loss[loss=0.07038, simple_loss=0.08532, pruned_loss=0.01659, audio_tagging_loss=0.01113, over 16385.00 frames. ], tot_loss[loss=0.08903, simple_loss=0.1028, pruned_loss=0.02084, audio_tagging_loss=0.01678, over 1627533.98 frames. ], batch size: 63, lr: 5.89e-03, grad_scale: 32.0 2023-11-20 01:20:25,852 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=882853.3333333334, ans=0.2 2023-11-20 01:20:27,319 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=882853.3333333334, ans=0.2 2023-11-20 01:20:28,938 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=9.24 vs. limit=15.0 2023-11-20 01:20:38,595 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=11.38 vs. limit=15.0 2023-11-20 01:20:39,702 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=882920.0, ans=0.1 2023-11-20 01:20:49,235 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 132450 2023-11-20 01:20:53,245 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=882986.6666666666, ans=0.0 2023-11-20 01:21:00,136 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=882986.6666666666, ans=0.0 2023-11-20 01:21:02,302 INFO [train_asr.py:1262] (1/4) Epoch 12, batch 200, loss[loss=0.07202, simple_loss=0.09248, pruned_loss=0.01409, audio_tagging_loss=0.01168, over 15110.00 frames. ], tot_loss[loss=0.0866, simple_loss=0.102, pruned_loss=0.02078, audio_tagging_loss=0.01481, over 1942256.44 frames. ], batch size: 57, lr: 5.89e-03, grad_scale: 32.0 2023-11-20 01:21:11,567 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.067e+01 8.201e+01 8.761e+01 9.540e+01 1.328e+02, threshold=1.752e+02, percent-clipped=0.0 2023-11-20 01:21:48,069 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=883253.3333333334, ans=0.1 2023-11-20 01:21:50,550 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=883253.3333333334, ans=0.1 2023-11-20 01:21:54,054 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 132500 2023-11-20 01:22:06,772 INFO [train_asr.py:1262] (1/4) Epoch 12, batch 250, loss[loss=0.1083, simple_loss=0.145, pruned_loss=0.03013, audio_tagging_loss=0.00563, over 16246.00 frames. ], tot_loss[loss=0.08641, simple_loss=0.1032, pruned_loss=0.02135, audio_tagging_loss=0.01343, over 2189290.25 frames. ], batch size: 56, lr: 5.89e-03, grad_scale: 32.0 2023-11-20 01:22:20,583 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=2.17 vs. 
limit=15.0 2023-11-20 01:22:26,142 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=883453.3333333334, ans=0.125 2023-11-20 01:22:30,427 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=512, metric=20.41 vs. limit=22.5 2023-11-20 01:22:34,588 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=883520.0, ans=0.2 2023-11-20 01:22:35,635 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=883520.0, ans=0.0 2023-11-20 01:22:36,893 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=883520.0, ans=0.2 2023-11-20 01:22:50,968 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=883586.6666666666, ans=0.2 2023-11-20 01:22:54,744 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=883586.6666666666, ans=0.1 2023-11-20 01:22:58,128 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 132550 2023-11-20 01:23:11,522 INFO [train_asr.py:1262] (1/4) Epoch 12, batch 300, loss[loss=0.09505, simple_loss=0.1196, pruned_loss=0.02445, audio_tagging_loss=0.01083, over 16846.00 frames. ], tot_loss[loss=0.08564, simple_loss=0.1035, pruned_loss=0.02144, audio_tagging_loss=0.01246, over 2379032.60 frames. ], batch size: 62, lr: 5.89e-03, grad_scale: 32.0 2023-11-20 01:23:20,185 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.079e+01 8.204e+01 9.028e+01 9.850e+01 1.789e+02, threshold=1.806e+02, percent-clipped=1.0 2023-11-20 01:23:30,109 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.48 vs. limit=6.0 2023-11-20 01:23:37,826 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=883853.3333333334, ans=0.125 2023-11-20 01:23:46,379 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=883853.3333333334, ans=0.125 2023-11-20 01:24:03,120 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 132600 2023-11-20 01:24:04,619 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-20 01:24:14,030 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-20 01:24:16,386 INFO [train_asr.py:1262] (1/4) Epoch 12, batch 350, loss[loss=0.06525, simple_loss=0.08091, pruned_loss=0.01494, audio_tagging_loss=0.009855, over 14879.00 frames. ], tot_loss[loss=0.08511, simple_loss=0.1036, pruned_loss=0.02144, audio_tagging_loss=0.01188, over 2524978.56 frames. 
], batch size: 54, lr: 5.89e-03, grad_scale: 32.0 2023-11-20 01:24:49,407 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=884186.6666666666, ans=0.0 2023-11-20 01:24:56,490 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=884253.3333333334, ans=0.0 2023-11-20 01:24:57,770 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=884253.3333333334, ans=0.0 2023-11-20 01:24:58,924 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=884253.3333333334, ans=0.125 2023-11-20 01:25:08,165 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 132650 2023-11-20 01:25:21,018 INFO [train_asr.py:1262] (1/4) Epoch 12, batch 400, loss[loss=0.09926, simple_loss=0.1293, pruned_loss=0.0267, audio_tagging_loss=0.007912, over 15147.00 frames. ], tot_loss[loss=0.08366, simple_loss=0.1023, pruned_loss=0.02103, audio_tagging_loss=0.0115, over 2638764.35 frames. ], batch size: 57, lr: 5.89e-03, grad_scale: 32.0 2023-11-20 01:25:30,273 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.352e+01 8.151e+01 8.736e+01 9.522e+01 1.340e+02, threshold=1.747e+02, percent-clipped=0.0 2023-11-20 01:25:55,130 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.min_positive, batch_count=884520.0, ans=0.025 2023-11-20 01:26:13,042 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 132700 2023-11-20 01:26:26,601 INFO [train_asr.py:1262] (1/4) Epoch 12, batch 450, loss[loss=0.09158, simple_loss=0.1169, pruned_loss=0.02339, audio_tagging_loss=0.009739, over 15154.00 frames. ], tot_loss[loss=0.08431, simple_loss=0.1036, pruned_loss=0.02137, audio_tagging_loss=0.01114, over 2730563.44 frames. ], batch size: 56, lr: 5.89e-03, grad_scale: 32.0 2023-11-20 01:26:32,039 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.21 vs. limit=6.0 2023-11-20 01:26:34,018 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=884720.0, ans=0.125 2023-11-20 01:26:48,716 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.49 vs. limit=6.0 2023-11-20 01:27:18,849 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 132750 2023-11-20 01:27:23,974 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=884986.6666666666, ans=0.125 2023-11-20 01:27:27,778 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-20 01:27:31,591 INFO [train_asr.py:1262] (1/4) Epoch 12, batch 500, loss[loss=0.0616, simple_loss=0.06578, pruned_loss=0.01476, audio_tagging_loss=0.01396, over 14407.00 frames. ], tot_loss[loss=0.08333, simple_loss=0.1026, pruned_loss=0.02112, audio_tagging_loss=0.01091, over 2804591.59 frames. 
], batch size: 56, lr: 5.88e-03, grad_scale: 32.0 2023-11-20 01:27:31,813 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=885053.3333333334, ans=0.125 2023-11-20 01:27:38,196 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=13.46 vs. limit=15.0 2023-11-20 01:27:40,024 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.508e+01 8.200e+01 8.678e+01 9.366e+01 1.155e+02, threshold=1.736e+02, percent-clipped=0.0 2023-11-20 01:27:57,024 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.71 vs. limit=15.0 2023-11-20 01:27:58,891 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=885186.6666666666, ans=0.125 2023-11-20 01:28:05,638 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=512, metric=5.20 vs. limit=15.0 2023-11-20 01:28:10,970 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=885253.3333333334, ans=0.1 2023-11-20 01:28:12,462 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=12.72 vs. limit=15.0 2023-11-20 01:28:23,643 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 132800 2023-11-20 01:28:36,871 INFO [train_asr.py:1262] (1/4) Epoch 12, batch 550, loss[loss=0.09, simple_loss=0.1277, pruned_loss=0.01945, audio_tagging_loss=0.006687, over 14555.00 frames. ], tot_loss[loss=0.08397, simple_loss=0.1036, pruned_loss=0.02142, audio_tagging_loss=0.01073, over 2856580.35 frames. ], batch size: 54, lr: 5.88e-03, grad_scale: 32.0 2023-11-20 01:28:47,538 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=885386.6666666666, ans=0.125 2023-11-20 01:28:47,681 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=885386.6666666666, ans=0.1 2023-11-20 01:29:10,028 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=10.73 vs. limit=15.0 2023-11-20 01:29:13,427 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=10.04 vs. limit=15.0 2023-11-20 01:29:28,464 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 132850 2023-11-20 01:29:34,999 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.min_abs, batch_count=885653.3333333334, ans=0.5 2023-11-20 01:29:39,179 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=885653.3333333334, ans=0.1 2023-11-20 01:29:41,406 INFO [train_asr.py:1262] (1/4) Epoch 12, batch 600, loss[loss=0.08196, simple_loss=0.1088, pruned_loss=0.0172, audio_tagging_loss=0.01035, over 15443.00 frames. ], tot_loss[loss=0.08379, simple_loss=0.1034, pruned_loss=0.02136, audio_tagging_loss=0.01072, over 2899402.61 frames. 
], batch size: 59, lr: 5.88e-03, grad_scale: 32.0 2023-11-20 01:29:50,461 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.996e+01 8.266e+01 9.047e+01 9.748e+01 1.324e+02, threshold=1.809e+02, percent-clipped=0.0 2023-11-20 01:29:58,625 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=885786.6666666666, ans=0.125 2023-11-20 01:30:01,122 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=885786.6666666666, ans=0.0 2023-11-20 01:30:32,874 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 132900 2023-11-20 01:30:33,203 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=885986.6666666666, ans=0.125 2023-11-20 01:30:45,503 INFO [train_asr.py:1262] (1/4) Epoch 12, batch 650, loss[loss=0.06352, simple_loss=0.08182, pruned_loss=0.0135, audio_tagging_loss=0.009108, over 15175.00 frames. ], tot_loss[loss=0.08387, simple_loss=0.1038, pruned_loss=0.02142, audio_tagging_loss=0.01054, over 2932017.26 frames. ], batch size: 58, lr: 5.88e-03, grad_scale: 32.0 2023-11-20 01:30:47,728 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.min_positive, batch_count=886053.3333333334, ans=0.05 2023-11-20 01:30:51,504 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=886053.3333333334, ans=0.125 2023-11-20 01:30:52,891 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=886053.3333333334, ans=0.0 2023-11-20 01:30:58,311 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=10.24 vs. limit=15.0 2023-11-20 01:31:22,708 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=886186.6666666666, ans=0.1 2023-11-20 01:31:26,204 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=886253.3333333334, ans=0.1 2023-11-20 01:31:38,583 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 132950 2023-11-20 01:31:43,839 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=886320.0, ans=0.125 2023-11-20 01:31:46,086 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.min_positive, batch_count=886320.0, ans=0.025 2023-11-20 01:31:51,422 INFO [train_asr.py:1262] (1/4) Epoch 12, batch 700, loss[loss=0.07736, simple_loss=0.09777, pruned_loss=0.01966, audio_tagging_loss=0.008814, over 14188.00 frames. ], tot_loss[loss=0.08307, simple_loss=0.103, pruned_loss=0.02106, audio_tagging_loss=0.01052, over 2957918.62 frames. 
], batch size: 53, lr: 5.88e-03, grad_scale: 32.0 2023-11-20 01:32:00,261 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.620e+01 8.108e+01 8.721e+01 9.361e+01 1.160e+02, threshold=1.744e+02, percent-clipped=0.0 2023-11-20 01:32:07,312 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=886453.3333333334, ans=0.0 2023-11-20 01:32:25,295 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=886520.0, ans=0.0 2023-11-20 01:32:43,713 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 133000 2023-11-20 01:32:43,858 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=886653.3333333334, ans=0.125 2023-11-20 01:32:46,770 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=886653.3333333334, ans=0.2 2023-11-20 01:32:51,502 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=886653.3333333334, ans=0.125 2023-11-20 01:32:56,797 INFO [train_asr.py:1262] (1/4) Epoch 12, batch 750, loss[loss=0.08836, simple_loss=0.1092, pruned_loss=0.02299, audio_tagging_loss=0.01079, over 16502.00 frames. ], tot_loss[loss=0.0838, simple_loss=0.1037, pruned_loss=0.02148, audio_tagging_loss=0.01044, over 2975138.20 frames. ], batch size: 61, lr: 5.88e-03, grad_scale: 32.0 2023-11-20 01:32:59,937 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=8.56 vs. limit=10.0 2023-11-20 01:33:00,814 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-11-20 01:33:01,225 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=6.48 vs. limit=12.0 2023-11-20 01:33:02,042 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=886720.0, ans=0.1 2023-11-20 01:33:03,905 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.self_attn2.whiten, num_groups=1, num_channels=192, metric=10.15 vs. limit=22.5 2023-11-20 01:33:09,447 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=886786.6666666666, ans=0.125 2023-11-20 01:33:19,642 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=13.05 vs. limit=15.0 2023-11-20 01:33:36,013 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=886920.0, ans=0.0 2023-11-20 01:33:40,727 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=886920.0, ans=0.07 2023-11-20 01:33:48,611 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 133050 2023-11-20 01:33:56,330 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=12.83 vs. limit=15.0 2023-11-20 01:34:00,836 INFO [train_asr.py:1262] (1/4) Epoch 12, batch 800, loss[loss=0.08507, simple_loss=0.1033, pruned_loss=0.02142, audio_tagging_loss=0.01201, over 15545.00 frames. 
], tot_loss[loss=0.08393, simple_loss=0.104, pruned_loss=0.02141, audio_tagging_loss=0.01051, over 2994261.74 frames. ], batch size: 57, lr: 5.88e-03, grad_scale: 32.0 2023-11-20 01:34:05,470 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-20 01:34:10,149 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.686e+01 8.461e+01 9.039e+01 1.027e+02 1.682e+02, threshold=1.808e+02, percent-clipped=0.0 2023-11-20 01:34:11,560 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=887053.3333333334, ans=0.125 2023-11-20 01:34:15,315 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=887120.0, ans=0.125 2023-11-20 01:34:52,120 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 133100 2023-11-20 01:35:00,743 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=887320.0, ans=0.125 2023-11-20 01:35:05,316 INFO [train_asr.py:1262] (1/4) Epoch 12, batch 850, loss[loss=0.09883, simple_loss=0.1215, pruned_loss=0.02634, audio_tagging_loss=0.01175, over 14623.00 frames. ], tot_loss[loss=0.08363, simple_loss=0.1035, pruned_loss=0.02125, audio_tagging_loss=0.01062, over 3000156.80 frames. ], batch size: 53, lr: 5.88e-03, grad_scale: 32.0 2023-11-20 01:35:08,603 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=887386.6666666666, ans=0.125 2023-11-20 01:35:13,411 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=887386.6666666666, ans=0.125 2023-11-20 01:35:34,370 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.whiten.whitening_limit, batch_count=887520.0, ans=15.0 2023-11-20 01:35:43,286 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=887586.6666666666, ans=0.2 2023-11-20 01:35:45,633 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=887586.6666666666, ans=0.1 2023-11-20 01:35:48,409 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=887586.6666666666, ans=0.125 2023-11-20 01:35:52,684 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=9.18 vs. limit=15.0 2023-11-20 01:35:57,723 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 133150 2023-11-20 01:36:01,454 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=887653.3333333334, ans=0.125 2023-11-20 01:36:10,436 INFO [train_asr.py:1262] (1/4) Epoch 12, batch 900, loss[loss=0.09932, simple_loss=0.1263, pruned_loss=0.02621, audio_tagging_loss=0.009931, over 14800.00 frames. ], tot_loss[loss=0.08321, simple_loss=0.1026, pruned_loss=0.02116, audio_tagging_loss=0.01074, over 3007629.20 frames. 
], batch size: 55, lr: 5.88e-03, grad_scale: 32.0 2023-11-20 01:36:11,849 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=887720.0, ans=0.0 2023-11-20 01:36:19,055 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.555e+01 8.224e+01 8.986e+01 9.963e+01 2.180e+02, threshold=1.797e+02, percent-clipped=1.0 2023-11-20 01:36:26,502 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-20 01:36:45,277 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=887853.3333333334, ans=0.125 2023-11-20 01:37:01,709 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 133200 2023-11-20 01:37:01,873 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=887986.6666666666, ans=0.0 2023-11-20 01:37:07,478 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.whiten, num_groups=1, num_channels=512, metric=11.21 vs. limit=12.0 2023-11-20 01:37:14,470 INFO [train_asr.py:1262] (1/4) Epoch 12, batch 950, loss[loss=0.07093, simple_loss=0.08217, pruned_loss=0.01479, audio_tagging_loss=0.01505, over 14963.00 frames. ], tot_loss[loss=0.0824, simple_loss=0.1018, pruned_loss=0.02084, audio_tagging_loss=0.01066, over 3019679.69 frames. ], batch size: 56, lr: 5.87e-03, grad_scale: 32.0 2023-11-20 01:37:14,659 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=888053.3333333334, ans=0.2 2023-11-20 01:37:21,616 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=888053.3333333334, ans=0.0 2023-11-20 01:37:26,408 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=888120.0, ans=0.2 2023-11-20 01:37:42,186 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=888186.6666666666, ans=0.2 2023-11-20 01:37:50,296 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=10.37 vs. limit=15.0 2023-11-20 01:37:53,270 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=11.58 vs. limit=15.0 2023-11-20 01:37:53,852 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=888253.3333333334, ans=0.125 2023-11-20 01:38:00,054 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=888253.3333333334, ans=0.0 2023-11-20 01:38:05,978 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 133250 2023-11-20 01:38:19,467 INFO [train_asr.py:1262] (1/4) Epoch 12, batch 1000, loss[loss=0.1127, simple_loss=0.1397, pruned_loss=0.03548, audio_tagging_loss=0.007332, over 14788.00 frames. ], tot_loss[loss=0.08208, simple_loss=0.1015, pruned_loss=0.02085, audio_tagging_loss=0.01047, over 3028921.24 frames. 
], batch size: 53, lr: 5.87e-03, grad_scale: 32.0 2023-11-20 01:38:28,771 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.591e+01 8.070e+01 8.953e+01 9.480e+01 1.441e+02, threshold=1.791e+02, percent-clipped=0.0 2023-11-20 01:38:32,846 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=888453.3333333334, ans=0.0 2023-11-20 01:38:33,895 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=888453.3333333334, ans=0.125 2023-11-20 01:38:41,841 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=888453.3333333334, ans=0.0 2023-11-20 01:38:44,347 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=888520.0, ans=0.0 2023-11-20 01:38:46,605 WARNING [train_asr.py:1506] (1/4) Exclude cut with ID unbalanced/5Y6u9AlD9S0_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-20 01:38:49,290 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=888520.0, ans=0.2 2023-11-20 01:39:11,500 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 133300 2023-11-20 01:39:24,742 INFO [train_asr.py:1262] (1/4) Epoch 12, batch 1050, loss[loss=0.0792, simple_loss=0.09622, pruned_loss=0.0205, audio_tagging_loss=0.01059, over 14755.00 frames. ], tot_loss[loss=0.08114, simple_loss=0.1004, pruned_loss=0.02055, audio_tagging_loss=0.01039, over 3028030.27 frames. ], batch size: 55, lr: 5.87e-03, grad_scale: 32.0 2023-11-20 01:40:01,011 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=888853.3333333334, ans=0.125 2023-11-20 01:40:16,244 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=888986.6666666666, ans=0.2 2023-11-20 01:40:17,249 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 133350 2023-11-20 01:40:22,446 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=888986.6666666666, ans=0.125 2023-11-20 01:40:29,530 INFO [train_asr.py:1262] (1/4) Epoch 12, batch 1100, loss[loss=0.1037, simple_loss=0.1247, pruned_loss=0.03167, audio_tagging_loss=0.009646, over 15627.00 frames. ], tot_loss[loss=0.08092, simple_loss=0.1002, pruned_loss=0.02061, audio_tagging_loss=0.01021, over 3033665.42 frames. ], batch size: 56, lr: 5.87e-03, grad_scale: 32.0 2023-11-20 01:40:32,066 WARNING [train_asr.py:1506] (1/4) Exclude cut with ID unbalanced/AWHnJAqurec_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. 
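Number of tokens: 24

The "Exclude cut" warnings above come from a length sanity check: AudioSet cuts in this multi-task setup carry only a dummy transcript, and a 1-second cut (100 feature frames at a 10 ms shift) is reduced by the convolutional front end to 23 output frames; with 24 BPE tokens to emit, the transducer loss cannot be computed, so the cut is skipped. A minimal sketch of such a filter, assuming the usual icefall Conv2dSubsampling arithmetic (the exact formula can differ between versions):

```python
def frames_after_subsampling(num_frames: int) -> int:
    # Conv2dSubsampling-style ~4x time reduction with edge effects:
    # ((100 - 7) // 2 + 1) // 2 == 23, matching the warning above.
    return ((num_frames - 7) // 2 + 1) // 2

def keep_cut(num_frames: int, num_tokens: int) -> bool:
    # A transducer has to emit every token within the available output
    # frames, so a cut with more tokens than frames cannot be aligned.
    return frames_after_subsampling(num_frames) >= num_tokens

print(keep_cut(100, 24))  # False -> the cut is excluded from training
```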
2023-11-20 01:40:38,587 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.984e+01 8.052e+01 8.709e+01 9.479e+01 1.259e+02, threshold=1.742e+02, percent-clipped=0.0 2023-11-20 01:40:41,402 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=889120.0, ans=0.125 2023-11-20 01:41:07,822 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=889253.3333333334, ans=0.0 2023-11-20 01:41:11,653 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.min_positive, batch_count=889253.3333333334, ans=0.05 2023-11-20 01:41:16,347 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=889253.3333333334, ans=0.2 2023-11-20 01:41:21,047 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 133400 2023-11-20 01:41:21,139 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=889320.0, ans=0.0 2023-11-20 01:41:22,535 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=889320.0, ans=0.125 2023-11-20 01:41:22,635 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=889320.0, ans=0.125 2023-11-20 01:41:34,534 INFO [train_asr.py:1262] (1/4) Epoch 12, batch 1150, loss[loss=0.0741, simple_loss=0.0828, pruned_loss=0.02136, audio_tagging_loss=0.01134, over 15011.00 frames. ], tot_loss[loss=0.08061, simple_loss=0.1, pruned_loss=0.02043, audio_tagging_loss=0.01017, over 3031014.57 frames. ], batch size: 57, lr: 5.87e-03, grad_scale: 32.0 2023-11-20 01:41:49,037 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=889453.3333333334, ans=0.125 2023-11-20 01:42:07,883 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=889520.0, ans=0.0 2023-11-20 01:42:08,114 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=889520.0, ans=0.0 2023-11-20 01:42:11,643 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=889586.6666666666, ans=0.0 2023-11-20 01:42:25,906 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 133450 2023-11-20 01:42:39,269 INFO [train_asr.py:1262] (1/4) Epoch 12, batch 1200, loss[loss=0.09546, simple_loss=0.1195, pruned_loss=0.02713, audio_tagging_loss=0.008597, over 15845.00 frames. ], tot_loss[loss=0.08117, simple_loss=0.1006, pruned_loss=0.0207, audio_tagging_loss=0.01019, over 3033406.45 frames.
], batch size: 58, lr: 5.87e-03, grad_scale: 32.0 2023-11-20 01:42:48,423 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.434e+01 8.251e+01 9.001e+01 9.736e+01 1.493e+02, threshold=1.800e+02, percent-clipped=0.0 2023-11-20 01:42:52,374 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=889786.6666666666, ans=0.0 2023-11-20 01:42:55,989 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=889786.6666666666, ans=0.1 2023-11-20 01:43:18,533 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=889920.0, ans=0.1 2023-11-20 01:43:30,757 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=889986.6666666666, ans=0.1 2023-11-20 01:43:31,709 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 133500 2023-11-20 01:43:42,915 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=890053.3333333334, ans=0.125 2023-11-20 01:43:43,777 INFO [train_asr.py:1262] (1/4) Epoch 12, batch 1250, loss[loss=0.06944, simple_loss=0.09157, pruned_loss=0.01469, audio_tagging_loss=0.008969, over 15885.00 frames. ], tot_loss[loss=0.08129, simple_loss=0.101, pruned_loss=0.02066, audio_tagging_loss=0.01011, over 3039042.66 frames. ], batch size: 63, lr: 5.87e-03, grad_scale: 32.0 2023-11-20 01:44:16,571 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=890186.6666666666, ans=0.125 2023-11-20 01:44:35,361 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 133550 2023-11-20 01:44:38,086 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=890320.0, ans=0.125 2023-11-20 01:44:43,485 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-20 01:44:48,074 INFO [train_asr.py:1262] (1/4) Epoch 12, batch 1300, loss[loss=0.07399, simple_loss=0.09052, pruned_loss=0.0186, audio_tagging_loss=0.01013, over 14827.00 frames. ], tot_loss[loss=0.08082, simple_loss=0.1005, pruned_loss=0.02045, audio_tagging_loss=0.01011, over 3036784.90 frames. ], batch size: 59, lr: 5.87e-03, grad_scale: 64.0 2023-11-20 01:44:57,231 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.948e+01 8.375e+01 9.143e+01 9.896e+01 1.258e+02, threshold=1.829e+02, percent-clipped=0.0 2023-11-20 01:45:10,817 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=890453.3333333334, ans=0.1 2023-11-20 01:45:16,805 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=890520.0, ans=0.125 2023-11-20 01:45:39,247 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 133600 2023-11-20 01:45:52,930 INFO [train_asr.py:1262] (1/4) Epoch 12, batch 1350, loss[loss=0.07967, simple_loss=0.1021, pruned_loss=0.01924, audio_tagging_loss=0.009354, over 15090.00 frames. ], tot_loss[loss=0.08108, simple_loss=0.1007, pruned_loss=0.02056, audio_tagging_loss=0.01016, over 3035789.66 frames. 
], batch size: 56, lr: 5.87e-03, grad_scale: 32.0 2023-11-20 01:46:31,001 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=890920.0, ans=0.0 2023-11-20 01:46:33,710 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=890920.0, ans=0.1 2023-11-20 01:46:34,695 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=890920.0, ans=0.125 2023-11-20 01:46:38,397 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=12.02 vs. limit=15.0 2023-11-20 01:46:41,263 WARNING [train_asr.py:1506] (1/4) Exclude cut with ID unbalanced/XdmbboqRBmQ_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-20 01:46:41,986 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=11.65 vs. limit=15.0 2023-11-20 01:46:45,177 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 133650 2023-11-20 01:46:58,770 INFO [train_asr.py:1262] (1/4) Epoch 12, batch 1400, loss[loss=0.08536, simple_loss=0.09226, pruned_loss=0.02705, audio_tagging_loss=0.01219, over 14927.00 frames. ], tot_loss[loss=0.08041, simple_loss=0.09963, pruned_loss=0.02032, audio_tagging_loss=0.01028, over 3033865.77 frames. ], batch size: 56, lr: 5.87e-03, grad_scale: 32.0 2023-11-20 01:47:08,565 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.444e+01 7.927e+01 8.547e+01 9.494e+01 1.207e+02, threshold=1.709e+02, percent-clipped=0.0 2023-11-20 01:47:17,899 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=891120.0, ans=0.125 2023-11-20 01:47:26,637 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=12.06 vs. limit=15.0 2023-11-20 01:47:49,862 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 133700 2023-11-20 01:47:57,582 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=891320.0, ans=0.09899494936611666 2023-11-20 01:48:02,917 INFO [train_asr.py:1262] (1/4) Epoch 12, batch 1450, loss[loss=0.08661, simple_loss=0.111, pruned_loss=0.02073, audio_tagging_loss=0.0104, over 15600.00 frames. ], tot_loss[loss=0.08169, simple_loss=0.1011, pruned_loss=0.02082, audio_tagging_loss=0.01034, over 3039773.15 frames. ], batch size: 61, lr: 5.86e-03, grad_scale: 16.0 2023-11-20 01:48:33,854 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=512, metric=4.20 vs. 
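limit=15.0

The Whitening lines report how close a layer's activation covariance is to isotropic: the metric is in essence E[λ²]/E[λ]² over the eigenvalues λ of the covariance, which equals 1.0 for perfectly white activations and grows as the energy concentrates in fewer directions; a corrective gradient is applied only when the metric exceeds the scheduled limit, which is why most entries just print "metric=X vs. limit=Y" and move on. A rough sketch of that statistic, not the exact scaling.py implementation:

```python
import torch

def whitening_metric(x: torch.Tensor) -> torch.Tensor:
    """x: (num_frames, num_channels) activations for one whitening group."""
    x = x - x.mean(dim=0, keepdim=True)
    cov = (x.T @ x) / x.shape[0]                    # channel covariance
    mean_eig = torch.diagonal(cov).mean()           # E[lambda], via trace
    mean_eig_sq = torch.diagonal(cov @ cov).mean()  # E[lambda^2], via trace of cov^2
    return mean_eig_sq / mean_eig ** 2              # >= 1; 1.0 iff perfectly white

x = torch.randn(2000, 384)   # near-isotropic activations -> metric near 1
print(whitening_metric(x))
print(whitening_metric(x * torch.linspace(0.1, 3.0, 384)))  # anisotropic -> larger
```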
2023-11-20 01:48:54,315 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 133750 2023-11-20 01:48:56,855 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=891653.3333333334, ans=0.1 2023-11-20 01:49:01,859 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=891653.3333333334, ans=0.0 2023-11-20 01:49:04,185 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=11.45 vs. limit=15.0 2023-11-20 01:49:04,922 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer_ff3.min_abs, batch_count=891653.3333333334, ans=0.2 2023-11-20 01:49:07,027 INFO [train_asr.py:1262] (1/4) Epoch 12, batch 1500, loss[loss=0.132, simple_loss=0.162, pruned_loss=0.04172, audio_tagging_loss=0.009307, over 15569.00 frames. ], tot_loss[loss=0.08145, simple_loss=0.1006, pruned_loss=0.02069, audio_tagging_loss=0.01048, over 3042235.29 frames. ], batch size: 54, lr: 5.86e-03, grad_scale: 16.0 2023-11-20 01:49:17,138 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=7.90 vs. limit=15.0 2023-11-20 01:49:18,846 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 5.912e+01 7.823e+01 8.560e+01 9.381e+01 1.533e+02, threshold=1.712e+02, percent-clipped=0.0 2023-11-20 01:49:31,910 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=891853.3333333334, ans=0.0 2023-11-20 01:49:44,779 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=891920.0, ans=0.0 2023-11-20 01:49:48,634 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=891920.0, ans=0.125 2023-11-20 01:49:59,152 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 133800 2023-11-20 01:50:10,072 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=891986.6666666666, ans=0.0 2023-11-20 01:50:12,765 INFO [train_asr.py:1262] (1/4) Epoch 12, batch 1550, loss[loss=0.07724, simple_loss=0.09224, pruned_loss=0.01844, audio_tagging_loss=0.01268, over 14804.00 frames. ], tot_loss[loss=0.08138, simple_loss=0.1005, pruned_loss=0.02054, audio_tagging_loss=0.01057, over 3046489.39 frames. ], batch size: 56, lr: 5.86e-03, grad_scale: 16.0 2023-11-20 01:50:24,136 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=892120.0, ans=0.1 2023-11-20 01:50:26,856 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=11.66 vs.
limit=15.0 2023-11-20 01:50:57,633 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=892253.3333333334, ans=0.0 2023-11-20 01:50:59,009 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=892253.3333333334, ans=0.07 2023-11-20 01:51:04,175 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 133850 2023-11-20 01:51:16,380 INFO [train_asr.py:1262] (1/4) Epoch 12, batch 1600, loss[loss=0.08519, simple_loss=0.1012, pruned_loss=0.02352, audio_tagging_loss=0.01106, over 14899.00 frames. ], tot_loss[loss=0.08182, simple_loss=0.1009, pruned_loss=0.02077, audio_tagging_loss=0.0106, over 3039881.48 frames. ], batch size: 57, lr: 5.86e-03, grad_scale: 32.0 2023-11-20 01:51:18,460 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=892386.6666666666, ans=0.1 2023-11-20 01:51:18,552 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=892386.6666666666, ans=0.0 2023-11-20 01:51:28,066 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.791e+01 8.075e+01 8.775e+01 9.622e+01 1.213e+02, threshold=1.755e+02, percent-clipped=0.0 2023-11-20 01:51:33,924 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.min_positive, batch_count=892453.3333333334, ans=0.025 2023-11-20 01:51:44,259 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=892520.0, ans=0.125 2023-11-20 01:52:01,234 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=892586.6666666666, ans=0.125 2023-11-20 01:52:09,104 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 133900 2023-11-20 01:52:15,308 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=892653.3333333334, ans=0.125 2023-11-20 01:52:22,010 INFO [train_asr.py:1262] (1/4) Epoch 12, batch 1650, loss[loss=0.09141, simple_loss=0.1167, pruned_loss=0.0236, audio_tagging_loss=0.009446, over 15692.00 frames. ], tot_loss[loss=0.08227, simple_loss=0.1017, pruned_loss=0.02087, audio_tagging_loss=0.01054, over 3043048.99 frames. 
], batch size: 57, lr: 5.86e-03, grad_scale: 32.0 2023-11-20 01:52:36,349 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=892786.6666666666, ans=0.1 2023-11-20 01:52:38,920 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=892786.6666666666, ans=0.125 2023-11-20 01:52:44,399 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=892786.6666666666, ans=0.07 2023-11-20 01:53:13,623 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 133950 2023-11-20 01:53:17,632 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=892986.6666666666, ans=0.1 2023-11-20 01:53:17,660 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=892986.6666666666, ans=0.1 2023-11-20 01:53:26,565 INFO [train_asr.py:1262] (1/4) Epoch 12, batch 1700, loss[loss=0.07575, simple_loss=0.09839, pruned_loss=0.0163, audio_tagging_loss=0.01026, over 15148.00 frames. ], tot_loss[loss=0.08197, simple_loss=0.1014, pruned_loss=0.02073, audio_tagging_loss=0.01055, over 3048045.35 frames. ], batch size: 54, lr: 5.86e-03, grad_scale: 32.0 2023-11-20 01:53:38,254 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.212e+01 8.118e+01 8.650e+01 9.250e+01 1.178e+02, threshold=1.730e+02, percent-clipped=0.0 2023-11-20 01:53:42,329 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=893120.0, ans=0.125 2023-11-20 01:54:01,897 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=893186.6666666666, ans=0.125 2023-11-20 01:54:02,006 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=893186.6666666666, ans=0.125 2023-11-20 01:54:05,715 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=893253.3333333334, ans=0.125 2023-11-20 01:54:10,668 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-20 01:54:18,417 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 134000 2023-11-20 01:54:31,560 INFO [train_asr.py:1262] (1/4) Epoch 12, batch 1750, loss[loss=0.07712, simple_loss=0.09613, pruned_loss=0.01871, audio_tagging_loss=0.01035, over 15670.00 frames. ], tot_loss[loss=0.08169, simple_loss=0.1013, pruned_loss=0.02051, audio_tagging_loss=0.01052, over 3050908.19 frames. 
], batch size: 56, lr: 5.86e-03, grad_scale: 32.0 2023-11-20 01:54:31,856 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=893386.6666666666, ans=0.2 2023-11-20 01:54:41,033 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=893386.6666666666, ans=0.0 2023-11-20 01:54:52,794 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=893453.3333333334, ans=0.1 2023-11-20 01:55:23,912 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 134050 2023-11-20 01:55:35,238 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=893720.0, ans=0.1 2023-11-20 01:55:36,241 INFO [train_asr.py:1262] (1/4) Epoch 12, batch 1800, loss[loss=0.05799, simple_loss=0.07202, pruned_loss=0.01126, audio_tagging_loss=0.01073, over 15606.00 frames. ], tot_loss[loss=0.08198, simple_loss=0.1019, pruned_loss=0.02061, audio_tagging_loss=0.01043, over 3052282.15 frames. ], batch size: 59, lr: 5.86e-03, grad_scale: 32.0 2023-11-20 01:55:43,155 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=893720.0, ans=0.0 2023-11-20 01:55:46,873 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=893720.0, ans=0.0 2023-11-20 01:55:47,804 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.798e+01 8.080e+01 8.622e+01 9.512e+01 1.674e+02, threshold=1.724e+02, percent-clipped=0.0 2023-11-20 01:56:13,244 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=893853.3333333334, ans=0.2 2023-11-20 01:56:28,513 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 134100 2023-11-20 01:56:34,140 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=9.24 vs. limit=15.0 2023-11-20 01:56:36,010 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.max_abs, batch_count=893986.6666666666, ans=10.0 2023-11-20 01:56:41,479 INFO [train_asr.py:1262] (1/4) Epoch 12, batch 1850, loss[loss=0.06703, simple_loss=0.07803, pruned_loss=0.01666, audio_tagging_loss=0.01135, over 14545.00 frames. ], tot_loss[loss=0.08171, simple_loss=0.1019, pruned_loss=0.02055, audio_tagging_loss=0.01021, over 3057779.73 frames. ], batch size: 55, lr: 5.86e-03, grad_scale: 32.0 2023-11-20 01:56:54,080 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=894120.0, ans=0.2 2023-11-20 01:57:07,696 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-20 01:57:07,873 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=894186.6666666666, ans=0.125 2023-11-20 01:57:22,157 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=13.01 vs. 
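limit=15.0

The lr values drifting down from 5.88e-03 toward 5.82e-03 across these batches follow icefall's Eden schedule, which decays polynomially in both the batch index and the (fractional) epoch, with base_lr=0.045, lr_batches=7500 and lr_epochs=3.5 taken from the config. A sketch assuming the standard Eden formula; the exact epoch bookkeeping at this point in training is a guess:

```python
def eden_lr(base_lr: float, batch: int, epoch: float,
            lr_batches: float = 7500.0, lr_epochs: float = 3.5) -> float:
    # Eden decays smoothly in both batch count and epoch count.
    batch_factor = ((batch ** 2 + lr_batches ** 2) / lr_batches ** 2) ** -0.25
    epoch_factor = ((epoch ** 2 + lr_epochs ** 2) / lr_epochs ** 2) ** -0.25
    return base_lr * batch_factor * epoch_factor

# With batch/epoch counts around this point in the log, this lands near
# the 5.86e-03 printed above (assuming epochs are counted continuously).
print(eden_lr(0.045, batch=134_000, epoch=11.0))
```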
2023-11-20 01:57:32,337 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=894320.0, ans=0.2 2023-11-20 01:57:33,249 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 134150 2023-11-20 01:57:40,950 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=894320.0, ans=0.125 2023-11-20 01:57:45,556 INFO [train_asr.py:1262] (1/4) Epoch 12, batch 1900, loss[loss=0.1016, simple_loss=0.1243, pruned_loss=0.02746, audio_tagging_loss=0.012, over 14581.00 frames. ], tot_loss[loss=0.08182, simple_loss=0.102, pruned_loss=0.0206, audio_tagging_loss=0.01021, over 3051109.89 frames. ], batch size: 55, lr: 5.85e-03, grad_scale: 32.0 2023-11-20 01:57:57,407 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=11.17 vs. limit=15.0 2023-11-20 01:57:59,156 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.930e+01 8.176e+01 8.935e+01 9.422e+01 1.185e+02, threshold=1.787e+02, percent-clipped=0.0 2023-11-20 01:58:03,506 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=6.39 vs. limit=15.0 2023-11-20 01:58:19,232 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=894520.0, ans=0.0 2023-11-20 01:58:35,610 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=894586.6666666666, ans=0.07 2023-11-20 01:58:38,395 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 134200 2023-11-20 01:58:43,177 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=15.03 vs. limit=22.5 2023-11-20 01:58:48,863 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=894653.3333333334, ans=0.0 2023-11-20 01:58:51,054 INFO [train_asr.py:1262] (1/4) Epoch 12, batch 1950, loss[loss=0.07949, simple_loss=0.1001, pruned_loss=0.01795, audio_tagging_loss=0.01148, over 15165.00 frames. ], tot_loss[loss=0.08211, simple_loss=0.1022, pruned_loss=0.02079, audio_tagging_loss=0.01023, over 3055285.44 frames. ], batch size: 56, lr: 5.85e-03, grad_scale: 16.0 2023-11-20 01:59:14,758 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=894786.6666666666, ans=0.0 2023-11-20 01:59:16,024 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.max_abs, batch_count=894853.3333333334, ans=10.0 2023-11-20 01:59:19,765 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=894853.3333333334, ans=0.2 2023-11-20 01:59:38,968 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=12.36 vs. limit=15.0 2023-11-20 01:59:42,021 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 134250 2023-11-20 01:59:54,806 INFO [train_asr.py:1262] (1/4) Epoch 12, batch 2000, loss[loss=0.1047, simple_loss=0.13, pruned_loss=0.03, audio_tagging_loss=0.009681, over 14478.00 frames.
], tot_loss[loss=0.08207, simple_loss=0.1021, pruned_loss=0.02078, audio_tagging_loss=0.01024, over 3055793.87 frames. ], batch size: 53, lr: 5.85e-03, grad_scale: 32.0 2023-11-20 02:00:02,236 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=895053.3333333334, ans=0.0 2023-11-20 02:00:08,121 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.403e+01 7.896e+01 8.593e+01 9.449e+01 1.399e+02, threshold=1.719e+02, percent-clipped=0.0 2023-11-20 02:00:11,512 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=8.71 vs. limit=15.0 2023-11-20 02:00:21,992 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=895186.6666666666, ans=0.125 2023-11-20 02:00:23,482 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=7.83 vs. limit=15.0 2023-11-20 02:00:47,370 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 134300 2023-11-20 02:00:48,100 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=18.56 vs. limit=22.5 2023-11-20 02:00:59,944 INFO [train_asr.py:1262] (1/4) Epoch 12, batch 2050, loss[loss=0.08392, simple_loss=0.1062, pruned_loss=0.02208, audio_tagging_loss=0.008721, over 14146.00 frames. ], tot_loss[loss=0.08218, simple_loss=0.1025, pruned_loss=0.02083, audio_tagging_loss=0.0101, over 3055227.85 frames. ], batch size: 55, lr: 5.85e-03, grad_scale: 32.0 2023-11-20 02:01:17,656 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=895453.3333333334, ans=0.0 2023-11-20 02:01:19,970 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=895453.3333333334, ans=0.125 2023-11-20 02:01:51,612 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 134350 2023-11-20 02:01:51,906 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=895653.3333333334, ans=0.125 2023-11-20 02:02:03,013 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=895653.3333333334, ans=0.0 2023-11-20 02:02:05,144 INFO [train_asr.py:1262] (1/4) Epoch 12, batch 2100, loss[loss=0.06856, simple_loss=0.0779, pruned_loss=0.01581, audio_tagging_loss=0.01381, over 14372.00 frames. ], tot_loss[loss=0.08243, simple_loss=0.1027, pruned_loss=0.02095, audio_tagging_loss=0.01012, over 3050098.06 frames. 
], batch size: 54, lr: 5.85e-03, grad_scale: 16.0 2023-11-20 02:02:10,334 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=895720.0, ans=0.0 2023-11-20 02:02:12,014 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=895720.0, ans=0.125 2023-11-20 02:02:19,149 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.798e+01 8.312e+01 8.947e+01 9.682e+01 1.152e+02, threshold=1.789e+02, percent-clipped=0.0 2023-11-20 02:02:29,804 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=895853.3333333334, ans=0.1 2023-11-20 02:02:40,812 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=895853.3333333334, ans=0.0 2023-11-20 02:02:47,809 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.whiten, num_groups=1, num_channels=192, metric=3.35 vs. limit=12.0 2023-11-20 02:02:56,504 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 134400 2023-11-20 02:03:09,549 INFO [train_asr.py:1262] (1/4) Epoch 12, batch 2150, loss[loss=0.07971, simple_loss=0.08395, pruned_loss=0.02533, audio_tagging_loss=0.0124, over 15432.00 frames. ], tot_loss[loss=0.08235, simple_loss=0.1024, pruned_loss=0.02101, audio_tagging_loss=0.01014, over 3052732.20 frames. ], batch size: 58, lr: 5.85e-03, grad_scale: 16.0 2023-11-20 02:03:34,413 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.min_positive, batch_count=896186.6666666666, ans=0.025 2023-11-20 02:03:45,555 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-20 02:03:46,712 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=896186.6666666666, ans=0.2 2023-11-20 02:03:48,928 WARNING [train_asr.py:1506] (1/4) Exclude cut with ID unbalanced/XkQ8YVd8u38_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-20 02:04:01,683 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 134450 2023-11-20 02:04:01,795 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=896320.0, ans=0.125 2023-11-20 02:04:14,507 INFO [train_asr.py:1262] (1/4) Epoch 12, batch 2200, loss[loss=0.09097, simple_loss=0.1055, pruned_loss=0.02667, audio_tagging_loss=0.01155, over 15537.00 frames. ], tot_loss[loss=0.08291, simple_loss=0.1034, pruned_loss=0.02121, audio_tagging_loss=0.01002, over 3053957.92 frames. 
], batch size: 58, lr: 5.85e-03, grad_scale: 16.0 2023-11-20 02:04:23,530 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=896386.6666666666, ans=0.0 2023-11-20 02:04:28,768 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.245e+01 8.199e+01 8.937e+01 9.521e+01 1.153e+02, threshold=1.787e+02, percent-clipped=0.0 2023-11-20 02:04:29,431 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.whiten, num_groups=1, num_channels=512, metric=10.09 vs. limit=12.0 2023-11-20 02:04:42,491 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=896520.0, ans=0.125 2023-11-20 02:05:06,388 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 134500 2023-11-20 02:05:19,201 INFO [train_asr.py:1262] (1/4) Epoch 12, batch 2250, loss[loss=0.07409, simple_loss=0.08918, pruned_loss=0.0165, audio_tagging_loss=0.013, over 14357.00 frames. ], tot_loss[loss=0.08275, simple_loss=0.1029, pruned_loss=0.02114, audio_tagging_loss=0.01016, over 3048083.18 frames. ], batch size: 55, lr: 5.85e-03, grad_scale: 16.0 2023-11-20 02:05:20,650 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=896720.0, ans=0.0 2023-11-20 02:05:20,689 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=896720.0, ans=0.0 2023-11-20 02:05:24,293 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=896720.0, ans=0.125 2023-11-20 02:05:37,785 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=896786.6666666666, ans=0.0 2023-11-20 02:05:51,863 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=896853.3333333334, ans=0.0 2023-11-20 02:05:55,347 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=896853.3333333334, ans=0.2 2023-11-20 02:06:10,885 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 134550 2023-11-20 02:06:19,933 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=7.54 vs. limit=15.0 2023-11-20 02:06:20,008 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.whiten, num_groups=1, num_channels=512, metric=10.47 vs. limit=12.0 2023-11-20 02:06:24,381 INFO [train_asr.py:1262] (1/4) Epoch 12, batch 2300, loss[loss=0.1017, simple_loss=0.1135, pruned_loss=0.03413, audio_tagging_loss=0.01083, over 14682.00 frames. ], tot_loss[loss=0.08303, simple_loss=0.1032, pruned_loss=0.02132, audio_tagging_loss=0.01013, over 3041907.73 frames. ], batch size: 54, lr: 5.85e-03, grad_scale: 16.0 2023-11-20 02:06:38,523 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.838e+01 8.171e+01 8.997e+01 9.810e+01 1.855e+02, threshold=1.799e+02, percent-clipped=1.0 2023-11-20 02:06:43,093 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=6.80 vs. 
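limit=12.0

Each "Clipping_scale=2.0, grad-norm quartiles ..." line summarizes recent gradient norms at min/25%/median/75%/max, and the printed threshold tracks Clipping_scale times the running median: in the entry just above, 2.0 x 8.997e+01 ~ 1.799e+02, and percent-clipped counts how often a batch actually exceeded it. A sketch of that adaptive scheme using a plain sliding window, which glosses over whatever smoothing the real optimizer applies:

```python
from collections import deque
import torch

class AdaptiveClipper:
    """Clip to clipping_scale x (median grad norm over a recent window)."""

    def __init__(self, clipping_scale: float = 2.0, window: int = 1000):
        self.clipping_scale = clipping_scale
        self.norms = deque(maxlen=window)
        self.seen = 0
        self.clipped = 0

    def clip_(self, parameters) -> float:
        params = [p for p in parameters if p.grad is not None]
        # max_norm=inf means "measure the total norm, don't clip yet".
        norm = float(torch.nn.utils.clip_grad_norm_(params, max_norm=float("inf")))
        self.norms.append(norm)
        median = sorted(self.norms)[len(self.norms) // 2]
        threshold = self.clipping_scale * median   # e.g. 2.0 * 8.997e+01
        self.seen += 1
        if norm > threshold:
            self.clipped += 1
            for p in params:
                p.grad.mul_(threshold / norm)
        return 100.0 * self.clipped / self.seen    # the "percent-clipped" figure
```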
2023-11-20 02:06:47,198 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=897120.0, ans=0.0 2023-11-20 02:07:15,452 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 134600 2023-11-20 02:07:20,749 WARNING [train_asr.py:1506] (1/4) Exclude cut with ID unbalanced/mx9RcUz8sr0_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-20 02:07:27,910 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=897386.6666666666, ans=0.04949747468305833 2023-11-20 02:07:28,732 INFO [train_asr.py:1262] (1/4) Epoch 12, batch 2350, loss[loss=0.0811, simple_loss=0.09674, pruned_loss=0.02245, audio_tagging_loss=0.01028, over 15689.00 frames. ], tot_loss[loss=0.08318, simple_loss=0.1034, pruned_loss=0.02132, audio_tagging_loss=0.01018, over 3042320.34 frames. ], batch size: 59, lr: 5.84e-03, grad_scale: 16.0 2023-11-20 02:08:07,765 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.max_abs, batch_count=897586.6666666666, ans=10.0 2023-11-20 02:08:15,117 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=2.38 vs. limit=15.0 2023-11-20 02:08:20,684 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 134650 2023-11-20 02:08:20,988 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=897653.3333333334, ans=0.125 2023-11-20 02:08:30,717 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=897653.3333333334, ans=0.1 2023-11-20 02:08:33,525 INFO [train_asr.py:1262] (1/4) Epoch 12, batch 2400, loss[loss=0.08065, simple_loss=0.1031, pruned_loss=0.02107, audio_tagging_loss=0.008031, over 14501.00 frames. ], tot_loss[loss=0.0832, simple_loss=0.1031, pruned_loss=0.0213, audio_tagging_loss=0.01038, over 3040657.87 frames.
], batch size: 54, lr: 5.84e-03, grad_scale: 32.0 2023-11-20 02:08:45,547 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=897786.6666666666, ans=0.1 2023-11-20 02:08:47,717 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.416e+01 8.072e+01 8.719e+01 9.774e+01 1.313e+02, threshold=1.744e+02, percent-clipped=0.0 2023-11-20 02:08:48,136 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=897786.6666666666, ans=0.0 2023-11-20 02:08:59,480 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-20 02:09:02,518 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=897853.3333333334, ans=0.0 2023-11-20 02:09:24,701 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 134700 2023-11-20 02:09:37,367 INFO [train_asr.py:1262] (1/4) Epoch 12, batch 2450, loss[loss=0.07319, simple_loss=0.09302, pruned_loss=0.01512, audio_tagging_loss=0.01156, over 14569.00 frames. ], tot_loss[loss=0.08239, simple_loss=0.1017, pruned_loss=0.02095, audio_tagging_loss=0.0106, over 3045782.42 frames. ], batch size: 56, lr: 5.84e-03, grad_scale: 16.0 2023-11-20 02:09:44,463 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=898053.3333333334, ans=0.0 2023-11-20 02:10:11,734 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=898186.6666666666, ans=0.125 2023-11-20 02:10:14,334 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=898186.6666666666, ans=0.125 2023-11-20 02:10:22,729 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=9.58 vs. limit=15.0 2023-11-20 02:10:27,192 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=898253.3333333334, ans=0.125 2023-11-20 02:10:29,979 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 134750 2023-11-20 02:10:37,536 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=898320.0, ans=0.125 2023-11-20 02:10:42,741 INFO [train_asr.py:1262] (1/4) Epoch 12, batch 2500, loss[loss=0.1031, simple_loss=0.1245, pruned_loss=0.02909, audio_tagging_loss=0.01173, over 15937.00 frames. ], tot_loss[loss=0.08287, simple_loss=0.1024, pruned_loss=0.02105, audio_tagging_loss=0.0106, over 3048297.49 frames. ], batch size: 59, lr: 5.84e-03, grad_scale: 16.0 2023-11-20 02:10:47,169 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=9.21 vs. limit=10.0 2023-11-20 02:10:57,313 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.817e+01 7.993e+01 8.796e+01 9.570e+01 1.207e+02, threshold=1.759e+02, percent-clipped=0.0 2023-11-20 02:11:07,240 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=14.21 vs. 
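limit=22.5

The tot_loss brackets decompose consistently with the loss scales in the config (simple_loss_scale=0.5, audio_tagging_loss_scale=1.0, CTC disabled): loss ~ 0.5*simple_loss + pruned_loss + audio_tagging_loss. For the batch-2400 entry above, 0.5*0.1031 + 0.0213 + 0.01038 ~ 0.0832. A quick arithmetic check, ignoring the warm-up ramp on the simple-loss weight (long past by this point in training):

```python
simple_loss_scale = 0.5         # from the config
audio_tagging_loss_scale = 1.0  # from the config

def total_loss(simple, pruned, audio_tagging):
    return (simple_loss_scale * simple
            + pruned
            + audio_tagging_loss_scale * audio_tagging)

# Batch 2400 above: 0.5*0.1031 + 0.0213 + 0.01038 = 0.08323 ~ 0.0832
print(round(total_loss(0.1031, 0.0213, 0.01038), 5))
```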
2023-11-20 02:11:12,697 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=8.86 vs. limit=10.0 2023-11-20 02:11:34,140 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 134800 2023-11-20 02:11:46,887 INFO [train_asr.py:1262] (1/4) Epoch 12, batch 2550, loss[loss=0.0677, simple_loss=0.09563, pruned_loss=0.01289, audio_tagging_loss=0.006997, over 15665.00 frames. ], tot_loss[loss=0.08178, simple_loss=0.1012, pruned_loss=0.02076, audio_tagging_loss=0.01044, over 3050516.28 frames. ], batch size: 56, lr: 5.84e-03, grad_scale: 16.0 2023-11-20 02:12:05,786 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=898786.6666666666, ans=0.1 2023-11-20 02:12:22,035 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-20 02:12:39,668 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 134850 2023-11-20 02:12:52,578 INFO [train_asr.py:1262] (1/4) Epoch 12, batch 2600, loss[loss=0.06851, simple_loss=0.09398, pruned_loss=0.01249, audio_tagging_loss=0.009029, over 15536.00 frames. ], tot_loss[loss=0.08138, simple_loss=0.1008, pruned_loss=0.02067, audio_tagging_loss=0.01028, over 3051430.46 frames. ], batch size: 57, lr: 5.84e-03, grad_scale: 16.0 2023-11-20 02:13:03,518 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=13.40 vs. limit=15.0 2023-11-20 02:13:08,515 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.034e+01 8.150e+01 8.781e+01 9.502e+01 1.826e+02, threshold=1.756e+02, percent-clipped=1.0 2023-11-20 02:13:31,878 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=899253.3333333334, ans=0.125 2023-11-20 02:13:36,801 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=899253.3333333334, ans=0.0 2023-11-20 02:13:44,905 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 134900 2023-11-20 02:13:58,371 INFO [train_asr.py:1262] (1/4) Epoch 12, batch 2650, loss[loss=0.08492, simple_loss=0.1054, pruned_loss=0.02215, audio_tagging_loss=0.01007, over 14506.00 frames. ], tot_loss[loss=0.08213, simple_loss=0.102, pruned_loss=0.02096, audio_tagging_loss=0.01016, over 3048437.38 frames. ], batch size: 56, lr: 5.84e-03, grad_scale: 16.0 2023-11-20 02:14:01,105 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=899386.6666666666, ans=0.0 2023-11-20 02:14:08,346 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=899386.6666666666, ans=0.125 2023-11-20 02:14:21,682 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=512, metric=4.81 vs. limit=15.0 2023-11-20 02:14:26,705 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=10.25 vs.
limit=15.0 2023-11-20 02:14:49,801 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 134950 2023-11-20 02:15:02,148 INFO [train_asr.py:1262] (1/4) Epoch 12, batch 2700, loss[loss=0.09314, simple_loss=0.1133, pruned_loss=0.02572, audio_tagging_loss=0.0108, over 15211.00 frames. ], tot_loss[loss=0.08228, simple_loss=0.1019, pruned_loss=0.02113, audio_tagging_loss=0.01019, over 3046205.04 frames. ], batch size: 59, lr: 5.84e-03, grad_scale: 16.0 2023-11-20 02:15:08,147 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=899720.0, ans=0.0 2023-11-20 02:15:18,161 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.493e+01 8.480e+01 9.136e+01 9.839e+01 1.399e+02, threshold=1.827e+02, percent-clipped=0.0 2023-11-20 02:15:29,376 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=899853.3333333334, ans=0.2 2023-11-20 02:15:31,884 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=899853.3333333334, ans=0.0 2023-11-20 02:15:41,695 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=899920.0, ans=0.0 2023-11-20 02:15:41,738 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=899920.0, ans=0.125 2023-11-20 02:15:54,283 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 135000 2023-11-20 02:16:07,469 INFO [train_asr.py:1262] (1/4) Epoch 12, batch 2750, loss[loss=0.07557, simple_loss=0.1012, pruned_loss=0.01925, audio_tagging_loss=0.005711, over 14547.00 frames. ], tot_loss[loss=0.0822, simple_loss=0.1021, pruned_loss=0.02101, audio_tagging_loss=0.01016, over 3044775.27 frames. ], batch size: 54, lr: 5.84e-03, grad_scale: 16.0 2023-11-20 02:16:12,639 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=900053.3333333334, ans=0.0 2023-11-20 02:16:15,203 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=900053.3333333334, ans=0.07 2023-11-20 02:16:59,877 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 135050 2023-11-20 02:17:03,543 WARNING [train_asr.py:1506] (1/4) Exclude cut with ID unbalanced/IMdT8_tuNp0_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-20 02:17:12,175 INFO [train_asr.py:1262] (1/4) Epoch 12, batch 2800, loss[loss=0.08338, simple_loss=0.1134, pruned_loss=0.01958, audio_tagging_loss=0.007116, over 15814.00 frames. ], tot_loss[loss=0.08235, simple_loss=0.1026, pruned_loss=0.021, audio_tagging_loss=0.01005, over 3045747.17 frames. 
], batch size: 58, lr: 5.83e-03, grad_scale: 32.0 2023-11-20 02:17:28,026 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.764e+01 8.003e+01 8.693e+01 9.427e+01 1.214e+02, threshold=1.739e+02, percent-clipped=0.0 2023-11-20 02:17:31,423 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.min_positive, batch_count=900453.3333333334, ans=0.05 2023-11-20 02:17:33,956 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=900453.3333333334, ans=0.0 2023-11-20 02:18:04,355 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 135100 2023-11-20 02:18:15,471 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.min_positive, batch_count=900653.3333333334, ans=0.025 2023-11-20 02:18:17,513 INFO [train_asr.py:1262] (1/4) Epoch 12, batch 2850, loss[loss=0.08976, simple_loss=0.1119, pruned_loss=0.02545, audio_tagging_loss=0.00839, over 14910.00 frames. ], tot_loss[loss=0.08248, simple_loss=0.1027, pruned_loss=0.02111, audio_tagging_loss=0.009995, over 3039189.18 frames. ], batch size: 57, lr: 5.83e-03, grad_scale: 32.0 2023-11-20 02:18:25,991 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=11.09 vs. limit=15.0 2023-11-20 02:18:32,166 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=10.97 vs. limit=15.0 2023-11-20 02:18:50,481 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.conv_module1.whiten, num_groups=1, num_channels=192, metric=13.44 vs. limit=15.0 2023-11-20 02:19:03,369 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=900920.0, ans=0.125 2023-11-20 02:19:09,309 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 135150 2023-11-20 02:19:22,090 INFO [train_asr.py:1262] (1/4) Epoch 12, batch 2900, loss[loss=0.07699, simple_loss=0.09319, pruned_loss=0.01925, audio_tagging_loss=0.01115, over 16264.00 frames. ], tot_loss[loss=0.08259, simple_loss=0.1028, pruned_loss=0.02126, audio_tagging_loss=0.009909, over 3036799.76 frames. ], batch size: 64, lr: 5.83e-03, grad_scale: 32.0 2023-11-20 02:19:22,504 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-20 02:19:28,459 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.08 vs. limit=6.0 2023-11-20 02:19:34,205 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=12.76 vs. 
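limit=15.0

The grad_scale value bouncing between 16.0, 32.0 and 64.0 is the fp16 loss scale (use_fp16: True in the config): the scaler halves it whenever a backward pass overflows to inf/nan and grows it again after a run of clean steps, which is why it drifts between batches. The generic torch.cuda.amp pattern looks like this; icefall wraps its own scaler, so the names below are illustrative, and a CUDA device is assumed:

```python
import torch

model = torch.nn.Linear(80, 500).cuda()
optimizer = torch.optim.SGD(model.parameters(), lr=0.045)
scaler = torch.cuda.amp.GradScaler()   # owns the "grad_scale" value

for step in range(100):
    x = torch.randn(8, 80, device="cuda")
    optimizer.zero_grad()
    with torch.cuda.amp.autocast():    # fp16 forward pass
        loss = model(x).pow(2).mean()
    scaler.scale(loss).backward()      # backward on the scaled loss
    scaler.step(optimizer)             # skipped if grads hit inf/nan
    scaler.update()                    # halve on overflow, grow when stable
    print(step, scaler.get_scale())    # the grad_scale seen in the log
```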
2023-11-20 02:19:37,584 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.047e+01 8.169e+01 9.046e+01 9.875e+01 2.052e+02, threshold=1.809e+02, percent-clipped=1.0 2023-11-20 02:19:57,135 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=901186.6666666666, ans=0.015 2023-11-20 02:20:13,519 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 135200 2023-11-20 02:20:26,773 INFO [train_asr.py:1262] (1/4) Epoch 12, batch 2950, loss[loss=0.07821, simple_loss=0.0937, pruned_loss=0.01873, audio_tagging_loss=0.01263, over 15870.00 frames. ], tot_loss[loss=0.08347, simple_loss=0.104, pruned_loss=0.02156, audio_tagging_loss=0.009913, over 3041332.58 frames. ], batch size: 58, lr: 5.83e-03, grad_scale: 32.0 2023-11-20 02:20:48,594 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=901453.3333333334, ans=0.0 2023-11-20 02:20:48,896 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module2.whiten, num_groups=1, num_channels=512, metric=8.04 vs. limit=15.0 2023-11-20 02:21:08,255 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=8.52 vs. limit=15.0 2023-11-20 02:21:18,796 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 135250 2023-11-20 02:21:31,809 INFO [train_asr.py:1262] (1/4) Epoch 12, batch 3000, loss[loss=0.1176, simple_loss=0.1448, pruned_loss=0.03544, audio_tagging_loss=0.009732, over 16299.00 frames. ], tot_loss[loss=0.08403, simple_loss=0.1047, pruned_loss=0.02179, audio_tagging_loss=0.009903, over 3044283.52 frames. ], batch size: 56, lr: 5.83e-03, grad_scale: 16.0 2023-11-20 02:21:31,810 INFO [train_asr.py:1285] (1/4) Computing validation loss 2023-11-20 02:22:01,094 INFO [zipformer.py:1873] (1/4) name=encoder.encoders.3.encoder.layers.0.self_attn_weights, attn_weights_entropy = tensor([2.7237, 0.3107, 3.3564, 3.1516, 2.4044, 2.9053, 3.1370, 3.1454], device='cuda:1') 2023-11-20 02:22:13,519 INFO [train_asr.py:1294] (1/4) Epoch 12, validation: loss=0.0631, simple_loss=0.05442, pruned_loss=0.006068, audio_tagging_loss=0.02982, over 4681554.00 frames. 2023-11-20 02:22:13,520 INFO [train_asr.py:1295] (1/4) Maximum memory allocated so far is 26082MB 2023-11-20 02:22:29,864 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.617e+01 8.337e+01 8.935e+01 9.823e+01 2.024e+02, threshold=1.787e+02, percent-clipped=1.0 2023-11-20 02:22:41,164 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=901853.3333333334, ans=0.0 2023-11-20 02:22:51,201 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=901920.0, ans=0.0 2023-11-20 02:22:52,990 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.conv_module2.whiten, num_groups=1, num_channels=192, metric=12.49 vs. limit=15.0 2023-11-20 02:23:05,040 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 135300 2023-11-20 02:23:17,197 INFO [train_asr.py:1262] (1/4) Epoch 12, batch 3050, loss[loss=0.09317, simple_loss=0.1187, pruned_loss=0.02607, audio_tagging_loss=0.007726, over 15244.00 frames. ], tot_loss[loss=0.08348, simple_loss=0.1038, pruned_loss=0.02158, audio_tagging_loss=0.009998, over 3045441.94 frames.
], batch size: 57, lr: 5.83e-03, grad_scale: 16.0 2023-11-20 02:23:17,585 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=902053.3333333334, ans=0.2 2023-11-20 02:23:56,515 WARNING [train_asr.py:1506] (1/4) Exclude cut with ID unbalanced/h0neUGB6j_g_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-20 02:24:03,432 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.whiten, num_groups=1, num_channels=512, metric=9.50 vs. limit=12.0 2023-11-20 02:24:09,901 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 135350 2023-11-20 02:24:10,154 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=902320.0, ans=0.125 2023-11-20 02:24:22,083 INFO [train_asr.py:1262] (1/4) Epoch 12, batch 3100, loss[loss=0.06388, simple_loss=0.08228, pruned_loss=0.0113, audio_tagging_loss=0.01144, over 14962.00 frames. ], tot_loss[loss=0.08336, simple_loss=0.1036, pruned_loss=0.02142, audio_tagging_loss=0.01016, over 3048536.17 frames. ], batch size: 56, lr: 5.83e-03, grad_scale: 16.0 2023-11-20 02:24:30,909 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=902386.6666666666, ans=0.125 2023-11-20 02:24:39,886 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.062e+01 8.132e+01 9.002e+01 1.001e+02 1.327e+02, threshold=1.800e+02, percent-clipped=0.0 2023-11-20 02:25:00,641 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=902586.6666666666, ans=0.125 2023-11-20 02:25:10,734 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.min_abs, batch_count=902586.6666666666, ans=0.5 2023-11-20 02:25:13,145 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=902653.3333333334, ans=0.0 2023-11-20 02:25:14,132 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 135400 2023-11-20 02:25:22,133 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=902653.3333333334, ans=0.125 2023-11-20 02:25:27,956 INFO [train_asr.py:1262] (1/4) Epoch 12, batch 3150, loss[loss=0.06719, simple_loss=0.07855, pruned_loss=0.01416, audio_tagging_loss=0.01376, over 16489.00 frames. ], tot_loss[loss=0.08366, simple_loss=0.1037, pruned_loss=0.02153, audio_tagging_loss=0.01026, over 3038574.07 frames. 
], batch size: 64, lr: 5.83e-03, grad_scale: 16.0 2023-11-20 02:25:35,102 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=902720.0, ans=0.125 2023-11-20 02:25:41,212 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-20 02:26:11,187 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer_na.min_abs, batch_count=902920.0, ans=0.02 2023-11-20 02:26:11,211 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=902920.0, ans=0.125 2023-11-20 02:26:14,391 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=902920.0, ans=0.0 2023-11-20 02:26:20,952 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 135450 2023-11-20 02:26:32,279 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=903053.3333333334, ans=0.04949747468305833 2023-11-20 02:26:33,312 INFO [train_asr.py:1262] (1/4) Epoch 12, batch 3200, loss[loss=0.08375, simple_loss=0.1012, pruned_loss=0.02088, audio_tagging_loss=0.01226, over 15435.00 frames. ], tot_loss[loss=0.08364, simple_loss=0.1035, pruned_loss=0.02148, audio_tagging_loss=0.01042, over 3036796.56 frames. ], batch size: 57, lr: 5.83e-03, grad_scale: 32.0 2023-11-20 02:26:46,633 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=903120.0, ans=0.0 2023-11-20 02:26:47,703 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=903120.0, ans=0.1 2023-11-20 02:26:49,839 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.506e+01 8.472e+01 8.960e+01 9.869e+01 1.272e+02, threshold=1.792e+02, percent-clipped=0.0 2023-11-20 02:26:52,585 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=903120.0, ans=0.2 2023-11-20 02:27:16,921 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=903253.3333333334, ans=0.0 2023-11-20 02:27:25,494 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 135500 2023-11-20 02:27:28,986 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=6.79 vs. limit=12.0 2023-11-20 02:27:38,279 INFO [train_asr.py:1262] (1/4) Epoch 12, batch 3250, loss[loss=0.07263, simple_loss=0.08538, pruned_loss=0.01825, audio_tagging_loss=0.0117, over 14944.00 frames. ], tot_loss[loss=0.08323, simple_loss=0.1031, pruned_loss=0.02119, audio_tagging_loss=0.01048, over 3031425.66 frames. ], batch size: 55, lr: 5.82e-03, grad_scale: 32.0 2023-11-20 02:27:50,639 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=903453.3333333334, ans=0.2 2023-11-20 02:28:16,138 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=9.04 vs. limit=15.0 2023-11-20 02:28:17,859 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=6.49 vs. 
limit=15.0 2023-11-20 02:28:18,441 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=903586.6666666666, ans=0.125 2023-11-20 02:28:18,544 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=903586.6666666666, ans=0.125 2023-11-20 02:28:19,638 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=903586.6666666666, ans=0.0 2023-11-20 02:28:29,245 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=903653.3333333334, ans=0.1 2023-11-20 02:28:30,472 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 135550 2023-11-20 02:28:30,702 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=903653.3333333334, ans=0.0 2023-11-20 02:28:43,513 INFO [train_asr.py:1262] (1/4) Epoch 12, batch 3300, loss[loss=0.07051, simple_loss=0.08595, pruned_loss=0.01738, audio_tagging_loss=0.01015, over 14962.00 frames. ], tot_loss[loss=0.08288, simple_loss=0.1024, pruned_loss=0.02102, audio_tagging_loss=0.01065, over 3039197.09 frames. ], batch size: 60, lr: 5.82e-03, grad_scale: 32.0 2023-11-20 02:28:45,595 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=903720.0, ans=0.1 2023-11-20 02:28:50,611 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=8.00 vs. limit=15.0 2023-11-20 02:28:59,878 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=903786.6666666666, ans=0.0 2023-11-20 02:29:00,619 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.558e+01 8.302e+01 8.686e+01 9.682e+01 1.210e+02, threshold=1.737e+02, percent-clipped=0.0 2023-11-20 02:29:02,209 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=903786.6666666666, ans=0.09899494936611666 2023-11-20 02:29:15,143 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=903853.3333333334, ans=0.125 2023-11-20 02:29:16,444 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=903853.3333333334, ans=0.125 2023-11-20 02:29:22,305 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=903920.0, ans=0.0 2023-11-20 02:29:35,799 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 135600 2023-11-20 02:29:38,202 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=8.76 vs. limit=15.0 2023-11-20 02:29:48,784 INFO [train_asr.py:1262] (1/4) Epoch 12, batch 3350, loss[loss=0.08129, simple_loss=0.1066, pruned_loss=0.01919, audio_tagging_loss=0.00883, over 15063.00 frames. ], tot_loss[loss=0.08314, simple_loss=0.1028, pruned_loss=0.02125, audio_tagging_loss=0.01051, over 3036964.73 frames. 
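The scaling.py:213 lines scattered through this section show ScheduledFloat at work: dropout probabilities, skip rates, and balancer limits in the Zipformer are not constants but piecewise-linear functions of the training batch count, and each record prints the value (ans=...) in force at the given batch_count. A reduced sketch of the idea (the real class in icefall's scaling.py also takes care of propagating the batch count into the modules):

class ScheduledFloat:
    """Sketch: piecewise-linear schedule over batch count, clamped at the
    end points, e.g. ScheduledFloat((0.0, 0.3), (20000.0, 0.1)) decays from
    0.3 to 0.1 over the first 20k batches and stays at 0.1 afterwards."""

    def __init__(self, *points):
        self.points = sorted(points)          # (batch_count, value) pairs

    def value(self, batch_count):
        pts = self.points
        if batch_count <= pts[0][0]:
            return pts[0][1]
        if batch_count >= pts[-1][0]:
            return pts[-1][1]
        for (x0, y0), (x1, y1) in zip(pts, pts[1:]):
            if x0 <= batch_count <= x1:
                return y0 + (y1 - y0) * (batch_count - x0) / (x1 - x0)

At the batch counts in this section (around 9e5) most schedules have long since flattened to their final values, which is consistent with the many repeated ans=0.0 and ans=0.125 readings.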
], batch size: 56, lr: 5.82e-03, grad_scale: 32.0 2023-11-20 02:29:50,257 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.max_abs, batch_count=904053.3333333334, ans=10.0 2023-11-20 02:30:18,992 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.self_attn1.whiten, num_groups=1, num_channels=192, metric=13.16 vs. limit=22.5 2023-11-20 02:30:31,037 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=904253.3333333334, ans=0.125 2023-11-20 02:30:33,881 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=904253.3333333334, ans=0.125 2023-11-20 02:30:34,925 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=904253.3333333334, ans=0.125 2023-11-20 02:30:39,899 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 135650 2023-11-20 02:30:44,327 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=904320.0, ans=0.125 2023-11-20 02:30:52,699 INFO [train_asr.py:1262] (1/4) Epoch 12, batch 3400, loss[loss=0.1, simple_loss=0.1303, pruned_loss=0.02672, audio_tagging_loss=0.008115, over 15224.00 frames. ], tot_loss[loss=0.08332, simple_loss=0.103, pruned_loss=0.02147, audio_tagging_loss=0.01035, over 3035623.69 frames. ], batch size: 55, lr: 5.82e-03, grad_scale: 32.0 2023-11-20 02:31:09,369 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.968e+01 8.145e+01 8.862e+01 9.550e+01 1.351e+02, threshold=1.772e+02, percent-clipped=0.0 2023-11-20 02:31:38,689 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=904586.6666666666, ans=0.125 2023-11-20 02:31:44,800 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 135700 2023-11-20 02:31:52,876 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=904653.3333333334, ans=0.125 2023-11-20 02:31:57,639 INFO [train_asr.py:1262] (1/4) Epoch 12, batch 3450, loss[loss=0.08059, simple_loss=0.09657, pruned_loss=0.02334, audio_tagging_loss=0.008958, over 14723.00 frames. ], tot_loss[loss=0.0838, simple_loss=0.1039, pruned_loss=0.02168, audio_tagging_loss=0.01016, over 3039391.50 frames. ], batch size: 59, lr: 5.82e-03, grad_scale: 32.0 2023-11-20 02:31:59,202 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=904720.0, ans=0.1 2023-11-20 02:32:02,855 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=904720.0, ans=0.1 2023-11-20 02:32:24,639 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=904853.3333333334, ans=0.0 2023-11-20 02:32:47,827 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=11.98 vs. 
limit=15.0 2023-11-20 02:32:49,689 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 135750 2023-11-20 02:32:55,714 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.whiten.whitening_limit, batch_count=904986.6666666666, ans=12.0 2023-11-20 02:33:03,085 INFO [train_asr.py:1262] (1/4) Epoch 12, batch 3500, loss[loss=0.07289, simple_loss=0.08603, pruned_loss=0.01692, audio_tagging_loss=0.01296, over 15759.00 frames. ], tot_loss[loss=0.08339, simple_loss=0.1037, pruned_loss=0.02145, audio_tagging_loss=0.01008, over 3041620.40 frames. ], batch size: 57, lr: 5.82e-03, grad_scale: 32.0 2023-11-20 02:33:13,621 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=905053.3333333334, ans=0.125 2023-11-20 02:33:18,459 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.min_positive, batch_count=905120.0, ans=0.05 2023-11-20 02:33:19,507 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.046e+01 8.042e+01 8.771e+01 9.579e+01 1.310e+02, threshold=1.754e+02, percent-clipped=0.0 2023-11-20 02:33:36,429 WARNING [train_asr.py:1506] (1/4) Exclude cut with ID unbalanced/DdDpuDqOyrA_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-20 02:33:37,805 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=905186.6666666666, ans=0.0 2023-11-20 02:33:55,060 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 135800 2023-11-20 02:34:08,281 INFO [train_asr.py:1262] (1/4) Epoch 12, batch 3550, loss[loss=0.1016, simple_loss=0.1278, pruned_loss=0.02755, audio_tagging_loss=0.01011, over 15467.00 frames. ], tot_loss[loss=0.08306, simple_loss=0.1034, pruned_loss=0.02133, audio_tagging_loss=0.01005, over 3043597.78 frames. ], batch size: 57, lr: 5.82e-03, grad_scale: 32.0 2023-11-20 02:34:10,949 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=905386.6666666666, ans=0.0 2023-11-20 02:34:59,928 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 135850 2023-11-20 02:35:00,469 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=512, metric=4.26 vs. limit=15.0 2023-11-20 02:35:01,429 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=905653.3333333334, ans=0.2 2023-11-20 02:35:12,746 INFO [train_asr.py:1262] (1/4) Epoch 12, batch 3600, loss[loss=0.09343, simple_loss=0.1261, pruned_loss=0.02531, audio_tagging_loss=0.005062, over 15840.00 frames. ], tot_loss[loss=0.08285, simple_loss=0.1031, pruned_loss=0.02123, audio_tagging_loss=0.01006, over 3044802.56 frames. ], batch size: 57, lr: 5.82e-03, grad_scale: 32.0 2023-11-20 02:35:20,813 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.87 vs. 
limit=6.0 2023-11-20 02:35:28,392 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=905786.6666666666, ans=0.0 2023-11-20 02:35:29,264 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.874e+01 8.123e+01 9.173e+01 1.010e+02 1.525e+02, threshold=1.835e+02, percent-clipped=0.0 2023-11-20 02:35:43,887 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=13.37 vs. limit=15.0 2023-11-20 02:35:57,823 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=905920.0, ans=0.0 2023-11-20 02:35:58,955 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=905920.0, ans=0.125 2023-11-20 02:36:04,307 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 135900 2023-11-20 02:36:11,775 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=8.57 vs. limit=15.0 2023-11-20 02:36:17,244 INFO [train_asr.py:1262] (1/4) Epoch 12, batch 3650, loss[loss=0.08278, simple_loss=0.1076, pruned_loss=0.01906, audio_tagging_loss=0.009904, over 15392.00 frames. ], tot_loss[loss=0.08341, simple_loss=0.1041, pruned_loss=0.02138, audio_tagging_loss=0.009979, over 3042857.78 frames. ], batch size: 57, lr: 5.82e-03, grad_scale: 32.0 2023-11-20 02:36:20,017 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=906053.3333333334, ans=0.1 2023-11-20 02:36:25,687 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=906053.3333333334, ans=0.2 2023-11-20 02:36:52,832 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=906186.6666666666, ans=0.125 2023-11-20 02:37:09,384 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 135950 2023-11-20 02:37:16,216 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=906320.0, ans=0.125 2023-11-20 02:37:20,322 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=906320.0, ans=0.125 2023-11-20 02:37:22,737 INFO [train_asr.py:1262] (1/4) Epoch 12, batch 3700, loss[loss=0.07206, simple_loss=0.08401, pruned_loss=0.01982, audio_tagging_loss=0.01024, over 13296.00 frames. ], tot_loss[loss=0.08353, simple_loss=0.1043, pruned_loss=0.0214, audio_tagging_loss=0.01, over 3048290.07 frames. 
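The scaling.py:1022 records compare a whitening metric against a limit (e.g. metric=2.87 vs. limit=6.0 above): the Whiten modules measure how far the covariance of a layer's activations is from a multiple of the identity and only inject a penalty gradient while the metric exceeds its limit. One plausible form of such a metric, offered as an illustration rather than the exact expression in icefall's scaling.py (the num_groups split is ignored for brevity):

import torch

def whitening_metric(x):
    """Returns roughly 1.0 when the channel covariance of x is close to a
    multiple of the identity ("white") and grows as the variance
    concentrates in a few directions."""
    x = x.reshape(-1, x.shape[-1])        # (frames, channels)
    c = x.t() @ x                         # unnormalized channel covariance
    n = c.shape[0]
    return (n * (c ** 2).sum() / c.diagonal().sum() ** 2).item()

x = torch.randn(10000, 64)
print(whitening_metric(x))                # ~1.0, far below a limit of 15.0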
], batch size: 52, lr: 5.82e-03, grad_scale: 32.0 2023-11-20 02:37:33,991 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=906453.3333333334, ans=0.0 2023-11-20 02:37:38,491 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.777e+01 8.262e+01 8.837e+01 9.420e+01 1.280e+02, threshold=1.767e+02, percent-clipped=0.0 2023-11-20 02:37:44,360 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=906453.3333333334, ans=0.125 2023-11-20 02:37:50,978 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.min_abs, batch_count=906520.0, ans=0.5 2023-11-20 02:37:58,150 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=906520.0, ans=0.07 2023-11-20 02:38:13,895 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 136000 2023-11-20 02:38:28,401 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=906720.0, ans=0.125 2023-11-20 02:38:29,418 INFO [train_asr.py:1262] (1/4) Epoch 12, batch 3750, loss[loss=0.07863, simple_loss=0.08705, pruned_loss=0.02302, audio_tagging_loss=0.01208, over 15345.00 frames. ], tot_loss[loss=0.08321, simple_loss=0.1035, pruned_loss=0.02133, audio_tagging_loss=0.01011, over 3043760.48 frames. ], batch size: 59, lr: 5.81e-03, grad_scale: 16.0 2023-11-20 02:38:47,151 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=906786.6666666666, ans=0.125 2023-11-20 02:39:06,754 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=906853.3333333334, ans=0.0 2023-11-20 02:39:07,080 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=512, metric=4.32 vs. limit=15.0 2023-11-20 02:39:16,261 WARNING [train_asr.py:1506] (1/4) Exclude cut with ID unbalanced/ZY_Bsi-RNuk_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-20 02:39:17,800 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=906920.0, ans=0.125 2023-11-20 02:39:21,896 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 136050 2023-11-20 02:39:23,294 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=906986.6666666666, ans=0.1 2023-11-20 02:39:34,592 INFO [train_asr.py:1262] (1/4) Epoch 12, batch 3800, loss[loss=0.08129, simple_loss=0.1005, pruned_loss=0.02008, audio_tagging_loss=0.01097, over 15840.00 frames. ], tot_loss[loss=0.08368, simple_loss=0.1043, pruned_loss=0.02144, audio_tagging_loss=0.0101, over 3045655.28 frames. ], batch size: 57, lr: 5.81e-03, grad_scale: 16.0 2023-11-20 02:39:37,643 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.51 vs. 
limit=6.0 2023-11-20 02:39:52,986 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.871e+01 8.426e+01 8.980e+01 9.690e+01 1.284e+02, threshold=1.796e+02, percent-clipped=0.0 2023-11-20 02:40:02,998 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=907186.6666666666, ans=0.0 2023-11-20 02:40:06,141 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=16.69 vs. limit=22.5 2023-11-20 02:40:09,303 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=907186.6666666666, ans=0.1 2023-11-20 02:40:13,783 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=907253.3333333334, ans=0.0 2023-11-20 02:40:26,335 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 136100 2023-11-20 02:40:27,192 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=11.66 vs. limit=15.0 2023-11-20 02:40:27,850 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=907320.0, ans=0.1 2023-11-20 02:40:29,539 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.conv_module1.whiten, num_groups=1, num_channels=192, metric=12.63 vs. limit=15.0 2023-11-20 02:40:34,160 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=19.01 vs. limit=22.5 2023-11-20 02:40:39,674 INFO [train_asr.py:1262] (1/4) Epoch 12, batch 3850, loss[loss=0.07072, simple_loss=0.08931, pruned_loss=0.01539, audio_tagging_loss=0.01067, over 14981.00 frames. ], tot_loss[loss=0.0835, simple_loss=0.1039, pruned_loss=0.02135, audio_tagging_loss=0.01021, over 3050190.92 frames. ], batch size: 58, lr: 5.81e-03, grad_scale: 16.0 2023-11-20 02:40:53,508 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=907453.3333333334, ans=0.2 2023-11-20 02:40:56,020 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=907453.3333333334, ans=0.0 2023-11-20 02:41:31,400 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 136150 2023-11-20 02:41:32,954 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=907653.3333333334, ans=0.125 2023-11-20 02:41:36,665 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=907653.3333333334, ans=0.125 2023-11-20 02:41:40,235 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-20 02:41:43,687 INFO [train_asr.py:1262] (1/4) Epoch 12, batch 3900, loss[loss=0.08491, simple_loss=0.1064, pruned_loss=0.02146, audio_tagging_loss=0.01024, over 14633.00 frames. ], tot_loss[loss=0.08372, simple_loss=0.1036, pruned_loss=0.02162, audio_tagging_loss=0.01028, over 3042366.69 frames. 
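Throughout this section the four logged loss components combine the same way: loss is approximately 0.5 * simple_loss + pruned_loss + 1.0 * audio_tagging_loss (an inference from the logged numbers themselves, not a quote of train_asr.py). A spot check against the batch 3900 record above:

rec = dict(loss=0.08491, simple_loss=0.1064, pruned_loss=0.02146,
           audio_tagging_loss=0.01024)    # per-batch loss[...] for batch 3900 above
combined = 0.5 * rec["simple_loss"] + rec["pruned_loss"] + rec["audio_tagging_loss"]
assert abs(combined - rec["loss"]) < 5e-4  # 0.08490 vs. 0.08491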
], batch size: 56, lr: 5.81e-03, grad_scale: 16.0 2023-11-20 02:41:50,807 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=907720.0, ans=0.125 2023-11-20 02:42:02,320 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.823e+01 8.198e+01 8.910e+01 9.669e+01 1.262e+02, threshold=1.782e+02, percent-clipped=0.0 2023-11-20 02:42:23,855 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.min_positive, batch_count=907920.0, ans=0.025 2023-11-20 02:42:34,719 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=907986.6666666666, ans=0.125 2023-11-20 02:42:35,721 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 136200 2023-11-20 02:42:39,322 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=512, metric=4.49 vs. limit=15.0 2023-11-20 02:42:42,605 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=907986.6666666666, ans=0.1 2023-11-20 02:42:48,380 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=908053.3333333334, ans=0.125 2023-11-20 02:42:49,270 INFO [train_asr.py:1262] (1/4) Epoch 12, batch 3950, loss[loss=0.08592, simple_loss=0.1005, pruned_loss=0.02293, audio_tagging_loss=0.01273, over 14678.00 frames. ], tot_loss[loss=0.0838, simple_loss=0.1037, pruned_loss=0.02159, audio_tagging_loss=0.01034, over 3041474.59 frames. ], batch size: 56, lr: 5.81e-03, grad_scale: 16.0 2023-11-20 02:43:00,291 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=908120.0, ans=0.0 2023-11-20 02:43:20,308 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.max_abs, batch_count=908186.6666666666, ans=10.0 2023-11-20 02:43:22,686 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=908186.6666666666, ans=0.0 2023-11-20 02:43:32,925 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=908253.3333333334, ans=0.0 2023-11-20 02:43:38,359 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=908253.3333333334, ans=0.2 2023-11-20 02:43:40,669 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 136250 2023-11-20 02:43:42,102 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=908320.0, ans=0.125 2023-11-20 02:43:48,126 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=908320.0, ans=0.0 2023-11-20 02:43:52,744 INFO [train_asr.py:1262] (1/4) Epoch 12, batch 4000, loss[loss=0.09036, simple_loss=0.1155, pruned_loss=0.02342, audio_tagging_loss=0.00921, over 14259.00 frames. ], tot_loss[loss=0.08437, simple_loss=0.1045, pruned_loss=0.02176, audio_tagging_loss=0.01037, over 3036319.72 frames. 
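The grad_scale field toggles between 16.0 and 32.0 across this section (32 -> 16 near batch 3000, back to 32 at batch 3200, 16 again from batch 3750, 32 from batch 4000 on), which is the standard dynamic loss-scaling pattern for fp16 training: halve the scale when a step overflows, grow it back after a run of clean steps. A sketch with torch.cuda.amp (hedged: the growth/backoff constants and the function names are illustrative, not the script's actual settings):

import torch

scaler = torch.cuda.amp.GradScaler(init_scale=32.0,
                                   growth_factor=2.0,   # 16 -> 32 after clean steps
                                   backoff_factor=0.5,  # 32 -> 16 on overflow
                                   growth_interval=2000)

def train_step(model, optimizer, batch, compute_loss):
    optimizer.zero_grad()
    with torch.cuda.amp.autocast():
        loss = compute_loss(model, batch)
    scaler.scale(loss).backward()
    scaler.step(optimizer)       # silently skips the step on inf/nan grads
    scaler.update()              # then halves the scale; doubles it later
    return scaler.get_scale()    # the value logged as grad_scale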
], batch size: 53, lr: 5.81e-03, grad_scale: 32.0 2023-11-20 02:44:11,155 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.441e+01 8.168e+01 8.877e+01 9.706e+01 2.567e+02, threshold=1.775e+02, percent-clipped=1.0 2023-11-20 02:44:14,979 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=908453.3333333334, ans=0.125 2023-11-20 02:44:18,080 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=908520.0, ans=0.0 2023-11-20 02:44:35,783 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.max_positive, batch_count=908586.6666666666, ans=0.95 2023-11-20 02:44:44,754 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 136300 2023-11-20 02:44:56,490 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=908720.0, ans=0.2 2023-11-20 02:44:57,476 INFO [train_asr.py:1262] (1/4) Epoch 12, batch 4050, loss[loss=0.06935, simple_loss=0.08092, pruned_loss=0.01811, audio_tagging_loss=0.01078, over 14884.00 frames. ], tot_loss[loss=0.08384, simple_loss=0.1035, pruned_loss=0.02153, audio_tagging_loss=0.01055, over 3040943.57 frames. ], batch size: 57, lr: 5.81e-03, grad_scale: 32.0 2023-11-20 02:44:57,898 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=908720.0, ans=0.0 2023-11-20 02:45:01,190 WARNING [train_asr.py:1506] (1/4) Exclude cut with ID unbalanced/-7b0f9TyPFU_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-20 02:45:19,944 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=908786.6666666666, ans=0.0 2023-11-20 02:45:26,255 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=908853.3333333334, ans=0.1 2023-11-20 02:45:43,279 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=908920.0, ans=0.0 2023-11-20 02:45:49,226 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 136350 2023-11-20 02:46:01,976 INFO [train_asr.py:1262] (1/4) Epoch 12, batch 4100, loss[loss=0.08172, simple_loss=0.1005, pruned_loss=0.02241, audio_tagging_loss=0.009058, over 15443.00 frames. ], tot_loss[loss=0.08355, simple_loss=0.1035, pruned_loss=0.0213, audio_tagging_loss=0.01052, over 3044288.65 frames. ], batch size: 57, lr: 5.81e-03, grad_scale: 32.0 2023-11-20 02:46:02,231 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=909053.3333333334, ans=0.2 2023-11-20 02:46:18,016 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=2.61 vs. 
limit=15.0 2023-11-20 02:46:19,906 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.593e+01 8.318e+01 8.893e+01 9.801e+01 1.256e+02, threshold=1.779e+02, percent-clipped=0.0 2023-11-20 02:46:37,957 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module2.whiten, num_groups=1, num_channels=512, metric=7.81 vs. limit=15.0 2023-11-20 02:46:39,140 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=2.59 vs. limit=15.0 2023-11-20 02:46:41,774 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.61 vs. limit=6.0 2023-11-20 02:46:54,582 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 136400 2023-11-20 02:47:07,357 INFO [train_asr.py:1262] (1/4) Epoch 12, batch 4150, loss[loss=0.1009, simple_loss=0.1258, pruned_loss=0.02754, audio_tagging_loss=0.0105, over 15074.00 frames. ], tot_loss[loss=0.08379, simple_loss=0.1041, pruned_loss=0.02142, audio_tagging_loss=0.01033, over 3050906.05 frames. ], batch size: 56, lr: 5.81e-03, grad_scale: 32.0 2023-11-20 02:47:15,832 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=12.05 vs. limit=15.0 2023-11-20 02:47:28,736 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=909453.3333333334, ans=0.09899494936611666 2023-11-20 02:47:55,260 WARNING [train_asr.py:1506] (1/4) Exclude cut with ID unbalanced/5BkClLNthIQ_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-20 02:47:59,035 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 136450 2023-11-20 02:47:59,271 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=909653.3333333334, ans=0.0 2023-11-20 02:48:08,556 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=909653.3333333334, ans=0.125 2023-11-20 02:48:12,031 INFO [train_asr.py:1262] (1/4) Epoch 12, batch 4200, loss[loss=0.08178, simple_loss=0.1019, pruned_loss=0.02365, audio_tagging_loss=0.007199, over 15437.00 frames. ], tot_loss[loss=0.08339, simple_loss=0.1036, pruned_loss=0.02136, audio_tagging_loss=0.01022, over 3052042.60 frames. ], batch size: 57, lr: 5.80e-03, grad_scale: 32.0 2023-11-20 02:48:28,333 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=909786.6666666666, ans=0.125 2023-11-20 02:48:30,384 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.105e+01 8.354e+01 9.339e+01 1.014e+02 1.353e+02, threshold=1.868e+02, percent-clipped=0.0 2023-11-20 02:48:32,913 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=12.26 vs. 
limit=15.0 2023-11-20 02:48:41,138 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=909853.3333333334, ans=0.0 2023-11-20 02:48:45,906 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=909853.3333333334, ans=0.125 2023-11-20 02:49:03,994 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 136500 2023-11-20 02:49:16,907 INFO [train_asr.py:1262] (1/4) Epoch 12, batch 4250, loss[loss=0.1066, simple_loss=0.1345, pruned_loss=0.03268, audio_tagging_loss=0.006604, over 15269.00 frames. ], tot_loss[loss=0.08322, simple_loss=0.1037, pruned_loss=0.02131, audio_tagging_loss=0.01006, over 3057664.52 frames. ], batch size: 56, lr: 5.80e-03, grad_scale: 32.0 2023-11-20 02:49:32,422 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=910120.0, ans=0.0 2023-11-20 02:49:34,190 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=22.39 vs. limit=22.5 2023-11-20 02:49:46,516 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=910186.6666666666, ans=0.125 2023-11-20 02:49:48,375 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=7.23 vs. limit=12.0 2023-11-20 02:49:49,480 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=12.37 vs. limit=15.0 2023-11-20 02:49:50,283 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=910186.6666666666, ans=0.0 2023-11-20 02:50:08,486 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 136550 2023-11-20 02:50:17,857 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=910320.0, ans=0.125 2023-11-20 02:50:18,194 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module2.whiten, num_groups=1, num_channels=512, metric=6.53 vs. limit=15.0 2023-11-20 02:50:21,277 INFO [train_asr.py:1262] (1/4) Epoch 12, batch 4300, loss[loss=0.0989, simple_loss=0.1241, pruned_loss=0.02693, audio_tagging_loss=0.0099, over 14520.00 frames. ], tot_loss[loss=0.08275, simple_loss=0.1035, pruned_loss=0.02108, audio_tagging_loss=0.009931, over 3054992.98 frames. ], batch size: 54, lr: 5.80e-03, grad_scale: 16.0 2023-11-20 02:50:22,857 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=910386.6666666666, ans=0.125 2023-11-20 02:50:25,188 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=910386.6666666666, ans=0.125 2023-11-20 02:50:40,211 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.215e+01 8.438e+01 9.081e+01 9.991e+01 1.404e+02, threshold=1.816e+02, percent-clipped=0.0 2023-11-20 02:50:45,073 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=11.88 vs. 
limit=15.0 2023-11-20 02:50:50,074 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=910520.0, ans=0.125 2023-11-20 02:51:00,631 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=910586.6666666666, ans=0.0 2023-11-20 02:51:05,716 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=7.59 vs. limit=15.0 2023-11-20 02:51:12,580 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 136600 2023-11-20 02:51:13,103 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.49 vs. limit=6.0 2023-11-20 02:51:23,380 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.min_abs, batch_count=910653.3333333334, ans=0.5 2023-11-20 02:51:25,592 INFO [train_asr.py:1262] (1/4) Epoch 12, batch 4350, loss[loss=0.07603, simple_loss=0.09358, pruned_loss=0.01954, audio_tagging_loss=0.009699, over 15788.00 frames. ], tot_loss[loss=0.08346, simple_loss=0.1045, pruned_loss=0.02129, audio_tagging_loss=0.009912, over 3051748.99 frames. ], batch size: 59, lr: 5.80e-03, grad_scale: 16.0 2023-11-20 02:51:32,039 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=910720.0, ans=0.125 2023-11-20 02:52:17,048 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 136650 2023-11-20 02:52:24,347 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=910986.6666666666, ans=0.125 2023-11-20 02:52:28,299 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=910986.6666666666, ans=0.1 2023-11-20 02:52:30,325 INFO [train_asr.py:1262] (1/4) Epoch 12, batch 4400, loss[loss=0.06674, simple_loss=0.08958, pruned_loss=0.01271, audio_tagging_loss=0.009237, over 14295.00 frames. ], tot_loss[loss=0.08295, simple_loss=0.1038, pruned_loss=0.02114, audio_tagging_loss=0.009917, over 3046033.43 frames. ], batch size: 54, lr: 5.80e-03, grad_scale: 32.0 2023-11-20 02:52:34,267 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=911053.3333333334, ans=0.035 2023-11-20 02:52:38,835 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=911053.3333333334, ans=0.1 2023-11-20 02:52:43,039 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=10.38 vs. limit=15.0 2023-11-20 02:52:49,547 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.680e+01 7.985e+01 8.679e+01 9.460e+01 1.350e+02, threshold=1.736e+02, percent-clipped=0.0 2023-11-20 02:53:12,744 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=8.41 vs. 
limit=15.0 2023-11-20 02:53:20,478 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=911320.0, ans=0.2 2023-11-20 02:53:21,549 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 136700 2023-11-20 02:53:23,275 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=16.72 vs. limit=22.5 2023-11-20 02:53:26,181 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=911320.0, ans=0.125 2023-11-20 02:53:33,764 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=8.16 vs. limit=12.0 2023-11-20 02:53:34,329 INFO [train_asr.py:1262] (1/4) Epoch 12, batch 4450, loss[loss=0.07128, simple_loss=0.06955, pruned_loss=0.02055, audio_tagging_loss=0.01596, over 15121.00 frames. ], tot_loss[loss=0.08295, simple_loss=0.1037, pruned_loss=0.02124, audio_tagging_loss=0.009875, over 3050565.67 frames. ], batch size: 59, lr: 5.80e-03, grad_scale: 32.0 2023-11-20 02:53:48,897 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=911453.3333333334, ans=0.125 2023-11-20 02:54:11,850 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=911586.6666666666, ans=0.1 2023-11-20 02:54:19,723 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=911586.6666666666, ans=0.1 2023-11-20 02:54:25,567 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 136750 2023-11-20 02:54:34,865 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=911653.3333333334, ans=0.125 2023-11-20 02:54:36,173 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=911653.3333333334, ans=0.1 2023-11-20 02:54:37,812 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=10.57 vs. limit=15.0 2023-11-20 02:54:37,878 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=12.49 vs. limit=15.0 2023-11-20 02:54:38,256 INFO [train_asr.py:1262] (1/4) Epoch 12, batch 4500, loss[loss=0.05991, simple_loss=0.06539, pruned_loss=0.01368, audio_tagging_loss=0.01354, over 14037.00 frames. ], tot_loss[loss=0.08271, simple_loss=0.1033, pruned_loss=0.02113, audio_tagging_loss=0.009954, over 3052838.23 frames. ], batch size: 54, lr: 5.80e-03, grad_scale: 32.0 2023-11-20 02:54:55,105 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=911786.6666666666, ans=0.125 2023-11-20 02:54:55,712 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.self_attn2.whiten, num_groups=1, num_channels=192, metric=10.00 vs. 
limit=22.5 2023-11-20 02:54:57,902 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.729e+01 8.079e+01 8.708e+01 9.513e+01 1.325e+02, threshold=1.742e+02, percent-clipped=0.0 2023-11-20 02:55:12,163 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=911853.3333333334, ans=0.125 2023-11-20 02:55:17,175 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=911920.0, ans=0.0 2023-11-20 02:55:28,880 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=911986.6666666666, ans=0.125 2023-11-20 02:55:29,820 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 136800 2023-11-20 02:55:43,006 INFO [train_asr.py:1262] (1/4) Epoch 12, batch 4550, loss[loss=0.05883, simple_loss=0.06885, pruned_loss=0.01242, audio_tagging_loss=0.01198, over 14293.00 frames. ], tot_loss[loss=0.08236, simple_loss=0.1028, pruned_loss=0.021, audio_tagging_loss=0.009975, over 3044778.54 frames. ], batch size: 54, lr: 5.80e-03, grad_scale: 32.0 2023-11-20 02:55:52,920 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=912053.3333333334, ans=0.125 2023-11-20 02:55:53,079 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=912053.3333333334, ans=0.2 2023-11-20 02:56:32,669 WARNING [train_asr.py:1506] (1/4) Exclude cut with ID unbalanced/_II2Klfnn4Y_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-20 02:56:32,973 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=912253.3333333334, ans=0.1 2023-11-20 02:56:35,220 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 136850 2023-11-20 02:56:48,536 INFO [train_asr.py:1262] (1/4) Epoch 12, batch 4600, loss[loss=0.06738, simple_loss=0.07897, pruned_loss=0.01537, audio_tagging_loss=0.01252, over 15744.00 frames. ], tot_loss[loss=0.08262, simple_loss=0.103, pruned_loss=0.02109, audio_tagging_loss=0.01001, over 3046901.63 frames. 
], batch size: 61, lr: 5.80e-03, grad_scale: 32.0 2023-11-20 02:56:52,693 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=912386.6666666666, ans=0.0 2023-11-20 02:56:58,710 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=912386.6666666666, ans=0.0 2023-11-20 02:57:05,766 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=912453.3333333334, ans=0.1 2023-11-20 02:57:07,920 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.584e+01 7.967e+01 8.569e+01 9.519e+01 1.814e+02, threshold=1.714e+02, percent-clipped=1.0 2023-11-20 02:57:09,458 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=912453.3333333334, ans=0.0 2023-11-20 02:57:18,567 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=912520.0, ans=0.1 2023-11-20 02:57:24,128 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=912520.0, ans=0.0 2023-11-20 02:57:41,178 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 136900 2023-11-20 02:57:53,900 INFO [train_asr.py:1262] (1/4) Epoch 12, batch 4650, loss[loss=0.07151, simple_loss=0.08535, pruned_loss=0.01641, audio_tagging_loss=0.01243, over 15264.00 frames. ], tot_loss[loss=0.08218, simple_loss=0.1023, pruned_loss=0.02091, audio_tagging_loss=0.01014, over 3044788.26 frames. ], batch size: 62, lr: 5.80e-03, grad_scale: 32.0 2023-11-20 02:57:59,619 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=9.38 vs. limit=12.0 2023-11-20 02:58:26,594 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=11.59 vs. limit=15.0 2023-11-20 02:58:38,145 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=912920.0, ans=0.0 2023-11-20 02:58:43,677 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=912920.0, ans=0.125 2023-11-20 02:58:46,001 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 136950 2023-11-20 02:58:58,291 INFO [train_asr.py:1262] (1/4) Epoch 12, batch 4700, loss[loss=0.06496, simple_loss=0.07128, pruned_loss=0.01616, audio_tagging_loss=0.01315, over 14850.00 frames. ], tot_loss[loss=0.08289, simple_loss=0.1029, pruned_loss=0.02113, audio_tagging_loss=0.01031, over 3047049.97 frames. ], batch size: 58, lr: 5.79e-03, grad_scale: 32.0 2023-11-20 02:59:09,081 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=913053.3333333334, ans=0.2 2023-11-20 02:59:09,489 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=11.88 vs. limit=15.0 2023-11-20 02:59:13,743 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=20.63 vs. 
limit=22.5 2023-11-20 02:59:17,786 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.469e+01 8.087e+01 8.546e+01 9.453e+01 1.405e+02, threshold=1.709e+02, percent-clipped=0.0 2023-11-20 02:59:19,248 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=913120.0, ans=0.125 2023-11-20 02:59:32,220 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=913186.6666666666, ans=0.125 2023-11-20 02:59:39,325 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.min_abs, batch_count=913253.3333333334, ans=0.5 2023-11-20 02:59:49,520 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 137000 2023-11-20 02:59:55,334 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=19.18 vs. limit=22.5 2023-11-20 03:00:03,308 INFO [train_asr.py:1262] (1/4) Epoch 12, batch 4750, loss[loss=0.07638, simple_loss=0.09587, pruned_loss=0.01925, audio_tagging_loss=0.009202, over 15027.00 frames. ], tot_loss[loss=0.0831, simple_loss=0.1031, pruned_loss=0.02125, audio_tagging_loss=0.01031, over 3051342.38 frames. ], batch size: 58, lr: 5.79e-03, grad_scale: 32.0 2023-11-20 03:00:03,491 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=913386.6666666666, ans=0.0 2023-11-20 03:00:23,237 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.conv_module1.whiten, num_groups=1, num_channels=192, metric=12.98 vs. limit=15.0 2023-11-20 03:00:33,849 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.57 vs. limit=6.0 2023-11-20 03:00:54,730 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 137050 2023-11-20 03:01:06,537 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=913720.0, ans=0.2 2023-11-20 03:01:07,324 INFO [train_asr.py:1262] (1/4) Epoch 12, batch 4800, loss[loss=0.07993, simple_loss=0.09968, pruned_loss=0.02034, audio_tagging_loss=0.00975, over 15192.00 frames. ], tot_loss[loss=0.08225, simple_loss=0.1018, pruned_loss=0.02088, audio_tagging_loss=0.01048, over 3050344.98 frames. ], batch size: 55, lr: 5.79e-03, grad_scale: 32.0 2023-11-20 03:01:19,931 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.min_abs, batch_count=913786.6666666666, ans=0.5 2023-11-20 03:01:20,330 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=9.22 vs. limit=10.0 2023-11-20 03:01:23,852 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=12.13 vs. limit=15.0 2023-11-20 03:01:25,569 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.113e+01 8.051e+01 8.703e+01 9.476e+01 1.263e+02, threshold=1.741e+02, percent-clipped=0.0 2023-11-20 03:01:29,319 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.08 vs. 
limit=15.0 2023-11-20 03:01:31,446 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=913853.3333333334, ans=0.0 2023-11-20 03:01:52,129 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=913920.0, ans=0.125 2023-11-20 03:01:58,922 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 137100 2023-11-20 03:01:58,961 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=913986.6666666666, ans=0.125 2023-11-20 03:02:09,092 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=913986.6666666666, ans=0.125 2023-11-20 03:02:11,285 INFO [train_asr.py:1262] (1/4) Epoch 12, batch 4850, loss[loss=0.09129, simple_loss=0.1184, pruned_loss=0.02155, audio_tagging_loss=0.01053, over 16008.00 frames. ], tot_loss[loss=0.08242, simple_loss=0.1022, pruned_loss=0.02083, audio_tagging_loss=0.01047, over 3052117.78 frames. ], batch size: 57, lr: 5.79e-03, grad_scale: 32.0 2023-11-20 03:02:28,867 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=14.90 vs. limit=15.0 2023-11-20 03:02:31,080 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=914120.0, ans=0.125 2023-11-20 03:02:35,767 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=8.21 vs. limit=10.0 2023-11-20 03:03:02,559 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 137150 2023-11-20 03:03:15,884 INFO [train_asr.py:1262] (1/4) Epoch 12, batch 4900, loss[loss=0.08559, simple_loss=0.1091, pruned_loss=0.0223, audio_tagging_loss=0.008757, over 16743.00 frames. ], tot_loss[loss=0.08298, simple_loss=0.103, pruned_loss=0.02114, audio_tagging_loss=0.01035, over 3049021.22 frames. ], batch size: 62, lr: 5.79e-03, grad_scale: 32.0 2023-11-20 03:03:18,523 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=914386.6666666666, ans=0.1 2023-11-20 03:03:35,166 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.551e+01 8.052e+01 8.825e+01 9.558e+01 1.326e+02, threshold=1.765e+02, percent-clipped=0.0 2023-11-20 03:04:07,043 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 137200 2023-11-20 03:04:08,575 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=914653.3333333334, ans=0.1 2023-11-20 03:04:12,819 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=914653.3333333334, ans=0.1 2023-11-20 03:04:14,522 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=914653.3333333334, ans=0.125 2023-11-20 03:04:20,294 INFO [train_asr.py:1262] (1/4) Epoch 12, batch 4950, loss[loss=0.09615, simple_loss=0.1277, pruned_loss=0.02344, audio_tagging_loss=0.008852, over 14714.00 frames. ], tot_loss[loss=0.08307, simple_loss=0.1034, pruned_loss=0.02126, audio_tagging_loss=0.01013, over 3049056.56 frames. 
], batch size: 55, lr: 5.79e-03, grad_scale: 32.0 2023-11-20 03:04:54,261 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.min_positive, batch_count=914853.3333333334, ans=0.05 2023-11-20 03:05:12,448 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 137250 2023-11-20 03:05:20,988 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.min_positive, batch_count=914986.6666666666, ans=0.05 2023-11-20 03:05:24,396 INFO [train_asr.py:1262] (1/4) Epoch 12, batch 5000, loss[loss=0.0579, simple_loss=0.06989, pruned_loss=0.01181, audio_tagging_loss=0.01116, over 14039.00 frames. ], tot_loss[loss=0.08179, simple_loss=0.1017, pruned_loss=0.02083, audio_tagging_loss=0.0101, over 3047180.78 frames. ], batch size: 54, lr: 5.79e-03, grad_scale: 32.0 2023-11-20 03:05:40,427 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=512, metric=23.09 vs. limit=22.5 2023-11-20 03:05:41,200 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-20 03:05:43,919 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.699e+01 7.904e+01 8.718e+01 9.618e+01 1.428e+02, threshold=1.744e+02, percent-clipped=0.0 2023-11-20 03:06:07,247 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=915253.3333333334, ans=0.125 2023-11-20 03:06:15,468 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 137300 2023-11-20 03:06:27,228 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=915386.6666666666, ans=0.1 2023-11-20 03:06:27,325 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=915386.6666666666, ans=0.125 2023-11-20 03:06:28,273 INFO [train_asr.py:1262] (1/4) Epoch 12, batch 5050, loss[loss=0.0932, simple_loss=0.1166, pruned_loss=0.02342, audio_tagging_loss=0.0115, over 15171.00 frames. ], tot_loss[loss=0.08196, simple_loss=0.1023, pruned_loss=0.0208, audio_tagging_loss=0.01004, over 3048741.86 frames. ], batch size: 56, lr: 5.79e-03, grad_scale: 32.0 2023-11-20 03:06:42,818 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=7.83 vs. limit=12.0 2023-11-20 03:06:52,724 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=512, metric=21.67 vs. limit=22.5 2023-11-20 03:06:53,374 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=915520.0, ans=0.2 2023-11-20 03:07:20,291 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 137350 2023-11-20 03:07:21,548 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=915653.3333333334, ans=0.1 2023-11-20 03:07:32,323 INFO [train_asr.py:1262] (1/4) Epoch 12, batch 5100, loss[loss=0.04956, simple_loss=0.05965, pruned_loss=0.0113, audio_tagging_loss=0.008433, over 15708.00 frames. ], tot_loss[loss=0.08152, simple_loss=0.1017, pruned_loss=0.0207, audio_tagging_loss=0.009997, over 3037401.32 frames. 
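Each train_asr.py:1262 record carries two sets of numbers: loss[...] for the current batch and tot_loss[...] aggregated over roughly 3.0e6 frames. That frame total drifts up and down rather than growing monotonically, so tot_loss behaves like a decaying, frame-weighted average over roughly the last couple hundred batches. A sketch of such bookkeeping (hedged: the class name and decay constant are assumptions, not the training script's actual tracker):

class DecayingLossStats:
    """Frame-weighted, exponentially decayed averages of loss components,
    mimicking the behavior of the tot_loss[...] fields printed above."""

    def __init__(self, decay=0.995):
        self.decay = decay
        self.frames = 0.0
        self.sums = {}

    def update(self, losses, num_frames):
        self.frames = self.frames * self.decay + num_frames
        for name, value in losses.items():
            self.sums[name] = self.sums.get(name, 0.0) * self.decay + value * num_frames

    def averages(self):
        return {name: s / self.frames for name, s in self.sums.items()}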
], batch size: 61, lr: 5.79e-03, grad_scale: 32.0 2023-11-20 03:07:51,986 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.683e+01 7.866e+01 8.491e+01 9.250e+01 1.522e+02, threshold=1.698e+02, percent-clipped=0.0 2023-11-20 03:08:00,380 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=915853.3333333334, ans=0.0 2023-11-20 03:08:19,396 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=915920.0, ans=0.125 2023-11-20 03:08:19,477 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=915920.0, ans=0.0 2023-11-20 03:08:24,162 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 137400 2023-11-20 03:08:24,607 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=3.44 vs. limit=15.0 2023-11-20 03:08:27,818 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.max_abs, batch_count=915986.6666666666, ans=10.0 2023-11-20 03:08:34,424 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.62 vs. limit=6.0 2023-11-20 03:08:36,349 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=916053.3333333334, ans=0.2 2023-11-20 03:08:37,466 INFO [train_asr.py:1262] (1/4) Epoch 12, batch 5150, loss[loss=0.08677, simple_loss=0.1113, pruned_loss=0.01977, audio_tagging_loss=0.01135, over 14659.00 frames. ], tot_loss[loss=0.08192, simple_loss=0.1021, pruned_loss=0.02079, audio_tagging_loss=0.01006, over 3033047.27 frames. ], batch size: 55, lr: 5.78e-03, grad_scale: 32.0 2023-11-20 03:09:01,825 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=8.76 vs. limit=10.0 2023-11-20 03:09:23,642 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=916253.3333333334, ans=0.0 2023-11-20 03:09:27,994 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=9.48 vs. limit=15.0 2023-11-20 03:09:29,561 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 137450 2023-11-20 03:09:42,210 INFO [train_asr.py:1262] (1/4) Epoch 12, batch 5200, loss[loss=0.06967, simple_loss=0.09712, pruned_loss=0.01296, audio_tagging_loss=0.008145, over 15522.00 frames. ], tot_loss[loss=0.08153, simple_loss=0.1017, pruned_loss=0.02056, audio_tagging_loss=0.0101, over 3042318.35 frames. 
], batch size: 58, lr: 5.78e-03, grad_scale: 32.0 2023-11-20 03:09:49,307 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=916386.6666666666, ans=0.125 2023-11-20 03:09:56,728 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=916453.3333333334, ans=0.0 2023-11-20 03:09:59,086 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=916453.3333333334, ans=0.2 2023-11-20 03:09:59,181 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=916453.3333333334, ans=0.125 2023-11-20 03:10:01,268 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.268e+01 8.182e+01 8.792e+01 9.724e+01 1.387e+02, threshold=1.758e+02, percent-clipped=0.0 2023-11-20 03:10:01,945 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=9.01 vs. limit=10.0 2023-11-20 03:10:05,871 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=916453.3333333334, ans=0.125 2023-11-20 03:10:05,956 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.max_abs, batch_count=916453.3333333334, ans=10.0 2023-11-20 03:10:33,789 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.10 vs. limit=6.0 2023-11-20 03:10:34,166 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 137500 2023-11-20 03:10:37,290 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=916653.3333333334, ans=0.0 2023-11-20 03:10:46,717 INFO [train_asr.py:1262] (1/4) Epoch 12, batch 5250, loss[loss=0.07736, simple_loss=0.09838, pruned_loss=0.0184, audio_tagging_loss=0.009773, over 15288.00 frames. ], tot_loss[loss=0.08272, simple_loss=0.1034, pruned_loss=0.02116, audio_tagging_loss=0.009864, over 3044672.04 frames. ], batch size: 57, lr: 5.78e-03, grad_scale: 32.0 2023-11-20 03:11:13,947 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=916853.3333333334, ans=0.125 2023-11-20 03:11:22,570 INFO [scaling.py:1022] (1/4) Whitening: name=encoder_embed.convnext.out_whiten, num_groups=1, num_channels=128, metric=4.55 vs. limit=5.0 2023-11-20 03:11:38,210 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 137550 2023-11-20 03:11:51,372 INFO [train_asr.py:1262] (1/4) Epoch 12, batch 5300, loss[loss=0.07855, simple_loss=0.1027, pruned_loss=0.01632, audio_tagging_loss=0.01089, over 15870.00 frames. ], tot_loss[loss=0.08326, simple_loss=0.1039, pruned_loss=0.0214, audio_tagging_loss=0.009893, over 3041790.25 frames. 
], batch size: 60, lr: 5.78e-03, grad_scale: 32.0 2023-11-20 03:11:55,490 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=917053.3333333334, ans=0.0 2023-11-20 03:12:08,256 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-20 03:12:10,526 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.799e+01 8.318e+01 9.072e+01 9.912e+01 1.242e+02, threshold=1.814e+02, percent-clipped=0.0 2023-11-20 03:12:17,653 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=917186.6666666666, ans=0.125 2023-11-20 03:12:19,987 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=917186.6666666666, ans=0.0 2023-11-20 03:12:21,120 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=917186.6666666666, ans=0.0 2023-11-20 03:12:27,244 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=917186.6666666666, ans=0.0 2023-11-20 03:12:35,881 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.self_attn1.whiten, num_groups=1, num_channels=192, metric=12.04 vs. limit=22.5 2023-11-20 03:12:43,192 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 137600 2023-11-20 03:12:56,066 INFO [train_asr.py:1262] (1/4) Epoch 12, batch 5350, loss[loss=0.06097, simple_loss=0.06816, pruned_loss=0.01555, audio_tagging_loss=0.01134, over 14979.00 frames. ], tot_loss[loss=0.08305, simple_loss=0.1035, pruned_loss=0.02135, audio_tagging_loss=0.009939, over 3036701.06 frames. ], batch size: 57, lr: 5.78e-03, grad_scale: 32.0 2023-11-20 03:13:04,463 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=6.38 vs. limit=10.0 2023-11-20 03:13:06,663 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=917386.6666666666, ans=0.0 2023-11-20 03:13:23,234 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=917520.0, ans=0.1 2023-11-20 03:13:47,678 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 137650 2023-11-20 03:14:00,455 INFO [train_asr.py:1262] (1/4) Epoch 12, batch 5400, loss[loss=0.08417, simple_loss=0.1064, pruned_loss=0.0236, audio_tagging_loss=0.007384, over 14345.00 frames. ], tot_loss[loss=0.08298, simple_loss=0.1031, pruned_loss=0.02135, audio_tagging_loss=0.01008, over 3034886.93 frames. 
], batch size: 54, lr: 5.78e-03, grad_scale: 32.0 2023-11-20 03:14:18,431 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=917786.6666666666, ans=0.125 2023-11-20 03:14:19,178 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.643e+01 8.272e+01 8.874e+01 9.617e+01 1.716e+02, threshold=1.775e+02, percent-clipped=0.0 2023-11-20 03:14:31,158 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=917853.3333333334, ans=0.125 2023-11-20 03:14:34,419 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=512, metric=8.89 vs. limit=15.0 2023-11-20 03:14:38,859 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=917920.0, ans=0.0 2023-11-20 03:14:51,529 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 137700 2023-11-20 03:14:56,512 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=917986.6666666666, ans=0.125 2023-11-20 03:15:04,145 INFO [train_asr.py:1262] (1/4) Epoch 12, batch 5450, loss[loss=0.09069, simple_loss=0.1112, pruned_loss=0.02254, audio_tagging_loss=0.01257, over 14431.00 frames. ], tot_loss[loss=0.08324, simple_loss=0.1032, pruned_loss=0.02147, audio_tagging_loss=0.01015, over 3032238.04 frames. ], batch size: 54, lr: 5.78e-03, grad_scale: 32.0 2023-11-20 03:15:30,587 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-20 03:15:35,427 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.min_abs, batch_count=918186.6666666666, ans=0.5 2023-11-20 03:15:36,661 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=918186.6666666666, ans=0.1 2023-11-20 03:15:43,719 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=918253.3333333334, ans=0.0 2023-11-20 03:15:54,392 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=918320.0, ans=0.125 2023-11-20 03:15:55,406 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 137750 2023-11-20 03:15:56,956 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=918320.0, ans=0.125 2023-11-20 03:16:08,288 INFO [train_asr.py:1262] (1/4) Epoch 12, batch 5500, loss[loss=0.06651, simple_loss=0.07944, pruned_loss=0.0171, audio_tagging_loss=0.009692, over 14352.00 frames. ], tot_loss[loss=0.08335, simple_loss=0.1034, pruned_loss=0.02155, audio_tagging_loss=0.01009, over 3032872.65 frames. 
], batch size: 54, lr: 5.78e-03, grad_scale: 32.0 2023-11-20 03:16:14,127 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=918386.6666666666, ans=0.0 2023-11-20 03:16:20,949 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=918453.3333333334, ans=0.125 2023-11-20 03:16:27,777 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.847e+01 8.238e+01 9.064e+01 9.896e+01 2.099e+02, threshold=1.813e+02, percent-clipped=1.0 2023-11-20 03:16:39,417 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=918520.0, ans=0.125 2023-11-20 03:16:43,456 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=6.86 vs. limit=10.0 2023-11-20 03:17:00,410 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 137800 2023-11-20 03:17:05,086 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=5.86 vs. limit=10.0 2023-11-20 03:17:13,430 INFO [train_asr.py:1262] (1/4) Epoch 12, batch 5550, loss[loss=0.05728, simple_loss=0.07277, pruned_loss=0.01123, audio_tagging_loss=0.009663, over 15073.00 frames. ], tot_loss[loss=0.08281, simple_loss=0.103, pruned_loss=0.02112, audio_tagging_loss=0.01019, over 3029566.49 frames. ], batch size: 60, lr: 5.78e-03, grad_scale: 32.0 2023-11-20 03:17:37,249 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=918853.3333333334, ans=0.1 2023-11-20 03:17:51,137 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.whiten, num_groups=1, num_channels=192, metric=3.42 vs. limit=12.0 2023-11-20 03:18:04,346 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 137850 2023-11-20 03:18:04,637 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=918986.6666666666, ans=0.125 2023-11-20 03:18:16,841 INFO [train_asr.py:1262] (1/4) Epoch 12, batch 5600, loss[loss=0.09481, simple_loss=0.1203, pruned_loss=0.02464, audio_tagging_loss=0.01004, over 14199.00 frames. ], tot_loss[loss=0.08267, simple_loss=0.1026, pruned_loss=0.02101, audio_tagging_loss=0.01037, over 3028922.25 frames. ], batch size: 54, lr: 5.78e-03, grad_scale: 32.0 2023-11-20 03:18:35,341 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.491e+01 8.234e+01 9.061e+01 1.016e+02 1.381e+02, threshold=1.812e+02, percent-clipped=0.0 2023-11-20 03:18:36,712 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=12.89 vs. limit=15.0 2023-11-20 03:18:50,137 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=919186.6666666666, ans=0.125 2023-11-20 03:19:02,525 WARNING [train_asr.py:1506] (1/4) Exclude cut with ID unbalanced/ze0LsBtoDm0_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. 
Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-20 03:19:07,336 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 137900 2023-11-20 03:19:19,193 INFO [train_asr.py:1262] (1/4) Epoch 12, batch 5650, loss[loss=0.1112, simple_loss=0.1385, pruned_loss=0.03453, audio_tagging_loss=0.007377, over 16001.00 frames. ], tot_loss[loss=0.08198, simple_loss=0.1014, pruned_loss=0.02088, audio_tagging_loss=0.01039, over 3036952.37 frames. ], batch size: 57, lr: 5.77e-03, grad_scale: 32.0 2023-11-20 03:19:19,695 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=2.94 vs. limit=15.0 2023-11-20 03:19:21,867 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-20 03:19:24,961 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=919386.6666666666, ans=0.0 2023-11-20 03:19:27,335 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=919386.6666666666, ans=0.1 2023-11-20 03:19:41,571 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-20 03:19:49,397 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=919520.0, ans=0.125 2023-11-20 03:20:09,960 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 137950 2023-11-20 03:20:23,368 INFO [train_asr.py:1262] (1/4) Epoch 12, batch 5700, loss[loss=0.08192, simple_loss=0.1041, pruned_loss=0.02026, audio_tagging_loss=0.009613, over 15261.00 frames. ], tot_loss[loss=0.08238, simple_loss=0.1019, pruned_loss=0.02108, audio_tagging_loss=0.01036, over 3041186.85 frames. ], batch size: 54, lr: 5.77e-03, grad_scale: 32.0 2023-11-20 03:20:23,568 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=919720.0, ans=0.125 2023-11-20 03:20:36,098 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-20 03:20:42,396 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.046e+01 8.375e+01 8.900e+01 9.766e+01 1.297e+02, threshold=1.780e+02, percent-clipped=0.0 2023-11-20 03:20:45,016 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=919786.6666666666, ans=0.1 2023-11-20 03:21:15,173 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 138000 2023-11-20 03:21:28,300 INFO [train_asr.py:1262] (1/4) Epoch 12, batch 5750, loss[loss=0.05034, simple_loss=0.05973, pruned_loss=0.011, audio_tagging_loss=0.009478, over 15021.00 frames. ], tot_loss[loss=0.08201, simple_loss=0.1013, pruned_loss=0.02106, audio_tagging_loss=0.0103, over 3034328.11 frames. 
], batch size: 59, lr: 5.77e-03, grad_scale: 32.0 2023-11-20 03:21:36,998 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=920053.3333333334, ans=0.035 2023-11-20 03:21:55,791 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=18.67 vs. limit=22.5 2023-11-20 03:22:11,804 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=920253.3333333334, ans=0.0 2023-11-20 03:22:17,643 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=920253.3333333334, ans=0.125 2023-11-20 03:22:19,734 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 138050 2023-11-20 03:22:29,828 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=920320.0, ans=0.0 2023-11-20 03:22:31,822 INFO [train_asr.py:1262] (1/4) Epoch 12, batch 5800, loss[loss=0.1057, simple_loss=0.1424, pruned_loss=0.02444, audio_tagging_loss=0.01006, over 15267.00 frames. ], tot_loss[loss=0.0825, simple_loss=0.1022, pruned_loss=0.02121, audio_tagging_loss=0.01022, over 3034433.67 frames. ], batch size: 54, lr: 5.77e-03, grad_scale: 32.0 2023-11-20 03:22:34,436 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=920386.6666666666, ans=0.1 2023-11-20 03:22:52,121 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.779e+01 8.315e+01 8.891e+01 9.653e+01 1.172e+02, threshold=1.778e+02, percent-clipped=0.0 2023-11-20 03:23:05,087 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=920520.0, ans=0.125 2023-11-20 03:23:05,411 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=12.74 vs. limit=15.0 2023-11-20 03:23:17,293 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=920586.6666666666, ans=0.0 2023-11-20 03:23:23,074 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 138100 2023-11-20 03:23:24,348 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=920653.3333333334, ans=0.125 2023-11-20 03:23:30,206 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=920653.3333333334, ans=0.2 2023-11-20 03:23:36,491 INFO [train_asr.py:1262] (1/4) Epoch 12, batch 5850, loss[loss=0.06421, simple_loss=0.0773, pruned_loss=0.01603, audio_tagging_loss=0.009533, over 14870.00 frames. ], tot_loss[loss=0.0827, simple_loss=0.1026, pruned_loss=0.02126, audio_tagging_loss=0.01016, over 3039059.89 frames. 
], batch size: 58, lr: 5.77e-03, grad_scale: 32.0 2023-11-20 03:24:00,368 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=920853.3333333334, ans=0.125 2023-11-20 03:24:10,194 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=920853.3333333334, ans=0.125 2023-11-20 03:24:27,829 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 138150 2023-11-20 03:24:29,277 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.max_abs, batch_count=920986.6666666666, ans=10.0 2023-11-20 03:24:39,931 INFO [train_asr.py:1262] (1/4) Epoch 12, batch 5900, loss[loss=0.06307, simple_loss=0.07967, pruned_loss=0.01394, audio_tagging_loss=0.009302, over 14458.00 frames. ], tot_loss[loss=0.08347, simple_loss=0.1034, pruned_loss=0.02168, audio_tagging_loss=0.0101, over 3038377.75 frames. ], batch size: 55, lr: 5.77e-03, grad_scale: 16.0 2023-11-20 03:24:59,263 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.31 vs. limit=6.0 2023-11-20 03:24:59,804 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.588e+01 8.195e+01 8.943e+01 1.006e+02 1.652e+02, threshold=1.789e+02, percent-clipped=0.0 2023-11-20 03:25:08,130 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=921186.6666666666, ans=0.125 2023-11-20 03:25:31,468 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 138200 2023-11-20 03:25:35,673 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=921320.0, ans=0.0 2023-11-20 03:25:43,831 INFO [train_asr.py:1262] (1/4) Epoch 12, batch 5950, loss[loss=0.077, simple_loss=0.1043, pruned_loss=0.01818, audio_tagging_loss=0.006681, over 15607.00 frames. ], tot_loss[loss=0.08338, simple_loss=0.1035, pruned_loss=0.02155, audio_tagging_loss=0.01008, over 3041481.83 frames. ], batch size: 57, lr: 5.77e-03, grad_scale: 16.0 2023-11-20 03:26:01,280 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=921453.3333333334, ans=0.125 2023-11-20 03:26:03,801 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=20.54 vs. limit=22.5 2023-11-20 03:26:34,121 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=2.54 vs. limit=15.0 2023-11-20 03:26:34,887 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 138250 2023-11-20 03:26:35,033 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=921653.3333333334, ans=0.125 2023-11-20 03:26:47,498 INFO [train_asr.py:1262] (1/4) Epoch 12, batch 6000, loss[loss=0.06168, simple_loss=0.07579, pruned_loss=0.01393, audio_tagging_loss=0.009865, over 15883.00 frames. ], tot_loss[loss=0.08283, simple_loss=0.1029, pruned_loss=0.02136, audio_tagging_loss=0.01003, over 3044556.17 frames. 
], batch size: 59, lr: 5.77e-03, grad_scale: 32.0 2023-11-20 03:26:47,499 INFO [train_asr.py:1285] (1/4) Computing validation loss 2023-11-20 03:27:26,667 INFO [zipformer.py:1873] (1/4) name=encoder.encoders.5.encoder.layers.1.self_attn_weights, attn_weights_entropy = tensor([5.1973, 3.5140, 5.0568, 4.8093], device='cuda:1') 2023-11-20 03:27:28,649 INFO [train_asr.py:1294] (1/4) Epoch 12, validation: loss=0.06387, simple_loss=0.05435, pruned_loss=0.006012, audio_tagging_loss=0.03068, over 4681554.00 frames. 2023-11-20 03:27:28,650 INFO [train_asr.py:1295] (1/4) Maximum memory allocated so far is 26082MB 2023-11-20 03:27:36,068 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-20 03:27:43,904 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn2.whiten.whitening_limit, batch_count=921786.6666666666, ans=22.5 2023-11-20 03:27:44,744 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=921786.6666666666, ans=0.0 2023-11-20 03:27:48,132 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.723e+01 8.205e+01 8.900e+01 1.006e+02 1.555e+02, threshold=1.780e+02, percent-clipped=0.0 2023-11-20 03:28:16,465 WARNING [train_asr.py:1506] (1/4) Exclude cut with ID unbalanced/NoNxFjwXuuc_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-20 03:28:20,231 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 138300 2023-11-20 03:28:32,668 INFO [train_asr.py:1262] (1/4) Epoch 12, batch 6050, loss[loss=0.06403, simple_loss=0.08448, pruned_loss=0.01284, audio_tagging_loss=0.008944, over 14555.00 frames. ], tot_loss[loss=0.08183, simple_loss=0.1018, pruned_loss=0.02087, audio_tagging_loss=0.01004, over 3046552.46 frames. ], batch size: 55, lr: 5.77e-03, grad_scale: 16.0 2023-11-20 03:28:43,332 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.max_positive, batch_count=922053.3333333334, ans=0.95 2023-11-20 03:28:45,841 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=922120.0, ans=0.125 2023-11-20 03:29:00,021 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=922186.6666666666, ans=0.0 2023-11-20 03:29:12,297 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer_ff2.min_abs, batch_count=922253.3333333334, ans=0.1 2023-11-20 03:29:24,344 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 138350 2023-11-20 03:29:24,522 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=922320.0, ans=0.0 2023-11-20 03:29:37,754 INFO [train_asr.py:1262] (1/4) Epoch 12, batch 6100, loss[loss=0.08803, simple_loss=0.1139, pruned_loss=0.02254, audio_tagging_loss=0.008533, over 14799.00 frames. ], tot_loss[loss=0.08194, simple_loss=0.1022, pruned_loss=0.02083, audio_tagging_loss=0.01003, over 3050524.08 frames. 
], batch size: 56, lr: 5.76e-03, grad_scale: 16.0 2023-11-20 03:29:44,263 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=922386.6666666666, ans=0.09899494936611666 2023-11-20 03:30:00,020 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.566e+01 7.861e+01 8.501e+01 9.321e+01 2.317e+02, threshold=1.700e+02, percent-clipped=1.0 2023-11-20 03:30:30,146 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 138400 2023-11-20 03:30:37,955 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=922653.3333333334, ans=0.0 2023-11-20 03:30:39,022 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=922653.3333333334, ans=0.125 2023-11-20 03:30:43,081 INFO [train_asr.py:1262] (1/4) Epoch 12, batch 6150, loss[loss=0.08322, simple_loss=0.1145, pruned_loss=0.01843, audio_tagging_loss=0.00754, over 14980.00 frames. ], tot_loss[loss=0.08229, simple_loss=0.1024, pruned_loss=0.021, audio_tagging_loss=0.01008, over 3048127.28 frames. ], batch size: 57, lr: 5.76e-03, grad_scale: 16.0 2023-11-20 03:30:47,401 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=10.98 vs. limit=15.0 2023-11-20 03:30:53,423 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=6.87 vs. limit=15.0 2023-11-20 03:30:57,007 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-11-20 03:30:57,320 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=13.69 vs. limit=15.0 2023-11-20 03:31:34,747 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 138450 2023-11-20 03:31:34,917 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=922986.6666666666, ans=0.1 2023-11-20 03:31:47,108 INFO [train_asr.py:1262] (1/4) Epoch 12, batch 6200, loss[loss=0.08757, simple_loss=0.1135, pruned_loss=0.0215, audio_tagging_loss=0.009291, over 15579.00 frames. ], tot_loss[loss=0.08208, simple_loss=0.1021, pruned_loss=0.0209, audio_tagging_loss=0.01013, over 3046828.47 frames. 
], batch size: 56, lr: 5.76e-03, grad_scale: 16.0 2023-11-20 03:31:47,572 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=923053.3333333334, ans=0.125 2023-11-20 03:31:53,573 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=923053.3333333334, ans=0.1 2023-11-20 03:32:09,313 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.126e+01 8.412e+01 9.324e+01 1.012e+02 1.364e+02, threshold=1.865e+02, percent-clipped=0.0 2023-11-20 03:32:10,961 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=923120.0, ans=0.125 2023-11-20 03:32:18,224 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=923186.6666666666, ans=0.0 2023-11-20 03:32:38,135 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=923320.0, ans=0.0 2023-11-20 03:32:39,135 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 138500 2023-11-20 03:32:51,995 INFO [train_asr.py:1262] (1/4) Epoch 12, batch 6250, loss[loss=0.08492, simple_loss=0.09874, pruned_loss=0.01865, audio_tagging_loss=0.0169, over 15689.00 frames. ], tot_loss[loss=0.08228, simple_loss=0.1022, pruned_loss=0.02095, audio_tagging_loss=0.0102, over 3053281.38 frames. ], batch size: 60, lr: 5.76e-03, grad_scale: 16.0 2023-11-20 03:33:24,877 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.whiten, num_groups=1, num_channels=512, metric=9.14 vs. limit=12.0 2023-11-20 03:33:30,860 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=923586.6666666666, ans=0.125 2023-11-20 03:33:39,712 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=923586.6666666666, ans=0.125 2023-11-20 03:33:43,692 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 138550 2023-11-20 03:33:44,318 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=9.07 vs. limit=10.0 2023-11-20 03:33:45,069 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.min_positive, batch_count=923653.3333333334, ans=0.025 2023-11-20 03:33:52,431 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=923653.3333333334, ans=0.0 2023-11-20 03:33:55,695 INFO [train_asr.py:1262] (1/4) Epoch 12, batch 6300, loss[loss=0.08757, simple_loss=0.1132, pruned_loss=0.02122, audio_tagging_loss=0.009746, over 16905.00 frames. ], tot_loss[loss=0.08263, simple_loss=0.1028, pruned_loss=0.02105, audio_tagging_loss=0.01019, over 3051146.55 frames. ], batch size: 66, lr: 5.76e-03, grad_scale: 16.0 2023-11-20 03:34:17,721 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.678e+01 8.557e+01 9.163e+01 1.006e+02 1.411e+02, threshold=1.833e+02, percent-clipped=0.0 2023-11-20 03:34:19,780 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=11.11 vs. 
limit=15.0 2023-11-20 03:34:24,971 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=923853.3333333334, ans=0.0 2023-11-20 03:34:24,982 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=923853.3333333334, ans=0.0 2023-11-20 03:34:33,438 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.self_attn1.whiten, num_groups=1, num_channels=192, metric=15.13 vs. limit=22.5 2023-11-20 03:34:45,788 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=20.23 vs. limit=22.5 2023-11-20 03:34:46,551 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=923986.6666666666, ans=0.0 2023-11-20 03:34:48,348 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 138600 2023-11-20 03:34:55,126 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=923986.6666666666, ans=0.125 2023-11-20 03:34:56,923 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=923986.6666666666, ans=0.0 2023-11-20 03:34:59,501 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=923986.6666666666, ans=0.125 2023-11-20 03:35:01,679 INFO [train_asr.py:1262] (1/4) Epoch 12, batch 6350, loss[loss=0.0952, simple_loss=0.1065, pruned_loss=0.02721, audio_tagging_loss=0.01474, over 14345.00 frames. ], tot_loss[loss=0.08294, simple_loss=0.1032, pruned_loss=0.02113, audio_tagging_loss=0.01018, over 3042585.80 frames. ], batch size: 55, lr: 5.76e-03, grad_scale: 16.0 2023-11-20 03:35:17,736 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=924120.0, ans=0.0 2023-11-20 03:35:38,687 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=924186.6666666666, ans=0.1 2023-11-20 03:35:53,789 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 138650 2023-11-20 03:36:02,122 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=924320.0, ans=0.0 2023-11-20 03:36:02,189 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=924320.0, ans=0.125 2023-11-20 03:36:03,095 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=924320.0, ans=0.125 2023-11-20 03:36:06,521 INFO [train_asr.py:1262] (1/4) Epoch 12, batch 6400, loss[loss=0.0656, simple_loss=0.07852, pruned_loss=0.01408, audio_tagging_loss=0.01226, over 14299.00 frames. ], tot_loss[loss=0.08298, simple_loss=0.103, pruned_loss=0.02111, audio_tagging_loss=0.01037, over 3044864.84 frames. 
], batch size: 55, lr: 5.76e-03, grad_scale: 32.0 2023-11-20 03:36:15,849 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=924386.6666666666, ans=0.125 2023-11-20 03:36:28,551 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.492e+01 8.195e+01 8.907e+01 9.891e+01 1.303e+02, threshold=1.781e+02, percent-clipped=0.0 2023-11-20 03:36:31,793 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=19.54 vs. limit=22.5 2023-11-20 03:36:44,190 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=924586.6666666666, ans=0.0 2023-11-20 03:36:56,587 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=10.90 vs. limit=15.0 2023-11-20 03:36:58,268 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 138700 2023-11-20 03:37:11,061 INFO [train_asr.py:1262] (1/4) Epoch 12, batch 6450, loss[loss=0.0537, simple_loss=0.06315, pruned_loss=0.01125, audio_tagging_loss=0.01088, over 15559.00 frames. ], tot_loss[loss=0.083, simple_loss=0.103, pruned_loss=0.02105, audio_tagging_loss=0.01044, over 3045296.12 frames. ], batch size: 59, lr: 5.76e-03, grad_scale: 32.0 2023-11-20 03:37:33,127 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=924786.6666666666, ans=0.1 2023-11-20 03:37:33,199 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=924786.6666666666, ans=0.125 2023-11-20 03:37:35,041 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=22.49 vs. limit=22.5 2023-11-20 03:37:36,132 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=9.03 vs. limit=15.0 2023-11-20 03:37:58,866 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=924920.0, ans=0.1 2023-11-20 03:38:01,402 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=924986.6666666666, ans=0.1 2023-11-20 03:38:02,398 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 138750 2023-11-20 03:38:15,263 INFO [train_asr.py:1262] (1/4) Epoch 12, batch 6500, loss[loss=0.07943, simple_loss=0.1033, pruned_loss=0.01734, audio_tagging_loss=0.01044, over 14626.00 frames. ], tot_loss[loss=0.08281, simple_loss=0.1029, pruned_loss=0.02093, audio_tagging_loss=0.01043, over 3041423.17 frames. ], batch size: 53, lr: 5.76e-03, grad_scale: 32.0 2023-11-20 03:38:18,549 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=925053.3333333334, ans=0.1 2023-11-20 03:38:37,925 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.023e+01 8.283e+01 9.037e+01 9.701e+01 1.555e+02, threshold=1.807e+02, percent-clipped=0.0 2023-11-20 03:38:56,356 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=6.75 vs. 
limit=10.0 2023-11-20 03:39:06,606 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 138800 2023-11-20 03:39:20,289 INFO [train_asr.py:1262] (1/4) Epoch 12, batch 6550, loss[loss=0.07423, simple_loss=0.09738, pruned_loss=0.01824, audio_tagging_loss=0.007303, over 15571.00 frames. ], tot_loss[loss=0.08207, simple_loss=0.1021, pruned_loss=0.02073, audio_tagging_loss=0.01029, over 3050741.82 frames. ], batch size: 58, lr: 5.76e-03, grad_scale: 32.0 2023-11-20 03:39:54,004 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=925520.0, ans=0.2 2023-11-20 03:40:04,641 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=925586.6666666666, ans=0.0 2023-11-20 03:40:12,485 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 138850 2023-11-20 03:40:25,144 INFO [train_asr.py:1262] (1/4) Epoch 12, batch 6600, loss[loss=0.08881, simple_loss=0.1169, pruned_loss=0.02288, audio_tagging_loss=0.007464, over 16127.00 frames. ], tot_loss[loss=0.08143, simple_loss=0.1016, pruned_loss=0.02045, audio_tagging_loss=0.01017, over 3043617.87 frames. ], batch size: 58, lr: 5.75e-03, grad_scale: 32.0 2023-11-20 03:40:46,612 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.736e+01 8.079e+01 8.710e+01 9.676e+01 1.211e+02, threshold=1.742e+02, percent-clipped=0.0 2023-11-20 03:40:50,612 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=925853.3333333334, ans=0.1 2023-11-20 03:41:00,453 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=925853.3333333334, ans=0.2 2023-11-20 03:41:01,618 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=925853.3333333334, ans=0.125 2023-11-20 03:41:17,197 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 138900 2023-11-20 03:41:25,366 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=925986.6666666666, ans=0.125 2023-11-20 03:41:29,872 INFO [train_asr.py:1262] (1/4) Epoch 12, batch 6650, loss[loss=0.07051, simple_loss=0.08656, pruned_loss=0.01811, audio_tagging_loss=0.009126, over 15482.00 frames. ], tot_loss[loss=0.08188, simple_loss=0.1018, pruned_loss=0.02078, audio_tagging_loss=0.0102, over 3041183.41 frames. ], batch size: 59, lr: 5.75e-03, grad_scale: 32.0 2023-11-20 03:41:36,251 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=926053.3333333334, ans=0.125 2023-11-20 03:41:38,179 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.61 vs. 
limit=6.0 2023-11-20 03:41:56,726 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=926186.6666666666, ans=0.0 2023-11-20 03:42:15,988 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=926253.3333333334, ans=0.125 2023-11-20 03:42:19,549 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=926253.3333333334, ans=0.0 2023-11-20 03:42:21,829 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 138950 2023-11-20 03:42:23,313 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=926320.0, ans=0.125 2023-11-20 03:42:34,591 INFO [train_asr.py:1262] (1/4) Epoch 12, batch 6700, loss[loss=0.06845, simple_loss=0.08564, pruned_loss=0.01602, audio_tagging_loss=0.009611, over 14531.00 frames. ], tot_loss[loss=0.08221, simple_loss=0.1024, pruned_loss=0.02098, audio_tagging_loss=0.01003, over 3049358.60 frames. ], batch size: 56, lr: 5.75e-03, grad_scale: 32.0 2023-11-20 03:42:36,078 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=22.65 vs. limit=22.5 2023-11-20 03:42:47,262 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=926453.3333333334, ans=0.125 2023-11-20 03:42:56,612 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.827e+01 8.182e+01 8.627e+01 9.432e+01 1.236e+02, threshold=1.725e+02, percent-clipped=0.0 2023-11-20 03:42:56,935 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=926453.3333333334, ans=0.1 2023-11-20 03:43:05,262 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=11.55 vs. limit=15.0 2023-11-20 03:43:14,550 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=926586.6666666666, ans=0.125 2023-11-20 03:43:20,988 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=926586.6666666666, ans=0.1 2023-11-20 03:43:26,322 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 139000 2023-11-20 03:43:27,589 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=926653.3333333334, ans=0.1 2023-11-20 03:43:30,759 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=9.03 vs. limit=10.0 2023-11-20 03:43:36,026 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.min_positive, batch_count=926653.3333333334, ans=0.025 2023-11-20 03:43:39,253 INFO [train_asr.py:1262] (1/4) Epoch 12, batch 6750, loss[loss=0.07267, simple_loss=0.09637, pruned_loss=0.01499, audio_tagging_loss=0.00949, over 15441.00 frames. ], tot_loss[loss=0.08287, simple_loss=0.1035, pruned_loss=0.02121, audio_tagging_loss=0.009912, over 3042465.20 frames. 
], batch size: 57, lr: 5.75e-03, grad_scale: 32.0 2023-11-20 03:44:30,336 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 139050 2023-11-20 03:44:30,639 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=926986.6666666666, ans=0.1 2023-11-20 03:44:37,720 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.max_abs, batch_count=926986.6666666666, ans=10.0 2023-11-20 03:44:42,802 INFO [train_asr.py:1262] (1/4) Epoch 12, batch 6800, loss[loss=0.09325, simple_loss=0.1219, pruned_loss=0.02497, audio_tagging_loss=0.007332, over 15248.00 frames. ], tot_loss[loss=0.08255, simple_loss=0.1029, pruned_loss=0.02117, audio_tagging_loss=0.00992, over 3039287.33 frames. ], batch size: 55, lr: 5.75e-03, grad_scale: 32.0 2023-11-20 03:44:43,189 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=927053.3333333334, ans=0.1 2023-11-20 03:44:45,573 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=927053.3333333334, ans=0.125 2023-11-20 03:45:06,176 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.197e+01 8.226e+01 8.844e+01 9.966e+01 1.208e+02, threshold=1.769e+02, percent-clipped=0.0 2023-11-20 03:45:34,590 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 139100 2023-11-20 03:45:46,932 INFO [train_asr.py:1262] (1/4) Epoch 12, batch 6850, loss[loss=0.06935, simple_loss=0.08322, pruned_loss=0.01593, audio_tagging_loss=0.01181, over 15781.00 frames. ], tot_loss[loss=0.08222, simple_loss=0.1025, pruned_loss=0.02101, audio_tagging_loss=0.009967, over 3038157.75 frames. ], batch size: 61, lr: 5.75e-03, grad_scale: 32.0 2023-11-20 03:45:49,856 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=927386.6666666666, ans=0.125 2023-11-20 03:46:11,266 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=927453.3333333334, ans=0.0 2023-11-20 03:46:34,445 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=7.18 vs. limit=12.0 2023-11-20 03:46:38,576 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 139150 2023-11-20 03:46:42,111 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=13.56 vs. limit=22.5 2023-11-20 03:46:52,492 INFO [train_asr.py:1262] (1/4) Epoch 12, batch 6900, loss[loss=0.05589, simple_loss=0.06467, pruned_loss=0.01122, audio_tagging_loss=0.01234, over 15134.00 frames. ], tot_loss[loss=0.08197, simple_loss=0.1021, pruned_loss=0.02084, audio_tagging_loss=0.01005, over 3030834.66 frames. ], batch size: 58, lr: 5.75e-03, grad_scale: 32.0 2023-11-20 03:46:55,548 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=8.68 vs. limit=10.0 2023-11-20 03:47:16,185 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.998e+01 8.075e+01 8.808e+01 9.662e+01 1.235e+02, threshold=1.762e+02, percent-clipped=0.0 2023-11-20 03:47:42,935 WARNING [train_asr.py:1506] (1/4) Exclude cut with ID unbalanced/Xez1ffAcb0w_0.000_1.000.wav from training. 
Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-20 03:47:44,312 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 139200 2023-11-20 03:47:49,027 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.00 vs. limit=15.0 2023-11-20 03:47:57,584 INFO [train_asr.py:1262] (1/4) Epoch 12, batch 6950, loss[loss=0.07328, simple_loss=0.08206, pruned_loss=0.01945, audio_tagging_loss=0.0128, over 14693.00 frames. ], tot_loss[loss=0.08234, simple_loss=0.1025, pruned_loss=0.02098, audio_tagging_loss=0.0101, over 3036382.99 frames. ], batch size: 56, lr: 5.75e-03, grad_scale: 16.0 2023-11-20 03:48:13,696 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=928120.0, ans=0.125 2023-11-20 03:48:19,778 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=928120.0, ans=0.1 2023-11-20 03:48:36,890 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=19.87 vs. limit=22.5 2023-11-20 03:48:45,421 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=928253.3333333334, ans=0.0 2023-11-20 03:48:48,896 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 139250 2023-11-20 03:48:49,192 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=928320.0, ans=0.125 2023-11-20 03:49:01,007 INFO [train_asr.py:1262] (1/4) Epoch 12, batch 7000, loss[loss=0.09513, simple_loss=0.1256, pruned_loss=0.02096, audio_tagging_loss=0.01137, over 15508.00 frames. ], tot_loss[loss=0.08225, simple_loss=0.1021, pruned_loss=0.02098, audio_tagging_loss=0.01022, over 3036830.11 frames. ], batch size: 57, lr: 5.75e-03, grad_scale: 16.0 2023-11-20 03:49:15,827 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=928453.3333333334, ans=0.2 2023-11-20 03:49:25,213 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=7.07 vs. limit=12.0 2023-11-20 03:49:25,769 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.837e+01 8.169e+01 8.824e+01 9.776e+01 1.242e+02, threshold=1.765e+02, percent-clipped=0.0 2023-11-20 03:49:49,750 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=928586.6666666666, ans=0.1 2023-11-20 03:49:52,148 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 139300 2023-11-20 03:49:58,890 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=928653.3333333334, ans=0.0 2023-11-20 03:50:05,512 INFO [train_asr.py:1262] (1/4) Epoch 12, batch 7050, loss[loss=0.09383, simple_loss=0.1164, pruned_loss=0.02673, audio_tagging_loss=0.008921, over 14139.00 frames. 
], tot_loss[loss=0.0821, simple_loss=0.1017, pruned_loss=0.02095, audio_tagging_loss=0.01029, over 3031173.93 frames. ], batch size: 52, lr: 5.75e-03, grad_scale: 16.0 2023-11-20 03:50:10,026 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=10.28 vs. limit=15.0 2023-11-20 03:50:15,507 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=9.16 vs. limit=10.0 2023-11-20 03:50:26,839 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=928786.6666666666, ans=0.0 2023-11-20 03:50:48,399 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=928920.0, ans=0.125 2023-11-20 03:50:57,662 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 139350 2023-11-20 03:51:10,701 INFO [train_asr.py:1262] (1/4) Epoch 12, batch 7100, loss[loss=0.08226, simple_loss=0.1072, pruned_loss=0.01794, audio_tagging_loss=0.01073, over 15802.00 frames. ], tot_loss[loss=0.0822, simple_loss=0.1021, pruned_loss=0.02082, audio_tagging_loss=0.01031, over 3032243.22 frames. ], batch size: 57, lr: 5.74e-03, grad_scale: 16.0 2023-11-20 03:51:30,298 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=12.17 vs. limit=15.0 2023-11-20 03:51:34,670 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.602e+01 8.104e+01 8.912e+01 9.574e+01 1.346e+02, threshold=1.782e+02, percent-clipped=0.0 2023-11-20 03:52:03,501 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 139400 2023-11-20 03:52:13,814 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=929320.0, ans=0.0 2023-11-20 03:52:15,905 INFO [train_asr.py:1262] (1/4) Epoch 12, batch 7150, loss[loss=0.08817, simple_loss=0.1115, pruned_loss=0.0209, audio_tagging_loss=0.01153, over 16213.00 frames. ], tot_loss[loss=0.08248, simple_loss=0.1024, pruned_loss=0.0209, audio_tagging_loss=0.01039, over 3039356.96 frames. ], batch size: 61, lr: 5.74e-03, grad_scale: 16.0 2023-11-20 03:52:22,330 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=929386.6666666666, ans=0.0 2023-11-20 03:52:30,440 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=929453.3333333334, ans=0.1 2023-11-20 03:52:41,819 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=9.35 vs. limit=15.0 2023-11-20 03:52:45,321 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=929520.0, ans=0.125 2023-11-20 03:53:02,042 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=929586.6666666666, ans=0.125 2023-11-20 03:53:07,758 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 139450 2023-11-20 03:53:20,607 INFO [train_asr.py:1262] (1/4) Epoch 12, batch 7200, loss[loss=0.09025, simple_loss=0.1157, pruned_loss=0.02408, audio_tagging_loss=0.00833, over 14852.00 frames. 
], tot_loss[loss=0.0828, simple_loss=0.1028, pruned_loss=0.02103, audio_tagging_loss=0.01036, over 3040845.24 frames. ], batch size: 54, lr: 5.74e-03, grad_scale: 32.0 2023-11-20 03:53:45,892 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.207e+01 8.353e+01 8.991e+01 9.790e+01 1.410e+02, threshold=1.798e+02, percent-clipped=0.0 2023-11-20 03:53:54,722 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=929853.3333333334, ans=0.125 2023-11-20 03:54:02,258 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=929920.0, ans=0.2 2023-11-20 03:54:13,124 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 139500 2023-11-20 03:54:25,348 INFO [train_asr.py:1262] (1/4) Epoch 12, batch 7250, loss[loss=0.07939, simple_loss=0.1013, pruned_loss=0.01945, audio_tagging_loss=0.009268, over 14726.00 frames. ], tot_loss[loss=0.08336, simple_loss=0.1035, pruned_loss=0.0213, audio_tagging_loss=0.0103, over 3045295.90 frames. ], batch size: 56, lr: 5.74e-03, grad_scale: 16.0 2023-11-20 03:54:33,253 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=930053.3333333334, ans=0.0 2023-11-20 03:54:36,773 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=930053.3333333334, ans=0.125 2023-11-20 03:54:38,233 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=930120.0, ans=0.0 2023-11-20 03:54:44,290 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=930120.0, ans=0.1 2023-11-20 03:54:56,050 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=930186.6666666666, ans=0.2 2023-11-20 03:55:05,264 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-20 03:55:06,470 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=930253.3333333334, ans=0.0 2023-11-20 03:55:17,538 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 139550 2023-11-20 03:55:30,926 INFO [train_asr.py:1262] (1/4) Epoch 12, batch 7300, loss[loss=0.0697, simple_loss=0.08779, pruned_loss=0.01529, audio_tagging_loss=0.01052, over 14356.00 frames. ], tot_loss[loss=0.08235, simple_loss=0.1023, pruned_loss=0.02096, audio_tagging_loss=0.01024, over 3042431.63 frames. 
], batch size: 54, lr: 5.74e-03, grad_scale: 16.0 2023-11-20 03:55:36,002 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=930386.6666666666, ans=0.0 2023-11-20 03:55:39,897 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=930386.6666666666, ans=0.0 2023-11-20 03:55:46,418 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=930453.3333333334, ans=0.125 2023-11-20 03:55:56,587 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 5.831e+01 8.147e+01 8.750e+01 9.433e+01 1.159e+02, threshold=1.750e+02, percent-clipped=0.0 2023-11-20 03:56:14,935 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=930586.6666666666, ans=0.0 2023-11-20 03:56:14,937 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=930586.6666666666, ans=0.125 2023-11-20 03:56:16,027 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=930586.6666666666, ans=0.0 2023-11-20 03:56:22,015 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 139600 2023-11-20 03:56:35,317 INFO [train_asr.py:1262] (1/4) Epoch 12, batch 7350, loss[loss=0.07459, simple_loss=0.09227, pruned_loss=0.01989, audio_tagging_loss=0.008572, over 15683.00 frames. ], tot_loss[loss=0.08209, simple_loss=0.1022, pruned_loss=0.02086, audio_tagging_loss=0.01012, over 3039193.56 frames. ], batch size: 60, lr: 5.74e-03, grad_scale: 16.0 2023-11-20 03:56:41,015 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=930720.0, ans=0.0 2023-11-20 03:56:41,350 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=3.63 vs. limit=10.0 2023-11-20 03:57:04,284 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=930853.3333333334, ans=0.125 2023-11-20 03:57:11,495 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=930853.3333333334, ans=0.125 2023-11-20 03:57:20,681 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=19.95 vs. limit=22.5 2023-11-20 03:57:25,819 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=930986.6666666666, ans=0.0 2023-11-20 03:57:26,968 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 139650 2023-11-20 03:57:29,033 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=930986.6666666666, ans=0.125 2023-11-20 03:57:39,582 INFO [train_asr.py:1262] (1/4) Epoch 12, batch 7400, loss[loss=0.0839, simple_loss=0.1089, pruned_loss=0.01899, audio_tagging_loss=0.01048, over 14566.00 frames. ], tot_loss[loss=0.08123, simple_loss=0.1013, pruned_loss=0.02052, audio_tagging_loss=0.01006, over 3041696.09 frames. 
], batch size: 56, lr: 5.74e-03, grad_scale: 16.0 2023-11-20 03:57:41,055 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=931053.3333333334, ans=0.125 2023-11-20 03:57:45,880 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=931053.3333333334, ans=0.1 2023-11-20 03:57:55,796 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=931120.0, ans=0.2 2023-11-20 03:57:58,633 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=5.75 vs. limit=10.0 2023-11-20 03:58:04,161 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=931186.6666666666, ans=0.1 2023-11-20 03:58:05,074 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.423e+01 7.809e+01 8.517e+01 9.487e+01 1.228e+02, threshold=1.703e+02, percent-clipped=0.0 2023-11-20 03:58:12,396 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=4.33 vs. limit=10.0 2023-11-20 03:58:30,974 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 139700 2023-11-20 03:58:38,868 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=931320.0, ans=0.1 2023-11-20 03:58:44,226 INFO [train_asr.py:1262] (1/4) Epoch 12, batch 7450, loss[loss=0.07719, simple_loss=0.09642, pruned_loss=0.01816, audio_tagging_loss=0.01082, over 15807.00 frames. ], tot_loss[loss=0.08122, simple_loss=0.1013, pruned_loss=0.02058, audio_tagging_loss=0.009988, over 3041423.15 frames. ], batch size: 62, lr: 5.74e-03, grad_scale: 16.0 2023-11-20 03:58:45,750 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=931386.6666666666, ans=0.0 2023-11-20 03:58:54,787 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=10.84 vs. 
limit=15.0 2023-11-20 03:59:01,042 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=931453.3333333334, ans=0.0 2023-11-20 03:59:03,605 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=931453.3333333334, ans=0.1 2023-11-20 03:59:06,481 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=931453.3333333334, ans=0.1 2023-11-20 03:59:19,958 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=931520.0, ans=0.125 2023-11-20 03:59:21,086 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=931586.6666666666, ans=0.125 2023-11-20 03:59:35,535 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 139750 2023-11-20 03:59:35,843 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=931653.3333333334, ans=0.0 2023-11-20 03:59:40,667 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=931653.3333333334, ans=0.0 2023-11-20 03:59:47,754 INFO [train_asr.py:1262] (1/4) Epoch 12, batch 7500, loss[loss=0.07976, simple_loss=0.09778, pruned_loss=0.01901, audio_tagging_loss=0.01187, over 15654.00 frames. ], tot_loss[loss=0.08205, simple_loss=0.1024, pruned_loss=0.02092, audio_tagging_loss=0.009929, over 3043797.79 frames. ], batch size: 61, lr: 5.74e-03, grad_scale: 16.0 2023-11-20 03:59:52,607 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=8.00 vs. limit=15.0 2023-11-20 04:00:00,417 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=931786.6666666666, ans=0.125 2023-11-20 04:00:13,507 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.250e+01 8.176e+01 8.772e+01 9.671e+01 2.176e+02, threshold=1.754e+02, percent-clipped=1.0 2023-11-20 04:00:18,011 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=931853.3333333334, ans=0.04949747468305833 2023-11-20 04:00:18,470 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=21.83 vs. limit=22.5 2023-11-20 04:00:23,465 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-20 04:00:31,299 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=6.16 vs. limit=15.0 2023-11-20 04:00:34,425 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=931920.0, ans=0.0 2023-11-20 04:00:39,794 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 139800 2023-11-20 04:00:47,154 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=931986.6666666666, ans=0.125 2023-11-20 04:00:52,953 INFO [train_asr.py:1262] (1/4) Epoch 12, batch 7550, loss[loss=0.06741, simple_loss=0.08253, pruned_loss=0.01408, audio_tagging_loss=0.01207, over 14739.00 frames. 
], tot_loss[loss=0.0811, simple_loss=0.1013, pruned_loss=0.02056, audio_tagging_loss=0.009908, over 3049737.64 frames. ], batch size: 57, lr: 5.73e-03, grad_scale: 16.0 2023-11-20 04:00:55,781 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=932053.3333333334, ans=0.0 2023-11-20 04:01:04,089 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=12.41 vs. limit=15.0 2023-11-20 04:01:08,969 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=6.36 vs. limit=15.0 2023-11-20 04:01:12,342 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=932120.0, ans=0.2 2023-11-20 04:01:19,280 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=932186.6666666666, ans=0.0 2023-11-20 04:01:22,971 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=932186.6666666666, ans=0.125 2023-11-20 04:01:23,996 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=932186.6666666666, ans=0.125 2023-11-20 04:01:44,682 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 139850 2023-11-20 04:01:55,508 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=932320.0, ans=0.0 2023-11-20 04:01:56,673 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=932386.6666666666, ans=0.125 2023-11-20 04:01:57,654 INFO [train_asr.py:1262] (1/4) Epoch 12, batch 7600, loss[loss=0.08525, simple_loss=0.1098, pruned_loss=0.02175, audio_tagging_loss=0.008599, over 14484.00 frames. ], tot_loss[loss=0.08054, simple_loss=0.1005, pruned_loss=0.02031, audio_tagging_loss=0.009984, over 3047584.07 frames. ], batch size: 55, lr: 5.73e-03, grad_scale: 32.0 2023-11-20 04:02:04,747 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=932386.6666666666, ans=0.125 2023-11-20 04:02:23,911 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.765e+01 8.108e+01 8.753e+01 9.539e+01 1.294e+02, threshold=1.751e+02, percent-clipped=0.0 2023-11-20 04:02:25,472 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.max_abs, batch_count=932520.0, ans=10.0 2023-11-20 04:02:49,272 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 139900 2023-11-20 04:02:49,443 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=932653.3333333334, ans=0.125 2023-11-20 04:03:02,158 INFO [train_asr.py:1262] (1/4) Epoch 12, batch 7650, loss[loss=0.08157, simple_loss=0.0902, pruned_loss=0.02073, audio_tagging_loss=0.01575, over 14471.00 frames. ], tot_loss[loss=0.08069, simple_loss=0.1005, pruned_loss=0.02038, audio_tagging_loss=0.01005, over 3041902.80 frames. 
], batch size: 57, lr: 5.73e-03, grad_scale: 32.0 2023-11-20 04:03:08,006 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-11-20 04:03:10,858 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.whiten, num_groups=1, num_channels=512, metric=10.42 vs. limit=12.0 2023-11-20 04:03:12,545 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=8.14 vs. limit=15.0 2023-11-20 04:03:15,141 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=512, metric=20.51 vs. limit=22.5 2023-11-20 04:03:18,802 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn2.whiten.whitening_limit, batch_count=932786.6666666666, ans=22.5 2023-11-20 04:03:19,573 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=932786.6666666666, ans=0.2 2023-11-20 04:03:28,603 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=1.94 vs. limit=6.0 2023-11-20 04:03:34,095 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=13.62 vs. limit=22.5 2023-11-20 04:03:47,793 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=932920.0, ans=0.1 2023-11-20 04:03:48,737 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=932920.0, ans=0.0 2023-11-20 04:03:51,175 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=932920.0, ans=0.0 2023-11-20 04:03:53,574 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 139950 2023-11-20 04:04:01,208 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=932986.6666666666, ans=0.2 2023-11-20 04:04:03,587 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=932986.6666666666, ans=0.0 2023-11-20 04:04:07,049 INFO [train_asr.py:1262] (1/4) Epoch 12, batch 7700, loss[loss=0.08609, simple_loss=0.1151, pruned_loss=0.0187, audio_tagging_loss=0.00983, over 15550.00 frames. ], tot_loss[loss=0.08069, simple_loss=0.101, pruned_loss=0.02029, audio_tagging_loss=0.0099, over 3039664.24 frames. 
], batch size: 55, lr: 5.73e-03, grad_scale: 32.0 2023-11-20 04:04:07,412 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=933053.3333333334, ans=0.2 2023-11-20 04:04:32,002 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.410e+01 8.189e+01 8.877e+01 9.506e+01 1.213e+02, threshold=1.775e+02, percent-clipped=0.0 2023-11-20 04:04:37,789 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=933186.6666666666, ans=0.1 2023-11-20 04:04:45,715 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=933253.3333333334, ans=0.2 2023-11-20 04:04:58,495 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 140000 2023-11-20 04:05:15,107 INFO [train_asr.py:1262] (1/4) Epoch 12, batch 7750, loss[loss=0.08628, simple_loss=0.09959, pruned_loss=0.02387, audio_tagging_loss=0.01261, over 16029.00 frames. ], tot_loss[loss=0.08165, simple_loss=0.102, pruned_loss=0.0206, audio_tagging_loss=0.01004, over 3040623.43 frames. ], batch size: 62, lr: 5.73e-03, grad_scale: 32.0 2023-11-20 04:05:21,622 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=933386.6666666666, ans=0.1 2023-11-20 04:05:41,117 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=933520.0, ans=0.1 2023-11-20 04:05:46,849 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.whiten, num_groups=1, num_channels=512, metric=10.41 vs. limit=12.0 2023-11-20 04:05:55,205 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=4.70 vs. limit=12.0 2023-11-20 04:05:56,144 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-20 04:06:06,665 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 140050 2023-11-20 04:06:18,791 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=933720.0, ans=0.0 2023-11-20 04:06:19,739 INFO [train_asr.py:1262] (1/4) Epoch 12, batch 7800, loss[loss=0.08993, simple_loss=0.1099, pruned_loss=0.02668, audio_tagging_loss=0.008328, over 14377.00 frames. ], tot_loss[loss=0.08181, simple_loss=0.1022, pruned_loss=0.02068, audio_tagging_loss=0.01003, over 3040355.28 frames. 
], batch size: 56, lr: 5.73e-03, grad_scale: 16.0 2023-11-20 04:06:46,650 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.019e+01 8.169e+01 8.821e+01 9.790e+01 1.228e+02, threshold=1.764e+02, percent-clipped=0.0 2023-11-20 04:07:11,508 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 140100 2023-11-20 04:07:17,027 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=933986.6666666666, ans=0.125 2023-11-20 04:07:18,372 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=933986.6666666666, ans=0.0 2023-11-20 04:07:22,701 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=933986.6666666666, ans=0.0 2023-11-20 04:07:24,820 INFO [train_asr.py:1262] (1/4) Epoch 12, batch 7850, loss[loss=0.07714, simple_loss=0.09494, pruned_loss=0.02008, audio_tagging_loss=0.009591, over 15284.00 frames. ], tot_loss[loss=0.08178, simple_loss=0.1021, pruned_loss=0.02067, audio_tagging_loss=0.01009, over 3041724.09 frames. ], batch size: 56, lr: 5.73e-03, grad_scale: 16.0 2023-11-20 04:07:25,023 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=934053.3333333334, ans=0.1 2023-11-20 04:07:27,669 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=934053.3333333334, ans=0.2 2023-11-20 04:07:35,029 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=934053.3333333334, ans=0.125 2023-11-20 04:07:36,211 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=934120.0, ans=0.0 2023-11-20 04:07:38,680 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=934120.0, ans=0.125 2023-11-20 04:07:52,882 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=934186.6666666666, ans=0.125 2023-11-20 04:07:56,438 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=934186.6666666666, ans=0.1 2023-11-20 04:07:56,634 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=934186.6666666666, ans=0.2 2023-11-20 04:08:00,771 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=934186.6666666666, ans=0.125 2023-11-20 04:08:16,401 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 140150 2023-11-20 04:08:26,496 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=934320.0, ans=0.125 2023-11-20 04:08:28,727 INFO [train_asr.py:1262] (1/4) Epoch 12, batch 7900, loss[loss=0.08756, simple_loss=0.1083, pruned_loss=0.02429, audio_tagging_loss=0.009144, over 14090.00 frames. ], tot_loss[loss=0.08292, simple_loss=0.1033, pruned_loss=0.02105, audio_tagging_loss=0.01022, over 3043198.28 frames. 
], batch size: 53, lr: 5.73e-03, grad_scale: 16.0 2023-11-20 04:08:56,541 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.338e+01 8.229e+01 9.116e+01 9.916e+01 1.318e+02, threshold=1.823e+02, percent-clipped=0.0 2023-11-20 04:09:15,764 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=934586.6666666666, ans=0.0 2023-11-20 04:09:21,233 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 140200 2023-11-20 04:09:32,754 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=934720.0, ans=0.0 2023-11-20 04:09:33,006 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=512, metric=5.57 vs. limit=15.0 2023-11-20 04:09:33,523 INFO [train_asr.py:1262] (1/4) Epoch 12, batch 7950, loss[loss=0.108, simple_loss=0.1393, pruned_loss=0.03058, audio_tagging_loss=0.007783, over 15671.00 frames. ], tot_loss[loss=0.08295, simple_loss=0.103, pruned_loss=0.02104, audio_tagging_loss=0.01039, over 3047594.82 frames. ], batch size: 56, lr: 5.73e-03, grad_scale: 16.0 2023-11-20 04:09:50,712 WARNING [train_asr.py:1506] (1/4) Exclude cut with ID unbalanced/uQjH4tNUZ_g_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-20 04:09:53,795 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=10.95 vs. limit=15.0 2023-11-20 04:10:00,771 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=934853.3333333334, ans=0.125 2023-11-20 04:10:24,397 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 140250 2023-11-20 04:10:38,657 INFO [train_asr.py:1262] (1/4) Epoch 12, batch 8000, loss[loss=0.07867, simple_loss=0.09477, pruned_loss=0.02009, audio_tagging_loss=0.0112, over 15202.00 frames. ], tot_loss[loss=0.08118, simple_loss=0.1001, pruned_loss=0.0205, audio_tagging_loss=0.01063, over 3046805.96 frames. ], batch size: 57, lr: 5.73e-03, grad_scale: 32.0 2023-11-20 04:10:45,223 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=935053.3333333334, ans=0.125 2023-11-20 04:11:05,189 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.597e+01 8.205e+01 8.859e+01 1.003e+02 1.525e+02, threshold=1.772e+02, percent-clipped=0.0 2023-11-20 04:11:15,466 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=935253.3333333334, ans=0.125 2023-11-20 04:11:30,813 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 140300 2023-11-20 04:11:42,824 INFO [train_asr.py:1262] (1/4) Epoch 12, batch 8050, loss[loss=0.08139, simple_loss=0.1105, pruned_loss=0.01843, audio_tagging_loss=0.007725, over 15663.00 frames. ], tot_loss[loss=0.08114, simple_loss=0.1001, pruned_loss=0.02047, audio_tagging_loss=0.01062, over 3046686.18 frames. 
], batch size: 57, lr: 5.72e-03, grad_scale: 32.0 2023-11-20 04:11:48,690 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=935386.6666666666, ans=0.125 2023-11-20 04:12:33,875 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 140350 2023-11-20 04:12:46,661 INFO [train_asr.py:1262] (1/4) Epoch 12, batch 8100, loss[loss=0.07883, simple_loss=0.0915, pruned_loss=0.01922, audio_tagging_loss=0.01386, over 15426.00 frames. ], tot_loss[loss=0.08139, simple_loss=0.101, pruned_loss=0.02049, audio_tagging_loss=0.01041, over 3042750.85 frames. ], batch size: 59, lr: 5.72e-03, grad_scale: 32.0 2023-11-20 04:12:56,747 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=935720.0, ans=0.125 2023-11-20 04:13:14,176 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.312e+01 8.397e+01 8.892e+01 9.736e+01 1.286e+02, threshold=1.778e+02, percent-clipped=0.0 2023-11-20 04:13:18,212 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=935853.3333333334, ans=0.0 2023-11-20 04:13:19,780 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=9.51 vs. limit=15.0 2023-11-20 04:13:28,274 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=935920.0, ans=0.1 2023-11-20 04:13:38,144 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 140400 2023-11-20 04:13:43,473 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=935986.6666666666, ans=0.1 2023-11-20 04:13:47,117 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.conv_module2.whiten, num_groups=1, num_channels=192, metric=12.43 vs. limit=15.0 2023-11-20 04:13:51,060 INFO [train_asr.py:1262] (1/4) Epoch 12, batch 8150, loss[loss=0.07315, simple_loss=0.0902, pruned_loss=0.01893, audio_tagging_loss=0.009117, over 16092.00 frames. ], tot_loss[loss=0.08115, simple_loss=0.1009, pruned_loss=0.02046, audio_tagging_loss=0.01024, over 3053191.76 frames. ], batch size: 62, lr: 5.72e-03, grad_scale: 32.0 2023-11-20 04:14:15,435 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=936120.0, ans=0.1 2023-11-20 04:14:43,644 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 140450 2023-11-20 04:14:56,556 INFO [train_asr.py:1262] (1/4) Epoch 12, batch 8200, loss[loss=0.07355, simple_loss=0.08861, pruned_loss=0.01819, audio_tagging_loss=0.01106, over 14317.00 frames. ], tot_loss[loss=0.08117, simple_loss=0.1016, pruned_loss=0.02029, audio_tagging_loss=0.0101, over 3057989.55 frames. ], batch size: 56, lr: 5.72e-03, grad_scale: 32.0 2023-11-20 04:14:57,839 WARNING [train_asr.py:1506] (1/4) Exclude cut with ID unbalanced/8C7biyx9TQ4_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. 
Number of tokens: 24 2023-11-20 04:15:00,536 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=936386.6666666666, ans=0.2 2023-11-20 04:15:05,463 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.conv_module2.whiten, num_groups=1, num_channels=192, metric=11.88 vs. limit=15.0 2023-11-20 04:15:23,168 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.922e+01 8.334e+01 8.915e+01 9.605e+01 1.213e+02, threshold=1.783e+02, percent-clipped=0.0 2023-11-20 04:15:28,063 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=9.02 vs. limit=15.0 2023-11-20 04:15:40,173 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=936586.6666666666, ans=0.2 2023-11-20 04:15:43,870 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=936586.6666666666, ans=0.07 2023-11-20 04:15:48,564 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 140500 2023-11-20 04:15:51,612 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=7.06 vs. limit=12.0 2023-11-20 04:15:53,157 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=936653.3333333334, ans=0.2 2023-11-20 04:15:59,699 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=12.92 vs. limit=15.0 2023-11-20 04:16:01,642 INFO [train_asr.py:1262] (1/4) Epoch 12, batch 8250, loss[loss=0.09155, simple_loss=0.1283, pruned_loss=0.02067, audio_tagging_loss=0.00674, over 15744.00 frames. ], tot_loss[loss=0.08131, simple_loss=0.1016, pruned_loss=0.02048, audio_tagging_loss=0.01005, over 3053505.00 frames. ], batch size: 55, lr: 5.72e-03, grad_scale: 32.0 2023-11-20 04:16:08,661 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=15.31 vs. limit=22.5 2023-11-20 04:16:14,772 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=936786.6666666666, ans=0.09899494936611666 2023-11-20 04:16:24,917 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=936786.6666666666, ans=0.125 2023-11-20 04:16:34,257 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.min_abs, batch_count=936853.3333333334, ans=0.5 2023-11-20 04:16:35,875 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=512, metric=8.20 vs. limit=15.0 2023-11-20 04:16:50,644 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=936920.0, ans=0.125 2023-11-20 04:16:52,923 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 140550 2023-11-20 04:17:05,471 INFO [train_asr.py:1262] (1/4) Epoch 12, batch 8300, loss[loss=0.07305, simple_loss=0.09131, pruned_loss=0.01596, audio_tagging_loss=0.01144, over 15934.00 frames. ], tot_loss[loss=0.08099, simple_loss=0.1013, pruned_loss=0.02028, audio_tagging_loss=0.01004, over 3059390.00 frames. 
], batch size: 60, lr: 5.72e-03, grad_scale: 16.0 2023-11-20 04:17:13,526 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=937053.3333333334, ans=0.125 2023-11-20 04:17:20,443 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=14.26 vs. limit=22.5 2023-11-20 04:17:33,792 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.558e+01 8.113e+01 8.924e+01 9.899e+01 1.160e+02, threshold=1.785e+02, percent-clipped=0.0 2023-11-20 04:17:48,790 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-20 04:17:54,096 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=937253.3333333334, ans=0.1 2023-11-20 04:17:56,380 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 140600 2023-11-20 04:17:57,756 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=937320.0, ans=0.125 2023-11-20 04:18:09,908 INFO [train_asr.py:1262] (1/4) Epoch 12, batch 8350, loss[loss=0.06853, simple_loss=0.09125, pruned_loss=0.01255, audio_tagging_loss=0.01036, over 15372.00 frames. ], tot_loss[loss=0.08097, simple_loss=0.1014, pruned_loss=0.02023, audio_tagging_loss=0.01004, over 3054618.71 frames. ], batch size: 56, lr: 5.72e-03, grad_scale: 16.0 2023-11-20 04:18:12,506 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=937386.6666666666, ans=0.04949747468305833 2023-11-20 04:18:34,036 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=937520.0, ans=0.0 2023-11-20 04:18:34,228 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=937520.0, ans=0.125 2023-11-20 04:18:45,063 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=12.95 vs. limit=22.5 2023-11-20 04:19:02,273 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 140650 2023-11-20 04:19:03,650 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=937653.3333333334, ans=0.07 2023-11-20 04:19:15,093 INFO [train_asr.py:1262] (1/4) Epoch 12, batch 8400, loss[loss=0.06408, simple_loss=0.08435, pruned_loss=0.01356, audio_tagging_loss=0.008342, over 15930.00 frames. ], tot_loss[loss=0.08098, simple_loss=0.1013, pruned_loss=0.02031, audio_tagging_loss=0.01001, over 3052524.35 frames. 
], batch size: 61, lr: 5.72e-03, grad_scale: 32.0 2023-11-20 04:19:22,826 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=937720.0, ans=0.0 2023-11-20 04:19:23,915 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=937720.0, ans=0.125 2023-11-20 04:19:43,610 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.796e+01 7.950e+01 8.532e+01 9.547e+01 1.131e+02, threshold=1.706e+02, percent-clipped=0.0 2023-11-20 04:20:04,058 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=937920.0, ans=0.125 2023-11-20 04:20:05,160 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=937920.0, ans=0.1 2023-11-20 04:20:07,360 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 140700 2023-11-20 04:20:16,243 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=937986.6666666666, ans=0.125 2023-11-20 04:20:19,657 INFO [train_asr.py:1262] (1/4) Epoch 12, batch 8450, loss[loss=0.0778, simple_loss=0.1014, pruned_loss=0.01631, audio_tagging_loss=0.01079, over 15578.00 frames. ], tot_loss[loss=0.08089, simple_loss=0.1008, pruned_loss=0.02037, audio_tagging_loss=0.01013, over 3050491.98 frames. ], batch size: 59, lr: 5.72e-03, grad_scale: 32.0 2023-11-20 04:20:25,472 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=938053.3333333334, ans=0.0 2023-11-20 04:20:32,673 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=512, metric=4.78 vs. limit=15.0 2023-11-20 04:20:58,009 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=938253.3333333334, ans=0.125 2023-11-20 04:21:04,203 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=938253.3333333334, ans=0.0 2023-11-20 04:21:12,268 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 140750 2023-11-20 04:21:21,844 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=14.59 vs. limit=22.5 2023-11-20 04:21:24,928 INFO [train_asr.py:1262] (1/4) Epoch 12, batch 8500, loss[loss=0.08304, simple_loss=0.108, pruned_loss=0.01802, audio_tagging_loss=0.01103, over 16022.00 frames. ], tot_loss[loss=0.08119, simple_loss=0.1012, pruned_loss=0.02052, audio_tagging_loss=0.0101, over 3057485.03 frames. ], batch size: 59, lr: 5.72e-03, grad_scale: 32.0 2023-11-20 04:21:28,821 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=938386.6666666666, ans=0.1 2023-11-20 04:21:28,888 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.min_positive, batch_count=938386.6666666666, ans=0.025 2023-11-20 04:21:44,052 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.52 vs. 
limit=15.0 2023-11-20 04:21:53,275 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.458e+01 8.139e+01 8.950e+01 9.720e+01 1.235e+02, threshold=1.790e+02, percent-clipped=0.0 2023-11-20 04:22:09,640 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=938586.6666666666, ans=0.0 2023-11-20 04:22:16,779 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 140800 2023-11-20 04:22:22,447 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=938653.3333333334, ans=0.0 2023-11-20 04:22:23,644 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.max_positive, batch_count=938653.3333333334, ans=0.95 2023-11-20 04:22:24,029 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=512, metric=21.93 vs. limit=22.5 2023-11-20 04:22:30,070 INFO [train_asr.py:1262] (1/4) Epoch 12, batch 8550, loss[loss=0.09938, simple_loss=0.1272, pruned_loss=0.02487, audio_tagging_loss=0.01089, over 14732.00 frames. ], tot_loss[loss=0.0814, simple_loss=0.1013, pruned_loss=0.02054, audio_tagging_loss=0.01022, over 3059434.71 frames. ], batch size: 54, lr: 5.71e-03, grad_scale: 32.0 2023-11-20 04:22:31,458 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=938720.0, ans=0.125 2023-11-20 04:22:31,474 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=938720.0, ans=0.2 2023-11-20 04:22:42,564 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=938786.6666666666, ans=0.0 2023-11-20 04:23:16,169 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=938920.0, ans=0.125 2023-11-20 04:23:19,750 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=938920.0, ans=0.1 2023-11-20 04:23:21,942 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 140850 2023-11-20 04:23:33,044 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=939053.3333333334, ans=0.2 2023-11-20 04:23:34,048 INFO [train_asr.py:1262] (1/4) Epoch 12, batch 8600, loss[loss=0.09137, simple_loss=0.1178, pruned_loss=0.02237, audio_tagging_loss=0.0101, over 15611.00 frames. ], tot_loss[loss=0.08179, simple_loss=0.102, pruned_loss=0.02058, audio_tagging_loss=0.01021, over 3053206.59 frames. 
], batch size: 57, lr: 5.71e-03, grad_scale: 16.0 2023-11-20 04:23:40,934 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=939053.3333333334, ans=0.1 2023-11-20 04:24:04,332 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.979e+01 8.151e+01 8.812e+01 9.579e+01 1.857e+02, threshold=1.762e+02, percent-clipped=1.0 2023-11-20 04:24:21,244 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=939253.3333333334, ans=0.0 2023-11-20 04:24:26,695 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 140900 2023-11-20 04:24:29,346 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=939320.0, ans=0.125 2023-11-20 04:24:39,296 INFO [train_asr.py:1262] (1/4) Epoch 12, batch 8650, loss[loss=0.08588, simple_loss=0.09181, pruned_loss=0.02461, audio_tagging_loss=0.01537, over 14848.00 frames. ], tot_loss[loss=0.0816, simple_loss=0.1015, pruned_loss=0.0205, audio_tagging_loss=0.01035, over 3048674.54 frames. ], batch size: 57, lr: 5.71e-03, grad_scale: 16.0 2023-11-20 04:24:48,464 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=3.51 vs. limit=15.0 2023-11-20 04:25:07,627 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=939520.0, ans=0.125 2023-11-20 04:25:30,762 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 140950 2023-11-20 04:25:40,423 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=10.04 vs. limit=15.0 2023-11-20 04:25:43,347 INFO [train_asr.py:1262] (1/4) Epoch 12, batch 8700, loss[loss=0.07671, simple_loss=0.08796, pruned_loss=0.01865, audio_tagging_loss=0.01409, over 15607.00 frames. ], tot_loss[loss=0.08193, simple_loss=0.1017, pruned_loss=0.0206, audio_tagging_loss=0.01047, over 3045653.93 frames. ], batch size: 61, lr: 5.71e-03, grad_scale: 16.0 2023-11-20 04:25:48,392 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=8.12 vs. limit=15.0 2023-11-20 04:25:54,439 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=512, metric=20.33 vs. limit=22.5 2023-11-20 04:26:13,592 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.908e+01 8.317e+01 9.149e+01 9.990e+01 1.361e+02, threshold=1.830e+02, percent-clipped=0.0 2023-11-20 04:26:25,445 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=7.53 vs. limit=12.0 2023-11-20 04:26:32,178 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=19.26 vs. 
limit=22.5 2023-11-20 04:26:35,172 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 141000 2023-11-20 04:26:36,660 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=939986.6666666666, ans=0.125 2023-11-20 04:26:41,291 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=939986.6666666666, ans=0.09899494936611666 2023-11-20 04:26:48,337 INFO [train_asr.py:1262] (1/4) Epoch 12, batch 8750, loss[loss=0.07609, simple_loss=0.09856, pruned_loss=0.01542, audio_tagging_loss=0.01139, over 15412.00 frames. ], tot_loss[loss=0.08198, simple_loss=0.1018, pruned_loss=0.02054, audio_tagging_loss=0.01054, over 3048385.13 frames. ], batch size: 57, lr: 5.71e-03, grad_scale: 16.0 2023-11-20 04:26:49,889 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=940053.3333333334, ans=0.0 2023-11-20 04:26:52,224 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=940053.3333333334, ans=0.125 2023-11-20 04:26:54,332 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.min_positive, batch_count=940053.3333333334, ans=0.05 2023-11-20 04:26:57,988 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=940053.3333333334, ans=0.125 2023-11-20 04:27:12,083 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=940120.0, ans=0.2 2023-11-20 04:27:40,288 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 141050 2023-11-20 04:27:40,919 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=13.81 vs. limit=15.0 2023-11-20 04:27:53,775 INFO [train_asr.py:1262] (1/4) Epoch 12, batch 8800, loss[loss=0.09315, simple_loss=0.1118, pruned_loss=0.02444, audio_tagging_loss=0.01282, over 15135.00 frames. ], tot_loss[loss=0.08349, simple_loss=0.1039, pruned_loss=0.02108, audio_tagging_loss=0.01046, over 3039219.24 frames. ], batch size: 56, lr: 5.71e-03, grad_scale: 32.0 2023-11-20 04:28:16,491 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=940453.3333333334, ans=0.125 2023-11-20 04:28:23,422 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=940520.0, ans=0.2 2023-11-20 04:28:24,228 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.494e+01 8.276e+01 8.994e+01 9.890e+01 1.240e+02, threshold=1.799e+02, percent-clipped=0.0 2023-11-20 04:28:35,939 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=940586.6666666666, ans=0.125 2023-11-20 04:28:44,104 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=11.99 vs. 
limit=15.0 2023-11-20 04:28:45,009 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 141100 2023-11-20 04:28:45,136 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=940653.3333333334, ans=0.125 2023-11-20 04:28:58,009 INFO [train_asr.py:1262] (1/4) Epoch 12, batch 8850, loss[loss=0.08382, simple_loss=0.1077, pruned_loss=0.02165, audio_tagging_loss=0.008323, over 14720.00 frames. ], tot_loss[loss=0.08393, simple_loss=0.1046, pruned_loss=0.02126, audio_tagging_loss=0.01039, over 3047702.86 frames. ], batch size: 54, lr: 5.71e-03, grad_scale: 16.0 2023-11-20 04:28:58,437 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=940720.0, ans=0.07 2023-11-20 04:29:10,833 WARNING [train_asr.py:1506] (1/4) Exclude cut with ID unbalanced/1Dq7QH61iXQ_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-20 04:29:11,394 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=8.15 vs. limit=10.0 2023-11-20 04:29:13,518 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=940786.6666666666, ans=0.0 2023-11-20 04:29:25,012 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=940853.3333333334, ans=0.1 2023-11-20 04:29:30,357 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=940853.3333333334, ans=0.1 2023-11-20 04:29:43,166 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.conv_module2.whiten, num_groups=1, num_channels=192, metric=11.85 vs. limit=15.0 2023-11-20 04:29:49,272 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 141150 2023-11-20 04:29:54,280 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=940986.6666666666, ans=0.125 2023-11-20 04:29:58,383 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=22.06 vs. limit=22.5 2023-11-20 04:30:01,933 INFO [train_asr.py:1262] (1/4) Epoch 12, batch 8900, loss[loss=0.1009, simple_loss=0.1364, pruned_loss=0.02456, audio_tagging_loss=0.008107, over 14864.00 frames. ], tot_loss[loss=0.08372, simple_loss=0.1045, pruned_loss=0.02123, audio_tagging_loss=0.01022, over 3048161.91 frames. 
], batch size: 55, lr: 5.71e-03, grad_scale: 8.0 2023-11-20 04:30:12,684 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=941053.3333333334, ans=0.125 2023-11-20 04:30:13,866 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=941120.0, ans=0.1 2023-11-20 04:30:21,613 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=941120.0, ans=0.125 2023-11-20 04:30:34,477 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.851e+01 8.057e+01 8.835e+01 9.972e+01 1.678e+02, threshold=1.767e+02, percent-clipped=0.0 2023-11-20 04:30:42,426 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=941253.3333333334, ans=0.125 2023-11-20 04:30:52,955 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=941320.0, ans=0.125 2023-11-20 04:30:53,985 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 141200 2023-11-20 04:31:07,520 INFO [train_asr.py:1262] (1/4) Epoch 12, batch 8950, loss[loss=0.09101, simple_loss=0.1303, pruned_loss=0.02032, audio_tagging_loss=0.005538, over 15657.00 frames. ], tot_loss[loss=0.08363, simple_loss=0.1049, pruned_loss=0.02126, audio_tagging_loss=0.009919, over 3049342.12 frames. ], batch size: 58, lr: 5.71e-03, grad_scale: 8.0 2023-11-20 04:31:07,744 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=941386.6666666666, ans=0.125 2023-11-20 04:31:09,069 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=941386.6666666666, ans=0.125 2023-11-20 04:31:20,082 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=941453.3333333334, ans=0.0 2023-11-20 04:31:24,506 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=18.41 vs. limit=22.5 2023-11-20 04:31:49,491 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.min_abs, batch_count=941586.6666666666, ans=0.5 2023-11-20 04:31:58,575 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 141250 2023-11-20 04:32:04,057 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=12.37 vs. limit=15.0 2023-11-20 04:32:10,725 INFO [train_asr.py:1262] (1/4) Epoch 12, batch 9000, loss[loss=0.08544, simple_loss=0.1022, pruned_loss=0.02294, audio_tagging_loss=0.01141, over 15932.00 frames. ], tot_loss[loss=0.08299, simple_loss=0.1042, pruned_loss=0.02101, audio_tagging_loss=0.009905, over 3054017.46 frames. ], batch size: 60, lr: 5.71e-03, grad_scale: 8.0 2023-11-20 04:32:10,725 INFO [train_asr.py:1285] (1/4) Computing validation loss 2023-11-20 04:32:53,271 INFO [train_asr.py:1294] (1/4) Epoch 12, validation: loss=0.06397, simple_loss=0.05412, pruned_loss=0.005869, audio_tagging_loss=0.03104, over 4681554.00 frames. 
2023-11-20 04:32:53,272 INFO [train_asr.py:1295] (1/4) Maximum memory allocated so far is 26082MB 2023-11-20 04:32:54,812 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=941720.0, ans=0.125 2023-11-20 04:33:04,520 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=8.06 vs. limit=10.0 2023-11-20 04:33:13,417 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=941786.6666666666, ans=0.125 2023-11-20 04:33:23,003 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=9.22 vs. limit=15.0 2023-11-20 04:33:25,865 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.562e+01 8.268e+01 8.688e+01 9.407e+01 1.162e+02, threshold=1.738e+02, percent-clipped=0.0 2023-11-20 04:33:41,814 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=941920.0, ans=0.0 2023-11-20 04:33:43,277 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=7.81 vs. limit=15.0 2023-11-20 04:33:45,369 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 141300 2023-11-20 04:33:53,459 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=941986.6666666666, ans=0.0 2023-11-20 04:33:55,605 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=8.64 vs. limit=10.0 2023-11-20 04:33:57,657 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=942053.3333333334, ans=0.1 2023-11-20 04:33:58,658 INFO [train_asr.py:1262] (1/4) Epoch 12, batch 9050, loss[loss=0.08665, simple_loss=0.1182, pruned_loss=0.01936, audio_tagging_loss=0.008173, over 15299.00 frames. ], tot_loss[loss=0.08331, simple_loss=0.1045, pruned_loss=0.02118, audio_tagging_loss=0.009852, over 3046032.02 frames. ], batch size: 55, lr: 5.70e-03, grad_scale: 8.0 2023-11-20 04:34:01,393 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.min_positive, batch_count=942053.3333333334, ans=0.025 2023-11-20 04:34:03,165 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=9.33 vs. limit=10.0 2023-11-20 04:34:14,556 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=942120.0, ans=0.1 2023-11-20 04:34:25,865 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=7.15 vs. 
limit=12.0 2023-11-20 04:34:41,646 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=942253.3333333334, ans=0.125 2023-11-20 04:34:49,665 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=942320.0, ans=0.1 2023-11-20 04:34:50,678 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 141350 2023-11-20 04:34:50,907 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=942320.0, ans=0.0 2023-11-20 04:35:03,402 INFO [train_asr.py:1262] (1/4) Epoch 12, batch 9100, loss[loss=0.07443, simple_loss=0.09084, pruned_loss=0.01865, audio_tagging_loss=0.01036, over 15407.00 frames. ], tot_loss[loss=0.08266, simple_loss=0.1035, pruned_loss=0.02101, audio_tagging_loss=0.009904, over 3040060.09 frames. ], batch size: 57, lr: 5.70e-03, grad_scale: 8.0 2023-11-20 04:35:07,282 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=942386.6666666666, ans=0.1 2023-11-20 04:35:33,831 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=942520.0, ans=0.125 2023-11-20 04:35:36,014 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.481e+01 8.086e+01 8.915e+01 9.526e+01 1.275e+02, threshold=1.783e+02, percent-clipped=0.0 2023-11-20 04:35:39,941 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=942520.0, ans=0.1 2023-11-20 04:35:40,016 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=942520.0, ans=0.125 2023-11-20 04:35:55,745 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 141400 2023-11-20 04:36:08,577 INFO [train_asr.py:1262] (1/4) Epoch 12, batch 9150, loss[loss=0.06923, simple_loss=0.08535, pruned_loss=0.01523, audio_tagging_loss=0.01133, over 15599.00 frames. ], tot_loss[loss=0.0828, simple_loss=0.1037, pruned_loss=0.02107, audio_tagging_loss=0.009871, over 3042382.00 frames. ], batch size: 59, lr: 5.70e-03, grad_scale: 8.0 2023-11-20 04:36:11,340 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=942720.0, ans=0.1 2023-11-20 04:36:42,232 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=942853.3333333334, ans=0.125 2023-11-20 04:37:00,731 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 141450 2023-11-20 04:37:08,231 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=7.49 vs. limit=15.0 2023-11-20 04:37:10,396 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=6.92 vs. limit=12.0 2023-11-20 04:37:14,205 INFO [train_asr.py:1262] (1/4) Epoch 12, batch 9200, loss[loss=0.06484, simple_loss=0.07147, pruned_loss=0.01599, audio_tagging_loss=0.01312, over 15051.00 frames. ], tot_loss[loss=0.08237, simple_loss=0.1029, pruned_loss=0.02095, audio_tagging_loss=0.009967, over 3046683.18 frames. 
], batch size: 58, lr: 5.70e-03, grad_scale: 16.0 2023-11-20 04:37:24,305 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=943053.3333333334, ans=0.125 2023-11-20 04:37:25,603 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=943120.0, ans=0.125 2023-11-20 04:37:45,725 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.820e+01 8.352e+01 9.147e+01 9.950e+01 1.226e+02, threshold=1.829e+02, percent-clipped=0.0 2023-11-20 04:37:59,522 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=943253.3333333334, ans=0.125 2023-11-20 04:38:05,500 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.min_positive, batch_count=943320.0, ans=0.025 2023-11-20 04:38:06,665 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 141500 2023-11-20 04:38:11,752 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=943320.0, ans=0.125 2023-11-20 04:38:19,623 INFO [train_asr.py:1262] (1/4) Epoch 12, batch 9250, loss[loss=0.06415, simple_loss=0.07144, pruned_loss=0.01495, audio_tagging_loss=0.01348, over 16409.00 frames. ], tot_loss[loss=0.08206, simple_loss=0.1022, pruned_loss=0.02096, audio_tagging_loss=0.01002, over 3047460.76 frames. ], batch size: 63, lr: 5.70e-03, grad_scale: 16.0 2023-11-20 04:38:28,239 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=943386.6666666666, ans=0.1 2023-11-20 04:38:41,939 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=943453.3333333334, ans=0.0 2023-11-20 04:38:52,323 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=943520.0, ans=0.1 2023-11-20 04:39:10,756 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=512, metric=11.25 vs. limit=15.0 2023-11-20 04:39:11,432 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 141550 2023-11-20 04:39:20,799 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=19.21 vs. limit=22.5 2023-11-20 04:39:23,881 INFO [train_asr.py:1262] (1/4) Epoch 12, batch 9300, loss[loss=0.09756, simple_loss=0.1247, pruned_loss=0.0266, audio_tagging_loss=0.008594, over 15618.00 frames. ], tot_loss[loss=0.08201, simple_loss=0.1022, pruned_loss=0.02088, audio_tagging_loss=0.01005, over 3047621.92 frames. ], batch size: 57, lr: 5.70e-03, grad_scale: 16.0 2023-11-20 04:39:31,399 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=943720.0, ans=0.0 2023-11-20 04:39:57,019 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.234e+01 8.210e+01 8.770e+01 9.348e+01 1.167e+02, threshold=1.754e+02, percent-clipped=0.0 2023-11-20 04:40:15,585 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 141600 2023-11-20 04:40:28,810 INFO [train_asr.py:1262] (1/4) Epoch 12, batch 9350, loss[loss=0.07624, simple_loss=0.09783, pruned_loss=0.01795, audio_tagging_loss=0.009376, over 15703.00 frames. 
], tot_loss[loss=0.08167, simple_loss=0.1018, pruned_loss=0.02068, audio_tagging_loss=0.01009, over 3047032.90 frames. ], batch size: 59, lr: 5.70e-03, grad_scale: 16.0 2023-11-20 04:40:33,841 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=2.65 vs. limit=15.0 2023-11-20 04:41:01,794 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=10.93 vs. limit=15.0 2023-11-20 04:41:04,184 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=8.15 vs. limit=15.0 2023-11-20 04:41:19,738 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=944320.0, ans=0.1 2023-11-20 04:41:21,518 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 141650 2023-11-20 04:41:32,057 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=512, metric=4.81 vs. limit=15.0 2023-11-20 04:41:33,793 INFO [train_asr.py:1262] (1/4) Epoch 12, batch 9400, loss[loss=0.07907, simple_loss=0.09811, pruned_loss=0.01813, audio_tagging_loss=0.01188, over 14955.00 frames. ], tot_loss[loss=0.08174, simple_loss=0.1017, pruned_loss=0.02072, audio_tagging_loss=0.01017, over 3049065.68 frames. ], batch size: 54, lr: 5.70e-03, grad_scale: 16.0 2023-11-20 04:41:47,159 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=944453.3333333334, ans=0.0 2023-11-20 04:41:59,229 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=944520.0, ans=0.125 2023-11-20 04:42:05,741 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.961e+01 8.348e+01 8.869e+01 9.935e+01 1.327e+02, threshold=1.774e+02, percent-clipped=0.0 2023-11-20 04:42:17,180 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=944586.6666666666, ans=0.125 2023-11-20 04:42:23,263 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=944586.6666666666, ans=0.0 2023-11-20 04:42:26,283 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 141700 2023-11-20 04:42:28,867 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=944653.3333333334, ans=0.125 2023-11-20 04:42:37,258 WARNING [train_asr.py:1506] (1/4) Exclude cut with ID unbalanced/jmSuJWEIizA_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-20 04:42:38,434 INFO [train_asr.py:1262] (1/4) Epoch 12, batch 9450, loss[loss=0.07428, simple_loss=0.1014, pruned_loss=0.01618, audio_tagging_loss=0.007379, over 15684.00 frames. ], tot_loss[loss=0.08147, simple_loss=0.1016, pruned_loss=0.02049, audio_tagging_loss=0.01016, over 3052766.40 frames. 
], batch size: 59, lr: 5.70e-03, grad_scale: 16.0 2023-11-20 04:43:20,526 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=944920.0, ans=0.1 2023-11-20 04:43:26,848 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-20 04:43:30,217 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 141750 2023-11-20 04:43:42,770 INFO [train_asr.py:1262] (1/4) Epoch 12, batch 9500, loss[loss=0.0925, simple_loss=0.1127, pruned_loss=0.02418, audio_tagging_loss=0.01197, over 15793.00 frames. ], tot_loss[loss=0.08171, simple_loss=0.1017, pruned_loss=0.02049, audio_tagging_loss=0.01037, over 3051562.21 frames. ], batch size: 58, lr: 5.70e-03, grad_scale: 16.0 2023-11-20 04:43:56,656 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=945120.0, ans=0.125 2023-11-20 04:44:04,686 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=945120.0, ans=0.125 2023-11-20 04:44:06,866 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=945120.0, ans=0.035 2023-11-20 04:44:14,714 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=945186.6666666666, ans=0.09899494936611666 2023-11-20 04:44:15,579 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.845e+01 8.327e+01 9.041e+01 9.892e+01 1.668e+02, threshold=1.808e+02, percent-clipped=0.0 2023-11-20 04:44:35,057 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 141800 2023-11-20 04:44:48,689 INFO [train_asr.py:1262] (1/4) Epoch 12, batch 9550, loss[loss=0.08334, simple_loss=0.1039, pruned_loss=0.02002, audio_tagging_loss=0.01136, over 15168.00 frames. ], tot_loss[loss=0.08316, simple_loss=0.1037, pruned_loss=0.02099, audio_tagging_loss=0.01032, over 3055912.31 frames. ], batch size: 56, lr: 5.69e-03, grad_scale: 16.0 2023-11-20 04:44:58,292 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=945386.6666666666, ans=0.2 2023-11-20 04:44:58,673 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=20.49 vs. 
limit=22.5 2023-11-20 04:44:59,459 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=945386.6666666666, ans=0.0 2023-11-20 04:45:10,690 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=945453.3333333334, ans=0.125 2023-11-20 04:45:21,348 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=945520.0, ans=0.125 2023-11-20 04:45:29,297 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=945586.6666666666, ans=0.1 2023-11-20 04:45:40,906 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 141850 2023-11-20 04:45:42,952 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=945653.3333333334, ans=0.0 2023-11-20 04:45:54,294 INFO [train_asr.py:1262] (1/4) Epoch 12, batch 9600, loss[loss=0.07048, simple_loss=0.08315, pruned_loss=0.0193, audio_tagging_loss=0.009606, over 13473.00 frames. ], tot_loss[loss=0.08274, simple_loss=0.1031, pruned_loss=0.02083, audio_tagging_loss=0.01038, over 3048897.97 frames. ], batch size: 54, lr: 5.69e-03, grad_scale: 32.0 2023-11-20 04:46:07,149 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=945786.6666666666, ans=0.125 2023-11-20 04:46:26,484 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.916e+01 8.231e+01 8.901e+01 9.790e+01 1.400e+02, threshold=1.780e+02, percent-clipped=0.0 2023-11-20 04:46:39,176 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=945920.0, ans=0.1 2023-11-20 04:46:39,321 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=3.42 vs. limit=15.0 2023-11-20 04:46:46,226 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 141900 2023-11-20 04:46:50,014 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=945986.6666666666, ans=0.2 2023-11-20 04:46:58,315 INFO [train_asr.py:1262] (1/4) Epoch 12, batch 9650, loss[loss=0.0713, simple_loss=0.08402, pruned_loss=0.0168, audio_tagging_loss=0.0125, over 13934.00 frames. ], tot_loss[loss=0.08252, simple_loss=0.1028, pruned_loss=0.02083, audio_tagging_loss=0.01027, over 3046541.18 frames. ], batch size: 54, lr: 5.69e-03, grad_scale: 16.0 2023-11-20 04:47:01,593 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=946053.3333333334, ans=0.125 2023-11-20 04:47:07,155 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=946053.3333333334, ans=0.125 2023-11-20 04:47:22,370 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=12.34 vs. 
limit=15.0 2023-11-20 04:47:32,723 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=946186.6666666666, ans=0.0 2023-11-20 04:47:50,343 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 141950 2023-11-20 04:47:57,696 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=9.16 vs. limit=10.0 2023-11-20 04:48:03,365 INFO [train_asr.py:1262] (1/4) Epoch 12, batch 9700, loss[loss=0.0776, simple_loss=0.09534, pruned_loss=0.02022, audio_tagging_loss=0.009712, over 15710.00 frames. ], tot_loss[loss=0.08248, simple_loss=0.1028, pruned_loss=0.02094, audio_tagging_loss=0.01013, over 3046353.48 frames. ], batch size: 61, lr: 5.69e-03, grad_scale: 16.0 2023-11-20 04:48:03,739 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=946386.6666666666, ans=0.0 2023-11-20 04:48:04,178 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=2.91 vs. limit=15.0 2023-11-20 04:48:17,404 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=946453.3333333334, ans=0.125 2023-11-20 04:48:28,699 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=946520.0, ans=0.0 2023-11-20 04:48:35,111 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=10.12 vs. limit=15.0 2023-11-20 04:48:36,943 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.885e+01 8.155e+01 8.941e+01 9.505e+01 1.207e+02, threshold=1.788e+02, percent-clipped=0.0 2023-11-20 04:48:55,393 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 142000 2023-11-20 04:48:55,741 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=946653.3333333334, ans=0.1 2023-11-20 04:49:03,748 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=946653.3333333334, ans=0.2 2023-11-20 04:49:08,456 INFO [train_asr.py:1262] (1/4) Epoch 12, batch 9750, loss[loss=0.0882, simple_loss=0.1063, pruned_loss=0.02517, audio_tagging_loss=0.00986, over 16011.00 frames. ], tot_loss[loss=0.0826, simple_loss=0.1032, pruned_loss=0.02109, audio_tagging_loss=0.009902, over 3043756.86 frames. 
], batch size: 60, lr: 5.69e-03, grad_scale: 16.0 2023-11-20 04:49:11,275 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=946720.0, ans=0.04949747468305833 2023-11-20 04:49:21,178 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=946786.6666666666, ans=0.1 2023-11-20 04:49:22,262 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=946786.6666666666, ans=0.0 2023-11-20 04:50:00,577 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 142050 2023-11-20 04:50:06,952 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=946986.6666666666, ans=0.125 2023-11-20 04:50:09,461 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=946986.6666666666, ans=0.0 2023-11-20 04:50:12,900 INFO [train_asr.py:1262] (1/4) Epoch 12, batch 9800, loss[loss=0.07838, simple_loss=0.09962, pruned_loss=0.01607, audio_tagging_loss=0.0125, over 15232.00 frames. ], tot_loss[loss=0.08214, simple_loss=0.1027, pruned_loss=0.02084, audio_tagging_loss=0.009945, over 3042011.20 frames. ], batch size: 56, lr: 5.69e-03, grad_scale: 16.0 2023-11-20 04:50:18,678 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=947053.3333333334, ans=0.125 2023-11-20 04:50:20,457 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=12.50 vs. limit=15.0 2023-11-20 04:50:33,354 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=512, metric=4.93 vs. limit=15.0 2023-11-20 04:50:41,175 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=6.34 vs. limit=15.0 2023-11-20 04:50:47,051 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=6.09 vs. limit=10.0 2023-11-20 04:50:47,333 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.379e+01 8.314e+01 8.707e+01 9.702e+01 1.155e+02, threshold=1.741e+02, percent-clipped=0.0 2023-11-20 04:50:48,962 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=947186.6666666666, ans=0.0 2023-11-20 04:50:54,167 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.73 vs. limit=6.0 2023-11-20 04:50:55,612 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=8.68 vs. limit=15.0 2023-11-20 04:50:56,256 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=947253.3333333334, ans=0.2 2023-11-20 04:51:00,281 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=11.51 vs. 
limit=15.0 2023-11-20 04:51:04,546 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 142100 2023-11-20 04:51:11,766 WARNING [train_asr.py:1506] (1/4) Exclude cut with ID unbalanced/Bo4LcZjitzU_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-20 04:51:17,961 INFO [train_asr.py:1262] (1/4) Epoch 12, batch 9850, loss[loss=0.06911, simple_loss=0.09763, pruned_loss=0.0117, audio_tagging_loss=0.008587, over 15652.00 frames. ], tot_loss[loss=0.08233, simple_loss=0.1033, pruned_loss=0.02083, audio_tagging_loss=0.009844, over 3047160.02 frames. ], batch size: 58, lr: 5.69e-03, grad_scale: 16.0 2023-11-20 04:51:20,503 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=947386.6666666666, ans=0.0 2023-11-20 04:51:25,497 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=947386.6666666666, ans=0.125 2023-11-20 04:51:30,910 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=947453.3333333334, ans=0.04949747468305833 2023-11-20 04:51:45,618 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=8.06 vs. limit=12.0 2023-11-20 04:51:46,612 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=947520.0, ans=0.0 2023-11-20 04:51:53,900 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=947520.0, ans=0.1 2023-11-20 04:52:01,140 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=947586.6666666666, ans=0.0 2023-11-20 04:52:07,449 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.min_positive, batch_count=947586.6666666666, ans=0.025 2023-11-20 04:52:09,662 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 142150 2023-11-20 04:52:16,515 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=947653.3333333334, ans=0.0 2023-11-20 04:52:22,413 INFO [train_asr.py:1262] (1/4) Epoch 12, batch 9900, loss[loss=0.06553, simple_loss=0.08548, pruned_loss=0.01412, audio_tagging_loss=0.008668, over 15893.00 frames. ], tot_loss[loss=0.08301, simple_loss=0.1043, pruned_loss=0.02105, audio_tagging_loss=0.009792, over 3044628.93 frames. ], batch size: 59, lr: 5.69e-03, grad_scale: 16.0 2023-11-20 04:52:30,862 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=11.43 vs. 
limit=15.0 2023-11-20 04:52:37,889 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=947786.6666666666, ans=0.125 2023-11-20 04:52:39,387 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=947786.6666666666, ans=0.125 2023-11-20 04:52:50,426 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=947853.3333333334, ans=0.125 2023-11-20 04:52:56,049 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.295e+01 8.098e+01 8.892e+01 9.710e+01 1.368e+02, threshold=1.778e+02, percent-clipped=0.0 2023-11-20 04:52:57,602 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=947853.3333333334, ans=0.0 2023-11-20 04:53:14,536 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 142200 2023-11-20 04:53:27,295 INFO [train_asr.py:1262] (1/4) Epoch 12, batch 9950, loss[loss=0.06254, simple_loss=0.07885, pruned_loss=0.01125, audio_tagging_loss=0.01186, over 15301.00 frames. ], tot_loss[loss=0.08288, simple_loss=0.1043, pruned_loss=0.02094, audio_tagging_loss=0.009807, over 3050420.42 frames. ], batch size: 57, lr: 5.69e-03, grad_scale: 16.0 2023-11-20 04:53:39,762 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=948120.0, ans=0.1 2023-11-20 04:53:53,038 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=948186.6666666666, ans=0.125 2023-11-20 04:54:09,227 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=948253.3333333334, ans=0.0 2023-11-20 04:54:18,902 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 142250 2023-11-20 04:54:31,443 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=948386.6666666666, ans=0.2 2023-11-20 04:54:32,598 INFO [train_asr.py:1262] (1/4) Epoch 12, batch 10000, loss[loss=0.1055, simple_loss=0.1342, pruned_loss=0.02726, audio_tagging_loss=0.01113, over 16112.00 frames. ], tot_loss[loss=0.08272, simple_loss=0.1039, pruned_loss=0.0209, audio_tagging_loss=0.009879, over 3051394.44 frames. ], batch size: 59, lr: 5.69e-03, grad_scale: 32.0 2023-11-20 04:54:34,153 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=948386.6666666666, ans=0.09899494936611666 2023-11-20 04:54:57,550 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=948520.0, ans=0.125 2023-11-20 04:55:01,333 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=948520.0, ans=0.125 2023-11-20 04:55:05,757 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.530e+01 8.242e+01 9.183e+01 1.026e+02 1.433e+02, threshold=1.837e+02, percent-clipped=0.0 2023-11-20 04:55:11,418 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=512, metric=4.53 vs. 
limit=15.0 2023-11-20 04:55:24,365 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 142300 2023-11-20 04:55:25,859 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=948653.3333333334, ans=0.2 2023-11-20 04:55:31,840 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=948653.3333333334, ans=0.1 2023-11-20 04:55:37,134 INFO [train_asr.py:1262] (1/4) Epoch 12, batch 10050, loss[loss=0.06915, simple_loss=0.0821, pruned_loss=0.01936, audio_tagging_loss=0.008735, over 14266.00 frames. ], tot_loss[loss=0.08181, simple_loss=0.1023, pruned_loss=0.02056, audio_tagging_loss=0.01011, over 3042300.32 frames. ], batch size: 54, lr: 5.68e-03, grad_scale: 32.0 2023-11-20 04:55:39,614 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=948720.0, ans=0.0 2023-11-20 04:56:02,947 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=948853.3333333334, ans=0.125 2023-11-20 04:56:07,080 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=948853.3333333334, ans=0.2 2023-11-20 04:56:08,508 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=948853.3333333334, ans=0.125 2023-11-20 04:56:09,679 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.max_abs, batch_count=948853.3333333334, ans=10.0 2023-11-20 04:56:10,908 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=948853.3333333334, ans=0.2 2023-11-20 04:56:28,237 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 142350 2023-11-20 04:56:41,015 INFO [train_asr.py:1262] (1/4) Epoch 12, batch 10100, loss[loss=0.1218, simple_loss=0.1426, pruned_loss=0.03749, audio_tagging_loss=0.01297, over 16420.00 frames. ], tot_loss[loss=0.08208, simple_loss=0.1026, pruned_loss=0.0207, audio_tagging_loss=0.01008, over 3044543.10 frames. ], batch size: 63, lr: 5.68e-03, grad_scale: 16.0 2023-11-20 04:56:48,608 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=949053.3333333334, ans=0.125 2023-11-20 04:56:54,378 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=949120.0, ans=0.125 2023-11-20 04:56:59,921 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=949120.0, ans=0.1 2023-11-20 04:57:12,889 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=949186.6666666666, ans=0.125 2023-11-20 04:57:16,168 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.858e+01 8.344e+01 8.796e+01 9.668e+01 1.145e+02, threshold=1.759e+02, percent-clipped=0.0 2023-11-20 04:57:22,549 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=949253.3333333334, ans=0.125 2023-11-20 04:57:32,844 WARNING [train_asr.py:1506] (1/4) Exclude cut with ID unbalanced/_eq1Ry0UZGU_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. 
Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-20 04:57:32,886 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 142400 2023-11-20 04:57:46,700 INFO [train_asr.py:1262] (1/4) Epoch 12, batch 10150, loss[loss=0.09005, simple_loss=0.1045, pruned_loss=0.02653, audio_tagging_loss=0.01128, over 13751.00 frames. ], tot_loss[loss=0.08204, simple_loss=0.1025, pruned_loss=0.02059, audio_tagging_loss=0.0102, over 3046890.76 frames. ], batch size: 54, lr: 5.68e-03, grad_scale: 16.0 2023-11-20 04:58:00,719 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=949453.3333333334, ans=0.125 2023-11-20 04:58:16,892 WARNING [train_asr.py:1506] (1/4) Exclude cut with ID unbalanced/cw-21cbk02A_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-20 04:58:36,621 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=18.81 vs. limit=22.5 2023-11-20 04:58:38,516 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 142450 2023-11-20 04:58:44,722 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=949653.3333333334, ans=0.1 2023-11-20 04:58:51,108 INFO [train_asr.py:1262] (1/4) Epoch 12, batch 10200, loss[loss=0.1267, simple_loss=0.1673, pruned_loss=0.03672, audio_tagging_loss=0.006366, over 15999.00 frames. ], tot_loss[loss=0.08258, simple_loss=0.1031, pruned_loss=0.02083, audio_tagging_loss=0.0102, over 3046437.22 frames. ], batch size: 56, lr: 5.68e-03, grad_scale: 16.0 2023-11-20 04:58:52,479 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=949720.0, ans=0.2 2023-11-20 04:59:01,082 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.min_positive, batch_count=949720.0, ans=0.025 2023-11-20 04:59:06,108 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=949786.6666666666, ans=0.0 2023-11-20 04:59:15,461 WARNING [train_asr.py:1506] (1/4) Exclude cut with ID unbalanced/hOT6Yokob90_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. 
Number of tokens: 24 2023-11-20 04:59:26,222 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.455e+01 8.468e+01 8.830e+01 9.466e+01 1.234e+02, threshold=1.766e+02, percent-clipped=0.0 2023-11-20 04:59:32,608 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=949920.0, ans=0.125 2023-11-20 04:59:33,691 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=949920.0, ans=0.1 2023-11-20 04:59:42,868 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 142500 2023-11-20 04:59:54,956 INFO [train_asr.py:1262] (1/4) Epoch 12, batch 10250, loss[loss=0.05984, simple_loss=0.07322, pruned_loss=0.01157, audio_tagging_loss=0.01166, over 15399.00 frames. ], tot_loss[loss=0.08224, simple_loss=0.1024, pruned_loss=0.02079, audio_tagging_loss=0.01025, over 3042538.94 frames. ], batch size: 58, lr: 5.68e-03, grad_scale: 16.0 2023-11-20 04:59:56,653 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=950053.3333333334, ans=0.0 2023-11-20 05:00:03,224 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=950053.3333333334, ans=0.125 2023-11-20 05:00:05,668 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.min_positive, batch_count=950053.3333333334, ans=0.05 2023-11-20 05:00:08,630 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=950120.0, ans=0.125 2023-11-20 05:00:17,653 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=950120.0, ans=0.125 2023-11-20 05:00:40,590 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=950253.3333333334, ans=0.1 2023-11-20 05:00:46,645 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 142550 2023-11-20 05:00:59,068 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=950386.6666666666, ans=0.125 2023-11-20 05:01:00,107 INFO [train_asr.py:1262] (1/4) Epoch 12, batch 10300, loss[loss=0.06321, simple_loss=0.07405, pruned_loss=0.01378, audio_tagging_loss=0.01241, over 15061.00 frames. ], tot_loss[loss=0.08137, simple_loss=0.1012, pruned_loss=0.02047, audio_tagging_loss=0.01028, over 3045809.45 frames. ], batch size: 58, lr: 5.68e-03, grad_scale: 16.0 2023-11-20 05:01:26,109 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=950520.0, ans=0.1 2023-11-20 05:01:34,322 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.811e+01 8.101e+01 8.687e+01 9.429e+01 1.201e+02, threshold=1.737e+02, percent-clipped=0.0 2023-11-20 05:01:44,056 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=950586.6666666666, ans=0.04949747468305833 2023-11-20 05:01:52,449 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 142600 2023-11-20 05:02:04,315 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=9.34 vs. 
limit=10.0 2023-11-20 05:02:04,840 INFO [train_asr.py:1262] (1/4) Epoch 12, batch 10350, loss[loss=0.06608, simple_loss=0.07876, pruned_loss=0.01478, audio_tagging_loss=0.01192, over 14986.00 frames. ], tot_loss[loss=0.08213, simple_loss=0.1022, pruned_loss=0.02078, audio_tagging_loss=0.01025, over 3044324.48 frames. ], batch size: 58, lr: 5.68e-03, grad_scale: 16.0 2023-11-20 05:02:14,250 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=950720.0, ans=0.125 2023-11-20 05:02:17,416 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=7.22 vs. limit=15.0 2023-11-20 05:02:32,305 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.conv_module1.whiten, num_groups=1, num_channels=192, metric=12.48 vs. limit=15.0 2023-11-20 05:02:57,440 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 142650 2023-11-20 05:03:00,066 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=950986.6666666666, ans=0.1 2023-11-20 05:03:01,345 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=950986.6666666666, ans=0.2 2023-11-20 05:03:09,695 INFO [train_asr.py:1262] (1/4) Epoch 12, batch 10400, loss[loss=0.09247, simple_loss=0.1147, pruned_loss=0.02457, audio_tagging_loss=0.01055, over 15638.00 frames. ], tot_loss[loss=0.08126, simple_loss=0.101, pruned_loss=0.02037, audio_tagging_loss=0.01038, over 3047637.77 frames. ], batch size: 56, lr: 5.68e-03, grad_scale: 32.0 2023-11-20 05:03:09,956 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=951053.3333333334, ans=0.125 2023-11-20 05:03:45,418 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.768e+01 8.746e+01 9.185e+01 9.994e+01 1.378e+02, threshold=1.837e+02, percent-clipped=0.0 2023-11-20 05:04:01,611 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 142700 2023-11-20 05:04:05,565 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=951320.0, ans=0.07 2023-11-20 05:04:14,418 INFO [train_asr.py:1262] (1/4) Epoch 12, batch 10450, loss[loss=0.06455, simple_loss=0.07893, pruned_loss=0.01447, audio_tagging_loss=0.01061, over 15368.00 frames. ], tot_loss[loss=0.08155, simple_loss=0.1016, pruned_loss=0.02041, audio_tagging_loss=0.01035, over 3049761.52 frames. ], batch size: 58, lr: 5.68e-03, grad_scale: 32.0 2023-11-20 05:04:20,048 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=951386.6666666666, ans=0.0 2023-11-20 05:05:01,224 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=951586.6666666666, ans=0.0 2023-11-20 05:05:06,119 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 142750 2023-11-20 05:05:06,408 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=951653.3333333334, ans=0.0 2023-11-20 05:05:18,691 INFO [train_asr.py:1262] (1/4) Epoch 12, batch 10500, loss[loss=0.07126, simple_loss=0.08581, pruned_loss=0.01679, audio_tagging_loss=0.01157, over 15926.00 frames. 
], tot_loss[loss=0.08168, simple_loss=0.1019, pruned_loss=0.02052, audio_tagging_loss=0.01019, over 3054918.21 frames. ], batch size: 63, lr: 5.68e-03, grad_scale: 32.0 2023-11-20 05:05:23,246 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=951720.0, ans=0.125 2023-11-20 05:05:52,966 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.818e+01 8.079e+01 8.976e+01 9.766e+01 1.332e+02, threshold=1.795e+02, percent-clipped=0.0 2023-11-20 05:05:58,867 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=951920.0, ans=0.0 2023-11-20 05:06:07,159 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=951920.0, ans=0.0 2023-11-20 05:06:10,122 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 142800 2023-11-20 05:06:14,335 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=951986.6666666666, ans=0.2 2023-11-20 05:06:17,918 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=951986.6666666666, ans=0.1 2023-11-20 05:06:22,640 INFO [train_asr.py:1262] (1/4) Epoch 12, batch 10550, loss[loss=0.0765, simple_loss=0.08325, pruned_loss=0.02145, audio_tagging_loss=0.01343, over 16204.00 frames. ], tot_loss[loss=0.08198, simple_loss=0.1025, pruned_loss=0.02068, audio_tagging_loss=0.01004, over 3054245.45 frames. ], batch size: 61, lr: 5.67e-03, grad_scale: 32.0 2023-11-20 05:06:24,131 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=952053.3333333334, ans=0.0 2023-11-20 05:06:25,428 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=952053.3333333334, ans=0.0 2023-11-20 05:06:32,919 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=8.19 vs. limit=15.0 2023-11-20 05:06:46,634 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=952120.0, ans=0.125 2023-11-20 05:06:50,411 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=952186.6666666666, ans=0.125 2023-11-20 05:07:00,701 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=952253.3333333334, ans=0.125 2023-11-20 05:07:10,623 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-20 05:07:14,203 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 142850 2023-11-20 05:07:23,272 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=6.87 vs. limit=15.0 2023-11-20 05:07:25,273 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=952386.6666666666, ans=0.0 2023-11-20 05:07:26,291 INFO [train_asr.py:1262] (1/4) Epoch 12, batch 10600, loss[loss=0.06432, simple_loss=0.07343, pruned_loss=0.01531, audio_tagging_loss=0.01229, over 14418.00 frames. 
], tot_loss[loss=0.08138, simple_loss=0.1017, pruned_loss=0.02045, audio_tagging_loss=0.01006, over 3047773.88 frames. ], batch size: 54, lr: 5.67e-03, grad_scale: 32.0 2023-11-20 05:07:44,219 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=952453.3333333334, ans=0.125 2023-11-20 05:07:45,436 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=952453.3333333334, ans=0.125 2023-11-20 05:07:59,452 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=952520.0, ans=0.125 2023-11-20 05:08:01,772 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.830e+01 8.408e+01 9.197e+01 1.017e+02 1.438e+02, threshold=1.839e+02, percent-clipped=0.0 2023-11-20 05:08:10,186 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.12 vs. limit=6.0 2023-11-20 05:08:18,691 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 142900 2023-11-20 05:08:31,755 INFO [train_asr.py:1262] (1/4) Epoch 12, batch 10650, loss[loss=0.08856, simple_loss=0.1167, pruned_loss=0.0199, audio_tagging_loss=0.0103, over 14928.00 frames. ], tot_loss[loss=0.08188, simple_loss=0.1022, pruned_loss=0.02078, audio_tagging_loss=0.009967, over 3050670.68 frames. ], batch size: 57, lr: 5.67e-03, grad_scale: 32.0 2023-11-20 05:08:49,861 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=952786.6666666666, ans=0.125 2023-11-20 05:09:08,575 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=9.59 vs. limit=15.0 2023-11-20 05:09:23,770 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 142950 2023-11-20 05:09:29,607 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=952986.6666666666, ans=0.0 2023-11-20 05:09:36,520 INFO [train_asr.py:1262] (1/4) Epoch 12, batch 10700, loss[loss=0.05762, simple_loss=0.07263, pruned_loss=0.01076, audio_tagging_loss=0.01054, over 15017.00 frames. ], tot_loss[loss=0.08163, simple_loss=0.1021, pruned_loss=0.02059, audio_tagging_loss=0.01001, over 3052504.91 frames. ], batch size: 58, lr: 5.67e-03, grad_scale: 32.0 2023-11-20 05:09:57,208 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=953120.0, ans=0.125 2023-11-20 05:09:59,616 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=953120.0, ans=0.125 2023-11-20 05:10:02,817 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=8.14 vs. 
limit=15.0 2023-11-20 05:10:11,505 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.004e+01 8.248e+01 8.993e+01 9.641e+01 1.206e+02, threshold=1.799e+02, percent-clipped=0.0 2023-11-20 05:10:13,076 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=953186.6666666666, ans=0.0 2023-11-20 05:10:21,341 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=7.03 vs. limit=15.0 2023-11-20 05:10:28,377 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 143000 2023-11-20 05:10:31,301 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=953320.0, ans=0.1 2023-11-20 05:10:40,894 INFO [train_asr.py:1262] (1/4) Epoch 12, batch 10750, loss[loss=0.08754, simple_loss=0.1139, pruned_loss=0.02386, audio_tagging_loss=0.006748, over 15366.00 frames. ], tot_loss[loss=0.08142, simple_loss=0.1018, pruned_loss=0.02053, audio_tagging_loss=0.009976, over 3055234.42 frames. ], batch size: 56, lr: 5.67e-03, grad_scale: 32.0 2023-11-20 05:10:50,313 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=953386.6666666666, ans=0.125 2023-11-20 05:10:50,340 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=953386.6666666666, ans=0.0 2023-11-20 05:11:10,575 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=953520.0, ans=0.125 2023-11-20 05:11:13,182 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=953520.0, ans=0.1 2023-11-20 05:11:16,119 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=14.53 vs. limit=22.5 2023-11-20 05:11:32,695 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 143050 2023-11-20 05:11:33,958 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=953653.3333333334, ans=0.0 2023-11-20 05:11:37,046 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=953653.3333333334, ans=0.0 2023-11-20 05:11:45,904 INFO [train_asr.py:1262] (1/4) Epoch 12, batch 10800, loss[loss=0.08676, simple_loss=0.1083, pruned_loss=0.02371, audio_tagging_loss=0.008914, over 15120.00 frames. ], tot_loss[loss=0.08122, simple_loss=0.1016, pruned_loss=0.02049, audio_tagging_loss=0.00993, over 3055693.70 frames. 
], batch size: 55, lr: 5.67e-03, grad_scale: 32.0 2023-11-20 05:12:09,482 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=953786.6666666666, ans=0.0 2023-11-20 05:12:13,186 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=953853.3333333334, ans=0.125 2023-11-20 05:12:20,204 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.594e+01 7.886e+01 8.544e+01 9.371e+01 1.667e+02, threshold=1.709e+02, percent-clipped=0.0 2023-11-20 05:12:35,620 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=953920.0, ans=0.125 2023-11-20 05:12:37,707 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 143100 2023-11-20 05:12:50,279 INFO [train_asr.py:1262] (1/4) Epoch 12, batch 10850, loss[loss=0.09199, simple_loss=0.1198, pruned_loss=0.02481, audio_tagging_loss=0.007272, over 14938.00 frames. ], tot_loss[loss=0.08093, simple_loss=0.101, pruned_loss=0.02046, audio_tagging_loss=0.00996, over 3043032.14 frames. ], batch size: 55, lr: 5.67e-03, grad_scale: 16.0 2023-11-20 05:13:30,089 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=954253.3333333334, ans=0.04949747468305833 2023-11-20 05:13:41,485 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 143150 2023-11-20 05:13:50,151 WARNING [train_asr.py:1506] (1/4) Exclude cut with ID unbalanced/XMxq2pgttuY_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-20 05:13:53,680 INFO [train_asr.py:1262] (1/4) Epoch 12, batch 10900, loss[loss=0.09058, simple_loss=0.1109, pruned_loss=0.02354, audio_tagging_loss=0.0116, over 15048.00 frames. ], tot_loss[loss=0.08166, simple_loss=0.102, pruned_loss=0.02066, audio_tagging_loss=0.01, over 3048394.43 frames. ], batch size: 56, lr: 5.67e-03, grad_scale: 16.0 2023-11-20 05:14:07,472 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=954453.3333333334, ans=0.125 2023-11-20 05:14:10,143 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=954453.3333333334, ans=0.125 2023-11-20 05:14:15,915 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=954453.3333333334, ans=0.1 2023-11-20 05:14:29,639 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=954520.0, ans=0.0 2023-11-20 05:14:30,502 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.839e+01 8.320e+01 9.175e+01 1.028e+02 1.481e+02, threshold=1.835e+02, percent-clipped=0.0 2023-11-20 05:14:33,343 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=954586.6666666666, ans=0.2 2023-11-20 05:14:38,947 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.conv_module1.whiten, num_groups=1, num_channels=192, metric=11.19 vs. 
limit=15.0 2023-11-20 05:14:43,208 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=954586.6666666666, ans=0.125 2023-11-20 05:14:45,542 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 143200 2023-11-20 05:14:47,364 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=14.25 vs. limit=15.0 2023-11-20 05:14:59,423 INFO [train_asr.py:1262] (1/4) Epoch 12, batch 10950, loss[loss=0.09003, simple_loss=0.119, pruned_loss=0.02002, audio_tagging_loss=0.0105, over 15705.00 frames. ], tot_loss[loss=0.08132, simple_loss=0.1014, pruned_loss=0.02053, audio_tagging_loss=0.01009, over 3050257.03 frames. ], batch size: 59, lr: 5.67e-03, grad_scale: 16.0 2023-11-20 05:14:59,799 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=954720.0, ans=0.1 2023-11-20 05:15:00,982 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=954720.0, ans=0.125 2023-11-20 05:15:07,080 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=954720.0, ans=0.0 2023-11-20 05:15:31,892 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=954853.3333333334, ans=0.2 2023-11-20 05:15:48,182 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=954920.0, ans=0.0 2023-11-20 05:15:51,648 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 143250 2023-11-20 05:16:03,688 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=955053.3333333334, ans=0.2 2023-11-20 05:16:04,523 INFO [train_asr.py:1262] (1/4) Epoch 12, batch 11000, loss[loss=0.07685, simple_loss=0.09291, pruned_loss=0.02094, audio_tagging_loss=0.009455, over 15926.00 frames. ], tot_loss[loss=0.08159, simple_loss=0.1019, pruned_loss=0.0206, audio_tagging_loss=0.01003, over 3050356.72 frames. ], batch size: 58, lr: 5.67e-03, grad_scale: 16.0 2023-11-20 05:16:14,092 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=955053.3333333334, ans=0.0 2023-11-20 05:16:14,977 WARNING [train_asr.py:1506] (1/4) Exclude cut with ID unbalanced/h6R5rMXN6pY_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-20 05:16:16,577 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=955120.0, ans=0.2 2023-11-20 05:16:18,053 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=12.14 vs. limit=15.0 2023-11-20 05:16:19,368 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=6.10 vs. 
2023-11-20 05:16:26,340 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=955120.0, ans=0.04949747468305833 2023-11-20 05:16:27,537 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=955120.0, ans=0.125 2023-11-20 05:16:35,496 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=955186.6666666666, ans=0.125 2023-11-20 05:16:40,608 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.137e+01 8.224e+01 8.941e+01 1.006e+02 1.362e+02, threshold=1.788e+02, percent-clipped=0.0 2023-11-20 05:16:56,694 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 143300 2023-11-20 05:17:06,572 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=955320.0, ans=0.2 2023-11-20 05:17:08,880 INFO [train_asr.py:1262] (1/4) Epoch 12, batch 11050, loss[loss=0.08512, simple_loss=0.1032, pruned_loss=0.0226, audio_tagging_loss=0.01094, over 16133.00 frames. ], tot_loss[loss=0.08147, simple_loss=0.1013, pruned_loss=0.02052, audio_tagging_loss=0.01027, over 3050060.96 frames. ], batch size: 59, lr: 5.66e-03, grad_scale: 16.0 2023-11-20 05:17:15,337 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.max_positive, batch_count=955386.6666666666, ans=0.95 2023-11-20 05:17:36,678 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=955520.0, ans=0.0 2023-11-20 05:18:00,884 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 143350 2023-11-20 05:18:07,033 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=955653.3333333334, ans=0.1 2023-11-20 05:18:08,880 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=955653.3333333334, ans=0.04949747468305833 2023-11-20 05:18:14,380 INFO [train_asr.py:1262] (1/4) Epoch 12, batch 11100, loss[loss=0.0829, simple_loss=0.1012, pruned_loss=0.02164, audio_tagging_loss=0.01066, over 14805.00 frames. ], tot_loss[loss=0.08183, simple_loss=0.1016, pruned_loss=0.02062, audio_tagging_loss=0.0104, over 3046819.86 frames. ], batch size: 57, lr: 5.66e-03, grad_scale: 16.0 2023-11-20 05:18:17,224 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=955720.0, ans=0.2 2023-11-20 05:18:30,561 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=19.55 vs. limit=22.5
2023-11-20 05:18:37,786 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=955786.6666666666, ans=0.0 2023-11-20 05:18:39,151 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=955853.3333333334, ans=0.125 2023-11-20 05:18:41,419 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=955853.3333333334, ans=0.125 2023-11-20 05:18:42,774 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=955853.3333333334, ans=0.125 2023-11-20 05:18:49,756 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.985e+01 8.290e+01 8.982e+01 9.769e+01 2.008e+02, threshold=1.796e+02, percent-clipped=1.0 2023-11-20 05:18:51,334 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.max_abs, batch_count=955920.0, ans=10.0 2023-11-20 05:18:54,232 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.self_attn2.whiten, num_groups=1, num_channels=192, metric=14.59 vs. limit=22.5 2023-11-20 05:19:05,654 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 143400 2023-11-20 05:19:05,766 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=955986.6666666666, ans=0.125 2023-11-20 05:19:06,147 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=7.56 vs. limit=12.0 2023-11-20 05:19:10,445 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.73 vs. limit=6.0 2023-11-20 05:19:18,867 INFO [train_asr.py:1262] (1/4) Epoch 12, batch 11150, loss[loss=0.1059, simple_loss=0.1319, pruned_loss=0.0329, audio_tagging_loss=0.007043, over 14965.00 frames. ], tot_loss[loss=0.08307, simple_loss=0.1032, pruned_loss=0.02104, audio_tagging_loss=0.01042, over 3043511.94 frames. ], batch size: 56, lr: 5.66e-03, grad_scale: 16.0 2023-11-20 05:19:19,477 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=8.81 vs. limit=15.0
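
The optim.py:476 records summarize the recent distribution of gradient norms; the five "quartiles" read naturally as min / 25% / median / 75% / max. Throughout this section the logged threshold equals 2.0 times the logged median (here 2.0 * 8.982e+01 = 1.796e+02), i.e. Clipping_scale times the median, and percent-clipped=1.0 suggests about one percent of recent updates exceeded it (the logged max 2.008e+02 did). A minimal sketch of that behaviour as read off the log; the actual ScaledAdam logic in icefall's optim.py is more involved, so treat this as an assumption:

    from collections import deque

    class MedianGradClipper:
        """Clip the global grad norm at clipping_scale * median of recent norms.

        Sketch of the behaviour suggested by the log records, not the
        actual ScaledAdam implementation.
        """

        def __init__(self, clipping_scale: float = 2.0, window: int = 400) -> None:
            self.scale = clipping_scale
            self.recent = deque(maxlen=window)

        def clip_factor(self, grad_norm: float) -> float:
            self.recent.append(grad_norm)
            median = sorted(self.recent)[len(self.recent) // 2]
            threshold = self.scale * median
            # Scale gradients down only when the norm exceeds the threshold.
            return min(1.0, threshold / grad_norm) if grad_norm > 0 else 1.0

    clipper = MedianGradClipper()
    for norm in (69.9, 82.9, 89.8, 97.7, 200.8):   # values echoing the record above
        print(norm, clipper.clip_factor(norm))

An adaptive threshold like this tracks the model's own gradient scale, so occasional spikes are tamed without choosing a fixed clip value up front.
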
2023-11-20 05:19:20,333 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=956053.3333333334, ans=0.125 2023-11-20 05:19:40,434 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer_na.min_abs, batch_count=956120.0, ans=0.02 2023-11-20 05:19:52,620 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=956186.6666666666, ans=0.1 2023-11-20 05:20:06,636 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=956253.3333333334, ans=0.1 2023-11-20 05:20:10,316 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 143450 2023-11-20 05:20:15,385 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.min_abs, batch_count=956320.0, ans=0.5 2023-11-20 05:20:15,608 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=956320.0, ans=0.0 2023-11-20 05:20:17,974 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=956320.0, ans=0.0 2023-11-20 05:20:23,234 INFO [train_asr.py:1262] (1/4) Epoch 12, batch 11200, loss[loss=0.07242, simple_loss=0.09723, pruned_loss=0.01586, audio_tagging_loss=0.007944, over 15522.00 frames. ], tot_loss[loss=0.08217, simple_loss=0.1019, pruned_loss=0.02069, audio_tagging_loss=0.01053, over 3050052.46 frames. ], batch size: 58, lr: 5.66e-03, grad_scale: 32.0 2023-11-20 05:20:59,540 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.186e+01 8.107e+01 8.782e+01 9.452e+01 1.606e+02, threshold=1.756e+02, percent-clipped=0.0 2023-11-20 05:21:14,927 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 143500 2023-11-20 05:21:15,428 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=8.93 vs. limit=15.0 2023-11-20 05:21:17,926 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.59 vs. limit=6.0 2023-11-20 05:21:25,339 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=956653.3333333334, ans=0.2 2023-11-20 05:21:27,547 INFO [train_asr.py:1262] (1/4) Epoch 12, batch 11250, loss[loss=0.1073, simple_loss=0.1396, pruned_loss=0.03025, audio_tagging_loss=0.007233, over 14814.00 frames. ], tot_loss[loss=0.08246, simple_loss=0.1022, pruned_loss=0.02084, audio_tagging_loss=0.01049, over 3051271.45 frames. ], batch size: 55, lr: 5.66e-03, grad_scale: 32.0 2023-11-20 05:21:40,739 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=956786.6666666666, ans=0.0 2023-11-20 05:22:13,146 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=956920.0, ans=0.0 2023-11-20 05:22:19,680 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 143550 2023-11-20 05:22:20,435 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=7.83 vs.
limit=15.0 2023-11-20 05:22:32,432 INFO [train_asr.py:1262] (1/4) Epoch 12, batch 11300, loss[loss=0.06766, simple_loss=0.08978, pruned_loss=0.0112, audio_tagging_loss=0.01157, over 14289.00 frames. ], tot_loss[loss=0.08182, simple_loss=0.1017, pruned_loss=0.02062, audio_tagging_loss=0.01035, over 3046871.39 frames. ], batch size: 55, lr: 5.66e-03, grad_scale: 16.0 2023-11-20 05:22:34,595 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=957053.3333333334, ans=0.2 2023-11-20 05:22:54,820 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=957120.0, ans=0.07 2023-11-20 05:23:08,844 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=957186.6666666666, ans=0.125 2023-11-20 05:23:10,353 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.444e+01 8.144e+01 8.778e+01 9.554e+01 1.373e+02, threshold=1.756e+02, percent-clipped=0.0 2023-11-20 05:23:11,991 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=957253.3333333334, ans=0.125 2023-11-20 05:23:12,095 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=957253.3333333334, ans=0.2 2023-11-20 05:23:18,692 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=12.82 vs. limit=15.0 2023-11-20 05:23:24,898 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 143600 2023-11-20 05:23:37,248 INFO [train_asr.py:1262] (1/4) Epoch 12, batch 11350, loss[loss=0.07559, simple_loss=0.1044, pruned_loss=0.0168, audio_tagging_loss=0.006584, over 14071.00 frames. ], tot_loss[loss=0.08169, simple_loss=0.1016, pruned_loss=0.02071, audio_tagging_loss=0.01019, over 3049187.01 frames. ], batch size: 52, lr: 5.66e-03, grad_scale: 16.0 2023-11-20 05:23:37,682 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=957386.6666666666, ans=0.125 2023-11-20 05:23:44,004 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=957386.6666666666, ans=0.1 2023-11-20 05:24:05,591 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=16.11 vs. 
limit=22.5 2023-11-20 05:24:06,258 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=957520.0, ans=0.0 2023-11-20 05:24:14,553 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=957520.0, ans=0.125 2023-11-20 05:24:26,854 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=957586.6666666666, ans=0.1 2023-11-20 05:24:29,269 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 143650 2023-11-20 05:24:36,175 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=957653.3333333334, ans=0.125 2023-11-20 05:24:38,438 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=957653.3333333334, ans=0.125 2023-11-20 05:24:42,579 INFO [train_asr.py:1262] (1/4) Epoch 12, batch 11400, loss[loss=0.09739, simple_loss=0.1169, pruned_loss=0.02539, audio_tagging_loss=0.01357, over 15106.00 frames. ], tot_loss[loss=0.08254, simple_loss=0.103, pruned_loss=0.02097, audio_tagging_loss=0.01006, over 3053623.04 frames. ], batch size: 55, lr: 5.66e-03, grad_scale: 16.0 2023-11-20 05:25:11,499 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=6.64 vs. limit=15.0 2023-11-20 05:25:19,465 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.590e+01 8.041e+01 8.720e+01 9.566e+01 1.309e+02, threshold=1.744e+02, percent-clipped=0.0 2023-11-20 05:25:22,843 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=957920.0, ans=0.125 2023-11-20 05:25:22,929 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=957920.0, ans=0.2 2023-11-20 05:25:34,556 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 143700 2023-11-20 05:25:47,433 INFO [train_asr.py:1262] (1/4) Epoch 12, batch 11450, loss[loss=0.08712, simple_loss=0.1035, pruned_loss=0.02533, audio_tagging_loss=0.01005, over 15686.00 frames. ], tot_loss[loss=0.08288, simple_loss=0.1034, pruned_loss=0.02112, audio_tagging_loss=0.01005, over 3055323.83 frames. ], batch size: 58, lr: 5.66e-03, grad_scale: 16.0 2023-11-20 05:25:53,316 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=1.287e-02 2023-11-20 05:26:28,087 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=958253.3333333334, ans=0.0 2023-11-20 05:26:28,129 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=958253.3333333334, ans=0.125 2023-11-20 05:26:38,716 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 143750 2023-11-20 05:26:48,479 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=20.70 vs. limit=22.5 2023-11-20 05:26:51,452 INFO [train_asr.py:1262] (1/4) Epoch 12, batch 11500, loss[loss=0.06528, simple_loss=0.07375, pruned_loss=0.0162, audio_tagging_loss=0.0122, over 14833.00 frames. ], tot_loss[loss=0.08239, simple_loss=0.1031, pruned_loss=0.02088, audio_tagging_loss=0.009975, over 3058523.58 frames. ], batch size: 61, lr: 5.66e-03, grad_scale: 16.0
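
The four loss columns in each train_asr.py:1262 record are components of one weighted objective. The logged numbers are consistent with tot = 0.5 * simple_loss + pruned_loss + audio_tagging_loss; the 0.5 weight on the simple (linear-lattice) transducer loss is inferred from the records themselves and matches the usual pruned-RNN-T recipe, so treat it as an assumption rather than a quote from this run's code:

    def combined_loss(simple_loss: float, pruned_loss: float,
                      audio_tagging_loss: float,
                      simple_scale: float = 0.5,
                      tagging_scale: float = 1.0) -> float:
        # Weights inferred from the logged totals (assumptions).
        return (simple_scale * simple_loss + pruned_loss
                + tagging_scale * audio_tagging_loss)

    # Batch 11500 running totals just above:
    print(combined_loss(0.1031, 0.02088, 0.009975))  # 0.0824 ~ logged 0.08239
    # Batch 10850 earlier in this section:
    print(combined_loss(0.101, 0.02046, 0.00996))    # 0.0809 ~ logged 0.08093

The same identity holds for every tot_loss record in this section up to rounding, which is a quick way to confirm the audio-tagging head is contributing at full weight.
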
2023-11-20 05:27:29,101 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.118e+01 7.975e+01 8.821e+01 9.410e+01 1.240e+02, threshold=1.764e+02, percent-clipped=0.0 2023-11-20 05:27:33,117 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=958586.6666666666, ans=0.1 2023-11-20 05:27:41,867 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=958653.3333333334, ans=0.0 2023-11-20 05:27:42,799 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 143800 2023-11-20 05:27:55,520 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=512, metric=21.66 vs. limit=22.5 2023-11-20 05:27:56,052 INFO [train_asr.py:1262] (1/4) Epoch 12, batch 11550, loss[loss=0.09874, simple_loss=0.1331, pruned_loss=0.02478, audio_tagging_loss=0.007423, over 15666.00 frames. ], tot_loss[loss=0.08164, simple_loss=0.102, pruned_loss=0.02059, audio_tagging_loss=0.01007, over 3052970.99 frames. ], batch size: 59, lr: 5.65e-03, grad_scale: 16.0 2023-11-20 05:28:01,835 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=958720.0, ans=0.0 2023-11-20 05:28:16,228 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.min_positive, batch_count=958786.6666666666, ans=0.025 2023-11-20 05:28:33,968 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.min_positive, batch_count=958920.0, ans=0.025 2023-11-20 05:28:36,093 WARNING [train_asr.py:1506] (1/4) Exclude cut with ID unbalanced/NeYOsnhOi4k_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-20 05:28:45,448 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=958920.0, ans=0.125 2023-11-20 05:28:48,368 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 143850 2023-11-20 05:29:01,023 INFO [train_asr.py:1262] (1/4) Epoch 12, batch 11600, loss[loss=0.09378, simple_loss=0.1197, pruned_loss=0.02343, audio_tagging_loss=0.01048, over 15477.00 frames. ], tot_loss[loss=0.08213, simple_loss=0.1025, pruned_loss=0.02083, audio_tagging_loss=0.01007, over 3052295.60 frames. ], batch size: 56, lr: 5.65e-03, grad_scale: 32.0 2023-11-20 05:29:06,222 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=959053.3333333334, ans=0.0 2023-11-20 05:29:14,282 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=959120.0, ans=0.125 2023-11-20 05:29:27,619 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=959186.6666666666, ans=0.0 2023-11-20 05:29:29,358 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=14.36 vs.
limit=15.0 2023-11-20 05:29:38,410 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.843e+01 8.207e+01 8.809e+01 9.519e+01 1.148e+02, threshold=1.762e+02, percent-clipped=0.0 2023-11-20 05:29:38,769 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=959253.3333333334, ans=0.125 2023-11-20 05:29:52,785 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 143900 2023-11-20 05:30:05,591 INFO [train_asr.py:1262] (1/4) Epoch 12, batch 11650, loss[loss=0.08845, simple_loss=0.1187, pruned_loss=0.01998, audio_tagging_loss=0.009121, over 15576.00 frames. ], tot_loss[loss=0.08203, simple_loss=0.1023, pruned_loss=0.02077, audio_tagging_loss=0.0101, over 3048002.74 frames. ], batch size: 59, lr: 5.65e-03, grad_scale: 32.0 2023-11-20 05:30:23,341 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=512, metric=19.37 vs. limit=22.5 2023-11-20 05:30:43,707 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=8.57 vs. limit=10.0 2023-11-20 05:30:51,368 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.max_abs, batch_count=959586.6666666666, ans=10.0 2023-11-20 05:30:57,354 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 143950 2023-11-20 05:31:02,596 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=959653.3333333334, ans=0.125 2023-11-20 05:31:09,579 INFO [train_asr.py:1262] (1/4) Epoch 12, batch 11700, loss[loss=0.07365, simple_loss=0.09585, pruned_loss=0.01494, audio_tagging_loss=0.01078, over 14645.00 frames. ], tot_loss[loss=0.08225, simple_loss=0.1029, pruned_loss=0.02076, audio_tagging_loss=0.01006, over 3049958.43 frames. ], batch size: 54, lr: 5.65e-03, grad_scale: 32.0 2023-11-20 05:31:28,146 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=959786.6666666666, ans=0.125 2023-11-20 05:31:42,408 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=959853.3333333334, ans=0.125 2023-11-20 05:31:46,096 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=959853.3333333334, ans=0.1 2023-11-20 05:31:46,952 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.116e+01 8.169e+01 8.905e+01 9.543e+01 1.392e+02, threshold=1.781e+02, percent-clipped=0.0 2023-11-20 05:31:56,203 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=8.83 vs. limit=10.0 2023-11-20 05:32:00,440 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 144000 2023-11-20 05:32:17,063 INFO [train_asr.py:1262] (1/4) Epoch 12, batch 11750, loss[loss=0.08325, simple_loss=0.09645, pruned_loss=0.02318, audio_tagging_loss=0.01184, over 14774.00 frames. ], tot_loss[loss=0.08204, simple_loss=0.1023, pruned_loss=0.02078, audio_tagging_loss=0.01013, over 3046618.75 frames. 
], batch size: 55, lr: 5.65e-03, grad_scale: 32.0 2023-11-20 05:32:17,274 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=960053.3333333334, ans=0.0 2023-11-20 05:32:27,917 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=960053.3333333334, ans=0.125 2023-11-20 05:32:30,532 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=960120.0, ans=0.125 2023-11-20 05:32:59,878 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=960253.3333333334, ans=0.0 2023-11-20 05:33:08,896 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 144050 2023-11-20 05:33:20,842 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=960386.6666666666, ans=0.125 2023-11-20 05:33:21,629 INFO [train_asr.py:1262] (1/4) Epoch 12, batch 11800, loss[loss=0.08276, simple_loss=0.1086, pruned_loss=0.01975, audio_tagging_loss=0.008701, over 15239.00 frames. ], tot_loss[loss=0.08233, simple_loss=0.1026, pruned_loss=0.02088, audio_tagging_loss=0.01012, over 3044615.51 frames. ], batch size: 54, lr: 5.65e-03, grad_scale: 16.0 2023-11-20 05:33:24,325 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=960386.6666666666, ans=0.0 2023-11-20 05:33:59,019 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=960586.6666666666, ans=0.2 2023-11-20 05:33:59,958 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.293e+01 8.347e+01 8.824e+01 9.407e+01 1.316e+02, threshold=1.765e+02, percent-clipped=0.0 2023-11-20 05:34:12,992 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 144100 2023-11-20 05:34:17,021 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=960653.3333333334, ans=0.125 2023-11-20 05:34:25,238 INFO [train_asr.py:1262] (1/4) Epoch 12, batch 11850, loss[loss=0.08559, simple_loss=0.1076, pruned_loss=0.02059, audio_tagging_loss=0.01122, over 16122.00 frames. ], tot_loss[loss=0.08256, simple_loss=0.103, pruned_loss=0.02086, audio_tagging_loss=0.01021, over 3045287.90 frames. ], batch size: 57, lr: 5.65e-03, grad_scale: 16.0 2023-11-20 05:34:25,508 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=960720.0, ans=0.125 2023-11-20 05:35:02,201 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=960853.3333333334, ans=0.125 2023-11-20 05:35:07,459 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=2.95 vs. limit=15.0 2023-11-20 05:35:16,553 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 144150 2023-11-20 05:35:24,832 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=960986.6666666666, ans=0.125 2023-11-20 05:35:29,888 INFO [train_asr.py:1262] (1/4) Epoch 12, batch 11900, loss[loss=0.06803, simple_loss=0.08563, pruned_loss=0.01486, audio_tagging_loss=0.01035, over 14013.00 frames. 
], tot_loss[loss=0.08215, simple_loss=0.1026, pruned_loss=0.02054, audio_tagging_loss=0.01033, over 3038031.28 frames. ], batch size: 52, lr: 5.65e-03, grad_scale: 16.0 2023-11-20 05:35:30,730 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.whiten, num_groups=1, num_channels=192, metric=3.32 vs. limit=12.0 2023-11-20 05:35:54,475 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=961186.6666666666, ans=0.0 2023-11-20 05:35:57,525 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.77 vs. limit=6.0 2023-11-20 05:36:03,062 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=961186.6666666666, ans=0.0 2023-11-20 05:36:07,722 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.082e+01 8.202e+01 8.808e+01 9.616e+01 1.545e+02, threshold=1.762e+02, percent-clipped=0.0 2023-11-20 05:36:21,937 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 144200 2023-11-20 05:36:23,399 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-20 05:36:35,081 INFO [train_asr.py:1262] (1/4) Epoch 12, batch 11950, loss[loss=0.06639, simple_loss=0.07683, pruned_loss=0.0145, audio_tagging_loss=0.01347, over 14610.00 frames. ], tot_loss[loss=0.08191, simple_loss=0.1021, pruned_loss=0.02043, audio_tagging_loss=0.01044, over 3037766.73 frames. ], batch size: 57, lr: 5.65e-03, grad_scale: 16.0 2023-11-20 05:36:45,149 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=961386.6666666666, ans=0.0 2023-11-20 05:36:53,919 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=961453.3333333334, ans=0.1 2023-11-20 05:37:13,729 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=10.78 vs. limit=15.0 2023-11-20 05:37:21,966 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=961586.6666666666, ans=0.125 2023-11-20 05:37:25,354 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 144250 2023-11-20 05:37:37,583 INFO [train_asr.py:1262] (1/4) Epoch 12, batch 12000, loss[loss=0.08683, simple_loss=0.1098, pruned_loss=0.02322, audio_tagging_loss=0.008696, over 14983.00 frames. ], tot_loss[loss=0.08148, simple_loss=0.1015, pruned_loss=0.02032, audio_tagging_loss=0.01043, over 3038872.55 frames. ], batch size: 56, lr: 5.65e-03, grad_scale: 32.0 2023-11-20 05:37:37,586 INFO [train_asr.py:1285] (1/4) Computing validation loss 2023-11-20 05:38:18,765 INFO [train_asr.py:1294] (1/4) Epoch 12, validation: loss=0.06309, simple_loss=0.0542, pruned_loss=0.005937, audio_tagging_loss=0.03005, over 4681554.00 frames. 2023-11-20 05:38:18,766 INFO [train_asr.py:1295] (1/4) Maximum memory allocated so far is 26082MB 2023-11-20 05:38:36,065 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=8.92 vs. limit=15.0 2023-11-20 05:39:27,244 INFO [train_asr.py:1262] (1/4) Epoch 13, batch 0, loss[loss=0.07489, simple_loss=0.0754, pruned_loss=0.01049, audio_tagging_loss=0.0267, over 15957.00 frames. 
], tot_loss[loss=0.07489, simple_loss=0.0754, pruned_loss=0.01049, audio_tagging_loss=0.0267, over 15957.00 frames. ], batch size: 63, lr: 5.43e-03, grad_scale: 32.0 2023-11-20 05:39:27,245 INFO [train_asr.py:1285] (1/4) Computing validation loss 2023-11-20 05:40:04,320 INFO [train_asr.py:1294] (1/4) Epoch 13, validation: loss=0.06272, simple_loss=0.05429, pruned_loss=0.006071, audio_tagging_loss=0.02951, over 4681554.00 frames. 2023-11-20 05:40:04,321 INFO [train_asr.py:1295] (1/4) Maximum memory allocated so far is 26082MB 2023-11-20 05:40:04,637 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=961886.6666666666, ans=0.0 2023-11-20 05:40:10,454 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.395e+01 8.159e+01 8.856e+01 9.666e+01 1.294e+02, threshold=1.771e+02, percent-clipped=0.0 2023-11-20 05:40:12,093 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=961886.6666666666, ans=0.0 2023-11-20 05:40:23,145 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 144300 2023-11-20 05:40:33,870 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=10.93 vs. limit=15.0 2023-11-20 05:40:34,621 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=962020.0, ans=0.0 2023-11-20 05:40:55,938 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=962153.3333333334, ans=0.2 2023-11-20 05:41:09,222 INFO [train_asr.py:1262] (1/4) Epoch 13, batch 50, loss[loss=0.1077, simple_loss=0.1284, pruned_loss=0.02987, audio_tagging_loss=0.01367, over 15478.00 frames. ], tot_loss[loss=0.08985, simple_loss=0.0987, pruned_loss=0.02055, audio_tagging_loss=0.01996, over 682285.44 frames. ], batch size: 57, lr: 5.42e-03, grad_scale: 16.0 2023-11-20 05:41:10,083 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=6.48 vs. 
2023-11-20 05:41:15,496 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=962220.0, ans=0.125 2023-11-20 05:41:20,558 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=962286.6666666666, ans=0.125 2023-11-20 05:41:24,828 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=962286.6666666666, ans=0.2 2023-11-20 05:41:28,949 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 144350 2023-11-20 05:41:34,038 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=962353.3333333334, ans=0.125 2023-11-20 05:41:55,588 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=962420.0, ans=0.125 2023-11-20 05:42:04,124 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.min_abs, batch_count=962486.6666666666, ans=0.5 2023-11-20 05:42:05,447 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=962486.6666666666, ans=0.125 2023-11-20 05:42:05,510 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=962486.6666666666, ans=0.125 2023-11-20 05:42:10,335 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=962486.6666666666, ans=0.0 2023-11-20 05:42:12,680 INFO [train_asr.py:1262] (1/4) Epoch 13, batch 100, loss[loss=0.1047, simple_loss=0.1231, pruned_loss=0.0261, audio_tagging_loss=0.01705, over 15635.00 frames. ], tot_loss[loss=0.0906, simple_loss=0.1017, pruned_loss=0.02073, audio_tagging_loss=0.01905, over 1214499.49 frames.
], batch size: 55, lr: 5.42e-03, grad_scale: 16.0 2023-11-20 05:42:19,185 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=962553.3333333334, ans=0.1 2023-11-20 05:42:21,355 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.568e+01 8.954e+01 9.518e+01 1.027e+02 1.327e+02, threshold=1.904e+02, percent-clipped=0.0 2023-11-20 05:42:24,034 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=962553.3333333334, ans=0.125 2023-11-20 05:42:33,883 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 144400 2023-11-20 05:42:46,751 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=962686.6666666666, ans=0.125 2023-11-20 05:42:55,436 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=962753.3333333334, ans=0.2 2023-11-20 05:42:56,856 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=962753.3333333334, ans=0.125 2023-11-20 05:43:06,914 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=962820.0, ans=0.125 2023-11-20 05:43:09,277 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=962820.0, ans=0.2 2023-11-20 05:43:15,352 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.min_abs, batch_count=962820.0, ans=0.5 2023-11-20 05:43:18,744 INFO [train_asr.py:1262] (1/4) Epoch 13, batch 150, loss[loss=0.08532, simple_loss=0.09436, pruned_loss=0.02065, audio_tagging_loss=0.01749, over 14787.00 frames. ], tot_loss[loss=0.08786, simple_loss=0.1012, pruned_loss=0.0202, audio_tagging_loss=0.01704, over 1614342.51 frames. ], batch size: 56, lr: 5.42e-03, grad_scale: 16.0 2023-11-20 05:43:38,541 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 144450 2023-11-20 05:43:41,061 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=962953.3333333334, ans=0.125 2023-11-20 05:43:48,706 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=963020.0, ans=0.125 2023-11-20 05:43:49,858 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=963020.0, ans=0.0 2023-11-20 05:44:21,033 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=963153.3333333334, ans=0.125 2023-11-20 05:44:24,348 INFO [train_asr.py:1262] (1/4) Epoch 13, batch 200, loss[loss=0.06763, simple_loss=0.08237, pruned_loss=0.01506, audio_tagging_loss=0.01138, over 15186.00 frames. ], tot_loss[loss=0.08716, simple_loss=0.1031, pruned_loss=0.02081, audio_tagging_loss=0.01479, over 1929425.80 frames. 
], batch size: 57, lr: 5.42e-03, grad_scale: 16.0 2023-11-20 05:44:29,633 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=963220.0, ans=0.125 2023-11-20 05:44:31,991 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.261e+01 8.317e+01 9.168e+01 9.939e+01 1.407e+02, threshold=1.834e+02, percent-clipped=0.0 2023-11-20 05:44:36,007 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=963286.6666666666, ans=0.2 2023-11-20 05:44:37,188 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=963286.6666666666, ans=0.125 2023-11-20 05:44:43,901 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 144500 2023-11-20 05:45:19,956 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.conv_module1.whiten, num_groups=1, num_channels=192, metric=14.23 vs. limit=15.0 2023-11-20 05:45:28,844 INFO [train_asr.py:1262] (1/4) Epoch 13, batch 250, loss[loss=0.08461, simple_loss=0.1146, pruned_loss=0.0198, audio_tagging_loss=0.007533, over 15279.00 frames. ], tot_loss[loss=0.0863, simple_loss=0.104, pruned_loss=0.02088, audio_tagging_loss=0.01343, over 2185174.13 frames. ], batch size: 59, lr: 5.42e-03, grad_scale: 16.0 2023-11-20 05:45:30,294 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=963553.3333333334, ans=0.125 2023-11-20 05:45:41,443 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=963620.0, ans=0.125 2023-11-20 05:45:43,119 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=21.40 vs. limit=22.5 2023-11-20 05:45:49,365 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 144550 2023-11-20 05:45:54,849 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=12.16 vs. limit=15.0 2023-11-20 05:46:06,934 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.out_whiten.whitening_limit, batch_count=963686.6666666666, ans=15.0 2023-11-20 05:46:10,339 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=963753.3333333334, ans=0.125 2023-11-20 05:46:10,475 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=8.76 vs. limit=15.0 2023-11-20 05:46:14,490 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.whiten, num_groups=1, num_channels=192, metric=3.37 vs. limit=12.0 2023-11-20 05:46:18,254 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.self_attn1.whiten, num_groups=1, num_channels=192, metric=15.08 vs. 
limit=22.5 2023-11-20 05:46:23,059 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=963820.0, ans=0.0 2023-11-20 05:46:32,357 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=963820.0, ans=0.125 2023-11-20 05:46:32,630 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=11.95 vs. limit=15.0 2023-11-20 05:46:34,394 INFO [train_asr.py:1262] (1/4) Epoch 13, batch 300, loss[loss=0.07573, simple_loss=0.09472, pruned_loss=0.01755, audio_tagging_loss=0.01082, over 14865.00 frames. ], tot_loss[loss=0.08473, simple_loss=0.1033, pruned_loss=0.02067, audio_tagging_loss=0.01241, over 2378950.25 frames. ], batch size: 54, lr: 5.42e-03, grad_scale: 16.0 2023-11-20 05:46:42,581 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.680e+01 8.488e+01 9.150e+01 9.824e+01 1.478e+02, threshold=1.830e+02, percent-clipped=0.0 2023-11-20 05:46:54,346 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 144600 2023-11-20 05:47:40,289 INFO [train_asr.py:1262] (1/4) Epoch 13, batch 350, loss[loss=0.07172, simple_loss=0.08119, pruned_loss=0.01928, audio_tagging_loss=0.01185, over 13998.00 frames. ], tot_loss[loss=0.08449, simple_loss=0.104, pruned_loss=0.0208, audio_tagging_loss=0.01168, over 2526326.84 frames. ], batch size: 54, lr: 5.42e-03, grad_scale: 16.0 2023-11-20 05:47:48,139 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=13.41 vs. limit=15.0 2023-11-20 05:47:49,262 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=964220.0, ans=0.125 2023-11-20 05:47:58,736 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 144650 2023-11-20 05:48:23,446 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.whiten, num_groups=1, num_channels=512, metric=9.62 vs. limit=12.0 2023-11-20 05:48:31,042 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=964486.6666666666, ans=0.125 2023-11-20 05:48:41,423 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=11.12 vs. limit=15.0 2023-11-20 05:48:44,434 INFO [train_asr.py:1262] (1/4) Epoch 13, batch 400, loss[loss=0.07874, simple_loss=0.09992, pruned_loss=0.0176, audio_tagging_loss=0.01118, over 15515.00 frames. ], tot_loss[loss=0.08384, simple_loss=0.1037, pruned_loss=0.02072, audio_tagging_loss=0.01126, over 2640122.39 frames. 
], batch size: 57, lr: 5.42e-03, grad_scale: 32.0 2023-11-20 05:48:52,440 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.722e+01 8.150e+01 8.876e+01 9.638e+01 1.255e+02, threshold=1.775e+02, percent-clipped=0.0 2023-11-20 05:48:56,918 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=964620.0, ans=0.125 2023-11-20 05:49:00,579 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=964620.0, ans=0.125 2023-11-20 05:49:04,171 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 144700 2023-11-20 05:49:21,902 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.max_abs, batch_count=964686.6666666666, ans=10.0 2023-11-20 05:49:24,263 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=964753.3333333334, ans=0.0 2023-11-20 05:49:40,766 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=964820.0, ans=0.125 2023-11-20 05:49:49,696 INFO [train_asr.py:1262] (1/4) Epoch 13, batch 450, loss[loss=0.08089, simple_loss=0.09565, pruned_loss=0.02557, audio_tagging_loss=0.007492, over 15981.00 frames. ], tot_loss[loss=0.08341, simple_loss=0.1035, pruned_loss=0.02079, audio_tagging_loss=0.0109, over 2730703.28 frames. ], batch size: 61, lr: 5.42e-03, grad_scale: 32.0 2023-11-20 05:50:06,690 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=964953.3333333334, ans=0.125 2023-11-20 05:50:08,938 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 144750 2023-11-20 05:50:25,111 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.whiten, num_groups=1, num_channels=192, metric=3.54 vs. limit=12.0 2023-11-20 05:50:37,274 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.89 vs. limit=6.0 2023-11-20 05:50:54,224 INFO [train_asr.py:1262] (1/4) Epoch 13, batch 500, loss[loss=0.07787, simple_loss=0.09412, pruned_loss=0.02015, audio_tagging_loss=0.01066, over 15387.00 frames. ], tot_loss[loss=0.08388, simple_loss=0.1044, pruned_loss=0.02109, audio_tagging_loss=0.0106, over 2803762.44 frames. ], batch size: 58, lr: 5.42e-03, grad_scale: 32.0 2023-11-20 05:51:02,247 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.903e+01 8.170e+01 8.806e+01 9.563e+01 1.167e+02, threshold=1.761e+02, percent-clipped=0.0 2023-11-20 05:51:10,031 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=965286.6666666666, ans=0.1 2023-11-20 05:51:13,708 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 144800 2023-11-20 05:51:22,011 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=512, metric=21.19 vs. 
limit=22.5 2023-11-20 05:51:25,293 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=965353.3333333334, ans=0.2 2023-11-20 05:51:25,311 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=965353.3333333334, ans=0.125 2023-11-20 05:51:25,424 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=965353.3333333334, ans=0.0 2023-11-20 05:51:38,331 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=3.72 vs. limit=10.0 2023-11-20 05:51:41,619 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=965420.0, ans=0.125 2023-11-20 05:51:47,939 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=965486.6666666666, ans=0.1 2023-11-20 05:51:48,031 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=965486.6666666666, ans=0.0 2023-11-20 05:51:52,280 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=965486.6666666666, ans=0.0 2023-11-20 05:51:58,868 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=11.72 vs. limit=15.0 2023-11-20 05:51:59,459 INFO [train_asr.py:1262] (1/4) Epoch 13, batch 550, loss[loss=0.07679, simple_loss=0.1003, pruned_loss=0.01772, audio_tagging_loss=0.008941, over 14941.00 frames. ], tot_loss[loss=0.08356, simple_loss=0.1038, pruned_loss=0.02108, audio_tagging_loss=0.01056, over 2856031.63 frames. ], batch size: 57, lr: 5.41e-03, grad_scale: 32.0 2023-11-20 05:52:02,309 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=965553.3333333334, ans=0.125 2023-11-20 05:52:19,370 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 144850 2023-11-20 05:52:42,073 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=965753.3333333334, ans=0.0 2023-11-20 05:52:42,130 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=965753.3333333334, ans=0.2 2023-11-20 05:52:46,908 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=10.41 vs. limit=15.0 2023-11-20 05:53:04,623 INFO [train_asr.py:1262] (1/4) Epoch 13, batch 600, loss[loss=0.07817, simple_loss=0.104, pruned_loss=0.01817, audio_tagging_loss=0.00801, over 15828.00 frames. ], tot_loss[loss=0.08277, simple_loss=0.1029, pruned_loss=0.02078, audio_tagging_loss=0.01052, over 2896437.79 frames. 
], batch size: 58, lr: 5.41e-03, grad_scale: 32.0 2023-11-20 05:53:04,966 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=965886.6666666666, ans=0.125 2023-11-20 05:53:07,383 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=965886.6666666666, ans=0.125 2023-11-20 05:53:12,773 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.448e+01 8.106e+01 8.665e+01 9.302e+01 1.312e+02, threshold=1.733e+02, percent-clipped=0.0 2023-11-20 05:53:18,098 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=965953.3333333334, ans=0.2 2023-11-20 05:53:24,008 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 144900 2023-11-20 05:53:29,046 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.min_positive, batch_count=966020.0, ans=0.05 2023-11-20 05:53:38,542 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=966020.0, ans=0.07 2023-11-20 05:53:38,663 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=966020.0, ans=0.125 2023-11-20 05:53:43,754 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=966086.6666666666, ans=0.07 2023-11-20 05:53:48,724 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten1.whitening_limit, batch_count=966086.6666666666, ans=10.0 2023-11-20 05:53:53,882 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=966086.6666666666, ans=0.125 2023-11-20 05:53:57,505 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=966153.3333333334, ans=0.125 2023-11-20 05:54:10,278 INFO [train_asr.py:1262] (1/4) Epoch 13, batch 650, loss[loss=0.06887, simple_loss=0.08797, pruned_loss=0.01821, audio_tagging_loss=0.006672, over 14619.00 frames. ], tot_loss[loss=0.08257, simple_loss=0.103, pruned_loss=0.02073, audio_tagging_loss=0.01033, over 2931209.37 frames. ], batch size: 55, lr: 5.41e-03, grad_scale: 32.0 2023-11-20 05:54:29,349 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 144950 2023-11-20 05:54:44,944 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.91 vs. limit=6.0 2023-11-20 05:55:08,557 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=966486.6666666666, ans=0.125 2023-11-20 05:55:10,926 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=966486.6666666666, ans=0.0 2023-11-20 05:55:14,356 INFO [train_asr.py:1262] (1/4) Epoch 13, batch 700, loss[loss=0.09843, simple_loss=0.1178, pruned_loss=0.02836, audio_tagging_loss=0.01119, over 14553.00 frames. ], tot_loss[loss=0.08317, simple_loss=0.1042, pruned_loss=0.02086, audio_tagging_loss=0.01022, over 2959928.91 frames. 
], batch size: 54, lr: 5.41e-03, grad_scale: 32.0 2023-11-20 05:55:14,565 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=966553.3333333334, ans=0.125 2023-11-20 05:55:22,270 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.647e+01 8.027e+01 8.645e+01 9.342e+01 1.133e+02, threshold=1.729e+02, percent-clipped=0.0 2023-11-20 05:55:34,329 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=11.01 vs. limit=15.0 2023-11-20 05:55:34,897 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 145000 2023-11-20 05:55:47,280 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=966686.6666666666, ans=0.1 2023-11-20 05:55:50,223 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=5.94 vs. limit=15.0 2023-11-20 05:55:50,838 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=966686.6666666666, ans=0.1 2023-11-20 05:55:52,479 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=512, metric=20.88 vs. limit=22.5 2023-11-20 05:55:57,575 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.01 vs. limit=6.0 2023-11-20 05:56:04,630 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=966753.3333333334, ans=0.125 2023-11-20 05:56:21,155 INFO [train_asr.py:1262] (1/4) Epoch 13, batch 750, loss[loss=0.08193, simple_loss=0.1024, pruned_loss=0.01849, audio_tagging_loss=0.01223, over 14913.00 frames. ], tot_loss[loss=0.08257, simple_loss=0.1034, pruned_loss=0.02063, audio_tagging_loss=0.01023, over 2984213.17 frames. ], batch size: 57, lr: 5.41e-03, grad_scale: 32.0 2023-11-20 05:56:28,817 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=966886.6666666666, ans=0.0 2023-11-20 05:56:39,505 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=966953.3333333334, ans=0.1 2023-11-20 05:56:40,397 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 145050 2023-11-20 05:56:46,142 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=11.13 vs. limit=15.0 2023-11-20 05:56:57,943 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=967086.6666666666, ans=0.1 2023-11-20 05:57:10,891 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=967086.6666666666, ans=0.2 2023-11-20 05:57:13,707 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=10.99 vs. limit=15.0 2023-11-20 05:57:25,794 INFO [train_asr.py:1262] (1/4) Epoch 13, batch 800, loss[loss=0.08381, simple_loss=0.1036, pruned_loss=0.02161, audio_tagging_loss=0.0104, over 14894.00 frames. 
], tot_loss[loss=0.08335, simple_loss=0.1039, pruned_loss=0.02105, audio_tagging_loss=0.01036, over 2999954.52 frames. ], batch size: 55, lr: 5.41e-03, grad_scale: 32.0 2023-11-20 05:57:30,189 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.18 vs. limit=15.0 2023-11-20 05:57:33,180 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.560e+01 8.396e+01 9.088e+01 9.762e+01 1.189e+02, threshold=1.818e+02, percent-clipped=0.0 2023-11-20 05:57:36,038 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=967220.0, ans=0.1 2023-11-20 05:57:41,291 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=11.20 vs. limit=15.0 2023-11-20 05:57:43,770 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module1.whiten, num_groups=1, num_channels=512, metric=5.15 vs. limit=15.0 2023-11-20 05:57:44,468 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 145100 2023-11-20 05:58:04,923 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=967420.0, ans=0.1 2023-11-20 05:58:10,674 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=5.84 vs. limit=10.0 2023-11-20 05:58:14,372 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=12.42 vs. limit=15.0 2023-11-20 05:58:21,309 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=967486.6666666666, ans=0.125 2023-11-20 05:58:29,583 INFO [train_asr.py:1262] (1/4) Epoch 13, batch 850, loss[loss=0.0725, simple_loss=0.09045, pruned_loss=0.01654, audio_tagging_loss=0.01073, over 15348.00 frames. ], tot_loss[loss=0.08353, simple_loss=0.1042, pruned_loss=0.02105, audio_tagging_loss=0.01038, over 3010047.42 frames. ], batch size: 59, lr: 5.41e-03, grad_scale: 32.0 2023-11-20 05:58:37,478 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=7.24 vs. 
limit=15.0 2023-11-20 05:58:38,644 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=967553.3333333334, ans=0.0 2023-11-20 05:58:49,646 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 145150 2023-11-20 05:59:05,324 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=967686.6666666666, ans=0.125 2023-11-20 05:59:10,150 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=967753.3333333334, ans=0.1 2023-11-20 05:59:12,757 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=967753.3333333334, ans=0.125 2023-11-20 05:59:15,375 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=967753.3333333334, ans=0.125 2023-11-20 05:59:23,160 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=8.28 vs. limit=10.0 2023-11-20 05:59:30,987 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=967820.0, ans=0.125 2023-11-20 05:59:31,214 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=10.48 vs. limit=15.0 2023-11-20 05:59:32,015 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=967820.0, ans=0.125 2023-11-20 05:59:32,178 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=967820.0, ans=0.0 2023-11-20 05:59:34,787 INFO [train_asr.py:1262] (1/4) Epoch 13, batch 900, loss[loss=0.07806, simple_loss=0.1043, pruned_loss=0.01621, audio_tagging_loss=0.009717, over 15908.00 frames. ], tot_loss[loss=0.08355, simple_loss=0.104, pruned_loss=0.02109, audio_tagging_loss=0.01046, over 3024328.21 frames. ], batch size: 60, lr: 5.41e-03, grad_scale: 32.0 2023-11-20 05:59:42,652 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 5.961e+01 8.143e+01 8.772e+01 9.521e+01 1.429e+02, threshold=1.754e+02, percent-clipped=0.0 2023-11-20 05:59:48,476 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=967953.3333333334, ans=0.125 2023-11-20 05:59:54,307 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 145200 2023-11-20 06:00:22,728 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=968086.6666666666, ans=0.125 2023-11-20 06:00:40,225 INFO [train_asr.py:1262] (1/4) Epoch 13, batch 950, loss[loss=0.07755, simple_loss=0.09346, pruned_loss=0.02152, audio_tagging_loss=0.0093, over 14803.00 frames. ], tot_loss[loss=0.08272, simple_loss=0.103, pruned_loss=0.02082, audio_tagging_loss=0.01039, over 3025112.31 frames. ], batch size: 57, lr: 5.41e-03, grad_scale: 32.0 2023-11-20 06:00:44,611 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=8.59 vs. 
limit=15.0 2023-11-20 06:00:50,127 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=968220.0, ans=0.015 2023-11-20 06:00:58,562 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 145250 2023-11-20 06:01:06,337 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=968353.3333333334, ans=0.0 2023-11-20 06:01:10,675 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=968353.3333333334, ans=0.125 2023-11-20 06:01:16,111 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=968353.3333333334, ans=0.0 2023-11-20 06:01:17,506 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=968420.0, ans=0.0 2023-11-20 06:01:37,474 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=968486.6666666666, ans=0.1 2023-11-20 06:01:43,370 INFO [train_asr.py:1262] (1/4) Epoch 13, batch 1000, loss[loss=0.07631, simple_loss=0.09525, pruned_loss=0.01832, audio_tagging_loss=0.01036, over 15821.00 frames. ], tot_loss[loss=0.08239, simple_loss=0.1029, pruned_loss=0.02075, audio_tagging_loss=0.0102, over 3029370.42 frames. ], batch size: 57, lr: 5.41e-03, grad_scale: 16.0 2023-11-20 06:01:50,069 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=16.07 vs. limit=22.5 2023-11-20 06:01:51,987 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.711e+01 7.989e+01 8.669e+01 9.928e+01 1.196e+02, threshold=1.734e+02, percent-clipped=0.0 2023-11-20 06:01:59,277 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=968620.0, ans=0.125 2023-11-20 06:02:03,488 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 145300 2023-11-20 06:02:04,931 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=968620.0, ans=0.125 2023-11-20 06:02:05,320 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=18.69 vs. limit=22.5 2023-11-20 06:02:10,070 WARNING [train_asr.py:1506] (1/4) Exclude cut with ID unbalanced/5Y6u9AlD9S0_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-20 06:02:20,941 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=968686.6666666666, ans=0.04949747468305833 2023-11-20 06:02:48,448 INFO [train_asr.py:1262] (1/4) Epoch 13, batch 1050, loss[loss=0.08098, simple_loss=0.09979, pruned_loss=0.0193, audio_tagging_loss=0.01178, over 14002.00 frames. ], tot_loss[loss=0.08173, simple_loss=0.102, pruned_loss=0.02051, audio_tagging_loss=0.01023, over 3038271.78 frames. 
], batch size: 55, lr: 5.41e-03, grad_scale: 16.0 2023-11-20 06:02:54,700 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=9.62 vs. limit=15.0 2023-11-20 06:03:09,045 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 145350 2023-11-20 06:03:16,837 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=969020.0, ans=0.04949747468305833 2023-11-20 06:03:17,085 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=969020.0, ans=0.2 2023-11-20 06:03:28,199 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=969086.6666666666, ans=0.1 2023-11-20 06:03:31,904 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=969086.6666666666, ans=0.125 2023-11-20 06:03:48,465 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=969153.3333333334, ans=0.0 2023-11-20 06:03:54,966 INFO [train_asr.py:1262] (1/4) Epoch 13, batch 1100, loss[loss=0.09671, simple_loss=0.1304, pruned_loss=0.02447, audio_tagging_loss=0.007053, over 16538.00 frames. ], tot_loss[loss=0.08099, simple_loss=0.1012, pruned_loss=0.02025, audio_tagging_loss=0.01014, over 3044358.00 frames. ], batch size: 58, lr: 5.40e-03, grad_scale: 16.0 2023-11-20 06:03:57,477 WARNING [train_asr.py:1506] (1/4) Exclude cut with ID unbalanced/AWHnJAqurec_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-20 06:04:03,628 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.580e+01 8.161e+01 8.676e+01 9.552e+01 1.363e+02, threshold=1.735e+02, percent-clipped=0.0 2023-11-20 06:04:13,719 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 145400 2023-11-20 06:04:15,129 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=969286.6666666666, ans=0.125 2023-11-20 06:04:32,721 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.max_positive, batch_count=969420.0, ans=0.95 2023-11-20 06:04:48,661 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=969486.6666666666, ans=0.0 2023-11-20 06:04:59,686 INFO [train_asr.py:1262] (1/4) Epoch 13, batch 1150, loss[loss=0.08007, simple_loss=0.1038, pruned_loss=0.019, audio_tagging_loss=0.009194, over 15398.00 frames. ], tot_loss[loss=0.08153, simple_loss=0.1019, pruned_loss=0.02058, audio_tagging_loss=0.009981, over 3044654.61 frames. 
], batch size: 61, lr: 5.40e-03, grad_scale: 16.0 2023-11-20 06:05:18,979 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 145450 2023-11-20 06:05:40,752 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=969753.3333333334, ans=0.07 2023-11-20 06:05:41,920 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=969753.3333333334, ans=0.125 2023-11-20 06:05:53,414 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=7.01 vs. limit=12.0 2023-11-20 06:05:54,323 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=969820.0, ans=0.125 2023-11-20 06:05:55,652 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=969820.0, ans=0.2 2023-11-20 06:06:03,728 INFO [train_asr.py:1262] (1/4) Epoch 13, batch 1200, loss[loss=0.06463, simple_loss=0.08108, pruned_loss=0.01571, audio_tagging_loss=0.008385, over 14898.00 frames. ], tot_loss[loss=0.0813, simple_loss=0.1014, pruned_loss=0.02067, audio_tagging_loss=0.009922, over 3048617.35 frames. ], batch size: 53, lr: 5.40e-03, grad_scale: 32.0 2023-11-20 06:06:08,352 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=969886.6666666666, ans=0.04949747468305833 2023-11-20 06:06:13,709 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.636e+01 8.393e+01 8.955e+01 9.874e+01 3.263e+02, threshold=1.791e+02, percent-clipped=1.0 2023-11-20 06:06:14,117 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=969886.6666666666, ans=0.125 2023-11-20 06:06:16,502 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=969953.3333333334, ans=0.125 2023-11-20 06:06:24,485 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 145500 2023-11-20 06:06:38,486 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=970020.0, ans=0.0 2023-11-20 06:06:56,772 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.self_attn1.whiten, num_groups=1, num_channels=192, metric=11.84 vs. limit=22.5 2023-11-20 06:07:05,409 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=970153.3333333334, ans=0.1 2023-11-20 06:07:09,464 INFO [train_asr.py:1262] (1/4) Epoch 13, batch 1250, loss[loss=0.09733, simple_loss=0.1329, pruned_loss=0.02274, audio_tagging_loss=0.008141, over 15761.00 frames. ], tot_loss[loss=0.08102, simple_loss=0.1015, pruned_loss=0.02042, audio_tagging_loss=0.00985, over 3049626.53 frames. 
], batch size: 58, lr: 5.40e-03, grad_scale: 32.0 2023-11-20 06:07:12,751 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=970220.0, ans=0.125 2023-11-20 06:07:14,169 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=970220.0, ans=0.2 2023-11-20 06:07:16,600 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=970220.0, ans=0.125 2023-11-20 06:07:18,078 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.75 vs. limit=6.0 2023-11-20 06:07:28,460 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 145550 2023-11-20 06:07:33,438 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-20 06:08:01,116 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=970486.6666666666, ans=0.0 2023-11-20 06:08:07,667 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=970486.6666666666, ans=0.125 2023-11-20 06:08:08,939 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=970486.6666666666, ans=0.125 2023-11-20 06:08:13,557 INFO [train_asr.py:1262] (1/4) Epoch 13, batch 1300, loss[loss=0.07726, simple_loss=0.1077, pruned_loss=0.01633, audio_tagging_loss=0.007066, over 15957.00 frames. ], tot_loss[loss=0.08079, simple_loss=0.1011, pruned_loss=0.02033, audio_tagging_loss=0.00992, over 3043732.59 frames. ], batch size: 57, lr: 5.40e-03, grad_scale: 32.0 2023-11-20 06:08:22,205 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.703e+01 8.020e+01 8.850e+01 9.786e+01 1.232e+02, threshold=1.770e+02, percent-clipped=0.0 2023-11-20 06:08:32,727 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 145600 2023-11-20 06:09:02,139 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=970753.3333333334, ans=0.0 2023-11-20 06:09:17,773 INFO [train_asr.py:1262] (1/4) Epoch 13, batch 1350, loss[loss=0.09439, simple_loss=0.1238, pruned_loss=0.02297, audio_tagging_loss=0.009501, over 15161.00 frames. ], tot_loss[loss=0.08071, simple_loss=0.1012, pruned_loss=0.02018, audio_tagging_loss=0.009927, over 3040379.82 frames. ], batch size: 54, lr: 5.40e-03, grad_scale: 32.0 2023-11-20 06:09:22,726 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=2.33 vs. limit=15.0 2023-11-20 06:09:23,637 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=970886.6666666666, ans=0.125 2023-11-20 06:09:37,906 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 145650 2023-11-20 06:09:38,722 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=8.98 vs. 
limit=15.0 2023-11-20 06:09:39,475 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.min_positive, batch_count=970953.3333333334, ans=0.05 2023-11-20 06:09:41,978 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=970953.3333333334, ans=0.1 2023-11-20 06:09:50,235 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=9.17 vs. limit=15.0 2023-11-20 06:10:03,334 WARNING [train_asr.py:1506] (1/4) Exclude cut with ID unbalanced/XdmbboqRBmQ_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-20 06:10:22,940 INFO [train_asr.py:1262] (1/4) Epoch 13, batch 1400, loss[loss=0.09296, simple_loss=0.1226, pruned_loss=0.02207, audio_tagging_loss=0.00956, over 13489.00 frames. ], tot_loss[loss=0.08121, simple_loss=0.1016, pruned_loss=0.02036, audio_tagging_loss=0.01003, over 3034500.39 frames. ], batch size: 53, lr: 5.40e-03, grad_scale: 32.0 2023-11-20 06:10:32,226 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.083e+01 8.085e+01 8.864e+01 9.771e+01 1.469e+02, threshold=1.773e+02, percent-clipped=0.0 2023-11-20 06:10:42,854 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 145700 2023-11-20 06:10:51,930 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=11.42 vs. limit=15.0 2023-11-20 06:11:07,716 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=971420.0, ans=0.0 2023-11-20 06:11:15,938 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.max_abs, batch_count=971486.6666666666, ans=10.0 2023-11-20 06:11:23,553 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=11.20 vs. limit=22.5 2023-11-20 06:11:28,529 INFO [train_asr.py:1262] (1/4) Epoch 13, batch 1450, loss[loss=0.07525, simple_loss=0.08534, pruned_loss=0.02029, audio_tagging_loss=0.01229, over 14963.00 frames. ], tot_loss[loss=0.08141, simple_loss=0.1016, pruned_loss=0.02059, audio_tagging_loss=0.01002, over 3044271.33 frames. ], batch size: 56, lr: 5.40e-03, grad_scale: 32.0 2023-11-20 06:11:43,969 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.whiten, num_groups=1, num_channels=192, metric=4.24 vs. limit=12.0 2023-11-20 06:11:46,892 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 145750 2023-11-20 06:12:02,192 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=8.14 vs. 
limit=10.0 2023-11-20 06:12:06,065 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=971753.3333333334, ans=0.125 2023-11-20 06:12:20,316 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=971820.0, ans=0.0 2023-11-20 06:12:32,269 INFO [train_asr.py:1262] (1/4) Epoch 13, batch 1500, loss[loss=0.07182, simple_loss=0.08381, pruned_loss=0.01779, audio_tagging_loss=0.01213, over 15581.00 frames. ], tot_loss[loss=0.08186, simple_loss=0.1021, pruned_loss=0.02071, audio_tagging_loss=0.01012, over 3042128.26 frames. ], batch size: 60, lr: 5.40e-03, grad_scale: 32.0 2023-11-20 06:12:36,227 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=971886.6666666666, ans=0.1 2023-11-20 06:12:41,375 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.185e+01 8.333e+01 8.756e+01 9.459e+01 1.276e+02, threshold=1.751e+02, percent-clipped=0.0 2023-11-20 06:12:51,850 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 145800 2023-11-20 06:13:30,857 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=972153.3333333334, ans=0.1 2023-11-20 06:13:37,303 INFO [train_asr.py:1262] (1/4) Epoch 13, batch 1550, loss[loss=0.1033, simple_loss=0.1207, pruned_loss=0.02934, audio_tagging_loss=0.01364, over 14522.00 frames. ], tot_loss[loss=0.08168, simple_loss=0.1014, pruned_loss=0.02071, audio_tagging_loss=0.01026, over 3036903.95 frames. ], batch size: 55, lr: 5.40e-03, grad_scale: 32.0 2023-11-20 06:13:56,452 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 145850 2023-11-20 06:13:58,420 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=9.98 vs. limit=15.0 2023-11-20 06:14:06,445 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.15 vs. limit=6.0 2023-11-20 06:14:22,679 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=972420.0, ans=0.2 2023-11-20 06:14:36,202 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=972486.6666666666, ans=0.125 2023-11-20 06:14:42,104 INFO [train_asr.py:1262] (1/4) Epoch 13, batch 1600, loss[loss=0.07459, simple_loss=0.08981, pruned_loss=0.01809, audio_tagging_loss=0.0116, over 14357.00 frames. ], tot_loss[loss=0.08181, simple_loss=0.1018, pruned_loss=0.02069, audio_tagging_loss=0.01019, over 3041810.38 frames. 
], batch size: 55, lr: 5.40e-03, grad_scale: 32.0 2023-11-20 06:14:47,307 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=972553.3333333334, ans=0.0 2023-11-20 06:14:48,456 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=972553.3333333334, ans=0.0 2023-11-20 06:14:51,235 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.806e+01 8.279e+01 8.833e+01 9.772e+01 1.298e+02, threshold=1.767e+02, percent-clipped=0.0 2023-11-20 06:15:01,488 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 145900 2023-11-20 06:15:26,833 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=972753.3333333334, ans=0.1 2023-11-20 06:15:46,793 INFO [train_asr.py:1262] (1/4) Epoch 13, batch 1650, loss[loss=0.07257, simple_loss=0.09048, pruned_loss=0.01692, audio_tagging_loss=0.01041, over 14939.00 frames. ], tot_loss[loss=0.08108, simple_loss=0.1005, pruned_loss=0.0205, audio_tagging_loss=0.01033, over 3043301.45 frames. ], batch size: 56, lr: 5.39e-03, grad_scale: 16.0 2023-11-20 06:15:54,642 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=972886.6666666666, ans=0.125 2023-11-20 06:16:06,533 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 145950 2023-11-20 06:16:42,059 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.conv_module1.whiten, num_groups=1, num_channels=192, metric=13.84 vs. limit=15.0 2023-11-20 06:16:51,539 INFO [train_asr.py:1262] (1/4) Epoch 13, batch 1700, loss[loss=0.08333, simple_loss=0.1075, pruned_loss=0.0199, audio_tagging_loss=0.009652, over 15698.00 frames. ], tot_loss[loss=0.08101, simple_loss=0.1006, pruned_loss=0.02039, audio_tagging_loss=0.01031, over 3045151.94 frames. ], batch size: 56, lr: 5.39e-03, grad_scale: 16.0 2023-11-20 06:17:02,509 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.898e+01 8.052e+01 8.811e+01 9.587e+01 1.278e+02, threshold=1.762e+02, percent-clipped=0.0 2023-11-20 06:17:05,927 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=9.34 vs. limit=15.0 2023-11-20 06:17:11,368 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 146000 2023-11-20 06:17:25,780 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=973353.3333333334, ans=0.125 2023-11-20 06:17:53,076 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=973486.6666666666, ans=0.0 2023-11-20 06:17:54,227 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=973486.6666666666, ans=0.125 2023-11-20 06:17:56,479 INFO [train_asr.py:1262] (1/4) Epoch 13, batch 1750, loss[loss=0.06391, simple_loss=0.08094, pruned_loss=0.01569, audio_tagging_loss=0.007751, over 14845.00 frames. ], tot_loss[loss=0.08076, simple_loss=0.1005, pruned_loss=0.02028, audio_tagging_loss=0.01025, over 3052979.99 frames. ], batch size: 57, lr: 5.39e-03, grad_scale: 16.0 2023-11-20 06:17:58,314 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=7.79 vs. 
limit=10.0 2023-11-20 06:18:02,779 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=973553.3333333334, ans=0.0 2023-11-20 06:18:15,681 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 146050 2023-11-20 06:18:46,114 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=973753.3333333334, ans=0.0 2023-11-20 06:18:47,224 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=973820.0, ans=0.0 2023-11-20 06:18:53,395 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=973820.0, ans=0.125 2023-11-20 06:19:00,649 INFO [train_asr.py:1262] (1/4) Epoch 13, batch 1800, loss[loss=0.08149, simple_loss=0.1067, pruned_loss=0.02203, audio_tagging_loss=0.006108, over 13968.00 frames. ], tot_loss[loss=0.08103, simple_loss=0.1011, pruned_loss=0.0204, audio_tagging_loss=0.01009, over 3050694.85 frames. ], batch size: 53, lr: 5.39e-03, grad_scale: 16.0 2023-11-20 06:19:09,287 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=973886.6666666666, ans=0.1 2023-11-20 06:19:11,427 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.956e+01 8.340e+01 8.928e+01 9.770e+01 2.074e+02, threshold=1.786e+02, percent-clipped=1.0 2023-11-20 06:19:21,323 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 146100 2023-11-20 06:19:34,348 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=974020.0, ans=0.2 2023-11-20 06:19:43,013 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=974086.6666666666, ans=0.125 2023-11-20 06:19:57,075 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=974153.3333333334, ans=0.1 2023-11-20 06:20:06,130 INFO [train_asr.py:1262] (1/4) Epoch 13, batch 1850, loss[loss=0.07449, simple_loss=0.09595, pruned_loss=0.0182, audio_tagging_loss=0.008317, over 15110.00 frames. ], tot_loss[loss=0.08118, simple_loss=0.1017, pruned_loss=0.02038, audio_tagging_loss=0.009966, over 3049824.50 frames. ], batch size: 58, lr: 5.39e-03, grad_scale: 16.0 2023-11-20 06:20:25,770 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 146150 2023-11-20 06:20:31,171 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=974353.3333333334, ans=0.125 2023-11-20 06:20:38,457 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=974353.3333333334, ans=0.125 2023-11-20 06:21:11,587 INFO [train_asr.py:1262] (1/4) Epoch 13, batch 1900, loss[loss=0.07845, simple_loss=0.1017, pruned_loss=0.01749, audio_tagging_loss=0.01013, over 16621.00 frames. ], tot_loss[loss=0.08096, simple_loss=0.1015, pruned_loss=0.02022, audio_tagging_loss=0.009987, over 3058910.56 frames. 
], batch size: 63, lr: 5.39e-03, grad_scale: 16.0 2023-11-20 06:21:16,554 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=974553.3333333334, ans=0.125 2023-11-20 06:21:21,312 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.109e+01 8.191e+01 8.714e+01 9.670e+01 1.123e+02, threshold=1.743e+02, percent-clipped=0.0 2023-11-20 06:21:25,568 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=974620.0, ans=0.125 2023-11-20 06:21:30,132 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 146200 2023-11-20 06:22:04,133 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=974820.0, ans=0.2 2023-11-20 06:22:11,429 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=974820.0, ans=0.0 2023-11-20 06:22:16,072 INFO [train_asr.py:1262] (1/4) Epoch 13, batch 1950, loss[loss=0.07251, simple_loss=0.091, pruned_loss=0.01779, audio_tagging_loss=0.009221, over 14803.00 frames. ], tot_loss[loss=0.08, simple_loss=0.1002, pruned_loss=0.01997, audio_tagging_loss=0.009946, over 3054703.87 frames. ], batch size: 55, lr: 5.39e-03, grad_scale: 16.0 2023-11-20 06:22:32,617 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=12.34 vs. limit=15.0 2023-11-20 06:22:34,941 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=974953.3333333334, ans=0.0 2023-11-20 06:22:35,914 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 146250 2023-11-20 06:22:54,800 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=975086.6666666666, ans=0.125 2023-11-20 06:22:57,004 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=975086.6666666666, ans=0.0 2023-11-20 06:22:59,856 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=13.91 vs. limit=15.0 2023-11-20 06:23:21,434 INFO [train_asr.py:1262] (1/4) Epoch 13, batch 2000, loss[loss=0.0806, simple_loss=0.1034, pruned_loss=0.01777, audio_tagging_loss=0.01114, over 15476.00 frames. ], tot_loss[loss=0.07993, simple_loss=0.09987, pruned_loss=0.02, audio_tagging_loss=0.009992, over 3045554.85 frames. 
], batch size: 58, lr: 5.39e-03, grad_scale: 32.0 2023-11-20 06:23:26,615 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=975220.0, ans=0.125 2023-11-20 06:23:31,811 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.029e+01 7.759e+01 8.483e+01 9.476e+01 1.626e+02, threshold=1.697e+02, percent-clipped=0.0 2023-11-20 06:23:33,305 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=975286.6666666666, ans=0.0 2023-11-20 06:23:34,578 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=975286.6666666666, ans=0.0 2023-11-20 06:23:35,780 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=975286.6666666666, ans=0.125 2023-11-20 06:23:38,966 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=975286.6666666666, ans=0.125 2023-11-20 06:23:41,218 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 146300 2023-11-20 06:24:02,367 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=975420.0, ans=0.0 2023-11-20 06:24:26,430 INFO [train_asr.py:1262] (1/4) Epoch 13, batch 2050, loss[loss=0.07516, simple_loss=0.09166, pruned_loss=0.01612, audio_tagging_loss=0.0132, over 15536.00 frames. ], tot_loss[loss=0.08101, simple_loss=0.1013, pruned_loss=0.02034, audio_tagging_loss=0.01001, over 3047832.47 frames. ], batch size: 58, lr: 5.39e-03, grad_scale: 32.0 2023-11-20 06:24:31,718 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=975553.3333333334, ans=0.125 2023-11-20 06:24:41,501 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=975620.0, ans=0.1 2023-11-20 06:24:45,180 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 146350 2023-11-20 06:24:46,543 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=975620.0, ans=0.125 2023-11-20 06:25:03,375 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=975753.3333333334, ans=0.125 2023-11-20 06:25:30,030 INFO [train_asr.py:1262] (1/4) Epoch 13, batch 2100, loss[loss=0.05552, simple_loss=0.06129, pruned_loss=0.01284, audio_tagging_loss=0.01203, over 14260.00 frames. ], tot_loss[loss=0.08095, simple_loss=0.1015, pruned_loss=0.02022, audio_tagging_loss=0.009997, over 3052490.52 frames. 
], batch size: 56, lr: 5.39e-03, grad_scale: 32.0 2023-11-20 06:25:39,688 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.798e+01 8.216e+01 8.927e+01 9.679e+01 1.244e+02, threshold=1.785e+02, percent-clipped=0.0 2023-11-20 06:25:48,930 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 146400 2023-11-20 06:26:17,117 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=976086.6666666666, ans=0.0 2023-11-20 06:26:32,363 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_whiten.whitening_limit, batch_count=976153.3333333334, ans=15.0 2023-11-20 06:26:33,938 INFO [train_asr.py:1262] (1/4) Epoch 13, batch 2150, loss[loss=0.07111, simple_loss=0.08677, pruned_loss=0.01546, audio_tagging_loss=0.01226, over 15785.00 frames. ], tot_loss[loss=0.08025, simple_loss=0.1005, pruned_loss=0.01998, audio_tagging_loss=0.01002, over 3050250.25 frames. ], batch size: 62, lr: 5.39e-03, grad_scale: 32.0 2023-11-20 06:26:36,654 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.min_abs, batch_count=976220.0, ans=0.5 2023-11-20 06:26:50,883 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=976286.6666666666, ans=0.125 2023-11-20 06:26:55,014 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 146450 2023-11-20 06:27:12,176 WARNING [train_asr.py:1506] (1/4) Exclude cut with ID unbalanced/XkQ8YVd8u38_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-20 06:27:19,831 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=976420.0, ans=0.125 2023-11-20 06:27:30,213 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=976486.6666666666, ans=0.1 2023-11-20 06:27:36,981 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.01 vs. limit=6.0 2023-11-20 06:27:39,835 INFO [train_asr.py:1262] (1/4) Epoch 13, batch 2200, loss[loss=0.07558, simple_loss=0.09733, pruned_loss=0.01734, audio_tagging_loss=0.009574, over 15149.00 frames. ], tot_loss[loss=0.08063, simple_loss=0.101, pruned_loss=0.02011, audio_tagging_loss=0.01001, over 3049802.42 frames. ], batch size: 56, lr: 5.38e-03, grad_scale: 32.0 2023-11-20 06:27:50,510 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.751e+01 8.222e+01 8.987e+01 9.495e+01 1.215e+02, threshold=1.797e+02, percent-clipped=0.0 2023-11-20 06:27:59,383 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 146500 2023-11-20 06:28:03,444 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=18.95 vs. 
limit=22.5 2023-11-20 06:28:40,913 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=976820.0, ans=0.0 2023-11-20 06:28:44,352 INFO [train_asr.py:1262] (1/4) Epoch 13, batch 2250, loss[loss=0.09538, simple_loss=0.1168, pruned_loss=0.0272, audio_tagging_loss=0.009799, over 15625.00 frames. ], tot_loss[loss=0.08062, simple_loss=0.1007, pruned_loss=0.02021, audio_tagging_loss=0.01007, over 3054056.41 frames. ], batch size: 59, lr: 5.38e-03, grad_scale: 32.0 2023-11-20 06:28:44,729 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=976886.6666666666, ans=0.2 2023-11-20 06:28:47,482 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=512, metric=20.06 vs. limit=22.5 2023-11-20 06:28:50,911 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=976886.6666666666, ans=0.125 2023-11-20 06:28:52,643 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=6.69 vs. limit=15.0 2023-11-20 06:29:03,357 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 146550 2023-11-20 06:29:03,923 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=512, metric=4.25 vs. limit=15.0 2023-11-20 06:29:06,159 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=976953.3333333334, ans=0.1 2023-11-20 06:29:36,096 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=977153.3333333334, ans=0.1 2023-11-20 06:29:43,899 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=10.78 vs. limit=15.0 2023-11-20 06:29:48,055 INFO [train_asr.py:1262] (1/4) Epoch 13, batch 2300, loss[loss=0.08953, simple_loss=0.1077, pruned_loss=0.02357, audio_tagging_loss=0.01209, over 14601.00 frames. ], tot_loss[loss=0.08085, simple_loss=0.1011, pruned_loss=0.02016, audio_tagging_loss=0.01016, over 3051917.85 frames. ], batch size: 56, lr: 5.38e-03, grad_scale: 32.0 2023-11-20 06:29:58,629 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.545e+01 8.135e+01 8.852e+01 9.695e+01 1.259e+02, threshold=1.770e+02, percent-clipped=0.0 2023-11-20 06:30:08,599 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 146600 2023-11-20 06:30:44,328 WARNING [train_asr.py:1506] (1/4) Exclude cut with ID unbalanced/mx9RcUz8sr0_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. 
Number of tokens: 24 2023-11-20 06:30:49,054 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=977486.6666666666, ans=0.125 2023-11-20 06:30:52,656 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=977553.3333333334, ans=0.0 2023-11-20 06:30:53,508 INFO [train_asr.py:1262] (1/4) Epoch 13, batch 2350, loss[loss=0.1056, simple_loss=0.132, pruned_loss=0.02954, audio_tagging_loss=0.01002, over 15839.00 frames. ], tot_loss[loss=0.08089, simple_loss=0.101, pruned_loss=0.02018, audio_tagging_loss=0.01022, over 3050213.89 frames. ], batch size: 56, lr: 5.38e-03, grad_scale: 32.0 2023-11-20 06:31:12,034 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=977620.0, ans=0.125 2023-11-20 06:31:13,197 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 146650 2023-11-20 06:31:18,353 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=977686.6666666666, ans=0.125 2023-11-20 06:31:34,716 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=977753.3333333334, ans=0.125 2023-11-20 06:31:51,081 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=977820.0, ans=0.0 2023-11-20 06:31:52,469 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=977820.0, ans=0.125 2023-11-20 06:31:57,043 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=977886.6666666666, ans=0.035 2023-11-20 06:31:58,143 INFO [train_asr.py:1262] (1/4) Epoch 13, batch 2400, loss[loss=0.07508, simple_loss=0.08622, pruned_loss=0.01975, audio_tagging_loss=0.01222, over 14670.00 frames. ], tot_loss[loss=0.08136, simple_loss=0.1017, pruned_loss=0.02032, audio_tagging_loss=0.01022, over 3054081.63 frames. ], batch size: 56, lr: 5.38e-03, grad_scale: 32.0 2023-11-20 06:32:08,484 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.whiten, num_groups=1, num_channels=512, metric=9.73 vs. 
limit=12.0 2023-11-20 06:32:09,020 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.339e+01 8.246e+01 8.786e+01 9.716e+01 1.266e+02, threshold=1.757e+02, percent-clipped=0.0 2023-11-20 06:32:16,312 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 146700 2023-11-20 06:32:25,254 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=978020.0, ans=0.125 2023-11-20 06:32:38,406 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=978086.6666666666, ans=0.1 2023-11-20 06:32:43,330 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=978086.6666666666, ans=0.2 2023-11-20 06:32:51,909 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=978153.3333333334, ans=0.125 2023-11-20 06:32:58,158 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.min_positive, batch_count=978153.3333333334, ans=0.05 2023-11-20 06:33:00,716 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=978220.0, ans=0.125 2023-11-20 06:33:01,560 INFO [train_asr.py:1262] (1/4) Epoch 13, batch 2450, loss[loss=0.08539, simple_loss=0.102, pruned_loss=0.02224, audio_tagging_loss=0.01216, over 15004.00 frames. ], tot_loss[loss=0.08097, simple_loss=0.1008, pruned_loss=0.02021, audio_tagging_loss=0.01033, over 3041842.94 frames. ], batch size: 57, lr: 5.38e-03, grad_scale: 32.0 2023-11-20 06:33:21,623 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 146750 2023-11-20 06:33:42,475 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=978420.0, ans=0.2 2023-11-20 06:33:47,557 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=19.99 vs. limit=22.5 2023-11-20 06:33:55,645 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=978486.6666666666, ans=0.0 2023-11-20 06:33:59,904 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=978486.6666666666, ans=0.125 2023-11-20 06:34:06,439 INFO [train_asr.py:1262] (1/4) Epoch 13, batch 2500, loss[loss=0.05893, simple_loss=0.06558, pruned_loss=0.01612, audio_tagging_loss=0.01001, over 15824.00 frames. ], tot_loss[loss=0.08137, simple_loss=0.1013, pruned_loss=0.02034, audio_tagging_loss=0.01037, over 3041775.02 frames. ], batch size: 62, lr: 5.38e-03, grad_scale: 32.0 2023-11-20 06:34:07,798 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=978553.3333333334, ans=0.0 2023-11-20 06:34:18,574 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.294e+01 8.313e+01 9.053e+01 9.691e+01 1.783e+02, threshold=1.811e+02, percent-clipped=1.0 2023-11-20 06:34:22,494 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-20 06:34:24,116 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=13.60 vs. 
limit=15.0 2023-11-20 06:34:26,051 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 146800 2023-11-20 06:34:34,188 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=978686.6666666666, ans=0.125 2023-11-20 06:34:37,851 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=978686.6666666666, ans=0.1 2023-11-20 06:34:50,814 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=978753.3333333334, ans=0.1 2023-11-20 06:35:09,624 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=978820.0, ans=0.125 2023-11-20 06:35:11,791 INFO [train_asr.py:1262] (1/4) Epoch 13, batch 2550, loss[loss=0.07506, simple_loss=0.0935, pruned_loss=0.01811, audio_tagging_loss=0.0102, over 15595.00 frames. ], tot_loss[loss=0.08153, simple_loss=0.1015, pruned_loss=0.02045, audio_tagging_loss=0.01032, over 3043343.82 frames. ], batch size: 59, lr: 5.38e-03, grad_scale: 32.0 2023-11-20 06:35:23,055 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=978953.3333333334, ans=0.1 2023-11-20 06:35:23,057 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=978953.3333333334, ans=0.125 2023-11-20 06:35:23,549 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.conv_module1.whiten, num_groups=1, num_channels=192, metric=12.51 vs. limit=15.0 2023-11-20 06:35:29,927 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 146850 2023-11-20 06:35:39,193 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=979020.0, ans=0.0 2023-11-20 06:36:02,989 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=979153.3333333334, ans=0.2 2023-11-20 06:36:03,191 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=979153.3333333334, ans=0.2 2023-11-20 06:36:15,067 INFO [train_asr.py:1262] (1/4) Epoch 13, batch 2600, loss[loss=0.08594, simple_loss=0.1027, pruned_loss=0.02479, audio_tagging_loss=0.009775, over 14129.00 frames. ], tot_loss[loss=0.08069, simple_loss=0.1005, pruned_loss=0.02025, audio_tagging_loss=0.01019, over 3046036.07 frames. 
], batch size: 54, lr: 5.38e-03, grad_scale: 32.0 2023-11-20 06:36:26,627 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.093e+01 8.215e+01 8.794e+01 9.573e+01 1.201e+02, threshold=1.759e+02, percent-clipped=0.0 2023-11-20 06:36:28,333 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=979286.6666666666, ans=0.0 2023-11-20 06:36:34,893 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 146900 2023-11-20 06:36:58,380 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=979420.0, ans=0.2 2023-11-20 06:37:06,977 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=979486.6666666666, ans=0.125 2023-11-20 06:37:07,438 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=12.28 vs. limit=15.0 2023-11-20 06:37:20,171 INFO [train_asr.py:1262] (1/4) Epoch 13, batch 2650, loss[loss=0.07208, simple_loss=0.08605, pruned_loss=0.0152, audio_tagging_loss=0.01386, over 15376.00 frames. ], tot_loss[loss=0.08059, simple_loss=0.1009, pruned_loss=0.02009, audio_tagging_loss=0.01005, over 3046420.68 frames. ], batch size: 60, lr: 5.38e-03, grad_scale: 16.0 2023-11-20 06:37:22,954 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=979553.3333333334, ans=0.0 2023-11-20 06:37:39,798 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 146950 2023-11-20 06:37:41,120 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=979620.0, ans=0.1 2023-11-20 06:37:45,093 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=979686.6666666666, ans=0.2 2023-11-20 06:37:46,610 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.37 vs. limit=6.0 2023-11-20 06:37:52,172 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-20 06:38:07,503 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.min_positive, batch_count=979753.3333333334, ans=0.05 2023-11-20 06:38:15,813 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=11.52 vs. limit=22.5 2023-11-20 06:38:18,083 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=979820.0, ans=0.125 2023-11-20 06:38:24,563 INFO [train_asr.py:1262] (1/4) Epoch 13, batch 2700, loss[loss=0.1316, simple_loss=0.1619, pruned_loss=0.0447, audio_tagging_loss=0.005917, over 15394.00 frames. ], tot_loss[loss=0.08053, simple_loss=0.1012, pruned_loss=0.02, audio_tagging_loss=0.009941, over 3051177.38 frames. 
], batch size: 56, lr: 5.38e-03, grad_scale: 16.0 2023-11-20 06:38:30,368 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-20 06:38:37,519 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.654e+01 8.126e+01 8.706e+01 9.526e+01 1.459e+02, threshold=1.741e+02, percent-clipped=0.0 2023-11-20 06:38:43,840 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 147000 2023-11-20 06:38:47,919 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=979953.3333333334, ans=0.0 2023-11-20 06:38:53,530 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=980020.0, ans=0.125 2023-11-20 06:39:00,248 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=980020.0, ans=0.125 2023-11-20 06:39:29,318 INFO [train_asr.py:1262] (1/4) Epoch 13, batch 2750, loss[loss=0.07814, simple_loss=0.09447, pruned_loss=0.02098, audio_tagging_loss=0.009934, over 16003.00 frames. ], tot_loss[loss=0.08013, simple_loss=0.1006, pruned_loss=0.01993, audio_tagging_loss=0.009911, over 3053518.33 frames. ], batch size: 60, lr: 5.37e-03, grad_scale: 16.0 2023-11-20 06:39:32,560 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=6.81 vs. limit=12.0 2023-11-20 06:39:34,492 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=980220.0, ans=0.125 2023-11-20 06:39:49,307 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 147050 2023-11-20 06:39:55,829 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=10.60 vs. limit=15.0 2023-11-20 06:39:59,779 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=980353.3333333334, ans=0.0 2023-11-20 06:40:23,479 WARNING [train_asr.py:1506] (1/4) Exclude cut with ID unbalanced/IMdT8_tuNp0_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-20 06:40:28,507 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=980486.6666666666, ans=0.125 2023-11-20 06:40:28,718 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=980486.6666666666, ans=0.0 2023-11-20 06:40:30,755 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=11.46 vs. limit=15.0 2023-11-20 06:40:33,954 INFO [train_asr.py:1262] (1/4) Epoch 13, batch 2800, loss[loss=0.06662, simple_loss=0.07094, pruned_loss=0.01727, audio_tagging_loss=0.01388, over 15519.00 frames. ], tot_loss[loss=0.0796, simple_loss=0.09968, pruned_loss=0.01978, audio_tagging_loss=0.009981, over 3048174.37 frames. 
], batch size: 60, lr: 5.37e-03, grad_scale: 16.0 2023-11-20 06:40:37,341 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=980553.3333333334, ans=0.125 2023-11-20 06:40:40,763 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=980553.3333333334, ans=0.0 2023-11-20 06:40:42,125 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=980553.3333333334, ans=0.125 2023-11-20 06:40:48,688 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.373e+01 8.295e+01 8.926e+01 1.019e+02 1.823e+02, threshold=1.785e+02, percent-clipped=1.0 2023-11-20 06:40:53,885 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 147100 2023-11-20 06:41:03,374 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=980686.6666666666, ans=0.0 2023-11-20 06:41:05,717 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=980686.6666666666, ans=0.125 2023-11-20 06:41:27,915 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=980820.0, ans=0.2 2023-11-20 06:41:39,237 INFO [train_asr.py:1262] (1/4) Epoch 13, batch 2850, loss[loss=0.05677, simple_loss=0.05773, pruned_loss=0.01122, audio_tagging_loss=0.01669, over 14491.00 frames. ], tot_loss[loss=0.07936, simple_loss=0.09927, pruned_loss=0.01975, audio_tagging_loss=0.009977, over 3046566.83 frames. ], batch size: 55, lr: 5.37e-03, grad_scale: 16.0 2023-11-20 06:41:40,709 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=980886.6666666666, ans=0.125 2023-11-20 06:41:58,552 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 147150 2023-11-20 06:42:21,088 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=981086.6666666666, ans=0.2 2023-11-20 06:42:26,525 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=981086.6666666666, ans=0.04949747468305833 2023-11-20 06:42:27,559 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=981086.6666666666, ans=0.125 2023-11-20 06:42:31,363 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=981153.3333333334, ans=0.125 2023-11-20 06:42:43,710 INFO [train_asr.py:1262] (1/4) Epoch 13, batch 2900, loss[loss=0.08415, simple_loss=0.1064, pruned_loss=0.02058, audio_tagging_loss=0.01039, over 15298.00 frames. ], tot_loss[loss=0.07979, simple_loss=0.1001, pruned_loss=0.0198, audio_tagging_loss=0.009963, over 3046517.58 frames. 
2023-11-20 06:42:46,634 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=981220.0, ans=0.0
2023-11-20 06:42:52,511 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=981220.0, ans=0.1
2023-11-20 06:42:52,516 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=981220.0, ans=0.1
2023-11-20 06:42:57,847 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.790e+01 8.308e+01 8.941e+01 9.842e+01 1.369e+02, threshold=1.788e+02, percent-clipped=0.0
2023-11-20 06:43:02,737 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 147200
2023-11-20 06:43:30,436 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.69 vs. limit=6.0
2023-11-20 06:43:45,410 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=981486.6666666666, ans=0.0
2023-11-20 06:43:48,107 INFO [train_asr.py:1262] (1/4) Epoch 13, batch 2950, loss[loss=0.08582, simple_loss=0.1034, pruned_loss=0.02223, audio_tagging_loss=0.01191, over 16398.00 frames. ], tot_loss[loss=0.08035, simple_loss=0.1001, pruned_loss=0.02012, audio_tagging_loss=0.01017, over 3043653.14 frames. ], batch size: 62, lr: 5.37e-03, grad_scale: 16.0
2023-11-20 06:43:48,634 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=10.65 vs. limit=15.0
2023-11-20 06:44:02,343 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=981620.0, ans=0.125
2023-11-20 06:44:07,683 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 147250
2023-11-20 06:44:44,062 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=981820.0, ans=0.0
2023-11-20 06:44:53,071 INFO [train_asr.py:1262] (1/4) Epoch 13, batch 3000, loss[loss=0.1105, simple_loss=0.1261, pruned_loss=0.03628, audio_tagging_loss=0.01111, over 14988.00 frames. ], tot_loss[loss=0.08122, simple_loss=0.101, pruned_loss=0.02044, audio_tagging_loss=0.01028, over 3046565.81 frames. ], batch size: 55, lr: 5.37e-03, grad_scale: 16.0
2023-11-20 06:44:53,072 INFO [train_asr.py:1285] (1/4) Computing validation loss
2023-11-20 06:45:10,979 INFO [zipformer.py:1873] (1/4) name=encoder.encoders.2.encoder.layers.1.self_attn_weights, attn_weights_entropy = tensor([5.2712, 5.0603, 4.4196, 4.9252], device='cuda:1')
2023-11-20 06:45:11,531 INFO [zipformer.py:1873] (1/4) name=encoder.encoders.4.encoder.layers.2.self_attn_weights, attn_weights_entropy = tensor([2.3961, 3.5744, 3.2657, 3.1124], device='cuda:1')
2023-11-20 06:45:31,937 INFO [train_asr.py:1294] (1/4) Epoch 13, validation: loss=0.06242, simple_loss=0.05394, pruned_loss=0.005804, audio_tagging_loss=0.02964, over 4681554.00 frames.
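The zipformer.py lines printed while computing validation loss give one attention-entropy value per head. A sketch of what such a diagnostic might compute; the tensor layout is an assumption for illustration:

    import torch

    def attn_weights_entropy(attn_weights, eps=1e-20):
        # attn_weights: (num_heads, query_len, key_len), each row a softmax distribution
        p = attn_weights.clamp(min=eps)
        return -(p * p.log()).sum(dim=-1).mean(dim=-1)  # entropy per head, averaged over queries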
2023-11-20 06:45:31,938 INFO [train_asr.py:1295] (1/4) Maximum memory allocated so far is 26082MB
2023-11-20 06:45:35,504 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=981886.6666666666, ans=0.0
2023-11-20 06:45:46,206 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.whiten.whitening_limit, batch_count=981953.3333333334, ans=15.0
2023-11-20 06:45:46,809 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.395e+01 8.201e+01 8.897e+01 9.903e+01 1.229e+02, threshold=1.779e+02, percent-clipped=0.0
2023-11-20 06:45:52,537 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 147300
2023-11-20 06:45:56,543 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=981953.3333333334, ans=0.125
2023-11-20 06:46:00,460 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=13.05 vs. limit=15.0
2023-11-20 06:46:06,721 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=982020.0, ans=0.1
2023-11-20 06:46:37,652 INFO [train_asr.py:1262] (1/4) Epoch 13, batch 3050, loss[loss=0.1019, simple_loss=0.1351, pruned_loss=0.02372, audio_tagging_loss=0.01062, over 14640.00 frames. ], tot_loss[loss=0.08116, simple_loss=0.101, pruned_loss=0.02037, audio_tagging_loss=0.01029, over 3040840.37 frames. ], batch size: 53, lr: 5.37e-03, grad_scale: 16.0
2023-11-20 06:46:48,281 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=982220.0, ans=0.125
2023-11-20 06:46:57,050 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 147350
2023-11-20 06:47:02,108 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=982353.3333333334, ans=0.125
2023-11-20 06:47:05,826 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=982353.3333333334, ans=0.2
2023-11-20 06:47:05,829 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=982353.3333333334, ans=0.125
2023-11-20 06:47:12,997 WARNING [train_asr.py:1506] (1/4) Exclude cut with ID unbalanced/h0neUGB6j_g_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24
2023-11-20 06:47:15,826 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.max_positive, batch_count=982420.0, ans=0.95
2023-11-20 06:47:23,925 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.10 vs. limit=6.0
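The "Maximum memory allocated" line above can be reproduced from PyTorch's allocator statistics; a minimal sketch (the exact wording used by train_asr.py may differ):

    import torch

    mb = torch.cuda.max_memory_allocated(device="cuda:1") // (1024 * 1024)
    print(f"Maximum memory allocated so far is {mb}MB")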
2023-11-20 06:47:32,460 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=982486.6666666666, ans=0.0
2023-11-20 06:47:42,629 INFO [train_asr.py:1262] (1/4) Epoch 13, batch 3100, loss[loss=0.07506, simple_loss=0.08973, pruned_loss=0.0188, audio_tagging_loss=0.0114, over 14625.00 frames. ], tot_loss[loss=0.08176, simple_loss=0.102, pruned_loss=0.02055, audio_tagging_loss=0.01023, over 3037650.29 frames. ], batch size: 57, lr: 5.37e-03, grad_scale: 16.0
2023-11-20 06:47:55,953 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.221e+01 8.032e+01 8.672e+01 9.635e+01 1.172e+02, threshold=1.734e+02, percent-clipped=0.0
2023-11-20 06:48:00,931 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 147400
2023-11-20 06:48:01,564 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=9.64 vs. limit=15.0
2023-11-20 06:48:15,030 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2023-11-20 06:48:15,363 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=13.04 vs. limit=22.5
2023-11-20 06:48:31,479 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=982753.3333333334, ans=0.2
2023-11-20 06:48:34,035 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=982820.0, ans=0.1
2023-11-20 06:48:35,051 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=982820.0, ans=0.125
2023-11-20 06:48:38,767 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=982820.0, ans=0.0
2023-11-20 06:48:47,377 INFO [train_asr.py:1262] (1/4) Epoch 13, batch 3150, loss[loss=0.08864, simple_loss=0.112, pruned_loss=0.02216, audio_tagging_loss=0.01049, over 15127.00 frames. ], tot_loss[loss=0.08221, simple_loss=0.1025, pruned_loss=0.02063, audio_tagging_loss=0.01031, over 3038341.87 frames. ], batch size: 56, lr: 5.37e-03, grad_scale: 16.0
2023-11-20 06:48:57,530 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=982886.6666666666, ans=0.2
2023-11-20 06:49:01,837 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=982953.3333333334, ans=0.0
2023-11-20 06:49:06,545 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 147450
2023-11-20 06:49:14,992 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=12.13 vs. limit=22.5
2023-11-20 06:49:15,041 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=12.33 vs. limit=15.0
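The many ScheduledFloat lines track module constants (skip rates, balancer probabilities, dropout) whose current value `ans` is a function of batch_count. A hedged sketch assuming a piecewise-linear schedule between (batch, value) breakpoints; the breakpoints below are invented for illustration:

    def scheduled_float(batch_count, points):
        # points: sorted (batch, value) pairs, e.g. [(0.0, 0.3), (20000.0, 0.125)]
        b0, v0 = points[0]
        if batch_count <= b0:
            return v0
        for b1, v1 in points[1:]:
            if batch_count <= b1:
                return v0 + (batch_count - b0) / (b1 - b0) * (v1 - v0)
            b0, v0 = b1, v1
        return v0  # past the last breakpoint the value stays flat

    print(scheduled_float(982486.67, [(0.0, 0.3), (20000.0, 0.125)]))  # -> 0.125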
2023-11-20 06:49:24,186 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=983020.0, ans=0.0
2023-11-20 06:49:35,133 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.max_abs, batch_count=983086.6666666666, ans=10.0
2023-11-20 06:49:47,244 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=983153.3333333334, ans=0.0
2023-11-20 06:49:47,350 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=983153.3333333334, ans=0.2
2023-11-20 06:49:52,029 INFO [train_asr.py:1262] (1/4) Epoch 13, batch 3200, loss[loss=0.06509, simple_loss=0.08298, pruned_loss=0.01486, audio_tagging_loss=0.008743, over 16194.00 frames. ], tot_loss[loss=0.08203, simple_loss=0.1023, pruned_loss=0.0206, audio_tagging_loss=0.01029, over 3035223.18 frames. ], batch size: 60, lr: 5.37e-03, grad_scale: 32.0
2023-11-20 06:50:02,743 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=983220.0, ans=0.125
2023-11-20 06:50:06,128 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.614e+01 8.160e+01 8.999e+01 9.638e+01 2.555e+02, threshold=1.800e+02, percent-clipped=1.0
2023-11-20 06:50:11,816 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 147500
2023-11-20 06:50:13,249 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=983286.6666666666, ans=0.125
2023-11-20 06:50:38,126 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=983420.0, ans=0.125
2023-11-20 06:50:56,507 INFO [train_asr.py:1262] (1/4) Epoch 13, batch 3250, loss[loss=0.08514, simple_loss=0.1033, pruned_loss=0.02308, audio_tagging_loss=0.01042, over 14872.00 frames. ], tot_loss[loss=0.08193, simple_loss=0.102, pruned_loss=0.02049, audio_tagging_loss=0.01042, over 3037746.10 frames. ], batch size: 56, lr: 5.37e-03, grad_scale: 32.0
2023-11-20 06:51:12,107 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=983620.0, ans=0.1
2023-11-20 06:51:15,518 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 147550
2023-11-20 06:51:18,210 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=983620.0, ans=0.1
2023-11-20 06:51:31,041 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=983686.6666666666, ans=0.125
2023-11-20 06:52:00,950 INFO [train_asr.py:1262] (1/4) Epoch 13, batch 3300, loss[loss=0.07066, simple_loss=0.08485, pruned_loss=0.01689, audio_tagging_loss=0.01134, over 14614.00 frames. ], tot_loss[loss=0.08149, simple_loss=0.1011, pruned_loss=0.02035, audio_tagging_loss=0.01059, over 3040835.59 frames. ], batch size: 56, lr: 5.36e-03, grad_scale: 32.0
2023-11-20 06:52:03,651 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=983886.6666666666, ans=0.04949747468305833
2023-11-20 06:52:10,236 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=12.77 vs. limit=15.0
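The lr column decays slowly (5.38e-03 down to 5.33e-03 across this span), consistent with an Eden-style schedule that discounts in both batches and epochs. All constants below (0.045, 7500, 3.5, and the effective epoch count of 12) are inferred to fit this log, not read from the script:

    def eden_lr(step, epoch, base_lr=0.045, lr_batches=7500.0, lr_epochs=3.5):
        return (base_lr
                * ((step / lr_batches) ** 2 + 1) ** -0.25
                * ((epoch / lr_epochs) ** 2 + 1) ** -0.25)

    print(f"{eden_lr(step=148000, epoch=12):.2e}")  # ~5.36e-03, as logged near batch idx 148000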
2023-11-20 06:52:13,829 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=983953.3333333334, ans=0.0
2023-11-20 06:52:14,561 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.228e+01 8.367e+01 9.193e+01 1.015e+02 1.401e+02, threshold=1.839e+02, percent-clipped=0.0
2023-11-20 06:52:20,244 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 147600
2023-11-20 06:52:20,535 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=983953.3333333334, ans=0.125
2023-11-20 06:52:47,866 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.max_positive, batch_count=984086.6666666666, ans=0.95
2023-11-20 06:53:05,445 INFO [train_asr.py:1262] (1/4) Epoch 13, batch 3350, loss[loss=0.106, simple_loss=0.1355, pruned_loss=0.03294, audio_tagging_loss=0.005315, over 14887.00 frames. ], tot_loss[loss=0.08125, simple_loss=0.1012, pruned_loss=0.02027, audio_tagging_loss=0.01038, over 3038326.28 frames. ], batch size: 54, lr: 5.36e-03, grad_scale: 32.0
2023-11-20 06:53:19,554 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=984286.6666666666, ans=0.2
2023-11-20 06:53:26,310 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 147650
2023-11-20 06:53:27,828 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=984286.6666666666, ans=0.125
2023-11-20 06:53:57,608 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.58 vs. limit=6.0
2023-11-20 06:54:11,599 INFO [train_asr.py:1262] (1/4) Epoch 13, batch 3400, loss[loss=0.08911, simple_loss=0.1103, pruned_loss=0.02523, audio_tagging_loss=0.008709, over 17469.00 frames. ], tot_loss[loss=0.08088, simple_loss=0.1012, pruned_loss=0.02007, audio_tagging_loss=0.0102, over 3044595.39 frames. ], batch size: 65, lr: 5.36e-03, grad_scale: 32.0
2023-11-20 06:54:25,756 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.861e+01 8.407e+01 9.126e+01 1.002e+02 1.896e+02, threshold=1.825e+02, percent-clipped=1.0
2023-11-20 06:54:30,988 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 147700
2023-11-20 06:54:42,028 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=984686.6666666666, ans=0.125
2023-11-20 06:54:42,076 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=984686.6666666666, ans=0.125
2023-11-20 06:54:48,354 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.62 vs. limit=6.0
2023-11-20 06:54:50,945 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=984753.3333333334, ans=0.1
2023-11-20 06:55:16,229 INFO [train_asr.py:1262] (1/4) Epoch 13, batch 3450, loss[loss=0.09167, simple_loss=0.1167, pruned_loss=0.02464, audio_tagging_loss=0.008658, over 15513.00 frames. ], tot_loss[loss=0.08065, simple_loss=0.101, pruned_loss=0.02006, audio_tagging_loss=0.01009, over 3047388.43 frames. ], batch size: 56, lr: 5.36e-03, grad_scale: 32.0
2023-11-20 06:55:21,352 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=984886.6666666666, ans=0.0
2023-11-20 06:55:27,839 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=984953.3333333334, ans=0.2
2023-11-20 06:55:35,433 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 147750
2023-11-20 06:55:38,114 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=984953.3333333334, ans=0.125
2023-11-20 06:56:12,903 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=985153.3333333334, ans=0.1
2023-11-20 06:56:19,957 INFO [train_asr.py:1262] (1/4) Epoch 13, batch 3500, loss[loss=0.08263, simple_loss=0.1043, pruned_loss=0.02279, audio_tagging_loss=0.007677, over 14662.00 frames. ], tot_loss[loss=0.08156, simple_loss=0.1023, pruned_loss=0.0205, audio_tagging_loss=0.009902, over 3043875.44 frames. ], batch size: 55, lr: 5.36e-03, grad_scale: 32.0
2023-11-20 06:56:24,457 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=985220.0, ans=0.125
2023-11-20 06:56:31,075 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=985220.0, ans=0.125
2023-11-20 06:56:31,198 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=985220.0, ans=0.125
2023-11-20 06:56:34,555 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.948e+01 8.197e+01 8.889e+01 1.015e+02 2.808e+02, threshold=1.778e+02, percent-clipped=1.0
2023-11-20 06:56:40,049 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 147800
2023-11-20 06:56:41,532 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=985286.6666666666, ans=0.0
2023-11-20 06:56:44,903 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.28 vs. limit=15.0
2023-11-20 06:56:52,864 WARNING [train_asr.py:1506] (1/4) Exclude cut with ID unbalanced/DdDpuDqOyrA_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24
2023-11-20 06:57:11,874 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=985486.6666666666, ans=0.2
2023-11-20 06:57:24,522 INFO [train_asr.py:1262] (1/4) Epoch 13, batch 3550, loss[loss=0.09263, simple_loss=0.1144, pruned_loss=0.02654, audio_tagging_loss=0.008911, over 15184.00 frames. ], tot_loss[loss=0.08111, simple_loss=0.1015, pruned_loss=0.02042, audio_tagging_loss=0.009946, over 3046355.11 frames. ], batch size: 56, lr: 5.36e-03, grad_scale: 16.0
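The recurring WARNING entries drop AudioSet placeholder cuts: 100 feature frames shrink to 23 after the encoder frontend, fewer than the 24 BPE tokens, so no transducer alignment exists. A sketch of that filter; the subsampling formula is an assumption chosen to match the 100 -> 23 figures in the warnings:

    def keep_cut(num_frames, num_tokens):
        t = ((num_frames - 7) // 2 + 1) // 2  # assumed conv frontend: 100 -> 23
        return t >= num_tokens

    print(keep_cut(100, 24))  # False: 23 frames < 24 tokens, so the cut is excluded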
2023-11-20 06:57:40,587 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=985620.0, ans=0.0
2023-11-20 06:57:44,152 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 147850
2023-11-20 06:57:56,825 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=985686.6666666666, ans=0.2
2023-11-20 06:58:09,067 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=985753.3333333334, ans=0.0
2023-11-20 06:58:29,818 INFO [train_asr.py:1262] (1/4) Epoch 13, batch 3600, loss[loss=0.06688, simple_loss=0.07919, pruned_loss=0.01516, audio_tagging_loss=0.01213, over 14574.00 frames. ], tot_loss[loss=0.08041, simple_loss=0.1004, pruned_loss=0.02022, audio_tagging_loss=0.009981, over 3043077.29 frames. ], batch size: 56, lr: 5.36e-03, grad_scale: 32.0
2023-11-20 06:58:44,592 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.072e+01 8.078e+01 8.812e+01 9.931e+01 2.962e+02, threshold=1.762e+02, percent-clipped=1.0
2023-11-20 06:58:47,285 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=985953.3333333334, ans=0.125
2023-11-20 06:58:48,346 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 147900
2023-11-20 06:59:03,115 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=986020.0, ans=0.2
2023-11-20 06:59:33,138 INFO [train_asr.py:1262] (1/4) Epoch 13, batch 3650, loss[loss=0.08451, simple_loss=0.1089, pruned_loss=0.02059, audio_tagging_loss=0.009491, over 14611.00 frames. ], tot_loss[loss=0.08066, simple_loss=0.1011, pruned_loss=0.02031, audio_tagging_loss=0.009815, over 3042939.66 frames. ], batch size: 56, lr: 5.36e-03, grad_scale: 32.0
2023-11-20 06:59:52,928 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 147950
2023-11-20 07:00:06,693 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=12.15 vs. limit=22.5
2023-11-20 07:00:08,726 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=986353.3333333334, ans=0.125
2023-11-20 07:00:28,073 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=986486.6666666666, ans=0.125
2023-11-20 07:00:34,649 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=986486.6666666666, ans=0.07
2023-11-20 07:00:38,022 INFO [train_asr.py:1262] (1/4) Epoch 13, batch 3700, loss[loss=0.06138, simple_loss=0.0728, pruned_loss=0.01466, audio_tagging_loss=0.01031, over 15597.00 frames. ], tot_loss[loss=0.08099, simple_loss=0.1014, pruned_loss=0.02046, audio_tagging_loss=0.009814, over 3044831.83 frames. ], batch size: 61, lr: 5.36e-03, grad_scale: 32.0
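grad_scale moves between 8.0, 16.0 and 32.0 in these entries, the signature of AMP-style dynamic loss scaling: the scale doubles after a run of overflow-free fp16 steps and halves whenever an inf/nan gradient is seen. A hedged sketch of the mechanism; the growth interval is illustrative only:

    class DynamicLossScale:
        def __init__(self, scale=16.0, growth_interval=500):
            self.scale = scale
            self.growth_interval = growth_interval
            self.good_steps = 0

        def update(self, found_inf: bool):
            if found_inf:
                self.scale *= 0.5  # e.g. 32.0 -> 16.0
                self.good_steps = 0
            else:
                self.good_steps += 1
                if self.good_steps == self.growth_interval:
                    self.scale *= 2.0  # e.g. 16.0 -> 32.0
                    self.good_steps = 0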
2023-11-20 07:00:53,828 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.585e+01 7.922e+01 8.709e+01 9.261e+01 1.390e+02, threshold=1.742e+02, percent-clipped=0.0
2023-11-20 07:00:56,676 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=986620.0, ans=0.0
2023-11-20 07:00:57,664 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 148000
2023-11-20 07:00:59,023 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=986620.0, ans=0.125
2023-11-20 07:01:28,962 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=986753.3333333334, ans=0.1
2023-11-20 07:01:47,062 INFO [train_asr.py:1262] (1/4) Epoch 13, batch 3750, loss[loss=0.1048, simple_loss=0.13, pruned_loss=0.02944, audio_tagging_loss=0.01036, over 15191.00 frames. ], tot_loss[loss=0.08247, simple_loss=0.1031, pruned_loss=0.02102, audio_tagging_loss=0.009926, over 3050103.50 frames. ], batch size: 55, lr: 5.36e-03, grad_scale: 32.0
2023-11-20 07:01:48,608 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2023-11-20 07:01:54,717 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=986886.6666666666, ans=0.125
2023-11-20 07:02:00,946 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=986953.3333333334, ans=0.0
2023-11-20 07:02:05,697 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 148050
2023-11-20 07:02:26,808 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=987086.6666666666, ans=0.125
2023-11-20 07:02:30,754 WARNING [train_asr.py:1506] (1/4) Exclude cut with ID unbalanced/ZY_Bsi-RNuk_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24
2023-11-20 07:02:42,063 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=16.02 vs. limit=22.5
2023-11-20 07:02:42,167 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=8.69 vs. limit=15.0
2023-11-20 07:02:51,335 INFO [train_asr.py:1262] (1/4) Epoch 13, batch 3800, loss[loss=0.04414, simple_loss=0.0519, pruned_loss=0.007351, audio_tagging_loss=0.01084, over 14561.00 frames. ], tot_loss[loss=0.08176, simple_loss=0.102, pruned_loss=0.02074, audio_tagging_loss=0.01004, over 3045920.91 frames. ], batch size: 57, lr: 5.36e-03, grad_scale: 32.0
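The Tokens lists in the exclusion warnings are SentencePiece BPE pieces ('▁' marks a word boundary). A sketch of reproducing such a list; the model path is hypothetical:

    import sentencepiece as spm

    sp = spm.SentencePieceProcessor(model_file="bpe.model")  # hypothetical path
    pieces = sp.encode("Dummy text added as a place holder. Please ignore this if possible.",
                       out_type=str)
    print(pieces, len(pieces))  # expect 24 pieces, matching "Number of tokens: 24"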
2023-11-20 07:02:55,322 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=987220.0, ans=0.0
2023-11-20 07:03:07,209 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.518e+01 8.071e+01 8.832e+01 9.688e+01 1.208e+02, threshold=1.766e+02, percent-clipped=0.0
2023-11-20 07:03:10,935 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 148100
2023-11-20 07:03:11,123 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=987286.6666666666, ans=0.2
2023-11-20 07:03:14,582 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=987286.6666666666, ans=0.125
2023-11-20 07:03:26,000 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=10.88 vs. limit=15.0
2023-11-20 07:03:26,757 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=987353.3333333334, ans=0.0
2023-11-20 07:03:31,605 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=987420.0, ans=0.125
2023-11-20 07:03:55,653 INFO [train_asr.py:1262] (1/4) Epoch 13, batch 3850, loss[loss=0.07344, simple_loss=0.09064, pruned_loss=0.01902, audio_tagging_loss=0.009097, over 15935.00 frames. ], tot_loss[loss=0.08168, simple_loss=0.1018, pruned_loss=0.02065, audio_tagging_loss=0.01011, over 3040450.91 frames. ], batch size: 61, lr: 5.35e-03, grad_scale: 32.0
2023-11-20 07:04:05,546 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=987553.3333333334, ans=0.125
2023-11-20 07:04:08,837 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=987620.0, ans=0.2
2023-11-20 07:04:09,871 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=987620.0, ans=0.0
2023-11-20 07:04:15,190 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 148150
2023-11-20 07:04:15,265 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=987620.0, ans=0.2
2023-11-20 07:04:30,088 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=987686.6666666666, ans=0.1
2023-11-20 07:04:30,416 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=16.74 vs. limit=22.5
2023-11-20 07:04:33,776 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=987753.3333333334, ans=0.125
2023-11-20 07:04:42,031 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=987753.3333333334, ans=0.0
2023-11-20 07:04:53,671 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=987820.0, ans=0.125
2023-11-20 07:05:00,260 INFO [train_asr.py:1262] (1/4) Epoch 13, batch 3900, loss[loss=0.05755, simple_loss=0.06759, pruned_loss=0.0153, audio_tagging_loss=0.008456, over 14318.00 frames. ], tot_loss[loss=0.08163, simple_loss=0.1016, pruned_loss=0.02075, audio_tagging_loss=0.01006, over 3051636.97 frames. ], batch size: 56, lr: 5.35e-03, grad_scale: 32.0
2023-11-20 07:05:15,600 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.880e+01 8.310e+01 9.082e+01 9.917e+01 1.355e+02, threshold=1.816e+02, percent-clipped=0.0
2023-11-20 07:05:19,472 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 148200
2023-11-20 07:05:22,360 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.max_abs, batch_count=987953.3333333334, ans=10.0
2023-11-20 07:05:32,776 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=988020.0, ans=0.2
2023-11-20 07:05:46,767 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=988086.6666666666, ans=0.1
2023-11-20 07:05:52,599 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=988153.3333333334, ans=0.2
2023-11-20 07:06:05,385 INFO [train_asr.py:1262] (1/4) Epoch 13, batch 3950, loss[loss=0.1091, simple_loss=0.137, pruned_loss=0.03235, audio_tagging_loss=0.008302, over 15425.00 frames. ], tot_loss[loss=0.08094, simple_loss=0.1009, pruned_loss=0.02032, audio_tagging_loss=0.01018, over 3054922.17 frames. ], batch size: 56, lr: 5.35e-03, grad_scale: 32.0
2023-11-20 07:06:21,365 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=988286.6666666666, ans=0.0
2023-11-20 07:06:25,456 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 148250
2023-11-20 07:06:31,979 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=988353.3333333334, ans=0.0
2023-11-20 07:06:47,790 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=988420.0, ans=0.2
2023-11-20 07:06:53,054 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=988420.0, ans=0.125
2023-11-20 07:07:01,886 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=988486.6666666666, ans=0.0
2023-11-20 07:07:03,173 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=988486.6666666666, ans=0.04949747468305833
2023-11-20 07:07:10,789 INFO [train_asr.py:1262] (1/4) Epoch 13, batch 4000, loss[loss=0.07124, simple_loss=0.0847, pruned_loss=0.01707, audio_tagging_loss=0.01181, over 15020.00 frames. ], tot_loss[loss=0.08141, simple_loss=0.1014, pruned_loss=0.02047, audio_tagging_loss=0.01023, over 3056186.58 frames. ], batch size: 57, lr: 5.35e-03, grad_scale: 32.0
2023-11-20 07:07:11,130 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=988553.3333333334, ans=0.1
2023-11-20 07:07:17,941 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=988553.3333333334, ans=0.0
2023-11-20 07:07:26,773 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.067e+01 8.419e+01 9.137e+01 9.996e+01 1.272e+02, threshold=1.827e+02, percent-clipped=0.0
2023-11-20 07:07:28,657 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=10.33 vs. limit=15.0
2023-11-20 07:07:30,567 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 148300
2023-11-20 07:08:16,237 INFO [train_asr.py:1262] (1/4) Epoch 13, batch 4050, loss[loss=0.07037, simple_loss=0.08199, pruned_loss=0.01535, audio_tagging_loss=0.01402, over 15511.00 frames. ], tot_loss[loss=0.08065, simple_loss=0.1005, pruned_loss=0.0201, audio_tagging_loss=0.01029, over 3050263.39 frames. ], batch size: 62, lr: 5.35e-03, grad_scale: 32.0
2023-11-20 07:08:18,719 WARNING [train_asr.py:1506] (1/4) Exclude cut with ID unbalanced/-7b0f9TyPFU_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24
2023-11-20 07:08:23,836 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=988886.6666666666, ans=0.125
2023-11-20 07:08:34,940 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 148350
2023-11-20 07:08:38,817 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=988953.3333333334, ans=0.125
2023-11-20 07:09:06,569 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=4.41 vs. limit=12.0
2023-11-20 07:09:20,353 INFO [train_asr.py:1262] (1/4) Epoch 13, batch 4100, loss[loss=0.0816, simple_loss=0.1013, pruned_loss=0.02169, audio_tagging_loss=0.009279, over 14504.00 frames. ], tot_loss[loss=0.08092, simple_loss=0.1009, pruned_loss=0.02009, audio_tagging_loss=0.01036, over 3047052.23 frames. ], batch size: 55, lr: 5.35e-03, grad_scale: 16.0
2023-11-20 07:09:37,911 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.747e+01 7.925e+01 8.541e+01 9.436e+01 1.128e+02, threshold=1.708e+02, percent-clipped=0.0
2023-11-20 07:09:39,213 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 148400
2023-11-20 07:10:24,273 INFO [train_asr.py:1262] (1/4) Epoch 13, batch 4150, loss[loss=0.08437, simple_loss=0.1056, pruned_loss=0.0208, audio_tagging_loss=0.01079, over 15423.00 frames. ], tot_loss[loss=0.08116, simple_loss=0.1013, pruned_loss=0.02026, audio_tagging_loss=0.01023, over 3044994.39 frames. ], batch size: 58, lr: 5.35e-03, grad_scale: 8.0
2023-11-20 07:10:31,361 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=989553.3333333334, ans=0.125
2023-11-20 07:10:44,136 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 148450
2023-11-20 07:11:01,333 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=20.90 vs. limit=22.5
2023-11-20 07:11:03,255 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=989753.3333333334, ans=0.125
2023-11-20 07:11:09,837 WARNING [train_asr.py:1506] (1/4) Exclude cut with ID unbalanced/5BkClLNthIQ_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24
2023-11-20 07:11:14,511 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.self_attn2.whiten, num_groups=1, num_channels=192, metric=11.35 vs. limit=22.5
2023-11-20 07:11:21,249 INFO [scaling.py:1022] (1/4) Whitening: name=encoder_embed.out_whiten, num_groups=1, num_channels=192, metric=7.22 vs. limit=8.0
2023-11-20 07:11:28,773 INFO [train_asr.py:1262] (1/4) Epoch 13, batch 4200, loss[loss=0.1071, simple_loss=0.1291, pruned_loss=0.03203, audio_tagging_loss=0.01054, over 15385.00 frames. ], tot_loss[loss=0.08091, simple_loss=0.1011, pruned_loss=0.02027, audio_tagging_loss=0.01009, over 3039827.07 frames. ], batch size: 55, lr: 5.35e-03, grad_scale: 8.0
2023-11-20 07:11:37,208 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=7.84 vs. limit=15.0
2023-11-20 07:11:46,388 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.989e+01 8.148e+01 8.675e+01 9.910e+01 1.292e+02, threshold=1.735e+02, percent-clipped=0.0
2023-11-20 07:11:47,740 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 148500
2023-11-20 07:12:18,292 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=990086.6666666666, ans=0.0
2023-11-20 07:12:32,953 INFO [train_asr.py:1262] (1/4) Epoch 13, batch 4250, loss[loss=0.09176, simple_loss=0.1195, pruned_loss=0.0217, audio_tagging_loss=0.01032, over 15469.00 frames. ], tot_loss[loss=0.08117, simple_loss=0.1017, pruned_loss=0.02029, audio_tagging_loss=0.01006, over 3035658.93 frames. ], batch size: 57, lr: 5.35e-03, grad_scale: 8.0
2023-11-20 07:12:34,977 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=8.22 vs. limit=15.0
2023-11-20 07:12:44,965 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=990286.6666666666, ans=0.0
2023-11-20 07:12:47,802 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=990286.6666666666, ans=0.125
2023-11-20 07:12:50,428 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=990286.6666666666, ans=0.0
2023-11-20 07:12:52,610 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 148550
2023-11-20 07:12:55,740 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=9.02 vs. limit=15.0
2023-11-20 07:13:14,043 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=990420.0, ans=0.125
2023-11-20 07:13:21,391 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=990420.0, ans=0.125
2023-11-20 07:13:31,421 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=990486.6666666666, ans=0.125
2023-11-20 07:13:37,298 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=990553.3333333334, ans=0.035
2023-11-20 07:13:38,287 INFO [train_asr.py:1262] (1/4) Epoch 13, batch 4300, loss[loss=0.06658, simple_loss=0.08681, pruned_loss=0.01575, audio_tagging_loss=0.007432, over 15082.00 frames. ], tot_loss[loss=0.08141, simple_loss=0.1021, pruned_loss=0.02032, audio_tagging_loss=0.01005, over 3033847.88 frames. ], batch size: 58, lr: 5.35e-03, grad_scale: 8.0
2023-11-20 07:13:48,022 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.whiten, num_groups=1, num_channels=512, metric=9.74 vs. limit=12.0
2023-11-20 07:13:56,607 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.158e+01 8.001e+01 8.705e+01 9.608e+01 1.261e+02, threshold=1.741e+02, percent-clipped=0.0
2023-11-20 07:13:57,915 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 148600
2023-11-20 07:13:59,313 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=990620.0, ans=0.125
2023-11-20 07:13:59,741 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=9.56 vs. limit=15.0
2023-11-20 07:14:13,544 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=11.60 vs. limit=15.0
2023-11-20 07:14:23,189 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=990753.3333333334, ans=0.1
2023-11-20 07:14:39,838 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=990820.0, ans=0.0
2023-11-20 07:14:42,285 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=990886.6666666666, ans=0.0
2023-11-20 07:14:43,233 INFO [train_asr.py:1262] (1/4) Epoch 13, batch 4350, loss[loss=0.06982, simple_loss=0.08179, pruned_loss=0.01756, audio_tagging_loss=0.01137, over 14994.00 frames. ], tot_loss[loss=0.08135, simple_loss=0.1021, pruned_loss=0.02043, audio_tagging_loss=0.009888, over 3026301.09 frames. ], batch size: 57, lr: 5.35e-03, grad_scale: 8.0
2023-11-20 07:14:44,715 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=990886.6666666666, ans=0.0
2023-11-20 07:15:02,485 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 148650
2023-11-20 07:15:05,163 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=990953.3333333334, ans=0.125
2023-11-20 07:15:16,008 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=9.40 vs. limit=15.0
2023-11-20 07:15:21,552 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=8.90 vs. limit=15.0
2023-11-20 07:15:40,796 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=991153.3333333334, ans=0.1
2023-11-20 07:15:47,992 INFO [train_asr.py:1262] (1/4) Epoch 13, batch 4400, loss[loss=0.0647, simple_loss=0.08397, pruned_loss=0.0127, audio_tagging_loss=0.01001, over 14451.00 frames. ], tot_loss[loss=0.081, simple_loss=0.1016, pruned_loss=0.02031, audio_tagging_loss=0.009882, over 3029150.84 frames. ], batch size: 55, lr: 5.34e-03, grad_scale: 16.0
2023-11-20 07:16:05,879 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.660e+01 8.299e+01 8.915e+01 9.764e+01 1.212e+02, threshold=1.783e+02, percent-clipped=0.0
2023-11-20 07:16:07,180 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 148700
2023-11-20 07:16:15,411 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=16.31 vs. limit=15.0
2023-11-20 07:16:17,518 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=991353.3333333334, ans=0.125
2023-11-20 07:16:29,162 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.min_positive, batch_count=991420.0, ans=0.05
2023-11-20 07:16:49,344 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=11.27 vs. limit=15.0
2023-11-20 07:16:52,165 INFO [train_asr.py:1262] (1/4) Epoch 13, batch 4450, loss[loss=0.09226, simple_loss=0.124, pruned_loss=0.02217, audio_tagging_loss=0.0081, over 16749.00 frames. ], tot_loss[loss=0.08123, simple_loss=0.1019, pruned_loss=0.02042, audio_tagging_loss=0.009869, over 3037570.05 frames. ], batch size: 60, lr: 5.34e-03, grad_scale: 16.0
2023-11-20 07:17:12,474 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 148750
2023-11-20 07:17:21,794 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=991686.6666666666, ans=0.1
2023-11-20 07:17:30,849 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=512, metric=4.21 vs. limit=15.0
2023-11-20 07:17:48,415 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=991820.0, ans=0.125
2023-11-20 07:17:48,512 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=991820.0, ans=0.1
2023-11-20 07:17:56,133 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.80 vs. limit=15.0
2023-11-20 07:17:56,992 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=991886.6666666666, ans=0.125
2023-11-20 07:17:58,001 INFO [train_asr.py:1262] (1/4) Epoch 13, batch 4500, loss[loss=0.07601, simple_loss=0.1054, pruned_loss=0.0147, audio_tagging_loss=0.008617, over 15109.00 frames. ], tot_loss[loss=0.08108, simple_loss=0.1018, pruned_loss=0.02024, audio_tagging_loss=0.009939, over 3038062.96 frames. ], batch size: 55, lr: 5.34e-03, grad_scale: 16.0
2023-11-20 07:17:59,882 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.whiten_keys.whitening_limit, batch_count=991886.6666666666, ans=6.0
2023-11-20 07:18:12,238 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.min_positive, batch_count=991953.3333333334, ans=0.025
2023-11-20 07:18:12,305 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2023-11-20 07:18:13,607 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=991953.3333333334, ans=0.0
2023-11-20 07:18:14,284 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.59 vs. limit=6.0
2023-11-20 07:18:15,712 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.852e+01 8.390e+01 8.916e+01 9.990e+01 1.363e+02, threshold=1.783e+02, percent-clipped=0.0
2023-11-20 07:18:17,037 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 148800
2023-11-20 07:18:28,959 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=10.09 vs. limit=15.0
2023-11-20 07:18:34,649 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=15.09 vs. limit=22.5
2023-11-20 07:18:46,597 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=992086.6666666666, ans=0.125
2023-11-20 07:18:57,384 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=15.72 vs. limit=22.5
2023-11-20 07:19:02,817 INFO [train_asr.py:1262] (1/4) Epoch 13, batch 4550, loss[loss=0.07788, simple_loss=0.103, pruned_loss=0.01852, audio_tagging_loss=0.007869, over 15242.00 frames. ], tot_loss[loss=0.08096, simple_loss=0.1016, pruned_loss=0.02016, audio_tagging_loss=0.01001, over 3039890.02 frames. ], batch size: 56, lr: 5.34e-03, grad_scale: 16.0
2023-11-20 07:19:04,149 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=992220.0, ans=0.125
2023-11-20 07:19:21,510 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 148850
2023-11-20 07:19:21,698 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=992286.6666666666, ans=0.1
2023-11-20 07:19:36,584 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=992353.3333333334, ans=0.0
2023-11-20 07:19:48,627 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=14.19 vs. limit=22.5
2023-11-20 07:19:51,853 WARNING [train_asr.py:1506] (1/4) Exclude cut with ID unbalanced/_II2Klfnn4Y_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24
2023-11-20 07:19:58,445 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=992486.6666666666, ans=0.2
2023-11-20 07:20:04,483 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=992486.6666666666, ans=0.2
2023-11-20 07:20:06,664 INFO [train_asr.py:1262] (1/4) Epoch 13, batch 4600, loss[loss=0.07685, simple_loss=0.09477, pruned_loss=0.01615, audio_tagging_loss=0.01331, over 14845.00 frames. ], tot_loss[loss=0.0803, simple_loss=0.1005, pruned_loss=0.01989, audio_tagging_loss=0.01017, over 3034475.70 frames. ], batch size: 55, lr: 5.34e-03, grad_scale: 16.0
2023-11-20 07:20:17,458 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=992553.3333333334, ans=0.125
2023-11-20 07:20:18,612 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=992620.0, ans=0.125
2023-11-20 07:20:25,854 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.427e+01 8.263e+01 8.928e+01 9.927e+01 1.394e+02, threshold=1.786e+02, percent-clipped=0.0
2023-11-20 07:20:27,178 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 148900
2023-11-20 07:20:59,471 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=992820.0, ans=0.125
2023-11-20 07:21:11,561 INFO [train_asr.py:1262] (1/4) Epoch 13, batch 4650, loss[loss=0.07627, simple_loss=0.09963, pruned_loss=0.01777, audio_tagging_loss=0.008691, over 14659.00 frames. ], tot_loss[loss=0.08, simple_loss=0.09966, pruned_loss=0.01984, audio_tagging_loss=0.01034, over 3036317.55 frames. ], batch size: 55, lr: 5.34e-03, grad_scale: 16.0
2023-11-20 07:21:16,148 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=992886.6666666666, ans=0.125
2023-11-20 07:21:31,211 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 148950
2023-11-20 07:21:44,835 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=993020.0, ans=0.125
2023-11-20 07:22:12,097 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=993153.3333333334, ans=0.2
2023-11-20 07:22:16,906 INFO [train_asr.py:1262] (1/4) Epoch 13, batch 4700, loss[loss=0.08324, simple_loss=0.09487, pruned_loss=0.02419, audio_tagging_loss=0.01163, over 14150.00 frames. ], tot_loss[loss=0.07956, simple_loss=0.09884, pruned_loss=0.01977, audio_tagging_loss=0.01037, over 3042494.23 frames. ], batch size: 56, lr: 5.34e-03, grad_scale: 16.0
2023-11-20 07:22:34,027 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.986e+01 8.015e+01 8.729e+01 9.415e+01 1.258e+02, threshold=1.746e+02, percent-clipped=0.0
2023-11-20 07:22:35,406 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 149000
2023-11-20 07:22:49,641 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=993353.3333333334, ans=0.0
2023-11-20 07:22:50,752 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2023-11-20 07:22:59,301 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=993420.0, ans=0.125
2023-11-20 07:23:09,997 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.self_attn2.whiten, num_groups=1, num_channels=192, metric=11.87 vs. limit=22.5
2023-11-20 07:23:10,982 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=11.41 vs. limit=15.0
2023-11-20 07:23:21,508 INFO [train_asr.py:1262] (1/4) Epoch 13, batch 4750, loss[loss=0.0705, simple_loss=0.07294, pruned_loss=0.02157, audio_tagging_loss=0.01246, over 14369.00 frames. ], tot_loss[loss=0.07983, simple_loss=0.09905, pruned_loss=0.01992, audio_tagging_loss=0.01039, over 3036981.92 frames. ], batch size: 56, lr: 5.34e-03, grad_scale: 16.0
2023-11-20 07:23:25,733 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=5.52 vs. limit=10.0
2023-11-20 07:23:37,957 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=993620.0, ans=0.125
2023-11-20 07:23:41,193 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 149050
2023-11-20 07:23:50,631 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=993686.6666666666, ans=0.125
2023-11-20 07:23:55,400 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=993686.6666666666, ans=0.1
2023-11-20 07:23:56,822 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=993686.6666666666, ans=0.125
2023-11-20 07:24:09,570 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=7.58 vs. limit=12.0
2023-11-20 07:24:11,818 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=993820.0, ans=0.125
2023-11-20 07:24:25,455 INFO [train_asr.py:1262] (1/4) Epoch 13, batch 4800, loss[loss=0.09733, simple_loss=0.1249, pruned_loss=0.02506, audio_tagging_loss=0.009842, over 15038.00 frames. ], tot_loss[loss=0.08005, simple_loss=0.09952, pruned_loss=0.01991, audio_tagging_loss=0.01039, over 3048923.65 frames. ], batch size: 56, lr: 5.34e-03, grad_scale: 32.0
2023-11-20 07:24:27,907 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=20.25 vs. limit=22.5
2023-11-20 07:24:32,556 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.min_positive, batch_count=993886.6666666666, ans=0.05
2023-11-20 07:24:44,664 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.552e+01 8.333e+01 8.832e+01 9.618e+01 1.335e+02, threshold=1.766e+02, percent-clipped=0.0
2023-11-20 07:24:46,055 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 149100
2023-11-20 07:24:52,231 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=994020.0, ans=0.125
2023-11-20 07:25:06,586 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=994086.6666666666, ans=0.125
2023-11-20 07:25:31,561 INFO [train_asr.py:1262] (1/4) Epoch 13, batch 4850, loss[loss=0.06588, simple_loss=0.08278, pruned_loss=0.01411, audio_tagging_loss=0.01037, over 16210.00 frames. ], tot_loss[loss=0.07994, simple_loss=0.09918, pruned_loss=0.01979, audio_tagging_loss=0.01056, over 3049119.93 frames. ], batch size: 62, lr: 5.34e-03, grad_scale: 32.0
], batch size: 62, lr: 5.34e-03, grad_scale: 32.0 2023-11-20 07:25:49,743 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 149150 2023-11-20 07:26:18,488 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=994420.0, ans=0.125 2023-11-20 07:26:22,926 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=994486.6666666666, ans=0.125 2023-11-20 07:26:27,667 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=994486.6666666666, ans=0.2 2023-11-20 07:26:34,939 INFO [train_asr.py:1262] (1/4) Epoch 13, batch 4900, loss[loss=0.06861, simple_loss=0.08168, pruned_loss=0.01703, audio_tagging_loss=0.01074, over 16283.00 frames. ], tot_loss[loss=0.07982, simple_loss=0.09912, pruned_loss=0.01977, audio_tagging_loss=0.01049, over 3048519.74 frames. ], batch size: 63, lr: 5.34e-03, grad_scale: 32.0 2023-11-20 07:26:42,422 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=994553.3333333334, ans=0.0 2023-11-20 07:26:42,884 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.out_whiten.whitening_limit, batch_count=994553.3333333334, ans=15.0 2023-11-20 07:26:52,652 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.754e+01 8.440e+01 9.252e+01 1.012e+02 1.571e+02, threshold=1.850e+02, percent-clipped=0.0 2023-11-20 07:26:54,704 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 149200 2023-11-20 07:27:09,301 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=994686.6666666666, ans=0.125 2023-11-20 07:27:21,360 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=5.77 vs. limit=15.0 2023-11-20 07:27:22,060 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=994753.3333333334, ans=0.1 2023-11-20 07:27:25,761 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=994820.0, ans=0.125 2023-11-20 07:27:31,959 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=994820.0, ans=0.125 2023-11-20 07:27:39,753 INFO [train_asr.py:1262] (1/4) Epoch 13, batch 4950, loss[loss=0.09072, simple_loss=0.1137, pruned_loss=0.02532, audio_tagging_loss=0.008555, over 16409.00 frames. ], tot_loss[loss=0.08063, simple_loss=0.1009, pruned_loss=0.02, audio_tagging_loss=0.01018, over 3058993.17 frames. 
], batch size: 62, lr: 5.33e-03, grad_scale: 32.0 2023-11-20 07:27:57,334 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.max_abs, batch_count=994953.3333333334, ans=10.0 2023-11-20 07:27:59,568 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 149250 2023-11-20 07:28:07,818 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=995020.0, ans=0.1 2023-11-20 07:28:16,534 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=995020.0, ans=0.1 2023-11-20 07:28:23,253 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=995086.6666666666, ans=0.0 2023-11-20 07:28:27,362 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=2.25 vs. limit=15.0 2023-11-20 07:28:28,182 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=995086.6666666666, ans=0.2 2023-11-20 07:28:30,633 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=995153.3333333334, ans=0.125 2023-11-20 07:28:31,785 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=995153.3333333334, ans=0.0 2023-11-20 07:28:35,127 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=995153.3333333334, ans=0.125 2023-11-20 07:28:38,724 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=995153.3333333334, ans=0.0 2023-11-20 07:28:45,221 INFO [train_asr.py:1262] (1/4) Epoch 13, batch 5000, loss[loss=0.07146, simple_loss=0.1003, pruned_loss=0.01507, audio_tagging_loss=0.006248, over 15386.00 frames. ], tot_loss[loss=0.08046, simple_loss=0.101, pruned_loss=0.01986, audio_tagging_loss=0.01009, over 3051572.34 frames. ], batch size: 57, lr: 5.33e-03, grad_scale: 16.0 2023-11-20 07:28:49,723 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=10.87 vs. 
limit=15.0 2023-11-20 07:29:04,096 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.403e+01 7.937e+01 8.592e+01 9.230e+01 1.200e+02, threshold=1.718e+02, percent-clipped=0.0 2023-11-20 07:29:04,233 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 149300 2023-11-20 07:29:07,007 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=995286.6666666666, ans=0.2 2023-11-20 07:29:13,802 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=995353.3333333334, ans=0.125 2023-11-20 07:29:22,245 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=995420.0, ans=0.125 2023-11-20 07:29:35,735 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-20 07:29:49,386 INFO [train_asr.py:1262] (1/4) Epoch 13, batch 5050, loss[loss=0.09524, simple_loss=0.1191, pruned_loss=0.02616, audio_tagging_loss=0.009522, over 15242.00 frames. ], tot_loss[loss=0.08095, simple_loss=0.1017, pruned_loss=0.0201, audio_tagging_loss=0.01, over 3045000.58 frames. ], batch size: 56, lr: 5.33e-03, grad_scale: 16.0 2023-11-20 07:30:08,479 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 149350 2023-11-20 07:30:19,771 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-20 07:30:24,681 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=995686.6666666666, ans=0.125 2023-11-20 07:30:45,230 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=995820.0, ans=0.1 2023-11-20 07:30:46,541 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-20 07:30:47,631 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=995820.0, ans=0.125 2023-11-20 07:30:53,411 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=995886.6666666666, ans=0.0 2023-11-20 07:30:54,239 INFO [train_asr.py:1262] (1/4) Epoch 13, batch 5100, loss[loss=0.08585, simple_loss=0.1038, pruned_loss=0.01989, audio_tagging_loss=0.01407, over 15227.00 frames. ], tot_loss[loss=0.08062, simple_loss=0.101, pruned_loss=0.02007, audio_tagging_loss=0.01004, over 3047650.00 frames. 
], batch size: 60, lr: 5.33e-03, grad_scale: 16.0 2023-11-20 07:31:13,981 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.044e+01 8.152e+01 8.992e+01 9.912e+01 1.301e+02, threshold=1.798e+02, percent-clipped=0.0 2023-11-20 07:31:14,118 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 149400 2023-11-20 07:31:18,242 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-20 07:31:38,934 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=996086.6666666666, ans=0.0 2023-11-20 07:31:44,540 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=996086.6666666666, ans=0.1 2023-11-20 07:31:59,928 INFO [train_asr.py:1262] (1/4) Epoch 13, batch 5150, loss[loss=0.06104, simple_loss=0.07282, pruned_loss=0.01457, audio_tagging_loss=0.01006, over 14834.00 frames. ], tot_loss[loss=0.08021, simple_loss=0.1005, pruned_loss=0.01997, audio_tagging_loss=0.01001, over 3043239.74 frames. ], batch size: 58, lr: 5.33e-03, grad_scale: 16.0 2023-11-20 07:32:13,629 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=996286.6666666666, ans=0.2 2023-11-20 07:32:18,894 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 149450 2023-11-20 07:32:28,126 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=2.68 vs. limit=15.0 2023-11-20 07:32:30,454 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=996353.3333333334, ans=0.125 2023-11-20 07:32:37,077 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=996420.0, ans=0.2 2023-11-20 07:32:37,162 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=996420.0, ans=0.125 2023-11-20 07:32:41,398 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=996420.0, ans=0.125 2023-11-20 07:33:03,268 INFO [train_asr.py:1262] (1/4) Epoch 13, batch 5200, loss[loss=0.06279, simple_loss=0.06696, pruned_loss=0.01724, audio_tagging_loss=0.01207, over 15055.00 frames. ], tot_loss[loss=0.08018, simple_loss=0.1004, pruned_loss=0.02001, audio_tagging_loss=0.009993, over 3041914.59 frames. ], batch size: 59, lr: 5.33e-03, grad_scale: 32.0 2023-11-20 07:33:22,721 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.044e+01 8.127e+01 8.748e+01 9.361e+01 1.233e+02, threshold=1.750e+02, percent-clipped=0.0 2023-11-20 07:33:22,888 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 149500 2023-11-20 07:33:45,094 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=996753.3333333334, ans=0.1 2023-11-20 07:34:07,794 INFO [train_asr.py:1262] (1/4) Epoch 13, batch 5250, loss[loss=0.07188, simple_loss=0.08974, pruned_loss=0.01813, audio_tagging_loss=0.008875, over 15169.00 frames. ], tot_loss[loss=0.08017, simple_loss=0.1003, pruned_loss=0.02006, audio_tagging_loss=0.009959, over 3038991.85 frames. 
], batch size: 58, lr: 5.33e-03, grad_scale: 16.0 2023-11-20 07:34:28,152 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 149550 2023-11-20 07:34:28,694 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.whiten.whitening_limit, batch_count=996953.3333333334, ans=15.0 2023-11-20 07:34:42,976 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=997020.0, ans=0.1 2023-11-20 07:34:52,641 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=13.10 vs. limit=22.5 2023-11-20 07:34:59,263 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.min_positive, batch_count=997153.3333333334, ans=0.05 2023-11-20 07:35:13,132 INFO [train_asr.py:1262] (1/4) Epoch 13, batch 5300, loss[loss=0.07797, simple_loss=0.09866, pruned_loss=0.01976, audio_tagging_loss=0.008884, over 15870.00 frames. ], tot_loss[loss=0.08041, simple_loss=0.1006, pruned_loss=0.02015, audio_tagging_loss=0.009953, over 3037811.45 frames. ], batch size: 60, lr: 5.33e-03, grad_scale: 16.0 2023-11-20 07:35:17,728 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=997220.0, ans=0.0 2023-11-20 07:35:22,684 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=997220.0, ans=0.125 2023-11-20 07:35:26,288 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=997286.6666666666, ans=0.0 2023-11-20 07:35:32,186 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 149600 2023-11-20 07:35:33,247 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.168e+01 7.882e+01 8.704e+01 9.363e+01 1.429e+02, threshold=1.741e+02, percent-clipped=0.0 2023-11-20 07:35:43,494 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=997353.3333333334, ans=0.0 2023-11-20 07:36:18,198 INFO [train_asr.py:1262] (1/4) Epoch 13, batch 5350, loss[loss=0.06885, simple_loss=0.08069, pruned_loss=0.01561, audio_tagging_loss=0.0129, over 17177.00 frames. ], tot_loss[loss=0.0804, simple_loss=0.1004, pruned_loss=0.02018, audio_tagging_loss=0.009995, over 3042561.01 frames. ], batch size: 67, lr: 5.33e-03, grad_scale: 16.0 2023-11-20 07:36:21,452 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=14.53 vs. limit=22.5 2023-11-20 07:36:31,140 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=997620.0, ans=0.125 2023-11-20 07:36:33,851 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=11.35 vs. limit=15.0 2023-11-20 07:36:37,658 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 149650 2023-11-20 07:36:50,129 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=6.35 vs. 
limit=10.0 2023-11-20 07:36:52,120 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=997686.6666666666, ans=0.2 2023-11-20 07:37:05,371 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=9.87 vs. limit=15.0 2023-11-20 07:37:11,966 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=997820.0, ans=0.035 2023-11-20 07:37:20,726 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=997886.6666666666, ans=0.0 2023-11-20 07:37:21,720 INFO [train_asr.py:1262] (1/4) Epoch 13, batch 5400, loss[loss=0.06253, simple_loss=0.0791, pruned_loss=0.01393, audio_tagging_loss=0.009049, over 13757.00 frames. ], tot_loss[loss=0.08036, simple_loss=0.1003, pruned_loss=0.02017, audio_tagging_loss=0.01004, over 3038022.54 frames. ], batch size: 54, lr: 5.33e-03, grad_scale: 16.0 2023-11-20 07:37:29,193 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=997886.6666666666, ans=0.0 2023-11-20 07:37:41,830 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 149700 2023-11-20 07:37:42,885 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.237e+01 8.118e+01 8.667e+01 9.522e+01 1.475e+02, threshold=1.733e+02, percent-clipped=0.0 2023-11-20 07:38:18,132 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=998153.3333333334, ans=0.125 2023-11-20 07:38:26,845 INFO [train_asr.py:1262] (1/4) Epoch 13, batch 5450, loss[loss=0.09235, simple_loss=0.1081, pruned_loss=0.02575, audio_tagging_loss=0.01258, over 15426.00 frames. ], tot_loss[loss=0.07986, simple_loss=0.09937, pruned_loss=0.02005, audio_tagging_loss=0.01012, over 3038412.83 frames. ], batch size: 57, lr: 5.33e-03, grad_scale: 8.0 2023-11-20 07:38:32,033 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=998220.0, ans=0.0 2023-11-20 07:38:32,184 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=998220.0, ans=0.09899494936611666 2023-11-20 07:38:45,942 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 149750 2023-11-20 07:38:47,655 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.56 vs. limit=6.0 2023-11-20 07:38:58,411 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=19.73 vs. limit=22.5 2023-11-20 07:39:06,832 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=998420.0, ans=0.125 2023-11-20 07:39:20,981 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=998486.6666666666, ans=0.1 2023-11-20 07:39:26,659 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.42 vs. 
limit=6.0 2023-11-20 07:39:30,523 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=8.44 vs. limit=15.0 2023-11-20 07:39:31,025 INFO [train_asr.py:1262] (1/4) Epoch 13, batch 5500, loss[loss=0.0729, simple_loss=0.1005, pruned_loss=0.01207, audio_tagging_loss=0.01057, over 15063.00 frames. ], tot_loss[loss=0.07997, simple_loss=0.09988, pruned_loss=0.01995, audio_tagging_loss=0.01008, over 3042333.80 frames. ], batch size: 57, lr: 5.32e-03, grad_scale: 8.0 2023-11-20 07:39:42,318 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=998620.0, ans=0.125 2023-11-20 07:39:47,205 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=998620.0, ans=0.125 2023-11-20 07:39:49,338 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 149800 2023-11-20 07:39:52,070 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.523e+01 8.049e+01 8.675e+01 9.506e+01 1.252e+02, threshold=1.735e+02, percent-clipped=0.0 2023-11-20 07:40:08,621 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.57 vs. limit=15.0 2023-11-20 07:40:09,583 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=998753.3333333334, ans=0.125 2023-11-20 07:40:23,650 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=998820.0, ans=0.2 2023-11-20 07:40:34,393 INFO [train_asr.py:1262] (1/4) Epoch 13, batch 5550, loss[loss=0.08414, simple_loss=0.1105, pruned_loss=0.02001, audio_tagging_loss=0.00887, over 15741.00 frames. ], tot_loss[loss=0.08041, simple_loss=0.1006, pruned_loss=0.01994, audio_tagging_loss=0.01015, over 3047376.82 frames. ], batch size: 57, lr: 5.32e-03, grad_scale: 8.0 2023-11-20 07:40:46,892 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=998953.3333333334, ans=0.0 2023-11-20 07:40:54,565 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 149850 2023-11-20 07:40:59,719 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=999020.0, ans=0.0 2023-11-20 07:41:15,204 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=999086.6666666666, ans=0.125 2023-11-20 07:41:20,196 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=999086.6666666666, ans=0.09899494936611666 2023-11-20 07:41:25,231 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=999153.3333333334, ans=0.0 2023-11-20 07:41:31,176 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=999153.3333333334, ans=0.125 2023-11-20 07:41:40,104 INFO [train_asr.py:1262] (1/4) Epoch 13, batch 5600, loss[loss=0.07935, simple_loss=0.09696, pruned_loss=0.02086, audio_tagging_loss=0.01001, over 14648.00 frames. ], tot_loss[loss=0.08091, simple_loss=0.101, pruned_loss=0.02013, audio_tagging_loss=0.01027, over 3045412.94 frames. 
], batch size: 54, lr: 5.32e-03, grad_scale: 16.0 2023-11-20 07:41:59,485 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 149900 2023-11-20 07:42:01,821 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.809e+01 8.057e+01 8.712e+01 9.430e+01 1.236e+02, threshold=1.742e+02, percent-clipped=0.0 2023-11-20 07:42:07,285 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=7.77 vs. limit=15.0 2023-11-20 07:42:08,127 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=999353.3333333334, ans=0.0 2023-11-20 07:42:11,760 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=999353.3333333334, ans=0.125 2023-11-20 07:42:14,071 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=999353.3333333334, ans=0.125 2023-11-20 07:42:19,050 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=999420.0, ans=0.125 2023-11-20 07:42:20,346 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=999420.0, ans=0.0 2023-11-20 07:42:24,351 WARNING [train_asr.py:1506] (1/4) Exclude cut with ID unbalanced/ze0LsBtoDm0_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-20 07:42:28,350 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=999420.0, ans=0.125 2023-11-20 07:42:32,305 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=21.74 vs. limit=22.5 2023-11-20 07:42:44,599 INFO [train_asr.py:1262] (1/4) Epoch 13, batch 5650, loss[loss=0.06247, simple_loss=0.06369, pruned_loss=0.0164, audio_tagging_loss=0.01423, over 14321.00 frames. ], tot_loss[loss=0.08044, simple_loss=0.1002, pruned_loss=0.01993, audio_tagging_loss=0.01043, over 3047778.29 frames. ], batch size: 53, lr: 5.32e-03, grad_scale: 16.0 2023-11-20 07:42:48,667 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=999553.3333333334, ans=0.1 2023-11-20 07:43:03,283 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 149950 2023-11-20 07:43:12,556 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=999686.6666666666, ans=0.125 2023-11-20 07:43:45,491 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=16.76 vs. limit=15.0 2023-11-20 07:43:48,663 INFO [train_asr.py:1262] (1/4) Epoch 13, batch 5700, loss[loss=0.07097, simple_loss=0.08213, pruned_loss=0.01901, audio_tagging_loss=0.01089, over 15754.00 frames. ], tot_loss[loss=0.08011, simple_loss=0.0997, pruned_loss=0.01988, audio_tagging_loss=0.01037, over 3051512.00 frames. 
], batch size: 58, lr: 5.32e-03, grad_scale: 16.0 2023-11-20 07:43:56,396 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=999886.6666666666, ans=0.1 2023-11-20 07:44:07,692 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=999953.3333333334, ans=0.125 2023-11-20 07:44:08,600 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 150000 2023-11-20 07:44:11,203 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.782e+01 7.942e+01 8.498e+01 9.233e+01 1.252e+02, threshold=1.700e+02, percent-clipped=0.0 2023-11-20 07:44:33,154 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=1000086.6666666666, ans=0.125 2023-11-20 07:44:53,184 INFO [train_asr.py:1262] (1/4) Epoch 13, batch 5750, loss[loss=0.07238, simple_loss=0.09775, pruned_loss=0.01614, audio_tagging_loss=0.007368, over 13919.00 frames. ], tot_loss[loss=0.08049, simple_loss=0.1005, pruned_loss=0.02008, audio_tagging_loss=0.01015, over 3057805.98 frames. ], batch size: 54, lr: 5.32e-03, grad_scale: 16.0 2023-11-20 07:45:03,370 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=8.09 vs. limit=15.0 2023-11-20 07:45:08,729 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=1000286.6666666666, ans=0.0 2023-11-20 07:45:13,400 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 150050 2023-11-20 07:45:25,725 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=1000353.3333333334, ans=0.125 2023-11-20 07:45:41,024 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-20 07:45:42,173 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=1000420.0, ans=0.0 2023-11-20 07:45:52,389 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.whiten, num_groups=1, num_channels=192, metric=4.13 vs. limit=12.0 2023-11-20 07:45:55,897 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=7.70 vs. limit=15.0 2023-11-20 07:45:57,679 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=11.09 vs. limit=15.0 2023-11-20 07:45:58,350 INFO [train_asr.py:1262] (1/4) Epoch 13, batch 5800, loss[loss=0.06967, simple_loss=0.08759, pruned_loss=0.01502, audio_tagging_loss=0.01085, over 15459.00 frames. ], tot_loss[loss=0.08043, simple_loss=0.1007, pruned_loss=0.02009, audio_tagging_loss=0.009975, over 3057562.36 frames. 
], batch size: 59, lr: 5.32e-03, grad_scale: 16.0 2023-11-20 07:46:16,864 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 150100 2023-11-20 07:46:18,255 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=1000620.0, ans=0.125 2023-11-20 07:46:19,211 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.622e+01 8.139e+01 8.776e+01 9.398e+01 1.793e+02, threshold=1.755e+02, percent-clipped=1.0 2023-11-20 07:46:20,573 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=1000620.0, ans=0.1 2023-11-20 07:46:24,721 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=1000686.6666666666, ans=0.125 2023-11-20 07:46:42,058 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=1000753.3333333334, ans=0.2 2023-11-20 07:46:50,757 INFO [scaling.py:1022] (1/4) Whitening: name=encoder_embed.out_whiten, num_groups=1, num_channels=192, metric=7.44 vs. limit=8.0 2023-11-20 07:46:51,659 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=8.61 vs. limit=10.0 2023-11-20 07:46:56,619 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=10.93 vs. limit=15.0 2023-11-20 07:47:01,815 INFO [train_asr.py:1262] (1/4) Epoch 13, batch 5850, loss[loss=0.08017, simple_loss=0.09285, pruned_loss=0.02043, audio_tagging_loss=0.01331, over 14242.00 frames. ], tot_loss[loss=0.07944, simple_loss=0.09935, pruned_loss=0.01976, audio_tagging_loss=0.01001, over 3050815.49 frames. ], batch size: 57, lr: 5.32e-03, grad_scale: 16.0 2023-11-20 07:47:19,700 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=1000953.3333333334, ans=0.125 2023-11-20 07:47:21,471 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 150150 2023-11-20 07:47:24,524 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=8.93 vs. limit=15.0 2023-11-20 07:47:27,126 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=10.31 vs. limit=15.0 2023-11-20 07:47:46,138 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1001086.6666666666, ans=0.1 2023-11-20 07:48:05,890 INFO [train_asr.py:1262] (1/4) Epoch 13, batch 5900, loss[loss=0.08167, simple_loss=0.1091, pruned_loss=0.02109, audio_tagging_loss=0.006041, over 15520.00 frames. ], tot_loss[loss=0.07911, simple_loss=0.09876, pruned_loss=0.01973, audio_tagging_loss=0.01, over 3045626.73 frames. 
], batch size: 59, lr: 5.32e-03, grad_scale: 16.0 2023-11-20 07:48:18,922 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.min_abs, batch_count=1001286.6666666666, ans=0.5 2023-11-20 07:48:25,959 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 150200 2023-11-20 07:48:27,396 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=1001286.6666666666, ans=0.0 2023-11-20 07:48:28,737 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.681e+01 8.240e+01 8.857e+01 9.692e+01 1.369e+02, threshold=1.771e+02, percent-clipped=0.0 2023-11-20 07:48:33,839 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=1001353.3333333334, ans=0.125 2023-11-20 07:49:07,898 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=1001486.6666666666, ans=0.125 2023-11-20 07:49:10,155 INFO [train_asr.py:1262] (1/4) Epoch 13, batch 5950, loss[loss=0.07755, simple_loss=0.0911, pruned_loss=0.01989, audio_tagging_loss=0.0121, over 14961.00 frames. ], tot_loss[loss=0.07913, simple_loss=0.09884, pruned_loss=0.01963, audio_tagging_loss=0.01007, over 3046383.50 frames. ], batch size: 55, lr: 5.32e-03, grad_scale: 16.0 2023-11-20 07:49:10,898 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=3.05 vs. limit=15.0 2023-11-20 07:49:15,538 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=1001553.3333333334, ans=0.125 2023-11-20 07:49:20,295 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=1001553.3333333334, ans=0.125 2023-11-20 07:49:29,858 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 150250 2023-11-20 07:49:29,960 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.min_abs, batch_count=1001620.0, ans=0.5 2023-11-20 07:49:30,103 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=1001620.0, ans=0.125 2023-11-20 07:50:07,294 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=1001820.0, ans=0.125 2023-11-20 07:50:14,415 INFO [train_asr.py:1262] (1/4) Epoch 13, batch 6000, loss[loss=0.05837, simple_loss=0.06548, pruned_loss=0.009426, audio_tagging_loss=0.01621, over 15799.00 frames. ], tot_loss[loss=0.07995, simple_loss=0.09984, pruned_loss=0.01999, audio_tagging_loss=0.01004, over 3039393.18 frames. ], batch size: 63, lr: 5.32e-03, grad_scale: 32.0 2023-11-20 07:50:14,416 INFO [train_asr.py:1285] (1/4) Computing validation loss 2023-11-20 07:50:53,675 INFO [zipformer.py:1873] (1/4) name=encoder.encoders.5.encoder.layers.1.self_attn_weights, attn_weights_entropy = tensor([5.1790, 3.4965, 5.0598, 4.8119], device='cuda:1') 2023-11-20 07:50:55,331 INFO [train_asr.py:1294] (1/4) Epoch 13, validation: loss=0.06203, simple_loss=0.05394, pruned_loss=0.00581, audio_tagging_loss=0.02925, over 4681554.00 frames. 
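Two sanity checks on the numbers logged in this section. Each reported loss is consistent with 0.5 * simple_loss + pruned_loss + 1.0 * audio_tagging_loss; the validation entry just above works out exactly (0.5 * 0.05394 + 0.00581 + 0.02925 = 0.06203). Each gradient-clipping threshold is likewise consistent with Clipping_scale times the middle value of the five logged grad-norm quantiles (batch 4700 above: 2.0 * 8.729e+01 = 1.746e+02). A minimal Python sketch checking both relationships; the 0.5 and 1.0 weights are inferred from the logged numbers themselves, not taken from the training code:

```python
def combined_loss(simple_loss, pruned_loss, audio_tagging_loss,
                  simple_scale=0.5, at_scale=1.0):
    # Weighted sum that the logged per-batch and validation losses satisfy.
    return simple_scale * simple_loss + pruned_loss + at_scale * audio_tagging_loss

# Epoch 13 validation entry: loss=0.06203, simple_loss=0.05394,
# pruned_loss=0.00581, audio_tagging_loss=0.02925
assert abs(combined_loss(0.05394, 0.00581, 0.02925) - 0.06203) < 1e-5

# Batch 4700 clipping entry: quartiles 6.986e+01 8.015e+01 8.729e+01
# 9.415e+01 1.258e+02 with Clipping_scale=2.0 -> threshold=1.746e+02
assert abs(2.0 * 8.729e+01 - 1.746e+02) < 0.1
```

The same decomposition holds, up to rounding in the last digit, for every tot_loss entry in this section, which makes it a quick way to confirm the loss scales did not change mid-run.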
2023-11-20 07:50:55,332 INFO [train_asr.py:1295] (1/4) Maximum memory allocated so far is 26082MB 2023-11-20 07:50:57,302 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=19.49 vs. limit=22.5 2023-11-20 07:51:09,244 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=1001953.3333333334, ans=0.125 2023-11-20 07:51:10,371 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.min_positive, batch_count=1001953.3333333334, ans=0.025 2023-11-20 07:51:15,049 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 150300 2023-11-20 07:51:17,453 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.658e+01 7.953e+01 8.643e+01 9.262e+01 1.038e+02, threshold=1.729e+02, percent-clipped=0.0 2023-11-20 07:51:21,582 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1002020.0, ans=0.1 2023-11-20 07:51:25,301 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=1002020.0, ans=0.125 2023-11-20 07:51:27,994 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=13.06 vs. limit=15.0 2023-11-20 07:51:30,011 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=1002020.0, ans=0.125 2023-11-20 07:51:31,279 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1002020.0, ans=0.1 2023-11-20 07:51:33,631 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=1002086.6666666666, ans=0.0 2023-11-20 07:51:33,713 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=1002086.6666666666, ans=0.0 2023-11-20 07:51:37,318 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=1002086.6666666666, ans=0.2 2023-11-20 07:51:41,303 WARNING [train_asr.py:1506] (1/4) Exclude cut with ID unbalanced/NoNxFjwXuuc_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-20 07:51:59,913 INFO [train_asr.py:1262] (1/4) Epoch 13, batch 6050, loss[loss=0.08694, simple_loss=0.1117, pruned_loss=0.02076, audio_tagging_loss=0.01033, over 16208.00 frames. ], tot_loss[loss=0.07906, simple_loss=0.09885, pruned_loss=0.01957, audio_tagging_loss=0.01006, over 3038626.41 frames. 
], batch size: 58, lr: 5.32e-03, grad_scale: 16.0 2023-11-20 07:52:05,257 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=1002220.0, ans=0.04949747468305833 2023-11-20 07:52:15,434 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=1002286.6666666666, ans=0.125 2023-11-20 07:52:18,912 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 150350 2023-11-20 07:52:30,707 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=1002353.3333333334, ans=0.07 2023-11-20 07:52:39,996 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=1002420.0, ans=0.2 2023-11-20 07:52:47,054 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=1002420.0, ans=0.125 2023-11-20 07:53:00,395 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=1002486.6666666666, ans=0.125 2023-11-20 07:53:03,908 INFO [train_asr.py:1262] (1/4) Epoch 13, batch 6100, loss[loss=0.08297, simple_loss=0.1044, pruned_loss=0.01966, audio_tagging_loss=0.0111, over 15168.00 frames. ], tot_loss[loss=0.07999, simple_loss=0.1004, pruned_loss=0.01979, audio_tagging_loss=0.009995, over 3043889.39 frames. ], batch size: 56, lr: 5.31e-03, grad_scale: 16.0 2023-11-20 07:53:12,648 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=1002553.3333333334, ans=0.125 2023-11-20 07:53:23,644 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 150400 2023-11-20 07:53:23,747 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-20 07:53:27,491 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.228e+01 8.482e+01 9.288e+01 1.055e+02 1.647e+02, threshold=1.858e+02, percent-clipped=0.0 2023-11-20 07:53:30,212 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=1002686.6666666666, ans=0.125 2023-11-20 07:53:40,507 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=1002686.6666666666, ans=0.035 2023-11-20 07:54:05,796 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=9.39 vs. limit=15.0 2023-11-20 07:54:08,345 INFO [train_asr.py:1262] (1/4) Epoch 13, batch 6150, loss[loss=0.08588, simple_loss=0.1119, pruned_loss=0.02102, audio_tagging_loss=0.008925, over 15097.00 frames. ], tot_loss[loss=0.07984, simple_loss=0.1002, pruned_loss=0.01968, audio_tagging_loss=0.01003, over 3044243.81 frames. 
], batch size: 56, lr: 5.31e-03, grad_scale: 16.0 2023-11-20 07:54:20,330 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=1002953.3333333334, ans=0.0 2023-11-20 07:54:21,582 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=1002953.3333333334, ans=0.125 2023-11-20 07:54:28,033 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 150450 2023-11-20 07:54:28,119 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=1002953.3333333334, ans=0.0 2023-11-20 07:54:28,307 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=1002953.3333333334, ans=0.125 2023-11-20 07:54:40,093 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1003020.0, ans=0.0 2023-11-20 07:54:41,279 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1003020.0, ans=0.1 2023-11-20 07:54:51,176 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=1003086.6666666666, ans=0.0 2023-11-20 07:54:53,966 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=11.43 vs. limit=15.0 2023-11-20 07:54:58,147 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module1.whiten, num_groups=1, num_channels=512, metric=4.64 vs. limit=15.0 2023-11-20 07:55:12,843 INFO [train_asr.py:1262] (1/4) Epoch 13, batch 6200, loss[loss=0.0725, simple_loss=0.09145, pruned_loss=0.01716, audio_tagging_loss=0.009613, over 15249.00 frames. ], tot_loss[loss=0.07987, simple_loss=0.1, pruned_loss=0.0197, audio_tagging_loss=0.01016, over 3047410.72 frames. ], batch size: 56, lr: 5.31e-03, grad_scale: 16.0 2023-11-20 07:55:13,658 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=6.70 vs. limit=15.0 2023-11-20 07:55:17,386 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=1003220.0, ans=0.2 2023-11-20 07:55:31,109 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=1003286.6666666666, ans=0.2 2023-11-20 07:55:32,204 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 150500 2023-11-20 07:55:35,266 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=20.22 vs. limit=22.5 2023-11-20 07:55:35,789 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.566e+01 8.052e+01 8.616e+01 9.415e+01 1.229e+02, threshold=1.723e+02, percent-clipped=0.0 2023-11-20 07:55:43,917 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.self_attn1.whiten, num_groups=1, num_channels=192, metric=11.89 vs. 
limit=22.5 2023-11-20 07:55:55,640 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=1003420.0, ans=0.07 2023-11-20 07:55:58,112 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=1003420.0, ans=0.125 2023-11-20 07:56:17,651 INFO [train_asr.py:1262] (1/4) Epoch 13, batch 6250, loss[loss=0.09157, simple_loss=0.1083, pruned_loss=0.02214, audio_tagging_loss=0.01528, over 14932.00 frames. ], tot_loss[loss=0.08041, simple_loss=0.1006, pruned_loss=0.01988, audio_tagging_loss=0.01023, over 3048639.79 frames. ], batch size: 54, lr: 5.31e-03, grad_scale: 16.0 2023-11-20 07:56:35,585 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=1003620.0, ans=0.125 2023-11-20 07:56:37,314 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 150550 2023-11-20 07:56:41,266 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=1003620.0, ans=0.125 2023-11-20 07:56:42,547 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-11-20 07:56:43,718 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=1003686.6666666666, ans=0.125 2023-11-20 07:56:47,830 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=8.92 vs. limit=15.0 2023-11-20 07:57:21,467 INFO [train_asr.py:1262] (1/4) Epoch 13, batch 6300, loss[loss=0.08157, simple_loss=0.102, pruned_loss=0.02075, audio_tagging_loss=0.009839, over 14008.00 frames. ], tot_loss[loss=0.08085, simple_loss=0.1013, pruned_loss=0.02001, audio_tagging_loss=0.0102, over 3047945.13 frames. ], batch size: 56, lr: 5.31e-03, grad_scale: 16.0 2023-11-20 07:57:41,453 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 150600 2023-11-20 07:57:45,552 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.750e+01 8.528e+01 9.126e+01 1.017e+02 1.272e+02, threshold=1.825e+02, percent-clipped=0.0 2023-11-20 07:57:46,998 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1004020.0, ans=0.0 2023-11-20 07:58:03,885 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=1004086.6666666666, ans=0.125 2023-11-20 07:58:27,076 INFO [train_asr.py:1262] (1/4) Epoch 13, batch 6350, loss[loss=0.08306, simple_loss=0.1006, pruned_loss=0.01887, audio_tagging_loss=0.01387, over 14873.00 frames. ], tot_loss[loss=0.08041, simple_loss=0.1004, pruned_loss=0.01978, audio_tagging_loss=0.01045, over 3040407.37 frames. 
], batch size: 54, lr: 5.31e-03, grad_scale: 16.0 2023-11-20 07:58:36,714 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=1004220.0, ans=0.125 2023-11-20 07:58:38,078 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=1004220.0, ans=0.125 2023-11-20 07:58:46,399 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 150650 2023-11-20 07:58:52,096 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=1004353.3333333334, ans=0.125 2023-11-20 07:58:52,532 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=19.73 vs. limit=22.5 2023-11-20 07:58:53,273 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=1004353.3333333334, ans=0.125 2023-11-20 07:59:04,798 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=1004353.3333333334, ans=0.0 2023-11-20 07:59:27,498 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.min_positive, batch_count=1004486.6666666666, ans=0.025 2023-11-20 07:59:27,575 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=1004486.6666666666, ans=0.125 2023-11-20 07:59:32,159 INFO [train_asr.py:1262] (1/4) Epoch 13, batch 6400, loss[loss=0.07342, simple_loss=0.08748, pruned_loss=0.02001, audio_tagging_loss=0.009667, over 15063.00 frames. ], tot_loss[loss=0.08042, simple_loss=0.1002, pruned_loss=0.01985, audio_tagging_loss=0.01048, over 3043768.25 frames. ], batch size: 57, lr: 5.31e-03, grad_scale: 32.0 2023-11-20 07:59:39,386 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=1004553.3333333334, ans=0.09899494936611666 2023-11-20 07:59:49,050 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=1004620.0, ans=0.0 2023-11-20 07:59:51,888 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 150700 2023-11-20 07:59:56,011 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.650e+01 8.100e+01 8.750e+01 9.554e+01 1.310e+02, threshold=1.750e+02, percent-clipped=0.0 2023-11-20 07:59:57,828 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=11.26 vs. limit=15.0 2023-11-20 08:00:00,035 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=1004686.6666666666, ans=0.125 2023-11-20 08:00:01,338 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=1004686.6666666666, ans=0.2 2023-11-20 08:00:23,410 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=1004820.0, ans=0.0 2023-11-20 08:00:37,045 INFO [train_asr.py:1262] (1/4) Epoch 13, batch 6450, loss[loss=0.078, simple_loss=0.09531, pruned_loss=0.01992, audio_tagging_loss=0.01043, over 15222.00 frames. ], tot_loss[loss=0.08039, simple_loss=0.1002, pruned_loss=0.01982, audio_tagging_loss=0.01048, over 3049511.87 frames. 
], batch size: 58, lr: 5.31e-03, grad_scale: 32.0 2023-11-20 08:00:44,527 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=1004886.6666666666, ans=0.2 2023-11-20 08:00:56,806 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 150750 2023-11-20 08:01:01,293 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.whiten, num_groups=1, num_channels=192, metric=4.09 vs. limit=12.0 2023-11-20 08:01:04,973 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=1005020.0, ans=0.0 2023-11-20 08:01:05,037 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=1005020.0, ans=0.125 2023-11-20 08:01:42,368 INFO [train_asr.py:1262] (1/4) Epoch 13, batch 6500, loss[loss=0.08598, simple_loss=0.1128, pruned_loss=0.01965, audio_tagging_loss=0.00993, over 15904.00 frames. ], tot_loss[loss=0.08069, simple_loss=0.1008, pruned_loss=0.01995, audio_tagging_loss=0.01033, over 3054968.73 frames. ], batch size: 57, lr: 5.31e-03, grad_scale: 32.0 2023-11-20 08:01:46,466 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-20 08:02:01,533 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 150800 2023-11-20 08:02:05,445 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.846e+01 8.112e+01 8.845e+01 9.450e+01 1.236e+02, threshold=1.769e+02, percent-clipped=0.0 2023-11-20 08:02:25,133 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.max_abs, batch_count=1005420.0, ans=10.0 2023-11-20 08:02:34,818 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=1005486.6666666666, ans=0.0 2023-11-20 08:02:39,678 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1005486.6666666666, ans=0.1 2023-11-20 08:02:43,302 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=9.37 vs. limit=15.0 2023-11-20 08:02:47,579 INFO [train_asr.py:1262] (1/4) Epoch 13, batch 6550, loss[loss=0.07858, simple_loss=0.1089, pruned_loss=0.01775, audio_tagging_loss=0.006399, over 14372.00 frames. ], tot_loss[loss=0.08064, simple_loss=0.101, pruned_loss=0.01998, audio_tagging_loss=0.01017, over 3061474.91 frames. 
], batch size: 53, lr: 5.31e-03, grad_scale: 32.0 2023-11-20 08:02:47,850 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=1005553.3333333334, ans=0.125 2023-11-20 08:02:50,226 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=1005553.3333333334, ans=0.125 2023-11-20 08:02:52,804 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-20 08:03:01,620 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer_ff3.min_abs, batch_count=1005620.0, ans=0.2 2023-11-20 08:03:06,833 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 150850 2023-11-20 08:03:15,593 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=1005686.6666666666, ans=0.0 2023-11-20 08:03:19,402 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=1005686.6666666666, ans=0.125 2023-11-20 08:03:25,670 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1005753.3333333334, ans=0.125 2023-11-20 08:03:50,923 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=8.90 vs. limit=10.0 2023-11-20 08:03:51,433 INFO [train_asr.py:1262] (1/4) Epoch 13, batch 6600, loss[loss=0.07188, simple_loss=0.08394, pruned_loss=0.0176, audio_tagging_loss=0.01231, over 14288.00 frames. ], tot_loss[loss=0.08041, simple_loss=0.101, pruned_loss=0.01985, audio_tagging_loss=0.01005, over 3057057.91 frames. ], batch size: 54, lr: 5.31e-03, grad_scale: 16.0 2023-11-20 08:03:57,673 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=13.10 vs. limit=15.0 2023-11-20 08:04:04,829 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=1005953.3333333334, ans=0.2 2023-11-20 08:04:07,731 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=7.45 vs. limit=10.0 2023-11-20 08:04:12,228 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 150900 2023-11-20 08:04:12,447 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=1005953.3333333334, ans=0.0 2023-11-20 08:04:16,939 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.805e+01 8.131e+01 8.865e+01 9.580e+01 1.182e+02, threshold=1.773e+02, percent-clipped=0.0 2023-11-20 08:04:28,941 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1006020.0, ans=0.1 2023-11-20 08:04:32,625 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=1006086.6666666666, ans=0.125 2023-11-20 08:04:57,521 INFO [train_asr.py:1262] (1/4) Epoch 13, batch 6650, loss[loss=0.06166, simple_loss=0.08333, pruned_loss=0.0119, audio_tagging_loss=0.00809, over 15164.00 frames. ], tot_loss[loss=0.08019, simple_loss=0.101, pruned_loss=0.01976, audio_tagging_loss=0.009907, over 3057078.65 frames. 
], batch size: 56, lr: 5.30e-03, grad_scale: 16.0 2023-11-20 08:05:07,042 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.self_attn1.whiten, num_groups=1, num_channels=192, metric=15.21 vs. limit=22.5 2023-11-20 08:05:16,594 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 150950 2023-11-20 08:05:45,575 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.whiten.whitening_limit, batch_count=1006420.0, ans=12.0 2023-11-20 08:05:49,065 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=1006486.6666666666, ans=0.125 2023-11-20 08:06:01,391 INFO [train_asr.py:1262] (1/4) Epoch 13, batch 6700, loss[loss=0.08295, simple_loss=0.09371, pruned_loss=0.02349, audio_tagging_loss=0.01261, over 14728.00 frames. ], tot_loss[loss=0.08076, simple_loss=0.1016, pruned_loss=0.02002, audio_tagging_loss=0.009932, over 3052077.79 frames. ], batch size: 55, lr: 5.30e-03, grad_scale: 16.0 2023-11-20 08:06:09,247 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=1006553.3333333334, ans=0.0 2023-11-20 08:06:11,742 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1006553.3333333334, ans=0.125 2023-11-20 08:06:16,574 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.min_abs, batch_count=1006620.0, ans=0.5 2023-11-20 08:06:19,949 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 151000 2023-11-20 08:06:20,050 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=1006620.0, ans=0.125 2023-11-20 08:06:25,049 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.540e+01 7.744e+01 8.706e+01 9.471e+01 1.164e+02, threshold=1.741e+02, percent-clipped=0.0 2023-11-20 08:06:40,390 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=1006753.3333333334, ans=0.2 2023-11-20 08:06:52,614 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=12.22 vs. limit=15.0 2023-11-20 08:07:05,424 INFO [train_asr.py:1262] (1/4) Epoch 13, batch 6750, loss[loss=0.06911, simple_loss=0.08365, pruned_loss=0.01548, audio_tagging_loss=0.0118, over 15917.00 frames. ], tot_loss[loss=0.08107, simple_loss=0.1017, pruned_loss=0.02021, audio_tagging_loss=0.01003, over 3054260.52 frames. ], batch size: 58, lr: 5.30e-03, grad_scale: 16.0 2023-11-20 08:07:08,115 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1006886.6666666666, ans=0.0 2023-11-20 08:07:19,705 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.82 vs. limit=6.0 2023-11-20 08:07:20,524 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=1006953.3333333334, ans=0.0 2023-11-20 08:07:25,125 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 151050 2023-11-20 08:08:10,112 INFO [train_asr.py:1262] (1/4) Epoch 13, batch 6800, loss[loss=0.07508, simple_loss=0.08577, pruned_loss=0.01857, audio_tagging_loss=0.01362, over 15588.00 frames. 
], tot_loss[loss=0.08095, simple_loss=0.1014, pruned_loss=0.02022, audio_tagging_loss=0.01001, over 3048695.84 frames. ], batch size: 60, lr: 5.30e-03, grad_scale: 32.0 2023-11-20 08:08:29,250 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 151100 2023-11-20 08:08:33,970 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.708e+01 8.041e+01 8.769e+01 9.846e+01 1.398e+02, threshold=1.754e+02, percent-clipped=0.0 2023-11-20 08:08:37,992 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=1007353.3333333334, ans=0.2 2023-11-20 08:08:40,277 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1007353.3333333334, ans=0.125 2023-11-20 08:09:02,601 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=1007486.6666666666, ans=0.125 2023-11-20 08:09:04,095 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=12.48 vs. limit=15.0 2023-11-20 08:09:13,906 INFO [train_asr.py:1262] (1/4) Epoch 13, batch 6850, loss[loss=0.07879, simple_loss=0.09646, pruned_loss=0.02009, audio_tagging_loss=0.01048, over 14571.00 frames. ], tot_loss[loss=0.08026, simple_loss=0.1006, pruned_loss=0.0199, audio_tagging_loss=0.01004, over 3049812.54 frames. ], batch size: 56, lr: 5.30e-03, grad_scale: 16.0 2023-11-20 08:09:19,615 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.08 vs. limit=6.0 2023-11-20 08:09:32,361 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 151150 2023-11-20 08:09:34,949 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=1007620.0, ans=0.0 2023-11-20 08:09:40,166 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=1007686.6666666666, ans=0.0 2023-11-20 08:09:59,160 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=1007753.3333333334, ans=0.125 2023-11-20 08:09:59,206 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1007753.3333333334, ans=0.1 2023-11-20 08:10:05,853 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=1007820.0, ans=0.2 2023-11-20 08:10:17,682 INFO [train_asr.py:1262] (1/4) Epoch 13, batch 6900, loss[loss=0.09464, simple_loss=0.1163, pruned_loss=0.02743, audio_tagging_loss=0.009077, over 15318.00 frames. ], tot_loss[loss=0.08027, simple_loss=0.1007, pruned_loss=0.02003, audio_tagging_loss=0.009907, over 3052627.28 frames. 
], batch size: 56, lr: 5.30e-03, grad_scale: 16.0 2023-11-20 08:10:21,727 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=1007886.6666666666, ans=0.2 2023-11-20 08:10:37,656 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 151200 2023-11-20 08:10:44,617 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.186e+01 8.370e+01 9.169e+01 1.032e+02 1.396e+02, threshold=1.834e+02, percent-clipped=0.0 2023-11-20 08:11:08,921 WARNING [train_asr.py:1506] (1/4) Exclude cut with ID unbalanced/Xez1ffAcb0w_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-20 08:11:22,929 INFO [train_asr.py:1262] (1/4) Epoch 13, batch 6950, loss[loss=0.0727, simple_loss=0.09474, pruned_loss=0.016, audio_tagging_loss=0.009331, over 15353.00 frames. ], tot_loss[loss=0.08008, simple_loss=0.1005, pruned_loss=0.0199, audio_tagging_loss=0.009922, over 3047837.26 frames. ], batch size: 60, lr: 5.30e-03, grad_scale: 16.0 2023-11-20 08:11:25,023 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=1008220.0, ans=0.04949747468305833 2023-11-20 08:11:30,039 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1008220.0, ans=0.125 2023-11-20 08:11:31,382 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=1008220.0, ans=0.125 2023-11-20 08:11:38,059 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=1008286.6666666666, ans=0.125 2023-11-20 08:11:42,580 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1008286.6666666666, ans=0.1 2023-11-20 08:11:43,500 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 151250 2023-11-20 08:11:44,798 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=1008286.6666666666, ans=0.0 2023-11-20 08:11:51,174 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=1008353.3333333334, ans=0.0 2023-11-20 08:12:09,390 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=1.96 vs. limit=6.0 2023-11-20 08:12:27,857 INFO [train_asr.py:1262] (1/4) Epoch 13, batch 7000, loss[loss=0.09181, simple_loss=0.1185, pruned_loss=0.02223, audio_tagging_loss=0.01033, over 16011.00 frames. ], tot_loss[loss=0.07985, simple_loss=0.1002, pruned_loss=0.01975, audio_tagging_loss=0.009983, over 3047451.92 frames. 
], batch size: 62, lr: 5.30e-03, grad_scale: 16.0 2023-11-20 08:12:32,617 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=1008553.3333333334, ans=0.125 2023-11-20 08:12:33,720 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=1008553.3333333334, ans=0.125 2023-11-20 08:12:46,743 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 151300 2023-11-20 08:12:52,721 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.717e+01 8.209e+01 8.895e+01 9.636e+01 1.347e+02, threshold=1.779e+02, percent-clipped=0.0 2023-11-20 08:13:32,215 INFO [train_asr.py:1262] (1/4) Epoch 13, batch 7050, loss[loss=0.08051, simple_loss=0.1035, pruned_loss=0.02063, audio_tagging_loss=0.008121, over 15223.00 frames. ], tot_loss[loss=0.08028, simple_loss=0.1008, pruned_loss=0.0199, audio_tagging_loss=0.009973, over 3051114.37 frames. ], batch size: 55, lr: 5.30e-03, grad_scale: 16.0 2023-11-20 08:13:42,307 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=1008886.6666666666, ans=0.125 2023-11-20 08:13:51,859 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 151350 2023-11-20 08:13:58,411 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=1009020.0, ans=0.125 2023-11-20 08:14:13,241 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=1009086.6666666666, ans=0.125 2023-11-20 08:14:18,633 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.conv_module2.whiten, num_groups=1, num_channels=192, metric=12.80 vs. limit=15.0 2023-11-20 08:14:20,567 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=1009086.6666666666, ans=0.2 2023-11-20 08:14:36,081 INFO [train_asr.py:1262] (1/4) Epoch 13, batch 7100, loss[loss=0.07516, simple_loss=0.08709, pruned_loss=0.01813, audio_tagging_loss=0.01349, over 15036.00 frames. ], tot_loss[loss=0.07936, simple_loss=0.0993, pruned_loss=0.01952, audio_tagging_loss=0.0102, over 3049347.16 frames. ], batch size: 57, lr: 5.30e-03, grad_scale: 16.0 2023-11-20 08:14:36,829 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=21.67 vs. limit=22.5 2023-11-20 08:14:52,568 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=1009286.6666666666, ans=0.0 2023-11-20 08:14:56,679 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 151400 2023-11-20 08:14:56,886 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=1009286.6666666666, ans=0.125 2023-11-20 08:15:03,703 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.030e+01 8.148e+01 8.789e+01 9.448e+01 1.213e+02, threshold=1.758e+02, percent-clipped=0.0 2023-11-20 08:15:32,300 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=1009486.6666666666, ans=0.125 2023-11-20 08:15:33,860 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=6.03 vs. 
limit=15.0 2023-11-20 08:15:37,465 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.min_positive, batch_count=1009486.6666666666, ans=0.025 2023-11-20 08:15:41,835 INFO [train_asr.py:1262] (1/4) Epoch 13, batch 7150, loss[loss=0.07823, simple_loss=0.09347, pruned_loss=0.02026, audio_tagging_loss=0.01123, over 15039.00 frames. ], tot_loss[loss=0.0805, simple_loss=0.1003, pruned_loss=0.02003, audio_tagging_loss=0.01031, over 3048048.06 frames. ], batch size: 56, lr: 5.30e-03, grad_scale: 16.0 2023-11-20 08:15:54,691 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=1009620.0, ans=0.125 2023-11-20 08:15:56,963 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1009620.0, ans=0.1 2023-11-20 08:16:01,886 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 151450 2023-11-20 08:16:06,938 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=1009686.6666666666, ans=0.04949747468305833 2023-11-20 08:16:06,948 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=1009686.6666666666, ans=0.0 2023-11-20 08:16:12,892 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=1009686.6666666666, ans=0.125 2023-11-20 08:16:18,665 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1009686.6666666666, ans=0.125 2023-11-20 08:16:20,815 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=2.22 vs. limit=15.0 2023-11-20 08:16:24,134 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=1009753.3333333334, ans=0.125 2023-11-20 08:16:33,972 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=1009820.0, ans=0.0 2023-11-20 08:16:46,389 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=1009886.6666666666, ans=0.125 2023-11-20 08:16:47,322 INFO [train_asr.py:1262] (1/4) Epoch 13, batch 7200, loss[loss=0.07664, simple_loss=0.09489, pruned_loss=0.01795, audio_tagging_loss=0.01124, over 14709.00 frames. ], tot_loss[loss=0.08068, simple_loss=0.1007, pruned_loss=0.02006, audio_tagging_loss=0.01027, over 3053125.66 frames. ], batch size: 54, lr: 5.30e-03, grad_scale: 32.0 2023-11-20 08:17:06,108 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 151500 2023-11-20 08:17:10,009 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=1009953.3333333334, ans=0.0 2023-11-20 08:17:13,437 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.859e+01 8.360e+01 9.161e+01 1.018e+02 1.279e+02, threshold=1.832e+02, percent-clipped=0.0 2023-11-20 08:17:21,539 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=512, metric=8.90 vs. 
limit=15.0 2023-11-20 08:17:24,729 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=1010086.6666666666, ans=0.125 2023-11-20 08:17:30,542 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=1010086.6666666666, ans=0.1 2023-11-20 08:17:45,128 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=1010153.3333333334, ans=0.0 2023-11-20 08:17:50,978 INFO [train_asr.py:1262] (1/4) Epoch 13, batch 7250, loss[loss=0.05319, simple_loss=0.05972, pruned_loss=0.01182, audio_tagging_loss=0.01152, over 15200.00 frames. ], tot_loss[loss=0.0808, simple_loss=0.1007, pruned_loss=0.02009, audio_tagging_loss=0.01037, over 3055988.64 frames. ], batch size: 60, lr: 5.29e-03, grad_scale: 32.0 2023-11-20 08:17:56,757 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1010220.0, ans=0.125 2023-11-20 08:18:10,763 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 151550 2023-11-20 08:18:36,629 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=1010420.0, ans=0.2 2023-11-20 08:18:55,904 INFO [train_asr.py:1262] (1/4) Epoch 13, batch 7300, loss[loss=0.06203, simple_loss=0.0743, pruned_loss=0.01305, audio_tagging_loss=0.01183, over 14745.00 frames. ], tot_loss[loss=0.08087, simple_loss=0.101, pruned_loss=0.02007, audio_tagging_loss=0.01028, over 3055858.12 frames. ], batch size: 56, lr: 5.29e-03, grad_scale: 32.0 2023-11-20 08:19:15,962 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 151600 2023-11-20 08:19:22,275 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.093e+01 8.295e+01 8.871e+01 9.520e+01 1.560e+02, threshold=1.774e+02, percent-clipped=0.0 2023-11-20 08:19:40,883 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1010753.3333333334, ans=0.1 2023-11-20 08:19:54,952 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=1010820.0, ans=0.125 2023-11-20 08:20:01,265 INFO [train_asr.py:1262] (1/4) Epoch 13, batch 7350, loss[loss=0.07632, simple_loss=0.09343, pruned_loss=0.02123, audio_tagging_loss=0.008374, over 15767.00 frames. ], tot_loss[loss=0.08094, simple_loss=0.1016, pruned_loss=0.02016, audio_tagging_loss=0.00999, over 3059889.05 frames. ], batch size: 57, lr: 5.29e-03, grad_scale: 32.0 2023-11-20 08:20:01,372 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=1010886.6666666666, ans=0.1 2023-11-20 08:20:04,018 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=1010886.6666666666, ans=0.125 2023-11-20 08:20:09,541 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=6.41 vs. 
limit=15.0 2023-11-20 08:20:11,438 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=1010886.6666666666, ans=0.125 2023-11-20 08:20:19,876 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 151650 2023-11-20 08:20:45,325 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=9.71 vs. limit=15.0 2023-11-20 08:20:56,190 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=11.44 vs. limit=15.0 2023-11-20 08:20:59,448 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1011153.3333333334, ans=0.1 2023-11-20 08:21:05,193 INFO [train_asr.py:1262] (1/4) Epoch 13, batch 7400, loss[loss=0.0938, simple_loss=0.1157, pruned_loss=0.02501, audio_tagging_loss=0.01094, over 16730.00 frames. ], tot_loss[loss=0.08054, simple_loss=0.1011, pruned_loss=0.02002, audio_tagging_loss=0.00997, over 3054779.37 frames. ], batch size: 63, lr: 5.29e-03, grad_scale: 32.0 2023-11-20 08:21:18,908 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=1011286.6666666666, ans=0.125 2023-11-20 08:21:21,654 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=1011286.6666666666, ans=0.125 2023-11-20 08:21:25,004 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 151700 2023-11-20 08:21:29,919 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1011353.3333333334, ans=0.1 2023-11-20 08:21:30,868 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.761e+01 8.067e+01 8.750e+01 9.547e+01 1.451e+02, threshold=1.750e+02, percent-clipped=0.0 2023-11-20 08:22:09,930 INFO [train_asr.py:1262] (1/4) Epoch 13, batch 7450, loss[loss=0.09185, simple_loss=0.1146, pruned_loss=0.0264, audio_tagging_loss=0.008174, over 14839.00 frames. ], tot_loss[loss=0.08044, simple_loss=0.1008, pruned_loss=0.02015, audio_tagging_loss=0.009899, over 3053205.16 frames. ], batch size: 56, lr: 5.29e-03, grad_scale: 32.0 2023-11-20 08:22:15,046 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=1011553.3333333334, ans=0.0 2023-11-20 08:22:17,376 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=1011553.3333333334, ans=0.125 2023-11-20 08:22:29,156 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 151750 2023-11-20 08:22:29,277 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=1011620.0, ans=0.125 2023-11-20 08:22:36,434 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=9.30 vs. 
limit=15.0 2023-11-20 08:22:54,669 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1011753.3333333334, ans=0.1 2023-11-20 08:23:09,270 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=1011820.0, ans=0.2 2023-11-20 08:23:14,416 INFO [train_asr.py:1262] (1/4) Epoch 13, batch 7500, loss[loss=0.0534, simple_loss=0.06235, pruned_loss=0.01028, audio_tagging_loss=0.01194, over 14308.00 frames. ], tot_loss[loss=0.08038, simple_loss=0.1007, pruned_loss=0.02019, audio_tagging_loss=0.009865, over 3049663.15 frames. ], batch size: 56, lr: 5.29e-03, grad_scale: 32.0 2023-11-20 08:23:25,349 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.89 vs. limit=6.0 2023-11-20 08:23:27,644 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=1011953.3333333334, ans=0.125 2023-11-20 08:23:28,819 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1011953.3333333334, ans=0.1 2023-11-20 08:23:33,508 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 151800 2023-11-20 08:23:40,377 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.995e+01 8.501e+01 9.262e+01 1.002e+02 1.307e+02, threshold=1.852e+02, percent-clipped=0.0 2023-11-20 08:24:17,261 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.58 vs. limit=6.0 2023-11-20 08:24:19,039 INFO [train_asr.py:1262] (1/4) Epoch 13, batch 7550, loss[loss=0.09953, simple_loss=0.1247, pruned_loss=0.03012, audio_tagging_loss=0.007079, over 15927.00 frames. ], tot_loss[loss=0.08053, simple_loss=0.101, pruned_loss=0.02025, audio_tagging_loss=0.009755, over 3054146.81 frames. ], batch size: 59, lr: 5.29e-03, grad_scale: 32.0 2023-11-20 08:24:31,639 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=1012286.6666666666, ans=0.125 2023-11-20 08:24:38,065 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module2.whiten, num_groups=1, num_channels=512, metric=7.89 vs. limit=15.0 2023-11-20 08:24:38,742 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 151850 2023-11-20 08:24:53,540 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=10.70 vs. limit=15.0 2023-11-20 08:25:01,336 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=1012420.0, ans=0.125 2023-11-20 08:25:09,512 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=1012486.6666666666, ans=0.0 2023-11-20 08:25:11,243 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.44 vs. 
limit=6.0 2023-11-20 08:25:21,569 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=1012486.6666666666, ans=0.2 2023-11-20 08:25:22,122 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=8.05 vs. limit=15.0 2023-11-20 08:25:23,854 INFO [train_asr.py:1262] (1/4) Epoch 13, batch 7600, loss[loss=0.09727, simple_loss=0.1249, pruned_loss=0.0243, audio_tagging_loss=0.01054, over 15257.00 frames. ], tot_loss[loss=0.08093, simple_loss=0.1015, pruned_loss=0.02035, audio_tagging_loss=0.009829, over 3052700.23 frames. ], batch size: 56, lr: 5.29e-03, grad_scale: 32.0 2023-11-20 08:25:29,515 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=20.10 vs. limit=22.5 2023-11-20 08:25:31,445 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=1012553.3333333334, ans=0.125 2023-11-20 08:25:36,349 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=1012620.0, ans=0.125 2023-11-20 08:25:39,564 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=1012620.0, ans=0.0 2023-11-20 08:25:42,986 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 151900 2023-11-20 08:25:48,982 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.336e+01 8.168e+01 8.744e+01 9.399e+01 1.403e+02, threshold=1.749e+02, percent-clipped=0.0 2023-11-20 08:25:57,439 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.min_positive, batch_count=1012686.6666666666, ans=0.05 2023-11-20 08:25:58,648 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=1012686.6666666666, ans=0.0 2023-11-20 08:26:17,217 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1012820.0, ans=0.125 2023-11-20 08:26:18,459 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=1012820.0, ans=0.125 2023-11-20 08:26:19,725 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1012820.0, ans=0.1 2023-11-20 08:26:28,771 INFO [train_asr.py:1262] (1/4) Epoch 13, batch 7650, loss[loss=0.08259, simple_loss=0.1096, pruned_loss=0.01638, audio_tagging_loss=0.01144, over 15693.00 frames. ], tot_loss[loss=0.08138, simple_loss=0.1018, pruned_loss=0.0206, audio_tagging_loss=0.009864, over 3048791.52 frames. ], batch size: 56, lr: 5.29e-03, grad_scale: 32.0 2023-11-20 08:26:38,969 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.max_abs, batch_count=1012886.6666666666, ans=10.0 2023-11-20 08:26:48,154 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 151950 2023-11-20 08:26:55,498 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=1013020.0, ans=0.0 2023-11-20 08:26:55,576 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=12.81 vs. 
limit=15.0 2023-11-20 08:27:10,406 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=6.18 vs. limit=15.0 2023-11-20 08:27:26,376 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=1013153.3333333334, ans=0.125 2023-11-20 08:27:31,335 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=1013153.3333333334, ans=0.125 2023-11-20 08:27:33,343 INFO [train_asr.py:1262] (1/4) Epoch 13, batch 7700, loss[loss=0.08208, simple_loss=0.1006, pruned_loss=0.02346, audio_tagging_loss=0.008336, over 14807.00 frames. ], tot_loss[loss=0.08123, simple_loss=0.1015, pruned_loss=0.02057, audio_tagging_loss=0.009889, over 3047306.50 frames. ], batch size: 53, lr: 5.29e-03, grad_scale: 32.0 2023-11-20 08:27:48,758 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1013286.6666666666, ans=0.125 2023-11-20 08:27:53,672 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 152000 2023-11-20 08:28:03,801 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.653e+01 7.881e+01 8.536e+01 9.329e+01 1.282e+02, threshold=1.707e+02, percent-clipped=0.0 2023-11-20 08:28:05,409 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=1013353.3333333334, ans=0.0 2023-11-20 08:28:10,808 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=1013353.3333333334, ans=0.125 2023-11-20 08:28:13,356 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=1013353.3333333334, ans=0.125 2023-11-20 08:28:19,620 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1013420.0, ans=0.125 2023-11-20 08:28:33,553 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=1013486.6666666666, ans=0.2 2023-11-20 08:28:42,235 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=11.45 vs. limit=22.5 2023-11-20 08:28:42,940 INFO [train_asr.py:1262] (1/4) Epoch 13, batch 7750, loss[loss=0.09969, simple_loss=0.1157, pruned_loss=0.03098, audio_tagging_loss=0.01086, over 15010.00 frames. ], tot_loss[loss=0.08159, simple_loss=0.1021, pruned_loss=0.02066, audio_tagging_loss=0.009864, over 3049841.60 frames. ], batch size: 55, lr: 5.29e-03, grad_scale: 32.0 2023-11-20 08:28:51,568 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.max_positive, batch_count=1013553.3333333334, ans=0.95 2023-11-20 08:28:51,758 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1013553.3333333334, ans=0.1 2023-11-20 08:28:51,770 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=1013553.3333333334, ans=0.2 2023-11-20 08:29:01,673 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 152050 2023-11-20 08:29:05,742 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=4.78 vs. 
limit=12.0 2023-11-20 08:29:15,219 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=1013686.6666666666, ans=0.2 2023-11-20 08:29:32,954 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=1013820.0, ans=0.0 2023-11-20 08:29:35,326 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=1013820.0, ans=0.0 2023-11-20 08:29:44,159 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1013820.0, ans=0.0 2023-11-20 08:29:46,229 INFO [train_asr.py:1262] (1/4) Epoch 13, batch 7800, loss[loss=0.05714, simple_loss=0.06616, pruned_loss=0.01126, audio_tagging_loss=0.0128, over 16584.00 frames. ], tot_loss[loss=0.08076, simple_loss=0.1011, pruned_loss=0.02021, audio_tagging_loss=0.01001, over 3046290.43 frames. ], batch size: 63, lr: 5.28e-03, grad_scale: 32.0 2023-11-20 08:29:47,074 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=6.78 vs. limit=15.0 2023-11-20 08:29:59,478 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=1013953.3333333334, ans=0.0 2023-11-20 08:30:05,404 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 152100 2023-11-20 08:30:05,566 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=1013953.3333333334, ans=0.125 2023-11-20 08:30:12,164 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.723e+01 8.195e+01 8.711e+01 9.199e+01 1.205e+02, threshold=1.742e+02, percent-clipped=0.0 2023-11-20 08:30:17,018 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=1014020.0, ans=0.2 2023-11-20 08:30:21,229 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1014020.0, ans=0.1 2023-11-20 08:30:24,787 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=1014086.6666666666, ans=0.0 2023-11-20 08:30:40,056 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1014153.3333333334, ans=0.125 2023-11-20 08:30:51,149 INFO [train_asr.py:1262] (1/4) Epoch 13, batch 7850, loss[loss=0.07641, simple_loss=0.09299, pruned_loss=0.01743, audio_tagging_loss=0.01249, over 13982.00 frames. ], tot_loss[loss=0.08112, simple_loss=0.1017, pruned_loss=0.02019, audio_tagging_loss=0.01008, over 3047941.70 frames. ], batch size: 54, lr: 5.28e-03, grad_scale: 32.0 2023-11-20 08:30:58,049 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=20.46 vs. 
limit=22.5 2023-11-20 08:31:11,648 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 152150 2023-11-20 08:31:14,406 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=1014286.6666666666, ans=0.07 2023-11-20 08:31:28,246 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=1014353.3333333334, ans=0.2 2023-11-20 08:31:28,295 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1014353.3333333334, ans=0.0 2023-11-20 08:31:37,975 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.conv_module1.whiten, num_groups=1, num_channels=192, metric=13.97 vs. limit=15.0 2023-11-20 08:31:56,456 INFO [train_asr.py:1262] (1/4) Epoch 13, batch 7900, loss[loss=0.07849, simple_loss=0.09909, pruned_loss=0.02005, audio_tagging_loss=0.008899, over 15434.00 frames. ], tot_loss[loss=0.08187, simple_loss=0.1024, pruned_loss=0.02046, audio_tagging_loss=0.0102, over 3049597.24 frames. ], batch size: 57, lr: 5.28e-03, grad_scale: 32.0 2023-11-20 08:31:59,742 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=1014553.3333333334, ans=0.0 2023-11-20 08:31:59,823 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=1014553.3333333334, ans=0.2 2023-11-20 08:32:15,259 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 152200 2023-11-20 08:32:22,239 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.716e+01 8.112e+01 8.963e+01 9.750e+01 1.229e+02, threshold=1.793e+02, percent-clipped=0.0 2023-11-20 08:32:25,116 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1014686.6666666666, ans=0.1 2023-11-20 08:32:44,575 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1014753.3333333334, ans=0.1 2023-11-20 08:33:00,854 INFO [train_asr.py:1262] (1/4) Epoch 13, batch 7950, loss[loss=0.06481, simple_loss=0.0848, pruned_loss=0.01308, audio_tagging_loss=0.009319, over 16579.00 frames. ], tot_loss[loss=0.08056, simple_loss=0.1007, pruned_loss=0.01989, audio_tagging_loss=0.01032, over 3049359.20 frames. ], batch size: 62, lr: 5.28e-03, grad_scale: 32.0 2023-11-20 08:33:05,179 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=12.23 vs. limit=22.5 2023-11-20 08:33:17,553 WARNING [train_asr.py:1506] (1/4) Exclude cut with ID unbalanced/uQjH4tNUZ_g_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. 
Number of tokens: 24 2023-11-20 08:33:20,008 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 152250 2023-11-20 08:33:34,998 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=1015020.0, ans=0.2 2023-11-20 08:33:56,620 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=1015153.3333333334, ans=0.0 2023-11-20 08:33:59,087 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=1015153.3333333334, ans=0.09899494936611666 2023-11-20 08:34:00,308 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=1015153.3333333334, ans=0.125 2023-11-20 08:34:04,879 INFO [train_asr.py:1262] (1/4) Epoch 13, batch 8000, loss[loss=0.08825, simple_loss=0.103, pruned_loss=0.02298, audio_tagging_loss=0.01379, over 15746.00 frames. ], tot_loss[loss=0.08075, simple_loss=0.1009, pruned_loss=0.0199, audio_tagging_loss=0.01042, over 3053188.37 frames. ], batch size: 57, lr: 5.28e-03, grad_scale: 32.0 2023-11-20 08:34:22,973 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=1015286.6666666666, ans=0.2 2023-11-20 08:34:24,109 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 152300 2023-11-20 08:34:24,363 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=1015286.6666666666, ans=0.125 2023-11-20 08:34:30,839 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.971e+01 8.267e+01 8.804e+01 9.546e+01 1.251e+02, threshold=1.761e+02, percent-clipped=0.0 2023-11-20 08:34:46,017 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=20.04 vs. limit=22.5 2023-11-20 08:35:08,900 INFO [train_asr.py:1262] (1/4) Epoch 13, batch 8050, loss[loss=0.0698, simple_loss=0.08542, pruned_loss=0.01588, audio_tagging_loss=0.01121, over 15716.00 frames. ], tot_loss[loss=0.08005, simple_loss=0.09975, pruned_loss=0.01969, audio_tagging_loss=0.01049, over 3048838.10 frames. ], batch size: 60, lr: 5.28e-03, grad_scale: 32.0 2023-11-20 08:35:29,133 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 152350 2023-11-20 08:35:30,527 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1015620.0, ans=0.125 2023-11-20 08:35:39,751 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=1015686.6666666666, ans=0.125 2023-11-20 08:35:42,425 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=512, metric=20.30 vs. 
limit=22.5 2023-11-20 08:35:44,981 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.whiten.whitening_limit, batch_count=1015686.6666666666, ans=15.0 2023-11-20 08:36:10,971 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=1015820.0, ans=0.125 2023-11-20 08:36:12,239 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=1015820.0, ans=0.0 2023-11-20 08:36:14,448 INFO [train_asr.py:1262] (1/4) Epoch 13, batch 8100, loss[loss=0.0599, simple_loss=0.06635, pruned_loss=0.01416, audio_tagging_loss=0.01256, over 15360.00 frames. ], tot_loss[loss=0.08029, simple_loss=0.1001, pruned_loss=0.01981, audio_tagging_loss=0.01042, over 3047738.97 frames. ], batch size: 61, lr: 5.28e-03, grad_scale: 32.0 2023-11-20 08:36:33,584 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 152400 2023-11-20 08:36:39,866 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.824e+01 8.032e+01 8.730e+01 9.665e+01 1.175e+02, threshold=1.746e+02, percent-clipped=0.0 2023-11-20 08:36:43,840 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1016020.0, ans=0.1 2023-11-20 08:36:45,151 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=1016020.0, ans=0.0 2023-11-20 08:37:15,098 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1016153.3333333334, ans=0.1 2023-11-20 08:37:16,139 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=1016153.3333333334, ans=0.125 2023-11-20 08:37:18,477 INFO [train_asr.py:1262] (1/4) Epoch 13, batch 8150, loss[loss=0.07963, simple_loss=0.1034, pruned_loss=0.02025, audio_tagging_loss=0.0077, over 15947.00 frames. ], tot_loss[loss=0.08059, simple_loss=0.1007, pruned_loss=0.02007, audio_tagging_loss=0.01018, over 3052987.72 frames. ], batch size: 59, lr: 5.28e-03, grad_scale: 16.0 2023-11-20 08:37:37,880 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 152450 2023-11-20 08:37:47,831 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.min_positive, batch_count=1016353.3333333334, ans=0.05 2023-11-20 08:37:48,943 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=1016353.3333333334, ans=0.125 2023-11-20 08:37:50,197 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=1016353.3333333334, ans=0.2 2023-11-20 08:37:55,040 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.min_positive, batch_count=1016353.3333333334, ans=0.025 2023-11-20 08:38:22,212 INFO [train_asr.py:1262] (1/4) Epoch 13, batch 8200, loss[loss=0.07476, simple_loss=0.09796, pruned_loss=0.01646, audio_tagging_loss=0.009319, over 15105.00 frames. ], tot_loss[loss=0.08005, simple_loss=0.1, pruned_loss=0.01996, audio_tagging_loss=0.01009, over 3049534.52 frames. ], batch size: 56, lr: 5.28e-03, grad_scale: 16.0 2023-11-20 08:38:23,435 WARNING [train_asr.py:1506] (1/4) Exclude cut with ID unbalanced/8C7biyx9TQ4_0.000_1.000.wav from training. Number of frames (before subsampling): 100. 
Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-20 08:38:29,039 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=1016553.3333333334, ans=0.125 2023-11-20 08:38:37,668 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=1016620.0, ans=0.2 2023-11-20 08:38:42,189 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 152500 2023-11-20 08:38:47,262 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=1016686.6666666666, ans=0.0 2023-11-20 08:38:49,926 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.647e+01 7.999e+01 8.746e+01 9.539e+01 1.196e+02, threshold=1.749e+02, percent-clipped=0.0 2023-11-20 08:39:02,407 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=1016753.3333333334, ans=0.125 2023-11-20 08:39:09,709 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1016753.3333333334, ans=0.0 2023-11-20 08:39:15,416 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=1016820.0, ans=0.125 2023-11-20 08:39:15,585 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=4.26 vs. limit=12.0 2023-11-20 08:39:20,125 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=1016820.0, ans=0.125 2023-11-20 08:39:20,209 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1016820.0, ans=0.1 2023-11-20 08:39:27,147 INFO [train_asr.py:1262] (1/4) Epoch 13, batch 8250, loss[loss=0.08915, simple_loss=0.1254, pruned_loss=0.01959, audio_tagging_loss=0.006848, over 14643.00 frames. ], tot_loss[loss=0.08109, simple_loss=0.1015, pruned_loss=0.02038, audio_tagging_loss=0.009976, over 3046796.41 frames. ], batch size: 55, lr: 5.28e-03, grad_scale: 16.0 2023-11-20 08:39:46,078 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 152550 2023-11-20 08:40:16,443 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=1017086.6666666666, ans=0.125 2023-11-20 08:40:26,789 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-20 08:40:31,464 INFO [train_asr.py:1262] (1/4) Epoch 13, batch 8300, loss[loss=0.05663, simple_loss=0.0716, pruned_loss=0.01048, audio_tagging_loss=0.01036, over 15052.00 frames. ], tot_loss[loss=0.08096, simple_loss=0.1016, pruned_loss=0.02024, audio_tagging_loss=0.009905, over 3045595.38 frames. 
], batch size: 57, lr: 5.28e-03, grad_scale: 16.0 2023-11-20 08:40:37,761 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1017220.0, ans=0.1 2023-11-20 08:40:44,941 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=1017286.6666666666, ans=0.0 2023-11-20 08:40:49,662 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 152600 2023-11-20 08:40:58,010 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.507e+01 8.110e+01 8.922e+01 9.710e+01 1.456e+02, threshold=1.784e+02, percent-clipped=0.0 2023-11-20 08:41:03,804 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=1017353.3333333334, ans=0.125 2023-11-20 08:41:06,847 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=1017353.3333333334, ans=0.125 2023-11-20 08:41:24,719 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=1017486.6666666666, ans=0.0 2023-11-20 08:41:25,744 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=1017486.6666666666, ans=0.2 2023-11-20 08:41:28,380 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=1017486.6666666666, ans=0.125 2023-11-20 08:41:35,109 INFO [train_asr.py:1262] (1/4) Epoch 13, batch 8350, loss[loss=0.06968, simple_loss=0.0909, pruned_loss=0.01496, audio_tagging_loss=0.009266, over 16677.00 frames. ], tot_loss[loss=0.08032, simple_loss=0.101, pruned_loss=0.01999, audio_tagging_loss=0.009811, over 3047876.77 frames. ], batch size: 65, lr: 5.28e-03, grad_scale: 16.0 2023-11-20 08:41:48,628 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-20 08:41:53,535 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=1017620.0, ans=0.125 2023-11-20 08:41:54,604 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 152650 2023-11-20 08:42:11,997 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=1017686.6666666666, ans=0.04949747468305833 2023-11-20 08:42:39,290 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=8.13 vs. limit=10.0 2023-11-20 08:42:39,801 INFO [train_asr.py:1262] (1/4) Epoch 13, batch 8400, loss[loss=0.08391, simple_loss=0.1079, pruned_loss=0.01954, audio_tagging_loss=0.0104, over 14905.00 frames. ], tot_loss[loss=0.07936, simple_loss=0.09956, pruned_loss=0.01958, audio_tagging_loss=0.01, over 3047112.61 frames. ], batch size: 58, lr: 5.27e-03, grad_scale: 32.0 2023-11-20 08:42:42,543 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=1017886.6666666666, ans=0.125 2023-11-20 08:42:47,925 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.min_abs, batch_count=1017886.6666666666, ans=0.5 2023-11-20 08:42:58,718 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=10.75 vs. 
limit=15.0 2023-11-20 08:42:59,241 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 152700 2023-11-20 08:43:06,583 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.274e+01 8.016e+01 8.880e+01 9.653e+01 1.299e+02, threshold=1.776e+02, percent-clipped=0.0 2023-11-20 08:43:31,483 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=1018153.3333333334, ans=0.125 2023-11-20 08:43:44,750 INFO [train_asr.py:1262] (1/4) Epoch 13, batch 8450, loss[loss=0.07432, simple_loss=0.09405, pruned_loss=0.01878, audio_tagging_loss=0.008521, over 16557.00 frames. ], tot_loss[loss=0.0792, simple_loss=0.09918, pruned_loss=0.01961, audio_tagging_loss=0.01001, over 3034307.95 frames. ], batch size: 62, lr: 5.27e-03, grad_scale: 32.0 2023-11-20 08:44:03,251 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 152750 2023-11-20 08:44:06,558 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=1018286.6666666666, ans=0.125 2023-11-20 08:44:39,838 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=1018486.6666666666, ans=0.125 2023-11-20 08:44:40,970 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1018486.6666666666, ans=0.125 2023-11-20 08:44:45,993 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=1018486.6666666666, ans=0.0 2023-11-20 08:44:48,043 INFO [train_asr.py:1262] (1/4) Epoch 13, batch 8500, loss[loss=0.08638, simple_loss=0.1058, pruned_loss=0.02553, audio_tagging_loss=0.007921, over 15707.00 frames. ], tot_loss[loss=0.07948, simple_loss=0.09985, pruned_loss=0.01962, audio_tagging_loss=0.009936, over 3041632.21 frames. ], batch size: 59, lr: 5.27e-03, grad_scale: 32.0 2023-11-20 08:44:51,968 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=1018553.3333333334, ans=0.0 2023-11-20 08:45:00,124 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=1018620.0, ans=0.2 2023-11-20 08:45:07,840 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 152800 2023-11-20 08:45:15,425 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.959e+01 8.066e+01 8.928e+01 9.740e+01 1.439e+02, threshold=1.786e+02, percent-clipped=0.0 2023-11-20 08:45:26,433 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=1018753.3333333334, ans=0.0 2023-11-20 08:45:28,960 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=1018753.3333333334, ans=0.125 2023-11-20 08:45:53,050 INFO [train_asr.py:1262] (1/4) Epoch 13, batch 8550, loss[loss=0.08506, simple_loss=0.101, pruned_loss=0.01962, audio_tagging_loss=0.01494, over 15542.00 frames. ], tot_loss[loss=0.07976, simple_loss=0.1002, pruned_loss=0.01967, audio_tagging_loss=0.01001, over 3041018.36 frames. ], batch size: 58, lr: 5.27e-03, grad_scale: 32.0 2023-11-20 08:45:53,577 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=11.08 vs. 
limit=22.5 2023-11-20 08:46:12,923 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 152850 2023-11-20 08:46:19,262 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1019020.0, ans=0.1 2023-11-20 08:46:42,792 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=1019086.6666666666, ans=0.1 2023-11-20 08:46:48,885 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=1019153.3333333334, ans=0.125 2023-11-20 08:46:57,778 INFO [train_asr.py:1262] (1/4) Epoch 13, batch 8600, loss[loss=0.08435, simple_loss=0.1131, pruned_loss=0.02111, audio_tagging_loss=0.00669, over 15272.00 frames. ], tot_loss[loss=0.07942, simple_loss=0.09942, pruned_loss=0.01965, audio_tagging_loss=0.01007, over 3037235.66 frames. ], batch size: 55, lr: 5.27e-03, grad_scale: 32.0 2023-11-20 08:47:05,358 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=4.45 vs. limit=12.0 2023-11-20 08:47:16,751 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 152900 2023-11-20 08:47:24,682 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.691e+01 7.921e+01 8.657e+01 9.489e+01 1.169e+02, threshold=1.731e+02, percent-clipped=0.0 2023-11-20 08:47:28,621 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=1019353.3333333334, ans=0.0 2023-11-20 08:47:36,582 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=1019420.0, ans=0.0 2023-11-20 08:47:58,911 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=1019486.6666666666, ans=0.125 2023-11-20 08:48:02,344 INFO [train_asr.py:1262] (1/4) Epoch 13, batch 8650, loss[loss=0.09484, simple_loss=0.121, pruned_loss=0.02723, audio_tagging_loss=0.007092, over 15618.00 frames. ], tot_loss[loss=0.0798, simple_loss=0.09988, pruned_loss=0.01978, audio_tagging_loss=0.01008, over 3043591.03 frames. ], batch size: 58, lr: 5.27e-03, grad_scale: 32.0 2023-11-20 08:48:08,784 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=1019553.3333333334, ans=0.2 2023-11-20 08:48:13,150 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1019553.3333333334, ans=0.1 2023-11-20 08:48:20,423 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=1019620.0, ans=0.2 2023-11-20 08:48:22,120 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 152950 2023-11-20 08:48:27,757 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=8.87 vs. 
limit=10.0 2023-11-20 08:48:34,390 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=1019686.6666666666, ans=0.0 2023-11-20 08:48:36,234 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=1019686.6666666666, ans=0.125 2023-11-20 08:48:46,396 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=1019753.3333333334, ans=0.2 2023-11-20 08:48:54,873 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=1019820.0, ans=0.04949747468305833 2023-11-20 08:49:06,031 INFO [train_asr.py:1262] (1/4) Epoch 13, batch 8700, loss[loss=0.06197, simple_loss=0.07504, pruned_loss=0.01336, audio_tagging_loss=0.01108, over 14864.00 frames. ], tot_loss[loss=0.07975, simple_loss=0.09945, pruned_loss=0.01977, audio_tagging_loss=0.01027, over 3049778.88 frames. ], batch size: 57, lr: 5.27e-03, grad_scale: 16.0 2023-11-20 08:49:25,916 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 153000 2023-11-20 08:49:26,031 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-20 08:49:35,434 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.857e+01 8.289e+01 9.010e+01 9.707e+01 1.388e+02, threshold=1.802e+02, percent-clipped=0.0 2023-11-20 08:49:52,707 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.39 vs. limit=6.0 2023-11-20 08:50:11,230 INFO [train_asr.py:1262] (1/4) Epoch 13, batch 8750, loss[loss=0.09544, simple_loss=0.1135, pruned_loss=0.02738, audio_tagging_loss=0.01133, over 15568.00 frames. ], tot_loss[loss=0.08054, simple_loss=0.1006, pruned_loss=0.01992, audio_tagging_loss=0.0103, over 3053441.83 frames. ], batch size: 58, lr: 5.27e-03, grad_scale: 16.0 2023-11-20 08:50:18,278 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=1020220.0, ans=0.0 2023-11-20 08:50:30,891 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 153050 2023-11-20 08:50:30,960 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=1020286.6666666666, ans=0.2 2023-11-20 08:50:40,525 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=1020353.3333333334, ans=0.125 2023-11-20 08:50:59,349 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=1020420.0, ans=0.125 2023-11-20 08:51:16,259 INFO [train_asr.py:1262] (1/4) Epoch 13, batch 8800, loss[loss=0.06891, simple_loss=0.08775, pruned_loss=0.0138, audio_tagging_loss=0.01123, over 14666.00 frames. ], tot_loss[loss=0.08115, simple_loss=0.1014, pruned_loss=0.0201, audio_tagging_loss=0.01036, over 3053145.43 frames. 
], batch size: 56, lr: 5.27e-03, grad_scale: 32.0 2023-11-20 08:51:27,560 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=1020620.0, ans=0.125 2023-11-20 08:51:33,113 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=1020620.0, ans=0.0 2023-11-20 08:51:34,210 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=1020620.0, ans=0.125 2023-11-20 08:51:34,718 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=14.25 vs. limit=15.0 2023-11-20 08:51:35,336 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 153100 2023-11-20 08:51:44,283 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.487e+01 8.394e+01 9.116e+01 1.011e+02 1.329e+02, threshold=1.823e+02, percent-clipped=0.0 2023-11-20 08:51:52,751 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1020686.6666666666, ans=0.1 2023-11-20 08:51:58,931 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=1020753.3333333334, ans=0.125 2023-11-20 08:52:08,092 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=1020820.0, ans=0.09899494936611666 2023-11-20 08:52:20,888 INFO [train_asr.py:1262] (1/4) Epoch 13, batch 8850, loss[loss=0.08115, simple_loss=0.1047, pruned_loss=0.01618, audio_tagging_loss=0.0126, over 14702.00 frames. ], tot_loss[loss=0.08159, simple_loss=0.1022, pruned_loss=0.02012, audio_tagging_loss=0.01038, over 3048463.67 frames. ], batch size: 55, lr: 5.27e-03, grad_scale: 32.0 2023-11-20 08:52:30,340 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1020886.6666666666, ans=0.1 2023-11-20 08:52:33,727 WARNING [train_asr.py:1506] (1/4) Exclude cut with ID unbalanced/1Dq7QH61iXQ_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-20 08:52:39,936 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 153150 2023-11-20 08:52:41,921 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.min_positive, batch_count=1020953.3333333334, ans=0.05 2023-11-20 08:52:51,930 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=1021020.0, ans=0.2 2023-11-20 08:53:07,520 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.whiten, num_groups=1, num_channels=192, metric=3.40 vs. limit=12.0 2023-11-20 08:53:26,050 INFO [train_asr.py:1262] (1/4) Epoch 13, batch 8900, loss[loss=0.09324, simple_loss=0.1082, pruned_loss=0.02808, audio_tagging_loss=0.01108, over 15430.00 frames. ], tot_loss[loss=0.08112, simple_loss=0.102, pruned_loss=0.01995, audio_tagging_loss=0.01018, over 3049249.61 frames. 
], batch size: 55, lr: 5.27e-03, grad_scale: 16.0 2023-11-20 08:53:41,737 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=1021286.6666666666, ans=0.0 2023-11-20 08:53:45,876 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 153200 2023-11-20 08:53:52,615 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=1021353.3333333334, ans=0.0 2023-11-20 08:53:55,548 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=9.44 vs. limit=10.0 2023-11-20 08:53:55,905 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.066e+01 8.130e+01 8.663e+01 9.532e+01 1.311e+02, threshold=1.733e+02, percent-clipped=0.0 2023-11-20 08:54:13,694 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1021420.0, ans=0.1 2023-11-20 08:54:16,033 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=1021420.0, ans=0.125 2023-11-20 08:54:31,104 INFO [train_asr.py:1262] (1/4) Epoch 13, batch 8950, loss[loss=0.07697, simple_loss=0.09771, pruned_loss=0.01901, audio_tagging_loss=0.009104, over 15164.00 frames. ], tot_loss[loss=0.08116, simple_loss=0.1024, pruned_loss=0.01992, audio_tagging_loss=0.01005, over 3055226.13 frames. ], batch size: 57, lr: 5.26e-03, grad_scale: 16.0 2023-11-20 08:54:35,458 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=7.83 vs. limit=15.0 2023-11-20 08:54:39,270 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-20 08:54:50,572 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 153250 2023-11-20 08:55:13,022 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=1021753.3333333334, ans=0.0 2023-11-20 08:55:36,038 INFO [train_asr.py:1262] (1/4) Epoch 13, batch 9000, loss[loss=0.1108, simple_loss=0.1485, pruned_loss=0.03032, audio_tagging_loss=0.006276, over 16271.00 frames. ], tot_loss[loss=0.08097, simple_loss=0.102, pruned_loss=0.0199, audio_tagging_loss=0.01005, over 3049671.51 frames. ], batch size: 59, lr: 5.26e-03, grad_scale: 8.0 2023-11-20 08:55:36,039 INFO [train_asr.py:1285] (1/4) Computing validation loss 2023-11-20 08:56:11,064 INFO [zipformer.py:1873] (1/4) name=encoder.encoders.0.layers.0.self_attn_weights, attn_weights_entropy = tensor([6.0034, 5.8416, 5.7152, 5.5081], device='cuda:1') 2023-11-20 08:56:18,731 INFO [train_asr.py:1294] (1/4) Epoch 13, validation: loss=0.06245, simple_loss=0.0538, pruned_loss=0.005768, audio_tagging_loss=0.02978, over 4681554.00 frames. 
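
Note on the recurring record types above, for readers of this log: the "loss[...] ... tot_loss[...] over N frames" lines report per-batch and frame-weighted running-average loss components; the "Clipping_scale=2.0, grad-norm quartiles ... threshold ... percent-clipped" lines report gradient-norm statistics used for clipping; and the "ScheduledFloat: name=..., batch_count=..., ans=..." lines report scalars that follow a schedule over the training step. The Python sketches below are illustrations only, with invented class and function names; they are not the code that produced this log.

A minimal frame-weighted tracker, assuming each logged component is a per-frame loss and "over N frames" is the accumulated weight:

    from collections import defaultdict

    class LossTracker:
        """Accumulate per-frame loss components, weighted by frame counts."""
        def __init__(self):
            self.sums = defaultdict(float)  # frame-weighted sum per component
            self.frames = 0.0               # total frames accumulated

        def update(self, losses, num_frames):
            # losses: component name -> per-frame value for one batch, e.g.
            # {"simple_loss": 0.1131, "pruned_loss": 0.02111, ...}
            for name, value in losses.items():
                self.sums[name] += value * num_frames
            self.frames += num_frames

        def averages(self):
            return {name: s / self.frames for name, s in self.sums.items()}

    tracker = LossTracker()
    tracker.update({"simple_loss": 0.1131, "pruned_loss": 0.02111,
                    "audio_tagging_loss": 0.00669}, num_frames=15272.0)
    print(tracker.averages())

For the gradient-clipping lines, one plausible scheme derives the threshold from a quantile of recently observed gradient norms scaled by the clipping scale; the exact rule (here: scale times the median) is an assumption, not something the log confirms:

    import torch

    def clip_and_log(params, history, clipping_scale=2.0, window=1000):
        # Total gradient norm over all parameters that have gradients;
        # call this after loss.backward().
        grads = [p.grad.detach().flatten() for p in params if p.grad is not None]
        norm = torch.cat(grads).norm().item()
        history.append(norm)

        recent = torch.tensor(history[-window:])
        q = torch.quantile(recent, torch.tensor([0.0, 0.25, 0.5, 0.75, 1.0]))
        threshold = clipping_scale * q[2].item()  # assumed: scale x median

        if norm > threshold:
            for p in params:
                if p.grad is not None:
                    p.grad.mul_(threshold / norm)
        pct = 100.0 * sum(n > threshold for n in recent.tolist()) / len(recent)
        print("grad-norm quartiles " + " ".join(f"{v:.3e}" for v in q.tolist())
              + f", threshold={threshold:.3e}, percent-clipped={pct:.1f}")
        return norm

And for the ScheduledFloat records, a piecewise-linear schedule over batch_count reproduces the printed "ans" behavior; the breakpoints in the example are invented:

    def scheduled_float(batch_count, points):
        # points: sorted (batch_count, value) breakpoints; linear in between,
        # clamped to the first/last value outside the range.
        x0, y0 = points[0]
        if batch_count <= x0:
            return y0
        for x1, y1 in points[1:]:
            if batch_count <= x1:
                return y0 + (batch_count - x0) / (x1 - x0) * (y1 - y0)
            x0, y0 = x1, y1
        return y0

    # e.g. a dropout probability decaying from 0.3 to 0.1 over 20k batches;
    # late in training (batch_count ~1e6) it sits at the final value:
    print(scheduled_float(1019020.0, [(0.0, 0.3), (20000.0, 0.1)]))  # -> 0.1
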
2023-11-20 08:56:18,732 INFO [train_asr.py:1295] (1/4) Maximum memory allocated so far is 26082MB 2023-11-20 08:56:36,984 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 153300 2023-11-20 08:56:48,914 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.462e+01 8.276e+01 8.856e+01 9.604e+01 3.298e+02, threshold=1.771e+02, percent-clipped=1.0 2023-11-20 08:57:17,650 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=1022153.3333333334, ans=0.2 2023-11-20 08:57:20,091 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1022153.3333333334, ans=0.1 2023-11-20 08:57:22,126 INFO [train_asr.py:1262] (1/4) Epoch 13, batch 9050, loss[loss=0.05379, simple_loss=0.06525, pruned_loss=0.01277, audio_tagging_loss=0.008401, over 14743.00 frames. ], tot_loss[loss=0.08087, simple_loss=0.102, pruned_loss=0.01996, audio_tagging_loss=0.009892, over 3053270.14 frames. ], batch size: 56, lr: 5.26e-03, grad_scale: 8.0 2023-11-20 08:57:23,867 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=12.72 vs. limit=15.0 2023-11-20 08:57:24,869 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=1022220.0, ans=0.0 2023-11-20 08:57:27,171 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=1022220.0, ans=0.2 2023-11-20 08:57:33,417 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=1022220.0, ans=0.0 2023-11-20 08:57:36,190 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=1022286.6666666666, ans=0.0 2023-11-20 08:57:37,197 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1022286.6666666666, ans=0.1 2023-11-20 08:57:41,892 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 153350 2023-11-20 08:58:23,233 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=1022486.6666666666, ans=0.0 2023-11-20 08:58:26,379 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=6.24 vs. limit=10.0 2023-11-20 08:58:26,689 INFO [train_asr.py:1262] (1/4) Epoch 13, batch 9100, loss[loss=0.07242, simple_loss=0.08591, pruned_loss=0.01838, audio_tagging_loss=0.01108, over 15079.00 frames. ], tot_loss[loss=0.08117, simple_loss=0.1029, pruned_loss=0.0199, audio_tagging_loss=0.009802, over 3054683.18 frames. ], batch size: 56, lr: 5.26e-03, grad_scale: 8.0 2023-11-20 08:58:45,379 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=1022620.0, ans=0.0 2023-11-20 08:58:46,281 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 153400 2023-11-20 08:58:55,898 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=6.16 vs. 
limit=10.0 2023-11-20 08:58:57,445 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.670e+01 8.168e+01 8.794e+01 9.522e+01 1.542e+02, threshold=1.759e+02, percent-clipped=0.0 2023-11-20 08:59:03,953 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=1022753.3333333334, ans=0.0 2023-11-20 08:59:18,796 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1022820.0, ans=0.1 2023-11-20 08:59:19,198 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.83 vs. limit=6.0 2023-11-20 08:59:23,038 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.min_positive, batch_count=1022820.0, ans=0.025 2023-11-20 08:59:24,322 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=1022820.0, ans=0.0 2023-11-20 08:59:29,123 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=1022820.0, ans=0.0 2023-11-20 08:59:31,260 INFO [train_asr.py:1262] (1/4) Epoch 13, batch 9150, loss[loss=0.08649, simple_loss=0.1205, pruned_loss=0.018, audio_tagging_loss=0.008253, over 15629.00 frames. ], tot_loss[loss=0.0809, simple_loss=0.1026, pruned_loss=0.0199, audio_tagging_loss=0.009728, over 3050813.33 frames. ], batch size: 56, lr: 5.26e-03, grad_scale: 8.0 2023-11-20 08:59:44,309 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=1022953.3333333334, ans=0.0 2023-11-20 08:59:49,167 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=1022953.3333333334, ans=0.125 2023-11-20 08:59:50,149 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 153450 2023-11-20 08:59:57,198 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=1023020.0, ans=0.07 2023-11-20 09:00:14,201 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=1023086.6666666666, ans=0.0 2023-11-20 09:00:35,465 INFO [train_asr.py:1262] (1/4) Epoch 13, batch 9200, loss[loss=0.08081, simple_loss=0.09646, pruned_loss=0.02253, audio_tagging_loss=0.01005, over 14376.00 frames. ], tot_loss[loss=0.08016, simple_loss=0.101, pruned_loss=0.01979, audio_tagging_loss=0.009862, over 3050881.84 frames. ], batch size: 53, lr: 5.26e-03, grad_scale: 16.0 2023-11-20 09:00:45,542 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1023220.0, ans=0.1 2023-11-20 09:00:48,320 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=12.24 vs. 
limit=15.0 2023-11-20 09:00:55,648 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 153500 2023-11-20 09:01:07,215 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.430e+01 8.062e+01 8.603e+01 9.204e+01 1.228e+02, threshold=1.721e+02, percent-clipped=0.0 2023-11-20 09:01:07,738 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=1023353.3333333334, ans=0.04949747468305833 2023-11-20 09:01:40,794 INFO [train_asr.py:1262] (1/4) Epoch 13, batch 9250, loss[loss=0.07955, simple_loss=0.09543, pruned_loss=0.02172, audio_tagging_loss=0.01011, over 14893.00 frames. ], tot_loss[loss=0.08008, simple_loss=0.1008, pruned_loss=0.01982, audio_tagging_loss=0.009868, over 3052406.98 frames. ], batch size: 56, lr: 5.26e-03, grad_scale: 16.0 2023-11-20 09:01:42,600 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=3.96 vs. limit=15.0 2023-11-20 09:01:56,642 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=6.37 vs. limit=10.0 2023-11-20 09:01:59,564 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=1023620.0, ans=0.125 2023-11-20 09:02:00,732 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 153550 2023-11-20 09:02:17,037 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=7.08 vs. limit=15.0 2023-11-20 09:02:31,655 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-20 09:02:31,715 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=1023820.0, ans=0.125 2023-11-20 09:02:46,003 INFO [train_asr.py:1262] (1/4) Epoch 13, batch 9300, loss[loss=0.07401, simple_loss=0.09425, pruned_loss=0.01626, audio_tagging_loss=0.01062, over 15504.00 frames. ], tot_loss[loss=0.08041, simple_loss=0.1013, pruned_loss=0.01995, audio_tagging_loss=0.009821, over 3053618.42 frames. ], batch size: 58, lr: 5.26e-03, grad_scale: 16.0 2023-11-20 09:02:51,803 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-20 09:02:54,316 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=1023886.6666666666, ans=0.0 2023-11-20 09:03:05,358 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 153600 2023-11-20 09:03:07,010 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=9.28 vs. 
limit=15.0 2023-11-20 09:03:12,103 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=1024020.0, ans=0.125 2023-11-20 09:03:17,354 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.484e+01 8.349e+01 9.030e+01 1.018e+02 1.384e+02, threshold=1.806e+02, percent-clipped=0.0 2023-11-20 09:03:17,660 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=1024020.0, ans=0.125 2023-11-20 09:03:20,027 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=1024020.0, ans=0.125 2023-11-20 09:03:20,098 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=1024020.0, ans=0.125 2023-11-20 09:03:37,232 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=1024153.3333333334, ans=0.125 2023-11-20 09:03:38,673 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=1024153.3333333334, ans=0.2 2023-11-20 09:03:39,768 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1024153.3333333334, ans=0.0 2023-11-20 09:03:42,851 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1024153.3333333334, ans=0.1 2023-11-20 09:03:51,227 INFO [train_asr.py:1262] (1/4) Epoch 13, batch 9350, loss[loss=0.06187, simple_loss=0.06676, pruned_loss=0.01497, audio_tagging_loss=0.01351, over 15318.00 frames. ], tot_loss[loss=0.08032, simple_loss=0.1009, pruned_loss=0.01988, audio_tagging_loss=0.009978, over 3051601.71 frames. ], batch size: 62, lr: 5.26e-03, grad_scale: 16.0 2023-11-20 09:03:51,481 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=1024220.0, ans=0.125 2023-11-20 09:03:58,793 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=1024220.0, ans=0.0 2023-11-20 09:04:02,855 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=11.62 vs. limit=15.0 2023-11-20 09:04:04,235 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=1024286.6666666666, ans=0.125 2023-11-20 09:04:10,002 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 153650 2023-11-20 09:04:32,963 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=10.35 vs. limit=15.0 2023-11-20 09:04:35,998 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=1024420.0, ans=0.0 2023-11-20 09:04:54,528 INFO [train_asr.py:1262] (1/4) Epoch 13, batch 9400, loss[loss=0.06745, simple_loss=0.08262, pruned_loss=0.01281, audio_tagging_loss=0.01333, over 15377.00 frames. ], tot_loss[loss=0.0801, simple_loss=0.1007, pruned_loss=0.01974, audio_tagging_loss=0.01003, over 3050374.61 frames. 
], batch size: 56, lr: 5.26e-03, grad_scale: 16.0 2023-11-20 09:05:07,767 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1024620.0, ans=0.1 2023-11-20 09:05:14,405 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 153700 2023-11-20 09:05:26,174 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.113e+01 8.189e+01 8.740e+01 9.691e+01 1.507e+02, threshold=1.748e+02, percent-clipped=0.0 2023-11-20 09:05:41,721 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=1024753.3333333334, ans=0.0 2023-11-20 09:05:43,984 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=1024753.3333333334, ans=0.2 2023-11-20 09:05:57,739 WARNING [train_asr.py:1506] (1/4) Exclude cut with ID unbalanced/jmSuJWEIizA_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-20 09:05:59,499 INFO [train_asr.py:1262] (1/4) Epoch 13, batch 9450, loss[loss=0.05877, simple_loss=0.07249, pruned_loss=0.008864, audio_tagging_loss=0.01366, over 14831.00 frames. ], tot_loss[loss=0.0803, simple_loss=0.1009, pruned_loss=0.01978, audio_tagging_loss=0.01008, over 3053163.18 frames. ], batch size: 56, lr: 5.26e-03, grad_scale: 16.0 2023-11-20 09:06:01,196 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-20 09:06:18,728 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 153750 2023-11-20 09:06:20,139 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=1024953.3333333334, ans=0.125 2023-11-20 09:06:29,806 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.max_abs, batch_count=1025020.0, ans=10.0 2023-11-20 09:06:38,004 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=1025086.6666666666, ans=0.125 2023-11-20 09:06:46,175 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=20.73 vs. limit=22.5 2023-11-20 09:07:04,148 INFO [train_asr.py:1262] (1/4) Epoch 13, batch 9500, loss[loss=0.08248, simple_loss=0.1032, pruned_loss=0.02305, audio_tagging_loss=0.007807, over 13549.00 frames. ], tot_loss[loss=0.08045, simple_loss=0.1008, pruned_loss=0.01987, audio_tagging_loss=0.01019, over 3050746.68 frames. 
], batch size: 52, lr: 5.26e-03, grad_scale: 16.0 2023-11-20 09:07:23,670 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 153800 2023-11-20 09:07:31,624 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-20 09:07:35,603 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.073e+01 8.164e+01 8.858e+01 9.394e+01 2.637e+02, threshold=1.772e+02, percent-clipped=1.0 2023-11-20 09:07:38,918 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=18.78 vs. limit=22.5 2023-11-20 09:08:09,394 INFO [train_asr.py:1262] (1/4) Epoch 13, batch 9550, loss[loss=0.07725, simple_loss=0.09622, pruned_loss=0.01953, audio_tagging_loss=0.009613, over 14366.00 frames. ], tot_loss[loss=0.08072, simple_loss=0.1011, pruned_loss=0.01991, audio_tagging_loss=0.01024, over 3047881.81 frames. ], batch size: 55, lr: 5.25e-03, grad_scale: 16.0 2023-11-20 09:08:13,977 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1025553.3333333334, ans=0.1 2023-11-20 09:08:19,230 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=512, metric=21.47 vs. limit=22.5 2023-11-20 09:08:27,470 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=19.93 vs. limit=22.5 2023-11-20 09:08:29,293 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 153850 2023-11-20 09:08:30,713 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1025620.0, ans=0.1 2023-11-20 09:08:33,259 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=1025620.0, ans=0.0 2023-11-20 09:08:46,628 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=1025686.6666666666, ans=0.07 2023-11-20 09:08:55,462 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-20 09:09:02,397 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-11-20 09:09:14,878 INFO [train_asr.py:1262] (1/4) Epoch 13, batch 9600, loss[loss=0.072, simple_loss=0.09167, pruned_loss=0.01559, audio_tagging_loss=0.01057, over 15120.00 frames. ], tot_loss[loss=0.08046, simple_loss=0.1007, pruned_loss=0.01977, audio_tagging_loss=0.01033, over 3045983.63 frames. ], batch size: 57, lr: 5.25e-03, grad_scale: 32.0 2023-11-20 09:09:26,853 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1025953.3333333334, ans=0.1 2023-11-20 09:09:34,187 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 153900 2023-11-20 09:09:38,386 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=7.87 vs. 
limit=10.0 2023-11-20 09:09:44,970 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.678e+01 7.953e+01 8.946e+01 9.937e+01 1.277e+02, threshold=1.789e+02, percent-clipped=0.0 2023-11-20 09:09:59,844 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=12.29 vs. limit=15.0 2023-11-20 09:10:14,480 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=1026153.3333333334, ans=0.125 2023-11-20 09:10:19,603 INFO [train_asr.py:1262] (1/4) Epoch 13, batch 9650, loss[loss=0.08767, simple_loss=0.1092, pruned_loss=0.02588, audio_tagging_loss=0.007165, over 15191.00 frames. ], tot_loss[loss=0.08042, simple_loss=0.1008, pruned_loss=0.01974, audio_tagging_loss=0.01025, over 3046686.82 frames. ], batch size: 57, lr: 5.25e-03, grad_scale: 32.0 2023-11-20 09:10:25,913 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=1026220.0, ans=0.125 2023-11-20 09:10:38,240 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=12.57 vs. limit=15.0 2023-11-20 09:10:38,854 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 153950 2023-11-20 09:10:41,877 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=18.61 vs. limit=22.5 2023-11-20 09:10:58,805 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.whiten, num_groups=1, num_channels=192, metric=4.16 vs. limit=12.0 2023-11-20 09:11:04,197 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=20.27 vs. limit=22.5 2023-11-20 09:11:13,941 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=1026486.6666666666, ans=0.07 2023-11-20 09:11:23,343 INFO [train_asr.py:1262] (1/4) Epoch 13, batch 9700, loss[loss=0.08024, simple_loss=0.0993, pruned_loss=0.01926, audio_tagging_loss=0.01133, over 14613.00 frames. ], tot_loss[loss=0.0806, simple_loss=0.1014, pruned_loss=0.01988, audio_tagging_loss=0.01001, over 3046532.04 frames. 
], batch size: 57, lr: 5.25e-03, grad_scale: 32.0 2023-11-20 09:11:43,179 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 154000 2023-11-20 09:11:44,443 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=1026620.0, ans=0.0 2023-11-20 09:11:54,853 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.479e+01 8.116e+01 8.846e+01 9.566e+01 1.154e+02, threshold=1.769e+02, percent-clipped=0.0 2023-11-20 09:12:05,711 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.min_positive, batch_count=1026753.3333333334, ans=0.05 2023-11-20 09:12:12,724 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=1026753.3333333334, ans=0.125 2023-11-20 09:12:18,448 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=1026820.0, ans=0.125 2023-11-20 09:12:27,819 INFO [train_asr.py:1262] (1/4) Epoch 13, batch 9750, loss[loss=0.07932, simple_loss=0.09402, pruned_loss=0.01864, audio_tagging_loss=0.01367, over 15143.00 frames. ], tot_loss[loss=0.08054, simple_loss=0.1012, pruned_loss=0.01997, audio_tagging_loss=0.009959, over 3044428.60 frames. ], batch size: 55, lr: 5.25e-03, grad_scale: 32.0 2023-11-20 09:12:45,421 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=1026953.3333333334, ans=0.125 2023-11-20 09:12:48,192 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 154050 2023-11-20 09:13:01,944 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=1027020.0, ans=0.125 2023-11-20 09:13:03,108 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-20 09:13:10,001 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-20 09:13:32,848 INFO [train_asr.py:1262] (1/4) Epoch 13, batch 9800, loss[loss=0.07352, simple_loss=0.09399, pruned_loss=0.01686, audio_tagging_loss=0.009669, over 15039.00 frames. ], tot_loss[loss=0.08114, simple_loss=0.1023, pruned_loss=0.02018, audio_tagging_loss=0.009794, over 3038894.76 frames. ], batch size: 56, lr: 5.25e-03, grad_scale: 32.0 2023-11-20 09:13:46,010 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1027286.6666666666, ans=0.125 2023-11-20 09:13:51,935 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 154100 2023-11-20 09:14:03,357 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.948e+01 8.195e+01 8.924e+01 9.693e+01 1.492e+02, threshold=1.785e+02, percent-clipped=0.0 2023-11-20 09:14:10,742 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=13.69 vs. 
limit=15.0 2023-11-20 09:14:11,661 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=1027420.0, ans=0.07 2023-11-20 09:14:16,359 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.max_positive, batch_count=1027420.0, ans=0.95 2023-11-20 09:14:16,459 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1027420.0, ans=0.1 2023-11-20 09:14:28,156 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=1027486.6666666666, ans=0.0 2023-11-20 09:14:30,958 WARNING [train_asr.py:1506] (1/4) Exclude cut with ID unbalanced/Bo4LcZjitzU_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-20 09:14:37,020 INFO [train_asr.py:1262] (1/4) Epoch 13, batch 9850, loss[loss=0.08912, simple_loss=0.1064, pruned_loss=0.02404, audio_tagging_loss=0.01188, over 15708.00 frames. ], tot_loss[loss=0.08159, simple_loss=0.103, pruned_loss=0.0204, audio_tagging_loss=0.009698, over 3031239.92 frames. ], batch size: 57, lr: 5.25e-03, grad_scale: 32.0 2023-11-20 09:14:50,055 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=1027620.0, ans=0.0 2023-11-20 09:14:50,066 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.min_abs, batch_count=1027620.0, ans=0.5 2023-11-20 09:14:51,253 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1027620.0, ans=0.125 2023-11-20 09:14:56,554 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 154150 2023-11-20 09:15:06,733 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=1027686.6666666666, ans=0.125 2023-11-20 09:15:09,580 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=1027686.6666666666, ans=0.0 2023-11-20 09:15:16,184 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=7.53 vs. limit=12.0 2023-11-20 09:15:20,125 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=1027753.3333333334, ans=0.125 2023-11-20 09:15:33,206 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=1027820.0, ans=0.0 2023-11-20 09:15:38,020 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.min_abs, batch_count=1027820.0, ans=0.5 2023-11-20 09:15:41,458 INFO [train_asr.py:1262] (1/4) Epoch 13, batch 9900, loss[loss=0.08382, simple_loss=0.1053, pruned_loss=0.02246, audio_tagging_loss=0.008726, over 14906.00 frames. ], tot_loss[loss=0.08071, simple_loss=0.1019, pruned_loss=0.02004, audio_tagging_loss=0.009698, over 3032536.84 frames. 
], batch size: 55, lr: 5.25e-03, grad_scale: 16.0 2023-11-20 09:15:47,247 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=1027886.6666666666, ans=0.125 2023-11-20 09:16:01,296 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 154200 2023-11-20 09:16:14,638 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.724e+01 8.225e+01 8.765e+01 9.379e+01 1.572e+02, threshold=1.753e+02, percent-clipped=0.0 2023-11-20 09:16:21,821 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=1028086.6666666666, ans=0.2 2023-11-20 09:16:37,149 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=1028153.3333333334, ans=0.0 2023-11-20 09:16:47,243 INFO [train_asr.py:1262] (1/4) Epoch 13, batch 9950, loss[loss=0.07888, simple_loss=0.1053, pruned_loss=0.01994, audio_tagging_loss=0.006285, over 15212.00 frames. ], tot_loss[loss=0.08043, simple_loss=0.1015, pruned_loss=0.01999, audio_tagging_loss=0.009677, over 3038112.11 frames. ], batch size: 55, lr: 5.25e-03, grad_scale: 16.0 2023-11-20 09:16:53,450 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=1028220.0, ans=0.125 2023-11-20 09:16:55,437 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.self_attn2.whiten, num_groups=1, num_channels=192, metric=12.51 vs. limit=22.5 2023-11-20 09:16:59,613 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=1028286.6666666666, ans=0.0 2023-11-20 09:17:06,164 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 154250 2023-11-20 09:17:21,262 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=1028353.3333333334, ans=0.1 2023-11-20 09:17:36,258 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=1028420.0, ans=0.0 2023-11-20 09:17:38,734 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=1028486.6666666666, ans=0.125 2023-11-20 09:17:46,311 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=1028486.6666666666, ans=0.125 2023-11-20 09:17:48,775 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=1028486.6666666666, ans=0.125 2023-11-20 09:17:51,773 INFO [train_asr.py:1262] (1/4) Epoch 13, batch 10000, loss[loss=0.06933, simple_loss=0.07667, pruned_loss=0.01968, audio_tagging_loss=0.01132, over 15519.00 frames. ], tot_loss[loss=0.08026, simple_loss=0.1013, pruned_loss=0.01987, audio_tagging_loss=0.009723, over 3039163.71 frames. 
], batch size: 57, lr: 5.25e-03, grad_scale: 32.0 2023-11-20 09:17:58,190 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=1028553.3333333334, ans=0.0 2023-11-20 09:18:10,791 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 154300 2023-11-20 09:18:12,266 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=1028620.0, ans=0.125 2023-11-20 09:18:18,058 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.whiten.whitening_limit, batch_count=1028686.6666666666, ans=15.0 2023-11-20 09:18:23,632 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.484e+01 8.210e+01 8.752e+01 9.474e+01 1.370e+02, threshold=1.750e+02, percent-clipped=0.0 2023-11-20 09:18:29,448 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=1028753.3333333334, ans=0.0 2023-11-20 09:18:38,148 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=1028753.3333333334, ans=0.2 2023-11-20 09:18:41,026 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.whiten, num_groups=1, num_channels=512, metric=9.98 vs. limit=12.0 2023-11-20 09:18:49,110 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1028820.0, ans=0.1 2023-11-20 09:18:50,934 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=1028820.0, ans=0.125 2023-11-20 09:18:52,051 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1028820.0, ans=0.1 2023-11-20 09:18:56,657 INFO [train_asr.py:1262] (1/4) Epoch 13, batch 10050, loss[loss=0.09856, simple_loss=0.1211, pruned_loss=0.02937, audio_tagging_loss=0.00864, over 14586.00 frames. ], tot_loss[loss=0.0798, simple_loss=0.1006, pruned_loss=0.01974, audio_tagging_loss=0.009746, over 3034099.07 frames. ], batch size: 54, lr: 5.25e-03, grad_scale: 32.0 2023-11-20 09:18:58,159 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=1028886.6666666666, ans=0.125 2023-11-20 09:19:13,716 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=12.76 vs. 
limit=15.0 2023-11-20 09:19:15,387 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=1028953.3333333334, ans=0.125 2023-11-20 09:19:16,396 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 154350 2023-11-20 09:19:20,355 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1028953.3333333334, ans=0.125 2023-11-20 09:19:55,560 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1029153.3333333334, ans=0.1 2023-11-20 09:19:56,866 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=1029153.3333333334, ans=0.0 2023-11-20 09:19:58,114 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1029153.3333333334, ans=0.1 2023-11-20 09:20:01,368 INFO [train_asr.py:1262] (1/4) Epoch 13, batch 10100, loss[loss=0.08252, simple_loss=0.1031, pruned_loss=0.0216, audio_tagging_loss=0.009349, over 14600.00 frames. ], tot_loss[loss=0.08094, simple_loss=0.1022, pruned_loss=0.02002, audio_tagging_loss=0.009815, over 3035641.60 frames. ], batch size: 56, lr: 5.25e-03, grad_scale: 16.0 2023-11-20 09:20:13,650 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.33 vs. limit=6.0 2023-11-20 09:20:18,636 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=6.63 vs. limit=15.0 2023-11-20 09:20:20,345 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 154400 2023-11-20 09:20:21,733 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=1029286.6666666666, ans=0.125 2023-11-20 09:20:35,226 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.925e+01 8.287e+01 9.301e+01 1.019e+02 1.504e+02, threshold=1.860e+02, percent-clipped=0.0 2023-11-20 09:20:53,508 WARNING [train_asr.py:1506] (1/4) Exclude cut with ID unbalanced/_eq1Ry0UZGU_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-20 09:21:05,867 INFO [train_asr.py:1262] (1/4) Epoch 13, batch 10150, loss[loss=0.107, simple_loss=0.1385, pruned_loss=0.0279, audio_tagging_loss=0.009836, over 15591.00 frames. ], tot_loss[loss=0.08081, simple_loss=0.1019, pruned_loss=0.01996, audio_tagging_loss=0.009894, over 3046253.85 frames. ], batch size: 56, lr: 5.24e-03, grad_scale: 16.0 2023-11-20 09:21:08,659 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=1029553.3333333334, ans=0.125 2023-11-20 09:21:21,358 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=2.79 vs. 
limit=15.0 2023-11-20 09:21:25,597 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 154450 2023-11-20 09:21:36,018 WARNING [train_asr.py:1506] (1/4) Exclude cut with ID unbalanced/cw-21cbk02A_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-20 09:21:56,959 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=512, metric=3.54 vs. limit=15.0 2023-11-20 09:22:08,720 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.out_whiten.whitening_limit, batch_count=1029820.0, ans=15.0 2023-11-20 09:22:10,471 INFO [train_asr.py:1262] (1/4) Epoch 13, batch 10200, loss[loss=0.1003, simple_loss=0.1316, pruned_loss=0.02685, audio_tagging_loss=0.007685, over 16155.00 frames. ], tot_loss[loss=0.08064, simple_loss=0.1016, pruned_loss=0.01986, audio_tagging_loss=0.009984, over 3047324.77 frames. ], batch size: 57, lr: 5.24e-03, grad_scale: 16.0 2023-11-20 09:22:29,573 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 154500 2023-11-20 09:22:34,894 WARNING [train_asr.py:1506] (1/4) Exclude cut with ID unbalanced/hOT6Yokob90_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-20 09:22:43,348 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.371e+01 8.118e+01 8.919e+01 9.960e+01 1.274e+02, threshold=1.784e+02, percent-clipped=0.0 2023-11-20 09:23:05,077 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=1030153.3333333334, ans=0.0 2023-11-20 09:23:05,231 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=1030153.3333333334, ans=0.2 2023-11-20 09:23:13,997 INFO [train_asr.py:1262] (1/4) Epoch 13, batch 10250, loss[loss=0.1008, simple_loss=0.1321, pruned_loss=0.0274, audio_tagging_loss=0.007391, over 15869.00 frames. ], tot_loss[loss=0.08012, simple_loss=0.1006, pruned_loss=0.01967, audio_tagging_loss=0.01017, over 3052047.48 frames. ], batch size: 57, lr: 5.24e-03, grad_scale: 16.0 2023-11-20 09:23:17,054 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.whiten, num_groups=1, num_channels=512, metric=11.44 vs. limit=12.0 2023-11-20 09:23:28,256 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.min_positive, batch_count=1030286.6666666666, ans=0.025 2023-11-20 09:23:33,181 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 154550 2023-11-20 09:23:33,776 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.79 vs. 
limit=6.0 2023-11-20 09:23:48,670 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.out_whiten.whitening_limit, batch_count=1030353.3333333334, ans=15.0 2023-11-20 09:24:00,341 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=9.23 vs. limit=15.0 2023-11-20 09:24:06,429 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=1030486.6666666666, ans=0.0 2023-11-20 09:24:19,161 INFO [train_asr.py:1262] (1/4) Epoch 13, batch 10300, loss[loss=0.06081, simple_loss=0.08077, pruned_loss=0.01172, audio_tagging_loss=0.008705, over 14048.00 frames. ], tot_loss[loss=0.08081, simple_loss=0.1015, pruned_loss=0.01999, audio_tagging_loss=0.01009, over 3051231.52 frames. ], batch size: 56, lr: 5.24e-03, grad_scale: 16.0 2023-11-20 09:24:19,441 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-20 09:24:27,851 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1030553.3333333334, ans=0.1 2023-11-20 09:24:30,736 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=12.20 vs. limit=15.0 2023-11-20 09:24:34,594 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1030620.0, ans=0.125 2023-11-20 09:24:38,884 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 154600 2023-11-20 09:24:53,299 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.249e+01 8.311e+01 8.875e+01 9.603e+01 1.202e+02, threshold=1.775e+02, percent-clipped=0.0 2023-11-20 09:25:03,977 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=1030753.3333333334, ans=0.125 2023-11-20 09:25:10,255 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=1030820.0, ans=0.125 2023-11-20 09:25:14,015 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-11-20 09:25:15,698 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=19.32 vs. limit=22.5 2023-11-20 09:25:24,172 INFO [train_asr.py:1262] (1/4) Epoch 13, batch 10350, loss[loss=0.09252, simple_loss=0.1152, pruned_loss=0.02587, audio_tagging_loss=0.009032, over 14442.00 frames. ], tot_loss[loss=0.08074, simple_loss=0.1011, pruned_loss=0.01992, audio_tagging_loss=0.01025, over 3044147.76 frames. 
], batch size: 53, lr: 5.24e-03, grad_scale: 16.0 2023-11-20 09:25:30,046 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1030886.6666666666, ans=0.1 2023-11-20 09:25:30,095 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=1030886.6666666666, ans=0.125 2023-11-20 09:25:35,001 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1030886.6666666666, ans=0.1 2023-11-20 09:25:42,826 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=1030953.3333333334, ans=0.125 2023-11-20 09:25:43,746 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 154650 2023-11-20 09:25:55,871 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=1031020.0, ans=0.125 2023-11-20 09:26:22,911 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=1031153.3333333334, ans=0.0 2023-11-20 09:26:29,377 INFO [train_asr.py:1262] (1/4) Epoch 13, batch 10400, loss[loss=0.0674, simple_loss=0.07642, pruned_loss=0.0177, audio_tagging_loss=0.01149, over 14404.00 frames. ], tot_loss[loss=0.08077, simple_loss=0.1012, pruned_loss=0.01993, audio_tagging_loss=0.01026, over 3045875.49 frames. ], batch size: 57, lr: 5.24e-03, grad_scale: 32.0 2023-11-20 09:26:48,768 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 154700 2023-11-20 09:27:03,084 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.158e+01 8.127e+01 8.781e+01 9.645e+01 1.274e+02, threshold=1.756e+02, percent-clipped=0.0 2023-11-20 09:27:34,480 INFO [train_asr.py:1262] (1/4) Epoch 13, batch 10450, loss[loss=0.09775, simple_loss=0.1284, pruned_loss=0.02637, audio_tagging_loss=0.007157, over 15779.00 frames. ], tot_loss[loss=0.08108, simple_loss=0.1013, pruned_loss=0.02007, audio_tagging_loss=0.01034, over 3049682.49 frames. ], batch size: 56, lr: 5.24e-03, grad_scale: 32.0 2023-11-20 09:27:53,815 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 154750 2023-11-20 09:27:57,815 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=1031620.0, ans=0.0 2023-11-20 09:28:11,384 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=1031686.6666666666, ans=0.1 2023-11-20 09:28:13,799 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=1031753.3333333334, ans=0.2 2023-11-20 09:28:19,769 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=19.96 vs. limit=22.5 2023-11-20 09:28:30,388 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=1031820.0, ans=0.09899494936611666 2023-11-20 09:28:33,890 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=1031820.0, ans=0.125 2023-11-20 09:28:38,633 INFO [train_asr.py:1262] (1/4) Epoch 13, batch 10500, loss[loss=0.083, simple_loss=0.108, pruned_loss=0.01803, audio_tagging_loss=0.01096, over 15306.00 frames. 
], tot_loss[loss=0.08111, simple_loss=0.1016, pruned_loss=0.02009, audio_tagging_loss=0.01024, over 3050542.84 frames. ], batch size: 57, lr: 5.24e-03, grad_scale: 32.0 2023-11-20 09:28:47,949 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=1031886.6666666666, ans=0.125 2023-11-20 09:28:59,012 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 154800 2023-11-20 09:29:00,331 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=1031953.3333333334, ans=0.125 2023-11-20 09:29:03,266 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=1031953.3333333334, ans=0.125 2023-11-20 09:29:13,393 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.926e+01 8.208e+01 9.112e+01 1.062e+02 1.393e+02, threshold=1.822e+02, percent-clipped=0.0 2023-11-20 09:29:24,594 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1032086.6666666666, ans=0.0 2023-11-20 09:29:41,281 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=1032153.3333333334, ans=0.07 2023-11-20 09:29:44,680 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=11.34 vs. limit=15.0 2023-11-20 09:29:45,127 INFO [train_asr.py:1262] (1/4) Epoch 13, batch 10550, loss[loss=0.08954, simple_loss=0.1148, pruned_loss=0.02488, audio_tagging_loss=0.007264, over 15536.00 frames. ], tot_loss[loss=0.08135, simple_loss=0.1022, pruned_loss=0.0202, audio_tagging_loss=0.01007, over 3048898.87 frames. ], batch size: 58, lr: 5.24e-03, grad_scale: 32.0 2023-11-20 09:29:45,350 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1032220.0, ans=0.125 2023-11-20 09:29:57,772 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.conv_module1.whiten, num_groups=1, num_channels=192, metric=14.57 vs. limit=15.0 2023-11-20 09:30:04,324 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 154850 2023-11-20 09:30:13,213 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1032353.3333333334, ans=0.1 2023-11-20 09:30:24,800 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=1032420.0, ans=0.0 2023-11-20 09:30:49,032 INFO [train_asr.py:1262] (1/4) Epoch 13, batch 10600, loss[loss=0.08126, simple_loss=0.1016, pruned_loss=0.01879, audio_tagging_loss=0.01168, over 14869.00 frames. ], tot_loss[loss=0.08092, simple_loss=0.1017, pruned_loss=0.0201, audio_tagging_loss=0.009953, over 3051130.76 frames. 
], batch size: 55, lr: 5.24e-03, grad_scale: 32.0 2023-11-20 09:30:51,729 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=1032553.3333333334, ans=0.125 2023-11-20 09:30:56,375 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=1032553.3333333334, ans=0.125 2023-11-20 09:31:05,995 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=1032620.0, ans=0.125 2023-11-20 09:31:06,001 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1032620.0, ans=0.1 2023-11-20 09:31:08,146 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 154900 2023-11-20 09:31:15,739 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=1032686.6666666666, ans=0.125 2023-11-20 09:31:21,989 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.038e+01 8.175e+01 8.791e+01 9.542e+01 1.185e+02, threshold=1.758e+02, percent-clipped=0.0 2023-11-20 09:31:28,352 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=1032753.3333333334, ans=0.125 2023-11-20 09:31:29,448 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer_ff3.min_abs, batch_count=1032753.3333333334, ans=0.2 2023-11-20 09:31:29,486 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=1032753.3333333334, ans=0.125 2023-11-20 09:31:40,195 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1032820.0, ans=0.0 2023-11-20 09:31:52,001 INFO [train_asr.py:1262] (1/4) Epoch 13, batch 10650, loss[loss=0.104, simple_loss=0.1345, pruned_loss=0.02812, audio_tagging_loss=0.008588, over 16369.00 frames. ], tot_loss[loss=0.08025, simple_loss=0.1009, pruned_loss=0.01985, audio_tagging_loss=0.00996, over 3043680.10 frames. ], batch size: 59, lr: 5.24e-03, grad_scale: 32.0 2023-11-20 09:32:12,487 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 154950 2023-11-20 09:32:29,198 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=1033020.0, ans=0.0 2023-11-20 09:32:43,880 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=1033153.3333333334, ans=0.09899494936611666 2023-11-20 09:32:56,687 INFO [train_asr.py:1262] (1/4) Epoch 13, batch 10700, loss[loss=0.08261, simple_loss=0.1031, pruned_loss=0.02135, audio_tagging_loss=0.009719, over 14933.00 frames. ], tot_loss[loss=0.0809, simple_loss=0.102, pruned_loss=0.02004, audio_tagging_loss=0.009867, over 3044765.80 frames. 
], batch size: 55, lr: 5.24e-03, grad_scale: 32.0 2023-11-20 09:33:09,196 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=1033286.6666666666, ans=0.125 2023-11-20 09:33:13,343 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.min_positive, batch_count=1033286.6666666666, ans=0.025 2023-11-20 09:33:15,863 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.max_positive, batch_count=1033286.6666666666, ans=0.95 2023-11-20 09:33:16,892 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 155000 2023-11-20 09:33:24,767 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.min_positive, batch_count=1033353.3333333334, ans=0.025 2023-11-20 09:33:25,819 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=1033353.3333333334, ans=0.125 2023-11-20 09:33:30,430 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.016e+01 8.303e+01 8.834e+01 9.458e+01 1.451e+02, threshold=1.767e+02, percent-clipped=0.0 2023-11-20 09:33:36,687 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=1.92 vs. limit=6.0 2023-11-20 09:34:02,463 INFO [train_asr.py:1262] (1/4) Epoch 13, batch 10750, loss[loss=0.07671, simple_loss=0.09733, pruned_loss=0.0191, audio_tagging_loss=0.008946, over 15673.00 frames. ], tot_loss[loss=0.0802, simple_loss=0.1012, pruned_loss=0.01973, audio_tagging_loss=0.00989, over 3045376.06 frames. ], batch size: 59, lr: 5.23e-03, grad_scale: 32.0 2023-11-20 09:34:06,675 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=9.10 vs. 
limit=15.0 2023-11-20 09:34:07,511 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-20 09:34:14,751 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=1033620.0, ans=0.125 2023-11-20 09:34:16,022 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=1033620.0, ans=0.2 2023-11-20 09:34:16,135 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=1033620.0, ans=0.09899494936611666 2023-11-20 09:34:18,728 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=1033620.0, ans=0.0 2023-11-20 09:34:21,016 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 155050 2023-11-20 09:34:59,219 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=1033820.0, ans=0.0 2023-11-20 09:35:00,439 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=1033820.0, ans=0.125 2023-11-20 09:35:01,592 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1033820.0, ans=0.1 2023-11-20 09:35:06,220 INFO [train_asr.py:1262] (1/4) Epoch 13, batch 10800, loss[loss=0.0945, simple_loss=0.1231, pruned_loss=0.02576, audio_tagging_loss=0.007188, over 14330.00 frames. ], tot_loss[loss=0.08068, simple_loss=0.1018, pruned_loss=0.01994, audio_tagging_loss=0.00982, over 3054342.44 frames. ], batch size: 53, lr: 5.23e-03, grad_scale: 32.0 2023-11-20 09:35:07,729 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=1033886.6666666666, ans=0.125 2023-11-20 09:35:14,627 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=1033886.6666666666, ans=0.0 2023-11-20 09:35:23,348 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=1033953.3333333334, ans=0.2 2023-11-20 09:35:24,440 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=1033953.3333333334, ans=0.0 2023-11-20 09:35:26,108 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 155100 2023-11-20 09:35:40,781 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.539e+01 8.046e+01 8.532e+01 9.175e+01 1.216e+02, threshold=1.706e+02, percent-clipped=0.0 2023-11-20 09:35:57,697 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=22.03 vs. limit=22.5 2023-11-20 09:36:04,139 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1034153.3333333334, ans=0.1 2023-11-20 09:36:11,284 INFO [train_asr.py:1262] (1/4) Epoch 13, batch 10850, loss[loss=0.07355, simple_loss=0.08953, pruned_loss=0.01901, audio_tagging_loss=0.009773, over 14972.00 frames. ], tot_loss[loss=0.08094, simple_loss=0.102, pruned_loss=0.02008, audio_tagging_loss=0.009881, over 3052001.24 frames. 
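The Clipping_scale entries print five-number summaries (min, Q1, median, Q3, max) of recent gradient norms plus a clipping threshold. In every entry in this section the threshold equals Clipping_scale times the logged median (e.g. 2.0 * 8.532e+01 = 1.706e+02 in the 09:35:40 entry above), so a plausible reading is: keep a window of recent grad norms and clip anything above clipping_scale * median. A sketch under that assumption:

# Sketch: median-based gradient clipping, assuming
# threshold = clipping_scale * median(recent grad norms).
import torch

def clip_by_recent_median(params, recent_norms: list[float],
                          clipping_scale: float = 2.0) -> float:
    norms = torch.tensor(recent_norms)
    quartiles = torch.quantile(norms, torch.tensor([0.0, 0.25, 0.5, 0.75, 1.0]))
    threshold = clipping_scale * quartiles[2].item()  # 2.0 * median
    torch.nn.utils.clip_grad_norm_(params, max_norm=threshold)
    return threshold

# percent-clipped=0.0 in the log then just means no step in the current
# window exceeded its threshold.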
], batch size: 57, lr: 5.23e-03, grad_scale: 32.0 2023-11-20 09:36:11,664 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=1034220.0, ans=0.2 2023-11-20 09:36:28,308 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.34 vs. limit=6.0 2023-11-20 09:36:32,058 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 155150 2023-11-20 09:36:37,180 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=1034353.3333333334, ans=0.125 2023-11-20 09:36:38,918 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=11.25 vs. limit=22.5 2023-11-20 09:36:41,151 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.28 vs. limit=6.0 2023-11-20 09:36:56,465 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=1034420.0, ans=0.0 2023-11-20 09:37:12,656 WARNING [train_asr.py:1506] (1/4) Exclude cut with ID unbalanced/XMxq2pgttuY_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-20 09:37:14,178 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=1034486.6666666666, ans=0.2 2023-11-20 09:37:15,450 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1034553.3333333334, ans=0.1 2023-11-20 09:37:16,314 INFO [train_asr.py:1262] (1/4) Epoch 13, batch 10900, loss[loss=0.09597, simple_loss=0.1173, pruned_loss=0.02771, audio_tagging_loss=0.009599, over 14688.00 frames. ], tot_loss[loss=0.08077, simple_loss=0.1019, pruned_loss=0.02001, audio_tagging_loss=0.009822, over 3053945.91 frames. ], batch size: 53, lr: 5.23e-03, grad_scale: 32.0 2023-11-20 09:37:35,724 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 155200 2023-11-20 09:37:50,145 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.033e+01 8.200e+01 8.772e+01 9.722e+01 1.243e+02, threshold=1.754e+02, percent-clipped=0.0 2023-11-20 09:37:59,878 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=21.72 vs. limit=22.5 2023-11-20 09:38:00,728 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=1034753.3333333334, ans=0.07 2023-11-20 09:38:20,690 INFO [train_asr.py:1262] (1/4) Epoch 13, batch 10950, loss[loss=0.07781, simple_loss=0.09756, pruned_loss=0.01803, audio_tagging_loss=0.01101, over 15101.00 frames. ], tot_loss[loss=0.08001, simple_loss=0.1004, pruned_loss=0.01974, audio_tagging_loss=0.01006, over 3044240.91 frames. 
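The WARNING above excludes a 1-second AudioSet dummy cut because it is too short for its token sequence: 100 feature frames shrink to 23 after the frontend's subsampling, fewer than its 24 tokens, and the pruned RNN-T loss needs at least as many frames as symbols. Assuming the usual zipformer-style convolutional-subsampling frame arithmetic, the check looks roughly like:

# Sketch of the length check behind the "Exclude cut" warning, assuming the
# common ((T - 7) // 2 + 1) // 2 frame-count formula for ~4x conv subsampling.
def frames_after_subsampling(num_frames: int) -> int:
    return ((num_frames - 7) // 2 + 1) // 2

def keep_cut(num_frames: int, num_tokens: int) -> bool:
    return frames_after_subsampling(num_frames) >= num_tokens

print(frames_after_subsampling(100))  # 23, as logged
print(keep_cut(100, 24))              # False -> the cut is excluded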
], batch size: 55, lr: 5.23e-03, grad_scale: 32.0 2023-11-20 09:38:31,794 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=12.33 vs. limit=15.0 2023-11-20 09:38:39,881 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 155250 2023-11-20 09:39:14,108 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=8.85 vs. limit=15.0 2023-11-20 09:39:24,991 INFO [train_asr.py:1262] (1/4) Epoch 13, batch 11000, loss[loss=0.07897, simple_loss=0.09221, pruned_loss=0.02018, audio_tagging_loss=0.01268, over 15016.00 frames. ], tot_loss[loss=0.07903, simple_loss=0.09894, pruned_loss=0.01945, audio_tagging_loss=0.01011, over 3045438.91 frames. ], batch size: 56, lr: 5.23e-03, grad_scale: 32.0 2023-11-20 09:39:27,831 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=1035220.0, ans=0.125 2023-11-20 09:39:28,161 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=512, metric=7.84 vs. limit=15.0 2023-11-20 09:39:35,548 WARNING [train_asr.py:1506] (1/4) Exclude cut with ID unbalanced/h6R5rMXN6pY_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-20 09:39:44,324 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 155300 2023-11-20 09:39:59,023 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.020e+01 8.120e+01 8.817e+01 9.505e+01 1.234e+02, threshold=1.763e+02, percent-clipped=0.0 2023-11-20 09:40:04,441 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=1035420.0, ans=0.125 2023-11-20 09:40:05,425 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=1035420.0, ans=0.125 2023-11-20 09:40:09,812 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1035420.0, ans=0.1 2023-11-20 09:40:17,479 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=1035486.6666666666, ans=0.125 2023-11-20 09:40:20,994 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=1035486.6666666666, ans=0.125 2023-11-20 09:40:29,862 INFO [train_asr.py:1262] (1/4) Epoch 13, batch 11050, loss[loss=0.05428, simple_loss=0.05947, pruned_loss=0.01227, audio_tagging_loss=0.01227, over 14608.00 frames. ], tot_loss[loss=0.07983, simple_loss=0.09979, pruned_loss=0.01973, audio_tagging_loss=0.0102, over 3048882.89 frames. ], batch size: 57, lr: 5.23e-03, grad_scale: 16.0 2023-11-20 09:40:30,740 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=8.68 vs. 
limit=15.0 2023-11-20 09:40:33,867 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1035553.3333333334, ans=0.1 2023-11-20 09:40:49,944 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 155350 2023-11-20 09:40:50,550 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=512, metric=3.40 vs. limit=15.0 2023-11-20 09:41:12,232 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=1035753.3333333334, ans=0.125 2023-11-20 09:41:15,615 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=1035753.3333333334, ans=0.125 2023-11-20 09:41:24,884 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=1035820.0, ans=0.07 2023-11-20 09:41:34,876 INFO [train_asr.py:1262] (1/4) Epoch 13, batch 11100, loss[loss=0.09485, simple_loss=0.1221, pruned_loss=0.02492, audio_tagging_loss=0.00889, over 15683.00 frames. ], tot_loss[loss=0.08067, simple_loss=0.1006, pruned_loss=0.01998, audio_tagging_loss=0.01036, over 3047762.44 frames. ], batch size: 58, lr: 5.23e-03, grad_scale: 16.0 2023-11-20 09:41:37,641 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1035886.6666666666, ans=0.1 2023-11-20 09:41:54,060 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 155400 2023-11-20 09:41:54,203 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=1035953.3333333334, ans=0.125 2023-11-20 09:42:09,687 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.838e+01 8.254e+01 9.028e+01 9.759e+01 1.162e+02, threshold=1.806e+02, percent-clipped=0.0 2023-11-20 09:42:10,375 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.out_whiten.whitening_limit, batch_count=1036020.0, ans=15.0 2023-11-20 09:42:18,058 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=1036086.6666666666, ans=0.0 2023-11-20 09:42:25,913 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=1036153.3333333334, ans=0.125 2023-11-20 09:42:33,539 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=7.68 vs. limit=12.0 2023-11-20 09:42:39,726 INFO [train_asr.py:1262] (1/4) Epoch 13, batch 11150, loss[loss=0.057, simple_loss=0.05933, pruned_loss=0.01114, audio_tagging_loss=0.0162, over 15192.00 frames. ], tot_loss[loss=0.07947, simple_loss=0.09889, pruned_loss=0.01945, audio_tagging_loss=0.01057, over 3050117.24 frames. 
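The Whitening lines compare a per-module statistic against a limit (e.g. metric=3.40 vs. limit=15.0 above). A statistic with exactly this behaviour is D * tr(C^2) / tr(C)^2 over the covariance C of a module's activations: it equals 1.0 when the covariance is isotropic ("white") and grows as variance concentrates in fewer directions, so the limit caps how non-white an output may become before a corrective penalty applies. The sketch below is a hedged reconstruction of such a metric, not the scaling.py code:

# Sketch of a whitening metric: D * tr(C^2) / tr(C)^2 for covariance C.
# Equals 1.0 when C is a multiple of the identity, larger otherwise.
import torch

def whitening_metric(x: torch.Tensor) -> float:
    # x: (num_frames, num_channels), assumed zero-mean for simplicity
    cov = (x.T @ x) / x.shape[0]
    d = cov.shape[0]
    return (d * torch.trace(cov @ cov) / torch.trace(cov) ** 2).item()

white = torch.randn(10000, 384)
print(whitening_metric(white))                    # ~1.0: isotropic
print(whitening_metric(white * torch.rand(384)))  # >1: anisotropic channels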
], batch size: 60, lr: 5.23e-03, grad_scale: 16.0 2023-11-20 09:42:54,481 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.whiten.whitening_limit, batch_count=1036286.6666666666, ans=12.0 2023-11-20 09:42:58,710 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 155450 2023-11-20 09:42:58,827 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=1036286.6666666666, ans=0.125 2023-11-20 09:43:26,956 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=12.29 vs. limit=15.0 2023-11-20 09:43:38,906 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=9.78 vs. limit=10.0 2023-11-20 09:43:44,259 INFO [train_asr.py:1262] (1/4) Epoch 13, batch 11200, loss[loss=0.08151, simple_loss=0.09809, pruned_loss=0.02069, audio_tagging_loss=0.01177, over 16178.00 frames. ], tot_loss[loss=0.07972, simple_loss=0.09915, pruned_loss=0.01948, audio_tagging_loss=0.01066, over 3052019.50 frames. ], batch size: 60, lr: 5.23e-03, grad_scale: 32.0 2023-11-20 09:44:03,212 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 155500 2023-11-20 09:44:03,368 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=1036620.0, ans=0.125 2023-11-20 09:44:03,800 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=512, metric=3.57 vs. limit=15.0 2023-11-20 09:44:06,449 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=1036620.0, ans=0.125 2023-11-20 09:44:15,806 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=1036686.6666666666, ans=0.125 2023-11-20 09:44:19,028 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.839e+01 8.019e+01 8.498e+01 9.323e+01 1.224e+02, threshold=1.700e+02, percent-clipped=0.0 2023-11-20 09:44:21,761 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=1036753.3333333334, ans=0.2 2023-11-20 09:44:23,122 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=8.741e-02 2023-11-20 09:44:32,259 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1036753.3333333334, ans=0.125 2023-11-20 09:44:42,815 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=1036820.0, ans=0.0 2023-11-20 09:44:44,334 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.22 vs. limit=6.0 2023-11-20 09:44:48,545 INFO [train_asr.py:1262] (1/4) Epoch 13, batch 11250, loss[loss=0.07417, simple_loss=0.09555, pruned_loss=0.01469, audio_tagging_loss=0.01171, over 15267.00 frames. ], tot_loss[loss=0.07861, simple_loss=0.09777, pruned_loss=0.01914, audio_tagging_loss=0.01058, over 3045257.71 frames. 
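Across these entries grad_scale drops from 32.0 to 16.0 (around batch 11050) and is back at 32.0 by batch 11200, the signature of dynamic loss scaling under fp16: halve the scale when a step overflows, grow it back after a run of clean steps. A generic sketch of that policy; the growth cadence below is illustrative, though the recipe evidently restores the scale within a few hundred batches:

# Sketch of dynamic fp16 loss scaling, in the style of torch.cuda.amp.GradScaler:
# overflow -> halve the scale and skip the step; N clean steps -> double it.
class ToyGradScaler:
    def __init__(self, scale: float = 32.0, growth_interval: int = 150):
        self.scale = scale
        self.growth_interval = growth_interval
        self._good_steps = 0

    def update(self, found_inf: bool) -> None:
        if found_inf:
            self.scale /= 2.0          # e.g. 32.0 -> 16.0, as in the log
            self._good_steps = 0
        else:
            self._good_steps += 1
            if self._good_steps >= self.growth_interval:
                self.scale *= 2.0      # recovery: 16.0 -> 32.0
                self._good_steps = 0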
], batch size: 57, lr: 5.23e-03, grad_scale: 32.0 2023-11-20 09:45:00,234 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=1036953.3333333334, ans=0.035 2023-11-20 09:45:08,203 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 155550 2023-11-20 09:45:23,900 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=1037020.0, ans=0.0 2023-11-20 09:45:39,482 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=1037153.3333333334, ans=0.0 2023-11-20 09:45:53,992 INFO [train_asr.py:1262] (1/4) Epoch 13, batch 11300, loss[loss=0.07413, simple_loss=0.09689, pruned_loss=0.01491, audio_tagging_loss=0.01078, over 14186.00 frames. ], tot_loss[loss=0.07912, simple_loss=0.09895, pruned_loss=0.01928, audio_tagging_loss=0.01036, over 3044549.76 frames. ], batch size: 53, lr: 5.23e-03, grad_scale: 32.0 2023-11-20 09:46:07,176 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-11-20 09:46:13,172 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 155600 2023-11-20 09:46:16,277 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.min_positive, batch_count=1037286.6666666666, ans=0.05 2023-11-20 09:46:17,333 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=1037286.6666666666, ans=0.125 2023-11-20 09:46:26,673 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=1037353.3333333334, ans=0.125 2023-11-20 09:46:27,023 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=11.31 vs. limit=22.5 2023-11-20 09:46:28,629 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.conv_module1.whiten, num_groups=1, num_channels=192, metric=13.44 vs. limit=15.0 2023-11-20 09:46:28,934 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.896e+01 8.236e+01 9.129e+01 9.698e+01 1.564e+02, threshold=1.826e+02, percent-clipped=0.0 2023-11-20 09:46:31,211 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.conv_module2.whiten, num_groups=1, num_channels=192, metric=13.91 vs. limit=15.0 2023-11-20 09:46:35,697 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=10.08 vs. limit=15.0 2023-11-20 09:46:57,192 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1037486.6666666666, ans=0.1 2023-11-20 09:46:59,331 INFO [train_asr.py:1262] (1/4) Epoch 13, batch 11350, loss[loss=0.09182, simple_loss=0.1107, pruned_loss=0.03033, audio_tagging_loss=0.006128, over 15699.00 frames. ], tot_loss[loss=0.08009, simple_loss=0.1005, pruned_loss=0.01976, audio_tagging_loss=0.0101, over 3039932.67 frames. 
], batch size: 56, lr: 5.22e-03, grad_scale: 32.0 2023-11-20 09:47:06,934 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=1037553.3333333334, ans=0.125 2023-11-20 09:47:16,929 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=5.77 vs. limit=15.0 2023-11-20 09:47:18,653 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 155650 2023-11-20 09:47:27,795 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=1037686.6666666666, ans=0.125 2023-11-20 09:48:04,497 INFO [train_asr.py:1262] (1/4) Epoch 13, batch 11400, loss[loss=0.07621, simple_loss=0.09741, pruned_loss=0.0165, audio_tagging_loss=0.01101, over 14373.00 frames. ], tot_loss[loss=0.08028, simple_loss=0.1009, pruned_loss=0.01986, audio_tagging_loss=0.009964, over 3043233.51 frames. ], batch size: 54, lr: 5.22e-03, grad_scale: 32.0 2023-11-20 09:48:14,881 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=1037886.6666666666, ans=0.2 2023-11-20 09:48:17,234 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=1037953.3333333334, ans=0.125 2023-11-20 09:48:24,545 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 155700 2023-11-20 09:48:25,924 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1037953.3333333334, ans=0.1 2023-11-20 09:48:39,697 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.071e+01 7.956e+01 8.738e+01 9.892e+01 2.201e+02, threshold=1.748e+02, percent-clipped=1.0 2023-11-20 09:48:49,645 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=9.24 vs. limit=15.0 2023-11-20 09:48:57,555 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=1038153.3333333334, ans=0.0 2023-11-20 09:48:58,790 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=1038153.3333333334, ans=0.125 2023-11-20 09:48:59,971 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1038153.3333333334, ans=0.125 2023-11-20 09:49:07,516 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=1038220.0, ans=0.0 2023-11-20 09:49:07,619 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=1038220.0, ans=0.2 2023-11-20 09:49:09,133 INFO [train_asr.py:1262] (1/4) Epoch 13, batch 11450, loss[loss=0.08108, simple_loss=0.1025, pruned_loss=0.02035, audio_tagging_loss=0.009478, over 14802.00 frames. ], tot_loss[loss=0.08026, simple_loss=0.1009, pruned_loss=0.01983, audio_tagging_loss=0.009981, over 3040994.17 frames. 
], batch size: 56, lr: 5.22e-03, grad_scale: 32.0 2023-11-20 09:49:14,743 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=1038220.0, ans=0.2 2023-11-20 09:49:24,024 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=1038286.6666666666, ans=0.125 2023-11-20 09:49:28,744 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 155750 2023-11-20 09:49:38,110 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=1038353.3333333334, ans=0.0 2023-11-20 09:49:40,533 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=1038353.3333333334, ans=0.125 2023-11-20 09:49:49,064 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=1038420.0, ans=0.125 2023-11-20 09:50:13,872 INFO [train_asr.py:1262] (1/4) Epoch 13, batch 11500, loss[loss=0.07081, simple_loss=0.08841, pruned_loss=0.01444, audio_tagging_loss=0.01217, over 14279.00 frames. ], tot_loss[loss=0.07992, simple_loss=0.1005, pruned_loss=0.01971, audio_tagging_loss=0.009981, over 3038109.26 frames. ], batch size: 54, lr: 5.22e-03, grad_scale: 32.0 2023-11-20 09:50:21,762 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1038553.3333333334, ans=0.1 2023-11-20 09:50:32,912 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 155800 2023-11-20 09:50:47,688 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.982e+01 8.235e+01 8.586e+01 9.090e+01 1.242e+02, threshold=1.717e+02, percent-clipped=0.0 2023-11-20 09:51:18,154 INFO [train_asr.py:1262] (1/4) Epoch 13, batch 11550, loss[loss=0.07141, simple_loss=0.09092, pruned_loss=0.0167, audio_tagging_loss=0.009245, over 14310.00 frames. ], tot_loss[loss=0.08015, simple_loss=0.1007, pruned_loss=0.01977, audio_tagging_loss=0.01, over 3044174.21 frames. ], batch size: 56, lr: 5.22e-03, grad_scale: 32.0 2023-11-20 09:51:31,858 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=1038953.3333333334, ans=0.2 2023-11-20 09:51:36,529 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 155850 2023-11-20 09:51:47,051 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=6.82 vs. limit=15.0 2023-11-20 09:51:53,628 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.88 vs. limit=6.0 2023-11-20 09:51:56,503 WARNING [train_asr.py:1506] (1/4) Exclude cut with ID unbalanced/NeYOsnhOi4k_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. 
Number of tokens: 24 2023-11-20 09:52:01,484 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-20 09:52:07,094 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=1039086.6666666666, ans=0.125 2023-11-20 09:52:13,028 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1039153.3333333334, ans=0.1 2023-11-20 09:52:21,325 INFO [train_asr.py:1262] (1/4) Epoch 13, batch 11600, loss[loss=0.08629, simple_loss=0.1055, pruned_loss=0.02191, audio_tagging_loss=0.01161, over 14206.00 frames. ], tot_loss[loss=0.08046, simple_loss=0.1015, pruned_loss=0.01985, audio_tagging_loss=0.009847, over 3042963.56 frames. ], batch size: 57, lr: 5.22e-03, grad_scale: 32.0 2023-11-20 09:52:21,607 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=1039220.0, ans=0.0 2023-11-20 09:52:21,621 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=1039220.0, ans=0.0 2023-11-20 09:52:25,593 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=7.11 vs. limit=10.0 2023-11-20 09:52:41,570 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 155900 2023-11-20 09:52:53,179 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.min_positive, batch_count=1039353.3333333334, ans=0.025 2023-11-20 09:52:56,398 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.551e+01 8.203e+01 8.943e+01 9.744e+01 1.251e+02, threshold=1.789e+02, percent-clipped=0.0 2023-11-20 09:53:10,090 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=1039420.0, ans=0.0 2023-11-20 09:53:11,419 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1039486.6666666666, ans=0.125 2023-11-20 09:53:25,673 INFO [train_asr.py:1262] (1/4) Epoch 13, batch 11650, loss[loss=0.08419, simple_loss=0.1181, pruned_loss=0.0154, audio_tagging_loss=0.009754, over 17149.00 frames. ], tot_loss[loss=0.08041, simple_loss=0.1015, pruned_loss=0.01978, audio_tagging_loss=0.009868, over 3039459.99 frames. 
], batch size: 62, lr: 5.22e-03, grad_scale: 32.0 2023-11-20 09:53:37,480 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=1039620.0, ans=0.125 2023-11-20 09:53:41,962 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=1039620.0, ans=0.0 2023-11-20 09:53:45,309 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 155950 2023-11-20 09:54:05,758 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=1039753.3333333334, ans=0.125 2023-11-20 09:54:12,442 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=1039753.3333333334, ans=0.125 2023-11-20 09:54:14,955 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=1039753.3333333334, ans=0.125 2023-11-20 09:54:30,840 INFO [train_asr.py:1262] (1/4) Epoch 13, batch 11700, loss[loss=0.05958, simple_loss=0.06994, pruned_loss=0.01167, audio_tagging_loss=0.01294, over 15166.00 frames. ], tot_loss[loss=0.0797, simple_loss=0.1004, pruned_loss=0.01955, audio_tagging_loss=0.009945, over 3030936.84 frames. ], batch size: 60, lr: 5.22e-03, grad_scale: 32.0 2023-11-20 09:54:33,486 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=1039886.6666666666, ans=0.125 2023-11-20 09:54:45,810 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=1039953.3333333334, ans=0.125 2023-11-20 09:54:49,304 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 156000 2023-11-20 09:54:58,927 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=1040020.0, ans=0.0 2023-11-20 09:55:09,539 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.408e+01 8.314e+01 9.144e+01 1.029e+02 1.424e+02, threshold=1.829e+02, percent-clipped=0.0 2023-11-20 09:55:26,364 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-20 09:55:38,360 INFO [train_asr.py:1262] (1/4) Epoch 13, batch 11750, loss[loss=0.103, simple_loss=0.1311, pruned_loss=0.03026, audio_tagging_loss=0.007154, over 15972.00 frames. ], tot_loss[loss=0.07988, simple_loss=0.1006, pruned_loss=0.01963, audio_tagging_loss=0.009949, over 3037076.64 frames. ], batch size: 59, lr: 5.22e-03, grad_scale: 32.0 2023-11-20 09:55:42,358 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=1040220.0, ans=0.2 2023-11-20 09:55:58,484 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 156050 2023-11-20 09:55:59,747 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=1040286.6666666666, ans=0.125 2023-11-20 09:56:01,723 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.whiten, num_groups=1, num_channels=192, metric=3.47 vs. 
limit=12.0 2023-11-20 09:56:04,954 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=1040353.3333333334, ans=0.125 2023-11-20 09:56:11,835 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=1040353.3333333334, ans=0.125 2023-11-20 09:56:28,838 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1040486.6666666666, ans=0.125 2023-11-20 09:56:32,632 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1040486.6666666666, ans=0.125 2023-11-20 09:56:42,570 INFO [train_asr.py:1262] (1/4) Epoch 13, batch 11800, loss[loss=0.07724, simple_loss=0.09511, pruned_loss=0.01926, audio_tagging_loss=0.01042, over 15334.00 frames. ], tot_loss[loss=0.07988, simple_loss=0.1005, pruned_loss=0.01961, audio_tagging_loss=0.01003, over 3035153.49 frames. ], batch size: 58, lr: 5.22e-03, grad_scale: 32.0 2023-11-20 09:56:46,623 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=1040553.3333333334, ans=0.125 2023-11-20 09:57:02,652 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 156100 2023-11-20 09:57:17,073 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.890e+01 8.238e+01 8.781e+01 9.493e+01 1.182e+02, threshold=1.756e+02, percent-clipped=0.0 2023-11-20 09:57:38,066 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=1040820.0, ans=0.125 2023-11-20 09:57:38,448 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=9.48 vs. limit=15.0 2023-11-20 09:57:46,283 INFO [train_asr.py:1262] (1/4) Epoch 13, batch 11850, loss[loss=0.05156, simple_loss=0.06474, pruned_loss=0.01115, audio_tagging_loss=0.008044, over 15014.00 frames. ], tot_loss[loss=0.08009, simple_loss=0.1006, pruned_loss=0.01968, audio_tagging_loss=0.0101, over 3041066.29 frames. ], batch size: 58, lr: 5.22e-03, grad_scale: 32.0 2023-11-20 09:58:05,460 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 156150 2023-11-20 09:58:15,986 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=1041020.0, ans=0.125 2023-11-20 09:58:50,142 INFO [train_asr.py:1262] (1/4) Epoch 13, batch 11900, loss[loss=0.07885, simple_loss=0.1029, pruned_loss=0.01867, audio_tagging_loss=0.008716, over 15173.00 frames. ], tot_loss[loss=0.08009, simple_loss=0.1007, pruned_loss=0.01959, audio_tagging_loss=0.01013, over 3039997.40 frames. 
], batch size: 57, lr: 5.21e-03, grad_scale: 32.0 2023-11-20 09:59:09,361 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 156200 2023-11-20 09:59:25,438 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.397e+01 8.078e+01 8.558e+01 9.294e+01 1.166e+02, threshold=1.712e+02, percent-clipped=0.0 2023-11-20 09:59:26,965 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=1041353.3333333334, ans=0.2 2023-11-20 09:59:30,512 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=1041420.0, ans=0.0 2023-11-20 09:59:31,819 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=1041420.0, ans=0.2 2023-11-20 09:59:50,824 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=1041486.6666666666, ans=0.0 2023-11-20 09:59:54,113 INFO [train_asr.py:1262] (1/4) Epoch 13, batch 11950, loss[loss=0.06658, simple_loss=0.08733, pruned_loss=0.01405, audio_tagging_loss=0.008861, over 14395.00 frames. ], tot_loss[loss=0.08011, simple_loss=0.1006, pruned_loss=0.01954, audio_tagging_loss=0.01025, over 3040883.75 frames. ], batch size: 56, lr: 5.21e-03, grad_scale: 32.0 2023-11-20 09:59:55,806 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=3.96 vs. limit=12.0 2023-11-20 10:00:14,334 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 156250 2023-11-20 10:00:37,569 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=1041753.3333333334, ans=0.125 2023-11-20 10:00:50,219 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=1041820.0, ans=0.125 2023-11-20 10:00:56,381 INFO [train_asr.py:1262] (1/4) Epoch 13, batch 12000, loss[loss=0.05548, simple_loss=0.06367, pruned_loss=0.01266, audio_tagging_loss=0.01098, over 14663.00 frames. ], tot_loss[loss=0.08066, simple_loss=0.1011, pruned_loss=0.01971, audio_tagging_loss=0.01037, over 3044420.76 frames. ], batch size: 57, lr: 5.21e-03, grad_scale: 32.0 2023-11-20 10:00:56,381 INFO [train_asr.py:1285] (1/4) Computing validation loss 2023-11-20 10:01:33,997 INFO [zipformer.py:1873] (1/4) name=encoder.encoders.4.encoder.layers.1.self_attn_weights, attn_weights_entropy = tensor([4.4188, 3.6778, 2.3847, 3.7352], device='cuda:1') 2023-11-20 10:01:36,795 INFO [train_asr.py:1294] (1/4) Epoch 13, validation: loss=0.0624, simple_loss=0.05383, pruned_loss=0.00582, audio_tagging_loss=0.02967, over 4681554.00 frames. 2023-11-20 10:01:36,796 INFO [train_asr.py:1295] (1/4) Maximum memory allocated so far is 26082MB 2023-11-20 10:01:40,401 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=1041886.6666666666, ans=0.0 2023-11-20 10:01:54,229 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 156300 2023-11-20 10:01:54,364 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.min_positive, batch_count=1041953.3333333334, ans=0.025 2023-11-20 10:02:41,376 INFO [train_asr.py:1262] (1/4) Epoch 14, batch 0, loss[loss=0.0845, simple_loss=0.0887, pruned_loss=0.01431, audio_tagging_loss=0.02584, over 15005.00 frames. 
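At batch 12000 the loop pauses to compute a validation loss over the full 4681554-frame dev set, dumping per-head attention-weight entropies as a diagnostic along the way, then reports peak GPU memory. A hedged sketch of that bookkeeping; the model and loader interfaces here are placeholders, not the recipe's actual signatures:

# Sketch of the validation pause, using standard torch utilities.
import torch

@torch.no_grad()
def validate(model, valid_loader, device):
    model.eval()
    tot_loss, tot_frames = 0.0, 0.0
    for batch in valid_loader:           # placeholder loader
        loss, num_frames = model(batch)  # placeholder interface
        tot_loss += loss.item() * num_frames
        tot_frames += num_frames
    model.train()
    mem_mb = torch.cuda.max_memory_allocated(device) // (1024 * 1024)
    print(f"validation: loss={tot_loss / tot_frames:.4f}")
    print(f"Maximum memory allocated so far is {mem_mb}MB")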
], tot_loss[loss=0.0845, simple_loss=0.0887, pruned_loss=0.01431, audio_tagging_loss=0.02584, over 15005.00 frames. ], batch size: 56, lr: 5.02e-03, grad_scale: 32.0 2023-11-20 10:02:41,377 INFO [train_asr.py:1285] (1/4) Computing validation loss 2023-11-20 10:03:13,418 INFO [zipformer.py:1873] (1/4) name=encoder.encoders.3.encoder.layers.3.self_attn_weights, attn_weights_entropy = tensor([3.9861, 3.1844, 2.9203, 3.0127, 3.4779, 2.7541, 3.3849, 2.6792], device='cuda:1') 2023-11-20 10:03:18,479 INFO [train_asr.py:1294] (1/4) Epoch 14, validation: loss=0.0621, simple_loss=0.05383, pruned_loss=0.005845, audio_tagging_loss=0.02934, over 4681554.00 frames. 2023-11-20 10:03:18,480 INFO [train_asr.py:1295] (1/4) Maximum memory allocated so far is 26082MB 2023-11-20 10:03:22,233 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.506e+01 8.326e+01 8.983e+01 9.877e+01 1.645e+02, threshold=1.797e+02, percent-clipped=0.0 2023-11-20 10:03:22,503 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=1042046.6666666666, ans=0.2 2023-11-20 10:03:48,200 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=1042180.0, ans=0.0 2023-11-20 10:03:59,840 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=13.28 vs. limit=15.0 2023-11-20 10:04:03,092 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1042246.6666666666, ans=0.125 2023-11-20 10:04:12,133 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 156350 2023-11-20 10:04:20,270 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=6.57 vs. limit=15.0 2023-11-20 10:04:23,732 INFO [train_asr.py:1262] (1/4) Epoch 14, batch 50, loss[loss=0.1168, simple_loss=0.141, pruned_loss=0.03046, audio_tagging_loss=0.01582, over 15476.00 frames. ], tot_loss[loss=0.09323, simple_loss=0.1068, pruned_loss=0.02134, audio_tagging_loss=0.0185, over 689588.15 frames. ], batch size: 56, lr: 5.02e-03, grad_scale: 32.0 2023-11-20 10:04:27,802 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=1042380.0, ans=0.0 2023-11-20 10:04:28,097 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=14.13 vs. limit=15.0 2023-11-20 10:04:42,790 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=1042446.6666666666, ans=0.0 2023-11-20 10:04:45,416 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.37 vs. 
limit=6.0 2023-11-20 10:05:00,443 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=1042513.3333333334, ans=0.035 2023-11-20 10:05:03,109 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=1042580.0, ans=0.125 2023-11-20 10:05:08,029 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-11-20 10:05:16,941 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 156400 2023-11-20 10:05:22,651 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.09 vs. limit=6.0 2023-11-20 10:05:29,534 INFO [train_asr.py:1262] (1/4) Epoch 14, batch 100, loss[loss=0.07309, simple_loss=0.07648, pruned_loss=0.01557, audio_tagging_loss=0.01928, over 14780.00 frames. ], tot_loss[loss=0.08998, simple_loss=0.103, pruned_loss=0.0202, audio_tagging_loss=0.01826, over 1209681.70 frames. ], batch size: 57, lr: 5.02e-03, grad_scale: 32.0 2023-11-20 10:05:33,222 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.056e+01 8.681e+01 9.274e+01 1.011e+02 1.384e+02, threshold=1.855e+02, percent-clipped=0.0 2023-11-20 10:05:33,583 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=1042713.3333333334, ans=0.125 2023-11-20 10:05:45,914 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=8.08 vs. limit=15.0 2023-11-20 10:05:49,097 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=1042780.0, ans=0.125 2023-11-20 10:06:22,041 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 156450 2023-11-20 10:06:33,140 INFO [train_asr.py:1262] (1/4) Epoch 14, batch 150, loss[loss=0.08312, simple_loss=0.1066, pruned_loss=0.02073, audio_tagging_loss=0.00909, over 14596.00 frames. ], tot_loss[loss=0.0882, simple_loss=0.1038, pruned_loss=0.01999, audio_tagging_loss=0.01632, over 1615177.33 frames. ], batch size: 54, lr: 5.02e-03, grad_scale: 32.0 2023-11-20 10:06:40,110 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=1043046.6666666666, ans=0.0 2023-11-20 10:07:13,173 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=1043246.6666666666, ans=0.0 2023-11-20 10:07:18,116 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=1043246.6666666666, ans=0.125 2023-11-20 10:07:27,246 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 156500 2023-11-20 10:07:36,098 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1043313.3333333334, ans=0.0 2023-11-20 10:07:38,215 INFO [train_asr.py:1262] (1/4) Epoch 14, batch 200, loss[loss=0.08747, simple_loss=0.1239, pruned_loss=0.01855, audio_tagging_loss=0.006982, over 15685.00 frames. ], tot_loss[loss=0.08667, simple_loss=0.1036, pruned_loss=0.02029, audio_tagging_loss=0.01456, over 1935278.42 frames. 
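The learning rate steps from 5.21e-03 at the end of epoch 13 to 5.02e-03 at the start of epoch 14 while barely moving within an epoch. Both behaviours are reproduced by an Eden-style schedule, lr = base_lr * ((batch^2 + lr_batches^2) / lr_batches^2)^-0.25 * ((epoch^2 + lr_epochs^2) / lr_epochs^2)^-0.25, assuming base_lr=0.045, lr_batches=7500, lr_epochs=3.5 and a zero-based epoch counter; this is a numerical fit to the logged values, not a quote of optim.py:

# Sketch of an Eden-style LR schedule; constants assumed, epoch zero-based.
def eden_lr(batch: int, epoch: int, base_lr: float = 0.045,
            lr_batches: float = 7500.0, lr_epochs: float = 3.5) -> float:
    batch_factor = ((batch**2 + lr_batches**2) / lr_batches**2) ** -0.25
    epoch_factor = ((epoch**2 + lr_epochs**2) / lr_epochs**2) ** -0.25
    return base_lr * batch_factor * epoch_factor

print(f"{eden_lr(156000, 12):.2e}")  # ~5.22e-03: late epoch 13 (0-based 12)
print(f"{eden_lr(156300, 13):.2e}")  # ~5.02e-03: early epoch 14, as logged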
], batch size: 57, lr: 5.02e-03, grad_scale: 32.0 2023-11-20 10:07:39,728 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1043380.0, ans=0.1 2023-11-20 10:07:42,544 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.035e+01 8.224e+01 9.022e+01 9.818e+01 1.305e+02, threshold=1.804e+02, percent-clipped=0.0 2023-11-20 10:07:43,297 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=512, metric=21.21 vs. limit=22.5 2023-11-20 10:08:27,101 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=6.90 vs. limit=15.0 2023-11-20 10:08:31,957 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 156550 2023-11-20 10:08:36,045 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1043646.6666666666, ans=0.1 2023-11-20 10:08:43,591 INFO [train_asr.py:1262] (1/4) Epoch 14, batch 250, loss[loss=0.1073, simple_loss=0.1457, pruned_loss=0.02729, audio_tagging_loss=0.007201, over 15725.00 frames. ], tot_loss[loss=0.0847, simple_loss=0.1029, pruned_loss=0.01997, audio_tagging_loss=0.01326, over 2186694.95 frames. ], batch size: 55, lr: 5.02e-03, grad_scale: 16.0 2023-11-20 10:08:57,049 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=1043780.0, ans=0.125 2023-11-20 10:09:20,096 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-20 10:09:29,372 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=1043913.3333333334, ans=0.0 2023-11-20 10:09:36,979 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 156600 2023-11-20 10:09:39,046 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.min_abs, batch_count=1043980.0, ans=0.5 2023-11-20 10:09:48,899 INFO [train_asr.py:1262] (1/4) Epoch 14, batch 300, loss[loss=0.08844, simple_loss=0.1099, pruned_loss=0.02204, audio_tagging_loss=0.01143, over 16132.00 frames. ], tot_loss[loss=0.08336, simple_loss=0.1023, pruned_loss=0.0199, audio_tagging_loss=0.01233, over 2371880.36 frames. ], batch size: 59, lr: 5.02e-03, grad_scale: 16.0 2023-11-20 10:09:54,292 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.950e+01 8.223e+01 8.932e+01 9.585e+01 1.475e+02, threshold=1.786e+02, percent-clipped=0.0 2023-11-20 10:10:41,622 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=7.77 vs. limit=10.0 2023-11-20 10:10:42,885 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 156650 2023-11-20 10:10:53,833 INFO [train_asr.py:1262] (1/4) Epoch 14, batch 350, loss[loss=0.0798, simple_loss=0.1068, pruned_loss=0.01772, audio_tagging_loss=0.00869, over 15399.00 frames. ], tot_loss[loss=0.08257, simple_loss=0.1021, pruned_loss=0.01987, audio_tagging_loss=0.01167, over 2519071.87 frames. 
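Within the new epoch the tot_loss frame counts climb batch by batch (689588 at batch 50, 1209681 at batch 100, 1935278 at batch 200, ...) toward the ~3.0M steady state seen throughout epoch 13, while the early tot_loss values are inflated by the first noisy batches. Both facts are consistent with tot_loss being an exponentially decayed running total that resets each epoch; assuming a per-batch decay of (1 - 1/200), the logged frame counts are reproduced closely:

# Sketch: exponentially decayed running totals behind tot_loss, assuming a
# per-batch decay of (1 - 1/reset_interval) with reset_interval = 200.
def running_frames(per_batch_frames: float = 15000.0, num_batches: int = 51,
                   reset_interval: int = 200) -> float:
    tot = 0.0
    for _ in range(num_batches):
        tot = tot * (1.0 - 1.0 / reset_interval) + per_batch_frames
    return tot

print(f"{running_frames(num_batches=51):,.0f}")     # ~677,000, cf. 689,588 at batch 50
print(f"{running_frames(num_batches=101):,.0f}")    # ~1,192,000, cf. 1,209,681 at batch 100
print(f"{running_frames(num_batches=10000):,.0f}")  # ~3,000,000 steady state, cf. epoch 13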
], batch size: 57, lr: 5.02e-03, grad_scale: 4.0 2023-11-20 10:11:08,109 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=1044446.6666666666, ans=0.2 2023-11-20 10:11:18,764 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=1044513.3333333334, ans=0.125 2023-11-20 10:11:46,519 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 156700 2023-11-20 10:11:52,443 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=1044646.6666666666, ans=0.0 2023-11-20 10:11:58,260 INFO [train_asr.py:1262] (1/4) Epoch 14, batch 400, loss[loss=0.06142, simple_loss=0.06857, pruned_loss=0.01526, audio_tagging_loss=0.01187, over 15296.00 frames. ], tot_loss[loss=0.08102, simple_loss=0.1003, pruned_loss=0.01954, audio_tagging_loss=0.01135, over 2633186.31 frames. ], batch size: 61, lr: 5.02e-03, grad_scale: 8.0 2023-11-20 10:12:00,608 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=1044713.3333333334, ans=0.125 2023-11-20 10:12:06,229 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.716e+01 8.326e+01 8.879e+01 9.512e+01 2.019e+02, threshold=1.776e+02, percent-clipped=1.0 2023-11-20 10:12:19,447 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=1044780.0, ans=0.07 2023-11-20 10:12:36,021 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=1044913.3333333334, ans=0.2 2023-11-20 10:12:39,949 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=12.66 vs. limit=15.0 2023-11-20 10:12:40,845 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=9.20 vs. limit=15.0 2023-11-20 10:12:46,681 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=1044913.3333333334, ans=0.125 2023-11-20 10:12:52,193 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 156750 2023-11-20 10:12:52,424 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=1044980.0, ans=0.2 2023-11-20 10:13:03,931 INFO [train_asr.py:1262] (1/4) Epoch 14, batch 450, loss[loss=0.09125, simple_loss=0.1142, pruned_loss=0.02677, audio_tagging_loss=0.007372, over 15252.00 frames. ], tot_loss[loss=0.08118, simple_loss=0.1008, pruned_loss=0.01979, audio_tagging_loss=0.011, over 2722973.16 frames. ], batch size: 56, lr: 5.02e-03, grad_scale: 8.0 2023-11-20 10:13:05,772 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=7.34 vs. 
limit=15.0 2023-11-20 10:13:16,214 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-20 10:13:42,525 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=1045246.6666666666, ans=0.125 2023-11-20 10:13:57,626 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 156800 2023-11-20 10:14:00,298 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.conv_module1.whiten, num_groups=1, num_channels=192, metric=13.42 vs. limit=15.0 2023-11-20 10:14:03,809 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=7.96 vs. limit=15.0 2023-11-20 10:14:09,381 INFO [train_asr.py:1262] (1/4) Epoch 14, batch 500, loss[loss=0.07179, simple_loss=0.08838, pruned_loss=0.017, audio_tagging_loss=0.0106, over 15153.00 frames. ], tot_loss[loss=0.08069, simple_loss=0.09999, pruned_loss=0.01981, audio_tagging_loss=0.01088, over 2786066.46 frames. ], batch size: 58, lr: 5.02e-03, grad_scale: 8.0 2023-11-20 10:14:16,731 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.847e+01 8.307e+01 8.961e+01 9.765e+01 1.460e+02, threshold=1.792e+02, percent-clipped=0.0 2023-11-20 10:14:49,400 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1045580.0, ans=0.1 2023-11-20 10:14:54,006 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=1045580.0, ans=0.0 2023-11-20 10:14:55,240 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=1045580.0, ans=0.125 2023-11-20 10:15:02,737 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 156850 2023-11-20 10:15:04,188 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=1045646.6666666666, ans=0.0 2023-11-20 10:15:14,466 INFO [train_asr.py:1262] (1/4) Epoch 14, batch 550, loss[loss=0.104, simple_loss=0.131, pruned_loss=0.02825, audio_tagging_loss=0.01027, over 15770.00 frames. ], tot_loss[loss=0.0794, simple_loss=0.09866, pruned_loss=0.01929, audio_tagging_loss=0.01078, over 2834898.51 frames. ], batch size: 59, lr: 5.01e-03, grad_scale: 8.0 2023-11-20 10:15:20,456 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=1045713.3333333334, ans=0.0 2023-11-20 10:15:20,463 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=1.412e-01 2023-11-20 10:15:20,493 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=1045713.3333333334, ans=0.0 2023-11-20 10:15:20,934 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=13.92 vs. limit=22.5 2023-11-20 10:16:08,716 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 156900 2023-11-20 10:16:14,091 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module2.whiten, num_groups=1, num_channels=512, metric=9.92 vs. 
limit=15.0 2023-11-20 10:16:18,826 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=1046046.6666666666, ans=0.0 2023-11-20 10:16:19,733 INFO [train_asr.py:1262] (1/4) Epoch 14, batch 600, loss[loss=0.07799, simple_loss=0.1039, pruned_loss=0.01598, audio_tagging_loss=0.01005, over 15093.00 frames. ], tot_loss[loss=0.07965, simple_loss=0.09932, pruned_loss=0.0193, audio_tagging_loss=0.01069, over 2874195.30 frames. ], batch size: 55, lr: 5.01e-03, grad_scale: 8.0 2023-11-20 10:16:21,635 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=6.45 vs. limit=10.0 2023-11-20 10:16:21,759 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=11.52 vs. limit=15.0 2023-11-20 10:16:22,592 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.min_abs, batch_count=1046046.6666666666, ans=0.5 2023-11-20 10:16:27,248 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.847e+01 7.933e+01 8.592e+01 9.443e+01 1.249e+02, threshold=1.718e+02, percent-clipped=0.0 2023-11-20 10:16:29,956 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=1046046.6666666666, ans=0.2 2023-11-20 10:16:41,233 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1046113.3333333334, ans=0.125 2023-11-20 10:16:47,933 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=1046180.0, ans=0.125 2023-11-20 10:17:13,201 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 156950 2023-11-20 10:17:24,846 INFO [train_asr.py:1262] (1/4) Epoch 14, batch 650, loss[loss=0.08817, simple_loss=0.1075, pruned_loss=0.02306, audio_tagging_loss=0.01134, over 15134.00 frames. ], tot_loss[loss=0.07969, simple_loss=0.09964, pruned_loss=0.01938, audio_tagging_loss=0.01048, over 2919413.95 frames. ], batch size: 57, lr: 5.01e-03, grad_scale: 8.0 2023-11-20 10:17:42,325 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=2.85 vs. 
limit=15.0 2023-11-20 10:17:44,312 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1046446.6666666666, ans=0.125 2023-11-20 10:17:56,114 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=1046513.3333333334, ans=0.2 2023-11-20 10:18:06,007 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1046580.0, ans=0.1 2023-11-20 10:18:10,909 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=1046580.0, ans=0.2 2023-11-20 10:18:17,421 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1046646.6666666666, ans=0.1 2023-11-20 10:18:17,428 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=1046646.6666666666, ans=0.2 2023-11-20 10:18:18,331 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 157000 2023-11-20 10:18:24,106 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=1046646.6666666666, ans=0.0 2023-11-20 10:18:26,968 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1046646.6666666666, ans=0.125 2023-11-20 10:18:30,354 INFO [train_asr.py:1262] (1/4) Epoch 14, batch 700, loss[loss=0.1043, simple_loss=0.1344, pruned_loss=0.02915, audio_tagging_loss=0.007938, over 15384.00 frames. ], tot_loss[loss=0.0803, simple_loss=0.1006, pruned_loss=0.01961, audio_tagging_loss=0.01038, over 2953713.79 frames. ], batch size: 56, lr: 5.01e-03, grad_scale: 8.0 2023-11-20 10:18:31,217 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.70 vs. limit=6.0 2023-11-20 10:18:38,467 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.004e+01 8.472e+01 9.225e+01 1.029e+02 2.197e+02, threshold=1.845e+02, percent-clipped=1.0 2023-11-20 10:18:43,970 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=1046780.0, ans=0.125 2023-11-20 10:18:47,603 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-20 10:19:08,481 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=1046913.3333333334, ans=0.0 2023-11-20 10:19:23,868 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 157050 2023-11-20 10:19:28,458 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=1046980.0, ans=0.125 2023-11-20 10:19:35,568 INFO [train_asr.py:1262] (1/4) Epoch 14, batch 750, loss[loss=0.07841, simple_loss=0.1007, pruned_loss=0.01688, audio_tagging_loss=0.01116, over 15745.00 frames. ], tot_loss[loss=0.08019, simple_loss=0.1008, pruned_loss=0.01955, audio_tagging_loss=0.01026, over 2981312.20 frames. 
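A note on the per-batch loss lines above: each reports a total loss alongside simple_loss, pruned_loss and audio_tagging_loss, and the printed numbers are consistent with a fixed linear combination, roughly loss = 0.5 * simple_loss + pruned_loss + 1.0 * audio_tagging_loss (e.g. for batch 500: 0.5 * 0.08838 + 0.017 + 0.0106 = 0.07179). The sketch below only illustrates that arithmetic; the function and argument names are assumptions modeled on the log fields, not code from train_asr.py.

import torch

def combine_losses(simple_loss: torch.Tensor,
                   pruned_loss: torch.Tensor,
                   audio_tagging_loss: torch.Tensor,
                   simple_loss_scale: float = 0.5,
                   audio_tagging_loss_scale: float = 1.0) -> torch.Tensor:
    # Weighted sum matching the arithmetic of the logged components.
    return (simple_loss_scale * simple_loss
            + pruned_loss
            + audio_tagging_loss_scale * audio_tagging_loss)

The tot_loss[...] fields in the same lines appear to be a running aggregate of these per-batch values, weighted by the frame counts they report.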
], batch size: 58, lr: 5.01e-03, grad_scale: 8.0 2023-11-20 10:19:38,228 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=1047046.6666666666, ans=0.125 2023-11-20 10:19:43,476 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1047046.6666666666, ans=0.1 2023-11-20 10:19:47,718 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=1047113.3333333334, ans=0.125 2023-11-20 10:19:48,883 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1047113.3333333334, ans=0.0 2023-11-20 10:20:00,765 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=1047180.0, ans=0.125 2023-11-20 10:20:21,036 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=7.34 vs. limit=10.0 2023-11-20 10:20:24,632 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1047246.6666666666, ans=0.125 2023-11-20 10:20:29,152 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 157100 2023-11-20 10:20:38,715 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=1047313.3333333334, ans=0.125 2023-11-20 10:20:40,853 INFO [train_asr.py:1262] (1/4) Epoch 14, batch 800, loss[loss=0.08887, simple_loss=0.1196, pruned_loss=0.0212, audio_tagging_loss=0.007844, over 15380.00 frames. ], tot_loss[loss=0.08044, simple_loss=0.1011, pruned_loss=0.01959, audio_tagging_loss=0.01031, over 2995872.61 frames. ], batch size: 55, lr: 5.01e-03, grad_scale: 16.0 2023-11-20 10:20:49,077 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.041e+01 8.237e+01 8.953e+01 9.687e+01 1.353e+02, threshold=1.791e+02, percent-clipped=0.0 2023-11-20 10:20:52,365 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=12.15 vs. limit=15.0 2023-11-20 10:21:10,547 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=1047513.3333333334, ans=0.0 2023-11-20 10:21:24,891 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.24 vs. limit=6.0 2023-11-20 10:21:28,796 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=1047580.0, ans=0.125 2023-11-20 10:21:34,729 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 157150 2023-11-20 10:21:42,194 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=11.10 vs. limit=15.0 2023-11-20 10:21:46,962 INFO [train_asr.py:1262] (1/4) Epoch 14, batch 850, loss[loss=0.08915, simple_loss=0.1171, pruned_loss=0.02156, audio_tagging_loss=0.009032, over 15646.00 frames. ], tot_loss[loss=0.08032, simple_loss=0.1006, pruned_loss=0.01967, audio_tagging_loss=0.01035, over 2998001.91 frames. 
], batch size: 58, lr: 5.01e-03, grad_scale: 16.0 2023-11-20 10:21:47,273 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=1047713.3333333334, ans=0.09899494936611666 2023-11-20 10:22:07,503 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=1047780.0, ans=0.125 2023-11-20 10:22:39,783 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 157200 2023-11-20 10:22:50,968 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=1048046.6666666666, ans=0.125 2023-11-20 10:22:51,809 INFO [train_asr.py:1262] (1/4) Epoch 14, batch 900, loss[loss=0.05572, simple_loss=0.06454, pruned_loss=0.01096, audio_tagging_loss=0.01249, over 14018.00 frames. ], tot_loss[loss=0.07977, simple_loss=0.09987, pruned_loss=0.01945, audio_tagging_loss=0.01038, over 3006496.66 frames. ], batch size: 54, lr: 5.01e-03, grad_scale: 16.0 2023-11-20 10:22:59,167 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.565e+01 8.407e+01 9.404e+01 1.035e+02 1.329e+02, threshold=1.881e+02, percent-clipped=0.0 2023-11-20 10:23:00,742 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=1048046.6666666666, ans=0.0 2023-11-20 10:23:03,896 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=11.87 vs. limit=15.0 2023-11-20 10:23:26,510 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=1048180.0, ans=0.0 2023-11-20 10:23:26,866 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=512, metric=5.52 vs. limit=15.0 2023-11-20 10:23:35,911 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1048246.6666666666, ans=0.125 2023-11-20 10:23:44,451 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=1048313.3333333334, ans=0.125 2023-11-20 10:23:45,680 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 157250 2023-11-20 10:23:45,943 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=1048313.3333333334, ans=0.2 2023-11-20 10:23:48,236 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer_na.min_abs, batch_count=1048313.3333333334, ans=0.02 2023-11-20 10:23:57,281 INFO [train_asr.py:1262] (1/4) Epoch 14, batch 950, loss[loss=0.06818, simple_loss=0.08999, pruned_loss=0.01354, audio_tagging_loss=0.009646, over 15010.00 frames. ], tot_loss[loss=0.07931, simple_loss=0.09981, pruned_loss=0.01922, audio_tagging_loss=0.01019, over 3011045.13 frames. 
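The scaling.py ScheduledFloat entries above (name=..., batch_count=..., ans=...) log hyperparameters such as dropout probabilities, skip rates and balancer probabilities whose current value ("ans") is looked up from the training progress counter. Below is a minimal sketch of that idea, assuming piecewise-linear interpolation between (batch_count, value) breakpoints; the breakpoints in the example are hypothetical and the class is not the actual scaling.py implementation.

from bisect import bisect_right

class ScheduledValue:
    def __init__(self, *points):
        # points: (batch_count, value) pairs, sorted by batch_count.
        self.xs = [p[0] for p in points]
        self.ys = [p[1] for p in points]

    def __call__(self, batch_count: float) -> float:
        # Clamp outside the breakpoint range, interpolate inside it.
        if batch_count <= self.xs[0]:
            return self.ys[0]
        if batch_count >= self.xs[-1]:
            return self.ys[-1]
        i = bisect_right(self.xs, batch_count)
        x0, x1 = self.xs[i - 1], self.xs[i]
        y0, y1 = self.ys[i - 1], self.ys[i]
        return y0 + (batch_count - x0) / (x1 - x0) * (y1 - y0)

# e.g. a dropout that decays from 0.3 to 0.1 over the first 20k batches
# (hypothetical breakpoints), then stays at 0.1.
dropout_p = ScheduledValue((0.0, 0.3), (20000.0, 0.1))

Under such a schedule, the many entries logging ans=0.0 or ans=0.1 this deep into training would simply be values that have already reached their final breakpoint.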
], batch size: 56, lr: 5.01e-03, grad_scale: 16.0 2023-11-20 10:24:06,180 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=1048380.0, ans=0.2 2023-11-20 10:24:13,067 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-20 10:24:23,611 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=1048513.3333333334, ans=0.125 2023-11-20 10:24:23,651 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1048513.3333333334, ans=0.1 2023-11-20 10:24:49,058 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=18.13 vs. limit=22.5 2023-11-20 10:24:50,732 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 157300 2023-11-20 10:24:59,543 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=1048646.6666666667, ans=0.125 2023-11-20 10:25:02,392 INFO [train_asr.py:1262] (1/4) Epoch 14, batch 1000, loss[loss=0.09913, simple_loss=0.1276, pruned_loss=0.0255, audio_tagging_loss=0.009831, over 14037.00 frames. ], tot_loss[loss=0.07999, simple_loss=0.1008, pruned_loss=0.01951, audio_tagging_loss=0.0101, over 3021501.29 frames. ], batch size: 54, lr: 5.01e-03, grad_scale: 16.0 2023-11-20 10:25:09,557 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1048713.3333333333, ans=0.125 2023-11-20 10:25:10,472 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.129e+01 7.769e+01 8.550e+01 9.087e+01 1.228e+02, threshold=1.710e+02, percent-clipped=0.0 2023-11-20 10:25:11,871 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=1048713.3333333333, ans=0.125 2023-11-20 10:25:29,965 WARNING [train_asr.py:1506] (1/4) Exclude cut with ID unbalanced/5Y6u9AlD9S0_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. 
Number of tokens: 24 2023-11-20 10:25:37,160 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=1048846.6666666667, ans=0.125 2023-11-20 10:25:52,145 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=1048913.3333333333, ans=0.0 2023-11-20 10:25:55,272 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.min_positive, batch_count=1048980.0, ans=0.05 2023-11-20 10:25:55,404 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=1048980.0, ans=0.125 2023-11-20 10:25:56,459 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 157350 2023-11-20 10:26:08,089 INFO [train_asr.py:1262] (1/4) Epoch 14, batch 1050, loss[loss=0.06458, simple_loss=0.08284, pruned_loss=0.01333, audio_tagging_loss=0.009826, over 14171.00 frames. ], tot_loss[loss=0.07992, simple_loss=0.101, pruned_loss=0.01949, audio_tagging_loss=0.009944, over 3022746.07 frames. ], batch size: 54, lr: 5.01e-03, grad_scale: 16.0 2023-11-20 10:26:21,160 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1049113.3333333333, ans=0.1 2023-11-20 10:26:32,837 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=1049180.0, ans=0.125 2023-11-20 10:26:41,488 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=1049180.0, ans=0.125 2023-11-20 10:26:43,469 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=1049180.0, ans=0.125 2023-11-20 10:26:49,552 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=9.36 vs. limit=15.0 2023-11-20 10:26:56,086 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=1049246.6666666667, ans=0.125 2023-11-20 10:27:01,158 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=14.99 vs. limit=22.5 2023-11-20 10:27:01,890 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 157400 2023-11-20 10:27:07,435 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1049313.3333333333, ans=0.125 2023-11-20 10:27:13,279 INFO [train_asr.py:1262] (1/4) Epoch 14, batch 1100, loss[loss=0.07122, simple_loss=0.09387, pruned_loss=0.01451, audio_tagging_loss=0.009778, over 14417.00 frames. ], tot_loss[loss=0.0796, simple_loss=0.1005, pruned_loss=0.01948, audio_tagging_loss=0.009892, over 3024578.25 frames. ], batch size: 54, lr: 5.01e-03, grad_scale: 16.0 2023-11-20 10:27:17,853 WARNING [train_asr.py:1506] (1/4) Exclude cut with ID unbalanced/AWHnJAqurec_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. 
Number of tokens: 24 2023-11-20 10:27:18,176 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=1049380.0, ans=0.125 2023-11-20 10:27:21,534 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.626e+01 8.064e+01 8.696e+01 9.715e+01 1.230e+02, threshold=1.739e+02, percent-clipped=0.0 2023-11-20 10:27:28,749 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=8.15 vs. limit=15.0 2023-11-20 10:27:47,814 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=6.59 vs. limit=12.0 2023-11-20 10:28:07,798 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 157450 2023-11-20 10:28:19,096 INFO [train_asr.py:1262] (1/4) Epoch 14, batch 1150, loss[loss=0.05534, simple_loss=0.06751, pruned_loss=0.01282, audio_tagging_loss=0.008772, over 14708.00 frames. ], tot_loss[loss=0.07979, simple_loss=0.1006, pruned_loss=0.01966, audio_tagging_loss=0.009828, over 3033728.44 frames. ], batch size: 57, lr: 5.00e-03, grad_scale: 16.0 2023-11-20 10:28:26,634 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.29 vs. limit=15.0 2023-11-20 10:28:30,777 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1049713.3333333333, ans=0.1 2023-11-20 10:28:35,606 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=1.98 vs. limit=6.0 2023-11-20 10:28:36,694 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1049780.0, ans=0.1 2023-11-20 10:28:40,348 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=1049780.0, ans=0.125 2023-11-20 10:28:44,046 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=1049780.0, ans=0.125 2023-11-20 10:28:49,225 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=1049846.6666666667, ans=0.125 2023-11-20 10:28:54,540 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=10.90 vs. limit=15.0 2023-11-20 10:29:14,239 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 157500 2023-11-20 10:29:14,460 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=1049980.0, ans=0.125 2023-11-20 10:29:26,532 INFO [train_asr.py:1262] (1/4) Epoch 14, batch 1200, loss[loss=0.06066, simple_loss=0.07572, pruned_loss=0.01216, audio_tagging_loss=0.01064, over 13822.00 frames. ], tot_loss[loss=0.07831, simple_loss=0.0988, pruned_loss=0.01906, audio_tagging_loss=0.009844, over 3034097.69 frames. 
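The optim.py Clipping_scale lines above print five grad-norm quartiles (min, 25%, median, 75%, max) over a recent window, a clipping threshold, and the fraction of batches clipped. In every such entry the threshold equals Clipping_scale times the logged median, e.g. 2.0 * 8.696e+01 = 1.739e+02 in the entry just above, so the clip point adapts to the recent gradient-norm distribution rather than using a fixed bound. A sketch of that scheme follows; the window size and the way norms are collected are assumptions.

import torch

class QuartileClipper:
    # Clip gradients at clipping_scale * median of recently seen grad norms,
    # mirroring the "grad-norm quartiles ... threshold ..." log entries.
    def __init__(self, params, clipping_scale: float = 2.0, window: int = 400):
        self.params = list(params)
        self.scale = clipping_scale
        self.window = window
        self.norms = []
        self.num_clipped = 0
        self.num_seen = 0

    def clip_(self) -> None:
        # max_norm=inf only measures the total norm; it never rescales.
        norm = torch.nn.utils.clip_grad_norm_(self.params, float("inf")).item()
        self.norms = (self.norms + [norm])[-self.window:]
        q = torch.quantile(torch.tensor(self.norms),
                           torch.tensor([0.0, 0.25, 0.5, 0.75, 1.0]))
        threshold = self.scale * q[2].item()  # 2.0 * median
        self.num_seen += 1
        if norm > threshold:
            self.num_clipped += 1
            for p in self.params:
                if p.grad is not None:
                    p.grad.mul_(threshold / norm)

percent-clipped in the log would then be num_clipped / num_seen over the reporting interval; the values above (almost always 0.0) indicate the 2x-median threshold is rarely exceeded.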
], batch size: 53, lr: 5.00e-03, grad_scale: 32.0 2023-11-20 10:29:33,805 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.234e+01 8.296e+01 8.818e+01 9.703e+01 1.332e+02, threshold=1.764e+02, percent-clipped=0.0 2023-11-20 10:29:34,357 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=8.16 vs. limit=15.0 2023-11-20 10:29:45,002 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=1050113.3333333333, ans=0.125 2023-11-20 10:30:15,041 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=1050246.6666666667, ans=0.125 2023-11-20 10:30:19,784 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 157550 2023-11-20 10:30:31,184 INFO [train_asr.py:1262] (1/4) Epoch 14, batch 1250, loss[loss=0.08204, simple_loss=0.1019, pruned_loss=0.02082, audio_tagging_loss=0.01028, over 14659.00 frames. ], tot_loss[loss=0.0787, simple_loss=0.09916, pruned_loss=0.01931, audio_tagging_loss=0.009815, over 3032657.05 frames. ], batch size: 54, lr: 5.00e-03, grad_scale: 32.0 2023-11-20 10:30:40,813 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=1050380.0, ans=0.125 2023-11-20 10:30:48,161 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=1050446.6666666667, ans=0.125 2023-11-20 10:30:51,708 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=1050446.6666666667, ans=0.125 2023-11-20 10:31:01,132 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer_ff3.min_abs, batch_count=1050513.3333333333, ans=0.2 2023-11-20 10:31:24,034 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 157600 2023-11-20 10:31:25,878 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=1050646.6666666667, ans=0.125 2023-11-20 10:31:29,986 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=1050646.6666666667, ans=0.2 2023-11-20 10:31:32,292 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=1050646.6666666667, ans=0.07 2023-11-20 10:31:35,668 INFO [train_asr.py:1262] (1/4) Epoch 14, batch 1300, loss[loss=0.07794, simple_loss=0.09697, pruned_loss=0.01971, audio_tagging_loss=0.009743, over 14428.00 frames. ], tot_loss[loss=0.07892, simple_loss=0.0994, pruned_loss=0.01937, audio_tagging_loss=0.009852, over 3030917.39 frames. ], batch size: 56, lr: 5.00e-03, grad_scale: 32.0 2023-11-20 10:31:37,650 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=512, metric=3.28 vs. limit=15.0 2023-11-20 10:31:37,702 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=9.23 vs. 
limit=10.0 2023-11-20 10:31:43,829 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.152e+01 8.044e+01 8.712e+01 9.271e+01 1.350e+02, threshold=1.742e+02, percent-clipped=0.0 2023-11-20 10:31:45,401 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=1050713.3333333333, ans=0.125 2023-11-20 10:31:49,168 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.min_abs, batch_count=1050780.0, ans=0.5 2023-11-20 10:31:50,260 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=1050780.0, ans=0.125 2023-11-20 10:31:57,452 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=1050780.0, ans=0.035 2023-11-20 10:31:57,485 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=1050780.0, ans=0.125 2023-11-20 10:31:57,536 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=1050780.0, ans=0.125 2023-11-20 10:32:11,555 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=1050846.6666666667, ans=0.125 2023-11-20 10:32:29,173 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 157650 2023-11-20 10:32:34,790 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=1050980.0, ans=0.2 2023-11-20 10:32:40,772 INFO [train_asr.py:1262] (1/4) Epoch 14, batch 1350, loss[loss=0.06625, simple_loss=0.08736, pruned_loss=0.01211, audio_tagging_loss=0.01046, over 16100.00 frames. ], tot_loss[loss=0.07847, simple_loss=0.0986, pruned_loss=0.01919, audio_tagging_loss=0.009979, over 3035030.72 frames. ], batch size: 59, lr: 5.00e-03, grad_scale: 16.0 2023-11-20 10:32:44,098 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=1051046.6666666667, ans=0.0 2023-11-20 10:32:54,393 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=1051113.3333333333, ans=0.2 2023-11-20 10:32:54,397 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=1051113.3333333333, ans=0.125 2023-11-20 10:32:54,827 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=10.68 vs. limit=15.0 2023-11-20 10:33:03,044 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1051113.3333333333, ans=0.1 2023-11-20 10:33:11,003 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-20 10:33:25,895 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=512, metric=3.04 vs. limit=15.0 2023-11-20 10:33:27,850 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=1051246.6666666667, ans=0.125 2023-11-20 10:33:28,890 WARNING [train_asr.py:1506] (1/4) Exclude cut with ID unbalanced/XdmbboqRBmQ_0.000_1.000.wav from training. 
Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-20 10:33:34,509 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 157700 2023-11-20 10:33:46,180 INFO [train_asr.py:1262] (1/4) Epoch 14, batch 1400, loss[loss=0.07376, simple_loss=0.09173, pruned_loss=0.01592, audio_tagging_loss=0.01198, over 15206.00 frames. ], tot_loss[loss=0.07914, simple_loss=0.09961, pruned_loss=0.01939, audio_tagging_loss=0.009942, over 3051124.41 frames. ], batch size: 55, lr: 5.00e-03, grad_scale: 16.0 2023-11-20 10:33:52,548 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=1051380.0, ans=0.125 2023-11-20 10:33:55,350 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.246e+01 8.256e+01 8.950e+01 9.563e+01 1.336e+02, threshold=1.790e+02, percent-clipped=0.0 2023-11-20 10:34:31,737 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=1051580.0, ans=0.125 2023-11-20 10:34:35,319 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-11-20 10:34:38,658 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 157750 2023-11-20 10:34:38,962 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=1051646.6666666667, ans=0.125 2023-11-20 10:34:40,476 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module2.whiten, num_groups=1, num_channels=512, metric=12.29 vs. limit=15.0 2023-11-20 10:34:49,425 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=1051713.3333333333, ans=0.125 2023-11-20 10:34:49,854 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=14.16 vs. limit=22.5 2023-11-20 10:34:50,300 INFO [train_asr.py:1262] (1/4) Epoch 14, batch 1450, loss[loss=0.08685, simple_loss=0.111, pruned_loss=0.02092, audio_tagging_loss=0.01042, over 15434.00 frames. ], tot_loss[loss=0.07909, simple_loss=0.09898, pruned_loss=0.01942, audio_tagging_loss=0.01018, over 3049598.80 frames. ], batch size: 56, lr: 5.00e-03, grad_scale: 16.0 2023-11-20 10:34:50,704 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=1051713.3333333333, ans=0.125 2023-11-20 10:34:52,060 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=17.36 vs. limit=22.5 2023-11-20 10:35:18,464 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=10.27 vs. 
limit=15.0 2023-11-20 10:35:43,430 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 157800 2023-11-20 10:35:49,621 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=1051980.0, ans=0.0 2023-11-20 10:35:54,997 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.22 vs. limit=6.0 2023-11-20 10:35:55,546 INFO [train_asr.py:1262] (1/4) Epoch 14, batch 1500, loss[loss=0.08479, simple_loss=0.09982, pruned_loss=0.02333, audio_tagging_loss=0.01155, over 15300.00 frames. ], tot_loss[loss=0.08024, simple_loss=0.1005, pruned_loss=0.01988, audio_tagging_loss=0.01011, over 3048662.10 frames. ], batch size: 56, lr: 5.00e-03, grad_scale: 16.0 2023-11-20 10:36:04,972 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.891e+01 8.201e+01 8.884e+01 9.746e+01 1.400e+02, threshold=1.777e+02, percent-clipped=0.0 2023-11-20 10:36:05,483 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1052046.6666666667, ans=0.125 2023-11-20 10:36:06,559 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=1052046.6666666667, ans=0.125 2023-11-20 10:36:13,952 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=8.50 vs. limit=10.0 2023-11-20 10:36:29,096 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=1052180.0, ans=0.0 2023-11-20 10:36:35,794 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=6.42 vs. limit=10.0 2023-11-20 10:36:36,633 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=1052246.6666666667, ans=0.125 2023-11-20 10:36:42,210 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=1052246.6666666667, ans=0.0 2023-11-20 10:36:50,147 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 157850 2023-11-20 10:36:52,983 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=1052313.3333333333, ans=0.2 2023-11-20 10:36:59,147 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=1052313.3333333333, ans=0.07 2023-11-20 10:37:01,346 INFO [train_asr.py:1262] (1/4) Epoch 14, batch 1550, loss[loss=0.06509, simple_loss=0.07261, pruned_loss=0.01523, audio_tagging_loss=0.01356, over 14383.00 frames. ], tot_loss[loss=0.0799, simple_loss=0.1002, pruned_loss=0.01962, audio_tagging_loss=0.0102, over 3046930.91 frames. ], batch size: 57, lr: 5.00e-03, grad_scale: 16.0 2023-11-20 10:37:09,090 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.08 vs. 
limit=6.0 2023-11-20 10:37:12,933 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=1052380.0, ans=0.0 2023-11-20 10:37:55,473 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 157900 2023-11-20 10:38:03,575 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1052646.6666666667, ans=0.0 2023-11-20 10:38:07,076 INFO [train_asr.py:1262] (1/4) Epoch 14, batch 1600, loss[loss=0.06597, simple_loss=0.0828, pruned_loss=0.01332, audio_tagging_loss=0.01125, over 16707.00 frames. ], tot_loss[loss=0.07943, simple_loss=0.09942, pruned_loss=0.0195, audio_tagging_loss=0.01022, over 3043867.84 frames. ], batch size: 64, lr: 5.00e-03, grad_scale: 32.0 2023-11-20 10:38:07,252 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=1052713.3333333333, ans=0.125 2023-11-20 10:38:08,544 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.min_positive, batch_count=1052713.3333333333, ans=0.025 2023-11-20 10:38:13,527 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1052713.3333333333, ans=0.1 2023-11-20 10:38:15,677 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.476e+01 8.170e+01 8.883e+01 9.558e+01 1.180e+02, threshold=1.777e+02, percent-clipped=0.0 2023-11-20 10:38:45,200 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=1052913.3333333333, ans=0.125 2023-11-20 10:38:52,002 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=1052913.3333333333, ans=0.0 2023-11-20 10:39:00,487 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 157950 2023-11-20 10:39:04,794 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=21.83 vs. limit=22.5 2023-11-20 10:39:12,316 INFO [train_asr.py:1262] (1/4) Epoch 14, batch 1650, loss[loss=0.08247, simple_loss=0.09597, pruned_loss=0.01955, audio_tagging_loss=0.01493, over 14427.00 frames. ], tot_loss[loss=0.07931, simple_loss=0.09913, pruned_loss=0.0194, audio_tagging_loss=0.01034, over 3045213.80 frames. ], batch size: 56, lr: 5.00e-03, grad_scale: 32.0 2023-11-20 10:39:50,022 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=8.62 vs. limit=15.0 2023-11-20 10:40:06,064 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 158000 2023-11-20 10:40:06,252 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=1053313.3333333333, ans=0.0 2023-11-20 10:40:17,625 INFO [train_asr.py:1262] (1/4) Epoch 14, batch 1700, loss[loss=0.08134, simple_loss=0.09418, pruned_loss=0.02172, audio_tagging_loss=0.01254, over 14747.00 frames. ], tot_loss[loss=0.07907, simple_loss=0.09854, pruned_loss=0.0193, audio_tagging_loss=0.0105, over 3041423.60 frames. 
], batch size: 56, lr: 5.00e-03, grad_scale: 32.0 2023-11-20 10:40:26,728 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.487e+01 8.158e+01 8.615e+01 9.243e+01 1.140e+02, threshold=1.723e+02, percent-clipped=0.0 2023-11-20 10:41:10,239 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 158050 2023-11-20 10:41:22,437 INFO [train_asr.py:1262] (1/4) Epoch 14, batch 1750, loss[loss=0.0477, simple_loss=0.04991, pruned_loss=0.01145, audio_tagging_loss=0.0113, over 14358.00 frames. ], tot_loss[loss=0.0782, simple_loss=0.09747, pruned_loss=0.01902, audio_tagging_loss=0.01045, over 3037946.53 frames. ], batch size: 57, lr: 5.00e-03, grad_scale: 32.0 2023-11-20 10:41:35,027 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=1053780.0, ans=0.0 2023-11-20 10:41:49,259 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=11.21 vs. limit=15.0 2023-11-20 10:42:15,692 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 158100 2023-11-20 10:42:27,196 INFO [train_asr.py:1262] (1/4) Epoch 14, batch 1800, loss[loss=0.1049, simple_loss=0.138, pruned_loss=0.02727, audio_tagging_loss=0.008669, over 15752.00 frames. ], tot_loss[loss=0.07892, simple_loss=0.09899, pruned_loss=0.01914, audio_tagging_loss=0.01028, over 3039899.77 frames. ], batch size: 55, lr: 4.99e-03, grad_scale: 16.0 2023-11-20 10:42:27,610 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=1054046.6666666667, ans=0.125 2023-11-20 10:42:31,134 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-11-20 10:42:32,402 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=1054046.6666666667, ans=0.125 2023-11-20 10:42:32,468 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=1054046.6666666667, ans=0.0 2023-11-20 10:42:37,568 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.937e+01 8.074e+01 8.907e+01 9.490e+01 1.284e+02, threshold=1.781e+02, percent-clipped=0.0 2023-11-20 10:42:38,113 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=11.52 vs. limit=15.0 2023-11-20 10:43:05,949 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=1054246.6666666667, ans=0.125 2023-11-20 10:43:16,963 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn1.whiten.whitening_limit, batch_count=1054246.6666666667, ans=22.5 2023-11-20 10:43:20,189 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 158150 2023-11-20 10:43:31,647 INFO [train_asr.py:1262] (1/4) Epoch 14, batch 1850, loss[loss=0.08889, simple_loss=0.1132, pruned_loss=0.02453, audio_tagging_loss=0.007775, over 15850.00 frames. ], tot_loss[loss=0.079, simple_loss=0.09941, pruned_loss=0.01906, audio_tagging_loss=0.01024, over 3043133.79 frames. 
], batch size: 57, lr: 4.99e-03, grad_scale: 16.0 2023-11-20 10:43:43,225 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=1054446.6666666667, ans=0.2 2023-11-20 10:43:43,605 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=11.23 vs. limit=15.0 2023-11-20 10:43:56,625 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=1054513.3333333333, ans=0.125 2023-11-20 10:43:57,882 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=1054513.3333333333, ans=0.0 2023-11-20 10:44:13,765 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=2.73 vs. limit=15.0 2023-11-20 10:44:19,426 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=11.30 vs. limit=15.0 2023-11-20 10:44:20,369 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=1054580.0, ans=0.125 2023-11-20 10:44:25,178 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 158200 2023-11-20 10:44:37,035 INFO [train_asr.py:1262] (1/4) Epoch 14, batch 1900, loss[loss=0.07937, simple_loss=0.1049, pruned_loss=0.019, audio_tagging_loss=0.007913, over 14558.00 frames. ], tot_loss[loss=0.07844, simple_loss=0.0988, pruned_loss=0.01893, audio_tagging_loss=0.01011, over 3042672.32 frames. ], batch size: 54, lr: 4.99e-03, grad_scale: 16.0 2023-11-20 10:44:46,622 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-20 10:44:47,513 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.693e+01 8.196e+01 8.901e+01 9.660e+01 1.214e+02, threshold=1.780e+02, percent-clipped=0.0 2023-11-20 10:45:15,063 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=1054913.3333333333, ans=0.07 2023-11-20 10:45:30,909 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 158250 2023-11-20 10:45:42,674 INFO [train_asr.py:1262] (1/4) Epoch 14, batch 1950, loss[loss=0.07829, simple_loss=0.1008, pruned_loss=0.02095, audio_tagging_loss=0.006922, over 15141.00 frames. ], tot_loss[loss=0.07759, simple_loss=0.09755, pruned_loss=0.01882, audio_tagging_loss=0.009995, over 3034650.62 frames. 
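The scaling.py Whitening entries above compare a per-module statistic against a limit (e.g. metric=11.30 vs. limit=15.0): the metric measures how far a module's activation covariance is from isotropic, and the auxiliary whitening loss only needs to act when the metric approaches or exceeds the limit, which is why most entries sit below it. As an illustration only (the exact formula lives in scaling.py and may differ), one standard whiteness measure is m = d * tr(C^2) / tr(C)^2 for a d-dimensional covariance C, which equals 1.0 when C is a multiple of the identity and grows as the eigenvalue spectrum spreads:

import torch

def whitening_metric(x: torch.Tensor) -> float:
    # x: (num_frames, num_channels) activations from one module.
    x = x - x.mean(dim=0, keepdim=True)
    c = (x.T @ x) / x.shape[0]          # channel covariance, (d, d)
    d = c.shape[0]
    return (d * torch.trace(c @ c) / torch.trace(c) ** 2).item()

print(whitening_metric(torch.randn(4000, 192)))   # ~1.05: nearly white
print(whitening_metric(torch.randn(4000, 1).expand(4000, 192)))
# = 192.0: all channels identical (rank-1), maximally non-white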
], batch size: 56, lr: 4.99e-03, grad_scale: 16.0 2023-11-20 10:45:45,614 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1055046.6666666667, ans=0.125 2023-11-20 10:46:03,680 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1055113.3333333333, ans=0.1 2023-11-20 10:46:14,861 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1055180.0, ans=0.1 2023-11-20 10:46:36,392 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 158300 2023-11-20 10:46:36,577 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.min_positive, batch_count=1055313.3333333333, ans=0.05 2023-11-20 10:46:47,907 INFO [train_asr.py:1262] (1/4) Epoch 14, batch 2000, loss[loss=0.08298, simple_loss=0.09315, pruned_loss=0.02341, audio_tagging_loss=0.013, over 14778.00 frames. ], tot_loss[loss=0.07819, simple_loss=0.09833, pruned_loss=0.01908, audio_tagging_loss=0.009953, over 3032902.52 frames. ], batch size: 56, lr: 4.99e-03, grad_scale: 32.0 2023-11-20 10:46:50,623 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=1055380.0, ans=0.125 2023-11-20 10:46:57,727 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.488e+01 8.029e+01 8.442e+01 9.197e+01 1.092e+02, threshold=1.688e+02, percent-clipped=0.0 2023-11-20 10:47:06,063 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=1055446.6666666667, ans=0.2 2023-11-20 10:47:24,663 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=1055513.3333333333, ans=0.125 2023-11-20 10:47:41,009 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 158350 2023-11-20 10:47:50,990 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=1055713.3333333333, ans=0.125 2023-11-20 10:47:52,029 INFO [train_asr.py:1262] (1/4) Epoch 14, batch 2050, loss[loss=0.08278, simple_loss=0.1053, pruned_loss=0.02135, audio_tagging_loss=0.008786, over 14702.00 frames. ], tot_loss[loss=0.07848, simple_loss=0.09887, pruned_loss=0.01914, audio_tagging_loss=0.009908, over 3035010.10 frames. ], batch size: 54, lr: 4.99e-03, grad_scale: 32.0 2023-11-20 10:48:11,374 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=1055780.0, ans=0.125 2023-11-20 10:48:45,926 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 158400 2023-11-20 10:48:58,047 INFO [train_asr.py:1262] (1/4) Epoch 14, batch 2100, loss[loss=0.06135, simple_loss=0.07437, pruned_loss=0.01322, audio_tagging_loss=0.01094, over 16075.00 frames. ], tot_loss[loss=0.07913, simple_loss=0.09984, pruned_loss=0.01929, audio_tagging_loss=0.009916, over 3040930.48 frames. 
], batch size: 61, lr: 4.99e-03, grad_scale: 32.0 2023-11-20 10:49:08,628 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.102e+01 8.360e+01 8.882e+01 9.714e+01 1.219e+02, threshold=1.776e+02, percent-clipped=0.0 2023-11-20 10:49:51,938 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 158450 2023-11-20 10:50:02,847 INFO [train_asr.py:1262] (1/4) Epoch 14, batch 2150, loss[loss=0.07028, simple_loss=0.08447, pruned_loss=0.01918, audio_tagging_loss=0.008874, over 15294.00 frames. ], tot_loss[loss=0.07903, simple_loss=0.09988, pruned_loss=0.01919, audio_tagging_loss=0.0099, over 3036357.07 frames. ], batch size: 61, lr: 4.99e-03, grad_scale: 32.0 2023-11-20 10:50:16,167 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=1056446.6666666667, ans=0.125 2023-11-20 10:50:29,054 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1056513.3333333333, ans=0.125 2023-11-20 10:50:35,814 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=9.08 vs. limit=10.0 2023-11-20 10:50:40,088 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=1056513.3333333333, ans=0.0 2023-11-20 10:50:42,510 WARNING [train_asr.py:1506] (1/4) Exclude cut with ID unbalanced/XkQ8YVd8u38_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-20 10:50:44,461 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.whiten, num_groups=1, num_channels=192, metric=3.43 vs. limit=12.0 2023-11-20 10:50:54,390 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1056646.6666666667, ans=0.1 2023-11-20 10:50:56,676 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 158500 2023-11-20 10:51:07,662 INFO [train_asr.py:1262] (1/4) Epoch 14, batch 2200, loss[loss=0.1006, simple_loss=0.1257, pruned_loss=0.02913, audio_tagging_loss=0.008638, over 15794.00 frames. ], tot_loss[loss=0.07874, simple_loss=0.09932, pruned_loss=0.01903, audio_tagging_loss=0.01005, over 3035936.01 frames. ], batch size: 55, lr: 4.99e-03, grad_scale: 32.0 2023-11-20 10:51:19,260 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.870e+01 8.263e+01 8.832e+01 9.449e+01 1.423e+02, threshold=1.766e+02, percent-clipped=0.0 2023-11-20 10:51:23,409 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=1056780.0, ans=0.0 2023-11-20 10:51:28,967 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=1056780.0, ans=0.0 2023-11-20 10:51:40,072 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=1056846.6666666667, ans=0.0 2023-11-20 10:51:44,813 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=16.46 vs. 
limit=15.0 2023-11-20 10:51:45,833 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=1056913.3333333333, ans=0.125 2023-11-20 10:51:47,315 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=7.74 vs. limit=10.0 2023-11-20 10:51:49,412 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1056913.3333333333, ans=0.125 2023-11-20 10:51:54,254 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1056913.3333333333, ans=0.0 2023-11-20 10:52:00,247 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 158550 2023-11-20 10:52:04,812 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=1056980.0, ans=0.125 2023-11-20 10:52:12,224 INFO [train_asr.py:1262] (1/4) Epoch 14, batch 2250, loss[loss=0.1009, simple_loss=0.1299, pruned_loss=0.02851, audio_tagging_loss=0.007469, over 15411.00 frames. ], tot_loss[loss=0.07901, simple_loss=0.09956, pruned_loss=0.01915, audio_tagging_loss=0.01007, over 3040326.19 frames. ], batch size: 55, lr: 4.99e-03, grad_scale: 8.0 2023-11-20 10:52:16,304 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=1057046.6666666667, ans=0.125 2023-11-20 10:52:17,999 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=1057046.6666666667, ans=0.0 2023-11-20 10:52:37,135 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=2.52 vs. limit=15.0 2023-11-20 10:52:39,661 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.00 vs. limit=6.0 2023-11-20 10:52:41,779 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=1057180.0, ans=0.125 2023-11-20 10:52:49,642 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=1057180.0, ans=0.2 2023-11-20 10:52:53,698 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=1057246.6666666667, ans=0.125 2023-11-20 10:52:55,223 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.10 vs. 
limit=15.0 2023-11-20 10:53:01,147 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=1057246.6666666667, ans=0.125 2023-11-20 10:53:05,852 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 158600 2023-11-20 10:53:15,010 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-20 10:53:17,613 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=1057380.0, ans=0.125 2023-11-20 10:53:18,537 INFO [train_asr.py:1262] (1/4) Epoch 14, batch 2300, loss[loss=0.09866, simple_loss=0.127, pruned_loss=0.02576, audio_tagging_loss=0.009422, over 16409.00 frames. ], tot_loss[loss=0.07971, simple_loss=0.1003, pruned_loss=0.01941, audio_tagging_loss=0.01014, over 3041671.94 frames. ], batch size: 58, lr: 4.99e-03, grad_scale: 8.0 2023-11-20 10:53:31,779 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.567e+01 8.157e+01 8.586e+01 9.219e+01 1.150e+02, threshold=1.717e+02, percent-clipped=0.0 2023-11-20 10:53:42,918 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1057446.6666666667, ans=0.1 2023-11-20 10:53:44,430 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=3.95 vs. limit=12.0 2023-11-20 10:53:45,354 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1057513.3333333333, ans=0.0 2023-11-20 10:53:55,872 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=10.09 vs. limit=15.0 2023-11-20 10:54:07,096 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.self_attn1.whiten, num_groups=1, num_channels=192, metric=12.61 vs. limit=22.5 2023-11-20 10:54:13,219 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 158650 2023-11-20 10:54:16,833 WARNING [train_asr.py:1506] (1/4) Exclude cut with ID unbalanced/mx9RcUz8sr0_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-20 10:54:20,764 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=1057646.6666666667, ans=0.125 2023-11-20 10:54:24,132 INFO [train_asr.py:1262] (1/4) Epoch 14, batch 2350, loss[loss=0.08709, simple_loss=0.1109, pruned_loss=0.02043, audio_tagging_loss=0.01121, over 15585.00 frames. ], tot_loss[loss=0.07982, simple_loss=0.1003, pruned_loss=0.01945, audio_tagging_loss=0.01021, over 3039591.40 frames. 
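The WARNING lines above show the length filter at work on AudioSet placeholder cuts: a one-second cut has 100 feature frames, which shrink to 23 after the encoder's roughly 4x subsampling, while the dummy transcript tokenizes to 24 BPE tokens. A transducer alignment needs at least one encoder frame per output token, so 23 < 24 and the cut is excluded. The subsampling formula in the sketch below is an assumption that happens to reproduce the logged 100 -> 23; the real check lives in train_asr.py.

def frames_after_subsampling(num_frames: int) -> int:
    # Assumed convolutional-subsampling length formula; reproduces 100 -> 23.
    return ((num_frames - 7) // 2 + 1) // 2

def keep_cut(num_frames: int, num_tokens: int) -> bool:
    # Transducer-style alignment needs at least one frame per token.
    return frames_after_subsampling(num_frames) >= num_tokens

print(frames_after_subsampling(100))   # 23, matching the log
print(keep_cut(100, 24))               # False -> "Exclude cut ..."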
], batch size: 55, lr: 4.99e-03, grad_scale: 8.0 2023-11-20 10:54:53,089 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=1057846.6666666667, ans=0.125 2023-11-20 10:54:58,944 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=1057846.6666666667, ans=0.0 2023-11-20 10:54:59,273 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=512, metric=20.19 vs. limit=22.5 2023-11-20 10:55:18,061 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 158700 2023-11-20 10:55:29,608 INFO [train_asr.py:1262] (1/4) Epoch 14, batch 2400, loss[loss=0.06123, simple_loss=0.07485, pruned_loss=0.01313, audio_tagging_loss=0.01067, over 13944.00 frames. ], tot_loss[loss=0.08024, simple_loss=0.1007, pruned_loss=0.01953, audio_tagging_loss=0.01034, over 3035586.37 frames. ], batch size: 53, lr: 4.99e-03, grad_scale: 16.0 2023-11-20 10:55:42,633 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.833e+01 8.027e+01 8.593e+01 9.391e+01 2.644e+02, threshold=1.719e+02, percent-clipped=1.0 2023-11-20 10:56:22,030 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=1058313.3333333333, ans=0.125 2023-11-20 10:56:23,282 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 158750 2023-11-20 10:56:25,073 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module1.whiten, num_groups=1, num_channels=512, metric=5.29 vs. limit=15.0 2023-11-20 10:56:35,175 INFO [train_asr.py:1262] (1/4) Epoch 14, batch 2450, loss[loss=0.07228, simple_loss=0.08209, pruned_loss=0.01844, audio_tagging_loss=0.01279, over 14675.00 frames. ], tot_loss[loss=0.07999, simple_loss=0.1005, pruned_loss=0.01942, audio_tagging_loss=0.01031, over 3039046.08 frames. ], batch size: 56, lr: 4.98e-03, grad_scale: 16.0 2023-11-20 10:57:23,965 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=1058580.0, ans=0.0 2023-11-20 10:57:29,281 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 158800 2023-11-20 10:57:41,055 INFO [train_asr.py:1262] (1/4) Epoch 14, batch 2500, loss[loss=0.07372, simple_loss=0.09911, pruned_loss=0.01568, audio_tagging_loss=0.008483, over 15787.00 frames. ], tot_loss[loss=0.0795, simple_loss=0.09984, pruned_loss=0.0193, audio_tagging_loss=0.01028, over 3040856.84 frames. ], batch size: 59, lr: 4.98e-03, grad_scale: 16.0 2023-11-20 10:57:45,499 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.whiten, num_groups=1, num_channels=512, metric=11.22 vs. limit=12.0 2023-11-20 10:57:53,999 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.038e+01 8.094e+01 8.721e+01 9.744e+01 1.305e+02, threshold=1.744e+02, percent-clipped=0.0 2023-11-20 10:58:34,500 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 158850 2023-11-20 10:58:46,277 INFO [train_asr.py:1262] (1/4) Epoch 14, batch 2550, loss[loss=0.09537, simple_loss=0.1182, pruned_loss=0.02537, audio_tagging_loss=0.0109, over 16016.00 frames. ], tot_loss[loss=0.07989, simple_loss=0.1002, pruned_loss=0.01954, audio_tagging_loss=0.01022, over 3036338.66 frames. 
2023-11-20 10:58:46,277 INFO [train_asr.py:1262] (1/4) Epoch 14, batch 2550, loss[loss=0.09537, simple_loss=0.1182, pruned_loss=0.02537, audio_tagging_loss=0.0109, over 16016.00 frames. ], tot_loss[loss=0.07989, simple_loss=0.1002, pruned_loss=0.01954, audio_tagging_loss=0.01022, over 3036338.66 frames. ], batch size: 59, lr: 4.98e-03, grad_scale: 16.0
2023-11-20 10:58:47,803 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=1059046.6666666667, ans=0.07
2023-11-20 10:58:59,512 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=1059113.3333333333, ans=0.125
2023-11-20 10:59:01,997 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=1059113.3333333333, ans=0.125
2023-11-20 10:59:19,566 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=1059180.0, ans=0.0
2023-11-20 10:59:24,569 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=1059246.6666666667, ans=0.0
2023-11-20 10:59:28,333 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=1059246.6666666667, ans=0.0
2023-11-20 10:59:28,434 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=1059246.6666666667, ans=0.125
2023-11-20 10:59:31,884 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_whiten.whitening_limit, batch_count=1059246.6666666667, ans=15.0
2023-11-20 10:59:39,871 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 158900
2023-11-20 10:59:40,025 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1059313.3333333333, ans=0.1
2023-11-20 10:59:42,457 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.min_positive, batch_count=1059313.3333333333, ans=0.025
2023-11-20 10:59:47,853 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=1059313.3333333333, ans=0.0
2023-11-20 10:59:51,248 INFO [train_asr.py:1262] (1/4) Epoch 14, batch 2600, loss[loss=0.1023, simple_loss=0.1305, pruned_loss=0.0316, audio_tagging_loss=0.005407, over 15646.00 frames. ], tot_loss[loss=0.07937, simple_loss=0.09959, pruned_loss=0.01951, audio_tagging_loss=0.01007, over 3043789.98 frames. ], batch size: 56, lr: 4.98e-03, grad_scale: 16.0
2023-11-20 11:00:02,131 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1059380.0, ans=0.1
2023-11-20 11:00:04,940 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.842e+01 8.545e+01 8.985e+01 9.606e+01 1.560e+02, threshold=1.797e+02, percent-clipped=0.0
2023-11-20 11:00:23,392 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=1059513.3333333333, ans=0.0
2023-11-20 11:00:27,151 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1059513.3333333333, ans=0.1
2023-11-20 11:00:40,148 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=1059580.0, ans=0.0
2023-11-20 11:00:41,282 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=1059580.0, ans=0.125
2023-11-20 11:00:43,866 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=1059646.6666666667, ans=0.125
2023-11-20 11:00:44,933 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 158950
2023-11-20 11:00:57,134 INFO [train_asr.py:1262] (1/4) Epoch 14, batch 2650, loss[loss=0.06276, simple_loss=0.07792, pruned_loss=0.01177, audio_tagging_loss=0.01203, over 14586.00 frames. ], tot_loss[loss=0.07943, simple_loss=0.1001, pruned_loss=0.01949, audio_tagging_loss=0.009909, over 3044071.72 frames. ], batch size: 56, lr: 4.98e-03, grad_scale: 16.0
2023-11-20 11:01:01,151 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2023-11-20 11:01:04,740 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1059713.3333333333, ans=0.125
2023-11-20 11:01:13,480 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=8.08 vs. limit=10.0
2023-11-20 11:01:20,453 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=1059780.0, ans=0.0
2023-11-20 11:01:21,753 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=1059846.6666666667, ans=0.0
2023-11-20 11:01:50,652 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 159000
2023-11-20 11:01:58,694 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=1059980.0, ans=0.2
2023-11-20 11:01:59,924 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=1059980.0, ans=0.0
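Note: the per-batch loss in the [train_asr.py:1262] records is consistent with a weighted sum of the three logged components, loss = 0.5 * simple_loss + pruned_loss + 1.0 * audio_tagging_loss (matching the recipe's simple_loss_scale and audio_tagging_loss_scale). For the batch-2650 line above: 0.5 * 0.07792 + 0.01177 + 0.01203 = 0.06276. A sketch of that combination, read off the logged numbers rather than copied from the training code:

    def combined_loss(simple_loss, pruned_loss, audio_tagging_loss,
                      simple_loss_scale=0.5, audio_tagging_loss_scale=1.0):
        # Post-warm-up combination implied by the log; warm-up interpolation
        # between simple and pruned losses is omitted here.
        return (simple_loss_scale * simple_loss
                + pruned_loss
                + audio_tagging_loss_scale * audio_tagging_loss)

    # Reproduces the "batch 2650" line above: loss=0.06276
    print(combined_loss(0.07792, 0.01177, 0.01203))  # 0.06276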
2023-11-20 11:02:02,629 INFO [train_asr.py:1262] (1/4) Epoch 14, batch 2700, loss[loss=0.06613, simple_loss=0.08461, pruned_loss=0.01494, audio_tagging_loss=0.008885, over 15004.00 frames. ], tot_loss[loss=0.07937, simple_loss=0.09997, pruned_loss=0.01948, audio_tagging_loss=0.009909, over 3041793.72 frames. ], batch size: 56, lr: 4.98e-03, grad_scale: 16.0
2023-11-20 11:02:09,310 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=1060046.6666666667, ans=0.125
2023-11-20 11:02:09,470 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=1060046.6666666667, ans=0.125
2023-11-20 11:02:13,709 INFO [scaling.py:1022] (1/4) Whitening: name=encoder_embed.out_whiten, num_groups=1, num_channels=192, metric=7.17 vs. limit=8.0
2023-11-20 11:02:15,852 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.786e+01 7.991e+01 8.664e+01 9.430e+01 1.129e+02, threshold=1.733e+02, percent-clipped=0.0
2023-11-20 11:02:25,357 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.whiten, num_groups=1, num_channels=512, metric=9.16 vs. limit=12.0
2023-11-20 11:02:38,836 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=1060180.0, ans=0.07
2023-11-20 11:02:56,778 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 159050
2023-11-20 11:03:08,515 INFO [train_asr.py:1262] (1/4) Epoch 14, batch 2750, loss[loss=0.06441, simple_loss=0.08446, pruned_loss=0.01385, audio_tagging_loss=0.008329, over 15268.00 frames. ], tot_loss[loss=0.07928, simple_loss=0.09974, pruned_loss=0.01945, audio_tagging_loss=0.009965, over 3046217.97 frames. ], batch size: 57, lr: 4.98e-03, grad_scale: 16.0
2023-11-20 11:03:10,529 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=23.02 vs. limit=15.0
2023-11-20 11:03:16,338 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=1060380.0, ans=0.125
2023-11-20 11:03:25,567 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=1060446.6666666667, ans=0.125
2023-11-20 11:03:34,986 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=1060513.3333333333, ans=0.2
2023-11-20 11:04:01,976 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 159100
2023-11-20 11:04:04,432 WARNING [train_asr.py:1506] (1/4) Exclude cut with ID unbalanced/IMdT8_tuNp0_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24
2023-11-20 11:04:07,811 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1060646.6666666667, ans=0.1
2023-11-20 11:04:11,700 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=1060646.6666666667, ans=0.125
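Note: the recurring "Exclude cut" warnings fire because a transducer cannot emit more symbols than it has encoder frames: these 1-second AudioSet clips yield 23 frames after 4x subsampling but carry 24 BPE tokens of placeholder text. A sketch of the validity check, assuming the usual icefall Conv2dSubsampling arithmetic, which reproduces the logged 100 -> 23:

    def frames_after_subsampling(num_frames: int) -> int:
        # Assumed subsampling arithmetic for the factor-4 frontend.
        return ((num_frames - 7) // 2 + 1) // 2

    def keep_cut(num_frames: int, num_tokens: int) -> bool:
        # Drop cuts with fewer post-subsampling frames than output tokens.
        return frames_after_subsampling(num_frames) >= num_tokens

    print(frames_after_subsampling(100))  # 23, as in the warning above
    print(keep_cut(100, 24))              # False -> the dummy cut is excluded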
2023-11-20 11:04:13,751 INFO [train_asr.py:1262] (1/4) Epoch 14, batch 2800, loss[loss=0.06328, simple_loss=0.06971, pruned_loss=0.01585, audio_tagging_loss=0.01258, over 14721.00 frames. ], tot_loss[loss=0.07854, simple_loss=0.09859, pruned_loss=0.0192, audio_tagging_loss=0.01004, over 3041319.64 frames. ], batch size: 58, lr: 4.98e-03, grad_scale: 32.0
2023-11-20 11:04:17,861 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=1060713.3333333333, ans=0.0
2023-11-20 11:04:26,644 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.546e+01 8.183e+01 8.895e+01 9.590e+01 1.282e+02, threshold=1.779e+02, percent-clipped=0.0
2023-11-20 11:04:52,941 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=1060913.3333333333, ans=0.1
2023-11-20 11:04:54,406 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=1060913.3333333333, ans=0.125
2023-11-20 11:05:07,138 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 159150
2023-11-20 11:05:18,755 INFO [train_asr.py:1262] (1/4) Epoch 14, batch 2850, loss[loss=0.06173, simple_loss=0.08424, pruned_loss=0.01204, audio_tagging_loss=0.007569, over 16637.00 frames. ], tot_loss[loss=0.07827, simple_loss=0.09836, pruned_loss=0.01913, audio_tagging_loss=0.009957, over 3044487.55 frames. ], batch size: 61, lr: 4.98e-03, grad_scale: 32.0
2023-11-20 11:05:29,538 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=1061046.6666666667, ans=0.0
2023-11-20 11:05:30,815 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=1061113.3333333333, ans=0.125
2023-11-20 11:05:56,449 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=1061246.6666666667, ans=0.0
2023-11-20 11:06:12,317 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 159200
2023-11-20 11:06:17,902 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.max_abs, batch_count=1061313.3333333333, ans=10.0
2023-11-20 11:06:24,443 INFO [train_asr.py:1262] (1/4) Epoch 14, batch 2900, loss[loss=0.1256, simple_loss=0.1552, pruned_loss=0.03802, audio_tagging_loss=0.009938, over 15622.00 frames. ], tot_loss[loss=0.07918, simple_loss=0.09949, pruned_loss=0.01941, audio_tagging_loss=0.01003, over 3044605.08 frames. ], batch size: 55, lr: 4.98e-03, grad_scale: 32.0
2023-11-20 11:06:37,439 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.449e+01 8.069e+01 8.700e+01 9.440e+01 1.245e+02, threshold=1.740e+02, percent-clipped=0.0
2023-11-20 11:06:54,623 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=1061513.3333333333, ans=0.125
2023-11-20 11:07:03,687 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=1061580.0, ans=0.0
2023-11-20 11:07:06,501 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=4.71 vs. limit=12.0
2023-11-20 11:07:09,993 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1061580.0, ans=0.1
2023-11-20 11:07:16,976 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=1061646.6666666667, ans=0.0
2023-11-20 11:07:17,107 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1061646.6666666667, ans=0.1
2023-11-20 11:07:18,177 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 159250
2023-11-20 11:07:29,979 INFO [train_asr.py:1262] (1/4) Epoch 14, batch 2950, loss[loss=0.0865, simple_loss=0.1026, pruned_loss=0.02146, audio_tagging_loss=0.01372, over 14451.00 frames. ], tot_loss[loss=0.07975, simple_loss=0.1003, pruned_loss=0.01962, audio_tagging_loss=0.009997, over 3038246.83 frames. ], batch size: 56, lr: 4.98e-03, grad_scale: 32.0
2023-11-20 11:08:11,324 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=10.67 vs. limit=15.0
2023-11-20 11:08:16,266 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1061913.3333333333, ans=0.1
2023-11-20 11:08:17,819 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=12.63 vs. limit=15.0
2023-11-20 11:08:22,335 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=1061980.0, ans=0.0
2023-11-20 11:08:23,517 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 159300
2023-11-20 11:08:32,688 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=10.05 vs. limit=15.0
2023-11-20 11:08:34,573 INFO [train_asr.py:1262] (1/4) Epoch 14, batch 3000, loss[loss=0.06048, simple_loss=0.07579, pruned_loss=0.01046, audio_tagging_loss=0.01213, over 14956.00 frames. ], tot_loss[loss=0.07975, simple_loss=0.1004, pruned_loss=0.01955, audio_tagging_loss=0.01001, over 3038418.66 frames. ], batch size: 55, lr: 4.98e-03, grad_scale: 32.0
2023-11-20 11:08:34,574 INFO [train_asr.py:1285] (1/4) Computing validation loss
2023-11-20 11:08:52,503 INFO [zipformer.py:1873] (1/4) name=encoder.encoders.0.layers.0.self_attn_weights, attn_weights_entropy = tensor([6.4912, 6.3317, 6.1877, 6.2190], device='cuda:1')
2023-11-20 11:09:14,357 INFO [train_asr.py:1294] (1/4) Epoch 14, validation: loss=0.06185, simple_loss=0.05368, pruned_loss=0.005702, audio_tagging_loss=0.02931, over 4681554.00 frames.
2023-11-20 11:09:14,358 INFO [train_asr.py:1295] (1/4) Maximum memory allocated so far is 26082MB
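Note: the validation pass above triggers at batch 3000, consistent with the run's valid_interval of 3000: training pauses, the dev-set loss is averaged, and peak CUDA memory is reported per rank. A sketch of that loop; compute_loss is a hypothetical helper, and icefall's actual code also aggregates the individual loss components:

    import torch

    def maybe_validate(batch_idx_train, model, valid_dl, valid_interval=3000):
        # Illustrative only; not the exact train_asr.py validation code.
        if batch_idx_train == 0 or batch_idx_train % valid_interval != 0:
            return None
        model.eval()
        tot, n = 0.0, 0
        with torch.no_grad():
            for batch in valid_dl:
                loss, num_frames = compute_loss(model, batch)  # hypothetical helper
                tot += loss.item() * num_frames
                n += num_frames
        model.train()
        print(f"validation: loss={tot / n:.4g}; "
              f"max mem {torch.cuda.max_memory_allocated() // 2**20}MB")
        return tot / n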
2023-11-20 11:09:24,179 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=19.36 vs. limit=22.5
2023-11-20 11:09:27,210 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.003e+01 8.131e+01 8.854e+01 9.762e+01 1.260e+02, threshold=1.771e+02, percent-clipped=0.0
2023-11-20 11:09:41,964 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=1062180.0, ans=0.125
2023-11-20 11:09:49,623 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=1062180.0, ans=0.0
2023-11-20 11:10:01,098 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=1062246.6666666667, ans=0.125
2023-11-20 11:10:07,380 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 159350
2023-11-20 11:10:19,211 INFO [train_asr.py:1262] (1/4) Epoch 14, batch 3050, loss[loss=0.07701, simple_loss=0.1008, pruned_loss=0.01464, audio_tagging_loss=0.01199, over 14670.00 frames. ], tot_loss[loss=0.07964, simple_loss=0.1001, pruned_loss=0.01945, audio_tagging_loss=0.01012, over 3043863.22 frames. ], batch size: 53, lr: 4.97e-03, grad_scale: 32.0
2023-11-20 11:10:25,012 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.self_attn2.whiten, num_groups=1, num_channels=192, metric=10.90 vs. limit=22.5
2023-11-20 11:10:56,756 WARNING [train_asr.py:1506] (1/4) Exclude cut with ID unbalanced/h0neUGB6j_g_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24
2023-11-20 11:10:56,807 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=1062580.0, ans=0.125
2023-11-20 11:11:12,494 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 159400
2023-11-20 11:11:14,023 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=1062646.6666666667, ans=0.125
2023-11-20 11:11:23,999 INFO [train_asr.py:1262] (1/4) Epoch 14, batch 3100, loss[loss=0.09174, simple_loss=0.1133, pruned_loss=0.02893, audio_tagging_loss=0.006151, over 14788.00 frames. ], tot_loss[loss=0.08046, simple_loss=0.101, pruned_loss=0.01976, audio_tagging_loss=0.0102, over 3044908.95 frames. ], batch size: 53, lr: 4.97e-03, grad_scale: 32.0
2023-11-20 11:11:37,683 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.763e+01 7.933e+01 8.636e+01 9.301e+01 1.154e+02, threshold=1.727e+02, percent-clipped=0.0
2023-11-20 11:11:46,014 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1062780.0, ans=0.1
2023-11-20 11:12:07,170 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=4.85 vs. limit=12.0
2023-11-20 11:12:11,622 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=1062913.3333333333, ans=0.125
2023-11-20 11:12:16,263 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.25 vs. limit=6.0
2023-11-20 11:12:18,307 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 159450
2023-11-20 11:12:29,857 INFO [train_asr.py:1262] (1/4) Epoch 14, batch 3150, loss[loss=0.08713, simple_loss=0.121, pruned_loss=0.01896, audio_tagging_loss=0.00769, over 15949.00 frames. ], tot_loss[loss=0.08088, simple_loss=0.1014, pruned_loss=0.01991, audio_tagging_loss=0.01026, over 3053132.61 frames. ], batch size: 56, lr: 4.97e-03, grad_scale: 32.0
2023-11-20 11:12:30,501 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=8.93 vs. limit=15.0
2023-11-20 11:12:35,549 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1063046.6666666667, ans=0.125
2023-11-20 11:12:35,710 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=1063046.6666666667, ans=0.125
2023-11-20 11:12:45,265 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=10.34 vs. limit=15.0
2023-11-20 11:12:49,072 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=8.84 vs. limit=10.0
2023-11-20 11:12:57,724 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=1063180.0, ans=0.0
2023-11-20 11:13:24,108 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 159500
2023-11-20 11:13:35,643 INFO [train_asr.py:1262] (1/4) Epoch 14, batch 3200, loss[loss=0.09421, simple_loss=0.1156, pruned_loss=0.02864, audio_tagging_loss=0.007758, over 15724.00 frames. ], tot_loss[loss=0.08147, simple_loss=0.1023, pruned_loss=0.0201, audio_tagging_loss=0.01024, over 3052551.54 frames. ], batch size: 61, lr: 4.97e-03, grad_scale: 32.0
2023-11-20 11:13:35,929 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=1063380.0, ans=0.125
2023-11-20 11:13:36,176 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=6.75 vs. limit=15.0
2023-11-20 11:13:38,507 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=1063380.0, ans=0.0
2023-11-20 11:13:47,173 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.min_positive, batch_count=1063446.6666666667, ans=0.05
2023-11-20 11:13:47,893 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.818e+01 8.344e+01 9.152e+01 9.986e+01 1.362e+02, threshold=1.830e+02, percent-clipped=0.0
2023-11-20 11:13:53,213 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=1063446.6666666667, ans=0.2
2023-11-20 11:14:04,112 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=3.18 vs. limit=15.0
2023-11-20 11:14:11,548 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2023-11-20 11:14:29,405 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 159550
2023-11-20 11:14:29,619 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=1063646.6666666667, ans=0.0
2023-11-20 11:14:31,862 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1063646.6666666667, ans=0.1
2023-11-20 11:14:40,178 INFO [train_asr.py:1262] (1/4) Epoch 14, batch 3250, loss[loss=0.07852, simple_loss=0.09556, pruned_loss=0.01917, audio_tagging_loss=0.01157, over 15973.00 frames. ], tot_loss[loss=0.08099, simple_loss=0.1016, pruned_loss=0.01991, audio_tagging_loss=0.01027, over 3058452.26 frames. ], batch size: 61, lr: 4.97e-03, grad_scale: 32.0
2023-11-20 11:14:59,352 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2023-11-20 11:15:04,015 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=12.19 vs. limit=15.0
2023-11-20 11:15:28,851 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=1063913.3333333333, ans=0.125
2023-11-20 11:15:34,349 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 159600
2023-11-20 11:15:37,392 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=1063980.0, ans=0.0
2023-11-20 11:15:39,943 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=1063980.0, ans=0.125
2023-11-20 11:15:45,698 INFO [train_asr.py:1262] (1/4) Epoch 14, batch 3300, loss[loss=0.07494, simple_loss=0.0937, pruned_loss=0.01724, audio_tagging_loss=0.01085, over 14552.00 frames. ], tot_loss[loss=0.08047, simple_loss=0.1008, pruned_loss=0.01971, audio_tagging_loss=0.01036, over 3057878.17 frames. ], batch size: 55, lr: 4.97e-03, grad_scale: 32.0
2023-11-20 11:15:52,951 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=1064046.6666666667, ans=0.125
2023-11-20 11:15:58,058 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=1064113.3333333333, ans=0.125
2023-11-20 11:15:58,819 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.658e+01 7.969e+01 8.807e+01 9.518e+01 1.189e+02, threshold=1.761e+02, percent-clipped=0.0
2023-11-20 11:16:17,211 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=1064180.0, ans=0.0
2023-11-20 11:16:27,092 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.75 vs. limit=6.0
2023-11-20 11:16:39,559 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 159650
2023-11-20 11:16:42,956 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.min_positive, batch_count=1064313.3333333333, ans=0.05
2023-11-20 11:16:47,756 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=1064313.3333333333, ans=0.0
2023-11-20 11:16:51,798 INFO [train_asr.py:1262] (1/4) Epoch 14, batch 3350, loss[loss=0.05665, simple_loss=0.06774, pruned_loss=0.01222, audio_tagging_loss=0.01056, over 15357.00 frames. ], tot_loss[loss=0.08041, simple_loss=0.101, pruned_loss=0.01971, audio_tagging_loss=0.01021, over 3054645.79 frames. ], batch size: 60, lr: 4.97e-03, grad_scale: 32.0
2023-11-20 11:16:55,962 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=1064380.0, ans=0.2
2023-11-20 11:17:45,116 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 159700
2023-11-20 11:17:51,319 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=1064646.6666666667, ans=0.125
2023-11-20 11:17:51,997 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=19.10 vs. limit=22.5
2023-11-20 11:17:55,463 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=4.08 vs. limit=10.0
2023-11-20 11:17:56,233 INFO [train_asr.py:1262] (1/4) Epoch 14, batch 3400, loss[loss=0.05925, simple_loss=0.05945, pruned_loss=0.01327, audio_tagging_loss=0.01626, over 14755.00 frames. ], tot_loss[loss=0.08014, simple_loss=0.1006, pruned_loss=0.01969, audio_tagging_loss=0.01015, over 3055388.23 frames. ], batch size: 59, lr: 4.97e-03, grad_scale: 16.0
2023-11-20 11:18:00,228 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=1064713.3333333333, ans=0.125
2023-11-20 11:18:10,582 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.407e+01 8.209e+01 8.921e+01 9.607e+01 2.745e+02, threshold=1.784e+02, percent-clipped=1.0
2023-11-20 11:18:17,340 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=14.50 vs. limit=15.0
2023-11-20 11:18:31,106 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=1064846.6666666667, ans=0.0
2023-11-20 11:18:43,564 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=1064913.3333333333, ans=0.125
2023-11-20 11:18:49,719 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 159750
2023-11-20 11:18:56,572 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=1064980.0, ans=0.125
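Note: the grad_scale field in the per-batch lines is the mixed-precision loss scale; it doubles after a run of overflow-free steps and halves when infs/NaNs are found, which is why it steps 32 -> 16 here (and down to 8 at batch 3450 below) right after a batch whose max grad norm spiked to 2.745e+02 with percent-clipped=1.0. A sketch of the underlying mechanism using plain torch.cuda.amp, not icefall's exact wrapper; growth_interval is illustrative:

    import torch

    scaler = torch.cuda.amp.GradScaler(init_scale=8.0, growth_factor=2.0,
                                       backoff_factor=0.5, growth_interval=2000)
    # Per batch:
    #   with torch.cuda.amp.autocast():
    #       loss = model(...)
    #   scaler.scale(loss).backward()
    #   scaler.step(optimizer)   # skipped if infs/NaNs were found in grads
    #   scaler.update()          # doubles the scale after growth_interval
    #                            # clean steps, halves it on overflow
    print(scaler.get_scale())    # 8.0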
2023-11-20 11:19:01,441 INFO [train_asr.py:1262] (1/4) Epoch 14, batch 3450, loss[loss=0.05628, simple_loss=0.06185, pruned_loss=0.01417, audio_tagging_loss=0.01119, over 14666.00 frames. ], tot_loss[loss=0.0801, simple_loss=0.1009, pruned_loss=0.01962, audio_tagging_loss=0.01003, over 3055237.84 frames. ], batch size: 57, lr: 4.97e-03, grad_scale: 8.0
2023-11-20 11:19:08,938 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.23 vs. limit=6.0
2023-11-20 11:19:09,818 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=1065046.6666666667, ans=0.125
2023-11-20 11:19:46,356 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=1065246.6666666667, ans=0.125
2023-11-20 11:19:50,350 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=2.38 vs. limit=15.0
2023-11-20 11:19:54,730 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 159800
2023-11-20 11:19:56,093 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=1065313.3333333333, ans=0.125
2023-11-20 11:19:58,152 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=1065313.3333333333, ans=0.125
2023-11-20 11:20:07,000 INFO [train_asr.py:1262] (1/4) Epoch 14, batch 3500, loss[loss=0.07845, simple_loss=0.09434, pruned_loss=0.02013, audio_tagging_loss=0.01116, over 15867.00 frames. ], tot_loss[loss=0.07914, simple_loss=0.0995, pruned_loss=0.01943, audio_tagging_loss=0.009959, over 3051578.94 frames. ], batch size: 59, lr: 4.97e-03, grad_scale: 8.0
2023-11-20 11:20:12,965 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=1065380.0, ans=0.0
2023-11-20 11:20:13,391 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=10.50 vs. limit=15.0
2023-11-20 11:20:14,201 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=1065380.0, ans=0.125
2023-11-20 11:20:22,597 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.721e+01 8.437e+01 9.163e+01 1.016e+02 1.154e+02, threshold=1.833e+02, percent-clipped=0.0
2023-11-20 11:20:40,561 WARNING [train_asr.py:1506] (1/4) Exclude cut with ID unbalanced/DdDpuDqOyrA_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24
2023-11-20 11:20:51,679 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module2.whiten, num_groups=1, num_channels=512, metric=9.81 vs. limit=15.0
2023-11-20 11:20:52,579 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1065580.0, ans=0.1
2023-11-20 11:20:55,084 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=1065580.0, ans=0.125
2023-11-20 11:20:58,888 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=11.34 vs. limit=15.0
2023-11-20 11:21:00,588 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 159850
2023-11-20 11:21:08,561 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=512, metric=21.19 vs. limit=22.5
2023-11-20 11:21:11,677 INFO [train_asr.py:1262] (1/4) Epoch 14, batch 3550, loss[loss=0.1017, simple_loss=0.1306, pruned_loss=0.02871, audio_tagging_loss=0.007674, over 16406.00 frames. ], tot_loss[loss=0.07923, simple_loss=0.09971, pruned_loss=0.01946, audio_tagging_loss=0.009919, over 3048741.42 frames. ], batch size: 62, lr: 4.97e-03, grad_scale: 8.0
2023-11-20 11:21:12,370 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.81 vs. limit=6.0
2023-11-20 11:21:15,845 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=1065713.3333333333, ans=0.125
2023-11-20 11:21:24,852 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=1065780.0, ans=0.125
2023-11-20 11:21:42,845 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1065846.6666666667, ans=0.1
2023-11-20 11:21:45,458 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=1065846.6666666667, ans=0.0
2023-11-20 11:22:03,984 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.self_attn1.whiten, num_groups=1, num_channels=192, metric=12.49 vs. limit=22.5
2023-11-20 11:22:04,569 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 159900
2023-11-20 11:22:04,888 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=1065980.0, ans=0.125
2023-11-20 11:22:16,472 INFO [train_asr.py:1262] (1/4) Epoch 14, batch 3600, loss[loss=0.105, simple_loss=0.1302, pruned_loss=0.02922, audio_tagging_loss=0.01063, over 14989.00 frames. ], tot_loss[loss=0.07921, simple_loss=0.09999, pruned_loss=0.01943, audio_tagging_loss=0.009783, over 3043417.62 frames. ], batch size: 56, lr: 4.97e-03, grad_scale: 16.0
2023-11-20 11:22:30,969 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer_ff3.min_abs, batch_count=1066113.3333333333, ans=0.2
2023-11-20 11:22:31,771 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.755e+01 8.370e+01 9.104e+01 9.925e+01 1.510e+02, threshold=1.821e+02, percent-clipped=0.0
2023-11-20 11:22:32,206 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=1066113.3333333333, ans=0.125
2023-11-20 11:22:41,460 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=18.24 vs. limit=22.5
2023-11-20 11:22:42,315 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=1066180.0, ans=0.0
2023-11-20 11:22:46,065 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1066180.0, ans=0.1
2023-11-20 11:23:07,775 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=1066313.3333333333, ans=0.125
2023-11-20 11:23:09,945 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 159950
2023-11-20 11:23:15,612 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=12.61 vs. limit=15.0
2023-11-20 11:23:21,969 INFO [train_asr.py:1262] (1/4) Epoch 14, batch 3650, loss[loss=0.06921, simple_loss=0.08591, pruned_loss=0.01707, audio_tagging_loss=0.009191, over 14218.00 frames. ], tot_loss[loss=0.07945, simple_loss=0.1003, pruned_loss=0.01951, audio_tagging_loss=0.009808, over 3048601.52 frames. ], batch size: 53, lr: 4.97e-03, grad_scale: 16.0
2023-11-20 11:23:22,216 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2023-11-20 11:23:26,851 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=1066380.0, ans=0.0
2023-11-20 11:23:32,608 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=1066380.0, ans=0.0
2023-11-20 11:23:32,646 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=1066380.0, ans=0.125
2023-11-20 11:23:33,689 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=1066380.0, ans=0.07
2023-11-20 11:23:37,499 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=1066446.6666666667, ans=0.125
2023-11-20 11:23:55,706 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=1066513.3333333333, ans=0.0
2023-11-20 11:24:13,063 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.06 vs. limit=6.0
2023-11-20 11:24:16,556 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 160000
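Note: the Whitening lines report a per-group statistic that is 1.0 when the activations' covariance is a scaled identity ("white") and grows as the covariance becomes correlated or low-rank; a penalty gradient is applied only when the metric exceeds the configured limit, which is why most lines show metric below limit. The following is a sketch of that kind of statistic, reconstructed from the logged semantics rather than copied from scaling.py:

    import torch

    def whitening_metric(x: torch.Tensor, num_groups: int = 1) -> float:
        # x: (num_frames, num_channels); assumes num_channels % num_groups == 0.
        n, c = x.shape
        x = x.reshape(n, num_groups, c // num_groups).transpose(0, 1)
        x = x - x.mean(dim=1, keepdim=True)
        cov = torch.matmul(x.transpose(1, 2), x) / n       # per-group covariance
        num = (cov ** 2).sum(dim=(1, 2))                   # trace(cov @ cov)
        den = (torch.diagonal(cov, dim1=1, dim2=2).sum(dim=1) ** 2) / (c // num_groups)
        return (num / den).mean().item()                   # 1.0 == perfectly white

    print(whitening_metric(torch.randn(1000, 256)))        # ~1.3 for random data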
2023-11-20 11:24:31,852 INFO [train_asr.py:1262] (1/4) Epoch 14, batch 3700, loss[loss=0.09231, simple_loss=0.1179, pruned_loss=0.02403, audio_tagging_loss=0.009322, over 15557.00 frames. ], tot_loss[loss=0.07952, simple_loss=0.1003, pruned_loss=0.01954, audio_tagging_loss=0.00984, over 3053436.89 frames. ], batch size: 59, lr: 4.96e-03, grad_scale: 16.0
2023-11-20 11:24:47,127 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.803e+01 8.291e+01 8.874e+01 9.793e+01 1.503e+02, threshold=1.775e+02, percent-clipped=0.0
2023-11-20 11:24:48,657 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=1066780.0, ans=0.125
2023-11-20 11:25:05,696 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=1066846.6666666667, ans=0.125
2023-11-20 11:25:25,351 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 160050
2023-11-20 11:25:29,073 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=1066980.0, ans=0.125
2023-11-20 11:25:33,470 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=1066980.0, ans=0.0
2023-11-20 11:25:36,917 INFO [train_asr.py:1262] (1/4) Epoch 14, batch 3750, loss[loss=0.08535, simple_loss=0.1069, pruned_loss=0.02122, audio_tagging_loss=0.01069, over 15361.00 frames. ], tot_loss[loss=0.08023, simple_loss=0.1013, pruned_loss=0.01983, audio_tagging_loss=0.009768, over 3046643.84 frames. ], batch size: 56, lr: 4.96e-03, grad_scale: 16.0
2023-11-20 11:26:22,707 WARNING [train_asr.py:1506] (1/4) Exclude cut with ID unbalanced/ZY_Bsi-RNuk_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24
2023-11-20 11:26:27,761 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=1067313.3333333333, ans=0.2
2023-11-20 11:26:30,172 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 160100
2023-11-20 11:26:41,892 INFO [train_asr.py:1262] (1/4) Epoch 14, batch 3800, loss[loss=0.08703, simple_loss=0.1057, pruned_loss=0.02218, audio_tagging_loss=0.01202, over 15119.00 frames. ], tot_loss[loss=0.08043, simple_loss=0.1017, pruned_loss=0.01982, audio_tagging_loss=0.009773, over 3054431.56 frames. ], batch size: 54, lr: 4.96e-03, grad_scale: 16.0
2023-11-20 11:26:51,417 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=1067380.0, ans=0.1
2023-11-20 11:26:52,650 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1067380.0, ans=0.125
2023-11-20 11:26:57,803 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.898e+01 8.245e+01 9.019e+01 9.669e+01 1.480e+02, threshold=1.804e+02, percent-clipped=0.0
2023-11-20 11:26:59,400 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=1067446.6666666667, ans=0.125
2023-11-20 11:27:25,058 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1067580.0, ans=0.1
2023-11-20 11:27:30,121 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=13.13 vs. limit=15.0
2023-11-20 11:27:36,399 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 160150
2023-11-20 11:27:39,103 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1067646.6666666667, ans=0.1
2023-11-20 11:27:48,122 INFO [train_asr.py:1262] (1/4) Epoch 14, batch 3850, loss[loss=0.08028, simple_loss=0.1028, pruned_loss=0.02194, audio_tagging_loss=0.006923, over 15905.00 frames. ], tot_loss[loss=0.08099, simple_loss=0.1022, pruned_loss=0.01998, audio_tagging_loss=0.009922, over 3049426.53 frames. ], batch size: 58, lr: 4.96e-03, grad_scale: 16.0
2023-11-20 11:28:03,817 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=1067780.0, ans=0.125
2023-11-20 11:28:12,560 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=1067846.6666666667, ans=0.125
2023-11-20 11:28:36,029 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=1067913.3333333333, ans=0.0
2023-11-20 11:28:41,420 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 160200
2023-11-20 11:28:52,540 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=1068046.6666666667, ans=0.0
2023-11-20 11:28:53,397 INFO [train_asr.py:1262] (1/4) Epoch 14, batch 3900, loss[loss=0.07215, simple_loss=0.09503, pruned_loss=0.01589, audio_tagging_loss=0.008747, over 15488.00 frames. ], tot_loss[loss=0.08107, simple_loss=0.1022, pruned_loss=0.02006, audio_tagging_loss=0.009906, over 3047571.76 frames. ], batch size: 57, lr: 4.96e-03, grad_scale: 16.0
2023-11-20 11:29:08,875 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.848e+01 8.101e+01 8.668e+01 9.712e+01 1.300e+02, threshold=1.734e+02, percent-clipped=0.0
2023-11-20 11:29:12,821 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=1068113.3333333333, ans=0.0
2023-11-20 11:29:18,774 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.whiten, num_groups=1, num_channels=512, metric=8.68 vs. limit=12.0
2023-11-20 11:29:19,707 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=1068180.0, ans=0.0
2023-11-20 11:29:19,773 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1068180.0, ans=0.1
2023-11-20 11:29:33,824 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.min_abs, batch_count=1068246.6666666667, ans=0.5
2023-11-20 11:29:40,722 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.max_positive, batch_count=1068246.6666666667, ans=0.95
2023-11-20 11:29:46,769 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 160250
2023-11-20 11:29:46,909 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=1068313.3333333333, ans=0.125
2023-11-20 11:29:53,196 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1068313.3333333333, ans=0.1
2023-11-20 11:29:58,758 INFO [train_asr.py:1262] (1/4) Epoch 14, batch 3950, loss[loss=0.08372, simple_loss=0.1023, pruned_loss=0.02206, audio_tagging_loss=0.01052, over 14259.00 frames. ], tot_loss[loss=0.0806, simple_loss=0.1015, pruned_loss=0.01987, audio_tagging_loss=0.01001, over 3044552.92 frames. ], batch size: 56, lr: 4.96e-03, grad_scale: 16.0
2023-11-20 11:30:02,757 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=1068380.0, ans=0.2
2023-11-20 11:30:25,449 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=9.93 vs. limit=15.0
2023-11-20 11:30:38,430 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=1068580.0, ans=0.0
2023-11-20 11:30:52,498 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 160300
2023-11-20 11:31:04,122 INFO [train_asr.py:1262] (1/4) Epoch 14, batch 4000, loss[loss=0.08753, simple_loss=0.1071, pruned_loss=0.02199, audio_tagging_loss=0.01199, over 15589.00 frames. ], tot_loss[loss=0.08068, simple_loss=0.101, pruned_loss=0.01999, audio_tagging_loss=0.01019, over 3048811.99 frames. ], batch size: 58, lr: 4.96e-03, grad_scale: 32.0
2023-11-20 11:31:10,713 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1068713.3333333333, ans=0.1
2023-11-20 11:31:16,628 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=8.21 vs. limit=15.0
2023-11-20 11:31:20,249 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.967e+01 8.155e+01 8.816e+01 9.659e+01 1.219e+02, threshold=1.763e+02, percent-clipped=0.0
2023-11-20 11:31:36,684 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=1068846.6666666667, ans=0.125
2023-11-20 11:31:57,386 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 160350
2023-11-20 11:32:04,655 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=6.76 vs. limit=12.0
2023-11-20 11:32:09,819 INFO [train_asr.py:1262] (1/4) Epoch 14, batch 4050, loss[loss=0.1096, simple_loss=0.1432, pruned_loss=0.03191, audio_tagging_loss=0.006096, over 16419.00 frames. ], tot_loss[loss=0.08126, simple_loss=0.102, pruned_loss=0.02016, audio_tagging_loss=0.0101, over 3045222.48 frames. ], batch size: 56, lr: 4.96e-03, grad_scale: 16.0
2023-11-20 11:32:13,601 WARNING [train_asr.py:1506] (1/4) Exclude cut with ID unbalanced/-7b0f9TyPFU_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24
2023-11-20 11:32:23,757 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=1069113.3333333333, ans=0.0
2023-11-20 11:33:02,901 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 160400
2023-11-20 11:33:14,082 INFO [train_asr.py:1262] (1/4) Epoch 14, batch 4100, loss[loss=0.08225, simple_loss=0.1057, pruned_loss=0.02034, audio_tagging_loss=0.009043, over 16142.00 frames. ], tot_loss[loss=0.08183, simple_loss=0.1029, pruned_loss=0.02026, audio_tagging_loss=0.0101, over 3055589.64 frames. ], batch size: 61, lr: 4.96e-03, grad_scale: 16.0
2023-11-20 11:33:24,150 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=1069380.0, ans=0.125
2023-11-20 11:33:31,209 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.500e+01 8.137e+01 8.868e+01 9.537e+01 1.552e+02, threshold=1.774e+02, percent-clipped=0.0
2023-11-20 11:33:36,436 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.min_positive, batch_count=1069446.6666666667, ans=0.025
2023-11-20 11:33:36,476 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=1069446.6666666667, ans=0.2
2023-11-20 11:33:41,531 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00
2023-11-20 11:33:51,108 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=6.66 vs. limit=15.0
2023-11-20 11:33:56,617 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=1069580.0, ans=0.125
2023-11-20 11:34:07,299 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 160450
2023-11-20 11:34:19,038 INFO [train_asr.py:1262] (1/4) Epoch 14, batch 4150, loss[loss=0.06595, simple_loss=0.08229, pruned_loss=0.01342, audio_tagging_loss=0.01139, over 15447.00 frames. ], tot_loss[loss=0.08168, simple_loss=0.1029, pruned_loss=0.02026, audio_tagging_loss=0.009962, over 3049143.35 frames. ], batch size: 58, lr: 4.96e-03, grad_scale: 16.0
2023-11-20 11:34:27,858 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=1069713.3333333333, ans=0.0
2023-11-20 11:34:36,988 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=1069780.0, ans=0.2
2023-11-20 11:34:38,610 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=3.01 vs. limit=15.0
2023-11-20 11:34:42,074 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=1069780.0, ans=0.125
2023-11-20 11:34:56,350 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1069913.3333333333, ans=0.1
2023-11-20 11:35:06,442 WARNING [train_asr.py:1506] (1/4) Exclude cut with ID unbalanced/5BkClLNthIQ_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24
2023-11-20 11:35:11,532 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 160500
2023-11-20 11:35:17,937 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=1069980.0, ans=0.125
2023-11-20 11:35:22,595 INFO [train_asr.py:1262] (1/4) Epoch 14, batch 4200, loss[loss=0.09163, simple_loss=0.126, pruned_loss=0.01967, audio_tagging_loss=0.008979, over 14941.00 frames. ], tot_loss[loss=0.08148, simple_loss=0.1028, pruned_loss=0.02021, audio_tagging_loss=0.009869, over 3047127.34 frames. ], batch size: 55, lr: 4.96e-03, grad_scale: 16.0
2023-11-20 11:35:24,118 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=1070046.6666666667, ans=0.125
2023-11-20 11:35:30,354 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=1070046.6666666667, ans=0.0
2023-11-20 11:35:40,158 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.812e+01 8.053e+01 8.867e+01 9.480e+01 1.332e+02, threshold=1.773e+02, percent-clipped=0.0
2023-11-20 11:35:47,176 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1070113.3333333333, ans=0.0
2023-11-20 11:36:03,941 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=1070246.6666666667, ans=0.125
2023-11-20 11:36:16,902 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 160550
2023-11-20 11:36:21,241 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.min_positive, batch_count=1070313.3333333333, ans=0.025
2023-11-20 11:36:28,434 INFO [train_asr.py:1262] (1/4) Epoch 14, batch 4250, loss[loss=0.1, simple_loss=0.1341, pruned_loss=0.02476, audio_tagging_loss=0.008163, over 15014.00 frames. ], tot_loss[loss=0.081, simple_loss=0.1018, pruned_loss=0.02018, audio_tagging_loss=0.009942, over 3049602.53 frames. ], batch size: 57, lr: 4.96e-03, grad_scale: 16.0
2023-11-20 11:36:31,496 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=10.76 vs. limit=15.0
2023-11-20 11:36:33,370 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=11.00 vs. limit=15.0
2023-11-20 11:36:54,773 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2023-11-20 11:37:22,878 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 160600
2023-11-20 11:37:30,651 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=1070646.6666666667, ans=0.125
2023-11-20 11:37:34,819 INFO [train_asr.py:1262] (1/4) Epoch 14, batch 4300, loss[loss=0.08321, simple_loss=0.1135, pruned_loss=0.01875, audio_tagging_loss=0.007694, over 16230.00 frames. ], tot_loss[loss=0.08135, simple_loss=0.1025, pruned_loss=0.02029, audio_tagging_loss=0.009824, over 3056440.19 frames. ], batch size: 58, lr: 4.96e-03, grad_scale: 16.0
2023-11-20 11:37:42,309 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=1070713.3333333333, ans=0.125
2023-11-20 11:37:43,638 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=1070713.3333333333, ans=0.2
2023-11-20 11:37:50,512 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.072e+01 8.383e+01 9.324e+01 1.005e+02 1.336e+02, threshold=1.865e+02, percent-clipped=0.0
2023-11-20 11:38:10,782 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2023-11-20 11:38:28,433 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 160650
2023-11-20 11:38:28,532 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=1070980.0, ans=0.125
2023-11-20 11:38:39,436 INFO [train_asr.py:1262] (1/4) Epoch 14, batch 4350, loss[loss=0.07133, simple_loss=0.09665, pruned_loss=0.01308, audio_tagging_loss=0.009922, over 14221.00 frames. ], tot_loss[loss=0.08044, simple_loss=0.1017, pruned_loss=0.0198, audio_tagging_loss=0.009797, over 3050490.52 frames. ], batch size: 54, lr: 4.95e-03, grad_scale: 16.0
2023-11-20 11:38:56,956 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.min_positive, batch_count=1071113.3333333333, ans=0.05
2023-11-20 11:38:59,544 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=1071113.3333333333, ans=0.125
2023-11-20 11:39:32,563 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 160700
2023-11-20 11:39:45,066 INFO [train_asr.py:1262] (1/4) Epoch 14, batch 4400, loss[loss=0.124, simple_loss=0.1524, pruned_loss=0.03668, audio_tagging_loss=0.01114, over 14817.00 frames. ], tot_loss[loss=0.08015, simple_loss=0.101, pruned_loss=0.0198, audio_tagging_loss=0.009837, over 3046453.13 frames. ], batch size: 54, lr: 4.95e-03, grad_scale: 32.0
2023-11-20 11:39:53,155 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=1071380.0, ans=0.125
2023-11-20 11:40:02,272 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.072e+01 8.159e+01 8.593e+01 9.338e+01 1.252e+02, threshold=1.719e+02, percent-clipped=0.0
2023-11-20 11:40:11,208 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=1071513.3333333333, ans=0.1
2023-11-20 11:40:16,478 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1071513.3333333333, ans=0.1
2023-11-20 11:40:23,188 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=1071580.0, ans=0.0
2023-11-20 11:40:28,315 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=1071580.0, ans=0.125
2023-11-20 11:40:38,694 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 160750
2023-11-20 11:40:44,429 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1071646.6666666667, ans=0.0
2023-11-20 11:40:50,806 INFO [train_asr.py:1262] (1/4) Epoch 14, batch 4450, loss[loss=0.05801, simple_loss=0.06677, pruned_loss=0.01506, audio_tagging_loss=0.00956, over 14773.00 frames. ], tot_loss[loss=0.0803, simple_loss=0.1012, pruned_loss=0.01988, audio_tagging_loss=0.009834, over 3039637.37 frames. ], batch size: 57, lr: 4.95e-03, grad_scale: 32.0
2023-11-20 11:41:29,527 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=1071913.3333333333, ans=0.0
2023-11-20 11:41:32,749 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=1071913.3333333333, ans=0.125
2023-11-20 11:41:37,664 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1071913.3333333333, ans=0.125
2023-11-20 11:41:40,882 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=1071913.3333333333, ans=0.2
2023-11-20 11:41:44,360 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 160800
2023-11-20 11:41:56,223 INFO [train_asr.py:1262] (1/4) Epoch 14, batch 4500, loss[loss=0.08222, simple_loss=0.106, pruned_loss=0.02038, audio_tagging_loss=0.008816, over 15642.00 frames. ], tot_loss[loss=0.08044, simple_loss=0.1018, pruned_loss=0.01984, audio_tagging_loss=0.009697, over 3042103.97 frames. ], batch size: 58, lr: 4.95e-03, grad_scale: 32.0
2023-11-20 11:42:01,921 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=14.34 vs. limit=15.0
2023-11-20 11:42:12,855 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.114e+01 8.604e+01 9.143e+01 9.908e+01 1.250e+02, threshold=1.829e+02, percent-clipped=0.0
2023-11-20 11:42:42,692 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=1072246.6666666667, ans=0.125
2023-11-20 11:42:50,238 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 160850
2023-11-20 11:42:51,703 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=1072313.3333333333, ans=0.125
2023-11-20 11:43:01,965 INFO [train_asr.py:1262] (1/4) Epoch 14, batch 4550, loss[loss=0.06951, simple_loss=0.0863, pruned_loss=0.01498, audio_tagging_loss=0.01139, over 14991.00 frames. ], tot_loss[loss=0.07948, simple_loss=0.1004, pruned_loss=0.01954, audio_tagging_loss=0.00976, over 3034580.04 frames. ], batch size: 55, lr: 4.95e-03, grad_scale: 32.0
2023-11-20 11:43:38,989 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1072513.3333333333, ans=0.0
2023-11-20 11:43:39,002 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=1072513.3333333333, ans=0.125
2023-11-20 11:43:53,053 WARNING [train_asr.py:1506] (1/4) Exclude cut with ID unbalanced/_II2Klfnn4Y_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24
2023-11-20 11:43:55,614 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 160900
2023-11-20 11:43:58,246 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=1072646.6666666667, ans=0.125
2023-11-20 11:44:08,141 INFO [train_asr.py:1262] (1/4) Epoch 14, batch 4600, loss[loss=0.05455, simple_loss=0.07104, pruned_loss=0.007959, audio_tagging_loss=0.01107, over 15488.00 frames. ], tot_loss[loss=0.07863, simple_loss=0.09934, pruned_loss=0.01908, audio_tagging_loss=0.009877, over 3035756.06 frames.
], batch size: 59, lr: 4.95e-03, grad_scale: 32.0 2023-11-20 11:44:23,993 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=1072780.0, ans=0.125 2023-11-20 11:44:24,877 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.602e+01 8.263e+01 8.597e+01 9.248e+01 1.165e+02, threshold=1.719e+02, percent-clipped=0.0 2023-11-20 11:44:34,462 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=1072846.6666666667, ans=0.125 2023-11-20 11:44:53,750 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=1072913.3333333333, ans=0.04949747468305833 2023-11-20 11:44:55,019 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=1072913.3333333333, ans=0.125 2023-11-20 11:45:01,644 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 160950 2023-11-20 11:45:12,496 INFO [train_asr.py:1262] (1/4) Epoch 14, batch 4650, loss[loss=0.06015, simple_loss=0.06905, pruned_loss=0.01082, audio_tagging_loss=0.0148, over 15462.00 frames. ], tot_loss[loss=0.0794, simple_loss=0.1, pruned_loss=0.01946, audio_tagging_loss=0.009931, over 3045602.37 frames. ], batch size: 58, lr: 4.95e-03, grad_scale: 32.0 2023-11-20 11:45:18,905 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=1073046.6666666667, ans=0.0 2023-11-20 11:45:30,509 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=1073113.3333333333, ans=0.125 2023-11-20 11:45:31,080 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=12.36 vs. limit=15.0 2023-11-20 11:45:55,804 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=1073246.6666666667, ans=0.0 2023-11-20 11:45:58,276 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=1073246.6666666667, ans=0.0 2023-11-20 11:46:05,349 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 161000 2023-11-20 11:46:17,336 INFO [train_asr.py:1262] (1/4) Epoch 14, batch 4700, loss[loss=0.07313, simple_loss=0.09245, pruned_loss=0.01683, audio_tagging_loss=0.01007, over 15729.00 frames. ], tot_loss[loss=0.07999, simple_loss=0.1009, pruned_loss=0.01958, audio_tagging_loss=0.009957, over 3045911.33 frames. 
], batch size: 59, lr: 4.95e-03, grad_scale: 32.0 2023-11-20 11:46:22,811 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=1073380.0, ans=0.125 2023-11-20 11:46:34,690 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.595e+01 8.445e+01 9.172e+01 1.004e+02 1.426e+02, threshold=1.834e+02, percent-clipped=0.0 2023-11-20 11:46:35,138 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-20 11:46:53,969 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=1073513.3333333333, ans=0.0 2023-11-20 11:46:56,224 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1073580.0, ans=0.1 2023-11-20 11:47:00,150 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=1073580.0, ans=0.125 2023-11-20 11:47:04,409 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=1073580.0, ans=0.125 2023-11-20 11:47:10,720 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 161050 2023-11-20 11:47:12,185 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=1073646.6666666667, ans=0.125 2023-11-20 11:47:22,411 INFO [train_asr.py:1262] (1/4) Epoch 14, batch 4750, loss[loss=0.08037, simple_loss=0.09247, pruned_loss=0.01943, audio_tagging_loss=0.01471, over 14997.00 frames. ], tot_loss[loss=0.08054, simple_loss=0.1016, pruned_loss=0.01975, audio_tagging_loss=0.009984, over 3041516.89 frames. ], batch size: 57, lr: 4.95e-03, grad_scale: 16.0 2023-11-20 11:47:36,151 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=1073780.0, ans=0.0 2023-11-20 11:47:36,193 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.min_positive, batch_count=1073780.0, ans=0.025 2023-11-20 11:47:53,962 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=1073846.6666666667, ans=0.0 2023-11-20 11:48:15,291 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 161100 2023-11-20 11:48:27,013 INFO [train_asr.py:1262] (1/4) Epoch 14, batch 4800, loss[loss=0.08202, simple_loss=0.1001, pruned_loss=0.01989, audio_tagging_loss=0.01209, over 14114.00 frames. ], tot_loss[loss=0.08097, simple_loss=0.1022, pruned_loss=0.0198, audio_tagging_loss=0.01009, over 3044200.44 frames. ], batch size: 53, lr: 4.95e-03, grad_scale: 16.0 2023-11-20 11:48:40,603 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=3.08 vs. limit=15.0 2023-11-20 11:48:45,974 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.153e+01 8.077e+01 9.114e+01 9.869e+01 1.463e+02, threshold=1.823e+02, percent-clipped=0.0 2023-11-20 11:48:59,143 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1074180.0, ans=0.1 2023-11-20 11:49:12,031 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=8.41 vs. 
limit=15.0 2023-11-20 11:49:19,963 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 161150 2023-11-20 11:49:27,491 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1074313.3333333333, ans=0.0 2023-11-20 11:49:31,582 INFO [train_asr.py:1262] (1/4) Epoch 14, batch 4850, loss[loss=0.09457, simple_loss=0.1166, pruned_loss=0.02262, audio_tagging_loss=0.01367, over 14875.00 frames. ], tot_loss[loss=0.08122, simple_loss=0.1022, pruned_loss=0.02002, audio_tagging_loss=0.01009, over 3040153.03 frames. ], batch size: 56, lr: 4.95e-03, grad_scale: 16.0 2023-11-20 11:49:31,775 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=1074380.0, ans=0.0 2023-11-20 11:49:55,228 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=1074446.6666666667, ans=0.125 2023-11-20 11:50:02,545 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=512, metric=3.13 vs. limit=15.0 2023-11-20 11:50:10,912 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=1074580.0, ans=0.0 2023-11-20 11:50:25,128 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 161200 2023-11-20 11:50:25,764 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=7.85 vs. limit=12.0 2023-11-20 11:50:28,031 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=1074646.6666666667, ans=0.125 2023-11-20 11:50:32,258 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=8.37 vs. limit=15.0 2023-11-20 11:50:36,475 INFO [train_asr.py:1262] (1/4) Epoch 14, batch 4900, loss[loss=0.0944, simple_loss=0.1281, pruned_loss=0.02444, audio_tagging_loss=0.005907, over 15595.00 frames. ], tot_loss[loss=0.08089, simple_loss=0.1019, pruned_loss=0.01989, audio_tagging_loss=0.01004, over 3042314.63 frames. ], batch size: 56, lr: 4.95e-03, grad_scale: 16.0 2023-11-20 11:50:38,672 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=1074713.3333333333, ans=0.0 2023-11-20 11:50:48,112 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=1074713.3333333333, ans=0.2 2023-11-20 11:50:54,729 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.max_abs, batch_count=1074780.0, ans=10.0 2023-11-20 11:50:56,961 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.282e+01 8.231e+01 8.987e+01 9.531e+01 1.955e+02, threshold=1.797e+02, percent-clipped=1.0 2023-11-20 11:51:22,719 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=1074913.3333333333, ans=0.125 2023-11-20 11:51:30,140 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=8.52 vs. 
limit=15.0 2023-11-20 11:51:30,797 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 161250 2023-11-20 11:51:33,609 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=1074980.0, ans=0.0 2023-11-20 11:51:37,285 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.whiten, num_groups=1, num_channels=192, metric=4.15 vs. limit=12.0 2023-11-20 11:51:43,146 INFO [train_asr.py:1262] (1/4) Epoch 14, batch 4950, loss[loss=0.08633, simple_loss=0.1125, pruned_loss=0.02076, audio_tagging_loss=0.009307, over 15343.00 frames. ], tot_loss[loss=0.08042, simple_loss=0.1014, pruned_loss=0.01978, audio_tagging_loss=0.009956, over 3042026.35 frames. ], batch size: 55, lr: 4.95e-03, grad_scale: 16.0 2023-11-20 11:51:44,630 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=1075046.6666666667, ans=0.125 2023-11-20 11:51:44,769 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=1075046.6666666667, ans=0.07 2023-11-20 11:51:46,008 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=1075046.6666666667, ans=0.2 2023-11-20 11:51:53,463 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=1075046.6666666667, ans=0.0 2023-11-20 11:51:54,558 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-20 11:52:02,763 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=1075113.3333333333, ans=0.0 2023-11-20 11:52:36,381 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 161300 2023-11-20 11:52:40,460 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=1075313.3333333333, ans=0.0 2023-11-20 11:52:42,819 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-20 11:52:46,505 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1075380.0, ans=0.1 2023-11-20 11:52:47,452 INFO [train_asr.py:1262] (1/4) Epoch 14, batch 5000, loss[loss=0.05107, simple_loss=0.05509, pruned_loss=0.01019, audio_tagging_loss=0.01334, over 16126.00 frames. ], tot_loss[loss=0.07984, simple_loss=0.1007, pruned_loss=0.01965, audio_tagging_loss=0.009845, over 3046969.19 frames. ], batch size: 64, lr: 4.94e-03, grad_scale: 16.0 2023-11-20 11:53:04,613 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=1075446.6666666667, ans=0.0 2023-11-20 11:53:07,299 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.854e+01 7.908e+01 8.687e+01 9.464e+01 1.260e+02, threshold=1.737e+02, percent-clipped=0.0 2023-11-20 11:53:34,221 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=8.81 vs. 
limit=15.0 2023-11-20 11:53:41,363 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 161350 2023-11-20 11:53:41,435 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=1075646.6666666667, ans=0.0 2023-11-20 11:53:48,988 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=1075646.6666666667, ans=0.0 2023-11-20 11:53:52,358 INFO [train_asr.py:1262] (1/4) Epoch 14, batch 5050, loss[loss=0.09931, simple_loss=0.1364, pruned_loss=0.02648, audio_tagging_loss=0.004603, over 14846.00 frames. ], tot_loss[loss=0.07905, simple_loss=0.09968, pruned_loss=0.0194, audio_tagging_loss=0.009806, over 3050444.56 frames. ], batch size: 54, lr: 4.94e-03, grad_scale: 16.0 2023-11-20 11:53:52,721 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=1075713.3333333333, ans=0.125 2023-11-20 11:53:58,038 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1075713.3333333333, ans=0.125 2023-11-20 11:54:00,579 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=1075713.3333333333, ans=0.125 2023-11-20 11:54:18,062 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1075846.6666666667, ans=0.1 2023-11-20 11:54:46,736 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 161400 2023-11-20 11:54:57,943 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=1076046.6666666667, ans=0.125 2023-11-20 11:54:58,765 INFO [train_asr.py:1262] (1/4) Epoch 14, batch 5100, loss[loss=0.079, simple_loss=0.1165, pruned_loss=0.01453, audio_tagging_loss=0.006198, over 15706.00 frames. ], tot_loss[loss=0.07924, simple_loss=0.09995, pruned_loss=0.01954, audio_tagging_loss=0.009726, over 3050522.74 frames. ], batch size: 58, lr: 4.94e-03, grad_scale: 16.0 2023-11-20 11:55:00,394 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=1076046.6666666667, ans=0.2 2023-11-20 11:55:02,885 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=1076046.6666666667, ans=0.125 2023-11-20 11:55:14,558 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1076113.3333333333, ans=0.1 2023-11-20 11:55:17,830 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.894e+01 8.531e+01 9.239e+01 1.028e+02 1.398e+02, threshold=1.848e+02, percent-clipped=0.0 2023-11-20 11:55:20,440 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=11.28 vs. 
limit=22.5 2023-11-20 11:55:39,447 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1076246.6666666667, ans=0.125 2023-11-20 11:55:46,183 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1076246.6666666667, ans=0.125 2023-11-20 11:55:51,253 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=1076313.3333333333, ans=0.2 2023-11-20 11:55:52,393 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 161450 2023-11-20 11:56:04,121 INFO [train_asr.py:1262] (1/4) Epoch 14, batch 5150, loss[loss=0.06545, simple_loss=0.08645, pruned_loss=0.0132, audio_tagging_loss=0.009022, over 15298.00 frames. ], tot_loss[loss=0.07932, simple_loss=0.1003, pruned_loss=0.01949, audio_tagging_loss=0.009672, over 3049067.12 frames. ], batch size: 58, lr: 4.94e-03, grad_scale: 16.0 2023-11-20 11:56:12,501 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1076380.0, ans=0.125 2023-11-20 11:56:21,473 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=10.57 vs. limit=15.0 2023-11-20 11:56:42,339 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=1076580.0, ans=0.125 2023-11-20 11:56:44,803 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=1076580.0, ans=0.2 2023-11-20 11:56:56,749 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=19.12 vs. limit=22.5 2023-11-20 11:56:57,366 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 161500 2023-11-20 11:57:09,034 INFO [train_asr.py:1262] (1/4) Epoch 14, batch 5200, loss[loss=0.07826, simple_loss=0.1024, pruned_loss=0.0169, audio_tagging_loss=0.01014, over 15827.00 frames. ], tot_loss[loss=0.07926, simple_loss=0.1003, pruned_loss=0.01937, audio_tagging_loss=0.009726, over 3052903.70 frames. ], batch size: 59, lr: 4.94e-03, grad_scale: 32.0 2023-11-20 11:57:09,256 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=1076713.3333333333, ans=0.2 2023-11-20 11:57:28,477 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.211e+01 8.173e+01 8.712e+01 9.541e+01 1.479e+02, threshold=1.742e+02, percent-clipped=0.0 2023-11-20 11:57:32,267 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=1076780.0, ans=0.125 2023-11-20 11:57:42,868 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-11-20 11:58:01,802 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 161550 2023-11-20 11:58:14,011 INFO [train_asr.py:1262] (1/4) Epoch 14, batch 5250, loss[loss=0.08167, simple_loss=0.1049, pruned_loss=0.01917, audio_tagging_loss=0.01005, over 14349.00 frames. ], tot_loss[loss=0.07957, simple_loss=0.1007, pruned_loss=0.01947, audio_tagging_loss=0.009743, over 3053118.48 frames. 
], batch size: 53, lr: 4.94e-03, grad_scale: 32.0 2023-11-20 11:58:22,897 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=1077046.6666666667, ans=0.2 2023-11-20 11:58:29,449 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=10.21 vs. limit=15.0 2023-11-20 11:58:45,083 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=1077180.0, ans=0.2 2023-11-20 11:58:55,816 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=10.70 vs. limit=15.0 2023-11-20 11:58:56,535 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=1077246.6666666667, ans=0.125 2023-11-20 11:59:06,720 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 161600 2023-11-20 11:59:18,102 INFO [train_asr.py:1262] (1/4) Epoch 14, batch 5300, loss[loss=0.08338, simple_loss=0.1053, pruned_loss=0.02373, audio_tagging_loss=0.007025, over 15585.00 frames. ], tot_loss[loss=0.08064, simple_loss=0.1022, pruned_loss=0.01988, audio_tagging_loss=0.00968, over 3055317.10 frames. ], batch size: 61, lr: 4.94e-03, grad_scale: 32.0 2023-11-20 11:59:18,423 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=1077380.0, ans=0.0 2023-11-20 11:59:37,903 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=1077446.6666666667, ans=0.2 2023-11-20 11:59:38,922 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.714e+01 8.511e+01 9.160e+01 9.855e+01 1.370e+02, threshold=1.832e+02, percent-clipped=0.0 2023-11-20 11:59:44,256 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1077513.3333333333, ans=0.125 2023-11-20 12:00:01,429 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=1077580.0, ans=0.0 2023-11-20 12:00:01,575 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=1077580.0, ans=0.125 2023-11-20 12:00:03,977 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=1077580.0, ans=0.125 2023-11-20 12:00:06,552 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=1077580.0, ans=0.2 2023-11-20 12:00:11,316 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 161650 2023-11-20 12:00:13,060 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=12.29 vs. limit=15.0 2023-11-20 12:00:23,448 INFO [train_asr.py:1262] (1/4) Epoch 14, batch 5350, loss[loss=0.08054, simple_loss=0.102, pruned_loss=0.01887, audio_tagging_loss=0.01067, over 15388.00 frames. ], tot_loss[loss=0.08019, simple_loss=0.1016, pruned_loss=0.01958, audio_tagging_loss=0.009792, over 3050180.51 frames. 
], batch size: 57, lr: 4.94e-03, grad_scale: 16.0 2023-11-20 12:00:30,052 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=1077713.3333333333, ans=0.125 2023-11-20 12:00:36,618 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=1077780.0, ans=0.0 2023-11-20 12:00:41,476 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1077780.0, ans=0.1 2023-11-20 12:00:55,638 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-11-20 12:01:05,596 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=1077913.3333333333, ans=0.125 2023-11-20 12:01:13,191 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=1077913.3333333333, ans=0.2 2023-11-20 12:01:15,552 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=1077980.0, ans=0.0 2023-11-20 12:01:16,607 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 161700 2023-11-20 12:01:28,384 INFO [train_asr.py:1262] (1/4) Epoch 14, batch 5400, loss[loss=0.08056, simple_loss=0.1017, pruned_loss=0.01989, audio_tagging_loss=0.009796, over 15412.00 frames. ], tot_loss[loss=0.07953, simple_loss=0.1007, pruned_loss=0.0193, audio_tagging_loss=0.009864, over 3049599.09 frames. ], batch size: 58, lr: 4.94e-03, grad_scale: 16.0 2023-11-20 12:01:34,096 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=1078046.6666666667, ans=0.0 2023-11-20 12:01:48,638 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.745e+01 8.017e+01 8.531e+01 9.206e+01 1.102e+02, threshold=1.706e+02, percent-clipped=0.0 2023-11-20 12:01:57,109 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=9.11 vs. limit=15.0 2023-11-20 12:02:01,748 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.max_abs, batch_count=1078180.0, ans=10.0 2023-11-20 12:02:05,361 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=1078246.6666666667, ans=0.0 2023-11-20 12:02:09,307 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.29 vs. limit=6.0 2023-11-20 12:02:10,139 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=1078246.6666666667, ans=0.0 2023-11-20 12:02:20,735 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 161750 2023-11-20 12:02:32,292 INFO [train_asr.py:1262] (1/4) Epoch 14, batch 5450, loss[loss=0.07691, simple_loss=0.1012, pruned_loss=0.01628, audio_tagging_loss=0.01004, over 14426.00 frames. ], tot_loss[loss=0.0797, simple_loss=0.1009, pruned_loss=0.01938, audio_tagging_loss=0.009849, over 3051552.25 frames. 
], batch size: 53, lr: 4.94e-03, grad_scale: 16.0 2023-11-20 12:02:33,865 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=1078380.0, ans=0.2 2023-11-20 12:02:40,042 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1078380.0, ans=0.1 2023-11-20 12:02:51,860 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1078446.6666666667, ans=0.1 2023-11-20 12:03:20,309 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=1078580.0, ans=0.0 2023-11-20 12:03:21,823 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.72 vs. limit=6.0 2023-11-20 12:03:22,783 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=1078646.6666666667, ans=0.125 2023-11-20 12:03:25,009 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 161800 2023-11-20 12:03:25,105 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=1078646.6666666667, ans=0.035 2023-11-20 12:03:37,504 INFO [train_asr.py:1262] (1/4) Epoch 14, batch 5500, loss[loss=0.08739, simple_loss=0.112, pruned_loss=0.02104, audio_tagging_loss=0.01035, over 15938.00 frames. ], tot_loss[loss=0.08051, simple_loss=0.1021, pruned_loss=0.01965, audio_tagging_loss=0.009814, over 3052560.98 frames. ], batch size: 58, lr: 4.94e-03, grad_scale: 8.0 2023-11-20 12:03:40,325 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=1078713.3333333333, ans=0.125 2023-11-20 12:03:40,493 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=4.96 vs. limit=10.0 2023-11-20 12:03:50,126 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=1078780.0, ans=0.125 2023-11-20 12:03:59,052 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.875e+01 8.226e+01 8.739e+01 9.564e+01 1.235e+02, threshold=1.748e+02, percent-clipped=0.0 2023-11-20 12:04:06,522 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=10.77 vs. limit=15.0 2023-11-20 12:04:17,258 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1078913.3333333333, ans=0.1 2023-11-20 12:04:30,656 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 161850 2023-11-20 12:04:42,362 INFO [train_asr.py:1262] (1/4) Epoch 14, batch 5550, loss[loss=0.09368, simple_loss=0.1175, pruned_loss=0.02084, audio_tagging_loss=0.0141, over 15652.00 frames. ], tot_loss[loss=0.08123, simple_loss=0.1028, pruned_loss=0.01989, audio_tagging_loss=0.009927, over 3051559.47 frames. 
], batch size: 59, lr: 4.94e-03, grad_scale: 8.0 2023-11-20 12:04:45,102 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=1079046.6666666667, ans=0.125 2023-11-20 12:04:46,504 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=1079046.6666666667, ans=0.2 2023-11-20 12:04:56,158 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=7.39 vs. limit=12.0 2023-11-20 12:04:57,066 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1079113.3333333333, ans=0.1 2023-11-20 12:05:01,796 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1079113.3333333333, ans=0.1 2023-11-20 12:05:25,695 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module1.whiten, num_groups=1, num_channels=512, metric=4.73 vs. limit=15.0 2023-11-20 12:05:33,440 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=1079313.3333333333, ans=0.125 2023-11-20 12:05:34,683 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=2.942e-01 2023-11-20 12:05:35,633 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 161900 2023-11-20 12:05:38,458 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=12.36 vs. limit=15.0 2023-11-20 12:05:43,570 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1079313.3333333333, ans=0.125 2023-11-20 12:05:47,159 INFO [train_asr.py:1262] (1/4) Epoch 14, batch 5600, loss[loss=0.05089, simple_loss=0.06213, pruned_loss=0.007796, audio_tagging_loss=0.01203, over 15027.00 frames. ], tot_loss[loss=0.08191, simple_loss=0.1036, pruned_loss=0.0201, audio_tagging_loss=0.01003, over 3049685.23 frames. ], batch size: 58, lr: 4.94e-03, grad_scale: 16.0 2023-11-20 12:06:06,949 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=17.13 vs. limit=22.5 2023-11-20 12:06:09,445 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.864e+01 8.052e+01 8.920e+01 9.702e+01 1.303e+02, threshold=1.784e+02, percent-clipped=0.0 2023-11-20 12:06:26,384 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.min_abs, batch_count=1079580.0, ans=0.5 2023-11-20 12:06:35,290 WARNING [train_asr.py:1506] (1/4) Exclude cut with ID unbalanced/ze0LsBtoDm0_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-20 12:06:40,281 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 161950 2023-11-20 12:06:45,880 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=9.92 vs. 
limit=15.0 2023-11-20 12:06:47,795 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=1079646.6666666667, ans=0.0 2023-11-20 12:06:51,208 INFO [train_asr.py:1262] (1/4) Epoch 14, batch 5650, loss[loss=0.08328, simple_loss=0.1107, pruned_loss=0.01682, audio_tagging_loss=0.01113, over 14920.00 frames. ], tot_loss[loss=0.08099, simple_loss=0.1024, pruned_loss=0.01965, audio_tagging_loss=0.01014, over 3056741.49 frames. ], batch size: 56, lr: 4.93e-03, grad_scale: 16.0 2023-11-20 12:07:36,141 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-11-20 12:07:37,373 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=1079913.3333333333, ans=0.125 2023-11-20 12:07:45,007 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 162000 2023-11-20 12:07:56,857 INFO [train_asr.py:1262] (1/4) Epoch 14, batch 5700, loss[loss=0.06653, simple_loss=0.0805, pruned_loss=0.01491, audio_tagging_loss=0.01137, over 15913.00 frames. ], tot_loss[loss=0.08029, simple_loss=0.1012, pruned_loss=0.01943, audio_tagging_loss=0.01024, over 3054347.28 frames. ], batch size: 60, lr: 4.93e-03, grad_scale: 16.0 2023-11-20 12:08:00,093 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1080046.6666666667, ans=0.1 2023-11-20 12:08:06,322 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=1080046.6666666667, ans=0.125 2023-11-20 12:08:18,953 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 5.645e+01 7.906e+01 8.669e+01 9.570e+01 1.200e+02, threshold=1.734e+02, percent-clipped=0.0 2023-11-20 12:08:21,777 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=1080180.0, ans=0.0 2023-11-20 12:08:50,299 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 162050 2023-11-20 12:09:02,093 INFO [train_asr.py:1262] (1/4) Epoch 14, batch 5750, loss[loss=0.08784, simple_loss=0.1112, pruned_loss=0.02239, audio_tagging_loss=0.009866, over 14856.00 frames. ], tot_loss[loss=0.08013, simple_loss=0.1014, pruned_loss=0.01949, audio_tagging_loss=0.009964, over 3054877.85 frames. ], batch size: 55, lr: 4.93e-03, grad_scale: 16.0 2023-11-20 12:09:11,042 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1080380.0, ans=0.125 2023-11-20 12:09:16,135 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=1080446.6666666667, ans=0.125 2023-11-20 12:09:35,788 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=1080513.3333333333, ans=0.07 2023-11-20 12:09:55,115 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 162100 2023-11-20 12:09:57,838 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1080646.6666666667, ans=0.1 2023-11-20 12:10:06,169 INFO [train_asr.py:1262] (1/4) Epoch 14, batch 5800, loss[loss=0.04863, simple_loss=0.05911, pruned_loss=0.007511, audio_tagging_loss=0.01156, over 16273.00 frames. 
], tot_loss[loss=0.07934, simple_loss=0.1007, pruned_loss=0.01918, audio_tagging_loss=0.009831, over 3052882.11 frames. ], batch size: 63, lr: 4.93e-03, grad_scale: 16.0 2023-11-20 12:10:28,886 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.720e+01 8.138e+01 8.619e+01 9.335e+01 1.829e+02, threshold=1.724e+02, percent-clipped=1.0 2023-11-20 12:10:34,392 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1080846.6666666667, ans=0.0 2023-11-20 12:10:59,933 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 162150 2023-11-20 12:11:11,069 INFO [train_asr.py:1262] (1/4) Epoch 14, batch 5850, loss[loss=0.08219, simple_loss=0.09918, pruned_loss=0.02206, audio_tagging_loss=0.01055, over 14703.00 frames. ], tot_loss[loss=0.07974, simple_loss=0.101, pruned_loss=0.01938, audio_tagging_loss=0.009849, over 3048400.94 frames. ], batch size: 56, lr: 4.93e-03, grad_scale: 16.0 2023-11-20 12:11:18,653 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.min_abs, batch_count=1081046.6666666667, ans=0.5 2023-11-20 12:12:04,345 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 162200 2023-11-20 12:12:16,945 INFO [train_asr.py:1262] (1/4) Epoch 14, batch 5900, loss[loss=0.06385, simple_loss=0.08794, pruned_loss=0.01185, audio_tagging_loss=0.008033, over 15312.00 frames. ], tot_loss[loss=0.07994, simple_loss=0.1013, pruned_loss=0.01946, audio_tagging_loss=0.00981, over 3057580.24 frames. ], batch size: 58, lr: 4.93e-03, grad_scale: 16.0 2023-11-20 12:12:17,535 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=9.03 vs. limit=15.0 2023-11-20 12:12:29,773 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=7.61 vs. limit=15.0 2023-11-20 12:12:34,336 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.min_positive, batch_count=1081446.6666666667, ans=0.05 2023-11-20 12:12:38,401 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.547e+01 7.908e+01 8.564e+01 9.521e+01 1.124e+02, threshold=1.713e+02, percent-clipped=0.0 2023-11-20 12:13:08,088 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=1081646.6666666667, ans=0.125 2023-11-20 12:13:10,401 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 162250 2023-11-20 12:13:13,169 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=1081646.6666666667, ans=0.0 2023-11-20 12:13:21,353 INFO [train_asr.py:1262] (1/4) Epoch 14, batch 5950, loss[loss=0.09012, simple_loss=0.1116, pruned_loss=0.02211, audio_tagging_loss=0.01221, over 15198.00 frames. ], tot_loss[loss=0.08048, simple_loss=0.1018, pruned_loss=0.01973, audio_tagging_loss=0.009827, over 3059481.01 frames. 
], batch size: 57, lr: 4.93e-03, grad_scale: 16.0 2023-11-20 12:13:21,614 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1081713.3333333333, ans=0.125 2023-11-20 12:13:40,911 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=1081780.0, ans=0.125 2023-11-20 12:13:40,919 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=1081780.0, ans=0.125 2023-11-20 12:14:05,017 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=1081913.3333333333, ans=0.2 2023-11-20 12:14:14,701 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 162300 2023-11-20 12:14:26,374 INFO [train_asr.py:1262] (1/4) Epoch 14, batch 6000, loss[loss=0.06833, simple_loss=0.08799, pruned_loss=0.01379, audio_tagging_loss=0.01054, over 15421.00 frames. ], tot_loss[loss=0.07939, simple_loss=0.1005, pruned_loss=0.01929, audio_tagging_loss=0.009834, over 3050000.18 frames. ], batch size: 60, lr: 4.93e-03, grad_scale: 32.0 2023-11-20 12:14:26,375 INFO [train_asr.py:1285] (1/4) Computing validation loss 2023-11-20 12:14:59,151 INFO [zipformer.py:1873] (1/4) name=encoder.encoders.2.encoder.layers.1.self_attn_weights, attn_weights_entropy = tensor([4.6341, 3.5765, 3.8320, 3.3905], device='cuda:1') 2023-11-20 12:15:09,616 INFO [train_asr.py:1294] (1/4) Epoch 14, validation: loss=0.06225, simple_loss=0.05354, pruned_loss=0.005677, audio_tagging_loss=0.0298, over 4681554.00 frames. 2023-11-20 12:15:09,617 INFO [train_asr.py:1295] (1/4) Maximum memory allocated so far is 26082MB 2023-11-20 12:15:17,364 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=1082046.6666666667, ans=0.0 2023-11-20 12:15:19,982 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=1082046.6666666667, ans=0.2 2023-11-20 12:15:23,432 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=1082113.3333333333, ans=0.125 2023-11-20 12:15:30,578 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=1082113.3333333333, ans=0.0 2023-11-20 12:15:31,319 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.652e+01 8.215e+01 8.669e+01 9.712e+01 1.545e+02, threshold=1.734e+02, percent-clipped=0.0 2023-11-20 12:15:35,304 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=1082180.0, ans=0.125 2023-11-20 12:15:47,027 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=1082246.6666666667, ans=0.125 2023-11-20 12:15:58,178 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=8.96 vs. limit=15.0 2023-11-20 12:15:58,687 WARNING [train_asr.py:1506] (1/4) Exclude cut with ID unbalanced/NoNxFjwXuuc_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. 
Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-20 12:16:02,655 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 162350 2023-11-20 12:16:11,475 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.max_abs, batch_count=1082313.3333333333, ans=10.0 2023-11-20 12:16:13,793 INFO [train_asr.py:1262] (1/4) Epoch 14, batch 6050, loss[loss=0.08789, simple_loss=0.1087, pruned_loss=0.0232, audio_tagging_loss=0.01036, over 15090.00 frames. ], tot_loss[loss=0.07936, simple_loss=0.1004, pruned_loss=0.01932, audio_tagging_loss=0.009821, over 3045141.20 frames. ], batch size: 56, lr: 4.93e-03, grad_scale: 32.0 2023-11-20 12:16:34,169 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.whiten, num_groups=1, num_channels=512, metric=8.73 vs. limit=12.0 2023-11-20 12:16:40,091 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=1082513.3333333333, ans=0.09899494936611666 2023-11-20 12:16:43,673 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=1082513.3333333333, ans=0.125 2023-11-20 12:17:01,044 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=1082580.0, ans=0.04949747468305833 2023-11-20 12:17:07,525 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 162400 2023-11-20 12:17:07,778 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=1082646.6666666667, ans=0.0 2023-11-20 12:17:09,003 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=1082646.6666666667, ans=0.07 2023-11-20 12:17:19,043 INFO [train_asr.py:1262] (1/4) Epoch 14, batch 6100, loss[loss=0.06903, simple_loss=0.08841, pruned_loss=0.01774, audio_tagging_loss=0.007089, over 16194.00 frames. ], tot_loss[loss=0.07961, simple_loss=0.1003, pruned_loss=0.0195, audio_tagging_loss=0.00995, over 3047154.33 frames. ], batch size: 61, lr: 4.93e-03, grad_scale: 32.0 2023-11-20 12:17:32,765 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=9.88 vs. 
limit=15.0 2023-11-20 12:17:38,409 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=1082780.0, ans=0.125 2023-11-20 12:17:41,819 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.187e+01 8.098e+01 8.908e+01 9.804e+01 1.147e+02, threshold=1.782e+02, percent-clipped=0.0 2023-11-20 12:17:45,982 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=1082846.6666666667, ans=0.0 2023-11-20 12:18:03,672 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=1082913.3333333333, ans=0.2 2023-11-20 12:18:08,798 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=1082913.3333333333, ans=0.125 2023-11-20 12:18:12,864 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 162450 2023-11-20 12:18:13,639 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.conv_module2.whiten, num_groups=1, num_channels=192, metric=12.92 vs. limit=15.0 2023-11-20 12:18:24,011 INFO [train_asr.py:1262] (1/4) Epoch 14, batch 6150, loss[loss=0.07915, simple_loss=0.09733, pruned_loss=0.01936, audio_tagging_loss=0.01113, over 15265.00 frames. ], tot_loss[loss=0.0801, simple_loss=0.1011, pruned_loss=0.01968, audio_tagging_loss=0.009879, over 3052995.36 frames. ], batch size: 57, lr: 4.93e-03, grad_scale: 32.0 2023-11-20 12:18:34,221 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=1083046.6666666667, ans=0.2 2023-11-20 12:18:45,885 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1083113.3333333333, ans=0.1 2023-11-20 12:18:54,410 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-20 12:19:07,197 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=1083246.6666666667, ans=0.125 2023-11-20 12:19:16,407 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=1083313.3333333333, ans=0.125 2023-11-20 12:19:17,356 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 162500 2023-11-20 12:19:29,085 INFO [train_asr.py:1262] (1/4) Epoch 14, batch 6200, loss[loss=0.06438, simple_loss=0.07413, pruned_loss=0.01458, audio_tagging_loss=0.01274, over 13334.00 frames. ], tot_loss[loss=0.08002, simple_loss=0.1007, pruned_loss=0.01971, audio_tagging_loss=0.009963, over 3049049.95 frames. ], batch size: 54, lr: 4.93e-03, grad_scale: 32.0 2023-11-20 12:19:40,463 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.conv_module1.whiten, num_groups=1, num_channels=192, metric=14.92 vs. limit=15.0 2023-11-20 12:19:47,350 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=1083446.6666666667, ans=0.0 2023-11-20 12:19:50,106 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.whiten, num_groups=1, num_channels=512, metric=11.04 vs. 
limit=12.0 2023-11-20 12:19:51,146 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.044e+01 8.194e+01 8.727e+01 9.528e+01 1.309e+02, threshold=1.745e+02, percent-clipped=0.0 2023-11-20 12:19:54,689 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=12.03 vs. limit=15.0 2023-11-20 12:20:15,129 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=7.40 vs. limit=12.0 2023-11-20 12:20:15,813 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-20 12:20:21,974 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 162550 2023-11-20 12:20:27,627 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=1083646.6666666667, ans=0.125 2023-11-20 12:20:33,713 INFO [train_asr.py:1262] (1/4) Epoch 14, batch 6250, loss[loss=0.06599, simple_loss=0.08021, pruned_loss=0.01617, audio_tagging_loss=0.009724, over 15171.00 frames. ], tot_loss[loss=0.07967, simple_loss=0.1002, pruned_loss=0.01959, audio_tagging_loss=0.00996, over 3045945.26 frames. ], batch size: 58, lr: 4.93e-03, grad_scale: 32.0 2023-11-20 12:20:35,121 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=1083713.3333333333, ans=0.125 2023-11-20 12:20:42,282 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=11.23 vs. limit=22.5 2023-11-20 12:21:10,356 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=1083846.6666666667, ans=0.0 2023-11-20 12:21:26,677 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 162600 2023-11-20 12:21:29,883 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=1083980.0, ans=0.125 2023-11-20 12:21:37,969 INFO [train_asr.py:1262] (1/4) Epoch 14, batch 6300, loss[loss=0.07513, simple_loss=0.08592, pruned_loss=0.02182, audio_tagging_loss=0.01036, over 14284.00 frames. ], tot_loss[loss=0.07982, simple_loss=0.1004, pruned_loss=0.01959, audio_tagging_loss=0.01004, over 3042101.62 frames. ], batch size: 56, lr: 4.92e-03, grad_scale: 32.0 2023-11-20 12:21:38,292 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=1084046.6666666667, ans=0.0 2023-11-20 12:22:00,273 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.702e+01 8.158e+01 9.011e+01 9.819e+01 1.577e+02, threshold=1.802e+02, percent-clipped=0.0 2023-11-20 12:22:12,443 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=1084180.0, ans=0.125 2023-11-20 12:22:12,515 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.min_positive, batch_count=1084180.0, ans=0.05 2023-11-20 12:22:32,225 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 162650 2023-11-20 12:22:43,822 INFO [train_asr.py:1262] (1/4) Epoch 14, batch 6350, loss[loss=0.06138, simple_loss=0.06781, pruned_loss=0.01425, audio_tagging_loss=0.01323, over 14525.00 frames. 
], tot_loss[loss=0.07929, simple_loss=0.09942, pruned_loss=0.01944, audio_tagging_loss=0.01014, over 3037411.78 frames. ], batch size: 57, lr: 4.92e-03, grad_scale: 32.0 2023-11-20 12:22:46,023 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=9.38 vs. limit=10.0 2023-11-20 12:22:52,651 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=9.15 vs. limit=10.0 2023-11-20 12:23:02,874 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1084446.6666666667, ans=0.1 2023-11-20 12:23:08,453 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.max_abs, batch_count=1084513.3333333333, ans=10.0 2023-11-20 12:23:36,504 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 162700 2023-11-20 12:23:45,833 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.max_abs, batch_count=1084646.6666666667, ans=10.0 2023-11-20 12:23:48,111 INFO [train_asr.py:1262] (1/4) Epoch 14, batch 6400, loss[loss=0.08441, simple_loss=0.1047, pruned_loss=0.02153, audio_tagging_loss=0.01052, over 14646.00 frames. ], tot_loss[loss=0.07915, simple_loss=0.09914, pruned_loss=0.01932, audio_tagging_loss=0.01026, over 3037414.85 frames. ], batch size: 56, lr: 4.92e-03, grad_scale: 32.0 2023-11-20 12:23:51,242 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=19.76 vs. limit=22.5 2023-11-20 12:24:03,911 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1084780.0, ans=0.1 2023-11-20 12:24:10,368 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.846e+01 8.159e+01 8.686e+01 9.396e+01 1.221e+02, threshold=1.737e+02, percent-clipped=0.0 2023-11-20 12:24:28,385 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=1084913.3333333333, ans=0.0 2023-11-20 12:24:40,965 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 162750 2023-11-20 12:24:47,900 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=1084980.0, ans=0.0 2023-11-20 12:24:52,687 INFO [train_asr.py:1262] (1/4) Epoch 14, batch 6450, loss[loss=0.1017, simple_loss=0.1362, pruned_loss=0.02491, audio_tagging_loss=0.008675, over 16366.00 frames. ], tot_loss[loss=0.07936, simple_loss=0.09935, pruned_loss=0.01935, audio_tagging_loss=0.01034, over 3038929.74 frames. ], batch size: 57, lr: 4.92e-03, grad_scale: 32.0 2023-11-20 12:25:04,533 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=1085113.3333333333, ans=0.0 2023-11-20 12:25:19,988 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=18.85 vs. limit=22.5 2023-11-20 12:25:21,387 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=9.87 vs. 
limit=15.0 2023-11-20 12:25:45,773 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 162800 2023-11-20 12:25:57,220 INFO [train_asr.py:1262] (1/4) Epoch 14, batch 6500, loss[loss=0.09241, simple_loss=0.1256, pruned_loss=0.02247, audio_tagging_loss=0.007136, over 16520.00 frames. ], tot_loss[loss=0.07883, simple_loss=0.09883, pruned_loss=0.01912, audio_tagging_loss=0.01029, over 3042498.02 frames. ], batch size: 61, lr: 4.92e-03, grad_scale: 32.0 2023-11-20 12:26:10,563 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=1085446.6666666667, ans=0.2 2023-11-20 12:26:18,888 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.899e+01 8.195e+01 8.750e+01 9.288e+01 1.237e+02, threshold=1.750e+02, percent-clipped=0.0 2023-11-20 12:26:20,545 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1085446.6666666667, ans=0.1 2023-11-20 12:26:37,714 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1085580.0, ans=0.1 2023-11-20 12:26:44,778 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=1085580.0, ans=0.125 2023-11-20 12:26:49,737 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 162850 2023-11-20 12:26:49,860 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=1085646.6666666667, ans=0.0 2023-11-20 12:26:55,610 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=7.86 vs. limit=12.0 2023-11-20 12:27:01,694 INFO [train_asr.py:1262] (1/4) Epoch 14, batch 6550, loss[loss=0.08356, simple_loss=0.105, pruned_loss=0.02167, audio_tagging_loss=0.009406, over 15146.00 frames. ], tot_loss[loss=0.07897, simple_loss=0.09918, pruned_loss=0.01931, audio_tagging_loss=0.01007, over 3041588.62 frames. ], batch size: 58, lr: 4.92e-03, grad_scale: 32.0 2023-11-20 12:27:07,056 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=1085713.3333333333, ans=0.125 2023-11-20 12:27:15,044 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=1085780.0, ans=0.125 2023-11-20 12:27:17,530 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1085780.0, ans=0.1 2023-11-20 12:27:36,991 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=7.57 vs. limit=15.0 2023-11-20 12:27:51,072 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=1085913.3333333333, ans=0.0 2023-11-20 12:27:54,776 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 162900 2023-11-20 12:28:06,261 INFO [train_asr.py:1262] (1/4) Epoch 14, batch 6600, loss[loss=0.07185, simple_loss=0.08446, pruned_loss=0.01587, audio_tagging_loss=0.01376, over 15922.00 frames. ], tot_loss[loss=0.07866, simple_loss=0.0989, pruned_loss=0.01918, audio_tagging_loss=0.01003, over 3040815.61 frames. 
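Every scaling.py:213 entry prints a ScheduledFloat: a regularization hyper-parameter (dropout p, skip rate, balancer prob, scale_min, ...) whose current value ans is looked up from batch_count. A sketch of one plausible implementation, assuming piecewise-linear interpolation between (batch_count, value) breakpoints with the last value held constant; the breakpoints below are hypothetical, not the ones used in this run:

    # Hedged sketch of a ScheduledFloat-style piecewise-linear schedule.
    import bisect

    class PiecewiseLinear:
        def __init__(self, *points):
            # points: (batch_count, value) pairs, sorted by batch_count.
            self.xs = [p[0] for p in points]
            self.ys = [p[1] for p in points]

        def __call__(self, batch_count):
            if batch_count <= self.xs[0]:
                return self.ys[0]
            if batch_count >= self.xs[-1]:
                return self.ys[-1]
            i = bisect.bisect_right(self.xs, batch_count) - 1
            t = (batch_count - self.xs[i]) / (self.xs[i + 1] - self.xs[i])
            return self.ys[i] + t * (self.ys[i + 1] - self.ys[i])

    # E.g. a dropout rate annealed from 0.3 to 0.1 over the first 20k batches:
    dropout_p = PiecewiseLinear((0.0, 0.3), (20000.0, 0.1))
    print(dropout_p(1083046.0))  # 0.1 -- flat 'ans' values this late in training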
], batch size: 59, lr: 4.92e-03, grad_scale: 32.0 2023-11-20 12:28:07,887 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=1086046.6666666667, ans=0.2 2023-11-20 12:28:20,392 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=1086113.3333333333, ans=0.035 2023-11-20 12:28:27,638 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.865e+01 8.102e+01 8.649e+01 9.372e+01 1.515e+02, threshold=1.730e+02, percent-clipped=0.0 2023-11-20 12:28:45,035 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=1086246.6666666667, ans=0.125 2023-11-20 12:28:53,015 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=1086246.6666666667, ans=0.0 2023-11-20 12:28:59,020 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 162950 2023-11-20 12:29:04,879 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=1086313.3333333333, ans=0.125 2023-11-20 12:29:10,567 INFO [train_asr.py:1262] (1/4) Epoch 14, batch 6650, loss[loss=0.07756, simple_loss=0.1012, pruned_loss=0.01645, audio_tagging_loss=0.0105, over 14702.00 frames. ], tot_loss[loss=0.0783, simple_loss=0.09857, pruned_loss=0.01908, audio_tagging_loss=0.009931, over 3043040.67 frames. ], batch size: 56, lr: 4.92e-03, grad_scale: 32.0 2023-11-20 12:29:15,974 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=1086380.0, ans=0.125 2023-11-20 12:29:32,301 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.71 vs. limit=6.0 2023-11-20 12:29:34,273 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.max_positive, batch_count=1086446.6666666667, ans=0.95 2023-11-20 12:29:55,911 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=6.02 vs. limit=6.0 2023-11-20 12:30:02,819 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=1086646.6666666667, ans=0.1 2023-11-20 12:30:03,761 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 163000 2023-11-20 12:30:15,792 INFO [train_asr.py:1262] (1/4) Epoch 14, batch 6700, loss[loss=0.06118, simple_loss=0.07495, pruned_loss=0.01008, audio_tagging_loss=0.01362, over 15722.00 frames. ], tot_loss[loss=0.07933, simple_loss=0.09974, pruned_loss=0.01959, audio_tagging_loss=0.009869, over 3044738.32 frames. 
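The optim.py:476 entries report the 0/25/50/75/100 percent quantiles of recent gradient norms, plus the clipping threshold and the share of recently clipped batches. Throughout this stretch the threshold tracks 2.0 x the logged median (e.g. 1.730e+02 ~= 2 x 8.649e+01 just above), matching Clipping_scale=2.0. A sketch of that bookkeeping; deriving the threshold from a short sliding window of norms, as below, is an assumption:

    # Hedged sketch: track grad-norm quartiles and clip against a threshold
    # assumed to be clipping_scale * median of recent norms.
    from collections import deque
    import torch

    class GradNormClipper:
        def __init__(self, clipping_scale=2.0, window=128):
            self.clipping_scale = clipping_scale
            self.norms = deque(maxlen=window)

        def clip_(self, params):
            params = [p for p in params if p.grad is not None]
            norm = torch.linalg.vector_norm(
                torch.stack([p.grad.detach().norm() for p in params])).item()
            self.norms.append(norm)
            q = torch.quantile(torch.tensor(list(self.norms)),
                               torch.tensor([0.0, 0.25, 0.5, 0.75, 1.0]))
            threshold = self.clipping_scale * q[2].item()  # scale * median
            if norm > threshold:
                for p in params:
                    p.grad.mul_(threshold / norm)
            return threshold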
], batch size: 58, lr: 4.92e-03, grad_scale: 32.0 2023-11-20 12:30:17,335 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=1086713.3333333333, ans=0.2 2023-11-20 12:30:18,594 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=1086713.3333333333, ans=0.0 2023-11-20 12:30:36,748 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-20 12:30:37,676 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.914e+01 8.088e+01 8.794e+01 9.537e+01 1.389e+02, threshold=1.759e+02, percent-clipped=0.0 2023-11-20 12:30:49,527 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=1086846.6666666667, ans=0.2 2023-11-20 12:30:54,945 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=11.85 vs. limit=15.0 2023-11-20 12:31:08,496 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 163050 2023-11-20 12:31:09,841 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=1086980.0, ans=0.125 2023-11-20 12:31:20,543 INFO [train_asr.py:1262] (1/4) Epoch 14, batch 6750, loss[loss=0.1019, simple_loss=0.1399, pruned_loss=0.02321, audio_tagging_loss=0.008728, over 16503.00 frames. ], tot_loss[loss=0.07915, simple_loss=0.09984, pruned_loss=0.01939, audio_tagging_loss=0.00984, over 3040381.04 frames. ], batch size: 60, lr: 4.92e-03, grad_scale: 32.0 2023-11-20 12:31:28,070 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1087046.6666666667, ans=0.1 2023-11-20 12:31:45,073 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=7.56 vs. limit=15.0 2023-11-20 12:31:45,976 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=1087180.0, ans=0.035 2023-11-20 12:31:47,137 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=1087180.0, ans=0.0 2023-11-20 12:31:59,926 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=1087246.6666666667, ans=0.0 2023-11-20 12:31:59,958 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1087246.6666666667, ans=0.1 2023-11-20 12:32:13,360 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 163100 2023-11-20 12:32:24,818 INFO [train_asr.py:1262] (1/4) Epoch 14, batch 6800, loss[loss=0.08692, simple_loss=0.11, pruned_loss=0.02027, audio_tagging_loss=0.01163, over 15294.00 frames. ], tot_loss[loss=0.07867, simple_loss=0.09904, pruned_loss=0.01921, audio_tagging_loss=0.009944, over 3047332.68 frames. 
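The *_skip_rate entries (attention_skip_rate, conv_skip_rate, ff2_skip_rate, ...) all read ans=0.0 by this point: early in training sub-modules are stochastically skipped as a layer-drop-style regularizer, and the scheduled rate has annealed to zero. A sketch of such a skip around a residual branch, with hypothetical names and the residual form assumed:

    # Hedged sketch of a scheduled stochastic skip around a sub-module.
    # By batch ~1.08e6 the logged rates are 0.0, so the branch always runs.
    import torch

    def maybe_skip(module, x, skip_rate, training=True):
        if training and torch.rand(()) < skip_rate:
            return x                 # skip the sub-module, keep the residual path
        return x + module(x)         # normal residual connection

    ff = torch.nn.Linear(192, 192)
    y = maybe_skip(ff, torch.randn(10, 192), skip_rate=0.0)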
], batch size: 60, lr: 4.92e-03, grad_scale: 32.0 2023-11-20 12:32:41,360 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=1087446.6666666667, ans=0.0 2023-11-20 12:32:45,878 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.532e+01 7.924e+01 8.642e+01 9.401e+01 1.270e+02, threshold=1.728e+02, percent-clipped=0.0 2023-11-20 12:32:50,234 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=11.85 vs. limit=15.0 2023-11-20 12:33:03,702 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=8.98 vs. limit=10.0 2023-11-20 12:33:17,168 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 163150 2023-11-20 12:33:28,576 INFO [train_asr.py:1262] (1/4) Epoch 14, batch 6850, loss[loss=0.08878, simple_loss=0.1234, pruned_loss=0.02041, audio_tagging_loss=0.006681, over 14036.00 frames. ], tot_loss[loss=0.07882, simple_loss=0.09944, pruned_loss=0.01922, audio_tagging_loss=0.009878, over 3050435.79 frames. ], batch size: 52, lr: 4.92e-03, grad_scale: 32.0 2023-11-20 12:34:07,276 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=1087913.3333333333, ans=0.125 2023-11-20 12:34:13,907 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=19.93 vs. limit=22.5 2023-11-20 12:34:14,588 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.min_abs, batch_count=1087913.3333333333, ans=0.5 2023-11-20 12:34:21,355 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 163200 2023-11-20 12:34:32,693 INFO [train_asr.py:1262] (1/4) Epoch 14, batch 6900, loss[loss=0.08561, simple_loss=0.1074, pruned_loss=0.02132, audio_tagging_loss=0.01061, over 15096.00 frames. ], tot_loss[loss=0.07935, simple_loss=0.1002, pruned_loss=0.01942, audio_tagging_loss=0.009832, over 3047464.42 frames. ], batch size: 58, lr: 4.92e-03, grad_scale: 32.0 2023-11-20 12:34:41,673 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=1088046.6666666667, ans=0.0 2023-11-20 12:34:42,839 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1088046.6666666667, ans=0.1 2023-11-20 12:34:55,158 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.594e+01 8.124e+01 8.683e+01 9.436e+01 1.192e+02, threshold=1.737e+02, percent-clipped=0.0 2023-11-20 12:35:07,144 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1088180.0, ans=0.0 2023-11-20 12:35:08,858 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=9.86 vs. limit=15.0 2023-11-20 12:35:23,551 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=512, metric=4.49 vs. limit=15.0 2023-11-20 12:35:23,579 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.57 vs. 
limit=6.0 2023-11-20 12:35:24,867 WARNING [train_asr.py:1506] (1/4) Exclude cut with ID unbalanced/Xez1ffAcb0w_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-20 12:35:26,136 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 163250 2023-11-20 12:35:34,681 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=12.35 vs. limit=15.0 2023-11-20 12:35:38,408 INFO [train_asr.py:1262] (1/4) Epoch 14, batch 6950, loss[loss=0.06142, simple_loss=0.07272, pruned_loss=0.0146, audio_tagging_loss=0.01045, over 14377.00 frames. ], tot_loss[loss=0.07926, simple_loss=0.09996, pruned_loss=0.01938, audio_tagging_loss=0.009895, over 3036402.21 frames. ], batch size: 55, lr: 4.92e-03, grad_scale: 32.0 2023-11-20 12:35:39,884 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1088380.0, ans=0.125 2023-11-20 12:35:53,475 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=1088446.6666666667, ans=0.0 2023-11-20 12:35:56,970 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=13.10 vs. limit=15.0 2023-11-20 12:35:58,750 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=1088446.6666666667, ans=0.125 2023-11-20 12:36:03,769 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=1088513.3333333333, ans=0.0 2023-11-20 12:36:29,684 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=8.68 vs. limit=10.0 2023-11-20 12:36:31,657 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 163300 2023-11-20 12:36:35,679 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=1088646.6666666667, ans=0.0 2023-11-20 12:36:38,247 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.12 vs. limit=15.0 2023-11-20 12:36:42,513 INFO [train_asr.py:1262] (1/4) Epoch 14, batch 7000, loss[loss=0.08877, simple_loss=0.1148, pruned_loss=0.02083, audio_tagging_loss=0.01055, over 15025.00 frames. ], tot_loss[loss=0.07881, simple_loss=0.09908, pruned_loss=0.01929, audio_tagging_loss=0.009981, over 3034319.84 frames. 
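The WARNING above shows the transducer length check in action: a 1-second AudioSet clip yields 100 feature frames but only 23 encoder frames after subsampling, fewer than its 24 BPE tokens, so the cut is excluded (transducer training needs at least as many encoder frames as output tokens, and these placeholder-text cuts cannot satisfy that). A minimal sketch of such a filter; the subsampling formula below is one choice that reproduces the logged 100 -> 23 mapping, and the names are illustrative:

    # Hedged sketch of the length filter implied by the WARNING lines.
    def frames_after_subsampling(num_frames):
        # Conv front-end shape consistent with the logged 100 -> 23 mapping;
        # the real front-end may differ.
        return ((num_frames - 7) // 2 + 1) // 2

    def keep_cut(num_frames, num_tokens):
        return frames_after_subsampling(num_frames) >= num_tokens

    print(frames_after_subsampling(100))  # 23, as in the warning
    print(keep_cut(100, 24))              # False -> cut excluded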
], batch size: 55, lr: 4.91e-03, grad_scale: 32.0 2023-11-20 12:36:49,530 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=1088713.3333333333, ans=0.125 2023-11-20 12:36:55,625 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=1088780.0, ans=0.0 2023-11-20 12:36:58,099 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=1088780.0, ans=0.2 2023-11-20 12:37:04,380 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.412e+01 8.025e+01 8.662e+01 9.457e+01 1.125e+02, threshold=1.732e+02, percent-clipped=0.0 2023-11-20 12:37:06,316 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=8.75 vs. limit=15.0 2023-11-20 12:37:14,008 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=1088846.6666666667, ans=0.125 2023-11-20 12:37:24,424 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=1088913.3333333333, ans=0.0 2023-11-20 12:37:35,892 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 163350 2023-11-20 12:37:45,032 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=9.33 vs. limit=10.0 2023-11-20 12:37:46,691 INFO [train_asr.py:1262] (1/4) Epoch 14, batch 7050, loss[loss=0.05372, simple_loss=0.06177, pruned_loss=0.00719, audio_tagging_loss=0.01565, over 15324.00 frames. ], tot_loss[loss=0.07847, simple_loss=0.09844, pruned_loss=0.01907, audio_tagging_loss=0.01017, over 3035008.00 frames. ], batch size: 59, lr: 4.91e-03, grad_scale: 32.0 2023-11-20 12:37:56,140 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1089046.6666666667, ans=0.125 2023-11-20 12:38:17,843 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=1089180.0, ans=0.0 2023-11-20 12:38:21,606 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=1089180.0, ans=0.0 2023-11-20 12:38:23,318 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=1089180.0, ans=0.125 2023-11-20 12:38:35,574 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=1089246.6666666667, ans=0.0 2023-11-20 12:38:39,740 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 163400 2023-11-20 12:38:48,998 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=1089313.3333333333, ans=0.2 2023-11-20 12:38:51,915 INFO [train_asr.py:1262] (1/4) Epoch 14, batch 7100, loss[loss=0.08274, simple_loss=0.1169, pruned_loss=0.01773, audio_tagging_loss=0.006546, over 16382.00 frames. ], tot_loss[loss=0.07939, simple_loss=0.09993, pruned_loss=0.0193, audio_tagging_loss=0.01013, over 3043899.17 frames. 
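Each scaling.py:1022 entry compares a per-module whitening metric against a scheduled limit; the metric grows as the channel covariance of a layer's output drifts away from a multiple of the identity, and a penalty applies only when it exceeds the limit. The exact metric cannot be recovered from the log; the eigenvalue-ratio version below is an illustrative stand-in with the same qualitative behavior:

    # Hedged sketch of a whitening metric: ratio of mean squared eigenvalue
    # to squared mean eigenvalue of the channel covariance. Equals 1.0 for
    # perfectly white features and grows with anisotropy. Illustrative only.
    import torch

    def whitening_metric(x):
        # x: (num_frames, num_channels)
        x = x - x.mean(dim=0, keepdim=True)
        cov = (x.T @ x) / x.shape[0]
        eigs = torch.linalg.eigvalsh(cov)
        return (eigs.pow(2).mean() / eigs.mean().pow(2)).item()

    x = torch.randn(1000, 192)   # near-white: metric close to 1
    print(whitening_metric(x))
    x[:, 0] *= 10.0              # one dominant channel: metric grows
    print(whitening_metric(x))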
], batch size: 60, lr: 4.91e-03, grad_scale: 32.0 2023-11-20 12:39:14,018 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.595e+01 8.043e+01 8.663e+01 9.375e+01 1.240e+02, threshold=1.733e+02, percent-clipped=0.0 2023-11-20 12:39:20,480 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1089513.3333333333, ans=0.125 2023-11-20 12:39:25,725 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module2.whiten, num_groups=1, num_channels=512, metric=7.76 vs. limit=15.0 2023-11-20 12:39:45,230 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 163450 2023-11-20 12:39:48,231 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.whiten.whitening_limit, batch_count=1089646.6666666667, ans=15.0 2023-11-20 12:39:55,999 INFO [train_asr.py:1262] (1/4) Epoch 14, batch 7150, loss[loss=0.07783, simple_loss=0.09186, pruned_loss=0.02233, audio_tagging_loss=0.009571, over 14589.00 frames. ], tot_loss[loss=0.07885, simple_loss=0.09938, pruned_loss=0.01899, audio_tagging_loss=0.01018, over 3048319.55 frames. ], batch size: 58, lr: 4.91e-03, grad_scale: 32.0 2023-11-20 12:40:03,493 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=8.36 vs. limit=10.0 2023-11-20 12:40:06,128 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=7.06 vs. limit=12.0 2023-11-20 12:40:12,255 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=12.84 vs. limit=15.0 2023-11-20 12:40:48,649 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 163500 2023-11-20 12:41:00,078 INFO [train_asr.py:1262] (1/4) Epoch 14, batch 7200, loss[loss=0.0737, simple_loss=0.09344, pruned_loss=0.0151, audio_tagging_loss=0.01189, over 15656.00 frames. ], tot_loss[loss=0.07861, simple_loss=0.09882, pruned_loss=0.01892, audio_tagging_loss=0.01028, over 3051436.47 frames. 
], batch size: 62, lr: 4.91e-03, grad_scale: 32.0 2023-11-20 12:41:00,359 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=1090046.6666666667, ans=0.09899494936611666 2023-11-20 12:41:07,209 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.min_positive, batch_count=1090046.6666666667, ans=0.025 2023-11-20 12:41:18,009 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=1090113.3333333333, ans=0.0 2023-11-20 12:41:20,144 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.whiten_keys.whitening_limit, batch_count=1090113.3333333333, ans=6.0 2023-11-20 12:41:23,963 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.741e+01 8.075e+01 8.632e+01 9.241e+01 3.399e+02, threshold=1.726e+02, percent-clipped=1.0 2023-11-20 12:41:34,495 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=1090180.0, ans=0.125 2023-11-20 12:41:52,193 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=1090313.3333333333, ans=0.09899494936611666 2023-11-20 12:41:53,235 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 163550 2023-11-20 12:41:56,498 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=1090313.3333333333, ans=0.125 2023-11-20 12:42:02,916 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=13.42 vs. limit=15.0 2023-11-20 12:42:04,802 INFO [train_asr.py:1262] (1/4) Epoch 14, batch 7250, loss[loss=0.1107, simple_loss=0.1478, pruned_loss=0.0308, audio_tagging_loss=0.006004, over 15258.00 frames. ], tot_loss[loss=0.0792, simple_loss=0.09982, pruned_loss=0.01904, audio_tagging_loss=0.01025, over 3046498.60 frames. ], batch size: 56, lr: 4.91e-03, grad_scale: 32.0 2023-11-20 12:42:05,329 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=12.34 vs. limit=15.0 2023-11-20 12:42:37,582 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=7.83 vs. limit=10.0 2023-11-20 12:42:48,907 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=1090580.0, ans=0.2 2023-11-20 12:42:57,778 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 163600 2023-11-20 12:43:01,047 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.02 vs. limit=15.0 2023-11-20 12:43:09,572 INFO [train_asr.py:1262] (1/4) Epoch 14, batch 7300, loss[loss=0.0784, simple_loss=0.1013, pruned_loss=0.01818, audio_tagging_loss=0.009584, over 16494.00 frames. ], tot_loss[loss=0.07875, simple_loss=0.09937, pruned_loss=0.01895, audio_tagging_loss=0.01011, over 3050368.97 frames. 
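Each train_asr.py:1262 entry reports both the current batch's loss (over ~15k frames) and tot_loss, a running average over roughly 3 million frames. The fractional frame counts (e.g. 3046498.60) suggest an exponentially decayed, frame-weighted sum rather than a hard window. A sketch consistent with that readout; the decay constant is a guess, chosen so the steady-state frame count (~15000 / 0.005 = 3.0e6) matches the log:

    # Hedged sketch: frame-weighted running statistics like tot_loss.
    class RunningLoss:
        def __init__(self, decay=0.995):
            self.decay = decay
            self.loss_sum = 0.0
            self.frame_sum = 0.0

        def update(self, batch_loss, batch_frames):
            self.loss_sum = self.decay * self.loss_sum + batch_loss * batch_frames
            self.frame_sum = self.decay * self.frame_sum + batch_frames
            return self.loss_sum / self.frame_sum  # reported as tot_loss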
], batch size: 60, lr: 4.91e-03, grad_scale: 32.0 2023-11-20 12:43:11,062 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1090713.3333333333, ans=0.0 2023-11-20 12:43:24,257 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=6.02 vs. limit=15.0 2023-11-20 12:43:31,053 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=1090780.0, ans=0.125 2023-11-20 12:43:31,935 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.842e+01 8.253e+01 8.937e+01 9.591e+01 1.343e+02, threshold=1.787e+02, percent-clipped=0.0 2023-11-20 12:44:01,930 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 163650 2023-11-20 12:44:03,383 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-11-20 12:44:13,397 INFO [train_asr.py:1262] (1/4) Epoch 14, batch 7350, loss[loss=0.06054, simple_loss=0.07027, pruned_loss=0.01423, audio_tagging_loss=0.01118, over 14999.00 frames. ], tot_loss[loss=0.07843, simple_loss=0.09909, pruned_loss=0.01883, audio_tagging_loss=0.01005, over 3046677.90 frames. ], batch size: 58, lr: 4.91e-03, grad_scale: 32.0 2023-11-20 12:44:19,809 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=1091046.6666666667, ans=0.125 2023-11-20 12:44:23,961 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=1091046.6666666667, ans=0.2 2023-11-20 12:45:00,908 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1091246.6666666667, ans=0.1 2023-11-20 12:45:05,584 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 163700 2023-11-20 12:45:07,041 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=1091313.3333333333, ans=0.0 2023-11-20 12:45:11,177 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=1091313.3333333333, ans=0.125 2023-11-20 12:45:17,054 INFO [train_asr.py:1262] (1/4) Epoch 14, batch 7400, loss[loss=0.07949, simple_loss=0.09742, pruned_loss=0.02204, audio_tagging_loss=0.008735, over 14846.00 frames. ], tot_loss[loss=0.07856, simple_loss=0.09927, pruned_loss=0.01896, audio_tagging_loss=0.00997, over 3048237.32 frames. 
], batch size: 57, lr: 4.91e-03, grad_scale: 16.0 2023-11-20 12:45:21,113 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.min_positive, batch_count=1091380.0, ans=0.025 2023-11-20 12:45:39,782 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=1091446.6666666667, ans=0.0 2023-11-20 12:45:41,930 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.771e+01 8.379e+01 9.065e+01 9.616e+01 1.278e+02, threshold=1.813e+02, percent-clipped=0.0 2023-11-20 12:45:44,184 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=1091513.3333333333, ans=0.2 2023-11-20 12:45:44,218 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=1091513.3333333333, ans=0.2 2023-11-20 12:45:49,377 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=15.04 vs. limit=22.5 2023-11-20 12:45:58,992 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=1091580.0, ans=0.125 2023-11-20 12:46:05,568 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1091580.0, ans=0.125 2023-11-20 12:46:08,082 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=1091646.6666666667, ans=0.125 2023-11-20 12:46:10,317 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 163750 2023-11-20 12:46:21,963 INFO [train_asr.py:1262] (1/4) Epoch 14, batch 7450, loss[loss=0.0815, simple_loss=0.1038, pruned_loss=0.01959, audio_tagging_loss=0.009997, over 15625.00 frames. ], tot_loss[loss=0.07859, simple_loss=0.09924, pruned_loss=0.01907, audio_tagging_loss=0.009897, over 3042447.38 frames. ], batch size: 58, lr: 4.91e-03, grad_scale: 16.0 2023-11-20 12:46:23,432 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=1091713.3333333333, ans=0.125 2023-11-20 12:46:47,997 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=1091846.6666666667, ans=0.0 2023-11-20 12:47:06,598 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=1091913.3333333333, ans=0.125 2023-11-20 12:47:14,951 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 163800 2023-11-20 12:47:16,939 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=1091980.0, ans=0.09899494936611666 2023-11-20 12:47:24,357 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.whiten, num_groups=1, num_channels=512, metric=8.99 vs. limit=12.0 2023-11-20 12:47:25,316 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.min_positive, batch_count=1091980.0, ans=0.025 2023-11-20 12:47:27,448 INFO [train_asr.py:1262] (1/4) Epoch 14, batch 7500, loss[loss=0.1112, simple_loss=0.1398, pruned_loss=0.03185, audio_tagging_loss=0.009451, over 14630.00 frames. ], tot_loss[loss=0.07921, simple_loss=0.1, pruned_loss=0.01935, audio_tagging_loss=0.009844, over 3040761.29 frames. 
], batch size: 55, lr: 4.91e-03, grad_scale: 16.0 2023-11-20 12:47:40,669 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=512, metric=19.60 vs. limit=22.5 2023-11-20 12:47:44,501 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=9.91 vs. limit=15.0 2023-11-20 12:47:51,403 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.921e+01 8.205e+01 8.885e+01 9.811e+01 1.439e+02, threshold=1.777e+02, percent-clipped=0.0 2023-11-20 12:48:19,231 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 163850 2023-11-20 12:48:27,461 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=1092313.3333333333, ans=0.0 2023-11-20 12:48:30,687 INFO [train_asr.py:1262] (1/4) Epoch 14, batch 7550, loss[loss=0.07239, simple_loss=0.08092, pruned_loss=0.01937, audio_tagging_loss=0.01256, over 15335.00 frames. ], tot_loss[loss=0.07893, simple_loss=0.09957, pruned_loss=0.01936, audio_tagging_loss=0.009791, over 3043220.58 frames. ], batch size: 58, lr: 4.91e-03, grad_scale: 16.0 2023-11-20 12:48:33,679 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=9.61 vs. limit=15.0 2023-11-20 12:48:59,825 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=14.02 vs. limit=15.0 2023-11-20 12:49:11,043 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=9.23 vs. limit=15.0 2023-11-20 12:49:14,859 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=19.62 vs. limit=22.5 2023-11-20 12:49:18,768 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1092580.0, ans=0.1 2023-11-20 12:49:22,856 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=10.50 vs. limit=15.0 2023-11-20 12:49:23,460 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 163900 2023-11-20 12:49:27,187 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=1092646.6666666667, ans=0.125 2023-11-20 12:49:34,869 INFO [train_asr.py:1262] (1/4) Epoch 14, batch 7600, loss[loss=0.06633, simple_loss=0.09146, pruned_loss=0.01283, audio_tagging_loss=0.007763, over 13769.00 frames. ], tot_loss[loss=0.07899, simple_loss=0.09976, pruned_loss=0.01927, audio_tagging_loss=0.009839, over 3047417.19 frames. 
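The learning rate in these entries decays very slowly (4.93e-03 down to 4.90e-03 across ~1.5k batches), consistent with a schedule that depends smoothly on both batch index and epoch rather than on discrete milestones. A sketch of an Eden-style schedule with that shape; the constants are hypothetical, picked only so the output lands near the logged values:

    # Hedged sketch of an Eden-style LR schedule (hypothetical constants):
    # lr = base_lr * ((batch/lr_batches)^2 + 1)^-0.25
    #              * ((epoch/lr_epochs)^2 + 1)^-0.25
    def eden_lr(batch, epoch, base_lr=0.045, lr_batches=7500.0, lr_epochs=3.5):
        return (base_lr
                * ((batch / lr_batches) ** 2 + 1) ** -0.25
                * ((epoch / lr_epochs) ** 2 + 1) ** -0.25)

    print(eden_lr(163000, 14.0))  # ~4.8e-03, same ballpark as the logged 4.91e-03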
], batch size: 53, lr: 4.91e-03, grad_scale: 32.0 2023-11-20 12:49:37,712 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=1092713.3333333333, ans=0.125 2023-11-20 12:49:53,323 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=1092780.0, ans=0.2 2023-11-20 12:49:59,227 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.781e+01 8.009e+01 8.543e+01 9.202e+01 1.104e+02, threshold=1.709e+02, percent-clipped=0.0 2023-11-20 12:50:02,042 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=1092846.6666666667, ans=0.125 2023-11-20 12:50:12,339 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=1092913.3333333333, ans=0.1 2023-11-20 12:50:27,539 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 163950 2023-11-20 12:50:39,677 INFO [train_asr.py:1262] (1/4) Epoch 14, batch 7650, loss[loss=0.0957, simple_loss=0.1287, pruned_loss=0.02462, audio_tagging_loss=0.006714, over 14917.00 frames. ], tot_loss[loss=0.07873, simple_loss=0.09958, pruned_loss=0.01914, audio_tagging_loss=0.009799, over 3051644.38 frames. ], batch size: 56, lr: 4.90e-03, grad_scale: 32.0 2023-11-20 12:50:43,474 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1093046.6666666667, ans=0.125 2023-11-20 12:50:58,680 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1093113.3333333333, ans=0.1 2023-11-20 12:51:00,410 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=7.66 vs. limit=15.0 2023-11-20 12:51:02,249 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=1093113.3333333333, ans=0.125 2023-11-20 12:51:20,489 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.conv_module1.whiten, num_groups=1, num_channels=192, metric=13.63 vs. limit=15.0 2023-11-20 12:51:31,746 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 164000 2023-11-20 12:51:40,974 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1093313.3333333333, ans=0.1 2023-11-20 12:51:47,543 INFO [train_asr.py:1262] (1/4) Epoch 14, batch 7700, loss[loss=0.08279, simple_loss=0.1125, pruned_loss=0.01945, audio_tagging_loss=0.007108, over 16112.00 frames. ], tot_loss[loss=0.07881, simple_loss=0.09966, pruned_loss=0.01917, audio_tagging_loss=0.009809, over 3047681.14 frames. 
], batch size: 59, lr: 4.90e-03, grad_scale: 32.0 2023-11-20 12:51:52,734 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=1093380.0, ans=0.0 2023-11-20 12:52:11,268 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.534e+01 7.870e+01 8.763e+01 9.438e+01 1.322e+02, threshold=1.753e+02, percent-clipped=0.0 2023-11-20 12:52:11,519 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=1093513.3333333333, ans=0.035 2023-11-20 12:52:14,021 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=1093513.3333333333, ans=0.125 2023-11-20 12:52:17,062 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=1093513.3333333333, ans=0.04949747468305833 2023-11-20 12:52:17,171 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=1093513.3333333333, ans=0.125 2023-11-20 12:52:38,769 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1093646.6666666667, ans=0.1 2023-11-20 12:52:39,895 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 164050 2023-11-20 12:52:42,505 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=1093646.6666666667, ans=0.0 2023-11-20 12:52:51,351 INFO [train_asr.py:1262] (1/4) Epoch 14, batch 7750, loss[loss=0.06275, simple_loss=0.07506, pruned_loss=0.01526, audio_tagging_loss=0.009958, over 15218.00 frames. ], tot_loss[loss=0.07876, simple_loss=0.09973, pruned_loss=0.01907, audio_tagging_loss=0.009824, over 3050800.81 frames. ], batch size: 61, lr: 4.90e-03, grad_scale: 32.0 2023-11-20 12:53:05,597 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=1093780.0, ans=0.2 2023-11-20 12:53:14,704 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=1093780.0, ans=0.125 2023-11-20 12:53:29,160 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=1093913.3333333333, ans=0.125 2023-11-20 12:53:33,066 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=7.06 vs. limit=15.0 2023-11-20 12:53:44,128 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 164100 2023-11-20 12:53:55,666 INFO [train_asr.py:1262] (1/4) Epoch 14, batch 7800, loss[loss=0.09944, simple_loss=0.1203, pruned_loss=0.02822, audio_tagging_loss=0.01108, over 15805.00 frames. ], tot_loss[loss=0.07897, simple_loss=0.09987, pruned_loss=0.01913, audio_tagging_loss=0.009904, over 3046924.07 frames. 
], batch size: 58, lr: 4.90e-03, grad_scale: 32.0 2023-11-20 12:53:56,055 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=1094046.6666666667, ans=0.2 2023-11-20 12:54:05,026 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1094046.6666666667, ans=0.1 2023-11-20 12:54:20,233 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.543e+01 8.289e+01 9.083e+01 1.010e+02 1.614e+02, threshold=1.817e+02, percent-clipped=0.0 2023-11-20 12:54:48,373 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 164150 2023-11-20 12:54:49,905 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=1094313.3333333333, ans=0.0 2023-11-20 12:54:52,826 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.min_abs, batch_count=1094313.3333333333, ans=0.5 2023-11-20 12:54:59,881 INFO [train_asr.py:1262] (1/4) Epoch 14, batch 7850, loss[loss=0.0931, simple_loss=0.1104, pruned_loss=0.02847, audio_tagging_loss=0.009437, over 15130.00 frames. ], tot_loss[loss=0.07893, simple_loss=0.09945, pruned_loss=0.01919, audio_tagging_loss=0.01001, over 3048581.71 frames. ], batch size: 56, lr: 4.90e-03, grad_scale: 32.0 2023-11-20 12:55:53,658 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 164200 2023-11-20 12:56:05,324 INFO [train_asr.py:1262] (1/4) Epoch 14, batch 7900, loss[loss=0.07247, simple_loss=0.09576, pruned_loss=0.01286, audio_tagging_loss=0.01172, over 16305.00 frames. ], tot_loss[loss=0.07877, simple_loss=0.09905, pruned_loss=0.01906, audio_tagging_loss=0.01018, over 3054996.85 frames. ], batch size: 62, lr: 4.90e-03, grad_scale: 32.0 2023-11-20 12:56:11,767 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-20 12:56:18,910 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=1094780.0, ans=0.125 2023-11-20 12:56:28,983 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.813e+01 8.207e+01 9.087e+01 9.691e+01 1.187e+02, threshold=1.817e+02, percent-clipped=0.0 2023-11-20 12:56:45,828 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=12.31 vs. limit=15.0 2023-11-20 12:56:58,008 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 164250 2023-11-20 12:57:09,132 INFO [train_asr.py:1262] (1/4) Epoch 14, batch 7950, loss[loss=0.06385, simple_loss=0.07408, pruned_loss=0.01342, audio_tagging_loss=0.01339, over 15741.00 frames. ], tot_loss[loss=0.07858, simple_loss=0.0985, pruned_loss=0.01901, audio_tagging_loss=0.01032, over 3047023.08 frames. ], batch size: 60, lr: 4.90e-03, grad_scale: 32.0 2023-11-20 12:57:09,378 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=1095046.6666666667, ans=0.0 2023-11-20 12:57:26,265 WARNING [train_asr.py:1506] (1/4) Exclude cut with ID unbalanced/uQjH4tNUZ_g_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. 
Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-20 12:57:27,712 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=1095113.3333333333, ans=0.2 2023-11-20 12:57:57,483 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=11.29 vs. limit=15.0 2023-11-20 12:58:02,379 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 164300 2023-11-20 12:58:07,595 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=12.56 vs. limit=15.0 2023-11-20 12:58:13,256 INFO [train_asr.py:1262] (1/4) Epoch 14, batch 8000, loss[loss=0.08119, simple_loss=0.1, pruned_loss=0.01964, audio_tagging_loss=0.01155, over 15452.00 frames. ], tot_loss[loss=0.07842, simple_loss=0.09818, pruned_loss=0.01896, audio_tagging_loss=0.01037, over 3049652.03 frames. ], batch size: 57, lr: 4.90e-03, grad_scale: 32.0 2023-11-20 12:58:26,972 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=1095446.6666666667, ans=0.125 2023-11-20 12:58:39,391 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.429e+01 8.050e+01 8.646e+01 9.454e+01 1.422e+02, threshold=1.729e+02, percent-clipped=0.0 2023-11-20 12:58:42,118 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=1095513.3333333333, ans=0.125 2023-11-20 12:58:52,222 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=512, metric=5.43 vs. limit=15.0 2023-11-20 12:58:55,672 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=12.72 vs. limit=15.0 2023-11-20 12:58:57,811 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1095580.0, ans=0.1 2023-11-20 12:59:01,561 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=1095580.0, ans=0.0 2023-11-20 12:59:06,793 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 164350 2023-11-20 12:59:09,933 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=2.24 vs. limit=15.0 2023-11-20 12:59:17,721 INFO [train_asr.py:1262] (1/4) Epoch 14, batch 8050, loss[loss=0.07446, simple_loss=0.09497, pruned_loss=0.01704, audio_tagging_loss=0.009936, over 16239.00 frames. ], tot_loss[loss=0.07821, simple_loss=0.09767, pruned_loss=0.01905, audio_tagging_loss=0.01033, over 3050344.57 frames. ], batch size: 63, lr: 4.90e-03, grad_scale: 16.0 2023-11-20 12:59:26,294 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=7.78 vs. 
limit=15.0 2023-11-20 12:59:30,858 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1095780.0, ans=0.0 2023-11-20 12:59:34,631 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=1095780.0, ans=0.2 2023-11-20 12:59:51,411 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.whiten, num_groups=1, num_channels=192, metric=3.47 vs. limit=12.0 2023-11-20 12:59:55,467 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=10.03 vs. limit=15.0 2023-11-20 13:00:08,052 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=1095980.0, ans=0.125 2023-11-20 13:00:10,873 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 164400 2023-11-20 13:00:19,109 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=512, metric=3.11 vs. limit=15.0 2023-11-20 13:00:20,417 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=11.96 vs. limit=15.0 2023-11-20 13:00:22,220 INFO [train_asr.py:1262] (1/4) Epoch 14, batch 8100, loss[loss=0.05729, simple_loss=0.06695, pruned_loss=0.01228, audio_tagging_loss=0.01154, over 14999.00 frames. ], tot_loss[loss=0.07734, simple_loss=0.09654, pruned_loss=0.01876, audio_tagging_loss=0.01031, over 3050287.04 frames. ], batch size: 57, lr: 4.90e-03, grad_scale: 8.0 2023-11-20 13:00:44,655 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=1096113.3333333333, ans=0.125 2023-11-20 13:00:48,969 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=1096180.0, ans=0.125 2023-11-20 13:00:50,973 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.227e+01 8.681e+01 9.657e+01 1.040e+02 1.994e+02, threshold=1.931e+02, percent-clipped=2.0 2023-11-20 13:00:52,591 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=1096180.0, ans=0.125 2023-11-20 13:01:15,647 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 164450 2023-11-20 13:01:22,977 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=1096313.3333333333, ans=0.0 2023-11-20 13:01:26,631 INFO [train_asr.py:1262] (1/4) Epoch 14, batch 8150, loss[loss=0.05641, simple_loss=0.07784, pruned_loss=0.01068, audio_tagging_loss=0.006817, over 14995.00 frames. ], tot_loss[loss=0.07764, simple_loss=0.09731, pruned_loss=0.01884, audio_tagging_loss=0.01014, over 3053531.20 frames. 
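grad_scale in these entries halves from 32.0 to 16.0 (around batch 7400), recovers to 32.0 by batch 7600, then halves twice more to 8.0 (around batches 8050 and 8100): the signature of fp16 dynamic loss scaling backing off after overflowing gradients and growing back after a run of clean steps. A minimal sketch of that mechanism using torch.cuda.amp.GradScaler; the init and growth settings are illustrative, and the actual growth interval in this run appears shorter than the PyTorch default:

    # Hedged sketch of fp16 dynamic loss scaling as in torch.cuda.amp.
    import torch

    scaler = torch.cuda.amp.GradScaler(
        init_scale=32.0,      # matches the grad_scale seen earlier in the log
        backoff_factor=0.5,   # 32 -> 16 -> 8 on successive overflows
        growth_factor=2.0,
        growth_interval=500,  # illustrative; clean steps before growing again
    )

    # Typical step (model/optimizer/batch omitted):
    # with torch.cuda.amp.autocast():
    #     loss = compute_loss(batch)
    # scaler.scale(loss).backward()
    # scaler.step(optimizer)
    # scaler.update()         # this is where grad_scale halves on overflow
    # print(scaler.get_scale())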
], batch size: 58, lr: 4.90e-03, grad_scale: 8.0 2023-11-20 13:01:41,754 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=1096446.6666666667, ans=0.125 2023-11-20 13:01:50,968 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=1096446.6666666667, ans=0.2 2023-11-20 13:01:58,248 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=1096513.3333333333, ans=0.125 2023-11-20 13:01:59,884 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=512, metric=19.72 vs. limit=22.5 2023-11-20 13:02:00,725 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=1096513.3333333333, ans=0.125 2023-11-20 13:02:02,096 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=1096513.3333333333, ans=0.125 2023-11-20 13:02:19,694 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 164500 2023-11-20 13:02:26,484 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=1096646.6666666667, ans=0.0 2023-11-20 13:02:31,741 INFO [train_asr.py:1262] (1/4) Epoch 14, batch 8200, loss[loss=0.06603, simple_loss=0.07667, pruned_loss=0.01671, audio_tagging_loss=0.01099, over 15392.00 frames. ], tot_loss[loss=0.07792, simple_loss=0.0979, pruned_loss=0.01899, audio_tagging_loss=0.009982, over 3060906.19 frames. ], batch size: 57, lr: 4.90e-03, grad_scale: 8.0 2023-11-20 13:02:33,005 WARNING [train_asr.py:1506] (1/4) Exclude cut with ID unbalanced/8C7biyx9TQ4_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-20 13:02:39,924 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=1096713.3333333333, ans=0.0 2023-11-20 13:02:49,833 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1096780.0, ans=0.125 2023-11-20 13:03:00,014 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.486e+01 8.378e+01 8.960e+01 9.917e+01 5.109e+02, threshold=1.792e+02, percent-clipped=1.0 2023-11-20 13:03:03,039 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=1096846.6666666667, ans=0.0 2023-11-20 13:03:09,349 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=4.57 vs. 
limit=12.0 2023-11-20 13:03:24,782 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 164550 2023-11-20 13:03:27,459 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=1096980.0, ans=0.0 2023-11-20 13:03:29,257 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=1096980.0, ans=0.09899494936611666 2023-11-20 13:03:35,859 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.32 vs. limit=6.0 2023-11-20 13:03:36,290 INFO [train_asr.py:1262] (1/4) Epoch 14, batch 8250, loss[loss=0.07878, simple_loss=0.08998, pruned_loss=0.022, audio_tagging_loss=0.0118, over 14840.00 frames. ], tot_loss[loss=0.078, simple_loss=0.09799, pruned_loss=0.01905, audio_tagging_loss=0.009951, over 3055554.00 frames. ], batch size: 55, lr: 4.90e-03, grad_scale: 8.0 2023-11-20 13:03:37,918 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=1097046.6666666667, ans=0.125 2023-11-20 13:03:51,794 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=1097113.3333333333, ans=0.125 2023-11-20 13:04:26,998 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=9.75 vs. limit=15.0 2023-11-20 13:04:27,473 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=1097313.3333333333, ans=0.1 2023-11-20 13:04:28,666 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 164600 2023-11-20 13:04:38,528 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=11.65 vs. limit=15.0 2023-11-20 13:04:40,207 INFO [train_asr.py:1262] (1/4) Epoch 14, batch 8300, loss[loss=0.07124, simple_loss=0.09686, pruned_loss=0.01298, audio_tagging_loss=0.009833, over 15730.00 frames. ], tot_loss[loss=0.0786, simple_loss=0.09914, pruned_loss=0.01913, audio_tagging_loss=0.0099, over 3058788.17 frames. ], batch size: 58, lr: 4.90e-03, grad_scale: 8.0 2023-11-20 13:04:45,851 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn1.whiten.whitening_limit, batch_count=1097380.0, ans=22.5 2023-11-20 13:04:49,648 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=1097380.0, ans=0.0 2023-11-20 13:04:58,723 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=10.80 vs. 
limit=15.0 2023-11-20 13:05:08,209 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=1097513.3333333333, ans=0.125 2023-11-20 13:05:08,980 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.931e+01 8.247e+01 8.781e+01 9.736e+01 1.742e+02, threshold=1.756e+02, percent-clipped=0.0 2023-11-20 13:05:11,845 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=1097513.3333333333, ans=0.125 2023-11-20 13:05:11,849 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=1097513.3333333333, ans=0.2 2023-11-20 13:05:16,771 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=1097513.3333333333, ans=0.125 2023-11-20 13:05:32,905 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 164650 2023-11-20 13:05:35,436 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=1097646.6666666667, ans=0.0 2023-11-20 13:05:44,415 INFO [train_asr.py:1262] (1/4) Epoch 14, batch 8350, loss[loss=0.06131, simple_loss=0.0711, pruned_loss=0.01398, audio_tagging_loss=0.01178, over 14801.00 frames. ], tot_loss[loss=0.07851, simple_loss=0.09904, pruned_loss=0.01913, audio_tagging_loss=0.009856, over 3056287.20 frames. ], batch size: 57, lr: 4.89e-03, grad_scale: 8.0 2023-11-20 13:05:50,223 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=1097713.3333333333, ans=0.0 2023-11-20 13:06:22,785 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1097913.3333333333, ans=0.1 2023-11-20 13:06:36,777 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 164700 2023-11-20 13:06:46,956 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=1097980.0, ans=0.0 2023-11-20 13:06:48,141 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=1098046.6666666667, ans=0.125 2023-11-20 13:06:49,003 INFO [train_asr.py:1262] (1/4) Epoch 14, batch 8400, loss[loss=0.07667, simple_loss=0.09635, pruned_loss=0.02188, audio_tagging_loss=0.006608, over 14435.00 frames. ], tot_loss[loss=0.07887, simple_loss=0.09978, pruned_loss=0.01919, audio_tagging_loss=0.009796, over 3050699.57 frames. ], batch size: 55, lr: 4.89e-03, grad_scale: 16.0 2023-11-20 13:06:55,421 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1098046.6666666667, ans=0.1 2023-11-20 13:07:17,339 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.406e+01 7.789e+01 8.682e+01 9.296e+01 1.321e+02, threshold=1.736e+02, percent-clipped=0.0 2023-11-20 13:07:35,260 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.whiten, num_groups=1, num_channels=192, metric=4.16 vs. limit=12.0 2023-11-20 13:07:41,783 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 164750 2023-11-20 13:07:53,349 INFO [train_asr.py:1262] (1/4) Epoch 14, batch 8450, loss[loss=0.08068, simple_loss=0.1004, pruned_loss=0.0198, audio_tagging_loss=0.01067, over 15573.00 frames. 
], tot_loss[loss=0.07867, simple_loss=0.09925, pruned_loss=0.01913, audio_tagging_loss=0.009916, over 3054550.59 frames. ], batch size: 58, lr: 4.89e-03, grad_scale: 16.0 2023-11-20 13:08:05,464 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=1098446.6666666667, ans=0.125 2023-11-20 13:08:14,088 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=1098446.6666666667, ans=0.0 2023-11-20 13:08:15,283 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=1098446.6666666667, ans=0.125 2023-11-20 13:08:16,697 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.50 vs. limit=6.0 2023-11-20 13:08:23,641 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=1098513.3333333333, ans=0.0 2023-11-20 13:08:26,116 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=1098513.3333333333, ans=0.125 2023-11-20 13:08:31,223 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=1098580.0, ans=0.0 2023-11-20 13:08:36,552 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=1098580.0, ans=0.125 2023-11-20 13:08:46,027 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 164800 2023-11-20 13:08:47,663 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=6.69 vs. limit=10.0 2023-11-20 13:08:57,712 INFO [train_asr.py:1262] (1/4) Epoch 14, batch 8500, loss[loss=0.08007, simple_loss=0.1012, pruned_loss=0.021, audio_tagging_loss=0.008492, over 15188.00 frames. ], tot_loss[loss=0.079, simple_loss=0.09972, pruned_loss=0.01924, audio_tagging_loss=0.009908, over 3061066.32 frames. ], batch size: 54, lr: 4.89e-03, grad_scale: 16.0 2023-11-20 13:08:58,321 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.70 vs. limit=6.0 2023-11-20 13:09:01,768 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1098713.3333333333, ans=0.1 2023-11-20 13:09:12,169 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=1098780.0, ans=0.2 2023-11-20 13:09:25,681 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.462e+01 8.485e+01 9.217e+01 9.923e+01 2.190e+02, threshold=1.843e+02, percent-clipped=1.0 2023-11-20 13:09:50,079 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 164850 2023-11-20 13:09:57,686 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=1098980.0, ans=0.125 2023-11-20 13:10:02,321 INFO [train_asr.py:1262] (1/4) Epoch 14, batch 8550, loss[loss=0.1071, simple_loss=0.1364, pruned_loss=0.03115, audio_tagging_loss=0.007738, over 15667.00 frames. ], tot_loss[loss=0.07932, simple_loss=0.1002, pruned_loss=0.01933, audio_tagging_loss=0.009885, over 3057562.08 frames. 
], batch size: 58, lr: 4.89e-03, grad_scale: 16.0 2023-11-20 13:10:11,225 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=1099046.6666666667, ans=0.125 2023-11-20 13:10:55,020 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 164900 2023-11-20 13:11:06,510 INFO [train_asr.py:1262] (1/4) Epoch 14, batch 8600, loss[loss=0.06789, simple_loss=0.09182, pruned_loss=0.01273, audio_tagging_loss=0.009245, over 14441.00 frames. ], tot_loss[loss=0.07854, simple_loss=0.09891, pruned_loss=0.01909, audio_tagging_loss=0.01, over 3059149.09 frames. ], batch size: 53, lr: 4.89e-03, grad_scale: 16.0 2023-11-20 13:11:24,240 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=1099446.6666666667, ans=0.125 2023-11-20 13:11:33,752 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.330e+01 8.002e+01 8.794e+01 9.640e+01 1.342e+02, threshold=1.759e+02, percent-clipped=0.0 2023-11-20 13:11:35,323 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=1099513.3333333333, ans=0.07 2023-11-20 13:11:35,398 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=1.139e-01 2023-11-20 13:11:44,384 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=20.22 vs. limit=22.5 2023-11-20 13:11:55,582 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=1099580.0, ans=0.125 2023-11-20 13:11:58,957 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 164950 2023-11-20 13:12:05,810 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=10.91 vs. limit=15.0 2023-11-20 13:12:10,616 INFO [train_asr.py:1262] (1/4) Epoch 14, batch 8650, loss[loss=0.08636, simple_loss=0.1226, pruned_loss=0.01673, audio_tagging_loss=0.008332, over 15865.00 frames. ], tot_loss[loss=0.07932, simple_loss=0.1002, pruned_loss=0.01922, audio_tagging_loss=0.00999, over 3052803.80 frames. ], batch size: 59, lr: 4.89e-03, grad_scale: 16.0 2023-11-20 13:12:17,159 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=1099713.3333333333, ans=0.0 2023-11-20 13:12:23,213 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=1099780.0, ans=0.2 2023-11-20 13:12:26,371 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=1099780.0, ans=0.0 2023-11-20 13:12:28,679 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=1099780.0, ans=0.125 2023-11-20 13:12:44,717 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=1099846.6666666667, ans=0.125 2023-11-20 13:12:59,428 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=18.51 vs. 
limit=22.5 2023-11-20 13:13:03,791 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 165000 2023-11-20 13:13:15,570 INFO [train_asr.py:1262] (1/4) Epoch 14, batch 8700, loss[loss=0.104, simple_loss=0.1319, pruned_loss=0.03075, audio_tagging_loss=0.007253, over 15253.00 frames. ], tot_loss[loss=0.07972, simple_loss=0.1007, pruned_loss=0.01938, audio_tagging_loss=0.009999, over 3054861.71 frames. ], batch size: 57, lr: 4.89e-03, grad_scale: 16.0 2023-11-20 13:13:15,942 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=1100046.6666666667, ans=0.0 2023-11-20 13:13:20,898 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=1100046.6666666667, ans=0.0 2023-11-20 13:13:21,952 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=1100046.6666666667, ans=0.125 2023-11-20 13:13:24,006 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.min_positive, batch_count=1100046.6666666667, ans=0.05 2023-11-20 13:13:29,603 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=1100113.3333333333, ans=0.1 2023-11-20 13:13:31,245 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=4.69 vs. limit=12.0 2023-11-20 13:13:44,001 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.858e+01 8.298e+01 8.856e+01 9.572e+01 1.265e+02, threshold=1.771e+02, percent-clipped=0.0 2023-11-20 13:13:58,564 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=10.93 vs. limit=15.0 2023-11-20 13:14:08,460 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 165050 2023-11-20 13:14:12,407 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=1100313.3333333333, ans=0.0 2023-11-20 13:14:15,760 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=12.09 vs. limit=15.0 2023-11-20 13:14:20,653 INFO [train_asr.py:1262] (1/4) Epoch 14, batch 8750, loss[loss=0.06103, simple_loss=0.06964, pruned_loss=0.01405, audio_tagging_loss=0.01216, over 13823.00 frames. ], tot_loss[loss=0.07964, simple_loss=0.1004, pruned_loss=0.01925, audio_tagging_loss=0.01017, over 3054626.85 frames. ], batch size: 54, lr: 4.89e-03, grad_scale: 16.0 2023-11-20 13:14:21,042 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=1100380.0, ans=0.0 2023-11-20 13:14:24,815 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=9.75 vs. limit=15.0 2023-11-20 13:14:56,100 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-20 13:14:57,849 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=8.39 vs. 
limit=10.0 2023-11-20 13:15:07,272 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=1100580.0, ans=0.2 2023-11-20 13:15:13,122 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 165100 2023-11-20 13:15:23,985 INFO [train_asr.py:1262] (1/4) Epoch 14, batch 8800, loss[loss=0.05818, simple_loss=0.05883, pruned_loss=0.01145, audio_tagging_loss=0.01731, over 14216.00 frames. ], tot_loss[loss=0.08032, simple_loss=0.1014, pruned_loss=0.01943, audio_tagging_loss=0.01017, over 3053058.84 frames. ], batch size: 57, lr: 4.89e-03, grad_scale: 32.0 2023-11-20 13:15:28,509 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=1100713.3333333333, ans=0.0 2023-11-20 13:15:32,131 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=1100713.3333333333, ans=0.125 2023-11-20 13:15:51,940 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.572e+01 8.177e+01 8.747e+01 9.582e+01 1.210e+02, threshold=1.749e+02, percent-clipped=0.0 2023-11-20 13:15:59,227 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=13.53 vs. limit=15.0 2023-11-20 13:16:16,739 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 165150 2023-11-20 13:16:18,524 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=2.69 vs. limit=15.0 2023-11-20 13:16:24,183 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=1100980.0, ans=0.125 2023-11-20 13:16:27,592 INFO [train_asr.py:1262] (1/4) Epoch 14, batch 8850, loss[loss=0.06503, simple_loss=0.08586, pruned_loss=0.01236, audio_tagging_loss=0.009736, over 16157.00 frames. ], tot_loss[loss=0.0802, simple_loss=0.1011, pruned_loss=0.01946, audio_tagging_loss=0.01021, over 3053925.34 frames. ], batch size: 59, lr: 4.89e-03, grad_scale: 32.0 2023-11-20 13:16:29,199 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=1101046.6666666667, ans=0.125 2023-11-20 13:16:36,996 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1101046.6666666667, ans=0.125 2023-11-20 13:16:40,495 WARNING [train_asr.py:1506] (1/4) Exclude cut with ID unbalanced/1Dq7QH61iXQ_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-20 13:17:12,121 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=1101246.6666666667, ans=0.125 2023-11-20 13:17:21,017 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 165200 2023-11-20 13:17:32,330 INFO [train_asr.py:1262] (1/4) Epoch 14, batch 8900, loss[loss=0.05341, simple_loss=0.0664, pruned_loss=0.01132, audio_tagging_loss=0.008896, over 14631.00 frames. 
], tot_loss[loss=0.07969, simple_loss=0.1008, pruned_loss=0.01927, audio_tagging_loss=0.01004, over 3055089.07 frames. ], batch size: 57, lr: 4.89e-03, grad_scale: 32.0 2023-11-20 13:17:41,308 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=1101380.0, ans=0.2 2023-11-20 13:18:00,910 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.758e+01 8.398e+01 8.901e+01 9.799e+01 1.599e+02, threshold=1.780e+02, percent-clipped=0.0 2023-11-20 13:18:07,349 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=1101513.3333333333, ans=0.125 2023-11-20 13:18:11,650 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=1101580.0, ans=0.125 2023-11-20 13:18:23,492 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.min_positive, batch_count=1101646.6666666667, ans=0.05 2023-11-20 13:18:25,630 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 165250 2023-11-20 13:18:30,452 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=11.04 vs. limit=15.0 2023-11-20 13:18:34,879 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=1101646.6666666667, ans=0.0 2023-11-20 13:18:37,243 INFO [train_asr.py:1262] (1/4) Epoch 14, batch 8950, loss[loss=0.08785, simple_loss=0.1153, pruned_loss=0.02005, audio_tagging_loss=0.01017, over 16343.00 frames. ], tot_loss[loss=0.0795, simple_loss=0.1007, pruned_loss=0.01922, audio_tagging_loss=0.009928, over 3059221.57 frames. ], batch size: 58, lr: 4.89e-03, grad_scale: 32.0 2023-11-20 13:18:41,020 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=1101713.3333333333, ans=0.125 2023-11-20 13:19:29,851 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 165300 2023-11-20 13:19:37,093 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=10.32 vs. limit=15.0 2023-11-20 13:19:41,533 INFO [train_asr.py:1262] (1/4) Epoch 14, batch 9000, loss[loss=0.07072, simple_loss=0.09058, pruned_loss=0.01711, audio_tagging_loss=0.008313, over 16371.00 frames. ], tot_loss[loss=0.07915, simple_loss=0.1003, pruned_loss=0.01913, audio_tagging_loss=0.009889, over 3060497.63 frames. ], batch size: 62, lr: 4.88e-03, grad_scale: 32.0 2023-11-20 13:19:41,533 INFO [train_asr.py:1285] (1/4) Computing validation loss 2023-11-20 13:20:16,734 INFO [zipformer.py:1873] (1/4) name=encoder.encoders.2.encoder.layers.0.self_attn_weights, attn_weights_entropy = tensor([4.2017, 4.0287, 4.3782, 4.4118], device='cuda:1') 2023-11-20 13:20:23,240 INFO [train_asr.py:1294] (1/4) Epoch 14, validation: loss=0.06237, simple_loss=0.05346, pruned_loss=0.005661, audio_tagging_loss=0.02998, over 4681554.00 frames. 
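[Editor's annotation, not part of the original log.] Three numerical relationships hold consistently across the records in this stretch and can be verified from the logged values alone: the reported total loss matches 0.5 * simple_loss + pruned_loss + audio_tagging_loss; each optim.py clipping record's threshold equals Clipping_scale times the middle of the five logged grad-norm quantiles; and the excluded 100-frame placeholder cuts come out at 23 frames after subsampling. The sketch below checks each relationship against the validation record just above, the next clipping record just below, and the "Exclude cut" warnings elsewhere in this excerpt. It is a minimal illustration written for this note: every helper name is invented here, and the formulas are inferences from the logged numbers, not confirmed icefall internals.

```python
# Sanity-check sketch for the relationships visible in the surrounding records.
# All function names are hypothetical; the formulas are inferred from the log.
import math

# 1) Loss composition. The validation record above reports
#    loss=0.06237, simple_loss=0.05346, pruned_loss=0.005661,
#    audio_tagging_loss=0.02998, consistent with a simple-loss scale
#    of 0.5 and an audio-tagging scale of 1.0 (assumed, not confirmed).
def combined_loss(simple_loss: float, pruned_loss: float,
                  audio_tagging_loss: float,
                  simple_scale: float = 0.5,
                  tagging_scale: float = 1.0) -> float:
    return simple_scale * simple_loss + pruned_loss + tagging_scale * audio_tagging_loss

assert math.isclose(combined_loss(0.05346, 0.005661, 0.02998),
                    0.06237, rel_tol=1e-3)

# 2) Gradient-clipping threshold. Every optim.py record here satisfies
#    threshold == Clipping_scale * q2, where q2 is the middle of the five
#    logged grad-norm quantiles, e.g. 2.0 * 8.996e+01 = 1.799e+02.
def clip_threshold(clipping_scale: float, quantiles: list[float]) -> float:
    median = quantiles[2]  # middle of the five logged values
    return clipping_scale * median

assert math.isclose(
    clip_threshold(2.0, [6.532e1, 8.335e1, 8.996e1, 9.763e1, 1.376e2]),
    1.799e2, rel_tol=1e-3)

# 3) Frame subsampling in the "Exclude cut" warnings: 100 input frames
#    become 23. One formula consistent with that (and with a typical
#    two-stage conv front-end) is:
def frames_after_subsampling(num_frames: int) -> int:
    return ((num_frames - 7) // 2 + 1) // 2

assert frames_after_subsampling(100) == 23
```

On the third point, 23 frames after subsampling is one fewer than the 24 dummy tokens reported in those warnings, which would presumably make a transducer alignment infeasible; that is a plausible reason the one-second placeholder AudioSet clips are dropped from ASR training.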
2023-11-20 13:20:23,241 INFO [train_asr.py:1295] (1/4) Maximum memory allocated so far is 26082MB 2023-11-20 13:20:23,613 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-20 13:20:38,554 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=1102113.3333333333, ans=0.0 2023-11-20 13:20:45,370 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1102113.3333333333, ans=0.1 2023-11-20 13:20:50,532 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=9.01 vs. limit=10.0 2023-11-20 13:20:50,973 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.532e+01 8.335e+01 8.996e+01 9.763e+01 1.376e+02, threshold=1.799e+02, percent-clipped=0.0 2023-11-20 13:21:01,746 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1102246.6666666667, ans=0.1 2023-11-20 13:21:12,294 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1102246.6666666667, ans=0.1 2023-11-20 13:21:16,378 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 165350 2023-11-20 13:21:26,377 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1102380.0, ans=0.1 2023-11-20 13:21:27,313 INFO [train_asr.py:1262] (1/4) Epoch 14, batch 9050, loss[loss=0.06384, simple_loss=0.07554, pruned_loss=0.01236, audio_tagging_loss=0.01372, over 14527.00 frames. ], tot_loss[loss=0.07869, simple_loss=0.09986, pruned_loss=0.01903, audio_tagging_loss=0.009735, over 3060152.96 frames. ], batch size: 57, lr: 4.88e-03, grad_scale: 32.0 2023-11-20 13:21:45,575 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=1102446.6666666667, ans=0.125 2023-11-20 13:21:53,786 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=7.44 vs. limit=15.0 2023-11-20 13:22:00,367 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1102513.3333333333, ans=0.125 2023-11-20 13:22:20,858 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 165400 2023-11-20 13:22:22,491 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=2.86 vs. limit=15.0 2023-11-20 13:22:32,182 INFO [train_asr.py:1262] (1/4) Epoch 14, batch 9100, loss[loss=0.07431, simple_loss=0.09938, pruned_loss=0.01437, audio_tagging_loss=0.01025, over 15630.00 frames. ], tot_loss[loss=0.07951, simple_loss=0.101, pruned_loss=0.01926, audio_tagging_loss=0.009727, over 3059397.04 frames. 
], batch size: 57, lr: 4.88e-03, grad_scale: 32.0 2023-11-20 13:22:40,547 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=1102713.3333333333, ans=0.0 2023-11-20 13:22:53,413 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=1102780.0, ans=0.0 2023-11-20 13:23:01,054 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.597e+01 8.047e+01 8.709e+01 9.562e+01 1.643e+02, threshold=1.742e+02, percent-clipped=0.0 2023-11-20 13:23:06,258 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-11-20 13:23:25,763 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 165450 2023-11-20 13:23:36,767 INFO [train_asr.py:1262] (1/4) Epoch 14, batch 9150, loss[loss=0.06387, simple_loss=0.07324, pruned_loss=0.01608, audio_tagging_loss=0.01117, over 15331.00 frames. ], tot_loss[loss=0.07805, simple_loss=0.09861, pruned_loss=0.01883, audio_tagging_loss=0.009921, over 3058644.77 frames. ], batch size: 61, lr: 4.88e-03, grad_scale: 32.0 2023-11-20 13:23:47,869 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=12.01 vs. limit=15.0 2023-11-20 13:23:53,017 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=1103113.3333333333, ans=0.125 2023-11-20 13:24:08,288 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1103180.0, ans=0.125 2023-11-20 13:24:18,620 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1103246.6666666667, ans=0.125 2023-11-20 13:24:29,071 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=1103313.3333333333, ans=0.0 2023-11-20 13:24:30,178 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 165500 2023-11-20 13:24:34,476 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=512, metric=21.07 vs. limit=22.5 2023-11-20 13:24:39,939 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=12.71 vs. limit=15.0 2023-11-20 13:24:41,867 INFO [train_asr.py:1262] (1/4) Epoch 14, batch 9200, loss[loss=0.07515, simple_loss=0.09129, pruned_loss=0.01727, audio_tagging_loss=0.01224, over 15044.00 frames. ], tot_loss[loss=0.07812, simple_loss=0.09864, pruned_loss=0.0189, audio_tagging_loss=0.009899, over 3058221.71 frames. 
], batch size: 58, lr: 4.88e-03, grad_scale: 32.0 2023-11-20 13:25:06,473 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=1103513.3333333333, ans=0.0 2023-11-20 13:25:09,208 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=1103513.3333333333, ans=0.0 2023-11-20 13:25:10,019 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.657e+01 8.175e+01 8.950e+01 9.913e+01 1.287e+02, threshold=1.790e+02, percent-clipped=0.0 2023-11-20 13:25:11,611 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=1103513.3333333333, ans=0.125 2023-11-20 13:25:28,295 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module1.whiten, num_groups=1, num_channels=512, metric=4.84 vs. limit=15.0 2023-11-20 13:25:34,561 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 165550 2023-11-20 13:25:45,506 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=1103713.3333333333, ans=0.0 2023-11-20 13:25:46,406 INFO [train_asr.py:1262] (1/4) Epoch 14, batch 9250, loss[loss=0.06729, simple_loss=0.08363, pruned_loss=0.01401, audio_tagging_loss=0.01146, over 14396.00 frames. ], tot_loss[loss=0.07813, simple_loss=0.09855, pruned_loss=0.01899, audio_tagging_loss=0.009867, over 3055235.97 frames. ], batch size: 56, lr: 4.88e-03, grad_scale: 16.0 2023-11-20 13:25:49,481 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=12.01 vs. limit=15.0 2023-11-20 13:25:50,149 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=1103713.3333333333, ans=0.125 2023-11-20 13:26:01,966 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=1103780.0, ans=0.125 2023-11-20 13:26:12,610 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.44 vs. limit=15.0 2023-11-20 13:26:18,258 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=1103846.6666666667, ans=0.125 2023-11-20 13:26:24,386 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=1103913.3333333333, ans=0.0 2023-11-20 13:26:28,256 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=1103913.3333333333, ans=0.125 2023-11-20 13:26:29,427 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1103913.3333333333, ans=0.1 2023-11-20 13:26:38,989 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 165600 2023-11-20 13:26:50,982 INFO [train_asr.py:1262] (1/4) Epoch 14, batch 9300, loss[loss=0.0644, simple_loss=0.08404, pruned_loss=0.01169, audio_tagging_loss=0.01069, over 15611.00 frames. ], tot_loss[loss=0.07827, simple_loss=0.09841, pruned_loss=0.01905, audio_tagging_loss=0.01002, over 3050393.77 frames. 
], batch size: 59, lr: 4.88e-03, grad_scale: 16.0 2023-11-20 13:27:17,103 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=6.23 vs. limit=15.0 2023-11-20 13:27:21,186 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.442e+01 7.830e+01 8.462e+01 9.599e+01 1.223e+02, threshold=1.692e+02, percent-clipped=0.0 2023-11-20 13:27:23,125 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=21.72 vs. limit=22.5 2023-11-20 13:27:44,376 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 165650 2023-11-20 13:27:48,335 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=1104313.3333333333, ans=0.125 2023-11-20 13:27:55,247 INFO [train_asr.py:1262] (1/4) Epoch 14, batch 9350, loss[loss=0.08395, simple_loss=0.09815, pruned_loss=0.02505, audio_tagging_loss=0.009828, over 14397.00 frames. ], tot_loss[loss=0.07838, simple_loss=0.09858, pruned_loss=0.01906, audio_tagging_loss=0.01003, over 3052803.90 frames. ], batch size: 56, lr: 4.88e-03, grad_scale: 16.0 2023-11-20 13:28:36,967 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=1104580.0, ans=0.2 2023-11-20 13:28:40,611 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=1104580.0, ans=0.0 2023-11-20 13:28:45,366 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=1104646.6666666667, ans=0.125 2023-11-20 13:28:47,689 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 165700 2023-11-20 13:28:59,933 INFO [train_asr.py:1262] (1/4) Epoch 14, batch 9400, loss[loss=0.07563, simple_loss=0.09724, pruned_loss=0.01816, audio_tagging_loss=0.008848, over 14493.00 frames. ], tot_loss[loss=0.0787, simple_loss=0.09894, pruned_loss=0.0192, audio_tagging_loss=0.01003, over 3047627.11 frames. ], batch size: 54, lr: 4.88e-03, grad_scale: 16.0 2023-11-20 13:29:20,650 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=1104780.0, ans=0.0 2023-11-20 13:29:29,610 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.579e+01 8.038e+01 8.701e+01 9.410e+01 1.188e+02, threshold=1.740e+02, percent-clipped=0.0 2023-11-20 13:29:45,907 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=1104913.3333333333, ans=0.0 2023-11-20 13:29:49,884 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=512, metric=5.32 vs. limit=15.0 2023-11-20 13:29:52,930 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 165750 2023-11-20 13:30:02,338 WARNING [train_asr.py:1506] (1/4) Exclude cut with ID unbalanced/jmSuJWEIizA_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. 
Number of tokens: 24 2023-11-20 13:30:04,900 INFO [train_asr.py:1262] (1/4) Epoch 14, batch 9450, loss[loss=0.07398, simple_loss=0.08926, pruned_loss=0.01846, audio_tagging_loss=0.01089, over 14264.00 frames. ], tot_loss[loss=0.07918, simple_loss=0.09947, pruned_loss=0.01941, audio_tagging_loss=0.01003, over 3052858.75 frames. ], batch size: 53, lr: 4.88e-03, grad_scale: 16.0 2023-11-20 13:30:06,394 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=1105046.6666666667, ans=0.0 2023-11-20 13:30:09,325 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.08 vs. limit=6.0 2023-11-20 13:30:20,808 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=10.20 vs. limit=15.0 2023-11-20 13:30:32,208 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.min_positive, batch_count=1105180.0, ans=0.05 2023-11-20 13:30:57,422 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 165800 2023-11-20 13:31:08,200 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=1105380.0, ans=0.125 2023-11-20 13:31:09,033 INFO [train_asr.py:1262] (1/4) Epoch 14, batch 9500, loss[loss=0.05843, simple_loss=0.07038, pruned_loss=0.008865, audio_tagging_loss=0.01437, over 14738.00 frames. ], tot_loss[loss=0.07928, simple_loss=0.09961, pruned_loss=0.01937, audio_tagging_loss=0.0101, over 3052562.01 frames. ], batch size: 55, lr: 4.88e-03, grad_scale: 16.0 2023-11-20 13:31:12,033 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=1105380.0, ans=0.125 2023-11-20 13:31:16,647 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=1105380.0, ans=0.125 2023-11-20 13:31:39,024 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.072e+01 8.167e+01 8.714e+01 9.690e+01 1.183e+02, threshold=1.743e+02, percent-clipped=0.0 2023-11-20 13:31:55,867 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1105580.0, ans=0.1 2023-11-20 13:32:01,591 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 165850 2023-11-20 13:32:13,247 INFO [train_asr.py:1262] (1/4) Epoch 14, batch 9550, loss[loss=0.07392, simple_loss=0.09742, pruned_loss=0.01634, audio_tagging_loss=0.008865, over 14443.00 frames. ], tot_loss[loss=0.07887, simple_loss=0.09924, pruned_loss=0.01916, audio_tagging_loss=0.01009, over 3046840.70 frames. ], batch size: 54, lr: 4.88e-03, grad_scale: 16.0 2023-11-20 13:32:14,746 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=1105713.3333333333, ans=0.0 2023-11-20 13:32:46,879 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=1105846.6666666667, ans=0.0 2023-11-20 13:32:49,847 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=1.95 vs. 
limit=6.0 2023-11-20 13:33:06,223 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 165900 2023-11-20 13:33:07,700 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=1105980.0, ans=0.125 2023-11-20 13:33:17,800 INFO [train_asr.py:1262] (1/4) Epoch 14, batch 9600, loss[loss=0.08307, simple_loss=0.1019, pruned_loss=0.02195, audio_tagging_loss=0.01015, over 16227.00 frames. ], tot_loss[loss=0.07927, simple_loss=0.09943, pruned_loss=0.01938, audio_tagging_loss=0.01018, over 3048869.64 frames. ], batch size: 58, lr: 4.88e-03, grad_scale: 32.0 2023-11-20 13:33:27,267 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=1106046.6666666667, ans=0.125 2023-11-20 13:33:27,347 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=1106046.6666666667, ans=0.125 2023-11-20 13:33:44,812 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1106180.0, ans=0.1 2023-11-20 13:33:48,241 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.832e+01 8.385e+01 9.134e+01 1.022e+02 1.365e+02, threshold=1.827e+02, percent-clipped=0.0 2023-11-20 13:33:54,731 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=1106246.6666666667, ans=0.2 2023-11-20 13:34:10,112 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 165950 2023-11-20 13:34:15,798 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=1106313.3333333333, ans=0.125 2023-11-20 13:34:18,062 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.min_positive, batch_count=1106313.3333333333, ans=0.025 2023-11-20 13:34:20,512 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=1106380.0, ans=0.125 2023-11-20 13:34:21,506 INFO [train_asr.py:1262] (1/4) Epoch 14, batch 9650, loss[loss=0.07877, simple_loss=0.09254, pruned_loss=0.0174, audio_tagging_loss=0.0151, over 14854.00 frames. ], tot_loss[loss=0.07964, simple_loss=0.1001, pruned_loss=0.01937, audio_tagging_loss=0.01024, over 3050172.88 frames. ], batch size: 55, lr: 4.88e-03, grad_scale: 16.0 2023-11-20 13:34:21,690 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1106380.0, ans=0.0 2023-11-20 13:34:23,017 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=1106380.0, ans=0.0 2023-11-20 13:34:34,737 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=1106446.6666666667, ans=0.0 2023-11-20 13:35:04,877 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=10.01 vs. 
limit=15.0 2023-11-20 13:35:08,069 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=1106580.0, ans=0.09899494936611666 2023-11-20 13:35:08,070 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=1106580.0, ans=0.0 2023-11-20 13:35:14,148 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 166000 2023-11-20 13:35:20,697 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-20 13:35:23,661 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=1106646.6666666667, ans=0.125 2023-11-20 13:35:25,838 INFO [train_asr.py:1262] (1/4) Epoch 14, batch 9700, loss[loss=0.08737, simple_loss=0.1133, pruned_loss=0.02178, audio_tagging_loss=0.008961, over 14855.00 frames. ], tot_loss[loss=0.07963, simple_loss=0.1004, pruned_loss=0.01931, audio_tagging_loss=0.01013, over 3041860.48 frames. ], batch size: 57, lr: 4.87e-03, grad_scale: 16.0 2023-11-20 13:35:38,052 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=1106780.0, ans=0.0 2023-11-20 13:35:41,604 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=4.20 vs. limit=12.0 2023-11-20 13:35:57,239 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.187e+01 8.103e+01 9.034e+01 9.824e+01 1.276e+02, threshold=1.807e+02, percent-clipped=0.0 2023-11-20 13:36:18,843 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 166050 2023-11-20 13:36:25,106 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=1106980.0, ans=0.0 2023-11-20 13:36:31,006 INFO [train_asr.py:1262] (1/4) Epoch 14, batch 9750, loss[loss=0.08147, simple_loss=0.09957, pruned_loss=0.01901, audio_tagging_loss=0.01267, over 15016.00 frames. ], tot_loss[loss=0.07977, simple_loss=0.1007, pruned_loss=0.01945, audio_tagging_loss=0.009991, over 3049792.34 frames. ], batch size: 57, lr: 4.87e-03, grad_scale: 16.0 2023-11-20 13:36:38,733 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=1107046.6666666667, ans=0.0 2023-11-20 13:36:39,744 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=1107046.6666666667, ans=0.125 2023-11-20 13:37:03,772 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.07 vs. limit=6.0 2023-11-20 13:37:04,601 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=1107180.0, ans=0.125 2023-11-20 13:37:07,071 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=1107180.0, ans=0.125 2023-11-20 13:37:24,255 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 166100 2023-11-20 13:37:26,206 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.whiten, num_groups=1, num_channels=192, metric=3.23 vs. 
limit=12.0 2023-11-20 13:37:29,334 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer_ff2.min_abs, batch_count=1107313.3333333333, ans=0.1 2023-11-20 13:37:30,640 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1107313.3333333333, ans=0.1 2023-11-20 13:37:35,925 INFO [train_asr.py:1262] (1/4) Epoch 14, batch 9800, loss[loss=0.07311, simple_loss=0.08411, pruned_loss=0.01625, audio_tagging_loss=0.01481, over 14282.00 frames. ], tot_loss[loss=0.07929, simple_loss=0.1003, pruned_loss=0.01923, audio_tagging_loss=0.009911, over 3048322.46 frames. ], batch size: 56, lr: 4.87e-03, grad_scale: 8.0 2023-11-20 13:37:39,927 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=1107380.0, ans=0.125 2023-11-20 13:37:54,056 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=1107446.6666666667, ans=0.125 2023-11-20 13:37:56,247 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=1107446.6666666667, ans=0.125 2023-11-20 13:37:57,594 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=1107446.6666666667, ans=0.2 2023-11-20 13:38:01,357 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.min_positive, batch_count=1107513.3333333333, ans=0.05 2023-11-20 13:38:07,940 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.608e+01 8.297e+01 9.086e+01 9.730e+01 1.369e+02, threshold=1.817e+02, percent-clipped=0.0 2023-11-20 13:38:10,239 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=1107513.3333333333, ans=0.125 2023-11-20 13:38:14,818 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1107580.0, ans=0.0 2023-11-20 13:38:28,901 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 166150 2023-11-20 13:38:30,274 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=1107646.6666666667, ans=0.125 2023-11-20 13:38:32,568 WARNING [train_asr.py:1506] (1/4) Exclude cut with ID unbalanced/Bo4LcZjitzU_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-20 13:38:32,930 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=1107646.6666666667, ans=0.125 2023-11-20 13:38:39,982 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=7.87 vs. limit=12.0 2023-11-20 13:38:40,587 INFO [train_asr.py:1262] (1/4) Epoch 14, batch 9850, loss[loss=0.05088, simple_loss=0.06558, pruned_loss=0.009861, audio_tagging_loss=0.008227, over 15063.00 frames. 
], tot_loss[loss=0.07935, simple_loss=0.1007, pruned_loss=0.01922, audio_tagging_loss=0.009778, over 3053347.17 frames. ], batch size: 60, lr: 4.87e-03, grad_scale: 8.0 2023-11-20 13:39:01,750 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-20 13:39:14,624 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=1107846.6666666667, ans=0.125 2023-11-20 13:39:17,020 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=1107846.6666666667, ans=0.125 2023-11-20 13:39:23,351 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=1107913.3333333333, ans=0.0 2023-11-20 13:39:27,113 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=1107913.3333333333, ans=0.2 2023-11-20 13:39:33,739 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 166200 2023-11-20 13:39:35,504 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.53 vs. limit=6.0 2023-11-20 13:39:36,525 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=1107980.0, ans=0.125 2023-11-20 13:39:45,718 INFO [train_asr.py:1262] (1/4) Epoch 14, batch 9900, loss[loss=0.08356, simple_loss=0.1117, pruned_loss=0.01921, audio_tagging_loss=0.00852, over 15586.00 frames. ], tot_loss[loss=0.07875, simple_loss=0.09976, pruned_loss=0.01908, audio_tagging_loss=0.009793, over 3056365.91 frames. ], batch size: 55, lr: 4.87e-03, grad_scale: 8.0 2023-11-20 13:39:54,103 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=1108046.6666666667, ans=0.0 2023-11-20 13:40:05,868 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=1108113.3333333333, ans=0.125 2023-11-20 13:40:09,658 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=1108113.3333333333, ans=0.2 2023-11-20 13:40:18,345 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.742e+01 8.087e+01 8.695e+01 9.650e+01 1.416e+02, threshold=1.739e+02, percent-clipped=0.0 2023-11-20 13:40:18,720 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=1108180.0, ans=0.125 2023-11-20 13:40:21,045 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-20 13:40:25,015 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1108246.6666666667, ans=0.0 2023-11-20 13:40:38,804 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 166250 2023-11-20 13:40:51,309 INFO [train_asr.py:1262] (1/4) Epoch 14, batch 9950, loss[loss=0.06187, simple_loss=0.077, pruned_loss=0.01139, audio_tagging_loss=0.01198, over 14308.00 frames. ], tot_loss[loss=0.07855, simple_loss=0.09945, pruned_loss=0.01907, audio_tagging_loss=0.00975, over 3051788.53 frames. 
], batch size: 56, lr: 4.87e-03, grad_scale: 8.0 2023-11-20 13:41:00,257 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=1108380.0, ans=0.125 2023-11-20 13:41:01,602 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=1108380.0, ans=0.0 2023-11-20 13:41:02,617 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=1108446.6666666667, ans=0.035 2023-11-20 13:41:19,353 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-20 13:41:37,731 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=1108580.0, ans=0.125 2023-11-20 13:41:44,185 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 166300 2023-11-20 13:41:54,993 INFO [train_asr.py:1262] (1/4) Epoch 14, batch 10000, loss[loss=0.1001, simple_loss=0.1392, pruned_loss=0.02333, audio_tagging_loss=0.007147, over 15713.00 frames. ], tot_loss[loss=0.0784, simple_loss=0.09937, pruned_loss=0.019, audio_tagging_loss=0.009716, over 3049925.20 frames. ], batch size: 56, lr: 4.87e-03, grad_scale: 8.0 2023-11-20 13:42:00,842 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=1108713.3333333333, ans=0.125 2023-11-20 13:42:29,460 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.616e+01 8.105e+01 8.776e+01 9.451e+01 1.209e+02, threshold=1.755e+02, percent-clipped=0.0 2023-11-20 13:42:29,739 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=1108846.6666666667, ans=0.2 2023-11-20 13:42:31,052 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=1108846.6666666667, ans=0.125 2023-11-20 13:42:34,841 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1108913.3333333333, ans=0.125 2023-11-20 13:42:38,248 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=1108913.3333333333, ans=0.07 2023-11-20 13:42:43,445 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=7.16 vs. limit=12.0 2023-11-20 13:42:47,448 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=1108980.0, ans=0.04949747468305833 2023-11-20 13:42:48,544 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 166350 2023-11-20 13:42:59,232 INFO [train_asr.py:1262] (1/4) Epoch 14, batch 10050, loss[loss=0.05728, simple_loss=0.06532, pruned_loss=0.01139, audio_tagging_loss=0.01323, over 14770.00 frames. ], tot_loss[loss=0.07811, simple_loss=0.0989, pruned_loss=0.01886, audio_tagging_loss=0.009801, over 3050904.31 frames. 
], batch size: 57, lr: 4.87e-03, grad_scale: 8.0 2023-11-20 13:43:07,518 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=1109046.6666666667, ans=0.125 2023-11-20 13:43:12,986 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1109113.3333333333, ans=0.1 2023-11-20 13:43:26,778 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1109180.0, ans=0.1 2023-11-20 13:43:35,960 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=1109180.0, ans=0.0 2023-11-20 13:43:39,590 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=1109246.6666666667, ans=0.125 2023-11-20 13:43:52,156 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 166400 2023-11-20 13:44:01,434 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=9.51 vs. limit=15.0 2023-11-20 13:44:03,879 INFO [train_asr.py:1262] (1/4) Epoch 14, batch 10100, loss[loss=0.06814, simple_loss=0.08689, pruned_loss=0.01534, audio_tagging_loss=0.009356, over 15024.00 frames. ], tot_loss[loss=0.07749, simple_loss=0.09761, pruned_loss=0.0188, audio_tagging_loss=0.009888, over 3044526.33 frames. ], batch size: 55, lr: 4.87e-03, grad_scale: 8.0 2023-11-20 13:44:09,029 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=18.93 vs. limit=22.5 2023-11-20 13:44:12,253 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=1109380.0, ans=0.125 2023-11-20 13:44:37,153 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.486e+01 8.086e+01 8.697e+01 9.512e+01 1.226e+02, threshold=1.739e+02, percent-clipped=0.0 2023-11-20 13:44:44,139 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=1109580.0, ans=0.0 2023-11-20 13:44:56,186 WARNING [train_asr.py:1506] (1/4) Exclude cut with ID unbalanced/_eq1Ry0UZGU_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-20 13:44:57,521 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 166450 2023-11-20 13:45:08,323 INFO [train_asr.py:1262] (1/4) Epoch 14, batch 10150, loss[loss=0.07735, simple_loss=0.09766, pruned_loss=0.01916, audio_tagging_loss=0.009363, over 15169.00 frames. ], tot_loss[loss=0.0781, simple_loss=0.09849, pruned_loss=0.01894, audio_tagging_loss=0.009905, over 3047372.34 frames. ], batch size: 57, lr: 4.87e-03, grad_scale: 8.0 2023-11-20 13:45:08,908 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=2.60 vs. 
limit=15.0 2023-11-20 13:45:12,378 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=1109713.3333333333, ans=0.125 2023-11-20 13:45:12,449 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=15.66 vs. limit=15.0 2023-11-20 13:45:14,127 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=1109713.3333333333, ans=0.0 2023-11-20 13:45:27,456 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=1109780.0, ans=0.0 2023-11-20 13:45:37,504 WARNING [train_asr.py:1506] (1/4) Exclude cut with ID unbalanced/cw-21cbk02A_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-20 13:45:52,829 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.69 vs. limit=15.0 2023-11-20 13:45:56,273 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=1109913.3333333333, ans=0.125 2023-11-20 13:46:00,858 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 166500 2023-11-20 13:46:04,059 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=1109980.0, ans=0.125 2023-11-20 13:46:04,065 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1109980.0, ans=0.1 2023-11-20 13:46:08,948 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.min_positive, batch_count=1109980.0, ans=0.05 2023-11-20 13:46:12,418 INFO [train_asr.py:1262] (1/4) Epoch 14, batch 10200, loss[loss=0.08845, simple_loss=0.1116, pruned_loss=0.02315, audio_tagging_loss=0.009494, over 15046.00 frames. ], tot_loss[loss=0.07898, simple_loss=0.09943, pruned_loss=0.01939, audio_tagging_loss=0.009865, over 3054254.28 frames. ], batch size: 57, lr: 4.87e-03, grad_scale: 8.0 2023-11-20 13:46:26,802 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=1110113.3333333333, ans=0.0 2023-11-20 13:46:36,383 WARNING [train_asr.py:1506] (1/4) Exclude cut with ID unbalanced/hOT6Yokob90_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. 
Number of tokens: 24 2023-11-20 13:46:41,653 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=1110180.0, ans=0.125 2023-11-20 13:46:46,110 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.930e+01 8.131e+01 8.850e+01 9.665e+01 1.277e+02, threshold=1.770e+02, percent-clipped=0.0 2023-11-20 13:46:56,854 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=1110246.6666666667, ans=0.125 2023-11-20 13:46:59,168 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=1110246.6666666667, ans=0.0 2023-11-20 13:47:05,114 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 166550 2023-11-20 13:47:16,463 INFO [train_asr.py:1262] (1/4) Epoch 14, batch 10250, loss[loss=0.09332, simple_loss=0.1146, pruned_loss=0.02446, audio_tagging_loss=0.01157, over 15455.00 frames. ], tot_loss[loss=0.07993, simple_loss=0.1003, pruned_loss=0.01982, audio_tagging_loss=0.009949, over 3055043.01 frames. ], batch size: 60, lr: 4.87e-03, grad_scale: 8.0 2023-11-20 13:47:18,000 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.min_positive, batch_count=1110380.0, ans=0.05 2023-11-20 13:47:19,704 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=1110380.0, ans=0.2 2023-11-20 13:47:34,528 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.30 vs. limit=6.0 2023-11-20 13:47:40,852 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=1110446.6666666667, ans=0.125 2023-11-20 13:47:57,352 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=1110580.0, ans=0.0 2023-11-20 13:48:09,961 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 166600 2023-11-20 13:48:21,199 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=1110713.3333333333, ans=0.0 2023-11-20 13:48:21,984 INFO [train_asr.py:1262] (1/4) Epoch 14, batch 10300, loss[loss=0.05865, simple_loss=0.06286, pruned_loss=0.01502, audio_tagging_loss=0.0122, over 16185.00 frames. ], tot_loss[loss=0.08009, simple_loss=0.1003, pruned_loss=0.01995, audio_tagging_loss=0.01001, over 3055772.80 frames. 
], batch size: 63, lr: 4.87e-03, grad_scale: 8.0 2023-11-20 13:48:40,369 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=1110780.0, ans=0.0 2023-11-20 13:48:54,490 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=1110846.6666666667, ans=0.0 2023-11-20 13:48:55,301 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.311e+01 8.084e+01 8.693e+01 9.702e+01 1.335e+02, threshold=1.739e+02, percent-clipped=0.0 2023-11-20 13:48:56,908 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=1110846.6666666667, ans=0.125 2023-11-20 13:49:11,808 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.min_abs, batch_count=1110913.3333333333, ans=0.5 2023-11-20 13:49:15,221 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 166650 2023-11-20 13:49:21,123 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=4.44 vs. limit=12.0 2023-11-20 13:49:26,757 INFO [train_asr.py:1262] (1/4) Epoch 14, batch 10350, loss[loss=0.08378, simple_loss=0.1146, pruned_loss=0.01747, audio_tagging_loss=0.009015, over 15950.00 frames. ], tot_loss[loss=0.07999, simple_loss=0.1003, pruned_loss=0.0198, audio_tagging_loss=0.01002, over 3054268.10 frames. ], batch size: 57, lr: 4.86e-03, grad_scale: 8.0 2023-11-20 13:49:27,142 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.min_positive, batch_count=1111046.6666666667, ans=0.05 2023-11-20 13:49:28,275 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.min_positive, batch_count=1111046.6666666667, ans=0.05 2023-11-20 13:49:44,751 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=1111113.3333333333, ans=0.0 2023-11-20 13:49:46,018 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=1111113.3333333333, ans=0.0 2023-11-20 13:50:19,492 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 166700 2023-11-20 13:50:31,165 INFO [train_asr.py:1262] (1/4) Epoch 14, batch 10400, loss[loss=0.07356, simple_loss=0.09203, pruned_loss=0.01918, audio_tagging_loss=0.008367, over 16169.00 frames. ], tot_loss[loss=0.08022, simple_loss=0.1007, pruned_loss=0.01976, audio_tagging_loss=0.01009, over 3044880.87 frames. 
], batch size: 60, lr: 4.86e-03, grad_scale: 16.0 2023-11-20 13:50:39,533 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=1111380.0, ans=0.07 2023-11-20 13:50:40,667 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=1111380.0, ans=0.0 2023-11-20 13:50:40,718 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=1111380.0, ans=0.125 2023-11-20 13:50:44,491 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=1111446.6666666667, ans=0.04949747468305833 2023-11-20 13:51:02,966 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=1111513.3333333333, ans=0.0 2023-11-20 13:51:05,050 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.857e+01 8.019e+01 8.655e+01 9.452e+01 1.304e+02, threshold=1.731e+02, percent-clipped=0.0 2023-11-20 13:51:14,632 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer_na.min_abs, batch_count=1111580.0, ans=0.02 2023-11-20 13:51:24,391 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 166750 2023-11-20 13:51:36,019 INFO [train_asr.py:1262] (1/4) Epoch 14, batch 10450, loss[loss=0.07225, simple_loss=0.0932, pruned_loss=0.01646, audio_tagging_loss=0.009184, over 14182.00 frames. ], tot_loss[loss=0.07937, simple_loss=0.09991, pruned_loss=0.01943, audio_tagging_loss=0.009982, over 3046433.75 frames. ], batch size: 55, lr: 4.86e-03, grad_scale: 16.0 2023-11-20 13:52:05,339 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=1111846.6666666667, ans=0.125 2023-11-20 13:52:05,538 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=10.25 vs. limit=15.0 2023-11-20 13:52:12,498 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=1111846.6666666667, ans=0.0 2023-11-20 13:52:22,447 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=11.88 vs. limit=15.0 2023-11-20 13:52:29,651 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 166800 2023-11-20 13:52:35,450 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=1111980.0, ans=0.125 2023-11-20 13:52:39,311 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=1111980.0, ans=0.125 2023-11-20 13:52:40,564 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=1112046.6666666667, ans=0.125 2023-11-20 13:52:41,518 INFO [train_asr.py:1262] (1/4) Epoch 14, batch 10500, loss[loss=0.08244, simple_loss=0.1084, pruned_loss=0.02015, audio_tagging_loss=0.008086, over 16556.00 frames. ], tot_loss[loss=0.0796, simple_loss=0.1004, pruned_loss=0.0196, audio_tagging_loss=0.009798, over 3047683.63 frames. ], batch size: 63, lr: 4.86e-03, grad_scale: 16.0 2023-11-20 13:52:44,797 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=19.12 vs. 
limit=22.5 2023-11-20 13:52:48,086 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=1112046.6666666667, ans=0.07 2023-11-20 13:53:14,867 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.848e+01 8.112e+01 8.724e+01 9.287e+01 1.188e+02, threshold=1.745e+02, percent-clipped=0.0 2023-11-20 13:53:33,553 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=1112313.3333333333, ans=0.0 2023-11-20 13:53:34,582 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 166850 2023-11-20 13:53:45,953 INFO [train_asr.py:1262] (1/4) Epoch 14, batch 10550, loss[loss=0.05576, simple_loss=0.06487, pruned_loss=0.01146, audio_tagging_loss=0.01186, over 16154.00 frames. ], tot_loss[loss=0.07853, simple_loss=0.09909, pruned_loss=0.01918, audio_tagging_loss=0.009809, over 3048709.32 frames. ], batch size: 63, lr: 4.86e-03, grad_scale: 16.0 2023-11-20 13:54:21,245 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=1112513.3333333333, ans=0.2 2023-11-20 13:54:23,428 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.min_positive, batch_count=1112580.0, ans=0.025 2023-11-20 13:54:29,861 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=1112580.0, ans=0.125 2023-11-20 13:54:38,996 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 166900 2023-11-20 13:54:48,922 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=8.38 vs. limit=12.0 2023-11-20 13:54:50,577 INFO [train_asr.py:1262] (1/4) Epoch 14, batch 10600, loss[loss=0.09914, simple_loss=0.1223, pruned_loss=0.02858, audio_tagging_loss=0.009423, over 15651.00 frames. ], tot_loss[loss=0.07918, simple_loss=0.1001, pruned_loss=0.01941, audio_tagging_loss=0.009733, over 3046579.76 frames. ], batch size: 57, lr: 4.86e-03, grad_scale: 16.0 2023-11-20 13:54:53,264 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1112713.3333333333, ans=0.0 2023-11-20 13:54:54,451 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=1112713.3333333333, ans=0.125 2023-11-20 13:54:59,565 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=1112713.3333333333, ans=0.125 2023-11-20 13:55:09,049 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=1112780.0, ans=0.125 2023-11-20 13:55:18,941 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=1112846.6666666667, ans=0.125 2023-11-20 13:55:21,262 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=7.30 vs. 
limit=15.0 2023-11-20 13:55:24,051 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.162e+01 8.206e+01 8.903e+01 9.867e+01 1.464e+02, threshold=1.781e+02, percent-clipped=0.0 2023-11-20 13:55:28,097 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=1112913.3333333333, ans=0.125 2023-11-20 13:55:43,418 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 166950 2023-11-20 13:55:55,855 INFO [train_asr.py:1262] (1/4) Epoch 14, batch 10650, loss[loss=0.08148, simple_loss=0.1081, pruned_loss=0.01849, audio_tagging_loss=0.008935, over 15721.00 frames. ], tot_loss[loss=0.07956, simple_loss=0.1006, pruned_loss=0.0196, audio_tagging_loss=0.009659, over 3043879.64 frames. ], batch size: 58, lr: 4.86e-03, grad_scale: 16.0 2023-11-20 13:55:59,816 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=1113046.6666666667, ans=0.2 2023-11-20 13:56:14,289 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.conv_module1.whiten, num_groups=1, num_channels=192, metric=13.47 vs. limit=15.0 2023-11-20 13:56:40,774 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.min_abs, batch_count=1113246.6666666667, ans=0.5 2023-11-20 13:56:46,373 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1113313.3333333333, ans=0.0 2023-11-20 13:56:48,708 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 167000 2023-11-20 13:57:00,511 INFO [train_asr.py:1262] (1/4) Epoch 14, batch 10700, loss[loss=0.07383, simple_loss=0.09225, pruned_loss=0.01626, audio_tagging_loss=0.01144, over 14540.00 frames. ], tot_loss[loss=0.07954, simple_loss=0.1007, pruned_loss=0.01957, audio_tagging_loss=0.009633, over 3042610.04 frames. ], batch size: 53, lr: 4.86e-03, grad_scale: 16.0 2023-11-20 13:57:04,805 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=6.33 vs. limit=12.0 2023-11-20 13:57:06,970 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=1113380.0, ans=0.125 2023-11-20 13:57:19,243 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=15.93 vs. limit=22.5 2023-11-20 13:57:30,250 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=1113513.3333333333, ans=0.0 2023-11-20 13:57:34,215 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.075e+01 8.061e+01 8.803e+01 9.456e+01 1.141e+02, threshold=1.761e+02, percent-clipped=0.0 2023-11-20 13:57:43,332 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=1113580.0, ans=0.125 2023-11-20 13:57:46,263 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=512, metric=7.45 vs. limit=15.0 2023-11-20 13:57:53,751 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 167050 2023-11-20 13:57:57,922 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=512, metric=3.58 vs. 
limit=15.0 2023-11-20 13:58:05,328 INFO [train_asr.py:1262] (1/4) Epoch 14, batch 10750, loss[loss=0.08452, simple_loss=0.114, pruned_loss=0.02085, audio_tagging_loss=0.006649, over 15318.00 frames. ], tot_loss[loss=0.07924, simple_loss=0.1001, pruned_loss=0.01951, audio_tagging_loss=0.009682, over 3050494.03 frames. ], batch size: 58, lr: 4.86e-03, grad_scale: 16.0 2023-11-20 13:58:19,709 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=1113780.0, ans=0.0 2023-11-20 13:58:25,265 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=1113780.0, ans=0.2 2023-11-20 13:58:26,564 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1113780.0, ans=0.1 2023-11-20 13:58:46,450 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=1113913.3333333333, ans=0.2 2023-11-20 13:58:49,841 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=11.36 vs. limit=15.0 2023-11-20 13:58:57,865 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 167100 2023-11-20 13:59:04,170 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1113980.0, ans=0.1 2023-11-20 13:59:09,689 INFO [train_asr.py:1262] (1/4) Epoch 14, batch 10800, loss[loss=0.09061, simple_loss=0.1132, pruned_loss=0.02559, audio_tagging_loss=0.008435, over 15351.00 frames. ], tot_loss[loss=0.07889, simple_loss=0.09934, pruned_loss=0.01953, audio_tagging_loss=0.009693, over 3051891.26 frames. ], batch size: 56, lr: 4.86e-03, grad_scale: 32.0 2023-11-20 13:59:13,923 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=1114046.6666666667, ans=0.125 2023-11-20 13:59:28,814 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.min_positive, batch_count=1114113.3333333333, ans=0.025 2023-11-20 13:59:43,616 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.204e+01 8.350e+01 8.974e+01 9.650e+01 1.251e+02, threshold=1.795e+02, percent-clipped=0.0 2023-11-20 14:00:00,774 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=1114313.3333333333, ans=0.0 2023-11-20 14:00:03,063 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 167150 2023-11-20 14:00:14,902 INFO [train_asr.py:1262] (1/4) Epoch 14, batch 10850, loss[loss=0.07258, simple_loss=0.08736, pruned_loss=0.01738, audio_tagging_loss=0.01152, over 16726.00 frames. ], tot_loss[loss=0.07928, simple_loss=0.1, pruned_loss=0.01962, audio_tagging_loss=0.009657, over 3052680.77 frames. ], batch size: 62, lr: 4.86e-03, grad_scale: 32.0 2023-11-20 14:00:26,915 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-20 14:01:08,148 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 167200 2023-11-20 14:01:12,395 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=1114646.6666666667, ans=0.0 2023-11-20 14:01:14,755 WARNING [train_asr.py:1506] (1/4) Exclude cut with ID unbalanced/XMxq2pgttuY_0.000_1.000.wav from training. 
Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-20 14:01:20,244 INFO [train_asr.py:1262] (1/4) Epoch 14, batch 10900, loss[loss=0.0641, simple_loss=0.07979, pruned_loss=0.014, audio_tagging_loss=0.0102, over 15888.00 frames. ], tot_loss[loss=0.07842, simple_loss=0.09901, pruned_loss=0.0191, audio_tagging_loss=0.009813, over 3051920.27 frames. ], batch size: 62, lr: 4.86e-03, grad_scale: 32.0 2023-11-20 14:01:29,739 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=7.96 vs. limit=15.0 2023-11-20 14:01:38,336 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1114780.0, ans=0.125 2023-11-20 14:01:39,678 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1114780.0, ans=0.1 2023-11-20 14:01:45,746 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1114846.6666666667, ans=0.125 2023-11-20 14:01:50,338 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=1114846.6666666667, ans=0.125 2023-11-20 14:01:52,872 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=1114846.6666666667, ans=0.125 2023-11-20 14:01:53,710 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.172e+01 8.152e+01 8.794e+01 9.597e+01 1.232e+02, threshold=1.759e+02, percent-clipped=0.0 2023-11-20 14:02:12,228 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=1114980.0, ans=0.2 2023-11-20 14:02:13,353 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 167250 2023-11-20 14:02:13,450 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=1114980.0, ans=0.125 2023-11-20 14:02:21,999 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=1114980.0, ans=0.125 2023-11-20 14:02:24,243 INFO [train_asr.py:1262] (1/4) Epoch 14, batch 10950, loss[loss=0.05208, simple_loss=0.05568, pruned_loss=0.01077, audio_tagging_loss=0.01347, over 15574.00 frames. ], tot_loss[loss=0.07823, simple_loss=0.09868, pruned_loss=0.01891, audio_tagging_loss=0.009979, over 3047695.40 frames. ], batch size: 59, lr: 4.86e-03, grad_scale: 32.0 2023-11-20 14:02:58,508 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=1115180.0, ans=0.07 2023-11-20 14:03:11,731 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.whiten, num_groups=1, num_channels=192, metric=3.87 vs. 
limit=12.0 2023-11-20 14:03:13,566 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=1115246.6666666667, ans=0.125 2023-11-20 14:03:17,729 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 167300 2023-11-20 14:03:20,822 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=22.16 vs. limit=22.5 2023-11-20 14:03:29,237 INFO [train_asr.py:1262] (1/4) Epoch 14, batch 11000, loss[loss=0.07321, simple_loss=0.08803, pruned_loss=0.0196, audio_tagging_loss=0.009596, over 15459.00 frames. ], tot_loss[loss=0.07829, simple_loss=0.09889, pruned_loss=0.01889, audio_tagging_loss=0.009957, over 3048809.85 frames. ], batch size: 60, lr: 4.86e-03, grad_scale: 32.0 2023-11-20 14:03:38,488 WARNING [train_asr.py:1506] (1/4) Exclude cut with ID unbalanced/h6R5rMXN6pY_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-20 14:04:02,038 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.939e+01 8.121e+01 8.892e+01 9.815e+01 1.453e+02, threshold=1.778e+02, percent-clipped=0.0 2023-11-20 14:04:22,128 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 167350 2023-11-20 14:04:26,360 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=6.10 vs. limit=6.0 2023-11-20 14:04:33,198 INFO [train_asr.py:1262] (1/4) Epoch 14, batch 11050, loss[loss=0.08802, simple_loss=0.1079, pruned_loss=0.02335, audio_tagging_loss=0.01073, over 14074.00 frames. ], tot_loss[loss=0.07916, simple_loss=0.09996, pruned_loss=0.01923, audio_tagging_loss=0.009952, over 3044345.64 frames. ], batch size: 53, lr: 4.85e-03, grad_scale: 32.0 2023-11-20 14:04:46,521 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.max_abs, batch_count=1115780.0, ans=10.0 2023-11-20 14:04:58,425 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=12.68 vs. 
limit=15.0 2023-11-20 14:05:01,846 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=1115846.6666666667, ans=0.2 2023-11-20 14:05:04,085 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=1115846.6666666667, ans=0.0 2023-11-20 14:05:04,231 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1115846.6666666667, ans=0.1 2023-11-20 14:05:04,270 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1115846.6666666667, ans=0.1 2023-11-20 14:05:14,262 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.max_abs, batch_count=1115913.3333333333, ans=10.0 2023-11-20 14:05:25,693 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1115980.0, ans=0.1 2023-11-20 14:05:26,760 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 167400 2023-11-20 14:05:26,849 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=1115980.0, ans=0.125 2023-11-20 14:05:28,117 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=1115980.0, ans=0.2 2023-11-20 14:05:38,047 INFO [train_asr.py:1262] (1/4) Epoch 14, batch 11100, loss[loss=0.07972, simple_loss=0.1007, pruned_loss=0.02029, audio_tagging_loss=0.00908, over 14444.00 frames. ], tot_loss[loss=0.07958, simple_loss=0.1006, pruned_loss=0.0192, audio_tagging_loss=0.01008, over 3050870.90 frames. ], batch size: 54, lr: 4.85e-03, grad_scale: 32.0 2023-11-20 14:06:11,985 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.063e+01 8.381e+01 8.919e+01 9.708e+01 1.297e+02, threshold=1.784e+02, percent-clipped=0.0 2023-11-20 14:06:31,699 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 167450 2023-11-20 14:06:36,970 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1116313.3333333333, ans=0.125 2023-11-20 14:06:41,772 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.max_positive, batch_count=1116380.0, ans=0.95 2023-11-20 14:06:42,795 INFO [train_asr.py:1262] (1/4) Epoch 14, batch 11150, loss[loss=0.09136, simple_loss=0.1191, pruned_loss=0.02102, audio_tagging_loss=0.0108, over 15541.00 frames. ], tot_loss[loss=0.07885, simple_loss=0.0992, pruned_loss=0.019, audio_tagging_loss=0.01025, over 3054797.04 frames. 
], batch size: 56, lr: 4.85e-03, grad_scale: 32.0 2023-11-20 14:06:47,854 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer_na.min_abs, batch_count=1116380.0, ans=0.02 2023-11-20 14:07:04,452 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=1116446.6666666667, ans=0.125 2023-11-20 14:07:08,124 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1116513.3333333333, ans=0.1 2023-11-20 14:07:10,671 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=1116513.3333333333, ans=0.1 2023-11-20 14:07:23,432 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=1116580.0, ans=0.125 2023-11-20 14:07:33,255 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1116646.6666666667, ans=0.1 2023-11-20 14:07:35,336 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 167500 2023-11-20 14:07:36,797 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=1116646.6666666667, ans=0.125 2023-11-20 14:07:47,510 INFO [train_asr.py:1262] (1/4) Epoch 14, batch 11200, loss[loss=0.09613, simple_loss=0.1268, pruned_loss=0.02344, audio_tagging_loss=0.009294, over 15012.00 frames. ], tot_loss[loss=0.07917, simple_loss=0.09939, pruned_loss=0.01917, audio_tagging_loss=0.01031, over 3052944.18 frames. ], batch size: 53, lr: 4.85e-03, grad_scale: 32.0 2023-11-20 14:07:59,291 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=1116780.0, ans=0.125 2023-11-20 14:08:09,806 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=1116780.0, ans=0.1 2023-11-20 14:08:20,250 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.974e+01 8.196e+01 8.773e+01 9.585e+01 1.271e+02, threshold=1.755e+02, percent-clipped=0.0 2023-11-20 14:08:27,259 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=1116913.3333333333, ans=0.0 2023-11-20 14:08:37,608 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=1116980.0, ans=0.125 2023-11-20 14:08:40,460 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 167550 2023-11-20 14:08:43,183 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1116980.0, ans=0.1 2023-11-20 14:08:44,297 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=1116980.0, ans=0.125 2023-11-20 14:08:51,241 INFO [train_asr.py:1262] (1/4) Epoch 14, batch 11250, loss[loss=0.08282, simple_loss=0.1032, pruned_loss=0.02174, audio_tagging_loss=0.009481, over 15521.00 frames. ], tot_loss[loss=0.07922, simple_loss=0.0993, pruned_loss=0.01926, audio_tagging_loss=0.01031, over 3050853.61 frames. 
], batch size: 59, lr: 4.85e-03, grad_scale: 16.0 2023-11-20 14:08:52,713 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=1117046.6666666667, ans=0.125 2023-11-20 14:08:55,085 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1117046.6666666667, ans=0.1 2023-11-20 14:08:59,231 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=1117046.6666666667, ans=0.0 2023-11-20 14:09:17,339 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1117180.0, ans=0.125 2023-11-20 14:09:25,185 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.min_positive, batch_count=1117180.0, ans=0.025 2023-11-20 14:09:41,686 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=1117313.3333333333, ans=0.125 2023-11-20 14:09:44,032 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 167600 2023-11-20 14:09:53,634 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=1117313.3333333333, ans=0.125 2023-11-20 14:09:55,738 INFO [train_asr.py:1262] (1/4) Epoch 14, batch 11300, loss[loss=0.08349, simple_loss=0.107, pruned_loss=0.02266, audio_tagging_loss=0.007307, over 15238.00 frames. ], tot_loss[loss=0.07819, simple_loss=0.09835, pruned_loss=0.0189, audio_tagging_loss=0.01012, over 3050189.19 frames. ], batch size: 58, lr: 4.85e-03, grad_scale: 16.0 2023-11-20 14:09:55,998 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=1117380.0, ans=0.125 2023-11-20 14:10:30,803 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.679e+01 8.103e+01 8.654e+01 9.341e+01 1.359e+02, threshold=1.731e+02, percent-clipped=0.0 2023-11-20 14:10:32,484 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=1117513.3333333333, ans=0.125 2023-11-20 14:10:44,015 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=1117580.0, ans=0.125 2023-11-20 14:10:48,665 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 167650 2023-11-20 14:11:00,277 INFO [train_asr.py:1262] (1/4) Epoch 14, batch 11350, loss[loss=0.0698, simple_loss=0.09172, pruned_loss=0.01545, audio_tagging_loss=0.008491, over 15064.00 frames. ], tot_loss[loss=0.07849, simple_loss=0.09929, pruned_loss=0.01892, audio_tagging_loss=0.009924, over 3041720.66 frames. ], batch size: 57, lr: 4.85e-03, grad_scale: 16.0 2023-11-20 14:11:00,592 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=1117713.3333333333, ans=0.125 2023-11-20 14:11:01,931 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=1117713.3333333333, ans=0.125 2023-11-20 14:11:09,468 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=20.09 vs. 
limit=22.5 2023-11-20 14:11:52,935 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 167700 2023-11-20 14:11:57,444 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=1117980.0, ans=0.0 2023-11-20 14:12:02,753 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=1117980.0, ans=0.125 2023-11-20 14:12:04,785 INFO [train_asr.py:1262] (1/4) Epoch 14, batch 11400, loss[loss=0.05221, simple_loss=0.0585, pruned_loss=0.01108, audio_tagging_loss=0.01188, over 15441.00 frames. ], tot_loss[loss=0.07789, simple_loss=0.09857, pruned_loss=0.01876, audio_tagging_loss=0.009848, over 3042381.88 frames. ], batch size: 59, lr: 4.85e-03, grad_scale: 16.0 2023-11-20 14:12:09,897 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=1118046.6666666667, ans=0.125 2023-11-20 14:12:21,748 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=1118113.3333333333, ans=0.125 2023-11-20 14:12:34,936 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=1118180.0, ans=0.0 2023-11-20 14:12:37,339 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=1118180.0, ans=0.0 2023-11-20 14:12:39,416 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.780e+01 8.090e+01 8.832e+01 9.724e+01 2.021e+02, threshold=1.766e+02, percent-clipped=1.0 2023-11-20 14:12:39,763 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=1118180.0, ans=0.125 2023-11-20 14:12:42,175 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer_na.min_abs, batch_count=1118246.6666666667, ans=0.02 2023-11-20 14:12:57,885 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 167750 2023-11-20 14:13:09,438 INFO [train_asr.py:1262] (1/4) Epoch 14, batch 11450, loss[loss=0.0686, simple_loss=0.0878, pruned_loss=0.0156, audio_tagging_loss=0.009102, over 15376.00 frames. ], tot_loss[loss=0.07809, simple_loss=0.09885, pruned_loss=0.01888, audio_tagging_loss=0.009786, over 3033099.59 frames. ], batch size: 60, lr: 4.85e-03, grad_scale: 16.0 2023-11-20 14:13:27,439 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=1118446.6666666667, ans=0.125 2023-11-20 14:13:31,218 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.min_abs, batch_count=1118446.6666666667, ans=0.5 2023-11-20 14:14:02,127 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 167800 2023-11-20 14:14:14,029 INFO [train_asr.py:1262] (1/4) Epoch 14, batch 11500, loss[loss=0.06856, simple_loss=0.08815, pruned_loss=0.01238, audio_tagging_loss=0.01211, over 15947.00 frames. ], tot_loss[loss=0.07866, simple_loss=0.09956, pruned_loss=0.01905, audio_tagging_loss=0.009822, over 3035291.50 frames. ], batch size: 59, lr: 4.85e-03, grad_scale: 16.0 2023-11-20 14:14:14,630 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=7.37 vs. 
limit=15.0 2023-11-20 14:14:23,981 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.self_attn1.whiten, num_groups=1, num_channels=192, metric=12.19 vs. limit=22.5 2023-11-20 14:14:38,927 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=1118846.6666666667, ans=0.2 2023-11-20 14:14:48,338 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=12.29 vs. limit=15.0 2023-11-20 14:14:48,914 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.104e+01 8.299e+01 8.769e+01 9.853e+01 1.208e+02, threshold=1.754e+02, percent-clipped=0.0 2023-11-20 14:14:54,443 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=1118913.3333333333, ans=0.125 2023-11-20 14:14:58,002 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-20 14:15:07,054 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 167850 2023-11-20 14:15:19,089 INFO [train_asr.py:1262] (1/4) Epoch 14, batch 11550, loss[loss=0.08391, simple_loss=0.1082, pruned_loss=0.01918, audio_tagging_loss=0.01062, over 15360.00 frames. ], tot_loss[loss=0.07937, simple_loss=0.1003, pruned_loss=0.01938, audio_tagging_loss=0.009833, over 3039749.81 frames. ], batch size: 58, lr: 4.85e-03, grad_scale: 16.0 2023-11-20 14:15:38,362 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-20 14:15:55,711 WARNING [train_asr.py:1506] (1/4) Exclude cut with ID unbalanced/NeYOsnhOi4k_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-20 14:16:03,046 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=2.89 vs. limit=15.0 2023-11-20 14:16:04,996 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=1119246.6666666667, ans=0.125 2023-11-20 14:16:11,558 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 167900 2023-11-20 14:16:13,155 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=1119313.3333333333, ans=0.0 2023-11-20 14:16:16,620 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1119313.3333333333, ans=0.1 2023-11-20 14:16:21,608 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=13.35 vs. limit=15.0 2023-11-20 14:16:23,302 INFO [train_asr.py:1262] (1/4) Epoch 14, batch 11600, loss[loss=0.08444, simple_loss=0.0954, pruned_loss=0.0245, audio_tagging_loss=0.01224, over 13788.00 frames. ], tot_loss[loss=0.0794, simple_loss=0.1007, pruned_loss=0.01922, audio_tagging_loss=0.009845, over 3046332.55 frames. 
], batch size: 53, lr: 4.85e-03, grad_scale: 32.0 2023-11-20 14:16:33,939 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=7.02 vs. limit=10.0 2023-11-20 14:16:43,783 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=1119446.6666666667, ans=0.125 2023-11-20 14:16:54,319 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=1119513.3333333333, ans=0.2 2023-11-20 14:16:57,521 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.656e+01 8.039e+01 8.649e+01 9.262e+01 1.367e+02, threshold=1.730e+02, percent-clipped=0.0 2023-11-20 14:17:15,946 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 167950 2023-11-20 14:17:26,981 INFO [train_asr.py:1262] (1/4) Epoch 14, batch 11650, loss[loss=0.06599, simple_loss=0.07113, pruned_loss=0.01598, audio_tagging_loss=0.01444, over 15070.00 frames. ], tot_loss[loss=0.07896, simple_loss=0.09998, pruned_loss=0.01904, audio_tagging_loss=0.009928, over 3035916.54 frames. ], batch size: 61, lr: 4.85e-03, grad_scale: 32.0 2023-11-20 14:17:36,188 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=1119713.3333333333, ans=0.0 2023-11-20 14:17:40,842 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=7.15 vs. limit=15.0 2023-11-20 14:17:43,065 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=1119780.0, ans=0.125 2023-11-20 14:17:49,056 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=1119780.0, ans=0.0 2023-11-20 14:18:00,174 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=1119846.6666666667, ans=0.125 2023-11-20 14:18:15,977 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=1119913.3333333333, ans=0.0 2023-11-20 14:18:18,155 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=512, metric=3.41 vs. limit=15.0 2023-11-20 14:18:20,053 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 168000 2023-11-20 14:18:34,837 INFO [train_asr.py:1262] (1/4) Epoch 14, batch 11700, loss[loss=0.09319, simple_loss=0.1151, pruned_loss=0.02472, audio_tagging_loss=0.01093, over 15271.00 frames. ], tot_loss[loss=0.07891, simple_loss=0.09986, pruned_loss=0.01901, audio_tagging_loss=0.009971, over 3035900.45 frames. 
], batch size: 55, lr: 4.85e-03, grad_scale: 32.0 2023-11-20 14:18:47,976 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=1120113.3333333333, ans=0.125 2023-11-20 14:18:56,955 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=1120113.3333333333, ans=0.0 2023-11-20 14:19:09,470 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.797e+01 8.107e+01 8.645e+01 9.352e+01 1.111e+02, threshold=1.729e+02, percent-clipped=0.0 2023-11-20 14:19:12,335 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=1120246.6666666667, ans=0.125 2023-11-20 14:19:16,052 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=1120246.6666666667, ans=0.125 2023-11-20 14:19:27,340 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 168050 2023-11-20 14:19:35,793 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=12.90 vs. limit=15.0 2023-11-20 14:19:39,530 INFO [train_asr.py:1262] (1/4) Epoch 14, batch 11750, loss[loss=0.09198, simple_loss=0.1219, pruned_loss=0.02018, audio_tagging_loss=0.01084, over 16931.00 frames. ], tot_loss[loss=0.07819, simple_loss=0.09904, pruned_loss=0.01866, audio_tagging_loss=0.01002, over 3036206.84 frames. ], batch size: 62, lr: 4.84e-03, grad_scale: 32.0 2023-11-20 14:19:56,994 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=1120446.6666666667, ans=0.125 2023-11-20 14:20:14,728 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=1120513.3333333333, ans=0.2 2023-11-20 14:20:14,832 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=1120513.3333333333, ans=0.07 2023-11-20 14:20:32,712 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 168100 2023-11-20 14:20:43,461 INFO [train_asr.py:1262] (1/4) Epoch 14, batch 11800, loss[loss=0.09461, simple_loss=0.1157, pruned_loss=0.02656, audio_tagging_loss=0.01019, over 16183.00 frames. ], tot_loss[loss=0.07821, simple_loss=0.09915, pruned_loss=0.01858, audio_tagging_loss=0.01006, over 3037540.17 frames. ], batch size: 60, lr: 4.84e-03, grad_scale: 32.0 2023-11-20 14:20:53,296 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.39 vs. 
limit=6.0 2023-11-20 14:21:01,985 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=1120780.0, ans=0.125 2023-11-20 14:21:02,165 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=1120780.0, ans=0.1 2023-11-20 14:21:19,056 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.990e+01 8.087e+01 8.933e+01 9.931e+01 1.196e+02, threshold=1.787e+02, percent-clipped=0.0 2023-11-20 14:21:24,258 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-20 14:21:25,739 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=20.49 vs. limit=22.5 2023-11-20 14:21:30,130 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=1120913.3333333333, ans=0.0 2023-11-20 14:21:32,950 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=512, metric=4.18 vs. limit=15.0 2023-11-20 14:21:36,603 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 168150 2023-11-20 14:21:47,567 INFO [train_asr.py:1262] (1/4) Epoch 14, batch 11850, loss[loss=0.08827, simple_loss=0.1107, pruned_loss=0.02145, audio_tagging_loss=0.01147, over 15567.00 frames. ], tot_loss[loss=0.07788, simple_loss=0.09873, pruned_loss=0.01843, audio_tagging_loss=0.01008, over 3036976.04 frames. ], batch size: 57, lr: 4.84e-03, grad_scale: 32.0 2023-11-20 14:21:51,960 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=1121046.6666666667, ans=0.0 2023-11-20 14:22:15,918 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1121180.0, ans=0.1 2023-11-20 14:22:18,417 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1121180.0, ans=0.1 2023-11-20 14:22:40,204 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 168200 2023-11-20 14:22:47,221 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=20.39 vs. limit=22.5 2023-11-20 14:22:51,441 INFO [train_asr.py:1262] (1/4) Epoch 14, batch 11900, loss[loss=0.08306, simple_loss=0.1056, pruned_loss=0.01644, audio_tagging_loss=0.01382, over 14893.00 frames. ], tot_loss[loss=0.07776, simple_loss=0.09829, pruned_loss=0.0184, audio_tagging_loss=0.01021, over 3045176.12 frames. 
], batch size: 54, lr: 4.84e-03, grad_scale: 32.0 2023-11-20 14:23:11,626 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=1121446.6666666667, ans=0.125 2023-11-20 14:23:18,397 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=1121513.3333333333, ans=0.125 2023-11-20 14:23:26,317 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1121513.3333333333, ans=0.125 2023-11-20 14:23:27,238 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.121e+01 8.163e+01 8.778e+01 9.504e+01 1.300e+02, threshold=1.756e+02, percent-clipped=0.0 2023-11-20 14:23:30,123 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.max_abs, batch_count=1121580.0, ans=10.0 2023-11-20 14:23:45,108 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 168250 2023-11-20 14:23:48,321 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=1121646.6666666667, ans=0.2 2023-11-20 14:23:56,559 INFO [train_asr.py:1262] (1/4) Epoch 14, batch 11950, loss[loss=0.09028, simple_loss=0.1225, pruned_loss=0.02135, audio_tagging_loss=0.007657, over 16469.00 frames. ], tot_loss[loss=0.07747, simple_loss=0.09769, pruned_loss=0.01819, audio_tagging_loss=0.01044, over 3044832.44 frames. ], batch size: 59, lr: 4.84e-03, grad_scale: 32.0 2023-11-20 14:24:48,395 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 168300 2023-11-20 14:24:58,376 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=6.63 vs. limit=15.0 2023-11-20 14:24:58,994 INFO [train_asr.py:1262] (1/4) Epoch 14, batch 12000, loss[loss=0.0938, simple_loss=0.1167, pruned_loss=0.02327, audio_tagging_loss=0.0122, over 15187.00 frames. ], tot_loss[loss=0.07781, simple_loss=0.09764, pruned_loss=0.01847, audio_tagging_loss=0.01052, over 3046065.39 frames. ], batch size: 55, lr: 4.84e-03, grad_scale: 32.0 2023-11-20 14:24:58,995 INFO [train_asr.py:1285] (1/4) Computing validation loss 2023-11-20 14:25:22,158 INFO [zipformer.py:1873] (1/4) name=encoder.encoders.2.encoder.layers.2.self_attn_weights, attn_weights_entropy = tensor([4.5122, 3.6050, 4.4185, 3.3412], device='cuda:1') 2023-11-20 14:25:41,044 INFO [train_asr.py:1294] (1/4) Epoch 14, validation: loss=0.06236, simple_loss=0.05348, pruned_loss=0.005638, audio_tagging_loss=0.02999, over 4681554.00 frames. 2023-11-20 14:25:41,045 INFO [train_asr.py:1295] (1/4) Maximum memory allocated so far is 26082MB 2023-11-20 14:25:48,246 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=1122046.6666666667, ans=0.0 2023-11-20 14:25:49,436 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1122046.6666666667, ans=0.125 2023-11-20 14:26:46,220 INFO [train_asr.py:1262] (1/4) Epoch 15, batch 0, loss[loss=0.1081, simple_loss=0.1263, pruned_loss=0.02442, audio_tagging_loss=0.02051, over 15873.00 frames. ], tot_loss[loss=0.1081, simple_loss=0.1263, pruned_loss=0.02442, audio_tagging_loss=0.02051, over 15873.00 frames. 
], batch size: 56, lr: 4.68e-03, grad_scale: 32.0 2023-11-20 14:26:46,221 INFO [train_asr.py:1285] (1/4) Computing validation loss 2023-11-20 14:27:01,123 INFO [zipformer.py:1873] (1/4) name=encoder.encoders.0.layers.1.self_attn_weights, attn_weights_entropy = tensor([5.6982, 5.5954, 5.7570, 5.6279], device='cuda:1') 2023-11-20 14:27:17,692 INFO [zipformer.py:1873] (1/4) name=encoder.encoders.1.encoder.layers.0.self_attn_weights, attn_weights_entropy = tensor([5.3060, 5.0065, 4.7176, 5.1709], device='cuda:1') 2023-11-20 14:27:21,772 INFO [train_asr.py:1294] (1/4) Epoch 15, validation: loss=0.06153, simple_loss=0.05347, pruned_loss=0.005654, audio_tagging_loss=0.02914, over 4681554.00 frames. 2023-11-20 14:27:21,773 INFO [train_asr.py:1295] (1/4) Maximum memory allocated so far is 26082MB 2023-11-20 14:27:26,690 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.106e+01 8.292e+01 9.006e+01 9.902e+01 1.226e+02, threshold=1.801e+02, percent-clipped=0.0 2023-11-20 14:27:39,350 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=1122266.6666666667, ans=0.2 2023-11-20 14:27:44,709 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 168350 2023-11-20 14:27:59,785 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=1122400.0, ans=0.0 2023-11-20 14:28:07,139 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=1122400.0, ans=0.125 2023-11-20 14:28:17,614 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=1122466.6666666667, ans=0.2 2023-11-20 14:28:25,994 INFO [train_asr.py:1262] (1/4) Epoch 15, batch 50, loss[loss=0.1004, simple_loss=0.127, pruned_loss=0.02106, audio_tagging_loss=0.01581, over 15203.00 frames. ], tot_loss[loss=0.08892, simple_loss=0.1009, pruned_loss=0.01938, audio_tagging_loss=0.01909, over 687095.34 frames. ], batch size: 54, lr: 4.67e-03, grad_scale: 32.0 2023-11-20 14:28:32,404 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=10.06 vs. limit=15.0 2023-11-20 14:28:50,238 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 168400 2023-11-20 14:29:02,695 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=1122666.6666666667, ans=0.0 2023-11-20 14:29:32,409 INFO [train_asr.py:1262] (1/4) Epoch 15, batch 100, loss[loss=0.08094, simple_loss=0.1013, pruned_loss=0.01584, audio_tagging_loss=0.01443, over 15213.00 frames. ], tot_loss[loss=0.08727, simple_loss=0.09989, pruned_loss=0.01902, audio_tagging_loss=0.0183, over 1207160.22 frames. 
], batch size: 54, lr: 4.67e-03, grad_scale: 16.0 2023-11-20 14:29:39,161 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.599e+01 8.769e+01 9.395e+01 1.004e+02 1.341e+02, threshold=1.879e+02, percent-clipped=0.0 2023-11-20 14:29:45,523 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=1122933.3333333333, ans=0.125 2023-11-20 14:29:56,003 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 168450 2023-11-20 14:29:56,272 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1122933.3333333333, ans=0.1 2023-11-20 14:30:32,808 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1123133.3333333333, ans=0.125 2023-11-20 14:30:37,468 INFO [train_asr.py:1262] (1/4) Epoch 15, batch 150, loss[loss=0.07988, simple_loss=0.08453, pruned_loss=0.02408, audio_tagging_loss=0.01354, over 15005.00 frames. ], tot_loss[loss=0.08482, simple_loss=0.09928, pruned_loss=0.01871, audio_tagging_loss=0.01648, over 1617806.60 frames. ], batch size: 60, lr: 4.67e-03, grad_scale: 16.0 2023-11-20 14:30:58,071 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=1123266.6666666667, ans=0.2 2023-11-20 14:30:58,652 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=2.48 vs. limit=15.0 2023-11-20 14:31:00,992 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 168500 2023-11-20 14:31:01,211 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=1123266.6666666667, ans=0.0 2023-11-20 14:31:28,644 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer_na.min_abs, batch_count=1123466.6666666667, ans=0.02 2023-11-20 14:31:40,866 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.28 vs. limit=6.0 2023-11-20 14:31:42,716 INFO [train_asr.py:1262] (1/4) Epoch 15, batch 200, loss[loss=0.08115, simple_loss=0.1077, pruned_loss=0.01915, audio_tagging_loss=0.00813, over 15647.00 frames. ], tot_loss[loss=0.08333, simple_loss=0.09941, pruned_loss=0.01893, audio_tagging_loss=0.0147, over 1929667.32 frames. 
], batch size: 59, lr: 4.67e-03, grad_scale: 16.0 2023-11-20 14:31:48,819 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.970e+01 8.233e+01 8.956e+01 9.883e+01 1.318e+02, threshold=1.791e+02, percent-clipped=0.0 2023-11-20 14:31:58,830 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=1123600.0, ans=0.125 2023-11-20 14:31:58,887 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=1123600.0, ans=0.0 2023-11-20 14:32:06,115 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 168550 2023-11-20 14:32:13,103 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=1123666.6666666667, ans=0.0 2023-11-20 14:32:37,689 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=1123800.0, ans=0.125 2023-11-20 14:32:48,630 INFO [train_asr.py:1262] (1/4) Epoch 15, batch 250, loss[loss=0.08867, simple_loss=0.1092, pruned_loss=0.02487, audio_tagging_loss=0.009196, over 14965.00 frames. ], tot_loss[loss=0.0825, simple_loss=0.1001, pruned_loss=0.01923, audio_tagging_loss=0.0132, over 2177684.21 frames. ], batch size: 57, lr: 4.67e-03, grad_scale: 16.0 2023-11-20 14:32:58,107 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=1123866.6666666667, ans=0.0 2023-11-20 14:33:09,882 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten.whitening_limit, batch_count=1123933.3333333333, ans=22.5 2023-11-20 14:33:11,760 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 168600 2023-11-20 14:33:16,140 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=1124000.0, ans=0.125 2023-11-20 14:33:29,324 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=1124066.6666666667, ans=0.125 2023-11-20 14:33:44,155 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=1124133.3333333333, ans=0.125 2023-11-20 14:33:50,224 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=9.92 vs. limit=15.0 2023-11-20 14:33:52,181 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=1124133.3333333333, ans=0.0 2023-11-20 14:33:54,388 INFO [train_asr.py:1262] (1/4) Epoch 15, batch 300, loss[loss=0.06828, simple_loss=0.09171, pruned_loss=0.01398, audio_tagging_loss=0.008445, over 14643.00 frames. ], tot_loss[loss=0.08147, simple_loss=0.1, pruned_loss=0.01938, audio_tagging_loss=0.01207, over 2375716.54 frames. 
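Each optim.py record above prints five quantiles (min, 25%, median, 75%, max) of recent gradient norms, and in every record the threshold equals Clipping_scale times the median, e.g. 2.0 * 8.956e+01 = 1.791e+02 in the record above. A sketch of that bookkeeping under exactly that assumption:

import torch

# Sketch: five-point quantiles of recent grad norms plus a clipping
# threshold at clipping_scale * median, matching the arithmetic of the
# "grad-norm quartiles ... threshold=..." records in this log.
def gradnorm_stats(recent_norms: torch.Tensor, clipping_scale: float = 2.0):
    q = torch.quantile(recent_norms,
                       torch.tensor([0.0, 0.25, 0.5, 0.75, 1.0]))
    return q, clipping_scale * q[2]

q, thr = gradnorm_stats(torch.tensor([69.70, 82.33, 89.56, 98.83, 131.8]))
# q -> the quartiles logged above; thr -> 179.12, i.e. ~1.791e+02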
], batch size: 55, lr: 4.67e-03, grad_scale: 16.0 2023-11-20 14:34:00,422 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.977e+01 8.500e+01 9.120e+01 9.945e+01 1.401e+02, threshold=1.824e+02, percent-clipped=0.0 2023-11-20 14:34:02,060 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=1124200.0, ans=0.2 2023-11-20 14:34:17,722 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 168650 2023-11-20 14:34:36,776 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=1124400.0, ans=0.125 2023-11-20 14:34:39,873 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=1124400.0, ans=0.125 2023-11-20 14:34:48,588 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1124466.6666666667, ans=0.125 2023-11-20 14:34:59,592 INFO [train_asr.py:1262] (1/4) Epoch 15, batch 350, loss[loss=0.04975, simple_loss=0.05958, pruned_loss=0.008223, audio_tagging_loss=0.01174, over 14430.00 frames. ], tot_loss[loss=0.08075, simple_loss=0.1, pruned_loss=0.01925, audio_tagging_loss=0.01149, over 2520800.73 frames. ], batch size: 54, lr: 4.67e-03, grad_scale: 16.0 2023-11-20 14:34:59,850 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=1124533.3333333333, ans=0.2 2023-11-20 14:35:15,458 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1124600.0, ans=0.1 2023-11-20 14:35:22,412 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1124600.0, ans=0.125 2023-11-20 14:35:24,804 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 168700 2023-11-20 14:35:25,397 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=11.84 vs. limit=15.0 2023-11-20 14:35:29,286 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=1124666.6666666667, ans=0.125 2023-11-20 14:35:33,286 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1124666.6666666667, ans=0.125 2023-11-20 14:35:35,850 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1124666.6666666667, ans=0.1 2023-11-20 14:35:56,526 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=1124800.0, ans=0.2 2023-11-20 14:36:04,679 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=1124800.0, ans=0.125 2023-11-20 14:36:06,967 INFO [train_asr.py:1262] (1/4) Epoch 15, batch 400, loss[loss=0.07549, simple_loss=0.1008, pruned_loss=0.01607, audio_tagging_loss=0.009, over 16307.00 frames. ], tot_loss[loss=0.08015, simple_loss=0.09996, pruned_loss=0.01904, audio_tagging_loss=0.01113, over 2635577.49 frames. 
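grad_scale in the batch summaries flips between 32.0 and 16.0 over this stretch, which looks like ordinary fp16 dynamic loss scaling (an inference from the pattern, not from the code): halve on an overflowing step, double back after a long run of clean steps. A pure-Python sketch of that policy:

def update_grad_scale(scale: float, overflowed: bool, clean_steps: int,
                      growth_interval: int = 2000) -> tuple[float, int]:
    # Halve immediately on overflow; double after growth_interval clean
    # steps. Consistent with the 32 -> 16 -> 32 pattern seen above.
    if overflowed:
        return scale * 0.5, 0
    clean_steps += 1
    if clean_steps >= growth_interval:
        return scale * 2.0, 0
    return scale, clean_steps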
], batch size: 61, lr: 4.67e-03, grad_scale: 32.0 2023-11-20 14:36:09,183 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-11-20 14:36:13,871 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.381e+01 8.162e+01 9.229e+01 1.065e+02 1.239e+02, threshold=1.846e+02, percent-clipped=0.0 2023-11-20 14:36:16,686 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=1124866.6666666667, ans=0.125 2023-11-20 14:36:16,882 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=6.04 vs. limit=10.0 2023-11-20 14:36:20,360 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=11.95 vs. limit=15.0 2023-11-20 14:36:30,693 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 168750 2023-11-20 14:36:38,515 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=1125000.0, ans=0.04949747468305833 2023-11-20 14:36:53,839 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=9.45 vs. limit=15.0 2023-11-20 14:37:06,370 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=10.95 vs. limit=15.0 2023-11-20 14:37:07,337 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=1125133.3333333333, ans=0.125 2023-11-20 14:37:12,764 INFO [train_asr.py:1262] (1/4) Epoch 15, batch 450, loss[loss=0.1009, simple_loss=0.1316, pruned_loss=0.02829, audio_tagging_loss=0.006772, over 16123.00 frames. ], tot_loss[loss=0.08009, simple_loss=0.1001, pruned_loss=0.01921, audio_tagging_loss=0.01081, over 2733405.37 frames. ], batch size: 58, lr: 4.67e-03, grad_scale: 16.0 2023-11-20 14:37:34,860 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 168800 2023-11-20 14:37:45,303 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=1125333.3333333333, ans=0.125 2023-11-20 14:38:17,237 INFO [train_asr.py:1262] (1/4) Epoch 15, batch 500, loss[loss=0.06538, simple_loss=0.0838, pruned_loss=0.01384, audio_tagging_loss=0.009645, over 15480.00 frames. ], tot_loss[loss=0.07923, simple_loss=0.0995, pruned_loss=0.01899, audio_tagging_loss=0.01048, over 2799937.52 frames. ], batch size: 58, lr: 4.67e-03, grad_scale: 16.0 2023-11-20 14:38:24,586 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.837e+01 8.018e+01 8.483e+01 9.528e+01 1.143e+02, threshold=1.697e+02, percent-clipped=0.0 2023-11-20 14:38:32,537 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=1125600.0, ans=0.2 2023-11-20 14:38:41,270 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 168850 2023-11-20 14:38:41,685 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=9.16 vs. 
limit=15.0 2023-11-20 14:38:43,864 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.min_positive, batch_count=1125666.6666666667, ans=0.025 2023-11-20 14:38:45,571 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=2.99 vs. limit=15.0 2023-11-20 14:38:51,901 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=1125666.6666666667, ans=0.125 2023-11-20 14:38:52,383 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=8.45 vs. limit=15.0 2023-11-20 14:39:00,461 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.max_abs, batch_count=1125733.3333333333, ans=10.0 2023-11-20 14:39:06,475 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=1125733.3333333333, ans=0.125 2023-11-20 14:39:07,885 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=1125800.0, ans=0.125 2023-11-20 14:39:15,650 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=1125800.0, ans=0.07 2023-11-20 14:39:21,974 INFO [train_asr.py:1262] (1/4) Epoch 15, batch 550, loss[loss=0.07497, simple_loss=0.1051, pruned_loss=0.01512, audio_tagging_loss=0.007289, over 14681.00 frames. ], tot_loss[loss=0.07928, simple_loss=0.0995, pruned_loss=0.01914, audio_tagging_loss=0.01039, over 2850841.25 frames. ], batch size: 54, lr: 4.67e-03, grad_scale: 16.0 2023-11-20 14:39:24,749 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=1125866.6666666667, ans=0.0 2023-11-20 14:39:45,545 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 168900 2023-11-20 14:40:04,504 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=512, metric=4.04 vs. limit=15.0 2023-11-20 14:40:14,412 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=13.22 vs. limit=15.0 2023-11-20 14:40:20,826 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1126133.3333333333, ans=0.1 2023-11-20 14:40:27,407 INFO [train_asr.py:1262] (1/4) Epoch 15, batch 600, loss[loss=0.06554, simple_loss=0.07185, pruned_loss=0.01664, audio_tagging_loss=0.01297, over 14202.00 frames. ], tot_loss[loss=0.079, simple_loss=0.09912, pruned_loss=0.0191, audio_tagging_loss=0.01033, over 2889192.46 frames. 
], batch size: 60, lr: 4.67e-03, grad_scale: 16.0 2023-11-20 14:40:32,887 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.min_abs, batch_count=1126200.0, ans=0.5 2023-11-20 14:40:35,035 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.998e+01 8.140e+01 8.992e+01 9.843e+01 1.226e+02, threshold=1.798e+02, percent-clipped=0.0 2023-11-20 14:40:45,442 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=1126266.6666666667, ans=0.0 2023-11-20 14:40:48,943 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=1126266.6666666667, ans=0.125 2023-11-20 14:40:50,146 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 168950 2023-11-20 14:40:51,562 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=1126333.3333333333, ans=0.0 2023-11-20 14:40:54,336 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1126333.3333333333, ans=0.1 2023-11-20 14:40:55,515 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=1126333.3333333333, ans=0.125 2023-11-20 14:41:32,772 INFO [train_asr.py:1262] (1/4) Epoch 15, batch 650, loss[loss=0.1017, simple_loss=0.1357, pruned_loss=0.02744, audio_tagging_loss=0.006442, over 15366.00 frames. ], tot_loss[loss=0.07945, simple_loss=0.1002, pruned_loss=0.01916, audio_tagging_loss=0.01019, over 2925964.97 frames. ], batch size: 57, lr: 4.67e-03, grad_scale: 16.0 2023-11-20 14:41:51,148 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=1126600.0, ans=0.0 2023-11-20 14:41:57,330 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 169000 2023-11-20 14:42:08,750 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=1126666.6666666667, ans=0.2 2023-11-20 14:42:11,206 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1126666.6666666667, ans=0.0 2023-11-20 14:42:11,350 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=1126666.6666666667, ans=0.125 2023-11-20 14:42:23,710 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=1126733.3333333333, ans=0.09899494936611666 2023-11-20 14:42:34,960 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=1126800.0, ans=0.015 2023-11-20 14:42:38,534 INFO [train_asr.py:1262] (1/4) Epoch 15, batch 700, loss[loss=0.08209, simple_loss=0.1058, pruned_loss=0.01611, audio_tagging_loss=0.01309, over 15680.00 frames. ], tot_loss[loss=0.07988, simple_loss=0.1008, pruned_loss=0.01934, audio_tagging_loss=0.01014, over 2950689.46 frames. 
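The ScheduledFloat records above print a parameter name, the global batch_count, and the value ("ans") currently in effect; such schedules are typically piecewise-linear in batch_count. A minimal stand-in with illustrative breakpoints (the real schedules live in scaling.py and are not reproduced here):

def scheduled_float(batch_count: float,
                    schedule: list[tuple[float, float]]) -> float:
    # Piecewise-linear in batch_count between (batch, value) breakpoints,
    # clamped to the end values outside them.
    if batch_count <= schedule[0][0]:
        return schedule[0][1]
    for (b0, v0), (b1, v1) in zip(schedule, schedule[1:]):
        if batch_count <= b1:
            return v0 + (batch_count - b0) / (b1 - b0) * (v1 - v0)
    return schedule[-1][1]

# Illustrative: a skip rate decaying 0.5 -> 0.0 over the first 20k batches
# has long since reached its floor at batch_count ~= 1.126e6:
print(scheduled_float(1_126_200.0, [(0.0, 0.5), (20_000.0, 0.0)]))  # 0.0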
], batch size: 58, lr: 4.67e-03, grad_scale: 16.0 2023-11-20 14:42:47,985 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.688e+01 8.098e+01 8.725e+01 9.382e+01 1.189e+02, threshold=1.745e+02, percent-clipped=0.0 2023-11-20 14:42:54,534 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=1126933.3333333333, ans=0.125 2023-11-20 14:43:03,504 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 169050 2023-11-20 14:43:09,831 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-20 14:43:13,711 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=1127000.0, ans=0.09899494936611666 2023-11-20 14:43:14,740 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.min_abs, batch_count=1127000.0, ans=0.5 2023-11-20 14:43:16,023 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=1127000.0, ans=0.0 2023-11-20 14:43:27,592 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=1127066.6666666667, ans=0.125 2023-11-20 14:43:40,553 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=1127133.3333333333, ans=0.0 2023-11-20 14:43:45,365 INFO [train_asr.py:1262] (1/4) Epoch 15, batch 750, loss[loss=0.07563, simple_loss=0.09999, pruned_loss=0.01862, audio_tagging_loss=0.007008, over 14811.00 frames. ], tot_loss[loss=0.08033, simple_loss=0.1015, pruned_loss=0.01944, audio_tagging_loss=0.01013, over 2977649.01 frames. ], batch size: 55, lr: 4.67e-03, grad_scale: 16.0 2023-11-20 14:43:47,344 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=1127200.0, ans=0.0 2023-11-20 14:43:54,362 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=18.92 vs. limit=22.5 2023-11-20 14:43:56,450 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=1127200.0, ans=0.125 2023-11-20 14:44:05,140 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=1127266.6666666667, ans=0.125 2023-11-20 14:44:08,672 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 169100 2023-11-20 14:44:22,774 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=1127400.0, ans=0.125 2023-11-20 14:44:24,051 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=1127400.0, ans=0.125 2023-11-20 14:44:26,066 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=1127400.0, ans=0.0 2023-11-20 14:44:50,617 INFO [train_asr.py:1262] (1/4) Epoch 15, batch 800, loss[loss=0.06391, simple_loss=0.07622, pruned_loss=0.01512, audio_tagging_loss=0.01068, over 15751.00 frames. ], tot_loss[loss=0.08026, simple_loss=0.1013, pruned_loss=0.01938, audio_tagging_loss=0.01023, over 2992689.74 frames. 
], batch size: 58, lr: 4.66e-03, grad_scale: 32.0 2023-11-20 14:44:55,830 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=1127533.3333333333, ans=0.09899494936611666 2023-11-20 14:44:57,928 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.496e+01 8.097e+01 8.575e+01 9.313e+01 1.221e+02, threshold=1.715e+02, percent-clipped=0.0 2023-11-20 14:45:00,813 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=1127533.3333333333, ans=0.0 2023-11-20 14:45:13,861 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 169150 2023-11-20 14:45:19,590 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=1127666.6666666667, ans=0.125 2023-11-20 14:45:23,218 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1127666.6666666667, ans=0.125 2023-11-20 14:45:56,226 INFO [train_asr.py:1262] (1/4) Epoch 15, batch 850, loss[loss=0.08264, simple_loss=0.1068, pruned_loss=0.01993, audio_tagging_loss=0.009277, over 15493.00 frames. ], tot_loss[loss=0.0793, simple_loss=0.09981, pruned_loss=0.01906, audio_tagging_loss=0.01033, over 3010848.32 frames. ], batch size: 58, lr: 4.66e-03, grad_scale: 32.0 2023-11-20 14:46:10,022 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=1127933.3333333333, ans=0.125 2023-11-20 14:46:21,158 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 169200 2023-11-20 14:46:21,276 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1127933.3333333333, ans=0.1 2023-11-20 14:46:31,587 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=1128000.0, ans=0.125 2023-11-20 14:46:35,321 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=1128066.6666666667, ans=0.0 2023-11-20 14:46:39,142 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=1128066.6666666667, ans=0.125 2023-11-20 14:46:49,726 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=1128133.3333333333, ans=0.0 2023-11-20 14:47:02,632 INFO [train_asr.py:1262] (1/4) Epoch 15, batch 900, loss[loss=0.05976, simple_loss=0.07653, pruned_loss=0.0113, audio_tagging_loss=0.01019, over 15738.00 frames. ], tot_loss[loss=0.08006, simple_loss=0.1012, pruned_loss=0.0192, audio_tagging_loss=0.01027, over 3025095.06 frames. ], batch size: 59, lr: 4.66e-03, grad_scale: 32.0 2023-11-20 14:47:11,334 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.656e+01 8.130e+01 8.827e+01 9.752e+01 1.444e+02, threshold=1.765e+02, percent-clipped=0.0 2023-11-20 14:47:26,426 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 169250 2023-11-20 14:47:43,107 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=1128400.0, ans=0.0 2023-11-20 14:48:07,413 INFO [train_asr.py:1262] (1/4) Epoch 15, batch 950, loss[loss=0.0756, simple_loss=0.09229, pruned_loss=0.01931, audio_tagging_loss=0.01014, over 14325.00 frames. 
], tot_loss[loss=0.07974, simple_loss=0.1007, pruned_loss=0.01916, audio_tagging_loss=0.01024, over 3030944.96 frames. ], batch size: 53, lr: 4.66e-03, grad_scale: 32.0 2023-11-20 14:48:10,144 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=1128533.3333333333, ans=0.125 2023-11-20 14:48:10,228 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=1128533.3333333333, ans=0.1 2023-11-20 14:48:29,236 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=1128600.0, ans=0.125 2023-11-20 14:48:30,271 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 169300 2023-11-20 14:48:37,594 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=8.57 vs. limit=15.0 2023-11-20 14:48:41,366 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=9.51 vs. limit=10.0 2023-11-20 14:48:48,266 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=1128733.3333333333, ans=0.125 2023-11-20 14:48:53,874 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=14.03 vs. limit=22.5 2023-11-20 14:49:07,166 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=1128800.0, ans=0.125 2023-11-20 14:49:11,803 INFO [train_asr.py:1262] (1/4) Epoch 15, batch 1000, loss[loss=0.09352, simple_loss=0.122, pruned_loss=0.02737, audio_tagging_loss=0.005138, over 15018.00 frames. ], tot_loss[loss=0.07876, simple_loss=0.09973, pruned_loss=0.01887, audio_tagging_loss=0.01003, over 3034349.78 frames. ], batch size: 56, lr: 4.66e-03, grad_scale: 32.0 2023-11-20 14:49:19,999 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.688e+01 8.251e+01 8.894e+01 9.437e+01 1.345e+02, threshold=1.779e+02, percent-clipped=0.0 2023-11-20 14:49:20,384 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=1128866.6666666667, ans=0.0 2023-11-20 14:49:35,942 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 169350 2023-11-20 14:49:40,304 WARNING [train_asr.py:1506] (1/4) Exclude cut with ID unbalanced/5Y6u9AlD9S0_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-20 14:49:47,266 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.08 vs. 
limit=15.0 2023-11-20 14:49:48,070 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=1129000.0, ans=0.0 2023-11-20 14:49:57,738 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=1129066.6666666667, ans=0.125 2023-11-20 14:50:17,334 INFO [train_asr.py:1262] (1/4) Epoch 15, batch 1050, loss[loss=0.08876, simple_loss=0.1186, pruned_loss=0.02334, audio_tagging_loss=0.006144, over 15109.00 frames. ], tot_loss[loss=0.07904, simple_loss=0.1002, pruned_loss=0.01914, audio_tagging_loss=0.009819, over 3039173.50 frames. ], batch size: 56, lr: 4.66e-03, grad_scale: 16.0 2023-11-20 14:50:40,830 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 169400 2023-11-20 14:50:54,285 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=1129333.3333333333, ans=0.0 2023-11-20 14:50:56,020 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.min_positive, batch_count=1129400.0, ans=0.05 2023-11-20 14:50:59,817 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=1129400.0, ans=0.125 2023-11-20 14:51:13,626 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=1129466.6666666667, ans=0.2 2023-11-20 14:51:20,194 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=11.17 vs. limit=15.0 2023-11-20 14:51:23,877 INFO [train_asr.py:1262] (1/4) Epoch 15, batch 1100, loss[loss=0.05258, simple_loss=0.06522, pruned_loss=0.00884, audio_tagging_loss=0.01113, over 13970.00 frames. ], tot_loss[loss=0.07824, simple_loss=0.09922, pruned_loss=0.01885, audio_tagging_loss=0.009788, over 3039480.85 frames. ], batch size: 54, lr: 4.66e-03, grad_scale: 16.0 2023-11-20 14:51:25,452 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=1129533.3333333333, ans=0.2 2023-11-20 14:51:26,396 WARNING [train_asr.py:1506] (1/4) Exclude cut with ID unbalanced/AWHnJAqurec_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. 
Number of tokens: 24 2023-11-20 14:51:29,233 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=1129533.3333333333, ans=0.125 2023-11-20 14:51:32,467 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.387e+01 8.402e+01 8.962e+01 9.739e+01 1.697e+02, threshold=1.792e+02, percent-clipped=0.0 2023-11-20 14:51:47,045 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 169450 2023-11-20 14:51:49,532 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1129666.6666666667, ans=0.1 2023-11-20 14:51:59,325 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=1129666.6666666667, ans=0.035 2023-11-20 14:52:27,120 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=1129800.0, ans=0.1 2023-11-20 14:52:29,243 INFO [train_asr.py:1262] (1/4) Epoch 15, batch 1150, loss[loss=0.08871, simple_loss=0.1207, pruned_loss=0.02006, audio_tagging_loss=0.008283, over 15105.00 frames. ], tot_loss[loss=0.0785, simple_loss=0.09964, pruned_loss=0.01885, audio_tagging_loss=0.009821, over 3039845.61 frames. ], batch size: 54, lr: 4.66e-03, grad_scale: 16.0 2023-11-20 14:52:35,844 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=1129866.6666666667, ans=0.0 2023-11-20 14:52:43,049 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=4.28 vs. limit=12.0 2023-11-20 14:52:49,165 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=1129933.3333333333, ans=0.125 2023-11-20 14:52:51,893 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=1129933.3333333333, ans=0.0 2023-11-20 14:52:53,503 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 169500 2023-11-20 14:52:58,386 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=1130000.0, ans=0.125 2023-11-20 14:53:02,391 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=11.73 vs. limit=15.0 2023-11-20 14:53:35,176 INFO [train_asr.py:1262] (1/4) Epoch 15, batch 1200, loss[loss=0.06894, simple_loss=0.08745, pruned_loss=0.01522, audio_tagging_loss=0.009999, over 15441.00 frames. ], tot_loss[loss=0.07881, simple_loss=0.1004, pruned_loss=0.01893, audio_tagging_loss=0.009704, over 3043701.68 frames. 
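The WARNING records above drop 1-second AudioSet cuts carrying the dummy transcript: after subsampling, the 100 input frames shrink to 23, fewer than the 24 BPE tokens, so no transducer alignment exists. A sketch of such a filter (the subsampling formula here is illustrative, chosen only to reproduce 100 -> 23):

def keep_cut(num_frames: int, num_tokens: int) -> bool:
    # Illustrative subsampling-factor-4 formula reproducing 100 -> 23.
    frames_after = (num_frames - 7) // 4
    # A transducer needs at least one frame per output token.
    return frames_after >= num_tokens

print(keep_cut(100, 24))  # False: matches the excluded cuts above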
], batch size: 60, lr: 4.66e-03, grad_scale: 32.0 2023-11-20 14:53:41,450 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=1130200.0, ans=0.125 2023-11-20 14:53:44,423 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.469e+01 8.178e+01 8.897e+01 9.679e+01 1.493e+02, threshold=1.779e+02, percent-clipped=0.0 2023-11-20 14:53:55,964 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=1130266.6666666667, ans=0.0 2023-11-20 14:53:57,859 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.min_positive, batch_count=1130266.6666666667, ans=0.05 2023-11-20 14:53:58,868 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 169550 2023-11-20 14:54:01,479 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=1130333.3333333333, ans=0.0 2023-11-20 14:54:40,109 INFO [train_asr.py:1262] (1/4) Epoch 15, batch 1250, loss[loss=0.07571, simple_loss=0.09887, pruned_loss=0.01774, audio_tagging_loss=0.008537, over 14542.00 frames. ], tot_loss[loss=0.0787, simple_loss=0.1002, pruned_loss=0.01889, audio_tagging_loss=0.009714, over 3038978.46 frames. ], batch size: 54, lr: 4.66e-03, grad_scale: 32.0 2023-11-20 14:54:45,415 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=1130533.3333333333, ans=0.2 2023-11-20 14:54:55,978 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=1130600.0, ans=0.2 2023-11-20 14:55:03,161 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 169600 2023-11-20 14:55:18,091 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=1130733.3333333333, ans=0.125 2023-11-20 14:55:22,374 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=1130733.3333333333, ans=0.0 2023-11-20 14:55:24,749 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=1130733.3333333333, ans=0.0 2023-11-20 14:55:32,091 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-20 14:55:44,788 INFO [train_asr.py:1262] (1/4) Epoch 15, batch 1300, loss[loss=0.07191, simple_loss=0.09555, pruned_loss=0.01416, audio_tagging_loss=0.009978, over 16123.00 frames. ], tot_loss[loss=0.07834, simple_loss=0.09979, pruned_loss=0.0187, audio_tagging_loss=0.009737, over 3040413.93 frames. ], batch size: 60, lr: 4.66e-03, grad_scale: 32.0 2023-11-20 14:55:53,513 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.559e+01 8.274e+01 8.667e+01 1.016e+02 1.258e+02, threshold=1.733e+02, percent-clipped=0.0 2023-11-20 14:55:59,601 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=7.53 vs. 
limit=10.0 2023-11-20 14:56:03,409 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1130933.3333333333, ans=0.125 2023-11-20 14:56:04,853 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=1130933.3333333333, ans=0.0 2023-11-20 14:56:08,386 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 169650 2023-11-20 14:56:09,894 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=1131000.0, ans=0.0 2023-11-20 14:56:23,208 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=12.26 vs. limit=15.0 2023-11-20 14:56:26,526 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=1131066.6666666667, ans=0.0 2023-11-20 14:56:49,816 INFO [train_asr.py:1262] (1/4) Epoch 15, batch 1350, loss[loss=0.07756, simple_loss=0.09634, pruned_loss=0.01894, audio_tagging_loss=0.01044, over 13911.00 frames. ], tot_loss[loss=0.07836, simple_loss=0.09981, pruned_loss=0.01873, audio_tagging_loss=0.009728, over 3043714.47 frames. ], batch size: 54, lr: 4.66e-03, grad_scale: 32.0 2023-11-20 14:56:56,699 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=1131200.0, ans=0.0 2023-11-20 14:57:13,583 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 169700 2023-11-20 14:57:15,092 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1131333.3333333333, ans=0.1 2023-11-20 14:57:20,973 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.out_whiten.whitening_limit, batch_count=1131333.3333333333, ans=15.0 2023-11-20 14:57:31,632 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=1131400.0, ans=0.0 2023-11-20 14:57:36,944 WARNING [train_asr.py:1506] (1/4) Exclude cut with ID unbalanced/XdmbboqRBmQ_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-20 14:57:48,812 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=1131466.6666666667, ans=0.2 2023-11-20 14:57:55,473 INFO [train_asr.py:1262] (1/4) Epoch 15, batch 1400, loss[loss=0.07788, simple_loss=0.09613, pruned_loss=0.01946, audio_tagging_loss=0.01036, over 15785.00 frames. ], tot_loss[loss=0.07835, simple_loss=0.09947, pruned_loss=0.01878, audio_tagging_loss=0.009836, over 3040442.20 frames. ], batch size: 61, lr: 4.66e-03, grad_scale: 32.0 2023-11-20 14:58:04,236 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.814e+01 7.998e+01 8.583e+01 9.280e+01 1.349e+02, threshold=1.717e+02, percent-clipped=0.0 2023-11-20 14:58:16,310 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=10.00 vs. 
limit=15.0 2023-11-20 14:58:19,192 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 169750 2023-11-20 14:58:32,811 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=1131666.6666666667, ans=0.125 2023-11-20 14:59:00,605 INFO [train_asr.py:1262] (1/4) Epoch 15, batch 1450, loss[loss=0.07444, simple_loss=0.1023, pruned_loss=0.0142, audio_tagging_loss=0.009101, over 14984.00 frames. ], tot_loss[loss=0.07877, simple_loss=0.09977, pruned_loss=0.01898, audio_tagging_loss=0.009911, over 3041417.40 frames. ], batch size: 56, lr: 4.66e-03, grad_scale: 32.0 2023-11-20 14:59:24,571 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 169800 2023-11-20 14:59:37,963 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1132000.0, ans=0.125 2023-11-20 14:59:49,871 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.conv_module2.whiten, num_groups=1, num_channels=192, metric=12.38 vs. limit=15.0 2023-11-20 15:00:06,420 INFO [train_asr.py:1262] (1/4) Epoch 15, batch 1500, loss[loss=0.08004, simple_loss=0.11, pruned_loss=0.01662, audio_tagging_loss=0.008449, over 14695.00 frames. ], tot_loss[loss=0.07963, simple_loss=0.1009, pruned_loss=0.01931, audio_tagging_loss=0.009871, over 3046133.39 frames. ], batch size: 54, lr: 4.65e-03, grad_scale: 16.0 2023-11-20 15:00:17,043 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.235e+01 8.270e+01 9.018e+01 9.743e+01 1.216e+02, threshold=1.804e+02, percent-clipped=0.0 2023-11-20 15:00:29,953 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 169850 2023-11-20 15:00:31,420 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=1132333.3333333333, ans=0.07 2023-11-20 15:00:46,802 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=1132400.0, ans=0.125 2023-11-20 15:00:55,958 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=1132400.0, ans=0.125 2023-11-20 15:01:11,543 INFO [train_asr.py:1262] (1/4) Epoch 15, batch 1550, loss[loss=0.09813, simple_loss=0.1223, pruned_loss=0.02613, audio_tagging_loss=0.01083, over 15498.00 frames. ], tot_loss[loss=0.07985, simple_loss=0.101, pruned_loss=0.01936, audio_tagging_loss=0.009966, over 3048937.50 frames. ], batch size: 57, lr: 4.65e-03, grad_scale: 16.0 2023-11-20 15:01:19,041 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=1132533.3333333333, ans=0.0 2023-11-20 15:01:34,432 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 169900 2023-11-20 15:01:35,849 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-20 15:01:54,810 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=1132733.3333333333, ans=0.125 2023-11-20 15:02:14,793 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=1132866.6666666667, ans=0.125 2023-11-20 15:02:15,847 INFO [train_asr.py:1262] (1/4) Epoch 15, batch 1600, loss[loss=0.06401, simple_loss=0.08069, pruned_loss=0.01172, audio_tagging_loss=0.01194, over 15551.00 frames. 
], tot_loss[loss=0.0797, simple_loss=0.1009, pruned_loss=0.01924, audio_tagging_loss=0.01003, over 3045253.72 frames. ], batch size: 57, lr: 4.65e-03, grad_scale: 32.0 2023-11-20 15:02:16,110 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=1132866.6666666667, ans=0.2 2023-11-20 15:02:16,642 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=9.73 vs. limit=15.0 2023-11-20 15:02:17,966 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=6.08 vs. limit=10.0 2023-11-20 15:02:18,566 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=1132866.6666666667, ans=0.0 2023-11-20 15:02:26,262 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.675e+01 8.383e+01 8.914e+01 9.693e+01 2.648e+02, threshold=1.783e+02, percent-clipped=1.0 2023-11-20 15:02:38,873 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=1132933.3333333333, ans=0.0 2023-11-20 15:02:39,772 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 169950 2023-11-20 15:02:41,826 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=6.34 vs. limit=10.0 2023-11-20 15:03:20,764 INFO [train_asr.py:1262] (1/4) Epoch 15, batch 1650, loss[loss=0.09712, simple_loss=0.1199, pruned_loss=0.02776, audio_tagging_loss=0.009425, over 15495.00 frames. ], tot_loss[loss=0.07987, simple_loss=0.1011, pruned_loss=0.01924, audio_tagging_loss=0.01006, over 3053442.92 frames. ], batch size: 57, lr: 4.65e-03, grad_scale: 32.0 2023-11-20 15:03:31,566 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=1133200.0, ans=0.125 2023-11-20 15:03:42,376 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=1133266.6666666667, ans=0.2 2023-11-20 15:03:44,533 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 170000 2023-11-20 15:04:11,301 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=1133400.0, ans=0.04949747468305833 2023-11-20 15:04:19,179 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=1133466.6666666667, ans=0.0 2023-11-20 15:04:26,880 INFO [train_asr.py:1262] (1/4) Epoch 15, batch 1700, loss[loss=0.1135, simple_loss=0.1471, pruned_loss=0.03273, audio_tagging_loss=0.007219, over 14583.00 frames. ], tot_loss[loss=0.08014, simple_loss=0.1017, pruned_loss=0.01923, audio_tagging_loss=0.01009, over 3053230.79 frames. ], batch size: 54, lr: 4.65e-03, grad_scale: 32.0 2023-11-20 15:04:30,034 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=19.74 vs. 
limit=22.5 2023-11-20 15:04:36,551 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.057e+01 7.994e+01 8.673e+01 9.340e+01 1.265e+02, threshold=1.735e+02, percent-clipped=0.0 2023-11-20 15:04:48,877 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 170050 2023-11-20 15:05:17,724 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=1133800.0, ans=0.125 2023-11-20 15:05:25,167 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=1133800.0, ans=0.0 2023-11-20 15:05:30,939 INFO [train_asr.py:1262] (1/4) Epoch 15, batch 1750, loss[loss=0.07181, simple_loss=0.08743, pruned_loss=0.01859, audio_tagging_loss=0.009503, over 14915.00 frames. ], tot_loss[loss=0.07942, simple_loss=0.101, pruned_loss=0.01899, audio_tagging_loss=0.009939, over 3044110.60 frames. ], batch size: 58, lr: 4.65e-03, grad_scale: 32.0 2023-11-20 15:05:32,556 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=1133866.6666666667, ans=0.125 2023-11-20 15:05:38,565 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=1133866.6666666667, ans=0.125 2023-11-20 15:05:40,918 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=1133866.6666666667, ans=0.2 2023-11-20 15:05:54,409 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 170100 2023-11-20 15:05:59,224 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=11.62 vs. limit=15.0 2023-11-20 15:06:05,129 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=1134000.0, ans=0.125 2023-11-20 15:06:19,376 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=1134066.6666666667, ans=0.125 2023-11-20 15:06:25,363 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=1134133.3333333333, ans=0.125 2023-11-20 15:06:34,257 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.27 vs. limit=15.0 2023-11-20 15:06:34,900 INFO [train_asr.py:1262] (1/4) Epoch 15, batch 1800, loss[loss=0.07568, simple_loss=0.1006, pruned_loss=0.01849, audio_tagging_loss=0.006882, over 14632.00 frames. ], tot_loss[loss=0.07929, simple_loss=0.1011, pruned_loss=0.01885, audio_tagging_loss=0.009889, over 3043152.27 frames. ], batch size: 55, lr: 4.65e-03, grad_scale: 32.0 2023-11-20 15:06:35,445 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=9.09 vs. limit=15.0 2023-11-20 15:06:46,182 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 5.918e+01 8.061e+01 8.642e+01 9.411e+01 1.208e+02, threshold=1.728e+02, percent-clipped=0.0 2023-11-20 15:06:48,310 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=10.59 vs. 
limit=15.0 2023-11-20 15:06:57,444 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=9.28 vs. limit=15.0 2023-11-20 15:06:59,351 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 170150 2023-11-20 15:07:00,753 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=1134333.3333333333, ans=0.125 2023-11-20 15:07:01,811 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=1134333.3333333333, ans=0.0 2023-11-20 15:07:09,760 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=6.89 vs. limit=12.0 2023-11-20 15:07:15,360 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=1134400.0, ans=0.125 2023-11-20 15:07:19,640 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=8.91 vs. limit=15.0 2023-11-20 15:07:30,385 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=1134466.6666666667, ans=0.125 2023-11-20 15:07:39,621 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1134533.3333333333, ans=0.1 2023-11-20 15:07:40,570 INFO [train_asr.py:1262] (1/4) Epoch 15, batch 1850, loss[loss=0.07104, simple_loss=0.09468, pruned_loss=0.01322, audio_tagging_loss=0.01047, over 16119.00 frames. ], tot_loss[loss=0.0783, simple_loss=0.09956, pruned_loss=0.01872, audio_tagging_loss=0.0098, over 3040004.34 frames. ], batch size: 62, lr: 4.65e-03, grad_scale: 16.0 2023-11-20 15:07:42,145 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=1134533.3333333333, ans=0.125 2023-11-20 15:07:58,832 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.max_positive, batch_count=1134600.0, ans=0.95 2023-11-20 15:07:58,940 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=1134600.0, ans=0.125 2023-11-20 15:08:03,589 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 170200 2023-11-20 15:08:10,263 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.min_positive, batch_count=1134666.6666666667, ans=0.025 2023-11-20 15:08:14,983 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=1134666.6666666667, ans=0.125 2023-11-20 15:08:44,765 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=10.15 vs. limit=15.0 2023-11-20 15:08:44,969 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=6.63 vs. limit=15.0 2023-11-20 15:08:45,389 INFO [train_asr.py:1262] (1/4) Epoch 15, batch 1900, loss[loss=0.08035, simple_loss=0.112, pruned_loss=0.01558, audio_tagging_loss=0.008753, over 16360.00 frames. 
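The Whitening records compare a per-module statistic against a limit (metric=N vs. limit=M); presumably a whitening penalty is applied only when the metric drifts above the limit. A rough, assumed version of such a metric, scale-invariant, equal to 1.0 for perfectly decorrelated channels and growing toward num_channels as they become correlated (not the exact scaling.py formula):

import torch

def whitening_metric(x: torch.Tensor) -> torch.Tensor:
    # x: (num_frames, num_channels), assumed zero-mean.
    cov = x.T @ x / x.shape[0]
    c = cov.shape[0]
    return c * (cov ** 2).sum() / (torch.diagonal(cov).sum() ** 2)

x = torch.randn(20_000, 384)        # white input ...
print(whitening_metric(x).item())   # ... gives a metric close to 1.0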
], tot_loss[loss=0.07882, simple_loss=0.1005, pruned_loss=0.01886, audio_tagging_loss=0.009702, over 3043244.21 frames. ], batch size: 61, lr: 4.65e-03, grad_scale: 16.0 2023-11-20 15:08:56,327 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.854e+01 8.016e+01 8.806e+01 9.698e+01 1.880e+02, threshold=1.761e+02, percent-clipped=1.0 2023-11-20 15:08:59,469 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=512, metric=8.14 vs. limit=15.0 2023-11-20 15:09:02,750 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=1134933.3333333333, ans=0.0 2023-11-20 15:09:08,234 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 170250 2023-11-20 15:09:42,443 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=1135133.3333333333, ans=0.125 2023-11-20 15:09:48,281 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.min_positive, batch_count=1135200.0, ans=0.05 2023-11-20 15:09:49,304 INFO [train_asr.py:1262] (1/4) Epoch 15, batch 1950, loss[loss=0.07011, simple_loss=0.09071, pruned_loss=0.01479, audio_tagging_loss=0.009965, over 15000.00 frames. ], tot_loss[loss=0.07862, simple_loss=0.1002, pruned_loss=0.01877, audio_tagging_loss=0.009726, over 3043571.47 frames. ], batch size: 58, lr: 4.65e-03, grad_scale: 16.0 2023-11-20 15:10:04,049 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=1135266.6666666667, ans=0.015 2023-11-20 15:10:13,356 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 170300 2023-11-20 15:10:51,876 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=18.57 vs. limit=22.5 2023-11-20 15:10:53,018 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=4.80 vs. limit=12.0 2023-11-20 15:10:53,726 INFO [train_asr.py:1262] (1/4) Epoch 15, batch 2000, loss[loss=0.07037, simple_loss=0.08557, pruned_loss=0.0155, audio_tagging_loss=0.01208, over 13589.00 frames. ], tot_loss[loss=0.0787, simple_loss=0.09995, pruned_loss=0.01894, audio_tagging_loss=0.009786, over 3032147.47 frames. ], batch size: 52, lr: 4.65e-03, grad_scale: 32.0 2023-11-20 15:11:05,446 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.656e+01 7.869e+01 8.530e+01 9.315e+01 1.202e+02, threshold=1.706e+02, percent-clipped=0.0 2023-11-20 15:11:16,597 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 170350 2023-11-20 15:11:34,489 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=1135733.3333333333, ans=0.0 2023-11-20 15:11:39,303 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=1135733.3333333333, ans=0.125 2023-11-20 15:11:41,134 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=1135733.3333333333, ans=0.125 2023-11-20 15:11:58,365 INFO [train_asr.py:1262] (1/4) Epoch 15, batch 2050, loss[loss=0.0716, simple_loss=0.09025, pruned_loss=0.01772, audio_tagging_loss=0.008758, over 15265.00 frames. 
2023-11-20 15:12:21,233 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 170400
2023-11-20 15:12:39,319 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=1136066.6666666667, ans=0.0
2023-11-20 15:13:02,588 INFO [train_asr.py:1262] (1/4) Epoch 15, batch 2100, loss[loss=0.06945, simple_loss=0.08556, pruned_loss=0.01614, audio_tagging_loss=0.01053, over 14933.00 frames. ], tot_loss[loss=0.07825, simple_loss=0.09917, pruned_loss=0.01887, audio_tagging_loss=0.00979, over 3039149.44 frames. ], batch size: 57, lr: 4.65e-03, grad_scale: 32.0
2023-11-20 15:13:10,899 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=1136200.0, ans=0.1
2023-11-20 15:13:14,262 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.146e+01 8.226e+01 8.979e+01 1.003e+02 1.386e+02, threshold=1.796e+02, percent-clipped=0.0
2023-11-20 15:13:26,668 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 170450
2023-11-20 15:13:42,504 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=2.61 vs. limit=15.0
2023-11-20 15:13:45,800 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=1136400.0, ans=0.125
2023-11-20 15:14:02,551 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=1136466.6666666667, ans=0.125
2023-11-20 15:14:07,086 INFO [train_asr.py:1262] (1/4) Epoch 15, batch 2150, loss[loss=0.07964, simple_loss=0.09336, pruned_loss=0.02273, audio_tagging_loss=0.01022, over 15308.00 frames. ], tot_loss[loss=0.0781, simple_loss=0.09903, pruned_loss=0.01881, audio_tagging_loss=0.009776, over 3043629.14 frames. ], batch size: 59, lr: 4.65e-03, grad_scale: 32.0
2023-11-20 15:14:10,533 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=1136533.3333333333, ans=0.0
2023-11-20 15:14:30,409 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 170500
2023-11-20 15:14:44,110 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=1136733.3333333333, ans=0.0
2023-11-20 15:14:45,077 WARNING [train_asr.py:1506] (1/4) Exclude cut with ID unbalanced/XkQ8YVd8u38_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24
2023-11-20 15:14:52,017 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1136733.3333333333, ans=0.1
2023-11-20 15:15:12,345 INFO [train_asr.py:1262] (1/4) Epoch 15, batch 2200, loss[loss=0.09224, simple_loss=0.1151, pruned_loss=0.02553, audio_tagging_loss=0.009154, over 16219.00 frames. ], tot_loss[loss=0.07798, simple_loss=0.09873, pruned_loss=0.01881, audio_tagging_loss=0.00981, over 3048023.60 frames. ], batch size: 61, lr: 4.65e-03, grad_scale: 32.0
2023-11-20 15:15:14,041 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=1136866.6666666667, ans=0.125
2023-11-20 15:15:23,630 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.205e+01 8.421e+01 8.880e+01 9.731e+01 1.234e+02, threshold=1.776e+02, percent-clipped=0.0
2023-11-20 15:15:30,238 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=1136933.3333333333, ans=0.2
2023-11-20 15:15:33,908 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=1136933.3333333333, ans=0.125
2023-11-20 15:15:34,860 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 170550
2023-11-20 15:15:38,182 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1137000.0, ans=0.1
2023-11-20 15:15:41,727 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=1137000.0, ans=0.0
2023-11-20 15:16:00,032 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=13.09 vs. limit=15.0
2023-11-20 15:16:02,217 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1137066.6666666667, ans=0.125
2023-11-20 15:16:03,440 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2023-11-20 15:16:05,980 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=1137133.3333333333, ans=0.125
2023-11-20 15:16:16,549 INFO [train_asr.py:1262] (1/4) Epoch 15, batch 2250, loss[loss=0.0891, simple_loss=0.108, pruned_loss=0.02552, audio_tagging_loss=0.009575, over 15148.00 frames. ], tot_loss[loss=0.0779, simple_loss=0.09859, pruned_loss=0.01873, audio_tagging_loss=0.009876, over 3045785.03 frames. ], batch size: 56, lr: 4.64e-03, grad_scale: 32.0
2023-11-20 15:16:30,671 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=1137266.6666666667, ans=0.035
2023-11-20 15:16:31,928 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=1137266.6666666667, ans=0.0
2023-11-20 15:16:36,497 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=1137266.6666666667, ans=0.125
2023-11-20 15:16:39,901 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 170600
2023-11-20 15:17:08,860 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1137466.6666666667, ans=0.1
2023-11-20 15:17:10,433 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=16.85 vs. limit=22.5
2023-11-20 15:17:21,563 INFO [train_asr.py:1262] (1/4) Epoch 15, batch 2300, loss[loss=0.05837, simple_loss=0.06872, pruned_loss=0.009263, audio_tagging_loss=0.01475, over 14344.00 frames. ], tot_loss[loss=0.07794, simple_loss=0.09863, pruned_loss=0.01868, audio_tagging_loss=0.009938, over 3045462.53 frames. ], batch size: 55, lr: 4.64e-03, grad_scale: 32.0
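NOTE: in the optim.py:476 records, the five values after "grad-norm quartiles" read as the min/25%/median/75%/max of recent gradient norms, and the printed threshold consistently equals Clipping_scale times the median (2.0 * 8.880e+01 = 1.776e+02 in the record just above). A sketch of that bookkeeping under this reading; the class name and window size are illustrative, not icefall's actual optim.py API:

    import torch

    class GradNormClipper:
        """Track recent gradient norms; threshold = clipping_scale * running median."""
        def __init__(self, clipping_scale=2.0, window=128):
            self.clipping_scale = clipping_scale
            self.window = window
            self.norms = []

        def update(self, grad_norm: float) -> float:
            self.norms = (self.norms + [grad_norm])[-self.window:]
            q = torch.quantile(torch.tensor(self.norms),
                               torch.tensor([0.0, 0.25, 0.5, 0.75, 1.0]))
            return self.clipping_scale * q[2].item()  # q[2] is the median

    clipper = GradNormClipper()
    for norm in [72.0, 84.2, 88.8, 97.3, 123.4]:  # numbers like the quartiles above
        threshold = clipper.update(norm)
    print(threshold)  # about 2.0 * median; norms above it count toward "percent-clipped"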
2023-11-20 15:17:33,172 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.682e+01 8.112e+01 8.584e+01 9.317e+01 1.375e+02, threshold=1.717e+02, percent-clipped=0.0
2023-11-20 15:17:35,951 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=7.29 vs. limit=15.0
2023-11-20 15:17:38,281 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.33 vs. limit=15.0
2023-11-20 15:17:45,536 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 170650
2023-11-20 15:17:46,812 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.min_positive, batch_count=1137666.6666666667, ans=0.05
2023-11-20 15:18:06,774 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=1137733.3333333333, ans=0.125
2023-11-20 15:18:18,342 WARNING [train_asr.py:1506] (1/4) Exclude cut with ID unbalanced/mx9RcUz8sr0_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24
2023-11-20 15:18:19,769 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1137800.0, ans=0.1
2023-11-20 15:18:24,227 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.min_positive, batch_count=1137800.0, ans=0.025
2023-11-20 15:18:26,355 INFO [train_asr.py:1262] (1/4) Epoch 15, batch 2350, loss[loss=0.08297, simple_loss=0.09875, pruned_loss=0.0216, audio_tagging_loss=0.012, over 15124.00 frames. ], tot_loss[loss=0.07813, simple_loss=0.09907, pruned_loss=0.01867, audio_tagging_loss=0.009918, over 3040876.21 frames. ], batch size: 56, lr: 4.64e-03, grad_scale: 32.0
2023-11-20 15:18:48,988 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 170700
2023-11-20 15:18:51,608 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=1138000.0, ans=0.125
2023-11-20 15:18:56,545 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=10.38 vs. limit=15.0
2023-11-20 15:19:10,069 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=1138066.6666666667, ans=0.125
2023-11-20 15:19:19,216 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1138133.3333333333, ans=0.0
2023-11-20 15:19:26,118 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=1138133.3333333333, ans=0.2
2023-11-20 15:19:27,405 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1138133.3333333333, ans=0.1
2023-11-20 15:19:30,682 INFO [train_asr.py:1262] (1/4) Epoch 15, batch 2400, loss[loss=0.08209, simple_loss=0.1028, pruned_loss=0.02063, audio_tagging_loss=0.01006, over 14785.00 frames. ], tot_loss[loss=0.07758, simple_loss=0.09843, pruned_loss=0.01847, audio_tagging_loss=0.009895, over 3040525.67 frames. ], batch size: 57, lr: 4.64e-03, grad_scale: 32.0
2023-11-20 15:19:32,228 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1138200.0, ans=0.1
2023-11-20 15:19:33,690 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=9.85 vs. limit=15.0
2023-11-20 15:19:42,894 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.480e+01 8.080e+01 8.821e+01 9.568e+01 1.388e+02, threshold=1.764e+02, percent-clipped=0.0
2023-11-20 15:19:46,427 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=1138266.6666666667, ans=0.0
2023-11-20 15:19:54,148 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 170750
2023-11-20 15:19:56,795 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=1138333.3333333333, ans=0.125
2023-11-20 15:19:58,069 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1138333.3333333333, ans=0.1
2023-11-20 15:20:10,818 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=1138400.0, ans=0.0
2023-11-20 15:20:20,148 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=1138400.0, ans=0.125
2023-11-20 15:20:34,523 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1138533.3333333333, ans=0.1
2023-11-20 15:20:35,610 INFO [train_asr.py:1262] (1/4) Epoch 15, batch 2450, loss[loss=0.08675, simple_loss=0.1083, pruned_loss=0.01943, audio_tagging_loss=0.01316, over 15960.00 frames. ], tot_loss[loss=0.07737, simple_loss=0.09796, pruned_loss=0.01837, audio_tagging_loss=0.01002, over 3046503.86 frames. ], batch size: 61, lr: 4.64e-03, grad_scale: 16.0
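NOTE: the scaling.py:213 "ScheduledFloat" records print a module parameter ("ans") as a function of batch_count; by batch_count around 1.14e6 the skip rates above have decayed to 0.0 while balancer probabilities sit at 0.125. These schedules appear to be piecewise-linear in batch count; a small illustrative re-implementation under that assumption (the breakpoints below are made up, and this is not icefall's actual ScheduledFloat class):

    class PiecewiseLinearFloat:
        """A float that interpolates linearly between (batch_count, value) breakpoints."""
        def __init__(self, *points):
            self.points = sorted(points)
            self.batch_count = 0.0

        def __float__(self):
            pts = self.points
            if self.batch_count <= pts[0][0]:
                return float(pts[0][1])
            if self.batch_count >= pts[-1][0]:
                return float(pts[-1][1])
            for (x0, y0), (x1, y1) in zip(pts, pts[1:]):
                if self.batch_count <= x1:
                    w = (self.batch_count - x0) / (x1 - x0)
                    return float(y0 + w * (y1 - y0))

    skip_rate = PiecewiseLinearFloat((0.0, 0.2), (4000.0, 0.05), (16000.0, 0.0))
    skip_rate.batch_count = 1138133.0  # far past the last breakpoint
    print(float(skip_rate))            # 0.0, matching the "ans=0.0" skip-rate records above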
2023-11-20 15:20:35,891 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=1138533.3333333333, ans=0.125
2023-11-20 15:20:46,321 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=1138533.3333333333, ans=0.0
2023-11-20 15:20:59,029 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 170800
2023-11-20 15:21:40,190 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=4.52 vs. limit=15.0
2023-11-20 15:21:41,384 INFO [train_asr.py:1262] (1/4) Epoch 15, batch 2500, loss[loss=0.06828, simple_loss=0.08614, pruned_loss=0.01479, audio_tagging_loss=0.01041, over 14971.00 frames. ], tot_loss[loss=0.07823, simple_loss=0.09922, pruned_loss=0.01865, audio_tagging_loss=0.009964, over 3038097.23 frames. ], batch size: 56, lr: 4.64e-03, grad_scale: 16.0
2023-11-20 15:21:45,692 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=21.87 vs. limit=22.5
2023-11-20 15:21:54,766 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.766e+01 8.090e+01 8.633e+01 9.562e+01 1.495e+02, threshold=1.727e+02, percent-clipped=0.0
2023-11-20 15:21:57,019 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=1138933.3333333333, ans=0.07
2023-11-20 15:22:04,112 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 170850
2023-11-20 15:22:10,334 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=1139000.0, ans=0.2
2023-11-20 15:22:11,578 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.max_abs, batch_count=1139000.0, ans=10.0
2023-11-20 15:22:22,679 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=1139066.6666666667, ans=0.0
2023-11-20 15:22:25,515 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=20.49 vs. limit=22.5
2023-11-20 15:22:35,709 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=1139133.3333333333, ans=0.125
2023-11-20 15:22:38,003 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1139133.3333333333, ans=0.1
2023-11-20 15:22:45,287 INFO [train_asr.py:1262] (1/4) Epoch 15, batch 2550, loss[loss=0.09204, simple_loss=0.1043, pruned_loss=0.03131, audio_tagging_loss=0.008594, over 13750.00 frames. ], tot_loss[loss=0.07861, simple_loss=0.09953, pruned_loss=0.01886, audio_tagging_loss=0.009992, over 3043900.68 frames. ], batch size: 53, lr: 4.64e-03, grad_scale: 16.0
2023-11-20 15:22:54,907 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=1139200.0, ans=0.125
2023-11-20 15:22:59,678 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=1139266.6666666667, ans=0.125
2023-11-20 15:23:08,681 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 170900
2023-11-20 15:23:11,920 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=1139333.3333333333, ans=0.125
2023-11-20 15:23:14,277 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1139333.3333333333, ans=0.1
2023-11-20 15:23:20,968 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=1139333.3333333333, ans=0.125
2023-11-20 15:23:22,792 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=13.50 vs. limit=15.0
2023-11-20 15:23:48,471 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=1139533.3333333333, ans=0.125
2023-11-20 15:23:50,116 INFO [train_asr.py:1262] (1/4) Epoch 15, batch 2600, loss[loss=0.06708, simple_loss=0.09029, pruned_loss=0.013, audio_tagging_loss=0.008934, over 14839.00 frames. ], tot_loss[loss=0.07743, simple_loss=0.09805, pruned_loss=0.01845, audio_tagging_loss=0.009952, over 3040587.49 frames. ], batch size: 56, lr: 4.64e-03, grad_scale: 16.0
2023-11-20 15:23:54,081 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1139533.3333333333, ans=0.1
2023-11-20 15:24:00,177 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=1139533.3333333333, ans=0.125
2023-11-20 15:24:00,299 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=1139533.3333333333, ans=0.0
2023-11-20 15:24:04,208 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.595e+01 8.100e+01 8.871e+01 9.785e+01 4.234e+02, threshold=1.774e+02, percent-clipped=0.0
2023-11-20 15:24:13,749 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 170950
2023-11-20 15:24:55,131 INFO [train_asr.py:1262] (1/4) Epoch 15, batch 2650, loss[loss=0.08928, simple_loss=0.1082, pruned_loss=0.02569, audio_tagging_loss=0.009502, over 14239.00 frames. ], tot_loss[loss=0.07815, simple_loss=0.09909, pruned_loss=0.01881, audio_tagging_loss=0.009799, over 3049468.17 frames. ], batch size: 55, lr: 4.64e-03, grad_scale: 16.0
2023-11-20 15:25:06,890 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=1139933.3333333333, ans=0.0
2023-11-20 15:25:10,734 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=6.98 vs. limit=10.0
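NOTE: the scaling.py:1022 "Whitening" records compare a per-module statistic against a limit (metric=6.98 vs. limit=10.0 just above); the metric reads as a measure of how far the channel covariance of a module's activations is from isotropic, equal to 1.0 for perfectly "white" features, with a penalty applied only once it exceeds the limit. A hedged sketch, assuming the metric is the eigenvalue ratio E[lambda^2]/E[lambda]^2 of the feature covariance; treat the exact formula as an assumption rather than scaling.py's code:

    import torch

    def whiteness_metric(x: torch.Tensor) -> float:
        """x: (num_frames, num_channels). E[lambda^2]/E[lambda]^2 of the covariance."""
        x = x - x.mean(dim=0, keepdim=True)
        cov = (x.T @ x) / x.shape[0]
        eigs = torch.linalg.eigvalsh(cov)  # eigenvalues of the channel covariance
        return float((eigs ** 2).mean() / eigs.mean() ** 2)

    x = torch.randn(2000, 256)    # roughly white activations
    print(whiteness_metric(x))    # near 1.0; the logged metrics stay below their limits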
2023-11-20 15:25:18,217 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 171000
2023-11-20 15:25:41,272 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=1140066.6666666667, ans=0.125
2023-11-20 15:25:50,463 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=1140133.3333333333, ans=0.2
2023-11-20 15:25:53,140 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.max_abs, batch_count=1140133.3333333333, ans=10.0
2023-11-20 15:25:56,827 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=1140133.3333333333, ans=0.0
2023-11-20 15:26:00,155 INFO [train_asr.py:1262] (1/4) Epoch 15, batch 2700, loss[loss=0.06505, simple_loss=0.08847, pruned_loss=0.00957, audio_tagging_loss=0.01124, over 16472.00 frames. ], tot_loss[loss=0.0778, simple_loss=0.09875, pruned_loss=0.01869, audio_tagging_loss=0.009732, over 3051680.79 frames. ], batch size: 62, lr: 4.64e-03, grad_scale: 16.0
2023-11-20 15:26:14,307 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.575e+01 8.137e+01 8.718e+01 9.635e+01 1.314e+02, threshold=1.744e+02, percent-clipped=1.0
2023-11-20 15:26:21,335 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=1140266.6666666667, ans=0.2
2023-11-20 15:26:23,564 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 171050
2023-11-20 15:26:29,845 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=10.36 vs. limit=15.0
2023-11-20 15:26:34,642 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=7.18 vs. limit=15.0
2023-11-20 15:27:04,570 INFO [train_asr.py:1262] (1/4) Epoch 15, batch 2750, loss[loss=0.07463, simple_loss=0.08437, pruned_loss=0.02011, audio_tagging_loss=0.01233, over 15543.00 frames. ], tot_loss[loss=0.0786, simple_loss=0.09937, pruned_loss=0.01912, audio_tagging_loss=0.009797, over 3044983.57 frames. ], batch size: 58, lr: 4.64e-03, grad_scale: 16.0
2023-11-20 15:27:16,567 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=1140600.0, ans=0.2
2023-11-20 15:27:16,685 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=1140600.0, ans=0.125
2023-11-20 15:27:27,473 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=1140600.0, ans=0.07
2023-11-20 15:27:28,515 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 171100
2023-11-20 15:27:52,409 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=1140733.3333333333, ans=0.2
2023-11-20 15:27:53,819 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=1140733.3333333333, ans=0.125
2023-11-20 15:28:00,066 WARNING [train_asr.py:1506] (1/4) Exclude cut with ID unbalanced/IMdT8_tuNp0_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24
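NOTE: the WARNING above drops a one-second AudioSet cut because its 100 input frames shrink to 23 after the roughly 4x subsampling front end, while its placeholder transcript tokenizes to 24 BPE tokens, and a transducer loss cannot align more output tokens than encoder frames. A sketch of that filter, assuming the icefall-style frame formula below (it reproduces the logged 100 -> 23, but treat it as an assumption):

    def frames_after_subsampling(num_frames: int) -> int:
        # ((100 - 7) // 2 + 1) // 2 == 23, matching the logged before/after pair.
        return ((num_frames - 7) // 2 + 1) // 2

    def keep_cut(num_frames: int, num_tokens: int) -> bool:
        """Keep a cut only if its post-subsampling length can cover its token sequence."""
        return frames_after_subsampling(num_frames) >= num_tokens

    print(keep_cut(100, 24))  # False -> excluded, exactly as in the WARNING above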
2023-11-20 15:28:05,252 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=1140800.0, ans=0.2
2023-11-20 15:28:09,480 INFO [train_asr.py:1262] (1/4) Epoch 15, batch 2800, loss[loss=0.08646, simple_loss=0.1103, pruned_loss=0.01979, audio_tagging_loss=0.01153, over 15305.00 frames. ], tot_loss[loss=0.07842, simple_loss=0.09911, pruned_loss=0.019, audio_tagging_loss=0.009869, over 3044681.87 frames. ], batch size: 59, lr: 4.64e-03, grad_scale: 32.0
2023-11-20 15:28:12,015 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=1140866.6666666667, ans=0.125
2023-11-20 15:28:23,471 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.091e+01 8.017e+01 8.655e+01 9.428e+01 1.274e+02, threshold=1.731e+02, percent-clipped=0.0
2023-11-20 15:28:24,262 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=8.45 vs. limit=12.0
2023-11-20 15:28:32,328 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 171150
2023-11-20 15:28:39,836 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=1141000.0, ans=0.0
2023-11-20 15:28:48,465 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=1141066.6666666667, ans=0.125
2023-11-20 15:28:57,045 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=6.63 vs. limit=15.0
2023-11-20 15:29:10,515 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=1141133.3333333333, ans=0.0
2023-11-20 15:29:12,723 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=1141200.0, ans=0.125
2023-11-20 15:29:13,862 INFO [train_asr.py:1262] (1/4) Epoch 15, batch 2850, loss[loss=0.08774, simple_loss=0.1171, pruned_loss=0.02017, audio_tagging_loss=0.008998, over 15556.00 frames. ], tot_loss[loss=0.07841, simple_loss=0.09917, pruned_loss=0.01896, audio_tagging_loss=0.009857, over 3039625.17 frames. ], batch size: 56, lr: 4.64e-03, grad_scale: 32.0
2023-11-20 15:29:21,435 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=1141200.0, ans=0.07
2023-11-20 15:29:37,377 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 171200
2023-11-20 15:29:49,878 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=8.07 vs. limit=15.0
2023-11-20 15:29:51,977 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=1141400.0, ans=0.125
2023-11-20 15:30:10,232 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=9.08 vs. limit=15.0
2023-11-20 15:30:12,469 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=10.74 vs. limit=15.0
2023-11-20 15:30:18,051 INFO [train_asr.py:1262] (1/4) Epoch 15, batch 2900, loss[loss=0.1109, simple_loss=0.1571, pruned_loss=0.02539, audio_tagging_loss=0.006909, over 16284.00 frames. ], tot_loss[loss=0.07844, simple_loss=0.09937, pruned_loss=0.01895, audio_tagging_loss=0.009801, over 3040975.35 frames. ], batch size: 55, lr: 4.64e-03, grad_scale: 32.0
2023-11-20 15:30:32,494 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.283e+01 7.914e+01 8.602e+01 9.219e+01 1.779e+02, threshold=1.720e+02, percent-clipped=1.0
2023-11-20 15:30:42,564 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 171250
2023-11-20 15:30:45,691 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=6.55 vs. limit=15.0
2023-11-20 15:30:54,333 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=8.54 vs. limit=15.0
2023-11-20 15:31:05,075 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=1141733.3333333333, ans=0.07
2023-11-20 15:31:19,701 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=1141800.0, ans=0.2
2023-11-20 15:31:22,594 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=6.94 vs. limit=12.0
2023-11-20 15:31:23,107 INFO [train_asr.py:1262] (1/4) Epoch 15, batch 2950, loss[loss=0.07336, simple_loss=0.08831, pruned_loss=0.01639, audio_tagging_loss=0.01282, over 15494.00 frames. ], tot_loss[loss=0.07879, simple_loss=0.09978, pruned_loss=0.01906, audio_tagging_loss=0.009836, over 3044179.49 frames. ], batch size: 60, lr: 4.64e-03, grad_scale: 32.0
2023-11-20 15:31:27,831 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=1141866.6666666667, ans=0.125
2023-11-20 15:31:29,357 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=8.75 vs. limit=15.0
2023-11-20 15:31:39,748 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=8.38 vs. limit=15.0
2023-11-20 15:31:46,780 INFO [model.py:792] (1/4) Freeze_encoder: False; Current batch idx: 171300
2023-11-20 15:32:04,703 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00
2023-11-20 15:32:13,368 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=1142066.6666666667, ans=0.0
2023-11-20 15:32:28,333 INFO [train_asr.py:1262] (1/4) Epoch 15, batch 3000, loss[loss=0.08221, simple_loss=0.1027, pruned_loss=0.02042, audio_tagging_loss=0.01045, over 15100.00 frames. ], tot_loss[loss=0.07857, simple_loss=0.09941, pruned_loss=0.01902, audio_tagging_loss=0.009838, over 3046537.87 frames. ], batch size: 56, lr: 4.63e-03, grad_scale: 32.0
2023-11-20 15:32:28,333 INFO [train_asr.py:1285] (1/4) Computing validation loss
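NOTE: the excerpt ends with validation starting exactly at batch 3000, consistent with a validation pass every 3000 batches (this run's valid_interval). A trivial sketch of that cadence, assumed rather than taken from train_asr.py:

    def should_validate(batch_idx: int, valid_interval: int = 3000) -> bool:
        """Run validation whenever the within-epoch batch index hits the interval."""
        return batch_idx > 0 and batch_idx % valid_interval == 0

    assert should_validate(3000)      # matches "batch 3000 ... Computing validation loss"
    assert not should_validate(2950)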