xezpeleta committed on
Commit 7cb6af5 · verified · 1 Parent(s): 874b57d

End of training

README.md CHANGED
@@ -3,20 +3,33 @@ library_name: transformers
  license: apache-2.0
  base_model: openai/whisper-base
  tags:
+ - whisper-event
  - generated_from_trainer
+ datasets:
+ - asierhv/composite_corpus_eu_v2.1
  metrics:
  - wer
  model-index:
- - name: openai/whisper-base
-   results: []
+ - name: Whisper Base Basque
+   results:
+   - task:
+       name: Automatic Speech Recognition
+       type: automatic-speech-recognition
+     dataset:
+       name: asierhv/composite_corpus_eu_v2.1
+       type: asierhv/composite_corpus_eu_v2.1
+     metrics:
+     - name: Wer
+       type: wer
+       value: 13.816958025614658
  ---

  <!-- This model card has been generated automatically according to the information the Trainer had access to. You
  should probably proofread and complete it, then remove this comment. -->

- # openai/whisper-base
+ # Whisper Base Basque

- This model is a fine-tuned version of [openai/whisper-base](https://huggingface.co/openai/whisper-base) on an unknown dataset.
+ This model is a fine-tuned version of [openai/whisper-base](https://huggingface.co/openai/whisper-base) on the asierhv/composite_corpus_eu_v2.1 dataset.
  It achieves the following results on the evaluation set:
  - Loss: 0.2452
  - Wer: 13.8170
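
The `Wer` value in the model card is a word error rate. As a minimal sketch of the usual definition (word-level edit distance divided by the number of reference words, scaled here to a percentage, which is how the card's value appears to be reported), the following may help when reading the number. The function name `word_error_rate` and the example sentences are illustrative assumptions; the evaluation script itself is not part of this commit and may normalize text differently.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Return WER as a percentage: 100 * word edit distance / reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] is the edit distance between the reference words seen so far
    # and the first j hypothesis words (classic Levenshtein, row by row).
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1      # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    return 100.0 * prev[-1] / len(ref)


# Hypothetical example: one substituted word out of four reference words.
print(word_error_rate("the cat sat down", "the cat sit down"))
```

Under this definition, a value of 13.8170 would mean roughly 14 word-level errors per 100 reference words on the evaluation set.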
all_results.json ADDED
@@ -0,0 +1,13 @@
+ {
+   "epoch": 1.0,
+   "eval_loss": 0.24521400034427643,
+   "eval_runtime": 74.5154,
+   "eval_samples_per_second": 28.236,
+   "eval_steps_per_second": 1.771,
+   "eval_wer": 13.816958025614658,
+   "total_flos": 1.660415901696e+19,
+   "train_loss": 0.22206098145246506,
+   "train_runtime": 4270.5513,
+   "train_samples_per_second": 59.945,
+   "train_steps_per_second": 1.873
+ }
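
The throughput figures in `all_results.json` can be cross-checked against each other. The sketch below uses only values from that file and multiplies runtime by throughput to recover approximate sample and step counts. The resulting per-step sample counts (about 32 for training, about 16 for evaluation) are derived estimates, not settings stated in this commit, and the helper name `summarize` is hypothetical.

```python
import json

# Values copied from all_results.json in this commit.
ALL_RESULTS = json.loads("""
{
  "epoch": 1.0,
  "eval_loss": 0.24521400034427643,
  "eval_runtime": 74.5154,
  "eval_samples_per_second": 28.236,
  "eval_steps_per_second": 1.771,
  "eval_wer": 13.816958025614658,
  "total_flos": 1.660415901696e+19,
  "train_loss": 0.22206098145246506,
  "train_runtime": 4270.5513,
  "train_samples_per_second": 59.945,
  "train_steps_per_second": 1.873
}
""")


def summarize(results: dict, prefix: str) -> dict:
    """Estimate sample and step counts from runtime and throughput."""
    runtime = results[f"{prefix}_runtime"]
    samples = runtime * results[f"{prefix}_samples_per_second"]
    steps = runtime * results[f"{prefix}_steps_per_second"]
    return {
        "samples": samples,
        "steps": steps,
        "samples_per_step": samples / steps,
    }


train = summarize(ALL_RESULTS, "train")
evaluation = summarize(ALL_RESULTS, "eval")

for name, stats in (("train", train), ("eval", evaluation)):
    print(
        f"{name}: ~{stats['samples']:.0f} samples, "
        f"~{stats['steps']:.0f} steps, "
        f"~{stats['samples_per_step']:.1f} samples/step"
    )
```

The estimated training step count comes out close to 8000, which agrees with the `global_step` recorded in `trainer_state.json`; small deviations are expected because the throughput values are rounded.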
eval_results.json ADDED
@@ -0,0 +1,8 @@
+ {
+   "epoch": 1.0,
+   "eval_loss": 0.24521400034427643,
+   "eval_runtime": 74.5154,
+   "eval_samples_per_second": 28.236,
+   "eval_steps_per_second": 1.771,
+   "eval_wer": 13.816958025614658
+ }
+ }
train_results.json ADDED
@@ -0,0 +1,8 @@
+ {
+   "epoch": 1.0,
+   "total_flos": 1.660415901696e+19,
+   "train_loss": 0.22206098145246506,
+   "train_runtime": 4270.5513,
+   "train_samples_per_second": 59.945,
+   "train_steps_per_second": 1.873
+ }
trainer_state.json ADDED
@@ -0,0 +1,2354 @@
1
+ {
2
+ "best_metric": 13.816958025614658,
3
+ "best_model_checkpoint": "./checkpoint-8000",
4
+ "epoch": 1.0,
5
+ "eval_steps": 1000,
6
+ "global_step": 8000,
7
+ "is_hyper_param_search": false,
8
+ "is_local_process_zero": true,
9
+ "is_world_process_zero": true,
10
+ "log_history": [
11
+ {
12
+ "epoch": 0.003125,
13
+ "grad_norm": 21.98771095275879,
14
+ "learning_rate": 8.8e-07,
15
+ "loss": 3.1382,
16
+ "step": 25
17
+ },
18
+ {
19
+ "epoch": 0.00625,
20
+ "grad_norm": 11.417325019836426,
21
+ "learning_rate": 1.8800000000000002e-06,
22
+ "loss": 2.4524,
23
+ "step": 50
24
+ },
25
+ {
26
+ "epoch": 0.009375,
27
+ "grad_norm": 9.939048767089844,
28
+ "learning_rate": 2.88e-06,
29
+ "loss": 1.6386,
30
+ "step": 75
31
+ },
32
+ {
33
+ "epoch": 0.0125,
34
+ "grad_norm": 8.167191505432129,
35
+ "learning_rate": 3.88e-06,
36
+ "loss": 1.26,
37
+ "step": 100
38
+ },
39
+ {
40
+ "epoch": 0.015625,
41
+ "grad_norm": 7.114121913909912,
42
+ "learning_rate": 4.880000000000001e-06,
43
+ "loss": 1.0564,
44
+ "step": 125
45
+ },
46
+ {
47
+ "epoch": 0.01875,
48
+ "grad_norm": 6.8056840896606445,
49
+ "learning_rate": 5.8800000000000005e-06,
50
+ "loss": 0.897,
51
+ "step": 150
52
+ },
53
+ {
54
+ "epoch": 0.021875,
55
+ "grad_norm": 6.363353252410889,
56
+ "learning_rate": 6.88e-06,
57
+ "loss": 0.8252,
58
+ "step": 175
59
+ },
60
+ {
61
+ "epoch": 0.025,
62
+ "grad_norm": 6.622057914733887,
63
+ "learning_rate": 7.88e-06,
64
+ "loss": 0.7693,
65
+ "step": 200
66
+ },
67
+ {
68
+ "epoch": 0.028125,
69
+ "grad_norm": 5.045984745025635,
70
+ "learning_rate": 8.880000000000001e-06,
71
+ "loss": 0.6621,
72
+ "step": 225
73
+ },
74
+ {
75
+ "epoch": 0.03125,
76
+ "grad_norm": 7.978261947631836,
77
+ "learning_rate": 9.88e-06,
78
+ "loss": 0.6861,
79
+ "step": 250
80
+ },
81
+ {
82
+ "epoch": 0.034375,
83
+ "grad_norm": 6.535711288452148,
84
+ "learning_rate": 1.0880000000000001e-05,
85
+ "loss": 0.6888,
86
+ "step": 275
87
+ },
88
+ {
89
+ "epoch": 0.0375,
90
+ "grad_norm": 6.781430721282959,
91
+ "learning_rate": 1.188e-05,
92
+ "loss": 0.6648,
93
+ "step": 300
94
+ },
95
+ {
96
+ "epoch": 0.040625,
97
+ "grad_norm": 5.826904773712158,
98
+ "learning_rate": 1.2880000000000002e-05,
99
+ "loss": 0.5983,
100
+ "step": 325
101
+ },
102
+ {
103
+ "epoch": 0.04375,
104
+ "grad_norm": 5.830564975738525,
105
+ "learning_rate": 1.3880000000000001e-05,
106
+ "loss": 0.5272,
107
+ "step": 350
108
+ },
109
+ {
110
+ "epoch": 0.046875,
111
+ "grad_norm": 5.638543128967285,
112
+ "learning_rate": 1.4880000000000002e-05,
113
+ "loss": 0.4479,
114
+ "step": 375
115
+ },
116
+ {
117
+ "epoch": 0.05,
118
+ "grad_norm": 4.451868057250977,
119
+ "learning_rate": 1.588e-05,
120
+ "loss": 0.4341,
121
+ "step": 400
122
+ },
123
+ {
124
+ "epoch": 0.053125,
125
+ "grad_norm": 4.475216865539551,
126
+ "learning_rate": 1.688e-05,
127
+ "loss": 0.3916,
128
+ "step": 425
129
+ },
130
+ {
131
+ "epoch": 0.05625,
132
+ "grad_norm": 4.683038711547852,
133
+ "learning_rate": 1.788e-05,
134
+ "loss": 0.3745,
135
+ "step": 450
136
+ },
137
+ {
138
+ "epoch": 0.059375,
139
+ "grad_norm": 4.93367862701416,
140
+ "learning_rate": 1.8880000000000002e-05,
141
+ "loss": 0.3569,
142
+ "step": 475
143
+ },
144
+ {
145
+ "epoch": 0.0625,
146
+ "grad_norm": 4.0592041015625,
147
+ "learning_rate": 1.9880000000000003e-05,
148
+ "loss": 0.3336,
149
+ "step": 500
150
+ },
151
+ {
152
+ "epoch": 0.065625,
153
+ "grad_norm": 5.144535064697266,
154
+ "learning_rate": 1.9941333333333335e-05,
155
+ "loss": 0.3092,
156
+ "step": 525
157
+ },
158
+ {
159
+ "epoch": 0.06875,
160
+ "grad_norm": 4.806638717651367,
161
+ "learning_rate": 1.9874666666666668e-05,
162
+ "loss": 0.3106,
163
+ "step": 550
164
+ },
165
+ {
166
+ "epoch": 0.071875,
167
+ "grad_norm": 4.3809638023376465,
168
+ "learning_rate": 1.9808e-05,
169
+ "loss": 0.2971,
170
+ "step": 575
171
+ },
172
+ {
173
+ "epoch": 0.075,
174
+ "grad_norm": 5.35611629486084,
175
+ "learning_rate": 1.9741333333333334e-05,
176
+ "loss": 0.2897,
177
+ "step": 600
178
+ },
179
+ {
180
+ "epoch": 0.078125,
181
+ "grad_norm": 4.62730598449707,
182
+ "learning_rate": 1.967466666666667e-05,
183
+ "loss": 0.2607,
184
+ "step": 625
185
+ },
186
+ {
187
+ "epoch": 0.08125,
188
+ "grad_norm": 3.60905122756958,
189
+ "learning_rate": 1.9608000000000003e-05,
190
+ "loss": 0.2765,
191
+ "step": 650
192
+ },
193
+ {
194
+ "epoch": 0.084375,
195
+ "grad_norm": 3.7476887702941895,
196
+ "learning_rate": 1.9541333333333336e-05,
197
+ "loss": 0.2661,
198
+ "step": 675
199
+ },
200
+ {
201
+ "epoch": 0.0875,
202
+ "grad_norm": 4.03351354598999,
203
+ "learning_rate": 1.947466666666667e-05,
204
+ "loss": 0.271,
205
+ "step": 700
206
+ },
207
+ {
208
+ "epoch": 0.090625,
209
+ "grad_norm": 6.19671106338501,
210
+ "learning_rate": 1.9408e-05,
211
+ "loss": 0.3045,
212
+ "step": 725
213
+ },
214
+ {
215
+ "epoch": 0.09375,
216
+ "grad_norm": 6.079224586486816,
217
+ "learning_rate": 1.9341333333333334e-05,
218
+ "loss": 0.4168,
219
+ "step": 750
220
+ },
221
+ {
222
+ "epoch": 0.096875,
223
+ "grad_norm": 5.619544982910156,
224
+ "learning_rate": 1.9274666666666667e-05,
225
+ "loss": 0.4038,
226
+ "step": 775
227
+ },
228
+ {
229
+ "epoch": 0.1,
230
+ "grad_norm": 5.917449951171875,
231
+ "learning_rate": 1.9208000000000003e-05,
232
+ "loss": 0.4418,
233
+ "step": 800
234
+ },
235
+ {
236
+ "epoch": 0.103125,
237
+ "grad_norm": 3.9295430183410645,
238
+ "learning_rate": 1.9141333333333333e-05,
239
+ "loss": 0.3051,
240
+ "step": 825
241
+ },
242
+ {
243
+ "epoch": 0.10625,
244
+ "grad_norm": 3.26605486869812,
245
+ "learning_rate": 1.907466666666667e-05,
246
+ "loss": 0.2447,
247
+ "step": 850
248
+ },
249
+ {
250
+ "epoch": 0.109375,
251
+ "grad_norm": 4.2773051261901855,
252
+ "learning_rate": 1.9008e-05,
253
+ "loss": 0.2316,
254
+ "step": 875
255
+ },
256
+ {
257
+ "epoch": 0.1125,
258
+ "grad_norm": 5.690479755401611,
259
+ "learning_rate": 1.8941333333333334e-05,
260
+ "loss": 0.3193,
261
+ "step": 900
262
+ },
263
+ {
264
+ "epoch": 0.115625,
265
+ "grad_norm": 5.861849308013916,
266
+ "learning_rate": 1.8874666666666667e-05,
267
+ "loss": 0.3742,
268
+ "step": 925
269
+ },
270
+ {
271
+ "epoch": 0.11875,
272
+ "grad_norm": 5.605452537536621,
273
+ "learning_rate": 1.8808e-05,
274
+ "loss": 0.3622,
275
+ "step": 950
276
+ },
277
+ {
278
+ "epoch": 0.121875,
279
+ "grad_norm": 6.191706657409668,
280
+ "learning_rate": 1.8741333333333336e-05,
281
+ "loss": 0.5693,
282
+ "step": 975
283
+ },
284
+ {
285
+ "epoch": 0.125,
286
+ "grad_norm": 6.441635608673096,
287
+ "learning_rate": 1.867466666666667e-05,
288
+ "loss": 0.4951,
289
+ "step": 1000
290
+ },
291
+ {
292
+ "epoch": 0.125,
293
+ "eval_loss": 0.49006596207618713,
294
+ "eval_runtime": 77.9711,
295
+ "eval_samples_per_second": 26.984,
296
+ "eval_steps_per_second": 1.693,
297
+ "eval_wer": 27.05431429372721,
298
+ "step": 1000
299
+ },
300
+ {
301
+ "epoch": 0.128125,
302
+ "grad_norm": 5.985304832458496,
303
+ "learning_rate": 1.8608000000000002e-05,
304
+ "loss": 0.4113,
305
+ "step": 1025
306
+ },
307
+ {
308
+ "epoch": 0.13125,
309
+ "grad_norm": 5.439868450164795,
310
+ "learning_rate": 1.8541333333333335e-05,
311
+ "loss": 0.3116,
312
+ "step": 1050
313
+ },
314
+ {
315
+ "epoch": 0.134375,
316
+ "grad_norm": 3.6065237522125244,
317
+ "learning_rate": 1.8474666666666668e-05,
318
+ "loss": 0.223,
319
+ "step": 1075
320
+ },
321
+ {
322
+ "epoch": 0.1375,
323
+ "grad_norm": 3.4091665744781494,
324
+ "learning_rate": 1.8408e-05,
325
+ "loss": 0.1918,
326
+ "step": 1100
327
+ },
328
+ {
329
+ "epoch": 0.140625,
330
+ "grad_norm": 2.80391001701355,
331
+ "learning_rate": 1.8341333333333337e-05,
332
+ "loss": 0.1784,
333
+ "step": 1125
334
+ },
335
+ {
336
+ "epoch": 0.14375,
337
+ "grad_norm": 3.194566011428833,
338
+ "learning_rate": 1.8274666666666666e-05,
339
+ "loss": 0.1988,
340
+ "step": 1150
341
+ },
342
+ {
343
+ "epoch": 0.146875,
344
+ "grad_norm": 3.294611930847168,
345
+ "learning_rate": 1.8208000000000003e-05,
346
+ "loss": 0.2032,
347
+ "step": 1175
348
+ },
349
+ {
350
+ "epoch": 0.15,
351
+ "grad_norm": 3.653074264526367,
352
+ "learning_rate": 1.8141333333333335e-05,
353
+ "loss": 0.1914,
354
+ "step": 1200
355
+ },
356
+ {
357
+ "epoch": 0.153125,
358
+ "grad_norm": 4.978360176086426,
359
+ "learning_rate": 1.8074666666666668e-05,
360
+ "loss": 0.2865,
361
+ "step": 1225
362
+ },
363
+ {
364
+ "epoch": 0.15625,
365
+ "grad_norm": 5.2725701332092285,
366
+ "learning_rate": 1.8008e-05,
367
+ "loss": 0.306,
368
+ "step": 1250
369
+ },
370
+ {
371
+ "epoch": 0.159375,
372
+ "grad_norm": 5.122636318206787,
373
+ "learning_rate": 1.7941333333333334e-05,
374
+ "loss": 0.3283,
375
+ "step": 1275
376
+ },
377
+ {
378
+ "epoch": 0.1625,
379
+ "grad_norm": 5.599034786224365,
380
+ "learning_rate": 1.787466666666667e-05,
381
+ "loss": 0.3472,
382
+ "step": 1300
383
+ },
384
+ {
385
+ "epoch": 0.165625,
386
+ "grad_norm": 5.392984867095947,
387
+ "learning_rate": 1.7808e-05,
388
+ "loss": 0.3073,
389
+ "step": 1325
390
+ },
391
+ {
392
+ "epoch": 0.16875,
393
+ "grad_norm": 4.690824508666992,
394
+ "learning_rate": 1.7741333333333336e-05,
395
+ "loss": 0.3096,
396
+ "step": 1350
397
+ },
398
+ {
399
+ "epoch": 0.171875,
400
+ "grad_norm": 5.487542152404785,
401
+ "learning_rate": 1.767466666666667e-05,
402
+ "loss": 0.2872,
403
+ "step": 1375
404
+ },
405
+ {
406
+ "epoch": 0.175,
407
+ "grad_norm": 5.0103440284729,
408
+ "learning_rate": 1.7608e-05,
409
+ "loss": 0.28,
410
+ "step": 1400
411
+ },
412
+ {
413
+ "epoch": 0.178125,
414
+ "grad_norm": 4.374607563018799,
415
+ "learning_rate": 1.7541333333333334e-05,
416
+ "loss": 0.2907,
417
+ "step": 1425
418
+ },
419
+ {
420
+ "epoch": 0.18125,
421
+ "grad_norm": 3.8199336528778076,
422
+ "learning_rate": 1.7474666666666667e-05,
423
+ "loss": 0.2721,
424
+ "step": 1450
425
+ },
426
+ {
427
+ "epoch": 0.184375,
428
+ "grad_norm": 3.263697862625122,
429
+ "learning_rate": 1.7408e-05,
430
+ "loss": 0.2018,
431
+ "step": 1475
432
+ },
433
+ {
434
+ "epoch": 0.1875,
435
+ "grad_norm": 3.686453104019165,
436
+ "learning_rate": 1.7341333333333333e-05,
437
+ "loss": 0.1804,
438
+ "step": 1500
439
+ },
440
+ {
441
+ "epoch": 0.190625,
442
+ "grad_norm": 3.122502088546753,
443
+ "learning_rate": 1.727466666666667e-05,
444
+ "loss": 0.1864,
445
+ "step": 1525
446
+ },
447
+ {
448
+ "epoch": 0.19375,
449
+ "grad_norm": 3.8203911781311035,
450
+ "learning_rate": 1.7208000000000002e-05,
451
+ "loss": 0.1822,
452
+ "step": 1550
453
+ },
454
+ {
455
+ "epoch": 0.196875,
456
+ "grad_norm": 3.1615259647369385,
457
+ "learning_rate": 1.7141333333333335e-05,
458
+ "loss": 0.1607,
459
+ "step": 1575
460
+ },
461
+ {
462
+ "epoch": 0.2,
463
+ "grad_norm": 3.136589288711548,
464
+ "learning_rate": 1.7074666666666668e-05,
465
+ "loss": 0.1681,
466
+ "step": 1600
467
+ },
468
+ {
469
+ "epoch": 0.203125,
470
+ "grad_norm": 5.022261142730713,
471
+ "learning_rate": 1.7008000000000004e-05,
472
+ "loss": 0.2476,
473
+ "step": 1625
474
+ },
475
+ {
476
+ "epoch": 0.20625,
477
+ "grad_norm": 4.3968377113342285,
478
+ "learning_rate": 1.6941333333333333e-05,
479
+ "loss": 0.2578,
480
+ "step": 1650
481
+ },
482
+ {
483
+ "epoch": 0.209375,
484
+ "grad_norm": 5.124329090118408,
485
+ "learning_rate": 1.687466666666667e-05,
486
+ "loss": 0.2646,
487
+ "step": 1675
488
+ },
489
+ {
490
+ "epoch": 0.2125,
491
+ "grad_norm": 3.7056658267974854,
492
+ "learning_rate": 1.6808000000000002e-05,
493
+ "loss": 0.251,
494
+ "step": 1700
495
+ },
496
+ {
497
+ "epoch": 0.215625,
498
+ "grad_norm": 3.477151870727539,
499
+ "learning_rate": 1.6741333333333335e-05,
500
+ "loss": 0.1996,
501
+ "step": 1725
502
+ },
503
+ {
504
+ "epoch": 0.21875,
505
+ "grad_norm": 3.3584837913513184,
506
+ "learning_rate": 1.6674666666666668e-05,
507
+ "loss": 0.1939,
508
+ "step": 1750
509
+ },
510
+ {
511
+ "epoch": 0.221875,
512
+ "grad_norm": 3.136394739151001,
513
+ "learning_rate": 1.6608e-05,
514
+ "loss": 0.1664,
515
+ "step": 1775
516
+ },
517
+ {
518
+ "epoch": 0.225,
519
+ "grad_norm": 2.997995376586914,
520
+ "learning_rate": 1.6541333333333334e-05,
521
+ "loss": 0.1731,
522
+ "step": 1800
523
+ },
524
+ {
525
+ "epoch": 0.228125,
526
+ "grad_norm": 3.235027551651001,
527
+ "learning_rate": 1.6474666666666667e-05,
528
+ "loss": 0.1609,
529
+ "step": 1825
530
+ },
531
+ {
532
+ "epoch": 0.23125,
533
+ "grad_norm": 2.657120704650879,
534
+ "learning_rate": 1.6408000000000003e-05,
535
+ "loss": 0.1563,
536
+ "step": 1850
537
+ },
538
+ {
539
+ "epoch": 0.234375,
540
+ "grad_norm": 3.5700979232788086,
541
+ "learning_rate": 1.6341333333333336e-05,
542
+ "loss": 0.1554,
543
+ "step": 1875
544
+ },
545
+ {
546
+ "epoch": 0.2375,
547
+ "grad_norm": 3.4939069747924805,
548
+ "learning_rate": 1.627466666666667e-05,
549
+ "loss": 0.1553,
550
+ "step": 1900
551
+ },
552
+ {
553
+ "epoch": 0.240625,
554
+ "grad_norm": 2.4021010398864746,
555
+ "learning_rate": 1.6208e-05,
556
+ "loss": 0.1494,
557
+ "step": 1925
558
+ },
559
+ {
560
+ "epoch": 0.24375,
561
+ "grad_norm": 4.335009574890137,
562
+ "learning_rate": 1.6141333333333334e-05,
563
+ "loss": 0.2011,
564
+ "step": 1950
565
+ },
566
+ {
567
+ "epoch": 0.246875,
568
+ "grad_norm": 4.650504112243652,
569
+ "learning_rate": 1.6074666666666667e-05,
570
+ "loss": 0.2702,
571
+ "step": 1975
572
+ },
573
+ {
574
+ "epoch": 0.25,
575
+ "grad_norm": 4.410916328430176,
576
+ "learning_rate": 1.6008e-05,
577
+ "loss": 0.2607,
578
+ "step": 2000
579
+ },
580
+ {
581
+ "epoch": 0.25,
582
+ "eval_loss": 0.3708300292491913,
583
+ "eval_runtime": 74.5053,
584
+ "eval_samples_per_second": 28.24,
585
+ "eval_steps_per_second": 1.772,
586
+ "eval_wer": 19.865382817612414,
587
+ "step": 2000
588
+ },
589
+ {
590
+ "epoch": 0.253125,
591
+ "grad_norm": 3.3271749019622803,
592
+ "learning_rate": 1.5941333333333336e-05,
593
+ "loss": 0.2175,
594
+ "step": 2025
595
+ },
596
+ {
597
+ "epoch": 0.25625,
598
+ "grad_norm": 3.365081310272217,
599
+ "learning_rate": 1.5874666666666666e-05,
600
+ "loss": 0.1527,
601
+ "step": 2050
602
+ },
603
+ {
604
+ "epoch": 0.259375,
605
+ "grad_norm": 3.701395273208618,
606
+ "learning_rate": 1.5808000000000002e-05,
607
+ "loss": 0.1461,
608
+ "step": 2075
609
+ },
610
+ {
611
+ "epoch": 0.2625,
612
+ "grad_norm": 2.8837661743164062,
613
+ "learning_rate": 1.5741333333333335e-05,
614
+ "loss": 0.1507,
615
+ "step": 2100
616
+ },
617
+ {
618
+ "epoch": 0.265625,
619
+ "grad_norm": 3.2435319423675537,
620
+ "learning_rate": 1.5674666666666667e-05,
621
+ "loss": 0.1314,
622
+ "step": 2125
623
+ },
624
+ {
625
+ "epoch": 0.26875,
626
+ "grad_norm": 2.9637367725372314,
627
+ "learning_rate": 1.5608e-05,
628
+ "loss": 0.1292,
629
+ "step": 2150
630
+ },
631
+ {
632
+ "epoch": 0.271875,
633
+ "grad_norm": 3.535871982574463,
634
+ "learning_rate": 1.5541333333333337e-05,
635
+ "loss": 0.1332,
636
+ "step": 2175
637
+ },
638
+ {
639
+ "epoch": 0.275,
640
+ "grad_norm": 3.633970022201538,
641
+ "learning_rate": 1.547466666666667e-05,
642
+ "loss": 0.2285,
643
+ "step": 2200
644
+ },
645
+ {
646
+ "epoch": 0.278125,
647
+ "grad_norm": 4.483553409576416,
648
+ "learning_rate": 1.5408000000000002e-05,
649
+ "loss": 0.2325,
650
+ "step": 2225
651
+ },
652
+ {
653
+ "epoch": 0.28125,
654
+ "grad_norm": 4.577600955963135,
655
+ "learning_rate": 1.5341333333333335e-05,
656
+ "loss": 0.256,
657
+ "step": 2250
658
+ },
659
+ {
660
+ "epoch": 0.284375,
661
+ "grad_norm": 3.32087779045105,
662
+ "learning_rate": 1.5274666666666668e-05,
663
+ "loss": 0.1758,
664
+ "step": 2275
665
+ },
666
+ {
667
+ "epoch": 0.2875,
668
+ "grad_norm": 3.1856462955474854,
669
+ "learning_rate": 1.5208e-05,
670
+ "loss": 0.1311,
671
+ "step": 2300
672
+ },
673
+ {
674
+ "epoch": 0.290625,
675
+ "grad_norm": 3.373046636581421,
676
+ "learning_rate": 1.5141333333333335e-05,
677
+ "loss": 0.1197,
678
+ "step": 2325
679
+ },
680
+ {
681
+ "epoch": 0.29375,
682
+ "grad_norm": 3.019298553466797,
683
+ "learning_rate": 1.5074666666666668e-05,
684
+ "loss": 0.1243,
685
+ "step": 2350
686
+ },
687
+ {
688
+ "epoch": 0.296875,
689
+ "grad_norm": 2.5749433040618896,
690
+ "learning_rate": 1.5008000000000001e-05,
691
+ "loss": 0.1302,
692
+ "step": 2375
693
+ },
694
+ {
695
+ "epoch": 0.3,
696
+ "grad_norm": 4.252561092376709,
697
+ "learning_rate": 1.4941333333333334e-05,
698
+ "loss": 0.1277,
699
+ "step": 2400
700
+ },
701
+ {
702
+ "epoch": 0.303125,
703
+ "grad_norm": 2.421847343444824,
704
+ "learning_rate": 1.4874666666666668e-05,
705
+ "loss": 0.1352,
706
+ "step": 2425
707
+ },
708
+ {
709
+ "epoch": 0.30625,
710
+ "grad_norm": 2.247629165649414,
711
+ "learning_rate": 1.4808e-05,
712
+ "loss": 0.1286,
713
+ "step": 2450
714
+ },
715
+ {
716
+ "epoch": 0.309375,
717
+ "grad_norm": 2.594142436981201,
718
+ "learning_rate": 1.4741333333333334e-05,
719
+ "loss": 0.1355,
720
+ "step": 2475
721
+ },
722
+ {
723
+ "epoch": 0.3125,
724
+ "grad_norm": 3.3857474327087402,
725
+ "learning_rate": 1.4674666666666669e-05,
726
+ "loss": 0.1539,
727
+ "step": 2500
728
+ },
729
+ {
730
+ "epoch": 0.315625,
731
+ "grad_norm": 3.906268835067749,
732
+ "learning_rate": 1.4608000000000001e-05,
733
+ "loss": 0.2039,
734
+ "step": 2525
735
+ },
736
+ {
737
+ "epoch": 0.31875,
738
+ "grad_norm": 4.0277204513549805,
739
+ "learning_rate": 1.4541333333333334e-05,
740
+ "loss": 0.1978,
741
+ "step": 2550
742
+ },
743
+ {
744
+ "epoch": 0.321875,
745
+ "grad_norm": 3.9858391284942627,
746
+ "learning_rate": 1.4474666666666669e-05,
747
+ "loss": 0.2316,
748
+ "step": 2575
749
+ },
750
+ {
751
+ "epoch": 0.325,
752
+ "grad_norm": 2.84332537651062,
753
+ "learning_rate": 1.4408000000000002e-05,
754
+ "loss": 0.1719,
755
+ "step": 2600
756
+ },
757
+ {
758
+ "epoch": 0.328125,
759
+ "grad_norm": 3.294312000274658,
760
+ "learning_rate": 1.4341333333333334e-05,
761
+ "loss": 0.1421,
762
+ "step": 2625
763
+ },
764
+ {
765
+ "epoch": 0.33125,
766
+ "grad_norm": 2.738583564758301,
767
+ "learning_rate": 1.4274666666666667e-05,
768
+ "loss": 0.1297,
769
+ "step": 2650
770
+ },
771
+ {
772
+ "epoch": 0.334375,
773
+ "grad_norm": 2.4573464393615723,
774
+ "learning_rate": 1.4208000000000002e-05,
775
+ "loss": 0.1223,
776
+ "step": 2675
777
+ },
778
+ {
779
+ "epoch": 0.3375,
780
+ "grad_norm": 2.678255319595337,
781
+ "learning_rate": 1.4141333333333333e-05,
782
+ "loss": 0.1249,
783
+ "step": 2700
784
+ },
785
+ {
786
+ "epoch": 0.340625,
787
+ "grad_norm": 3.8852200508117676,
788
+ "learning_rate": 1.4074666666666668e-05,
789
+ "loss": 0.1456,
790
+ "step": 2725
791
+ },
792
+ {
793
+ "epoch": 0.34375,
794
+ "grad_norm": 3.630040407180786,
795
+ "learning_rate": 1.4008000000000002e-05,
796
+ "loss": 0.1466,
797
+ "step": 2750
798
+ },
799
+ {
800
+ "epoch": 0.346875,
801
+ "grad_norm": 4.463013648986816,
802
+ "learning_rate": 1.3941333333333333e-05,
803
+ "loss": 0.2719,
804
+ "step": 2775
805
+ },
806
+ {
807
+ "epoch": 0.35,
808
+ "grad_norm": 4.058387279510498,
809
+ "learning_rate": 1.3874666666666668e-05,
810
+ "loss": 0.244,
811
+ "step": 2800
812
+ },
813
+ {
814
+ "epoch": 0.353125,
815
+ "grad_norm": 4.477724552154541,
816
+ "learning_rate": 1.3808e-05,
817
+ "loss": 0.2144,
818
+ "step": 2825
819
+ },
820
+ {
821
+ "epoch": 0.35625,
822
+ "grad_norm": 3.583326578140259,
823
+ "learning_rate": 1.3741333333333335e-05,
824
+ "loss": 0.2157,
825
+ "step": 2850
826
+ },
827
+ {
828
+ "epoch": 0.359375,
829
+ "grad_norm": 3.8221640586853027,
830
+ "learning_rate": 1.3674666666666668e-05,
831
+ "loss": 0.2151,
832
+ "step": 2875
833
+ },
834
+ {
835
+ "epoch": 0.3625,
836
+ "grad_norm": 3.6878933906555176,
837
+ "learning_rate": 1.3608e-05,
838
+ "loss": 0.2063,
839
+ "step": 2900
840
+ },
841
+ {
842
+ "epoch": 0.365625,
843
+ "grad_norm": 3.0994338989257812,
844
+ "learning_rate": 1.3541333333333335e-05,
845
+ "loss": 0.1613,
846
+ "step": 2925
847
+ },
848
+ {
849
+ "epoch": 0.36875,
850
+ "grad_norm": 2.727104902267456,
851
+ "learning_rate": 1.3474666666666667e-05,
852
+ "loss": 0.1278,
853
+ "step": 2950
854
+ },
855
+ {
856
+ "epoch": 0.371875,
857
+ "grad_norm": 3.0303940773010254,
858
+ "learning_rate": 1.3408000000000001e-05,
859
+ "loss": 0.1347,
860
+ "step": 2975
861
+ },
862
+ {
863
+ "epoch": 0.375,
864
+ "grad_norm": 3.6327948570251465,
865
+ "learning_rate": 1.3341333333333336e-05,
866
+ "loss": 0.1887,
867
+ "step": 3000
868
+ },
869
+ {
870
+ "epoch": 0.375,
871
+ "eval_loss": 0.34539544582366943,
872
+ "eval_runtime": 75.5211,
873
+ "eval_samples_per_second": 27.86,
874
+ "eval_steps_per_second": 1.748,
875
+ "eval_wer": 18.39768159296999,
876
+ "step": 3000
877
+ },
878
+ {
879
+ "epoch": 0.378125,
880
+ "grad_norm": 4.803223133087158,
881
+ "learning_rate": 1.3274666666666667e-05,
882
+ "loss": 0.2279,
883
+ "step": 3025
884
+ },
885
+ {
886
+ "epoch": 0.38125,
887
+ "grad_norm": 4.9647536277771,
888
+ "learning_rate": 1.3208000000000001e-05,
889
+ "loss": 0.2556,
890
+ "step": 3050
891
+ },
892
+ {
893
+ "epoch": 0.384375,
894
+ "grad_norm": 3.8229358196258545,
895
+ "learning_rate": 1.3141333333333334e-05,
896
+ "loss": 0.1795,
897
+ "step": 3075
898
+ },
899
+ {
900
+ "epoch": 0.3875,
901
+ "grad_norm": 3.2917191982269287,
902
+ "learning_rate": 1.3074666666666669e-05,
903
+ "loss": 0.1355,
904
+ "step": 3100
905
+ },
906
+ {
907
+ "epoch": 0.390625,
908
+ "grad_norm": 2.797985076904297,
909
+ "learning_rate": 1.3008e-05,
910
+ "loss": 0.1243,
911
+ "step": 3125
912
+ },
913
+ {
914
+ "epoch": 0.39375,
915
+ "grad_norm": 3.5542221069335938,
916
+ "learning_rate": 1.2941333333333334e-05,
917
+ "loss": 0.1494,
918
+ "step": 3150
919
+ },
920
+ {
921
+ "epoch": 0.396875,
922
+ "grad_norm": 3.7407355308532715,
923
+ "learning_rate": 1.2874666666666669e-05,
924
+ "loss": 0.2271,
925
+ "step": 3175
926
+ },
927
+ {
928
+ "epoch": 0.4,
929
+ "grad_norm": 4.250736713409424,
930
+ "learning_rate": 1.2808e-05,
931
+ "loss": 0.2051,
932
+ "step": 3200
933
+ },
934
+ {
935
+ "epoch": 0.403125,
936
+ "grad_norm": 4.248854160308838,
937
+ "learning_rate": 1.2741333333333335e-05,
938
+ "loss": 0.2251,
939
+ "step": 3225
940
+ },
941
+ {
942
+ "epoch": 0.40625,
943
+ "grad_norm": 3.6842331886291504,
944
+ "learning_rate": 1.2674666666666669e-05,
945
+ "loss": 0.1806,
946
+ "step": 3250
947
+ },
948
+ {
949
+ "epoch": 0.409375,
950
+ "grad_norm": 4.215803623199463,
951
+ "learning_rate": 1.2608e-05,
952
+ "loss": 0.1938,
953
+ "step": 3275
954
+ },
955
+ {
956
+ "epoch": 0.4125,
957
+ "grad_norm": 2.878873348236084,
958
+ "learning_rate": 1.2541333333333335e-05,
959
+ "loss": 0.211,
960
+ "step": 3300
961
+ },
962
+ {
963
+ "epoch": 0.415625,
964
+ "grad_norm": 3.9587783813476562,
965
+ "learning_rate": 1.2474666666666668e-05,
966
+ "loss": 0.1601,
967
+ "step": 3325
968
+ },
969
+ {
970
+ "epoch": 0.41875,
971
+ "grad_norm": 3.0218279361724854,
972
+ "learning_rate": 1.2408e-05,
973
+ "loss": 0.1236,
974
+ "step": 3350
975
+ },
976
+ {
977
+ "epoch": 0.421875,
978
+ "grad_norm": 2.6972806453704834,
979
+ "learning_rate": 1.2341333333333333e-05,
980
+ "loss": 0.1112,
981
+ "step": 3375
982
+ },
983
+ {
984
+ "epoch": 0.425,
985
+ "grad_norm": 2.4185972213745117,
986
+ "learning_rate": 1.2274666666666668e-05,
987
+ "loss": 0.1212,
988
+ "step": 3400
989
+ },
990
+ {
991
+ "epoch": 0.428125,
992
+ "grad_norm": 3.9205973148345947,
993
+ "learning_rate": 1.2208000000000002e-05,
994
+ "loss": 0.1225,
995
+ "step": 3425
996
+ },
997
+ {
998
+ "epoch": 0.43125,
999
+ "grad_norm": 2.392932176589966,
1000
+ "learning_rate": 1.2141333333333334e-05,
1001
+ "loss": 0.1167,
1002
+ "step": 3450
1003
+ },
1004
+ {
1005
+ "epoch": 0.434375,
1006
+ "grad_norm": 4.024794101715088,
1007
+ "learning_rate": 1.2074666666666668e-05,
1008
+ "loss": 0.1057,
1009
+ "step": 3475
1010
+ },
1011
+ {
1012
+ "epoch": 0.4375,
1013
+ "grad_norm": 3.367401599884033,
1014
+ "learning_rate": 1.2008000000000003e-05,
1015
+ "loss": 0.1309,
1016
+ "step": 3500
1017
+ },
1018
+ {
1019
+ "epoch": 0.440625,
1020
+ "grad_norm": 2.7556755542755127,
1021
+ "learning_rate": 1.1941333333333334e-05,
1022
+ "loss": 0.1235,
1023
+ "step": 3525
1024
+ },
1025
+ {
1026
+ "epoch": 0.44375,
1027
+ "grad_norm": 3.158759117126465,
1028
+ "learning_rate": 1.1874666666666668e-05,
1029
+ "loss": 0.1233,
1030
+ "step": 3550
1031
+ },
1032
+ {
1033
+ "epoch": 0.446875,
1034
+ "grad_norm": 4.0478010177612305,
1035
+ "learning_rate": 1.1808000000000001e-05,
1036
+ "loss": 0.1892,
1037
+ "step": 3575
1038
+ },
1039
+ {
1040
+ "epoch": 0.45,
1041
+ "grad_norm": 3.5508739948272705,
1042
+ "learning_rate": 1.1741333333333334e-05,
1043
+ "loss": 0.2186,
1044
+ "step": 3600
1045
+ },
1046
+ {
1047
+ "epoch": 0.453125,
1048
+ "grad_norm": 3.6009671688079834,
1049
+ "learning_rate": 1.1674666666666667e-05,
1050
+ "loss": 0.2093,
1051
+ "step": 3625
1052
+ },
1053
+ {
1054
+ "epoch": 0.45625,
1055
+ "grad_norm": 2.172722578048706,
1056
+ "learning_rate": 1.1608000000000001e-05,
1057
+ "loss": 0.1282,
1058
+ "step": 3650
1059
+ },
1060
+ {
1061
+ "epoch": 0.459375,
1062
+ "grad_norm": 2.729567050933838,
1063
+ "learning_rate": 1.1541333333333332e-05,
1064
+ "loss": 0.1096,
1065
+ "step": 3675
1066
+ },
1067
+ {
1068
+ "epoch": 0.4625,
1069
+ "grad_norm": 2.7863428592681885,
1070
+ "learning_rate": 1.1474666666666667e-05,
1071
+ "loss": 0.1115,
1072
+ "step": 3700
1073
+ },
1074
+ {
1075
+ "epoch": 0.465625,
1076
+ "grad_norm": 2.7164411544799805,
1077
+ "learning_rate": 1.1408000000000002e-05,
1078
+ "loss": 0.1365,
1079
+ "step": 3725
1080
+ },
1081
+ {
1082
+ "epoch": 0.46875,
1083
+ "grad_norm": 3.918790340423584,
1084
+ "learning_rate": 1.1341333333333336e-05,
1085
+ "loss": 0.1759,
1086
+ "step": 3750
1087
+ },
1088
+ {
1089
+ "epoch": 0.471875,
1090
+ "grad_norm": 3.520095109939575,
1091
+ "learning_rate": 1.1274666666666667e-05,
1092
+ "loss": 0.2138,
1093
+ "step": 3775
1094
+ },
1095
+ {
1096
+ "epoch": 0.475,
1097
+ "grad_norm": 4.172083854675293,
1098
+ "learning_rate": 1.1208000000000002e-05,
1099
+ "loss": 0.2189,
1100
+ "step": 3800
1101
+ },
1102
+ {
1103
+ "epoch": 0.478125,
1104
+ "grad_norm": 4.076236724853516,
1105
+ "learning_rate": 1.1141333333333335e-05,
1106
+ "loss": 0.2079,
1107
+ "step": 3825
1108
+ },
1109
+ {
1110
+ "epoch": 0.48125,
1111
+ "grad_norm": 4.950024604797363,
1112
+ "learning_rate": 1.1074666666666667e-05,
1113
+ "loss": 0.2233,
1114
+ "step": 3850
1115
+ },
1116
+ {
1117
+ "epoch": 0.484375,
1118
+ "grad_norm": 3.6588923931121826,
1119
+ "learning_rate": 1.1008e-05,
1120
+ "loss": 0.2011,
1121
+ "step": 3875
1122
+ },
1123
+ {
1124
+ "epoch": 0.4875,
1125
+ "grad_norm": 2.7259421348571777,
1126
+ "learning_rate": 1.0941333333333335e-05,
1127
+ "loss": 0.1419,
1128
+ "step": 3900
1129
+ },
1130
+ {
1131
+ "epoch": 0.490625,
1132
+ "grad_norm": 4.260537147521973,
1133
+ "learning_rate": 1.0874666666666666e-05,
1134
+ "loss": 0.1221,
1135
+ "step": 3925
1136
+ },
1137
+ {
1138
+ "epoch": 0.49375,
1139
+ "grad_norm": 2.9953160285949707,
1140
+ "learning_rate": 1.0808e-05,
1141
+ "loss": 0.1192,
1142
+ "step": 3950
1143
+ },
1144
+ {
1145
+ "epoch": 0.496875,
1146
+ "grad_norm": 5.537333011627197,
1147
+ "learning_rate": 1.0741333333333335e-05,
1148
+ "loss": 0.2057,
1149
+ "step": 3975
1150
+ },
1151
+ {
1152
+ "epoch": 0.5,
1153
+ "grad_norm": 4.265567302703857,
1154
+ "learning_rate": 1.0674666666666666e-05,
1155
+ "loss": 0.2607,
1156
+ "step": 4000
1157
+ },
1158
+ {
1159
+ "epoch": 0.5,
1160
+ "eval_loss": 0.3218025863170624,
1161
+ "eval_runtime": 75.7693,
1162
+ "eval_samples_per_second": 27.768,
1163
+ "eval_steps_per_second": 1.742,
1164
+ "eval_wer": 16.808450967560997,
1165
+ "step": 4000
1166
+ },
1167
+ {
1168
+ "epoch": 0.503125,
1169
+ "grad_norm": 2.978968620300293,
1170
+ "learning_rate": 1.0608e-05,
1171
+ "loss": 0.1951,
1172
+ "step": 4025
1173
+ },
1174
+ {
1175
+ "epoch": 0.50625,
1176
+ "grad_norm": 5.042616367340088,
1177
+ "learning_rate": 1.0541333333333335e-05,
1178
+ "loss": 0.2567,
1179
+ "step": 4050
1180
+ },
1181
+ {
1182
+ "epoch": 0.509375,
1183
+ "grad_norm": 4.18173885345459,
1184
+ "learning_rate": 1.0474666666666668e-05,
1185
+ "loss": 0.1932,
1186
+ "step": 4075
1187
+ },
1188
+ {
1189
+ "epoch": 0.5125,
1190
+ "grad_norm": 3.0428967475891113,
1191
+ "learning_rate": 1.0408000000000001e-05,
1192
+ "loss": 0.1743,
1193
+ "step": 4100
1194
+ },
1195
+ {
1196
+ "epoch": 0.515625,
1197
+ "grad_norm": 2.8713204860687256,
1198
+ "learning_rate": 1.0341333333333334e-05,
1199
+ "loss": 0.1261,
1200
+ "step": 4125
1201
+ },
1202
+ {
1203
+ "epoch": 0.51875,
1204
+ "grad_norm": 2.912363052368164,
1205
+ "learning_rate": 1.0274666666666668e-05,
1206
+ "loss": 0.1112,
1207
+ "step": 4150
1208
+ },
1209
+ {
1210
+ "epoch": 0.521875,
1211
+ "grad_norm": 2.445664167404175,
1212
+ "learning_rate": 1.0208e-05,
1213
+ "loss": 0.1133,
1214
+ "step": 4175
1215
+ },
1216
+ {
1217
+ "epoch": 0.525,
1218
+ "grad_norm": 2.2317187786102295,
1219
+ "learning_rate": 1.0141333333333334e-05,
1220
+ "loss": 0.106,
1221
+ "step": 4200
1222
+ },
1223
+ {
1224
+ "epoch": 0.528125,
1225
+ "grad_norm": 2.4223077297210693,
1226
+ "learning_rate": 1.0074666666666669e-05,
1227
+ "loss": 0.1142,
1228
+ "step": 4225
1229
+ },
1230
+ {
1231
+ "epoch": 0.53125,
1232
+ "grad_norm": 2.8847713470458984,
1233
+ "learning_rate": 1.0008e-05,
1234
+ "loss": 0.1158,
1235
+ "step": 4250
1236
+ },
1237
+ {
1238
+ "epoch": 0.534375,
1239
+ "grad_norm": 3.4072630405426025,
1240
+ "learning_rate": 9.941333333333334e-06,
1241
+ "loss": 0.1283,
1242
+ "step": 4275
1243
+ },
1244
+ {
1245
+ "epoch": 0.5375,
1246
+ "grad_norm": 4.455233573913574,
1247
+ "learning_rate": 9.874666666666669e-06,
1248
+ "loss": 0.1867,
1249
+ "step": 4300
1250
+ },
1251
+ {
1252
+ "epoch": 0.540625,
1253
+ "grad_norm": 3.465684175491333,
1254
+ "learning_rate": 9.808000000000002e-06,
1255
+ "loss": 0.1681,
1256
+ "step": 4325
1257
+ },
1258
+ {
1259
+ "epoch": 0.54375,
1260
+ "grad_norm": 3.8950164318084717,
1261
+ "learning_rate": 9.741333333333334e-06,
1262
+ "loss": 0.1812,
1263
+ "step": 4350
1264
+ },
1265
+ {
1266
+ "epoch": 0.546875,
1267
+ "grad_norm": 2.216827630996704,
1268
+ "learning_rate": 9.674666666666667e-06,
1269
+ "loss": 0.1069,
1270
+ "step": 4375
1271
+ },
1272
+ {
1273
+ "epoch": 0.55,
1274
+ "grad_norm": 2.3940842151641846,
1275
+ "learning_rate": 9.608e-06,
1276
+ "loss": 0.095,
1277
+ "step": 4400
1278
+ },
1279
+ {
1280
+ "epoch": 0.553125,
1281
+ "grad_norm": 2.4614291191101074,
1282
+ "learning_rate": 9.541333333333335e-06,
1283
+ "loss": 0.1019,
1284
+ "step": 4425
1285
+ },
1286
+ {
1287
+ "epoch": 0.55625,
1288
+ "grad_norm": 2.891763925552368,
1289
+ "learning_rate": 9.474666666666668e-06,
1290
+ "loss": 0.12,
1291
+ "step": 4450
1292
+ },
1293
+ {
1294
+ "epoch": 0.559375,
1295
+ "grad_norm": 2.791774272918701,
1296
+ "learning_rate": 9.408e-06,
1297
+ "loss": 0.1268,
1298
+ "step": 4475
1299
+ },
1300
+ {
1301
+ "epoch": 0.5625,
1302
+ "grad_norm": 2.9557909965515137,
1303
+ "learning_rate": 9.341333333333335e-06,
1304
+ "loss": 0.1164,
1305
+ "step": 4500
1306
+ },
1307
+ {
1308
+ "epoch": 0.565625,
1309
+ "grad_norm": 3.381051540374756,
1310
+ "learning_rate": 9.274666666666668e-06,
1311
+ "loss": 0.1464,
1312
+ "step": 4525
1313
+ },
1314
+ {
1315
+ "epoch": 0.56875,
1316
+ "grad_norm": 3.789724588394165,
1317
+ "learning_rate": 9.208e-06,
1318
+ "loss": 0.1947,
1319
+ "step": 4550
1320
+ },
1321
+ {
1322
+ "epoch": 0.571875,
1323
+ "grad_norm": 3.7860305309295654,
1324
+ "learning_rate": 9.141333333333333e-06,
1325
+ "loss": 0.1897,
1326
+ "step": 4575
1327
+ },
1328
+ {
1329
+ "epoch": 0.575,
1330
+ "grad_norm": 4.125986576080322,
1331
+ "learning_rate": 9.074666666666668e-06,
1332
+ "loss": 0.2003,
1333
+ "step": 4600
1334
+ },
1335
+ {
1336
+ "epoch": 0.578125,
1337
+ "grad_norm": 4.029356002807617,
1338
+ "learning_rate": 9.008e-06,
1339
+ "loss": 0.2014,
1340
+ "step": 4625
1341
+ },
1342
+ {
1343
+ "epoch": 0.58125,
1344
+ "grad_norm": 4.662783622741699,
1345
+ "learning_rate": 8.941333333333334e-06,
1346
+ "loss": 0.1962,
1347
+ "step": 4650
1348
+ },
1349
+ {
1350
+ "epoch": 0.584375,
1351
+ "grad_norm": 3.978227138519287,
1352
+ "learning_rate": 8.874666666666667e-06,
1353
+ "loss": 0.1693,
1354
+ "step": 4675
1355
+ },
1356
+ {
1357
+ "epoch": 0.5875,
1358
+ "grad_norm": 2.98833966255188,
1359
+ "learning_rate": 8.808000000000001e-06,
1360
+ "loss": 0.1565,
1361
+ "step": 4700
1362
+ },
1363
+ {
1364
+ "epoch": 0.590625,
1365
+ "grad_norm": 4.219015121459961,
1366
+ "learning_rate": 8.741333333333334e-06,
1367
+ "loss": 0.1687,
1368
+ "step": 4725
1369
+ },
1370
+ {
1371
+ "epoch": 0.59375,
1372
+ "grad_norm": 2.8378167152404785,
1373
+ "learning_rate": 8.674666666666668e-06,
1374
+ "loss": 0.1611,
1375
+ "step": 4750
1376
+ },
1377
+ {
1378
+ "epoch": 0.596875,
1379
+ "grad_norm": 2.5076210498809814,
1380
+ "learning_rate": 8.608000000000001e-06,
1381
+ "loss": 0.1186,
1382
+ "step": 4775
1383
+ },
1384
+ {
1385
+ "epoch": 0.6,
1386
+ "grad_norm": 2.2282755374908447,
1387
+ "learning_rate": 8.541333333333334e-06,
1388
+ "loss": 0.1267,
1389
+ "step": 4800
1390
+ },
1391
+ {
1392
+ "epoch": 0.603125,
1393
+ "grad_norm": 3.080812692642212,
1394
+ "learning_rate": 8.474666666666667e-06,
1395
+ "loss": 0.1112,
1396
+ "step": 4825
1397
+ },
1398
+ {
1399
+ "epoch": 0.60625,
1400
+ "grad_norm": 3.513218641281128,
1401
+ "learning_rate": 8.408e-06,
1402
+ "loss": 0.1326,
1403
+ "step": 4850
1404
+ },
1405
+ {
1406
+ "epoch": 0.609375,
1407
+ "grad_norm": 3.9359219074249268,
1408
+ "learning_rate": 8.341333333333334e-06,
1409
+ "loss": 0.1454,
1410
+ "step": 4875
1411
+ },
1412
+ {
1413
+ "epoch": 0.6125,
1414
+ "grad_norm": 3.585268259048462,
1415
+ "learning_rate": 8.274666666666667e-06,
1416
+ "loss": 0.1583,
1417
+ "step": 4900
1418
+ },
1419
+ {
1420
+ "epoch": 0.615625,
1421
+ "grad_norm": 3.322193145751953,
1422
+ "learning_rate": 8.208e-06,
1423
+ "loss": 0.1417,
1424
+ "step": 4925
1425
+ },
1426
+ {
1427
+ "epoch": 0.61875,
1428
+ "grad_norm": 2.7378623485565186,
1429
+ "learning_rate": 8.141333333333335e-06,
1430
+ "loss": 0.1164,
1431
+ "step": 4950
1432
+ },
1433
+ {
1434
+ "epoch": 0.621875,
1435
+ "grad_norm": 5.096762657165527,
1436
+ "learning_rate": 8.074666666666667e-06,
1437
+ "loss": 0.1077,
1438
+ "step": 4975
1439
+ },
1440
+ {
1441
+ "epoch": 0.625,
1442
+ "grad_norm": 3.030876636505127,
1443
+ "learning_rate": 8.008e-06,
1444
+ "loss": 0.106,
1445
+ "step": 5000
1446
+ },
1447
+ {
1448
+ "epoch": 0.625,
1449
+ "eval_loss": 0.32894453406333923,
1450
+ "eval_runtime": 75.4732,
1451
+ "eval_samples_per_second": 27.877,
1452
+ "eval_steps_per_second": 1.749,
1453
+ "eval_wer": 15.780125268766945,
1454
+ "step": 5000
1455
+ },
1456
+ {
1457
+ "epoch": 0.628125,
1458
+ "grad_norm": 3.1544792652130127,
1459
+ "learning_rate": 7.941333333333335e-06,
1460
+ "loss": 0.1599,
1461
+ "step": 5025
1462
+ },
1463
+ {
1464
+ "epoch": 0.63125,
1465
+ "grad_norm": 3.2215702533721924,
1466
+ "learning_rate": 7.874666666666668e-06,
1467
+ "loss": 0.1461,
1468
+ "step": 5050
1469
+ },
1470
+ {
1471
+ "epoch": 0.634375,
1472
+ "grad_norm": 3.7183897495269775,
1473
+ "learning_rate": 7.808e-06,
1474
+ "loss": 0.2221,
1475
+ "step": 5075
1476
+ },
1477
+ {
1478
+ "epoch": 0.6375,
1479
+ "grad_norm": 3.5974223613739014,
1480
+ "learning_rate": 7.741333333333333e-06,
1481
+ "loss": 0.2,
1482
+ "step": 5100
1483
+ },
1484
+ {
1485
+ "epoch": 0.640625,
1486
+ "grad_norm": 3.8318378925323486,
1487
+ "learning_rate": 7.674666666666666e-06,
1488
+ "loss": 0.1743,
1489
+ "step": 5125
1490
+ },
1491
+ {
1492
+ "epoch": 0.64375,
1493
+ "grad_norm": 4.3595290184021,
1494
+ "learning_rate": 7.608000000000001e-06,
1495
+ "loss": 0.174,
1496
+ "step": 5150
1497
+ },
1498
+ {
1499
+ "epoch": 0.646875,
1500
+ "grad_norm": 2.820388078689575,
1501
+ "learning_rate": 7.5413333333333335e-06,
1502
+ "loss": 0.1415,
1503
+ "step": 5175
1504
+ },
1505
+ {
1506
+ "epoch": 0.65,
1507
+ "grad_norm": 2.1610324382781982,
1508
+ "learning_rate": 7.474666666666666e-06,
1509
+ "loss": 0.0955,
1510
+ "step": 5200
1511
+ },
1512
+ {
1513
+ "epoch": 0.653125,
1514
+ "grad_norm": 2.079317092895508,
1515
+ "learning_rate": 7.408000000000001e-06,
1516
+ "loss": 0.1016,
1517
+ "step": 5225
1518
+ },
1519
+ {
1520
+ "epoch": 0.65625,
1521
+ "grad_norm": 2.3071603775024414,
1522
+ "learning_rate": 7.341333333333334e-06,
1523
+ "loss": 0.1042,
1524
+ "step": 5250
1525
+ },
1526
+ {
1527
+ "epoch": 0.659375,
1528
+ "grad_norm": 2.3762526512145996,
1529
+ "learning_rate": 7.2746666666666674e-06,
1530
+ "loss": 0.1058,
1531
+ "step": 5275
1532
+ },
1533
+ {
1534
+ "epoch": 0.6625,
1535
+ "grad_norm": 3.6836395263671875,
1536
+ "learning_rate": 7.208e-06,
1537
+ "loss": 0.1087,
1538
+ "step": 5300
1539
+ },
1540
+ {
1541
+ "epoch": 0.665625,
1542
+ "grad_norm": 3.0931732654571533,
1543
+ "learning_rate": 7.141333333333333e-06,
1544
+ "loss": 0.1062,
1545
+ "step": 5325
1546
+ },
1547
+ {
1548
+ "epoch": 0.66875,
1549
+ "grad_norm": 4.019095420837402,
1550
+ "learning_rate": 7.074666666666668e-06,
1551
+ "loss": 0.1453,
1552
+ "step": 5350
1553
+ },
1554
+ {
1555
+ "epoch": 0.671875,
1556
+ "grad_norm": 3.419175386428833,
1557
+ "learning_rate": 7.0080000000000005e-06,
1558
+ "loss": 0.1721,
1559
+ "step": 5375
1560
+ },
1561
+ {
1562
+ "epoch": 0.675,
1563
+ "grad_norm": 3.387830972671509,
1564
+ "learning_rate": 6.941333333333334e-06,
1565
+ "loss": 0.1652,
1566
+ "step": 5400
1567
+ },
1568
+ {
1569
+ "epoch": 0.678125,
1570
+ "grad_norm": 3.58986234664917,
1571
+ "learning_rate": 6.874666666666667e-06,
1572
+ "loss": 0.1639,
1573
+ "step": 5425
1574
+ },
1575
+ {
1576
+ "epoch": 0.68125,
1577
+ "grad_norm": 4.178884506225586,
1578
+ "learning_rate": 6.808e-06,
1579
+ "loss": 0.1533,
1580
+ "step": 5450
1581
+ },
1582
+ {
1583
+ "epoch": 0.684375,
1584
+ "grad_norm": 3.2337114810943604,
1585
+ "learning_rate": 6.741333333333334e-06,
1586
+ "loss": 0.1237,
1587
+ "step": 5475
1588
+ },
1589
+ {
1590
+ "epoch": 0.6875,
1591
+ "grad_norm": 2.892301321029663,
1592
+ "learning_rate": 6.674666666666667e-06,
1593
+ "loss": 0.1215,
1594
+ "step": 5500
1595
+ },
1596
+ {
1597
+ "epoch": 0.690625,
1598
+ "grad_norm": 4.553407669067383,
1599
+ "learning_rate": 6.608000000000001e-06,
1600
+ "loss": 0.1537,
1601
+ "step": 5525
1602
+ },
1603
+ {
1604
+ "epoch": 0.69375,
1605
+ "grad_norm": 3.8401100635528564,
1606
+ "learning_rate": 6.541333333333334e-06,
1607
+ "loss": 0.1816,
1608
+ "step": 5550
1609
+ },
1610
+ {
1611
+ "epoch": 0.696875,
1612
+ "grad_norm": 2.8084216117858887,
1613
+ "learning_rate": 6.474666666666667e-06,
1614
+ "loss": 0.1381,
1615
+ "step": 5575
1616
+ },
1617
+ {
1618
+ "epoch": 0.7,
1619
+ "grad_norm": 2.182170867919922,
1620
+ "learning_rate": 6.408000000000001e-06,
1621
+ "loss": 0.0979,
1622
+ "step": 5600
1623
+ },
1624
+ {
1625
+ "epoch": 0.703125,
1626
+ "grad_norm": 3.2050559520721436,
1627
+ "learning_rate": 6.341333333333334e-06,
1628
+ "loss": 0.0822,
1629
+ "step": 5625
1630
+ },
1631
+ {
1632
+ "epoch": 0.70625,
1633
+ "grad_norm": 2.4150376319885254,
1634
+ "learning_rate": 6.274666666666667e-06,
1635
+ "loss": 0.0815,
1636
+ "step": 5650
1637
+ },
1638
+ {
1639
+ "epoch": 0.709375,
1640
+ "grad_norm": 2.0708541870117188,
1641
+ "learning_rate": 6.2080000000000005e-06,
1642
+ "loss": 0.0944,
1643
+ "step": 5675
1644
+ },
1645
+ {
1646
+ "epoch": 0.7125,
1647
+ "grad_norm": 2.932088851928711,
1648
+ "learning_rate": 6.141333333333333e-06,
1649
+ "loss": 0.1024,
1650
+ "step": 5700
1651
+ },
1652
+ {
1653
+ "epoch": 0.715625,
1654
+ "grad_norm": 2.245450258255005,
1655
+ "learning_rate": 6.074666666666668e-06,
1656
+ "loss": 0.101,
1657
+ "step": 5725
1658
+ },
1659
+ {
1660
+ "epoch": 0.71875,
1661
+ "grad_norm": 2.2716262340545654,
1662
+ "learning_rate": 6.008000000000001e-06,
1663
+ "loss": 0.1,
1664
+ "step": 5750
1665
+ },
1666
+ {
1667
+ "epoch": 0.721875,
1668
+ "grad_norm": 2.496361494064331,
1669
+ "learning_rate": 5.941333333333334e-06,
1670
+ "loss": 0.0966,
1671
+ "step": 5775
1672
+ },
1673
+ {
1674
+ "epoch": 0.725,
1675
+ "grad_norm": 3.3539814949035645,
1676
+ "learning_rate": 5.874666666666667e-06,
1677
+ "loss": 0.0911,
1678
+ "step": 5800
1679
+ },
1680
+ {
1681
+ "epoch": 0.728125,
1682
+ "grad_norm": 2.1496963500976562,
1683
+ "learning_rate": 5.808e-06,
1684
+ "loss": 0.0799,
1685
+ "step": 5825
1686
+ },
1687
+ {
1688
+ "epoch": 0.73125,
1689
+ "grad_norm": 2.5061728954315186,
1690
+ "learning_rate": 5.741333333333335e-06,
1691
+ "loss": 0.0877,
1692
+ "step": 5850
1693
+ },
1694
+ {
1695
+ "epoch": 0.734375,
1696
+ "grad_norm": 2.4256293773651123,
1697
+ "learning_rate": 5.6746666666666675e-06,
1698
+ "loss": 0.0945,
1699
+ "step": 5875
1700
+ },
1701
+ {
1702
+ "epoch": 0.7375,
1703
+ "grad_norm": 3.3995237350463867,
1704
+ "learning_rate": 5.608e-06,
1705
+ "loss": 0.1245,
1706
+ "step": 5900
1707
+ },
1708
+ {
1709
+ "epoch": 0.740625,
1710
+ "grad_norm": 4.556021213531494,
1711
+ "learning_rate": 5.541333333333334e-06,
1712
+ "loss": 0.181,
1713
+ "step": 5925
1714
+ },
1715
+ {
1716
+ "epoch": 0.74375,
1717
+ "grad_norm": 3.8894693851470947,
1718
+ "learning_rate": 5.474666666666667e-06,
1719
+ "loss": 0.1692,
1720
+ "step": 5950
1721
+ },
1722
+ {
1723
+ "epoch": 0.746875,
1724
+ "grad_norm": 3.464264392852783,
1725
+ "learning_rate": 5.408e-06,
1726
+ "loss": 0.1457,
1727
+ "step": 5975
1728
+ },
1729
+ {
1730
+ "epoch": 0.75,
1731
+ "grad_norm": 3.351585865020752,
1732
+ "learning_rate": 5.341333333333334e-06,
1733
+ "loss": 0.1376,
1734
+ "step": 6000
1735
+ },
1736
+ {
1737
+ "epoch": 0.75,
1738
+ "eval_loss": 0.3052073121070862,
1739
+ "eval_runtime": 73.9486,
1740
+ "eval_samples_per_second": 28.452,
1741
+ "eval_steps_per_second": 1.785,
1742
+ "eval_wer": 15.02757782555857,
1743
+ "step": 6000
1744
+ },
1745
+ {
1746
+ "epoch": 0.753125,
1747
+ "grad_norm": 3.405545473098755,
1748
+ "learning_rate": 5.274666666666667e-06,
1749
+ "loss": 0.1327,
1750
+ "step": 6025
1751
+ },
1752
+ {
1753
+ "epoch": 0.75625,
1754
+ "grad_norm": 2.5915920734405518,
1755
+ "learning_rate": 5.208000000000001e-06,
1756
+ "loss": 0.1145,
1757
+ "step": 6050
1758
+ },
1759
+ {
1760
+ "epoch": 0.759375,
1761
+ "grad_norm": 2.2758867740631104,
1762
+ "learning_rate": 5.141333333333334e-06,
1763
+ "loss": 0.1053,
1764
+ "step": 6075
1765
+ },
1766
+ {
1767
+ "epoch": 0.7625,
1768
+ "grad_norm": 2.8261423110961914,
1769
+ "learning_rate": 5.0746666666666665e-06,
1770
+ "loss": 0.0942,
1771
+ "step": 6100
1772
+ },
1773
+ {
1774
+ "epoch": 0.765625,
1775
+ "grad_norm": 3.362257480621338,
1776
+ "learning_rate": 5.008000000000001e-06,
1777
+ "loss": 0.0969,
1778
+ "step": 6125
1779
+ },
1780
+ {
1781
+ "epoch": 0.76875,
1782
+ "grad_norm": 3.890949249267578,
1783
+ "learning_rate": 4.941333333333334e-06,
1784
+ "loss": 0.1068,
1785
+ "step": 6150
1786
+ },
1787
+ {
1788
+ "epoch": 0.771875,
1789
+ "grad_norm": 3.03787899017334,
1790
+ "learning_rate": 4.874666666666667e-06,
1791
+ "loss": 0.1094,
1792
+ "step": 6175
1793
+ },
1794
+ {
1795
+ "epoch": 0.775,
1796
+ "grad_norm": 2.8833038806915283,
1797
+ "learning_rate": 4.808e-06,
1798
+ "loss": 0.1043,
1799
+ "step": 6200
1800
+ },
1801
+ {
1802
+ "epoch": 0.778125,
1803
+ "grad_norm": 3.1083550453186035,
1804
+ "learning_rate": 4.741333333333334e-06,
1805
+ "loss": 0.1061,
1806
+ "step": 6225
1807
+ },
1808
+ {
1809
+ "epoch": 0.78125,
1810
+ "grad_norm": 3.4954771995544434,
1811
+ "learning_rate": 4.674666666666667e-06,
1812
+ "loss": 0.1013,
1813
+ "step": 6250
1814
+ },
1815
+ {
1816
+ "epoch": 0.784375,
1817
+ "grad_norm": 3.035095691680908,
1818
+ "learning_rate": 4.608000000000001e-06,
1819
+ "loss": 0.1119,
1820
+ "step": 6275
1821
+ },
1822
+ {
1823
+ "epoch": 0.7875,
1824
+ "grad_norm": 3.62898850440979,
1825
+ "learning_rate": 4.5413333333333334e-06,
1826
+ "loss": 0.1253,
1827
+ "step": 6300
1828
+ },
1829
+ {
1830
+ "epoch": 0.790625,
1831
+ "grad_norm": 2.595010280609131,
1832
+ "learning_rate": 4.474666666666667e-06,
1833
+ "loss": 0.1681,
1834
+ "step": 6325
1835
+ },
1836
+ {
1837
+ "epoch": 0.79375,
1838
+ "grad_norm": 3.7245900630950928,
1839
+ "learning_rate": 4.408000000000001e-06,
1840
+ "loss": 0.1501,
1841
+ "step": 6350
1842
+ },
1843
+ {
1844
+ "epoch": 0.796875,
1845
+ "grad_norm": 2.7315571308135986,
1846
+ "learning_rate": 4.344e-06,
1847
+ "loss": 0.1395,
1848
+ "step": 6375
1849
+ },
1850
+ {
1851
+ "epoch": 0.8,
1852
+ "grad_norm": 2.3649332523345947,
1853
+ "learning_rate": 4.277333333333334e-06,
1854
+ "loss": 0.1111,
1855
+ "step": 6400
1856
+ },
1857
+ {
1858
+ "epoch": 0.803125,
1859
+ "grad_norm": 2.491359233856201,
1860
+ "learning_rate": 4.210666666666667e-06,
1861
+ "loss": 0.1024,
1862
+ "step": 6425
1863
+ },
1864
+ {
1865
+ "epoch": 0.80625,
1866
+ "grad_norm": 2.5985302925109863,
1867
+ "learning_rate": 4.1440000000000005e-06,
1868
+ "loss": 0.0925,
1869
+ "step": 6450
1870
+ },
1871
+ {
1872
+ "epoch": 0.809375,
1873
+ "grad_norm": 4.008167266845703,
1874
+ "learning_rate": 4.077333333333333e-06,
1875
+ "loss": 0.1385,
1876
+ "step": 6475
1877
+ },
1878
+ {
1879
+ "epoch": 0.8125,
1880
+ "grad_norm": 2.743041753768921,
1881
+ "learning_rate": 4.010666666666667e-06,
1882
+ "loss": 0.1289,
1883
+ "step": 6500
1884
+ },
1885
+ {
1886
+ "epoch": 0.815625,
1887
+ "grad_norm": 4.4984893798828125,
1888
+ "learning_rate": 3.944e-06,
1889
+ "loss": 0.1709,
1890
+ "step": 6525
1891
+ },
1892
+ {
1893
+ "epoch": 0.81875,
1894
+ "grad_norm": 3.432147741317749,
1895
+ "learning_rate": 3.8773333333333335e-06,
1896
+ "loss": 0.1563,
1897
+ "step": 6550
1898
+ },
1899
+ {
1900
+ "epoch": 0.821875,
1901
+ "grad_norm": 3.6097943782806396,
1902
+ "learning_rate": 3.810666666666667e-06,
1903
+ "loss": 0.159,
1904
+ "step": 6575
1905
+ },
1906
+ {
1907
+ "epoch": 0.825,
1908
+ "grad_norm": 3.096435308456421,
1909
+ "learning_rate": 3.7440000000000005e-06,
1910
+ "loss": 0.1444,
1911
+ "step": 6600
1912
+ },
1913
+ {
1914
+ "epoch": 0.828125,
1915
+ "grad_norm": 3.5198802947998047,
1916
+ "learning_rate": 3.6773333333333338e-06,
1917
+ "loss": 0.1493,
1918
+ "step": 6625
1919
+ },
1920
+ {
1921
+ "epoch": 0.83125,
1922
+ "grad_norm": 3.3834660053253174,
1923
+ "learning_rate": 3.6106666666666666e-06,
1924
+ "loss": 0.1737,
1925
+ "step": 6650
1926
+ },
1927
+ {
1928
+ "epoch": 0.834375,
1929
+ "grad_norm": 3.2613072395324707,
1930
+ "learning_rate": 3.5440000000000003e-06,
1931
+ "loss": 0.1408,
1932
+ "step": 6675
1933
+ },
1934
+ {
1935
+ "epoch": 0.8375,
1936
+ "grad_norm": 2.618708848953247,
1937
+ "learning_rate": 3.4773333333333336e-06,
1938
+ "loss": 0.1427,
1939
+ "step": 6700
1940
+ },
1941
+ {
1942
+ "epoch": 0.840625,
1943
+ "grad_norm": 2.236100196838379,
1944
+ "learning_rate": 3.4106666666666672e-06,
1945
+ "loss": 0.0976,
1946
+ "step": 6725
1947
+ },
1948
+ {
1949
+ "epoch": 0.84375,
1950
+ "grad_norm": 2.4650626182556152,
1951
+ "learning_rate": 3.344e-06,
1952
+ "loss": 0.0899,
1953
+ "step": 6750
1954
+ },
1955
+ {
1956
+ "epoch": 0.846875,
1957
+ "grad_norm": 3.514897346496582,
1958
+ "learning_rate": 3.2773333333333334e-06,
1959
+ "loss": 0.1099,
1960
+ "step": 6775
1961
+ },
1962
+ {
1963
+ "epoch": 0.85,
1964
+ "grad_norm": 3.703801155090332,
1965
+ "learning_rate": 3.210666666666667e-06,
1966
+ "loss": 0.1979,
1967
+ "step": 6800
1968
+ },
1969
+ {
1970
+ "epoch": 0.853125,
1971
+ "grad_norm": 4.976568698883057,
1972
+ "learning_rate": 3.1440000000000003e-06,
1973
+ "loss": 0.1801,
1974
+ "step": 6825
1975
+ },
1976
+ {
1977
+ "epoch": 0.85625,
1978
+ "grad_norm": 4.201725959777832,
1979
+ "learning_rate": 3.077333333333334e-06,
1980
+ "loss": 0.1839,
1981
+ "step": 6850
1982
+ },
1983
+ {
1984
+ "epoch": 0.859375,
1985
+ "grad_norm": 3.662229061126709,
1986
+ "learning_rate": 3.010666666666667e-06,
1987
+ "loss": 0.1695,
1988
+ "step": 6875
1989
+ },
1990
+ {
1991
+ "epoch": 0.8625,
1992
+ "grad_norm": 3.8069918155670166,
1993
+ "learning_rate": 2.944e-06,
1994
+ "loss": 0.1615,
1995
+ "step": 6900
1996
+ },
1997
+ {
1998
+ "epoch": 0.865625,
1999
+ "grad_norm": 3.208935499191284,
2000
+ "learning_rate": 2.877333333333334e-06,
2001
+ "loss": 0.1496,
2002
+ "step": 6925
2003
+ },
2004
+ {
2005
+ "epoch": 0.86875,
2006
+ "grad_norm": 3.4923043251037598,
2007
+ "learning_rate": 2.810666666666667e-06,
2008
+ "loss": 0.147,
2009
+ "step": 6950
2010
+ },
2011
+ {
2012
+ "epoch": 0.871875,
2013
+ "grad_norm": 4.01771354675293,
2014
+ "learning_rate": 2.744e-06,
2015
+ "loss": 0.1751,
2016
+ "step": 6975
2017
+ },
2018
+ {
2019
+ "epoch": 0.875,
2020
+ "grad_norm": 4.1294355392456055,
2021
+ "learning_rate": 2.6773333333333336e-06,
2022
+ "loss": 0.1733,
2023
+ "step": 7000
2024
+ },
2025
+ {
2026
+ "epoch": 0.875,
2027
+ "eval_loss": 0.3004015386104584,
2028
+ "eval_runtime": 74.1497,
2029
+ "eval_samples_per_second": 28.375,
2030
+ "eval_steps_per_second": 1.78,
2031
+ "eval_wer": 13.933813218659438,
2032
+ "step": 7000
2033
+ },
2034
+ {
2035
+ "epoch": 0.878125,
2036
+ "grad_norm": 12.625326156616211,
2037
+ "learning_rate": 2.616e-06,
2038
+ "loss": 0.3186,
2039
+ "step": 7025
2040
+ },
2041
+ {
2042
+ "epoch": 0.88125,
2043
+ "grad_norm": 7.057121753692627,
2044
+ "learning_rate": 2.5493333333333337e-06,
2045
+ "loss": 0.5086,
2046
+ "step": 7050
2047
+ },
2048
+ {
2049
+ "epoch": 0.884375,
2050
+ "grad_norm": 5.440456390380859,
2051
+ "learning_rate": 2.482666666666667e-06,
2052
+ "loss": 0.481,
2053
+ "step": 7075
2054
+ },
2055
+ {
2056
+ "epoch": 0.8875,
2057
+ "grad_norm": 6.303742408752441,
2058
+ "learning_rate": 2.4160000000000002e-06,
2059
+ "loss": 0.4034,
2060
+ "step": 7100
2061
+ },
2062
+ {
2063
+ "epoch": 0.890625,
2064
+ "grad_norm": 3.7720141410827637,
2065
+ "learning_rate": 2.3493333333333335e-06,
2066
+ "loss": 0.2274,
2067
+ "step": 7125
2068
+ },
2069
+ {
2070
+ "epoch": 0.89375,
2071
+ "grad_norm": 4.611368656158447,
2072
+ "learning_rate": 2.2826666666666668e-06,
2073
+ "loss": 0.1717,
2074
+ "step": 7150
2075
+ },
2076
+ {
2077
+ "epoch": 0.896875,
2078
+ "grad_norm": 3.155137777328491,
2079
+ "learning_rate": 2.216e-06,
2080
+ "loss": 0.1594,
2081
+ "step": 7175
2082
+ },
2083
+ {
2084
+ "epoch": 0.9,
2085
+ "grad_norm": 3.6036856174468994,
2086
+ "learning_rate": 2.1493333333333337e-06,
2087
+ "loss": 0.1264,
2088
+ "step": 7200
2089
+ },
2090
+ {
2091
+ "epoch": 0.903125,
2092
+ "grad_norm": 3.040969133377075,
2093
+ "learning_rate": 2.0826666666666666e-06,
2094
+ "loss": 0.1198,
2095
+ "step": 7225
2096
+ },
2097
+ {
2098
+ "epoch": 0.90625,
2099
+ "grad_norm": 2.538546562194824,
2100
+ "learning_rate": 2.0160000000000003e-06,
2101
+ "loss": 0.1233,
2102
+ "step": 7250
2103
+ },
2104
+ {
2105
+ "epoch": 0.909375,
2106
+ "grad_norm": 2.2235965728759766,
2107
+ "learning_rate": 1.9493333333333335e-06,
2108
+ "loss": 0.0948,
2109
+ "step": 7275
2110
+ },
2111
+ {
2112
+ "epoch": 0.9125,
2113
+ "grad_norm": 2.112567663192749,
2114
+ "learning_rate": 1.8826666666666668e-06,
2115
+ "loss": 0.0796,
2116
+ "step": 7300
2117
+ },
2118
+ {
2119
+ "epoch": 0.915625,
2120
+ "grad_norm": 2.5596227645874023,
2121
+ "learning_rate": 1.8160000000000003e-06,
2122
+ "loss": 0.0871,
2123
+ "step": 7325
2124
+ },
2125
+ {
2126
+ "epoch": 0.91875,
2127
+ "grad_norm": 3.282794713973999,
2128
+ "learning_rate": 1.7493333333333335e-06,
2129
+ "loss": 0.1154,
2130
+ "step": 7350
2131
+ },
2132
+ {
2133
+ "epoch": 0.921875,
2134
+ "grad_norm": 3.568565607070923,
2135
+ "learning_rate": 1.6826666666666668e-06,
2136
+ "loss": 0.1648,
2137
+ "step": 7375
2138
+ },
2139
+ {
2140
+ "epoch": 0.925,
2141
+ "grad_norm": 3.731203079223633,
2142
+ "learning_rate": 1.616e-06,
2143
+ "loss": 0.132,
2144
+ "step": 7400
2145
+ },
2146
+ {
2147
+ "epoch": 0.928125,
2148
+ "grad_norm": 2.649831771850586,
2149
+ "learning_rate": 1.5493333333333335e-06,
2150
+ "loss": 0.1518,
2151
+ "step": 7425
2152
+ },
2153
+ {
2154
+ "epoch": 0.93125,
2155
+ "grad_norm": 2.2203938961029053,
2156
+ "learning_rate": 1.4826666666666666e-06,
2157
+ "loss": 0.1045,
2158
+ "step": 7450
2159
+ },
2160
+ {
2161
+ "epoch": 0.934375,
2162
+ "grad_norm": 3.9395177364349365,
2163
+ "learning_rate": 1.416e-06,
2164
+ "loss": 0.0957,
2165
+ "step": 7475
2166
+ },
2167
+ {
2168
+ "epoch": 0.9375,
2169
+ "grad_norm": 3.2605130672454834,
2170
+ "learning_rate": 1.3493333333333333e-06,
2171
+ "loss": 0.0901,
2172
+ "step": 7500
2173
+ },
2174
+ {
2175
+ "epoch": 0.940625,
2176
+ "grad_norm": 4.289961814880371,
2177
+ "learning_rate": 1.2826666666666668e-06,
2178
+ "loss": 0.1846,
2179
+ "step": 7525
2180
+ },
2181
+ {
2182
+ "epoch": 0.94375,
2183
+ "grad_norm": 4.553671836853027,
2184
+ "learning_rate": 1.216e-06,
2185
+ "loss": 0.2492,
2186
+ "step": 7550
2187
+ },
2188
+ {
2189
+ "epoch": 0.946875,
2190
+ "grad_norm": 5.279869079589844,
2191
+ "learning_rate": 1.1493333333333334e-06,
2192
+ "loss": 0.2844,
2193
+ "step": 7575
2194
+ },
2195
+ {
2196
+ "epoch": 0.95,
2197
+ "grad_norm": 2.1614372730255127,
2198
+ "learning_rate": 1.0826666666666668e-06,
2199
+ "loss": 0.1715,
2200
+ "step": 7600
2201
+ },
2202
+ {
2203
+ "epoch": 0.953125,
2204
+ "grad_norm": 2.8423452377319336,
2205
+ "learning_rate": 1.016e-06,
2206
+ "loss": 0.1162,
2207
+ "step": 7625
2208
+ },
2209
+ {
2210
+ "epoch": 0.95625,
2211
+ "grad_norm": 2.077169895172119,
2212
+ "learning_rate": 9.493333333333334e-07,
2213
+ "loss": 0.094,
2214
+ "step": 7650
2215
+ },
2216
+ {
2217
+ "epoch": 0.959375,
2218
+ "grad_norm": 3.680450201034546,
2219
+ "learning_rate": 8.826666666666666e-07,
2220
+ "loss": 0.1291,
2221
+ "step": 7675
2222
+ },
2223
+ {
2224
+ "epoch": 0.9625,
2225
+ "grad_norm": 3.4633026123046875,
2226
+ "learning_rate": 8.160000000000001e-07,
2227
+ "loss": 0.1549,
2228
+ "step": 7700
2229
+ },
2230
+ {
2231
+ "epoch": 0.965625,
2232
+ "grad_norm": 3.8427698612213135,
2233
+ "learning_rate": 7.493333333333335e-07,
2234
+ "loss": 0.1646,
2235
+ "step": 7725
2236
+ },
2237
+ {
2238
+ "epoch": 0.96875,
2239
+ "grad_norm": 2.8945538997650146,
2240
+ "learning_rate": 6.826666666666667e-07,
2241
+ "loss": 0.1524,
2242
+ "step": 7750
2243
+ },
2244
+ {
2245
+ "epoch": 0.971875,
2246
+ "grad_norm": 3.424391269683838,
2247
+ "learning_rate": 6.160000000000001e-07,
2248
+ "loss": 0.1083,
2249
+ "step": 7775
2250
+ },
2251
+ {
2252
+ "epoch": 0.975,
2253
+ "grad_norm": 5.2906951904296875,
2254
+ "learning_rate": 5.493333333333334e-07,
2255
+ "loss": 0.1327,
2256
+ "step": 7800
2257
+ },
2258
+ {
2259
+ "epoch": 0.978125,
2260
+ "grad_norm": 6.5452046394348145,
2261
+ "learning_rate": 4.853333333333333e-07,
2262
+ "loss": 0.3624,
2263
+ "step": 7825
2264
+ },
2265
+ {
2266
+ "epoch": 0.98125,
2267
+ "grad_norm": 2.549628496170044,
2268
+ "learning_rate": 4.186666666666667e-07,
2269
+ "loss": 0.1518,
2270
+ "step": 7850
2271
+ },
2272
+ {
2273
+ "epoch": 0.984375,
2274
+ "grad_norm": 2.076683759689331,
2275
+ "learning_rate": 3.5200000000000003e-07,
2276
+ "loss": 0.1048,
2277
+ "step": 7875
2278
+ },
2279
+ {
2280
+ "epoch": 0.9875,
2281
+ "grad_norm": 2.8122971057891846,
2282
+ "learning_rate": 2.8533333333333335e-07,
2283
+ "loss": 0.0981,
2284
+ "step": 7900
2285
+ },
2286
+ {
2287
+ "epoch": 0.990625,
2288
+ "grad_norm": 2.3224234580993652,
2289
+ "learning_rate": 2.186666666666667e-07,
2290
+ "loss": 0.1017,
2291
+ "step": 7925
2292
+ },
2293
+ {
2294
+ "epoch": 0.99375,
2295
+ "grad_norm": 2.10176682472229,
2296
+ "learning_rate": 1.52e-07,
2297
+ "loss": 0.0937,
2298
+ "step": 7950
2299
+ },
2300
+ {
2301
+ "epoch": 0.996875,
2302
+ "grad_norm": 3.3718252182006836,
2303
+ "learning_rate": 8.533333333333334e-08,
2304
+ "loss": 0.1,
2305
+ "step": 7975
2306
+ },
2307
+ {
2308
+ "epoch": 1.0,
2309
+ "grad_norm": 4.459543704986572,
2310
+ "learning_rate": 1.866666666666667e-08,
2311
+ "loss": 0.1228,
2312
+ "step": 8000
2313
+ },
2314
+ {
2315
+ "epoch": 1.0,
2316
+ "eval_loss": 0.24521400034427643,
2317
+ "eval_runtime": 75.7252,
2318
+ "eval_samples_per_second": 27.785,
2319
+ "eval_steps_per_second": 1.743,
2320
+ "eval_wer": 13.816958025614658,
2321
+ "step": 8000
2322
+ },
2323
+ {
2324
+ "epoch": 1.0,
2325
+ "step": 8000,
2326
+ "total_flos": 1.660415901696e+19,
2327
+ "train_loss": 0.22206098145246506,
2328
+ "train_runtime": 4270.5513,
2329
+ "train_samples_per_second": 59.945,
2330
+ "train_steps_per_second": 1.873
2331
+ }
2332
+ ],
2333
+ "logging_steps": 25,
2334
+ "max_steps": 8000,
2335
+ "num_input_tokens_seen": 0,
2336
+ "num_train_epochs": 9223372036854775807,
2337
+ "save_steps": 1000,
2338
+ "stateful_callbacks": {
2339
+ "TrainerControl": {
2340
+ "args": {
2341
+ "should_epoch_stop": false,
2342
+ "should_evaluate": false,
2343
+ "should_log": false,
2344
+ "should_save": true,
2345
+ "should_training_stop": true
2346
+ },
2347
+ "attributes": {}
2348
+ }
2349
+ },
2350
+ "total_flos": 1.660415901696e+19,
2351
+ "train_batch_size": 32,
2352
+ "trial_name": null,
2353
+ "trial_params": null
2354
+ }
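The throughput figures in the final summary entry can be cross-checked against `max_steps`, `train_batch_size`, and `train_runtime` from the same file. The sketch below is only a sanity check: it assumes a single device and no gradient accumulation (neither is stated in this file), and `throughput` is a hypothetical helper name, not part of any library.

```python
def throughput(steps: int, batch_size: int, runtime_s: float) -> tuple[float, float]:
    """Return (samples per second, steps per second) for a finished run."""
    samples = steps * batch_size  # assumes one device, no gradient accumulation
    return samples / runtime_s, steps / runtime_s


# Values copied from the trainer state above.
samples_per_s, steps_per_s = throughput(steps=8000, batch_size=32, runtime_s=4270.5513)
print(round(samples_per_s, 3), round(steps_per_s, 3))
```

Under those assumptions the recomputed rates agree with the logged `train_samples_per_second` and `train_steps_per_second` to three decimals.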
wandb/run-20250214_113805-769lwzm2/files/output.log CHANGED
@@ -1553,3 +1553,161 @@ Training completed. Do not forget to share your model on huggingface.co/models =
1553
  [INFO|feature_extraction_utils.py:437] 2025-02-14 12:49:32,767 >> Feature extractor saved in ./preprocessor_config.json
1554
  [INFO|modelcard.py:449] 2025-02-14 12:49:32,953 >> Dropping the following result as it does not have all the necessary fields:
1555
  {'task': {'name': 'Automatic Speech Recognition', 'type': 'automatic-speech-recognition'}, 'metrics': [{'name': 'Wer', 'type': 'wer', 'value': 13.816958025614658}]}
1556
+ ***** train metrics *****
1557
+ epoch = 1.0
1558
+ total_flos = 15463828125GF
1559
+ train_loss = 0.2221
1560
+ train_runtime = 1:11:10.55
1561
+ train_samples_per_second = 59.945
1562
+ train_steps_per_second = 1.873
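The human-readable train metrics above can be related back to the raw values in the trainer state (`total_flos` of 1.660415901696e+19 and `train_runtime` of 4270.5513 seconds). The sketch below reproduces the two formatted strings; the observation that "GF" here corresponds to dividing by 2**30 is inferred from the numbers in this log, and both helper names are hypothetical.

```python
import datetime


def flos_to_gf(total_flos: float) -> str:
    """Format a raw FLOP count in units of 2**30 (inferred from the logged values)."""
    return f"{int(total_flos) >> 30}GF"


def format_runtime(seconds: float) -> str:
    """Format seconds as H:MM:SS.hh, truncating to hundredths of a second."""
    hundredths = int(abs(seconds - int(seconds)) * 100)
    return f"{datetime.timedelta(seconds=int(seconds))}.{hundredths:02d}"


print(flos_to_gf(1.660415901696e+19))
print(format_runtime(4270.5513))
```

Both outputs match the `total_flos` and `train_runtime` lines in the log, which suggests the log is a reformatted view of the same numbers rather than a separate measurement.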
1563
+ 02/14/2025 12:49:36 - INFO - __main__ - *** Evaluate ***
1564
+ [INFO|trainer.py:4176] 2025-02-14 12:49:36,135 >>
1565
+ ***** Running Evaluation *****
1566
+ [INFO|trainer.py:4180] 2025-02-14 12:49:36,135 >> Num examples: Unknown
1567
+ [INFO|trainer.py:4181] 2025-02-14 12:49:36,360 >> Batch size = 16
1568
+ [INFO|trainer_utils.py:837] 2025-02-14 12:49:43,950 >> The following columns in the evaluation set don't have a corresponding argument in `WhisperForConditionalGeneration.forward` and have been ignored: input_length. If input_length are not expected by `WhisperForConditionalGeneration.forward`, you can safely ignore this message.
1569
+ [INFO|generation_whisper.py:1844] 2025-02-14 12:49:44,088 >> Increase max_length from 225 to 228 since input is conditioned on previous segment.
1570
+ [INFO|generation_whisper.py:1844] 2025-02-14 12:49:44,654 >> Increase max_length from 225 to 228 since input is conditioned on previous segment.
1571
+ [INFO|generation_whisper.py:1844] 2025-02-14 12:49:45,564 >> Increase max_length from 225 to 228 since input is conditioned on previous segment.
1572
+ [INFO|generation_whisper.py:1844] 2025-02-14 12:49:46,260 >> Increase max_length from 225 to 228 since input is conditioned on previous segment.
1573
+ [INFO|generation_whisper.py:1844] 2025-02-14 12:49:46,754 >> Increase max_length from 225 to 228 since input is conditioned on previous segment.
1574
+ [INFO|generation_whisper.py:1844] 2025-02-14 12:49:47,308 >> Increase max_length from 225 to 228 since input is conditioned on previous segment.
1575
+ [INFO|generation_whisper.py:1844] 2025-02-14 12:49:47,931 >> Increase max_length from 225 to 228 since input is conditioned on previous segment.
1576
+ [INFO|generation_whisper.py:1844] 2025-02-14 12:49:48,501 >> Increase max_length from 225 to 228 since input is conditioned on previous segment.
1577
+ [INFO|generation_whisper.py:1844] 2025-02-14 12:49:49,141 >> Increase max_length from 225 to 228 since input is conditioned on previous segment.
1578
+ [INFO|generation_whisper.py:1844] 2025-02-14 12:49:49,695 >> Increase max_length from 225 to 228 since input is conditioned on previous segment.
1579
+ [INFO|generation_whisper.py:1844] 2025-02-14 12:49:50,267 >> Increase max_length from 225 to 228 since input is conditioned on previous segment.
1580
+ [INFO|generation_whisper.py:1844] 2025-02-14 12:49:50,914 >> Increase max_length from 225 to 228 since input is conditioned on previous segment.
1581
+ [INFO|generation_whisper.py:1844] 2025-02-14 12:49:51,407 >> Increase max_length from 225 to 228 since input is conditioned on previous segment.
1582
+ [INFO|generation_whisper.py:1844] 2025-02-14 12:49:51,908 >> Increase max_length from 225 to 228 since input is conditioned on previous segment.
1583
+ [INFO|generation_whisper.py:1844] 2025-02-14 12:49:52,450 >> Increase max_length from 225 to 228 since input is conditioned on previous segment.
1584
+ [INFO|generation_whisper.py:1844] 2025-02-14 12:49:52,868 >> Increase max_length from 225 to 228 since input is conditioned on previous segment.
1585
+ [INFO|generation_whisper.py:1844] 2025-02-14 12:49:53,315 >> Increase max_length from 225 to 228 since input is conditioned on previous segment.
1586
+ [INFO|generation_whisper.py:1844] 2025-02-14 12:49:53,848 >> Increase max_length from 225 to 228 since input is conditioned on previous segment.
1587
+ [INFO|generation_whisper.py:1844] 2025-02-14 12:49:54,262 >> Increase max_length from 225 to 228 since input is conditioned on previous segment.
1588
+ [INFO|generation_whisper.py:1844] 2025-02-14 12:49:54,738 >> Increase max_length from 225 to 228 since input is conditioned on previous segment.
1589
+ [INFO|generation_whisper.py:1844] 2025-02-14 12:49:55,208 >> Increase max_length from 225 to 228 since input is conditioned on previous segment.
1590
+ [INFO|generation_whisper.py:1844] 2025-02-14 12:49:55,706 >> Increase max_length from 225 to 228 since input is conditioned on previous segment.
1591
+ [INFO|generation_whisper.py:1844] 2025-02-14 12:49:56,132 >> Increase max_length from 225 to 228 since input is conditioned on previous segment.
1592
+ [INFO|generation_whisper.py:1844] 2025-02-14 12:49:56,581 >> Increase max_length from 225 to 228 since input is conditioned on previous segment.
1593
+ [INFO|generation_whisper.py:1844] 2025-02-14 12:49:56,973 >> Increase max_length from 225 to 228 since input is conditioned on previous segment.
1594
+ [INFO|generation_whisper.py:1844] 2025-02-14 12:49:57,385 >> Increase max_length from 225 to 228 since input is conditioned on previous segment.
1595
+ [INFO|generation_whisper.py:1844] 2025-02-14 12:49:57,843 >> Increase max_length from 225 to 228 since input is conditioned on previous segment.
1596
+ [INFO|generation_whisper.py:1844] 2025-02-14 12:49:58,338 >> Increase max_length from 225 to 228 since input is conditioned on previous segment.
1597
+ [INFO|generation_whisper.py:1844] 2025-02-14 12:49:58,713 >> Increase max_length from 225 to 228 since input is conditioned on previous segment.
1598
+ [INFO|generation_whisper.py:1844] 2025-02-14 12:49:59,132 >> Increase max_length from 225 to 228 since input is conditioned on previous segment.
1599
+ [INFO|generation_whisper.py:1844] 2025-02-14 12:49:59,574 >> Increase max_length from 225 to 228 since input is conditioned on previous segment.
1600
+ [INFO|generation_whisper.py:1844] 2025-02-14 12:50:00,003 >> Increase max_length from 225 to 228 since input is conditioned on previous segment.
1601
+ [INFO|generation_whisper.py:1844] 2025-02-14 12:50:00,467 >> Increase max_length from 225 to 228 since input is conditioned on previous segment.
1602
+ [INFO|generation_whisper.py:1844] 2025-02-14 12:50:00,862 >> Increase max_length from 225 to 228 since input is conditioned on previous segment.
1603
+ [INFO|generation_whisper.py:1844] 2025-02-14 12:50:01,326 >> Increase max_length from 225 to 228 since input is conditioned on previous segment.
1604
+ [INFO|generation_whisper.py:1844] 2025-02-14 12:50:01,773 >> Increase max_length from 225 to 228 since input is conditioned on previous segment.
1605
+ [INFO|generation_whisper.py:1844] 2025-02-14 12:50:02,167 >> Increase max_length from 225 to 228 since input is conditioned on previous segment.
1606
+ [INFO|generation_whisper.py:1844] 2025-02-14 12:50:02,612 >> Increase max_length from 225 to 228 since input is conditioned on previous segment.
1607
+ [INFO|generation_whisper.py:1844] 2025-02-14 12:50:03,040 >> Increase max_length from 225 to 228 since input is conditioned on previous segment.
1608
+ [INFO|generation_whisper.py:1844] 2025-02-14 12:50:03,649 >> Increase max_length from 225 to 228 since input is conditioned on previous segment.
1609
+ [INFO|generation_whisper.py:1844] 2025-02-14 12:50:04,062 >> Increase max_length from 225 to 228 since input is conditioned on previous segment.
1610
+ [INFO|generation_whisper.py:1844] 2025-02-14 12:50:04,494 >> Increase max_length from 225 to 228 since input is conditioned on previous segment.
1611
+ [INFO|generation_whisper.py:1844] 2025-02-14 12:50:04,870 >> Increase max_length from 225 to 228 since input is conditioned on previous segment.
1612
+ [INFO|generation_whisper.py:1844] 2025-02-14 12:50:05,277 >> Increase max_length from 225 to 228 since input is conditioned on previous segment.
1613
+ [INFO|generation_whisper.py:1844] 2025-02-14 12:50:05,674 >> Increase max_length from 225 to 228 since input is conditioned on previous segment.
1614
+ [INFO|generation_whisper.py:1844] 2025-02-14 12:50:06,139 >> Increase max_length from 225 to 228 since input is conditioned on previous segment.
1615
+ [INFO|generation_whisper.py:1844] 2025-02-14 12:50:06,557 >> Increase max_length from 225 to 228 since input is conditioned on previous segment.
1616
+ [INFO|generation_whisper.py:1844] 2025-02-14 12:50:06,963 >> Increase max_length from 225 to 228 since input is conditioned on previous segment.
1617
+ [INFO|generation_whisper.py:1844] 2025-02-14 12:50:07,474 >> Increase max_length from 225 to 228 since input is conditioned on previous segment.
1618
+ [INFO|generation_whisper.py:1844] 2025-02-14 12:50:07,892 >> Increase max_length from 225 to 228 since input is conditioned on previous segment.
1619
+ [INFO|generation_whisper.py:1844] 2025-02-14 12:50:08,329 >> Increase max_length from 225 to 228 since input is conditioned on previous segment.
1620
+ [INFO|generation_whisper.py:1844] 2025-02-14 12:50:08,757 >> Increase max_length from 225 to 228 since input is conditioned on previous segment.
1621
+ [INFO|generation_whisper.py:1844] 2025-02-14 12:50:09,132 >> Increase max_length from 225 to 228 since input is conditioned on previous segment.
1622
+ [INFO|generation_whisper.py:1844] 2025-02-14 12:50:09,554 >> Increase max_length from 225 to 228 since input is conditioned on previous segment.
1623
+ [INFO|generation_whisper.py:1844] 2025-02-14 12:50:09,970 >> Increase max_length from 225 to 228 since input is conditioned on previous segment.
1624
+ [INFO|generation_whisper.py:1844] 2025-02-14 12:50:10,439 >> Increase max_length from 225 to 228 since input is conditioned on previous segment.
1625
+ [INFO|generation_whisper.py:1844] 2025-02-14 12:50:10,807 >> Increase max_length from 225 to 228 since input is conditioned on previous segment.
1626
+ [INFO|generation_whisper.py:1844] 2025-02-14 12:50:11,197 >> Increase max_length from 225 to 228 since input is conditioned on previous segment.
1627
+ [INFO|generation_whisper.py:1844] 2025-02-14 12:50:11,646 >> Increase max_length from 225 to 228 since input is conditioned on previous segment.
1628
+ [INFO|generation_whisper.py:1844] 2025-02-14 12:50:12,028 >> Increase max_length from 225 to 228 since input is conditioned on previous segment.
1629
+ [INFO|generation_whisper.py:1844] 2025-02-14 12:50:12,475 >> Increase max_length from 225 to 228 since input is conditioned on previous segment.
1630
+ [INFO|generation_whisper.py:1844] 2025-02-14 12:50:12,906 >> Increase max_length from 225 to 228 since input is conditioned on previous segment.
1631
+ [INFO|generation_whisper.py:1844] 2025-02-14 12:50:13,352 >> Increase max_length from 225 to 228 since input is conditioned on previous segment.
1632
+ [INFO|generation_whisper.py:1844] 2025-02-14 12:50:13,772 >> Increase max_length from 225 to 228 since input is conditioned on previous segment.
1633
+ [INFO|generation_whisper.py:1844] 2025-02-14 12:50:14,141 >> Increase max_length from 225 to 228 since input is conditioned on previous segment.
1634
+ [INFO|generation_whisper.py:1844] 2025-02-14 12:50:14,494 >> Increase max_length from 225 to 228 since input is conditioned on previous segment.
1635
+ [INFO|generation_whisper.py:1844] 2025-02-14 12:50:14,909 >> Increase max_length from 225 to 228 since input is conditioned on previous segment.
1636
+ [INFO|generation_whisper.py:1844] 2025-02-14 12:50:15,326 >> Increase max_length from 225 to 228 since input is conditioned on previous segment.
1637
+ [INFO|generation_whisper.py:1844] 2025-02-14 12:50:15,743 >> Increase max_length from 225 to 228 since input is conditioned on previous segment.
1638
+ [INFO|generation_whisper.py:1844] 2025-02-14 12:50:16,168 >> Increase max_length from 225 to 228 since input is conditioned on previous segment.
1639
+ [INFO|generation_whisper.py:1844] 2025-02-14 12:50:16,565 >> Increase max_length from 225 to 228 since input is conditioned on previous segment.
1640
+ [INFO|generation_whisper.py:1844] 2025-02-14 12:50:17,038 >> Increase max_length from 225 to 228 since input is conditioned on previous segment.
1641
+ [INFO|generation_whisper.py:1844] 2025-02-14 12:50:17,466 >> Increase max_length from 225 to 228 since input is conditioned on previous segment.
1642
+ [INFO|generation_whisper.py:1844] 2025-02-14 12:50:17,944 >> Increase max_length from 225 to 228 since input is conditioned on previous segment.
1643
+ [INFO|generation_whisper.py:1844] 2025-02-14 12:50:18,374 >> Increase max_length from 225 to 228 since input is conditioned on previous segment.
1644
+ [INFO|generation_whisper.py:1844] 2025-02-14 12:50:18,821 >> Increase max_length from 225 to 228 since input is conditioned on previous segment.
1645
+ [INFO|generation_whisper.py:1844] 2025-02-14 12:50:19,197 >> Increase max_length from 225 to 228 since input is conditioned on previous segment.
1646
+ [INFO|generation_whisper.py:1844] 2025-02-14 12:50:19,609 >> Increase max_length from 225 to 228 since input is conditioned on previous segment.
1647
+ [INFO|generation_whisper.py:1844] 2025-02-14 12:50:20,038 >> Increase max_length from 225 to 228 since input is conditioned on previous segment.
1648
+ [INFO|generation_whisper.py:1844] 2025-02-14 12:50:20,447 >> Increase max_length from 225 to 228 since input is conditioned on previous segment.
1649
+ [INFO|generation_whisper.py:1844] 2025-02-14 12:50:20,846 >> Increase max_length from 225 to 228 since input is conditioned on previous segment.
1650
+ [INFO|generation_whisper.py:1844] 2025-02-14 12:50:21,237 >> Increase max_length from 225 to 228 since input is conditioned on previous segment.
1651
+ [INFO|generation_whisper.py:1844] 2025-02-14 12:50:21,657 >> Increase max_length from 225 to 228 since input is conditioned on previous segment.
1652
+ [INFO|generation_whisper.py:1844] 2025-02-14 12:50:22,106 >> Increase max_length from 225 to 228 since input is conditioned on previous segment.
1653
+ [INFO|generation_whisper.py:1844] 2025-02-14 12:50:22,560 >> Increase max_length from 225 to 228 since input is conditioned on previous segment.
1654
+ [INFO|generation_whisper.py:1844] 2025-02-14 12:50:22,975 >> Increase max_length from 225 to 228 since input is conditioned on previous segment.
1655
+ [INFO|generation_whisper.py:1844] 2025-02-14 12:50:23,406 >> Increase max_length from 225 to 228 since input is conditioned on previous segment.
1656
+ [INFO|generation_whisper.py:1844] 2025-02-14 12:50:23,899 >> Increase max_length from 225 to 228 since input is conditioned on previous segment.
1657
+ [INFO|generation_whisper.py:1844] 2025-02-14 12:50:24,343 >> Increase max_length from 225 to 228 since input is conditioned on previous segment.
1658
+ [INFO|generation_whisper.py:1844] 2025-02-14 12:50:24,850 >> Increase max_length from 225 to 228 since input is conditioned on previous segment.
1659
+ [INFO|generation_whisper.py:1844] 2025-02-14 12:50:25,273 >> Increase max_length from 225 to 228 since input is conditioned on previous segment.
1660
+ [INFO|generation_whisper.py:1844] 2025-02-14 12:50:25,693 >> Increase max_length from 225 to 228 since input is conditioned on previous segment.
1661
+ [INFO|generation_whisper.py:1844] 2025-02-14 12:50:26,113 >> Increase max_length from 225 to 228 since input is conditioned on previous segment.
1662
+ [INFO|generation_whisper.py:1844] 2025-02-14 12:50:26,516 >> Increase max_length from 225 to 228 since input is conditioned on previous segment.
1663
+ [INFO|generation_whisper.py:1844] 2025-02-14 12:50:26,998 >> Increase max_length from 225 to 228 since input is conditioned on previous segment.
1664
+ [INFO|generation_whisper.py:1844] 2025-02-14 12:50:27,442 >> Increase max_length from 225 to 228 since input is conditioned on previous segment.
1665
+ [INFO|generation_whisper.py:1844] 2025-02-14 12:50:27,869 >> Increase max_length from 225 to 228 since input is conditioned on previous segment.
1666
+ [INFO|generation_whisper.py:1844] 2025-02-14 12:50:28,297 >> Increase max_length from 225 to 228 since input is conditioned on previous segment.
1667
+ [INFO|generation_whisper.py:1844] 2025-02-14 12:50:28,718 >> Increase max_length from 225 to 228 since input is conditioned on previous segment.
1668
+ [INFO|generation_whisper.py:1844] 2025-02-14 12:50:29,158 >> Increase max_length from 225 to 228 since input is conditioned on previous segment.
1669
+ [INFO|generation_whisper.py:1844] 2025-02-14 12:50:29,597 >> Increase max_length from 225 to 228 since input is conditioned on previous segment.
1670
+ [INFO|generation_whisper.py:1844] 2025-02-14 12:50:30,052 >> Increase max_length from 225 to 228 since input is conditioned on previous segment.
1671
+ [INFO|generation_whisper.py:1844] 2025-02-14 12:50:30,469 >> Increase max_length from 225 to 228 since input is conditioned on previous segment.
1672
+ [INFO|generation_whisper.py:1844] 2025-02-14 12:50:30,922 >> Increase max_length from 225 to 228 since input is conditioned on previous segment.
1673
+ [INFO|generation_whisper.py:1844] 2025-02-14 12:50:31,405 >> Increase max_length from 225 to 228 since input is conditioned on previous segment.
1674
+ [INFO|generation_whisper.py:1844] 2025-02-14 12:50:31,834 >> Increase max_length from 225 to 228 since input is conditioned on previous segment.
1675
+ [INFO|generation_whisper.py:1844] 2025-02-14 12:50:32,271 >> Increase max_length from 225 to 228 since input is conditioned on previous segment.
1676
+ [INFO|generation_whisper.py:1844] 2025-02-14 12:50:32,744 >> Increase max_length from 225 to 228 since input is conditioned on previous segment.
1677
+ [INFO|generation_whisper.py:1844] 2025-02-14 12:50:33,154 >> Increase max_length from 225 to 228 since input is conditioned on previous segment.
1678
+ [INFO|generation_whisper.py:1844] 2025-02-14 12:50:33,548 >> Increase max_length from 225 to 228 since input is conditioned on previous segment.
1679
+ [INFO|generation_whisper.py:1844] 2025-02-14 12:50:33,986 >> Increase max_length from 225 to 228 since input is conditioned on previous segment.
1680
+ [INFO|generation_whisper.py:1844] 2025-02-14 12:50:34,429 >> Increase max_length from 225 to 228 since input is conditioned on previous segment.
1681
+ [INFO|generation_whisper.py:1844] 2025-02-14 12:50:34,820 >> Increase max_length from 225 to 228 since input is conditioned on previous segment.
1682
+ [INFO|generation_whisper.py:1844] 2025-02-14 12:50:35,204 >> Increase max_length from 225 to 228 since input is conditioned on previous segment.
1683
+ [INFO|generation_whisper.py:1844] 2025-02-14 12:50:35,605 >> Increase max_length from 225 to 228 since input is conditioned on previous segment.
1684
+ [INFO|generation_whisper.py:1844] 2025-02-14 12:50:36,003 >> Increase max_length from 225 to 228 since input is conditioned on previous segment.
1685
+ [INFO|generation_whisper.py:1844] 2025-02-14 12:50:36,456 >> Increase max_length from 225 to 228 since input is conditioned on previous segment.
1686
+ [INFO|generation_whisper.py:1844] 2025-02-14 12:50:36,919 >> Increase max_length from 225 to 228 since input is conditioned on previous segment.
1687
+ [INFO|generation_whisper.py:1844] 2025-02-14 12:50:37,326 >> Increase max_length from 225 to 228 since input is conditioned on previous segment.
1688
+ [INFO|generation_whisper.py:1844] 2025-02-14 12:50:37,739 >> Increase max_length from 225 to 228 since input is conditioned on previous segment.
1689
+ [INFO|generation_whisper.py:1844] 2025-02-14 12:50:38,165 >> Increase max_length from 225 to 228 since input is conditioned on previous segment.
1690
+ [INFO|generation_whisper.py:1844] 2025-02-14 12:50:38,544 >> Increase max_length from 225 to 228 since input is conditioned on previous segment.
1691
+ [INFO|generation_whisper.py:1844] 2025-02-14 12:50:38,977 >> Increase max_length from 225 to 228 since input is conditioned on previous segment.
1692
+ [INFO|generation_whisper.py:1844] 2025-02-14 12:50:39,392 >> Increase max_length from 225 to 228 since input is conditioned on previous segment.
1693
+ [INFO|generation_whisper.py:1844] 2025-02-14 12:50:39,796 >> Increase max_length from 225 to 228 since input is conditioned on previous segment.
1694
+ [INFO|generation_whisper.py:1844] 2025-02-14 12:50:40,194 >> Increase max_length from 225 to 228 since input is conditioned on previous segment.
1695
+ [INFO|generation_whisper.py:1844] 2025-02-14 12:50:40,597 >> Increase max_length from 225 to 228 since input is conditioned on previous segment.
1696
+ [INFO|generation_whisper.py:1844] 2025-02-14 12:50:41,016 >> Increase max_length from 225 to 228 since input is conditioned on previous segment.
1697
+ [INFO|generation_whisper.py:1844] 2025-02-14 12:50:41,415 >> Increase max_length from 225 to 228 since input is conditioned on previous segment.
1698
+ [INFO|generation_whisper.py:1844] 2025-02-14 12:50:41,838 >> Increase max_length from 225 to 228 since input is conditioned on previous segment.
1699
+ [INFO|generation_whisper.py:1844] 2025-02-14 12:50:42,209 >> Increase max_length from 225 to 228 since input is conditioned on previous segment.
1700
+ [INFO|generation_whisper.py:1844] 2025-02-14 12:50:42,555 >> Increase max_length from 225 to 228 since input is conditioned on previous segment.
1701
+ ***** eval metrics *****
1702
+ epoch = 1.0
1703
+ eval_loss = 0.2452
1704
+ eval_runtime = 0:01:14.51
1705
+ eval_samples_per_second = 28.236
1706
+ eval_steps_per_second = 1.771
1707
+ eval_wer = 13.817
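The evaluation log reports `Num examples: Unknown`, but the size of the evaluation set can be estimated from the metrics above. This is a rough sketch only: it uses the rounded values printed in this log, takes the batch size of 16 from the `Batch size = 16` line, and the variable names are illustrative.

```python
import math

# Values copied from the eval metrics and the "Batch size = 16" log line.
eval_runtime_s = 74.51            # 0:01:14.51
eval_samples_per_second = 28.236
eval_steps_per_second = 1.771
eval_batch_size = 16

est_samples = round(eval_runtime_s * eval_samples_per_second)
est_steps = round(eval_runtime_s * eval_steps_per_second)

print(est_samples, est_steps, math.ceil(est_samples / eval_batch_size))
```

The estimated step count agrees with the number of batches needed for the estimated sample count, so the two rates are mutually consistent; the exact evaluation set size is still not stated anywhere in this log.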
1708
+ [INFO|trainer.py:3860] 2025-02-14 12:50:50,651 >> Saving model checkpoint to ./
1709
+ [INFO|configuration_utils.py:423] 2025-02-14 12:50:50,652 >> Configuration saved in ./config.json
1710
+ [INFO|configuration_utils.py:906] 2025-02-14 12:50:50,653 >> Configuration saved in ./generation_config.json
1711
+ [INFO|modeling_utils.py:3040] 2025-02-14 12:50:51,227 >> Model weights saved in ./model.safetensors
1712
+ [INFO|feature_extraction_utils.py:437] 2025-02-14 12:50:51,228 >> Feature extractor saved in ./preprocessor_config.json
1713
+ run-769lwzm2.wandb: 100%|██████████| 4.10M/4.10M [00:00<00:00, 5.43MB/s]
wandb/run-20250214_113805-769lwzm2/run-769lwzm2.wandb CHANGED
@@ -1,3 +1,3 @@
1
  version https://git-lfs.github.com/spec/v1
2
- oid sha256:5bd2b1631e813875cd713dcf98c810f3f515b442c8e5d11ba595cd916d228576
3
- size 4063232
 
1
  version https://git-lfs.github.com/spec/v1
2
+ oid sha256:3d98c19965714e6634732bf8bd8b654d2f947be412067708c8249f41aaa7c73d
3
+ size 4096000