As soon as `inf` or `nan` is detected in at least one element of the activations or weights, the program asserts and prints a report like the following (this one was caught with `google/mt5-small` under fp16 mixed precision):

```
Detected inf/nan during batch_number=0
Last 21 forward frames:
abs min  abs max  metadata
                  encoder.block.1.layer.1.DenseReluDense.dropout Dropout
0.00e+00 2.57e+02 input[0]
0.00e+00 2.85e+02 output
[...]
                  encoder.block.2.layer.0 T5LayerSelfAttention
6.78e-04 3.15e+03 input[0]
2.65e-04 3.42e+03 output[0]
             None output[1]
2.25e-01 1.00e+04 output[2]
                  encoder.block.2.layer.1.layer_norm T5LayerNorm
8.69e-02 4.18e-01 weight
2.65e-04 3.42e+03 input[0]
1.79e-06 4.65e+00 output
                  encoder.block.2.layer.1.DenseReluDense.wi_0 Linear
2.17e-07 4.50e+00 weight
1.79e-06 4.65e+00 input[0]
2.68e-06 3.70e+01 output
                  encoder.block.2.layer.1.DenseReluDense.wi_1 Linear
8.08e-07 2.66e+01 weight
1.79e-06 4.65e+00 input[0]
1.27e-04 2.37e+02 output
                  encoder.block.2.layer.1.DenseReluDense.dropout Dropout
0.00e+00 8.76e+03 input[0]
0.00e+00 9.74e+03 output
                  encoder.block.2.layer.1.DenseReluDense.wo Linear
1.01e-06 6.44e+00 weight
0.00e+00 9.74e+03 input[0]
3.18e-04 6.27e+04 output
                  encoder.block.2.layer.1.DenseReluDense T5DenseGatedGeluDense
1.79e-06 4.65e+00 input[0]
3.18e-04 6.27e+04 output
                  encoder.block.2.layer.1.dropout Dropout
3.18e-04 6.27e+04 input[0]
0.00e+00      inf output
```

The example output has been trimmed in the middle for brevity.
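The bookkeeping behind a report like the one above can be illustrated with a minimal, standard-library-only sketch. It is not the library's implementation:

- plain Python lists stand in for tensors;
- the helper names `abs_min_max`, `has_inf_or_nan` and `check_frame`, and the `maxlen=21` buffer, are hypothetical.

Each frame records the module name, followed by one `abs min  abs max  label` line per tensor. Only the most recent frames are kept. The first frame containing an `inf` or `nan` raises an error that carries the saved frames.

```python
import math
from collections import deque


def abs_min_max(values):
    """Return (abs min, abs max) of a flat list of floats, like the first two report columns."""
    abs_values = [abs(v) for v in values]
    return min(abs_values), max(abs_values)


def has_inf_or_nan(values):
    """True if at least one element is inf or nan."""
    return any(math.isinf(v) or math.isnan(v) for v in values)


def check_frame(module_name, tensors, frames):
    """Record one forward frame (hypothetical helper) and raise on the first inf/nan.

    module_name: label printed above the frame, e.g. "<module path> <class name>"
    tensors: dict mapping a label such as "input[0]" or "output" to a list of floats
    frames: deque(maxlen=N) that keeps only the N most recent frames
    """
    lines = [" " * 18 + module_name]
    found = False
    for label, values in tensors.items():
        lo, hi = abs_min_max(values)
        lines.append(f"{lo:8.2e} {hi:8.2e} {label}")
        found = found or has_inf_or_nan(values)
    frames.append(lines)
    if found:
        report = [f"Detected inf/nan, last {len(frames)} forward frames:"]
        report += [line for frame in frames for line in frame]
        raise AssertionError("\n".join(report))


# Illustrative values copied from the report's last two frames.
frames = deque(maxlen=21)
check_frame(
    "encoder.block.2.layer.1.DenseReluDense.wo Linear",
    {"input[0]": [0.0, 9.74e3], "output": [3.18e-4, 6.27e4]},
    frames,
)
try:
    check_frame(
        "encoder.block.2.layer.1.dropout Dropout",
        {"input[0]": [3.18e-4, 6.27e4], "output": [0.0, math.inf]},
        frames,
    )
except AssertionError as err:
    print(err)
```

Keeping frames in a bounded `deque` means memory stays constant however long the forward pass runs, while the frames leading up to the failure are still available when it fires.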
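The last frame of the report shows where the overflow happens. The `Dropout` input tops out at 6.27e+04, which still fits in fp16, yet its output contains `inf`. In training mode, dropout rescales the surviving elements by 1 / (1 - p).

Assuming p = 0.1 (a common default for T5-style models; the report itself does not state it), 6.27e+04 / 0.9 ≈ 6.97e+04. That exceeds 65504, the largest finite fp16 value, so the result becomes `inf`.

The sketch below checks this arithmetic with the standard library only. It rounds Python floats to IEEE 754 half precision through the `struct` module's `e` format. The helper name `round_to_fp16` and the value of `p` are assumptions for illustration.

```python
import math
import struct

# Largest finite IEEE 754 half-precision value: (2 - 2**-10) * 2**15
FP16_MAX = (2 - 2**-10) * 2**15


def round_to_fp16(x):
    """Round a Python float to the nearest fp16 value (hypothetical helper); overflow becomes +/-inf."""
    try:
        return struct.unpack("<e", struct.pack("<e", x))[0]
    except OverflowError:
        # struct refuses to pack finite values that are too large for half precision
        return math.copysign(math.inf, x)


p = 0.1  # assumed dropout probability; check the model config for the real value
dropout_input_max = 6.27e4  # abs max of the Dropout input[0] in the report
scaled = dropout_input_max / (1 - p)  # inverted dropout rescales kept elements by 1 / (1 - p)

print(FP16_MAX)                          # 65504.0
print(round_to_fp16(dropout_input_max))  # still finite in fp16
print(round_to_fp16(scaled))             # inf
```

Using `struct` keeps the check free of third-party dependencies.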