|
In the next frame we have `Dropout`, which renormalizes the weights after zeroing some of the elements. This pushes the absolute max value above 64K, and we get an overflow (`inf`).
|
As you can see, it's the previous frames that we need to look into, since that's where the numbers started getting too large for fp16.
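For reference, the largest finite fp16 value is 65504, just under 64K. Here is a stdlib-only sketch of that limit, using `struct`'s half-precision format to emulate fp16 storage; the saturate-to-`inf` behavior in the `except` branch is our assumption, mirroring what happens on GPU hardware:

```python
import math
import struct

def as_fp16(x: float) -> float:
    """Round-trip x through IEEE-754 half precision (what fp16 stores)."""
    try:
        return struct.unpack("<e", struct.pack("<e", x))[0]
    except OverflowError:
        # CPython raises here for values beyond the fp16 range;
        # on GPU hardware such values overflow to infinity instead.
        return math.copysign(math.inf, x)

print(as_fp16(65504.0))  # largest finite fp16 number: prints 65504.0
print(as_fp16(70000.0))  # beyond the fp16 range: prints inf
```

Any intermediate activation that crosses this 64K boundary becomes `inf`, and the `inf` then propagates through every subsequent computation.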
|
Let's match the report to the code from models/t5/modeling_t5.py: |
|
```python
class T5DenseGatedGeluDense(nn.Module):
    def __init__(self, config):
        super().__init__()
        self.wi_0 = nn.Linear(config.d_model, config.d_ff, bias=False)
        self.wi_1 = nn.Linear(config.d_model, config.d_ff, bias=False)
        self.wo = nn.Linear(config.d_ff, config.d_model, bias=False)
        self.dropout = nn.Dropout(config.dropout_rate)
        self.gelu_act = ACT2FN["gelu_new"]

    def forward(self, hidden_states):
        hidden_gelu = self.gelu_act(self.wi_0(hidden_states))
        hidden_linear = self.wi_1(hidden_states)
        hidden_states = hidden_gelu * hidden_linear
        hidden_states = self.dropout(hidden_states)
        hidden_states = self.wo(hidden_states)
        return hidden_states
```
|
Now it's easy to see the `dropout` call, and all the previous calls as well.
|
Since the detection happens in a forward hook, these reports are printed immediately after each `forward` returns.
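A hypothetical minimal sketch of that mechanism (not the library's actual implementation): register a forward hook on every submodule, and have it inspect each output the moment the corresponding `forward` returns. For portability this example forces the overflow in fp32 rather than fp16, but the detection logic is the same:

```python
import torch
from torch import nn

overflow_reports = []

def detect_overflow(module, inputs, output):
    # Fires immediately after module.forward returns, so the first
    # offending module is reported before any later ones.
    if isinstance(output, torch.Tensor) and not torch.isfinite(output).all():
        overflow_reports.append(module.__class__.__name__)

model = nn.Sequential(nn.Linear(4, 4, bias=False), nn.ReLU())
for submodule in model.modules():
    submodule.register_forward_hook(detect_overflow)

# Force an overflow: 1e30 * 1e10 = 1e40, which exceeds fp32's max (~3.4e38).
with torch.no_grad():
    model[0].weight.fill_(1e30)
model(torch.full((1, 4), 1e10))

print(overflow_reports)  # the Linear is the first module reported
```

Because hooks fire in the order the forwards complete, reading the reports top to bottom walks you from the first module that produced a non-finite value through everything downstream of it.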
|
Going back to the full report: to act on it and fix the problem, we need to go a few frames up to where the numbers started to grow, and most likely switch to fp32 mode there, so that the numbers don't overflow when multiplied or summed up.
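To illustrate why the fp32 upcast helps, here is a stdlib-only sketch that contrasts the two modes, again emulating fp16 storage via `struct` (the saturate-to-`inf` fallback is our assumption, matching GPU behavior). Both operands fit comfortably in fp16, but their product does not:

```python
import math
import struct

def as_fp16(x: float) -> float:
    # Emulate fp16 storage; out-of-range values overflow to inf as on GPU.
    try:
        return struct.unpack("<e", struct.pack("<e", x))[0]
    except OverflowError:
        return math.copysign(math.inf, x)

a, b = 300.0, 400.0  # both representable in fp16

# Multiplying in fp16: the product 120000 exceeds ~64K and overflows.
fp16_product = as_fp16(as_fp16(a) * as_fp16(b))
print(fp16_product)  # inf

# Upcasting to fp32 (Python floats here) first: the product fits easily.
fp32_product = float(a) * float(b)
print(fp32_product)  # 120000.0
```

This is why the fix is applied a few frames up: once an intermediate value has overflowed to `inf`, no later operation can recover it, so the upcast must happen before the first overflowing multiplication or summation.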