The largest version of GPT-2, for example, has a fixed length of 1024 tokens, so we cannot calculate \(p_\theta(x_t|x_{<t})\) directly when \(t\) is greater than 1024.

Instead, the sequence is typically broken into subsequences equal to the model's maximum input size. If a model's max input size is \(k\), we then approximate the likelihood of a token \(x_t\) by conditioning only on the \(k-1\) tokens that precede it rather than the entire context.
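For concreteness, here is a minimal sketch of this chunked approximation, assuming GPT-2 loaded through the `transformers` library; the variable names and the placeholder text are illustrative, not part of the original discussion.

```python
import torch
from transformers import GPT2LMHeadModel, GPT2TokenizerFast

device = "cuda" if torch.cuda.is_available() else "cpu"
model = GPT2LMHeadModel.from_pretrained("gpt2").to(device)
tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")

text = "some long input text " * 500  # stand-in for a long document
encodings = tokenizer(text, return_tensors="pt")

max_length = model.config.n_positions  # 1024 for GPT-2
seq_len = encodings.input_ids.size(1)

nlls = []
for begin in range(0, seq_len, max_length):
    end = min(begin + max_length, seq_len)
    input_ids = encodings.input_ids[:, begin:end].to(device)
    target_ids = input_ids.clone()

    with torch.no_grad():
        # With labels supplied, the model returns the average negative
        # log-likelihood of each token conditioned only on the tokens
        # before it *within this chunk*; scale by chunk length to sum.
        outputs = model(input_ids, labels=target_ids)
        nlls.append(outputs.loss * (end - begin))

# Exponentiate the per-token average NLL over the whole sequence
ppl = torch.exp(torch.stack(nlls).sum() / seq_len)
print(ppl.item())
```

Because each chunk is scored independently, tokens near the start of a chunk are conditioned on little or no context, which is exactly the approximation described above.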