# Utilities for Generation
This page lists all the utility functions used by [`~generation.GenerationMixin.generate`],
[`~generation.GenerationMixin.greedy_search`],
[`~generation.GenerationMixin.contrastive_search`],
[`~generation.GenerationMixin.sample`],
[`~generation.GenerationMixin.beam_search`],
[`~generation.GenerationMixin.beam_sample`],
[`~generation.GenerationMixin.group_beam_search`], and
[`~generation.GenerationMixin.constrained_beam_search`].
Most of those are only useful if you are studying the code of the generate methods in the library.
## Generate Outputs
The output of [`~generation.GenerationMixin.generate`] is an instance of a subclass of
[`~utils.ModelOutput`]. This output is a data structure containing all the information returned
by [`~generation.GenerationMixin.generate`], but it can also be used as a tuple or a dictionary.
Here's an example:
```python
from transformers import GPT2Tokenizer, GPT2LMHeadModel

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")

inputs = tokenizer("Hello, my dog is cute and ", return_tensors="pt")
generation_output = model.generate(**inputs, return_dict_in_generate=True, output_scores=True)
```
The `generation_output` object is a [`~generation.GreedySearchDecoderOnlyOutput`]. As we can
see in the documentation of that class below, it has the following attributes:

- `sequences`: the generated sequences of tokens
- `scores` (optional): the prediction scores of the language modeling head, for each generation step
- `hidden_states` (optional): the hidden states of the model, for each generation step
- `attentions` (optional): the attention weights of the model, for each generation step
Here we have the `scores` since we passed along `output_scores=True`, but we don't have `hidden_states` and
`attentions` because we didn't pass `output_hidden_states=True` or `output_attentions=True`.

You can access each attribute as you would usually do, and if that attribute has not been returned by the model, you
will get `None`. Here, for instance, `generation_output.scores` holds all the generated prediction scores of the
language modeling head, and `generation_output.attentions` is `None`.
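
Continuing the example above, here is a minimal sketch of how you could inspect those attributes (the exact shapes depend on the model and generation settings):

```python
# `scores` is a tuple with one tensor of logits per generated token.
print(len(generation_output.scores))        # number of generation steps
print(generation_output.scores[0].shape)    # (batch_size, vocab_size)

# Attributes that were not requested come back as None.
print(generation_output.attentions)         # None
print(generation_output.hidden_states)      # None
```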
When using our `generation_output` object as a tuple, it only keeps the attributes that don't have `None`
values. Here, for instance, it has two elements, `sequences` then `scores`, so
`generation_output[:2]` will return the tuple `(generation_output.sequences, generation_output.scores)`.
When using our `generation_output` object as a dictionary, it only keeps the attributes that don't have `None`
values. Here, for instance, it has two keys that are `sequences` and `scores`.
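
Continuing the same example, this is roughly what tuple indexing and the dictionary view give you (a sketch; the key order follows the attribute order of the output class):

```python
# As a tuple: only the non-None attributes are kept, in declaration order.
sequences, scores = generation_output[:2]

# As a dictionary: only the non-None attributes appear as keys.
print(list(generation_output.keys()))  # ['sequences', 'scores']
```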
We document here all output types.
### PyTorch
[[autodoc]] generation.GreedySearchEncoderDecoderOutput
[[autodoc]] generation.GreedySearchDecoderOnlyOutput
[[autodoc]] generation.SampleEncoderDecoderOutput
[[autodoc]] generation.SampleDecoderOnlyOutput
[[autodoc]] generation.BeamSearchEncoderDecoderOutput
[[autodoc]] generation.BeamSearchDecoderOnlyOutput
[[autodoc]] generation.BeamSampleEncoderDecoderOutput
[[autodoc]] generation.BeamSampleDecoderOnlyOutput
[[autodoc]] generation.ContrastiveSearchEncoderDecoderOutput
[[autodoc]] generation.ContrastiveSearchDecoderOnlyOutput
### TensorFlow
[[autodoc]] generation.TFGreedySearchEncoderDecoderOutput
[[autodoc]] generation.TFGreedySearchDecoderOnlyOutput
[[autodoc]] generation.TFSampleEncoderDecoderOutput
[[autodoc]] generation.TFSampleDecoderOnlyOutput
[[autodoc]] generation.TFBeamSearchEncoderDecoderOutput
[[autodoc]] generation.TFBeamSearchDecoderOnlyOutput
[[autodoc]] generation.TFBeamSampleEncoderDecoderOutput
[[autodoc]] generation.TFBeamSampleDecoderOnlyOutput
[[autodoc]] generation.TFContrastiveSearchEncoderDecoderOutput
[[autodoc]] generation.TFContrastiveSearchDecoderOnlyOutput
### FLAX
[[autodoc]] generation.FlaxSampleOutput
[[autodoc]] generation.FlaxGreedySearchOutput
[[autodoc]] generation.FlaxBeamSearchOutput
## LogitsProcessor

A [`LogitsProcessor`] can be used to modify the prediction scores of a language model head for
generation.
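
As a hedged sketch of how a processor plugs into generation, the example below defines a custom `LogitsProcessor` that masks out a single token id and passes it to `generate` through a `LogitsProcessorList`; the base class and the `logits_processor` argument are part of the library, while the processor itself and the blocked token id are purely illustrative:

```python
import torch
from transformers import GPT2LMHeadModel, GPT2Tokenizer, LogitsProcessor, LogitsProcessorList


class BlockTokenLogitsProcessor(LogitsProcessor):
    """Illustrative processor that sets the score of one token id to -inf at every step."""

    def __init__(self, blocked_token_id: int):
        self.blocked_token_id = blocked_token_id

    def __call__(self, input_ids: torch.LongTensor, scores: torch.FloatTensor) -> torch.FloatTensor:
        scores[:, self.blocked_token_id] = -float("inf")
        return scores


tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")

inputs = tokenizer("Hello, my dog is cute and ", return_tensors="pt")
# Block GPT-2's end-of-text token (id 50256) as an arbitrary example.
processors = LogitsProcessorList([BlockTokenLogitsProcessor(blocked_token_id=50256)])
outputs = model.generate(**inputs, logits_processor=processors, max_new_tokens=20)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```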
### PyTorch

[[autodoc]] AlternatingCodebooksLogitsProcessor - __call__
[[autodoc]] ClassifierFreeGuidanceLogitsProcessor - __call__
[[autodoc]] EncoderNoRepeatNGramLogitsProcessor - __call__
[[autodoc]] EncoderRepetitionPenaltyLogitsProcessor - __call__
[[autodoc]] EpsilonLogitsWarper - __call__
[[autodoc]] EtaLogitsWarper - __call__
[[autodoc]] ExponentialDecayLengthPenalty - __call__
[[autodoc]] ForcedBOSTokenLogitsProcessor - __call__
[[autodoc]] ForcedEOSTokenLogitsProcessor - __call__
[[autodoc]] ForceTokensLogitsProcessor - __call__
[[autodoc]] HammingDiversityLogitsProcessor - __call__
[[autodoc]] InfNanRemoveLogitsProcessor - __call__
[[autodoc]] LogitNormalization - __call__
[[autodoc]] LogitsProcessor - __call__
[[autodoc]] LogitsProcessorList - __call__
[[autodoc]] LogitsWarper - __call__
[[autodoc]] MinLengthLogitsProcessor - __call__
[[autodoc]] MinNewTokensLengthLogitsProcessor - __call__
[[autodoc]] NoBadWordsLogitsProcessor - __call__
[[autodoc]] NoRepeatNGramLogitsProcessor - __call__
[[autodoc]] PrefixConstrainedLogitsProcessor - __call__
[[autodoc]] RepetitionPenaltyLogitsProcessor - __call__
[[autodoc]] SequenceBiasLogitsProcessor - __call__
[[autodoc]] SuppressTokensAtBeginLogitsProcessor - __call__
[[autodoc]] SuppressTokensLogitsProcessor - __call__
[[autodoc]] TemperatureLogitsWarper - __call__
[[autodoc]] TopKLogitsWarper - __call__
[[autodoc]] TopPLogitsWarper - __call__
[[autodoc]] TypicalLogitsWarper - __call__
[[autodoc]] UnbatchedClassifierFreeGuidanceLogitsProcessor - __call__
[[autodoc]] WhisperTimeStampLogitsProcessor - __call__
### TensorFlow

[[autodoc]] TFForcedBOSTokenLogitsProcessor - __call__
[[autodoc]] TFForcedEOSTokenLogitsProcessor - __call__
[[autodoc]] TFForceTokensLogitsProcessor - __call__
[[autodoc]] TFLogitsProcessor - __call__
[[autodoc]] TFLogitsProcessorList - __call__
[[autodoc]] TFLogitsWarper - __call__
[[autodoc]] TFMinLengthLogitsProcessor - __call__
[[autodoc]] TFNoBadWordsLogitsProcessor - __call__
[[autodoc]] TFNoRepeatNGramLogitsProcessor - __call__
[[autodoc]] TFRepetitionPenaltyLogitsProcessor - __call__
[[autodoc]] TFSuppressTokensAtBeginLogitsProcessor - __call__
[[autodoc]] TFSuppressTokensLogitsProcessor - __call__
[[autodoc]] TFTemperatureLogitsWarper - __call__
[[autodoc]] TFTopKLogitsWarper - __call__
[[autodoc]] TFTopPLogitsWarper - __call__
### FLAX

[[autodoc]] FlaxForcedBOSTokenLogitsProcessor - __call__
[[autodoc]] FlaxForcedEOSTokenLogitsProcessor - __call__
[[autodoc]] FlaxForceTokensLogitsProcessor - __call__
[[autodoc]] FlaxLogitsProcessor - __call__
[[autodoc]] FlaxLogitsProcessorList - __call__
[[autodoc]] FlaxLogitsWarper - __call__
[[autodoc]] FlaxMinLengthLogitsProcessor - __call__
[[autodoc]] FlaxSuppressTokensAtBeginLogitsProcessor - __call__
[[autodoc]] FlaxSuppressTokensLogitsProcessor - __call__
[[autodoc]] FlaxTemperatureLogitsWarper - __call__
[[autodoc]] FlaxTopKLogitsWarper - __call__
[[autodoc]] FlaxTopPLogitsWarper - __call__
[[autodoc]] FlaxWhisperTimeStampLogitsProcessor - __call__
## StoppingCriteria

A [`StoppingCriteria`] can be used to change when to stop generation (other than EOS token). Please note that this is exclusively available to our PyTorch implementations.
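
For example, a `MaxLengthCriteria` wrapped in a `StoppingCriteriaList` can be passed to `generate` through its `stopping_criteria` argument (a minimal sketch; the length of 20 is arbitrary):

```python
from transformers import GPT2LMHeadModel, GPT2Tokenizer, MaxLengthCriteria, StoppingCriteriaList

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")

inputs = tokenizer("Hello, my dog is cute and ", return_tensors="pt")
# Stop as soon as the total sequence length (prompt + generated tokens) reaches 20.
stopping_criteria = StoppingCriteriaList([MaxLengthCriteria(max_length=20)])
outputs = model.generate(**inputs, stopping_criteria=stopping_criteria)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```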
[[autodoc]] StoppingCriteria - __call__
[[autodoc]] StoppingCriteriaList - __call__
[[autodoc]] MaxLengthCriteria - __call__
[[autodoc]] MaxTimeCriteria - __call__
## Constraints

A [`Constraint`] can be used to force the generation to include specific tokens or sequences in the output. Please note that this is exclusively available to our PyTorch implementations.
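
For example, a `PhrasalConstraint` built from token ids can be passed to `generate` through its `constraints` argument, which requires beam search (a minimal sketch; the forced phrase is arbitrary):

```python
from transformers import GPT2LMHeadModel, GPT2Tokenizer, PhrasalConstraint

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")

# Force the phrase " very cute" to appear somewhere in the generated text.
constraint = PhrasalConstraint(tokenizer(" very cute", add_special_tokens=False).input_ids)

inputs = tokenizer("Hello, my dog is", return_tensors="pt")
outputs = model.generate(
    **inputs,
    constraints=[constraint],
    num_beams=4,  # constrained generation runs on top of beam search
    max_new_tokens=20,
)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```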
[[autodoc]] Constraint
[[autodoc]] PhrasalConstraint
[[autodoc]] DisjunctiveConstraint
[[autodoc]] ConstraintListState
## BeamSearch
[[autodoc]] BeamScorer - process - finalize
[[autodoc]] BeamSearchScorer - process - finalize
[[autodoc]] ConstrainedBeamSearchScorer - process - finalize
## Utilities
[[autodoc]] top_k_top_p_filtering
[[autodoc]] tf_top_k_top_p_filtering
## Streamers
[[autodoc]] TextStreamer
[[autodoc]] TextIteratorStreamer