LatteTransformer3DModel

A Diffusion Transformer model for 3D data from Latte.
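
The transformer can be loaded on its own with from_pretrained(). A minimal sketch, assuming the weights published for the Latte pipeline live under "maxin-cn/Latte-1" with the transformer in a "transformer" subfolder; treat the repository id and subfolder as assumptions:

```python
import torch

from diffusers import LatteTransformer3DModel

# Assumption: Latte's published weights live at "maxin-cn/Latte-1",
# with the transformer stored in the "transformer" subfolder.
transformer = LatteTransformer3DModel.from_pretrained(
    "maxin-cn/Latte-1", subfolder="transformer", torch_dtype=torch.float16
)
```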

LatteTransformer3DModel

class diffusers.LatteTransformer3DModel

( num_attention_heads: int = 16, attention_head_dim: int = 88, in_channels: typing.Optional[int] = None, out_channels: typing.Optional[int] = None, num_layers: int = 1, dropout: float = 0.0, cross_attention_dim: typing.Optional[int] = None, attention_bias: bool = False, sample_size: int = 64, patch_size: typing.Optional[int] = None, activation_fn: str = 'geglu', num_embeds_ada_norm: typing.Optional[int] = None, norm_type: str = 'layer_norm', norm_elementwise_affine: bool = True, norm_eps: float = 1e-05, caption_channels: int = None, video_length: int = 16 )
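
For experimentation, the class can also be instantiated from scratch. A minimal, untested sketch with deliberately tiny illustrative sizes (the released checkpoint uses a much wider configuration; norm_type="ada_norm_single" and patch_size=2 mirror it, every other value here is an assumption):

```python
from diffusers import LatteTransformer3DModel

# Tiny illustrative configuration; not a recommended or released config.
model = LatteTransformer3DModel(
    num_attention_heads=2,
    attention_head_dim=16,        # inner dim = 2 * 16 = 32
    in_channels=4,
    out_channels=8,
    num_layers=2,
    cross_attention_dim=32,
    sample_size=8,                # latent spatial size
    patch_size=2,
    norm_type="ada_norm_single",
    caption_channels=32,          # width of the text-encoder embeddings
    video_length=4,
)
```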

forward

( hidden_states: Tensor, timestep: typing.Optional[torch.LongTensor] = None, encoder_hidden_states: typing.Optional[torch.Tensor] = None, encoder_attention_mask: typing.Optional[torch.Tensor] = None, enable_temporal_attentions: bool = True, return_dict: bool = True )

Parameters

  • hidden_states (torch.Tensor of shape (batch_size, channels, num_frames, height, width)) — Input hidden_states.
  • timestep (torch.LongTensor, optional) — The current denoising step, applied as an embedding in AdaLayerNorm.
  • encoder_hidden_states (torch.FloatTensor of shape (batch_size, sequence_len, embed_dims), optional) — Conditional embeddings for the cross-attention layers. If not given, cross-attention defaults to self-attention.
  • encoder_attention_mask (torch.Tensor, optional) — Cross-attention mask applied to encoder_hidden_states. Two formats are supported:

    • Mask of shape (batch, sequence_length), where True = keep, False = discard.
    • Bias of shape (batch, 1, sequence_length), where 0 = keep, -10000 = discard.

    If ndim == 2, the input is interpreted as a mask and converted into a bias consistent with the format above (see the conversion sketch after this parameter list). This bias is then added to the cross-attention scores.

  • enable_temporal_attentions (bool, optional, defaults to True) — Whether to enable temporal attentions.
  • return_dict (bool, optional, defaults to True) — Whether or not to return a ~models.modeling_outputs.Transformer2DModelOutput instead of a plain tuple.
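
As referenced above, a short sketch of how a 2-dim keep-mask maps onto the bias format; this mirrors the conversion the docstring describes, not necessarily the library's exact code path:

```python
import torch

mask = torch.tensor([[True, True, False]])  # (batch, sequence_length), True = keep
bias = (1 - mask.float()) * -10000.0        # 0 = keep, -10000 = discard
bias = bias.unsqueeze(1)                    # (batch, 1, sequence_length)
```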

The LatteTransformer3DModel forward method.
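
Putting it together, a hedged sketch of a forward pass with random inputs shaped for the tiny configuration above (the .sample field follows the Transformer2DModelOutput convention; the expected output shape is an assumption):

```python
import torch

batch, channels, frames, height, width = 1, 4, 4, 8, 8  # matches the tiny config
hidden_states = torch.randn(batch, channels, frames, height, width)
timestep = torch.randint(0, 1000, (batch,))                       # denoising step
encoder_hidden_states = torch.randn(batch, 20, 32)                # (batch, seq_len, caption_channels)
encoder_attention_mask = torch.ones(batch, 20, dtype=torch.bool)  # 2-dim mask form

with torch.no_grad():
    out = model(
        hidden_states=hidden_states,
        timestep=timestep,
        encoder_hidden_states=encoder_hidden_states,
        encoder_attention_mask=encoder_attention_mask,
        enable_temporal_attentions=True,
    )

print(out.sample.shape)  # expected: (batch, out_channels, num_frames, height, width)
```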
