LuminaNextDiT2DModel
A Next Version of Diffusion Transformer model for 2D data from Lumina-T2X.
class diffusers.LuminaNextDiT2DModel
( sample_size: int = 128, patch_size: typing.Optional[int] = 2, in_channels: typing.Optional[int] = 4, hidden_size: typing.Optional[int] = 2304, num_layers: typing.Optional[int] = 32, num_attention_heads: typing.Optional[int] = 32, num_kv_heads: typing.Optional[int] = None, multiple_of: typing.Optional[int] = 256, ffn_dim_multiplier: typing.Optional[float] = None, norm_eps: typing.Optional[float] = 1e-05, learn_sigma: typing.Optional[bool] = True, qk_norm: typing.Optional[bool] = True, cross_attention_dim: typing.Optional[int] = 2048, scaling_factor: typing.Optional[float] = 1.0 )
Parameters
-  sample_size (int, defaults to 128) — The width of the latent images. This is fixed during training since it is used to learn a number of position embeddings.
-  patch_size (int, optional, defaults to 2) — The size of each patch in the image. This parameter defines the resolution of patches fed into the model.
-  in_channels (int, optional, defaults to 4) — The number of input channels for the model. Typically, this matches the number of channels in the input images.
-  hidden_size (int, optional, defaults to 2304) — The dimensionality of the hidden layers in the model. This parameter determines the width of the model’s hidden representations.
-  num_layers (int, optional, defaults to 32) — The number of layers in the model. This defines the depth of the neural network.
-  num_attention_heads (int, optional, defaults to 32) — The number of attention heads in each attention layer. This parameter specifies how many separate attention mechanisms are used.
-  num_kv_heads (int, optional, defaults to None) — The number of key-value heads in the attention mechanism, if different from the number of attention heads. If None, it defaults to num_attention_heads.
-  multiple_of (int, optional, defaults to 256) — A factor that the hidden size should be a multiple of. This can help optimize certain hardware configurations.
-  ffn_dim_multiplier (float, optional) — A multiplier for the dimensionality of the feed-forward network. If None, it uses a default value based on the model configuration.
-  norm_eps (float, optional, defaults to 1e-5) — A small value added to the denominator for numerical stability in normalization layers.
-  learn_sigma (bool, optional, defaults to True) — Whether the model should learn the sigma parameter, i.e. predict the variance of the denoising distribution in addition to the noise.
-  qk_norm (bool, optional, defaults to True) — Indicates if the queries and keys in the attention mechanism should be normalized.
-  cross_attention_dim (int, optional, defaults to 2048) — The dimensionality of the text embeddings. This parameter defines the size of the text representations used in the model.
-  scaling_factor (float, optional, defaults to 1.0) — A scaling factor applied to certain parameters or layers in the model. This can be used for adjusting the overall scale of the model’s operations.
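The parameters above interact in ways the one-line descriptions do not spell out. The following is a minimal pure-Python sketch, not the library's implementation, of a few quantities they imply. `LuminaConfigSketch`, `derived_sizes`, and `ffn_inner_dim` are hypothetical names introduced only for illustration. The feed-forward rounding rule assumes the LLaMA-style convention (two thirds of `4 * hidden_size`, rounded up to a multiple of `multiple_of`), and the doubled output channels under `learn_sigma` is likewise an assumption; neither is confirmed by this page.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class LuminaConfigSketch:
    """Hypothetical stand-in that mirrors the defaults in the signature above."""
    sample_size: int = 128
    patch_size: int = 2
    in_channels: int = 4
    hidden_size: int = 2304
    num_attention_heads: int = 32
    num_kv_heads: Optional[int] = None
    multiple_of: int = 256
    ffn_dim_multiplier: Optional[float] = None
    learn_sigma: bool = True


def derived_sizes(cfg: LuminaConfigSketch) -> dict:
    """Compute quantities implied by the configuration (illustrative only)."""
    patches_per_side = cfg.sample_size // cfg.patch_size
    # Documented behavior: None falls back to num_attention_heads.
    kv_heads = cfg.num_attention_heads if cfg.num_kv_heads is None else cfg.num_kv_heads
    return {
        # Number of patch tokens for a square latent of side sample_size.
        "num_patches": patches_per_side ** 2,
        # Values flattened into each patch token before projection.
        "patch_dim": cfg.patch_size ** 2 * cfg.in_channels,
        # Per-head width of the attention layers.
        "head_dim": cfg.hidden_size // cfg.num_attention_heads,
        "kv_heads": kv_heads,
        # Query heads sharing each key-value head (grouped-query attention).
        "queries_per_kv_head": cfg.num_attention_heads // kv_heads,
        # ASSUMPTION: predicting sigma doubles the output channels.
        "out_channels": cfg.in_channels * 2 if cfg.learn_sigma else cfg.in_channels,
    }


def ffn_inner_dim(cfg: LuminaConfigSketch) -> int:
    """ASSUMPTION: LLaMA-style feed-forward sizing, rounded up to multiple_of."""
    inner = int(2 * (4 * cfg.hidden_size) / 3)
    if cfg.ffn_dim_multiplier is not None:
        inner = int(cfg.ffn_dim_multiplier * inner)
    return cfg.multiple_of * ((inner + cfg.multiple_of - 1) // cfg.multiple_of)


if __name__ == "__main__":
    cfg = LuminaConfigSketch()
    sizes = derived_sizes(cfg)
    print(sizes["num_patches"])  # 4096 patch tokens for a 128x128 latent
    print(sizes["head_dim"])     # 72
    print(ffn_inner_dim(cfg))    # 6144 under the assumed rounding rule
```

Under these assumptions, passing `num_kv_heads=8` would leave `head_dim` unchanged and have four query heads share each key-value head.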
LuminaNextDiT: Diffusion model with a Transformer backbone.
Inherits from ModelMixin and ConfigMixin for compatibility with the StableDiffusionPipeline sampler of diffusers.
forward
( hidden_states: Tensor, timestep: Tensor, encoder_hidden_states: Tensor, encoder_mask: Tensor, image_rotary_emb: Tensor, cross_attention_kwargs: typing.Dict[str, typing.Any] = None, return_dict = True )
Forward pass of LuminaNextDiT.