ChromaTransformer2DModel
A modified Flux Transformer model from Chroma.
class diffusers.ChromaTransformer2DModel
( patch_size: int = 1, in_channels: int = 64, out_channels: typing.Optional[int] = None, num_layers: int = 19, num_single_layers: int = 38, attention_head_dim: int = 128, num_attention_heads: int = 24, joint_attention_dim: int = 4096, axes_dims_rope: typing.Tuple[int, ...] = (16, 56, 56), approximator_num_channels: int = 64, approximator_hidden_dim: int = 5120, approximator_layers: int = 5 )
Parameters
- patch_size (`int`, defaults to `1`) — Patch size to turn the input data into small patches.
- in_channels (`int`, defaults to `64`) — The number of channels in the input.
- out_channels (`int`, *optional*, defaults to `None`) — The number of channels in the output. If not specified, it defaults to `in_channels`.
- num_layers (`int`, defaults to `19`) — The number of layers of dual-stream DiT blocks to use.
- num_single_layers (`int`, defaults to `38`) — The number of layers of single-stream DiT blocks to use.
- attention_head_dim (`int`, defaults to `128`) — The number of dimensions to use for each attention head.
- num_attention_heads (`int`, defaults to `24`) — The number of attention heads to use.
- joint_attention_dim (`int`, defaults to `4096`) — The number of dimensions to use for the joint attention (embedding/channel dimension of `encoder_hidden_states`).
- axes_dims_rope (`Tuple[int]`, defaults to `(16, 56, 56)`) — The dimensions to use for the rotary positional embeddings.
- approximator_num_channels (`int`, defaults to `64`) — The number of input channels for the modulation approximator network.
- approximator_hidden_dim (`int`, defaults to `5120`) — The hidden dimension of the modulation approximator network.
- approximator_layers (`int`, defaults to `5`) — The number of layers in the modulation approximator network.
The Transformer model introduced in Flux, modified for Chroma.
Reference: https://huggingface.co/lodestones/Chroma
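Since Chroma checkpoints are published as single `.safetensors` files, the transformer can be loaded on its own with `from_single_file`. A minimal sketch, assuming single-file loading support for this class; the exact checkpoint file name is illustrative, so check the model card above for the available versions:

```python
import torch
from diffusers import ChromaTransformer2DModel

# The file name below is illustrative; see the model card at
# https://huggingface.co/lodestones/Chroma for the available checkpoints.
transformer = ChromaTransformer2DModel.from_single_file(
    "https://huggingface.co/lodestones/Chroma/blob/main/chroma-unlocked-v35.safetensors",
    torch_dtype=torch.bfloat16,
)
```

The loaded transformer can then be passed to a compatible pipeline via its `transformer` argument.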
forward
( hidden_states: Tensor, encoder_hidden_states: Tensor = None, timestep: LongTensor = None, img_ids: Tensor = None, txt_ids: Tensor = None, attention_mask: Tensor = None, joint_attention_kwargs: typing.Optional[typing.Dict[str, typing.Any]] = None, controlnet_block_samples = None, controlnet_single_block_samples = None, return_dict: bool = True, controlnet_blocks_repeat: bool = False )
Parameters
- hidden_states (`torch.Tensor` of shape `(batch_size, image_sequence_length, in_channels)`) — Input `hidden_states`.
- encoder_hidden_states (`torch.Tensor` of shape `(batch_size, text_sequence_length, joint_attention_dim)`) — Conditional embeddings (embeddings computed from the input conditions such as prompts) to use.
- timestep (`torch.LongTensor`) — Used to indicate the denoising step.
- img_ids (`torch.Tensor` of shape `(image_sequence_length, 3)`) — Positional ids of the image tokens, used for the rotary positional embeddings.
- txt_ids (`torch.Tensor` of shape `(text_sequence_length, 3)`) — Positional ids of the text tokens, used for the rotary positional embeddings.
- attention_mask (`torch.Tensor`, *optional*) — Mask applied to the text tokens, e.g. to exclude padding tokens from attention.
- controlnet_block_samples (`list` of `torch.Tensor`, *optional*) — A list of tensors that, if specified, are added to the residuals of the dual-stream transformer blocks.
- controlnet_single_block_samples (`list` of `torch.Tensor`, *optional*) — A list of tensors that, if specified, are added to the residuals of the single-stream transformer blocks.
- joint_attention_kwargs (`dict`, *optional*) — A kwargs dictionary that, if specified, is passed along to the `AttentionProcessor` as defined under `self.processor` in diffusers.models.attention_processor.
- return_dict (`bool`, *optional*, defaults to `True`) — Whether or not to return a `~models.transformer_2d.Transformer2DModelOutput` instead of a plain tuple.
The ChromaTransformer2DModel forward method.
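A minimal shape sketch of the forward pass, assuming a randomly initialized, down-scaled configuration (only the block counts are reduced from the defaults); the sequence lengths and position ids are illustrative, not taken from a real pipeline:

```python
import torch
from diffusers import ChromaTransformer2DModel

# Down-scaled, randomly initialized model: only the number of blocks is
# reduced; width-related defaults (head dim, heads, RoPE axes) are kept.
model = ChromaTransformer2DModel(num_layers=1, num_single_layers=1)
model.eval()

batch, img_seq, txt_seq = 1, 16, 8  # 16 packed latent patches in a 4x4 grid
hidden_states = torch.randn(batch, img_seq, model.config.in_channels)
encoder_hidden_states = torch.randn(batch, txt_seq, model.config.joint_attention_dim)
timestep = torch.tensor([1.0])  # fractional timestep; Flux-style pipelines pass t / 1000

# Per-token position ids for the rotary embeddings, shape (seq_len, 3):
# channels 1 and 2 hold the row/column indices of each latent patch.
img_ids = torch.zeros(img_seq, 3)
img_ids[:, 1] = torch.arange(img_seq) // 4  # row index
img_ids[:, 2] = torch.arange(img_seq) % 4   # column index
txt_ids = torch.zeros(txt_seq, 3)

with torch.no_grad():
    out = model(
        hidden_states=hidden_states,
        encoder_hidden_states=encoder_hidden_states,
        timestep=timestep,
        img_ids=img_ids,
        txt_ids=txt_ids,
    )
print(out.sample.shape)  # torch.Size([1, 16, 64]) — (batch, image_sequence_length, out_channels)
```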