---
license: creativeml-openrail-m
language:
- en
library_name: diffusers
pipeline_tag: text-to-image
tags:
- stable-diffusion
---

# TokenCompose SD14 Model Card
TokenCompose_SD14_A is a latent text-to-image diffusion model fine-tuned from the Stable-Diffusion-v1-4 checkpoint. It was trained at a resolution of 512x512 on the VSR split of COCO image-caption pairs for 24,000 steps with a learning rate of 5e-6. The training objective adds token-level grounding terms to the standard denoising loss, with the aim of improving multi-category instance composition and photorealism.
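This card does not spell out the form of the token-level grounding terms. The sketch below is illustrative only. It assumes one plausible term: for each grounded noun token, penalize the fraction of that token's cross-attention mass that falls outside its segmentation mask, then add the result to the denoising loss with a weight. The function names `token_grounding_loss` and `total_loss`, the weight `lam`, and the array shapes are hypothetical and are not taken from the TokenCompose training code.

```python
import numpy as np


def token_grounding_loss(attn: np.ndarray, masks: np.ndarray, eps: float = 1e-8) -> float:
    """Hypothetical token-level grounding term (illustrative, not the official loss).

    attn:  (N, H, W) non-negative cross-attention maps, one per grounded token.
    masks: (N, H, W) binary segmentation masks for the same N tokens.
    Returns the mean over tokens of (1 - fraction of attention inside the mask)^2.
    """
    attn = np.asarray(attn, dtype=np.float64)
    masks = np.asarray(masks, dtype=bool)
    if attn.shape != masks.shape:
        raise ValueError("attn and masks must have the same shape")
    n = attn.shape[0]
    # Total attention mass per token, and the part that lands inside its mask.
    total = attn.reshape(n, -1).sum(axis=1)
    inside = (attn * masks).reshape(n, -1).sum(axis=1)
    frac_inside = inside / (total + eps)
    return float(np.mean((1.0 - frac_inside) ** 2))


def total_loss(denoise_loss: float, attn: np.ndarray, masks: np.ndarray, lam: float = 1e-3) -> float:
    """Denoising loss plus a weighted grounding term; `lam` is an assumed weight."""
    return denoise_loss + lam * token_grounding_loss(attn, masks)


if __name__ == "__main__":
    # One token whose attention sits entirely in the top-left 2x2 block.
    attn = np.zeros((1, 4, 4))
    attn[0, :2, :2] = 1.0
    mask = np.zeros((1, 4, 4), dtype=bool)
    mask[0, :2, :2] = True
    print(token_grounding_loss(attn, mask))   # attention fully inside the mask
    print(token_grounding_loss(attn, ~mask))  # attention fully outside the mask
```

Under this assumed form, the term is 0 when all of a token's attention falls inside its mask and 1 when none does, so minimizing it pushes each token's attention toward the region of the object it names.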