---
license: creativeml-openrail-m
language:
  - en
library_name: diffusers
pipeline_tag: text-to-image
tags:
  - stable-diffusion
---

# TokenCompose SD14 Model Card

TokenCompose_SD14_A is a latent text-to-image diffusion model fine-tuned from the Stable-Diffusion-v1-4 checkpoint. It was trained at 512x512 resolution on the VSR split of COCO image-caption pairs for 24,000 steps with a learning rate of 5e-6. In addition to the denoising loss, the training objective includes token-level grounding terms intended to improve multi-category instance composition and photorealism.
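
Since the checkpoint is fine-tuned from Stable-Diffusion-v1-4 and the card declares `diffusers` as its library, it can presumably be loaded with the stock `StableDiffusionPipeline`. The sketch below is a minimal illustration under that assumption, not an official usage recipe. `MODEL_ID` is a placeholder (this card does not state the Hub namespace), the helper names `generation_settings`, `load_pipeline`, and `main` are hypothetical, and the prompt, step count, and guidance scale are illustrative values rather than recommended settings.

```python
# Placeholder repo id: replace "your-namespace" with the actual Hub namespace.
MODEL_ID = "your-namespace/TokenCompose_SD14_A"


def generation_settings(prompt, steps=50, guidance_scale=7.5):
    """Collect pipeline call arguments (hypothetical helper).

    Height and width are fixed at 512 to match the fine-tuning resolution;
    steps and guidance_scale are illustrative, not tuned values.
    """
    return {
        "prompt": prompt,
        "height": 512,
        "width": 512,
        "num_inference_steps": steps,
        "guidance_scale": guidance_scale,
    }


def load_pipeline(model_id=MODEL_ID):
    """Load the checkpoint, assuming it uses the standard SD v1 layout."""
    # Imported here so generation_settings() works without torch/diffusers.
    import torch
    from diffusers import StableDiffusionPipeline

    use_cuda = torch.cuda.is_available()
    pipe = StableDiffusionPipeline.from_pretrained(
        model_id,
        # Half precision on GPU to save memory; full precision on CPU.
        torch_dtype=torch.float16 if use_cuda else torch.float32,
    )
    return pipe.to("cuda" if use_cuda else "cpu")


def main():
    pipe = load_pipeline()
    # A multi-category prompt, the kind of composition the model targets.
    settings = generation_settings("A cat and a wine glass")
    image = pipe(**settings).images[0]
    image.save("cat_and_wine_glass.png")
```

Calling `main()` after setting `MODEL_ID` downloads the weights and writes a single 512x512 image to the working directory.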