EvolveDirector: Approaching Advanced Text-to-Image Generation with Large Vision-Language Models
These are the official weights of the model "Edgen", trained with EvolveDirector. For more details, please refer to our paper and code repo.
Setup
Requirements
- Build a virtual environment for EvolveDirector
# create virtual environment for EvolveDirector
conda create -n evolvedirector python=3.9
conda activate evolvedirector
# cd to the path of this repo
# install packages
pip install --upgrade pip
pip install torch==2.1.1 torchvision==0.16.1 torchaudio==2.1.1 --index-url https://download.pytorch.org/whl/cu121
pip install -r requirements.txt
pip install -U transformers accelerate diffusers SentencePiece ftfy beautifulsoup4
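After installation, a quick check can confirm that the packages are visible to the interpreter. The following is a minimal sketch using only the standard library. The helper name `check_packages` is hypothetical and not part of this repo, and the import names are assumed from the pip commands above (for example, `beautifulsoup4` imports as `bs4`).

```python
import importlib.util

# Import names assumed from the pip commands above; adjust if requirements.txt differs.
PACKAGES = [
    "torch", "torchvision", "torchaudio",
    "transformers", "accelerate", "diffusers",
    "sentencepiece", "ftfy", "bs4",
]

def check_packages(names):
    """Return a dict mapping each import name to True if it can be located."""
    return {name: importlib.util.find_spec(name) is not None for name in names}

if __name__ == "__main__":
    # Print a one-line status per package without actually importing it.
    for name, found in check_packages(PACKAGES).items():
        print(f"{name:15s} {'ok' if found else 'MISSING'}")
```

This only checks that each package can be found; it does not verify versions or CUDA availability.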
Usage
- Inference
# put your text prompts in the file passed to --txt_file
python Inference/inference.py --image_size=1024 \
    --t5_path "./model" \
    --tokenizer_path "./model/sd-vae-ft-ema" \
    --txt_file "text_prompts.txt" \
    --model_path "model/Edgen_1024px_v1.pth" \
    --save_folder "output/test_model"
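The `--txt_file` argument points at a plain-text file of prompts. Below is a minimal sketch for creating such a file. It assumes the inference script reads one prompt per line, which is not stated here, so please check `Inference/inference.py` before relying on it. The helper `write_prompts` and the sample prompts are illustrative only.

```python
from pathlib import Path

def write_prompts(path, prompts):
    """Write one prompt per line (assumed format), skipping blank entries.

    Returns the number of prompts written.
    """
    cleaned = [p.strip() for p in prompts if p.strip()]
    Path(path).write_text("".join(p + "\n" for p in cleaned), encoding="utf-8")
    return len(cleaned)

if __name__ == "__main__":
    # Sample prompts are placeholders; replace them with your own.
    count = write_prompts(
        "text_prompts.txt",
        [
            "A watercolor painting of a lighthouse at dawn",
            "A close-up photo of a red fox in the snow",
        ],
    )
    print(f"wrote {count} prompts to text_prompts.txt")
```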
Citation
Shoutouts
- This code builds heavily on PixArt-$\alpha$. Thanks for open-sourcing!