|
# [CVPR2024] StableVITON: Learning Semantic Correspondence with Latent Diffusion Model for Virtual Try-On |
|
This repository is the official implementation of [StableVITON](https://arxiv.org/abs/2312.01725).
|
|
|
> **StableVITON: Learning Semantic Correspondence with Latent Diffusion Model for Virtual Try-On**<br>
> [Jeongho Kim](https://scholar.google.co.kr/citations?user=ucoiLHQAAAAJ&hl=ko), [Gyojung Gu](https://www.linkedin.com/in/gyojung-gu-29033118b/), [Minho Park](https://pmh9960.github.io/), [Sunghyun Park](https://psh01087.github.io/), [Jaegul Choo](https://sites.google.com/site/jaegulchoo/)
|
|
|
[[arXiv Paper](https://arxiv.org/abs/2312.01725)] [[Project Page](https://rlawjdghek.github.io/StableVITON/)]
|
|
|
 |
|
|
|
## TODO List |
|
- [x] ~~Inference code~~ |
|
- [x] ~~Release model weights~~ |
|
- [x] ~~Training code~~ |
|
|
|
## Environments |
|
```bash
git clone https://github.com/rlawjdghek/StableVITON
cd StableVITON

conda create --name StableVITON python=3.10 -y
conda activate StableVITON

# install packages
pip install torch==2.0.0+cu117 torchvision==0.15.1+cu117 torchaudio==2.0.1 --index-url https://download.pytorch.org/whl/cu117
pip install pytorch-lightning==1.5.0
pip install einops
pip install opencv-python==4.7.0.72
pip install matplotlib
pip install omegaconf
pip install albumentations
pip install transformers==4.33.2
pip install xformers==0.0.19
pip install triton==2.0.0
pip install open-clip-torch==2.19.0
pip install diffusers==0.20.2
pip install scipy==1.10.1
conda install -c anaconda ipython -y
```
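After installation, you can optionally sanity-check the environment; the expected version strings below simply mirror the pins above:

```python
# Optional environment sanity check: confirm the pinned versions and CUDA access.
import torch
import torchvision
import pytorch_lightning as pl

print(torch.__version__)          # expected: 2.0.0+cu117
print(torchvision.__version__)    # expected: 0.15.1+cu117
print(pl.__version__)             # expected: 1.5.0
print(torch.cuda.is_available())  # expected: True on a CUDA 11.7 machine
```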
|
|
|
## Weights and Data |
|
Our [checkpoint](https://kaistackr-my.sharepoint.com/:f:/g/personal/rlawjdghek_kaist_ac_kr/EjzAZHJu9MlEoKIxG4tqPr0BM_Ry20NHyNw5Sic2vItxiA?e=5mGa1c) trained on VITON-HD has been released! <br>
|
You can download the VITON-HD dataset from [here](https://github.com/shadow2496/VITON-HD).<br> |
|
For both training and inference, the following dataset structure is required: |
|
|
|
```
train
|-- image
|-- image-densepose
|-- agnostic
|-- agnostic-mask
|-- cloth
|-- cloth_mask
|-- gt_cloth_warped_mask (for ATV loss)

test
|-- image
|-- image-densepose
|-- agnostic
|-- agnostic-mask
|-- cloth
|-- cloth_mask
```
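Before launching training or inference, it can save time to verify that these folders exist. A minimal, hypothetical Python helper (not part of this repo; the dataset root path is a placeholder):

```python
# Hypothetical helper: check that the expected VITON-HD folder layout is present.
from pathlib import Path

REQUIRED = ["image", "image-densepose", "agnostic", "agnostic-mask", "cloth", "cloth_mask"]

def check_split(root: str, split: str) -> None:
    names = REQUIRED + (["gt_cloth_warped_mask"] if split == "train" else [])
    for name in names:
        d = Path(root) / split / name
        print(f"{d}: {'ok' if d.is_dir() else 'MISSING'}")

check_split("./data/zalando-hd-resized", "train")  # placeholder dataset root
check_split("./data/zalando-hd-resized", "test")
```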
|
|
|
## Preprocessing |
|
The VITON-HD dataset serves as a benchmark and already provides agnostic masks. To run virtual try-on on **arbitrary images**, you can generate the masks yourself with a segmentation tool such as [SAM](https://github.com/facebookresearch/segment-anything), as sketched below. Note that for the densepose input you should use the same DensePose model as the one used for VITON-HD.
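As a rough illustration, here is a minimal sketch of producing a garment mask with SAM from a single point prompt. This is not part of the repository: the checkpoint name follows SAM's released ViT-H weights, the prompt coordinates and file names are placeholders, and the resulting binary mask still has to be converted into the agnostic / agnostic-mask format listed above.

```python
# Minimal SAM sketch (not part of this repo); coordinates and paths are placeholders.
import cv2
import numpy as np
from segment_anything import SamPredictor, sam_model_registry

sam = sam_model_registry["vit_h"](checkpoint="sam_vit_h_4b8939.pth")
predictor = SamPredictor(sam)

image = cv2.cvtColor(cv2.imread("person.jpg"), cv2.COLOR_BGR2RGB)
predictor.set_image(image)

# A single foreground point on the garment region (hypothetical coordinates).
masks, scores, _ = predictor.predict(
    point_coords=np.array([[384, 300]]),
    point_labels=np.array([1]),
    multimask_output=True,
)
best = masks[np.argmax(scores)]  # keep the highest-scoring mask proposal
cv2.imwrite("cloth_mask.png", (best * 255).astype(np.uint8))
```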
|
|
|
## Inference |
|
```bash
#### paired
CUDA_VISIBLE_DEVICES=4 python inference.py \
    --config_path ./configs/VITONHD.yaml \
    --batch_size 4 \
    --model_load_path <model weight path> \
    --save_dir <save directory>

#### unpaired
CUDA_VISIBLE_DEVICES=4 python inference.py \
    --config_path ./configs/VITONHD.yaml \
    --batch_size 4 \
    --model_load_path <model weight path> \
    --unpair \
    --save_dir <save directory>

#### paired repaint
CUDA_VISIBLE_DEVICES=4 python inference.py \
    --config_path ./configs/VITONHD.yaml \
    --batch_size 4 \
    --model_load_path <model weight path> \
    --repaint \
    --save_dir <save directory>

#### unpaired repaint
CUDA_VISIBLE_DEVICES=4 python inference.py \
    --config_path ./configs/VITONHD.yaml \
    --batch_size 4 \
    --model_load_path <model weight path> \
    --unpair \
    --repaint \
    --save_dir <save directory>
```
|
|
|
You can also preserve the unmasked region of the original image with the `--repaint` option.
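Conceptually, repainting is a per-pixel composite of the generated image and the original using the inpainting mask, so only the try-on region comes from the model output. The sketch below illustrates the idea only and is not the repository's exact implementation:

```python
# Repaint compositing sketch (illustrative only, not the repo's implementation).
import numpy as np

def repaint(original: np.ndarray, generated: np.ndarray, mask: np.ndarray) -> np.ndarray:
    """original/generated: HxWx3 uint8; mask: HxW in {0, 1}, 1 = try-on region."""
    m = mask.astype(np.float32)[..., None]
    out = m * generated.astype(np.float32) + (1.0 - m) * original.astype(np.float32)
    return out.round().astype(np.uint8)
```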
|
|
|
## Training |
|
For VITON training, we expanded the input channels of the U-Net's first block from 9 to 13 (adding zero-initialized convolution weights) on top of the Paint-by-Example (PBE) model. Therefore, you should first download the modified checkpoint (named 'VITONHD_PBE_pose.ckpt') from the [Link](https://kaistackr-my.sharepoint.com/:f:/g/personal/rlawjdghek_kaist_ac_kr/EjzAZHJu9MlEoKIxG4tqPr0BM_Ry20NHyNw5Sic2vItxiA?e=5mGa1c) and place it in the './ckpts/' folder.
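For intuition, the following minimal PyTorch sketch (function and module names are hypothetical, not the repo's code) shows how a pretrained convolution's input channels can be widened from 9 to 13 with the new weights zero-initialized, so the added conditioning channels do not perturb the pretrained behavior at initialization:

```python
# Sketch of zero-initialized channel expansion; names are hypothetical.
import torch
import torch.nn as nn

def expand_conv_in_channels(old_conv: nn.Conv2d, new_in_channels: int) -> nn.Conv2d:
    """Widen a conv's input channels; the extra weights start at zero."""
    new_conv = nn.Conv2d(
        new_in_channels,
        old_conv.out_channels,
        kernel_size=old_conv.kernel_size,
        stride=old_conv.stride,
        padding=old_conv.padding,
        bias=old_conv.bias is not None,
    )
    with torch.no_grad():
        new_conv.weight.zero_()                                       # new channels contribute nothing at first
        new_conv.weight[:, : old_conv.in_channels] = old_conv.weight  # keep pretrained weights
        if old_conv.bias is not None:
            new_conv.bias.copy_(old_conv.bias)
    return new_conv

# e.g. widen a PBE-style 9-channel input conv to 13 channels (names hypothetical):
# unet.input_blocks[0][0] = expand_conv_in_channels(unet.input_blocks[0][0], 13)
```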
|
|
|
Additionally, to obtain more refined person textures, we utilized a VAE fine-tuned on the VITON-HD dataset. You should also download this checkpoint (named 'VITONHD_VAE_finetuning.ckpt') from the [Link](https://kaistackr-my.sharepoint.com/:f:/g/personal/rlawjdghek_kaist_ac_kr/EjzAZHJu9MlEoKIxG4tqPr0BM_Ry20NHyNw5Sic2vItxiA?e=5mGa1c) and place it in the './ckpts/' folder.
|
|
|
```bash
### Base model training
CUDA_VISIBLE_DEVICES=3,4 python train.py \
    --config_name VITONHD \
    --transform_size shiftscale3 hflip \
    --transform_color hsv bright_contrast \
    --save_name Base_test

### ATV loss finetuning
CUDA_VISIBLE_DEVICES=5,6 python train.py \
    --config_name VITONHD \
    --transform_size shiftscale3 hflip \
    --transform_color hsv bright_contrast \
    --use_atv_loss \
    --resume_path <first stage model path> \
    --save_name ATVloss_test
```
|
|
|
## Citation |
|
If you find our work useful for your research, please cite us: |
|
```
@article{kim2023stableviton,
  title={StableVITON: Learning Semantic Correspondence with Latent Diffusion Model for Virtual Try-On},
  author={Kim, Jeongho and Gu, Gyojung and Park, Minho and Park, Sunghyun and Choo, Jaegul},
  journal={arXiv preprint arXiv:2312.01725},
  year={2023}
}
```
|
|
|
**Acknowledgements.** Sunghyun Park is the corresponding author.
|
|
|
## License |
|
Licensed under the [CC BY-NC-SA 4.0 license](https://creativecommons.org/licenses/by-nc-sa/4.0/legalcode).