XiangpengYang committed on
Commit c7c6869 · 1 Parent(s): b0f868c
README.md CHANGED
@@ -43,7 +43,7 @@ https://github.com/user-attachments/assets/dc54bc11-48cc-4814-9879-bf2699ee9d1d
 * **[2025/1/23]** Our paper is accepted to [ICLR2025](https://openreview.net/forum?id=SSslAtcPB6)! Feel free to **watch** 👀 this repository for the latest updates.
 
 
-## ▶️ Setup Environment
+## 🍻 Setup Environment
 Our method is tested with CUDA 12.1, fp16 via accelerate, and xformers on a single L40 GPU.
 
 ```bash
@@ -68,23 +68,12 @@ You may download all the base model checkpoints using the following bash command
 bash download_all.sh
 ```
 
-Prepare the ControlNet annotator weights (e.g., DW-Pose, depth_zoe, depth_midas, OpenPose)
-
-```
-mkdir annotator/ckpts
-```
-Method 1: Download the DW-Pose models
-
-(Note: if you can access Hugging Face, other models such as depth_zoe are downloaded automatically)
-
-Download the DW-Pose model dw-ll_ucoco_384.onnx ([baidu](https://pan.baidu.com/s/1nuBjw-KKSxD_BkpmwXUJiw?pwd=28d7), [google](https://drive.google.com/file/d/12L8E2oAgZy4VACGSK9RaZBZrfgx7VTA2/view?usp=sharing)) and the detection model yolox_l.onnx ([baidu](https://pan.baidu.com/s/1fpfIVpv5ypo4c1bUlzkMYQ?pwd=mjdn), [google](https://drive.google.com/file/d/1w9pXC8tT0p9ndMN-CArp1__b2GbzewWI/view?usp=sharing)),
-then put them into ./annotator/ckpts.
-
-Method 2: Download all annotator checkpoints from Google Drive or Baidu (when Hugging Face is not accessible)
-
-If you cannot access Hugging Face, you can download all the annotator checkpoints (such as DW-Pose, depth_zoe, depth_midas, and OpenPose; around 4 GB in total) from [baidu](https://pan.baidu.com/s/1sgBFLFkdTCDTn4oqHjGb9A?pwd=pdm5) or [google](https://drive.google.com/file/d/1qOsmWshnFMMr8x1HteaTViTSQLh_4rle/view?usp=drive_link)
+<details><summary>Click for ControlNet annotator weights (if you cannot access Hugging Face)</summary>
+
+You can download all the annotator checkpoints (such as DW-Pose, depth_zoe, depth_midas, and OpenPose; around 4 GB in total) from [baidu](https://pan.baidu.com/s/1sgBFLFkdTCDTn4oqHjGb9A?pwd=pdm5) or [google](https://drive.google.com/file/d/1qOsmWshnFMMr8x1HteaTViTSQLh_4rle/view?usp=drive_link)
 Then extract them into ./annotator/ckpts
 
+</details>
 
 ## 🔛 Prepare all the data
 
@@ -95,11 +84,12 @@ tar -zxvf videograin_data.tar.gz
 
 ## 🔥 VideoGrain Editing
 
-You could reproduce multi-grained editing results in our teaser by running:
+### Inference
+VideoGrain is a training-free framework. To run the inference script, use the following command:
 
 ```bash
 bash test.sh
-#or accelerate launch test.py --config config/instance_level/running_two_man/running_3cls_polar_spider_vis_weight.yaml
+# or: accelerate launch test.py --config config/part_level/adding_new_object/run_two_man/running_spider_polar_sunglass.yaml
 ```
 
 <details><summary>The result is saved at `./result`. (Click for directory structure)</summary>
@@ -107,12 +97,16 @@ bash test.sh
 ```
 result
 ├── run_two_man
+│ ├── control               # control condition
 │ ├── infer_samples
+│ ├── input                 # the input video frames
+│ ├── masked_video.mp4      # check whether the edit regions are accurately covered
 │ ├── sample
-│ ├── step_0 # result image folder
-│ ├── step_0.mp4 # result video
-│ ├── source_video.mp4 # the input video
-
+│ ├── step_0                # result image folder
+│ ├── step_0.mp4            # result video
+│ ├── source_video.mp4      # the input video
+│ ├── visualization_denoise # cross-attention weights
+│ ├── sd_study              # clustered inversion features
 ```
 
 </details>
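
After `bash test.sh` finishes, the result tree described in the updated README can be spot-checked with a few lines of Python. This is only a sketch based on the names listed above for the `run_two_man` example; the exact nesting (e.g. which entries live under `sample/`) may differ, so the search below walks the whole output directory rather than assuming a layout.

```python
import os

# Entries taken from the README's result tree for the run_two_man example.
result_dir = os.path.join("result", "run_two_man")
expected = [
    "control",                # control condition
    "infer_samples",
    "input",                  # the input video frames
    "masked_video.mp4",       # check that edit regions are accurately covered
    "sample",
    "step_0",                 # result image folder
    "step_0.mp4",             # result video
    "source_video.mp4",       # the input video
    "visualization_denoise",  # cross-attention weights
    "sd_study",               # clustered inversion features
]

for name in expected:
    # Walk the result directory so we do not depend on the exact nesting.
    found = any(name in dirs or name in files
                for _, dirs, files in os.walk(result_dir))
    status = "found" if found else "missing"
    print(f"{status:>7}  {name}")
```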
annotator/dwpose/__pycache__/wholebody.cpython-310.pyc CHANGED
Binary files a/annotator/dwpose/__pycache__/wholebody.cpython-310.pyc and b/annotator/dwpose/__pycache__/wholebody.cpython-310.pyc differ
 
annotator/dwpose/wholebody.py CHANGED
@@ -1,15 +1,32 @@
 import cv2
 import numpy as np
-
+import os
+os.environ["KMP_DUPLICATE_LIB_OK"]="TRUE"
 import onnxruntime as ort
 from .onnxdet import inference_detector
 from .onnxpose import inference_pose
+from annotator.util import annotator_ckpts_path
+
 
 class Wholebody:
     def __init__(self):
         device = 'cuda:0'
         providers = ['CPUExecutionProvider'
                      ] if device == 'cpu' else ['CUDAExecutionProvider']
+
+        remote_dw_pose_path = "https://huggingface.co/sxela/dwpose_ckpts/resolve/main/dw-ll_ucoco_384.onnx"
+        remote_yolox_path = "https://huggingface.co/sxela/dwpose_ckpts/resolve/main/yolox_l.onnx"
+
+        dw_pose_path = os.path.join(annotator_ckpts_path, "dw-ll_ucoco_384.onnx")
+        yolox_path = os.path.join(annotator_ckpts_path, "yolox_l.onnx")
+
+        if not os.path.exists(dw_pose_path):
+            from basicsr.utils.download_util import load_file_from_url
+            load_file_from_url(remote_dw_pose_path, model_dir=annotator_ckpts_path)
+        if not os.path.exists(yolox_path):
+            from basicsr.utils.download_util import load_file_from_url
+            load_file_from_url(remote_yolox_path, model_dir=annotator_ckpts_path)
+
         onnx_det = 'annotator/ckpts/yolox_l.onnx'
         onnx_pose = 'annotator/ckpts/dw-ll_ucoco_384.onnx'
 
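
The auto-download logic added above can also be exercised on its own. The following is a minimal standalone sketch using the same `basicsr` helper and the same checkpoint URLs; `annotator/ckpts` is hard-coded in place of `annotator_ckpts_path`, and the wrapper function `ensure_dwpose_checkpoints` is not part of the repository.

```python
import os

from basicsr.utils.download_util import load_file_from_url

# Same target directory and checkpoint URLs as in wholebody.py above.
ANNOTATOR_CKPTS = "annotator/ckpts"
CHECKPOINTS = {
    "dw-ll_ucoco_384.onnx": "https://huggingface.co/sxela/dwpose_ckpts/resolve/main/dw-ll_ucoco_384.onnx",
    "yolox_l.onnx": "https://huggingface.co/sxela/dwpose_ckpts/resolve/main/yolox_l.onnx",
}


def ensure_dwpose_checkpoints(ckpt_dir: str = ANNOTATOR_CKPTS) -> None:
    """Download the DW-Pose and YOLOX ONNX files only if they are missing."""
    os.makedirs(ckpt_dir, exist_ok=True)
    for filename, url in CHECKPOINTS.items():
        local_path = os.path.join(ckpt_dir, filename)
        if not os.path.exists(local_path):
            load_file_from_url(url, model_dir=ckpt_dir)


if __name__ == "__main__":
    ensure_dwpose_checkpoints()
```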
 
config/instance_level/running_two_man/running_3cls_polar_spider_vis_weight.yaml CHANGED
@@ -1,5 +1,5 @@
 pretrained_model_path: "./ckpt/stable-diffusion-v1-5"
-logdir: ./result/run_two_man/instance_level/3cls_vis_cross_attn_flag_test
+logdir: ./result/run_two_man/instance_level/3cls_spider_polar_vis_cross_attn
 
 dataset_config:
   path: "data/run_two_man/run_two_man_fr2"
requirements.txt CHANGED
@@ -65,4 +65,5 @@ scikit-learn==1.2.2
 nltk==3.8.1
 timm==0.6.7
 scikit-image==0.24.0
-gdown==5.1.0
+gdown==5.1.0
+basicsr-fixed
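
A quick way to confirm the environment matches the updated requirements is to compare installed versions against the pins shown above and to check that `basicsr` imports. This sketch assumes the unpinned `basicsr-fixed` package provides the usual `basicsr` module that `wholebody.py` imports `load_file_from_url` from.

```python
from importlib.metadata import PackageNotFoundError, version

# Version pins taken from the tail of requirements.txt shown above.
pins = {
    "nltk": "3.8.1",
    "timm": "0.6.7",
    "scikit-image": "0.24.0",
    "gdown": "5.1.0",
}

for name, expected in pins.items():
    try:
        installed = version(name)
    except PackageNotFoundError:
        print(f"{name}: not installed")
        continue
    note = "ok" if installed == expected else f"expected {expected}"
    print(f"{name}: {installed} ({note})")

# basicsr-fixed is unpinned; assumed here to expose the standard `basicsr` module.
try:
    from basicsr.utils.download_util import load_file_from_url  # noqa: F401
    print("basicsr: import ok")
except ImportError as exc:
    print(f"basicsr: import failed ({exc})")
```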
video_diffusion/common/__pycache__/image_util.cpython-310.pyc CHANGED
Binary files a/video_diffusion/common/__pycache__/image_util.cpython-310.pyc and b/video_diffusion/common/__pycache__/image_util.cpython-310.pyc differ