XiangpengYang committed on
Commit c7c6869 · 1 Parent(s): b0f868c
README.md CHANGED
@@ -43,7 +43,7 @@ https://github.com/user-attachments/assets/dc54bc11-48cc-4814-9879-bf2699ee9d1d
 * **[2025/1/23]** Our paper is accepted to [ICLR2025](https://openreview.net/forum?id=SSslAtcPB6)! Feel free to **watch** 👀 this repository for the latest updates.
 
 
-## ▶️ Setup Environment
+## 🍻 Setup Environment
 Our method is tested with CUDA 12.1, fp16 via accelerate, and xformers on a single L40 GPU.
 
 ```bash
@@ -68,23 +68,12 @@ You may download all the base model checkpoints using the following bash command
 bash download_all.sh
 ```
 
-Prepare the ControlNet annotator weights (e.g., DW-Pose, depth_zoe, depth_midas, OpenPose)
-
-```
-mkdir annotator/ckpts
-```
-Method 1: Download the DW-Pose models
-
-(Note: if you can access Hugging Face, other models such as depth_zoe are downloaded automatically)
-
-Download the DW-Pose model dw-ll_ucoco_384.onnx ([baidu](https://pan.baidu.com/s/1nuBjw-KKSxD_BkpmwXUJiw?pwd=28d7), [google](https://drive.google.com/file/d/12L8E2oAgZy4VACGSK9RaZBZrfgx7VTA2/view?usp=sharing)) and the detection model yolox_l.onnx ([baidu](https://pan.baidu.com/s/1fpfIVpv5ypo4c1bUlzkMYQ?pwd=mjdn), [google](https://drive.google.com/file/d/1w9pXC8tT0p9ndMN-CArp1__b2GbzewWI/view?usp=sharing)),
-then put them into ./annotator/ckpts.
-
-Method 2: Download all annotator checkpoints from Google Drive or Baidu (when Hugging Face is not accessible)
-
-If you cannot access Hugging Face, you can download all the annotator checkpoints (such as DW-Pose, depth_zoe, depth_midas, and OpenPose; around 4 GB in total) from [baidu](https://pan.baidu.com/s/1sgBFLFkdTCDTn4oqHjGb9A?pwd=pdm5) or [google](https://drive.google.com/file/d/1qOsmWshnFMMr8x1HteaTViTSQLh_4rle/view?usp=drive_link)
+<details><summary>Click for ControlNet annotator weights (if you cannot access Hugging Face)</summary>
+
+You can download all the annotator checkpoints (such as DW-Pose, depth_zoe, depth_midas, and OpenPose; around 4 GB in total) from [baidu](https://pan.baidu.com/s/1sgBFLFkdTCDTn4oqHjGb9A?pwd=pdm5) or [google](https://drive.google.com/file/d/1qOsmWshnFMMr8x1HteaTViTSQLh_4rle/view?usp=drive_link)
 Then extract them into ./annotator/ckpts
 
+</details>
 
 ## 🔛 Prepare all the data
 
@@ -95,11 +84,12 @@ tar -zxvf videograin_data.tar.gz
 
 ## 🔥 VideoGrain Editing
 
-You could reproduce multi-grained editing results in our teaser by running:
+### Inference
+VideoGrain is a training-free framework. To run the inference script, use the following command:
 
 ```bash
 bash test.sh
-#or accelerate launch test.py --config config/instance_level/running_two_man/running_3cls_polar_spider_vis_weight.yaml
+# or: accelerate launch test.py --config config/part_level/adding_new_object/run_two_man/running_spider_polar_sunglass.yaml
 ```
 
 <details><summary>The result is saved at `./result`. (Click for directory structure)</summary>
@@ -107,12 +97,16 @@ bash test.sh
 ```
 result
 ├── run_two_man
+│ ├── control               # control condition
 │ ├── infer_samples
+│ ├── input                 # the input video frames
+│ ├── masked_video.mp4      # check whether the edit regions are accurately covered
 │ ├── sample
-│ ├── step_0 # result image folder
-│ ├── step_0.mp4 # result video
-│ ├── source_video.mp4 # the input video
-
+│ ├── step_0                # result image folder
+│ ├── step_0.mp4            # result video
+│ ├── source_video.mp4      # the input video
+│ ├── visualization_denoise # cross-attention weights
+│ ├── sd_study              # clustered inversion features
 ```
 
 </details>
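
After `bash test.sh` finishes, the result tree described in the updated README can be spot-checked with a few lines of Python. This is only a sketch based on the names listed above for the `run_two_man` example; the exact nesting (e.g. which entries live under `sample/`) may differ, so the search below walks the whole output directory rather than assuming a layout.

```python
import os

# Entries taken from the README's result tree for the run_two_man example.
result_dir = os.path.join("result", "run_two_man")
expected = [
    "control",                # control condition
    "infer_samples",
    "input",                  # the input video frames
    "masked_video.mp4",       # check that edit regions are accurately covered
    "sample",
    "step_0",                 # result image folder
    "step_0.mp4",             # result video
    "source_video.mp4",       # the input video
    "visualization_denoise",  # cross-attention weights
    "sd_study",               # clustered inversion features
]

for name in expected:
    # Walk the result directory so we do not depend on the exact nesting.
    found = any(name in dirs or name in files
                for _, dirs, files in os.walk(result_dir))
    status = "found" if found else "missing"
    print(f"{status:>7}  {name}")
```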
annotator/dwpose/__pycache__/wholebody.cpython-310.pyc CHANGED
Binary files a/annotator/dwpose/__pycache__/wholebody.cpython-310.pyc and b/annotator/dwpose/__pycache__/wholebody.cpython-310.pyc differ
 
annotator/dwpose/wholebody.py CHANGED
@@ -1,15 +1,32 @@
 import cv2
 import numpy as np
-
+import os
+os.environ["KMP_DUPLICATE_LIB_OK"]="TRUE"
 import onnxruntime as ort
 from .onnxdet import inference_detector
 from .onnxpose import inference_pose
+from annotator.util import annotator_ckpts_path
+
 
 class Wholebody:
     def __init__(self):
         device = 'cuda:0'
         providers = ['CPUExecutionProvider'
                      ] if device == 'cpu' else ['CUDAExecutionProvider']
+
+        remote_dw_pose_path = "https://huggingface.co/sxela/dwpose_ckpts/resolve/main/dw-ll_ucoco_384.onnx"
+        remote_yolox_path = "https://huggingface.co/sxela/dwpose_ckpts/resolve/main/yolox_l.onnx"
+
+        dw_pose_path = os.path.join(annotator_ckpts_path, "dw-ll_ucoco_384.onnx")
+        yolox_path = os.path.join(annotator_ckpts_path, "yolox_l.onnx")
+
+        if not os.path.exists(dw_pose_path):
+            from basicsr.utils.download_util import load_file_from_url
+            load_file_from_url(remote_dw_pose_path, model_dir=annotator_ckpts_path)
+        if not os.path.exists(yolox_path):
+            from basicsr.utils.download_util import load_file_from_url
+            load_file_from_url(remote_yolox_path, model_dir=annotator_ckpts_path)
+
         onnx_det = 'annotator/ckpts/yolox_l.onnx'
         onnx_pose = 'annotator/ckpts/dw-ll_ucoco_384.onnx'
 
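
The auto-download logic added above can also be exercised on its own. The following is a minimal standalone sketch using the same `basicsr` helper and the same checkpoint URLs; `annotator/ckpts` is hard-coded in place of `annotator_ckpts_path`, and the wrapper function `ensure_dwpose_checkpoints` is not part of the repository.

```python
import os

from basicsr.utils.download_util import load_file_from_url

# Same target directory and checkpoint URLs as in wholebody.py above.
ANNOTATOR_CKPTS = "annotator/ckpts"
CHECKPOINTS = {
    "dw-ll_ucoco_384.onnx": "https://huggingface.co/sxela/dwpose_ckpts/resolve/main/dw-ll_ucoco_384.onnx",
    "yolox_l.onnx": "https://huggingface.co/sxela/dwpose_ckpts/resolve/main/yolox_l.onnx",
}


def ensure_dwpose_checkpoints(ckpt_dir: str = ANNOTATOR_CKPTS) -> None:
    """Download the DW-Pose and YOLOX ONNX files only if they are missing."""
    os.makedirs(ckpt_dir, exist_ok=True)
    for filename, url in CHECKPOINTS.items():
        local_path = os.path.join(ckpt_dir, filename)
        if not os.path.exists(local_path):
            load_file_from_url(url, model_dir=ckpt_dir)


if __name__ == "__main__":
    ensure_dwpose_checkpoints()
```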
 
config/instance_level/running_two_man/running_3cls_polar_spider_vis_weight.yaml CHANGED
@@ -1,5 +1,5 @@
 pretrained_model_path: "./ckpt/stable-diffusion-v1-5"
-logdir: ./result/run_two_man/instance_level/3cls_vis_cross_attn_flag_test
+logdir: ./result/run_two_man/instance_level/3cls_spider_polar_vis_cross_attn
 
 dataset_config:
   path: "data/run_two_man/run_two_man_fr2"
requirements.txt CHANGED
@@ -65,4 +65,5 @@ scikit-learn==1.2.2
 nltk==3.8.1
 timm==0.6.7
 scikit-image==0.24.0
-gdown==5.1.0
+gdown==5.1.0
+basicsr-fixed
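
A quick way to confirm the environment matches the updated requirements is to compare installed versions against the pins shown above and to check that `basicsr` imports. This sketch assumes the unpinned `basicsr-fixed` package provides the usual `basicsr` module that `wholebody.py` imports `load_file_from_url` from.

```python
from importlib.metadata import PackageNotFoundError, version

# Version pins taken from the tail of requirements.txt shown above.
pins = {
    "nltk": "3.8.1",
    "timm": "0.6.7",
    "scikit-image": "0.24.0",
    "gdown": "5.1.0",
}

for name, expected in pins.items():
    try:
        installed = version(name)
    except PackageNotFoundError:
        print(f"{name}: not installed")
        continue
    note = "ok" if installed == expected else f"expected {expected}"
    print(f"{name}: {installed} ({note})")

# basicsr-fixed is unpinned; assumed here to expose the standard `basicsr` module.
try:
    from basicsr.utils.download_util import load_file_from_url  # noqa: F401
    print("basicsr: import ok")
except ImportError as exc:
    print(f"basicsr: import failed ({exc})")
```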
video_diffusion/common/__pycache__/image_util.cpython-310.pyc CHANGED
Binary files a/video_diffusion/common/__pycache__/image_util.cpython-310.pyc and b/video_diffusion/common/__pycache__/image_util.cpython-310.pyc differ