XiangpengYang committed on
Commit 4ef0bb0 · 1 Parent(s): 2f117dd
Files changed (1)
  1. README.md +12 -370
README.md CHANGED
@@ -1,370 +1,12 @@
- # VideoGrain: Modulating Space-Time Attention for Multi-Grained Video Editing (ICLR 2025)
- ## [<a href="https://knightyxp.github.io/VideoGrain_project_page/" target="_blank">Project Page</a>]
-
- [![arXiv](https://img.shields.io/badge/arXiv-2502.17258-B31B1B.svg)](https://arxiv.org/abs/2502.17258)
- [![HuggingFace Daily Papers Top1](https://img.shields.io/static/v1?label=HuggingFace%20Daily%20Papers&message=Top1&color=blue)](https://huggingface.co/papers/2502.17258)
- [![Project page](https://img.shields.io/badge/Project-Page-brightgreen)](https://knightyxp.github.io/VideoGrain_project_page/)
- [![Full Data](https://img.shields.io/badge/Full-Data-brightgreen)](https://drive.google.com/file/d/1dzdvLnXWeMFR3CE2Ew0Bs06vyFSvnGXA/view?usp=drive_link)
- ![visitors](https://visitor-badge.laobi.icu/badge?page_id=knightyxp.VideoGrain&left_color=green&right_color=red)
- [![Demo Video - VideoGrain](https://img.shields.io/badge/Demo_Video-VideoGrain-red)](https://www.youtube.com/watch?v=XEM4Pex7F9E)
-
-
- ## Introduction
- VideoGrain is a zero-shot method for class-level, instance-level, and part-level video editing.
- - **Multi-grained Video Editing**
-   - class level: editing objects within the same class (previous SOTA methods are limited to this level)
-   - instance level: editing each individual instance into a distinct object
-   - part level: adding new objects or modifying attributes of existing objects at the part level
- - **Training-Free**
-   - requires no training or fine-tuning
- - **One-Prompt Multi-region Control & Deep Investigation of Cross-/Self-Attention**
-   - modulating cross-attention for multi-region control (visualizations available)
-   - modulating self-attention for feature decoupling (cluster visualizations available)
-
- <table class="center" border="1" cellspacing="0" cellpadding="5">
- <tr>
- <td colspan="2" style="text-align:center;"><img src="assets/teaser/class_level.gif" style="width:250px; height:auto;"></td>
- <td colspan="2" style="text-align:center;"><img src="assets/teaser/instance_part.gif" style="width:250px; height:auto;"></td>
- <td colspan="2" style="text-align:center;"><img src="assets/teaser/2monkeys.gif" style="width:250px; height:auto;"></td>
- </tr>
- <tr>
- <td colspan="2" style="text-align:right; width:250px;"> class level</td>
- <td colspan="1" style="text-align:center; width:125px;">instance level</td>
- <td colspan="1" style="text-align:center; width:125px;">part level</td>
- <td colspan="2" style="text-align:center; width:250px;">animal instances</td>
- </tr>
-
- <tr>
- <td colspan="2" style="text-align:center;"><img src="assets/teaser/2cats.gif" style="width:250px; height:auto;"></td>
- <td colspan="2" style="text-align:center;"><img src="assets/teaser/soap-box.gif" style="width:250px; height:auto;"></td>
- <td colspan="2" style="text-align:center;"><img src="assets/teaser/man-text-message.gif" style="width:250px; height:auto;"></td>
- </tr>
- <tr>
- <td colspan="2" style="text-align:center; width:250px;">animal instances</td>
- <td colspan="2" style="text-align:center; width:250px;">human instances</td>
- <td colspan="2" style="text-align:center; width:250px;">part-level modification</td>
- </tr>
- </table>
-
- ## 📀 Demo Video
- <!-- [![Demo Video of VideoGrain](https://res.cloudinary.com/dii3btvh8/image/upload/v1740987943/cover_video_y6cjfe.png)](https://www.youtube.com/watch?v=XEM4Pex7F9E "Demo Video of VideoGrain") -->
- https://github.com/user-attachments/assets/9bec92fc-21bd-4459-86fa-62404d8762bf
-
-
- ## 📣 News
- * **[2025/2/25]** VideoGrain was posted and recommended by Gradio on [LinkedIn](https://www.linkedin.com/posts/gradio_just-dropped-videograin-a-new-zero-shot-activity-7300094635094261760-hoiE) and [Twitter](https://x.com/Gradio/status/1894328911154028566), and recommended by [AK](https://x.com/_akhaliq/status/1894254599223017622).
- * **[2025/2/25]** VideoGrain was submitted by AK to [HuggingFace daily papers](https://huggingface.co/papers?date=2025-02-25) and ranked as the [#1](https://huggingface.co/papers/2502.17258) paper of that day.
- * **[2025/2/24]** We released our paper on [arXiv](https://arxiv.org/abs/2502.17258), together with the [code](https://github.com/knightyxp/VideoGrain) and the [full data](https://drive.google.com/file/d/1dzdvLnXWeMFR3CE2Ew0Bs06vyFSvnGXA/view?usp=drive_link) on Google Drive.
- * **[2025/1/23]** Our paper was accepted to [ICLR 2025](https://openreview.net/forum?id=SSslAtcPB6)! Welcome to **watch** 👀 this repository for the latest updates.
-
-
- ## 🍻 Setup Environment
- Our method was tested with CUDA 12.1, fp16 (via accelerate), and xformers on a single L40 GPU.
-
- ```bash
- # Step 1: Create and activate Conda environment
- conda create -n videograin python=3.10
- conda activate videograin
-
- # Step 2: Install PyTorch, CUDA and Xformers
- conda install pytorch==2.3.1 torchvision==0.18.1 torchaudio==2.3.1 pytorch-cuda=12.1 -c pytorch -c nvidia
- pip install --pre -U xformers==0.0.27
- # Step 3: Install additional dependencies with pip
- pip install -r requirements.txt
- ```
-
- `xformers` is recommended to reduce memory usage and running time.
-
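- To confirm the environment works before downloading any checkpoints, a quick sanity check (assuming the versions above) is:
-
- ```bash
- # verify that torch and xformers import cleanly and CUDA is visible
- python -c "import torch, xformers; print(torch.__version__, torch.cuda.is_available())"
- ```
-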
- You may download all the base model checkpoints using the following bash command:
- ```bash
- ## download SD 1.5 and ControlNet depth/pose v1.0/v1.1
- bash download_all.sh
- ```
-
- <details><summary>Click for ControlNet annotator weights (if you cannot access Hugging Face)</summary>
-
- You can download all the annotator checkpoints (DW-Pose, depth_zoe, depth_midas, and OpenPose; around 4 GB in total) from [Baidu](https://pan.baidu.com/s/1sgBFLFkdTCDTn4oqHjGb9A?pwd=pdm5) or [Google Drive](https://drive.google.com/file/d/1qOsmWshnFMMr8x1HteaTViTSQLh_4rle/view?usp=drive_link), then extract them into `./annotator/ckpts`.
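-
- A minimal extraction sketch (the archive name below is hypothetical; use whatever file name the link gives you):
-
- ```bash
- # extract the annotator weights into the expected directory
- mkdir -p ./annotator/ckpts
- tar -zxvf annotator_ckpts.tar.gz -C ./annotator/ckpts
- ```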
-
- </details>
-
- ## ⚡️ Prepare all the data
-
- ### Full VideoGrain Data
- We provide all the video data and layout masks used in VideoGrain at the link below. Please download and unzip the data, then put it under the `./data` root directory.
- ```bash
- # pip install gdown first, if it is not already available
- gdown --fuzzy 'https://drive.google.com/file/d/1dzdvLnXWeMFR3CE2Ew0Bs06vyFSvnGXA/view?usp=drive_link'
- tar -zxvf videograin_data.tar.gz
- ```
- ### Customize Your Own Data
- **Prepare video frames**
- If the input video is an mp4 file, use the following command to split it into frames:
- ```bash
- python image_util/sample_video2frames.py --video_path 'your video path' --output_dir './data/video_name/video_name'
- ```
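-
- If you prefer not to use the provided script, an approximately equivalent extraction with ffmpeg is sketched below (the zero-padded naming pattern is an assumption; match it to whatever `sample_video2frames.py` produces):
-
- ```bash
- # extract every frame as a zero-padded PNG sequence
- mkdir -p ./data/video_name/video_name
- ffmpeg -i 'your video path.mp4' -start_number 0 ./data/video_name/video_name/%05d.png
- ```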
- **Prepare layout masks**
- We segment videos using our ReLER lab's [SAM-Track](https://github.com/z-x-yang/Segment-and-Track-Anything). We suggest running SAM-Track's `app.py` in `gradio` mode to manually select the regions of the video you want to edit. We also provide a script, `image_util/process_webui_mask.py`, to convert masks from the SAM-Track output layout to the VideoGrain layout; a rough invocation sketch follows.
-
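- The exact arguments of `process_webui_mask.py` are not documented here, so the flags below are hypothetical placeholders; check the script itself (e.g. `python image_util/process_webui_mask.py --help`) for its real interface:
-
- ```bash
- # hypothetical flags: point the script at SAM-Track's mask output and your VideoGrain data directory
- python image_util/process_webui_mask.py --input_dir /path/to/samtrack/masks --output_dir ./data/video_name/layout_masks
- ```
-
-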
- ## 🔥🔥🔥 VideoGrain Editing
-
- ### 🎨 Inference
- You can reproduce the instance- and part-level results in our teaser by running:
-
- ```bash
- bash test.sh
- # or
- CUDA_VISIBLE_DEVICES=0 accelerate launch test.py --config config/part_level/adding_new_object/run_two_man/spider_polar_sunglass.yaml
- ```
-
- For the other instance-, part-, and class-level results on the VideoGrain project page and in the teaser, we provide all the data (video frames and layout masks) and the corresponding configs to reproduce them; see [🚀 Multi-Grained Video Editing](#multi-grained-video-editing-results).
-
- <details><summary>Results are saved at `./result`. (Click for directory structure)</summary>
-
- ```
- result
- ├── run_two_man
- │   ├── control                # control condition
- │   ├── infer_samples
- │   ├── input                  # the input video frames
- │   ├── masked_video.mp4       # check whether edit regions are accurately covered
- │   ├── sample
- │   ├── step_0                 # result image folder
- │   ├── step_0.mp4             # result video
- │   ├── source_video.mp4       # the input video
- │   ├── visualization_denoise  # cross-attention weights
- │   ├── sd_study               # clustered inversion features
- ```
- </details>
-
-
- ## Editing Guidance for YOUR Video
- ### 🔛 Prepare your config
-
- VideoGrain is a training-free framework. To run VideoGrain on your own video, modify `./config/demo_config.yaml` to fit your needs (a hedged example config sketch follows the list below):
-
- 1. Replace the pretrained model path and ControlNet path in your config. You can set the control_type to `dwpose`, `depth_zoe`, or `depth` (midas).
- 2. Prepare your video frames and layout masks (edit regions) with SAM-Track or SAM2, and reference them in the dataset config.
- 3. Change the `prompt` and extract each `local prompt` from the editing prompt. The local prompt order must match the layout mask order.
- 4. You can change the flatten resolution: 1 -> 64, 2 -> 16, 4 -> 8. (Commonly, flattening at resolution 64 works best.)
- 5. To ensure temporal consistency, you can set `use_pnp: True` and `inject_step: 5` or `10`. (Note: more than 10 PnP injection steps degrades multi-region editing.)
- 6. If you want to visualize the cross-attention weights, set `vis_cross_attn: True`.
- 7. If you want to cluster the DDIM inversion spatio-temporal video features, set `cluster_inversion_feature: True`.
-
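- As a concrete illustration of the options above, here is a minimal hypothetical config sketch. The field names follow the options mentioned in this README, but the exact keys and structure of `demo_config.yaml` may differ; treat this as an assumed example, not the file's verbatim schema:
-
- ```bash
- # write an assumed, minimal example config (field names are illustrative, paths hypothetical)
- cat > config/my_edit.yaml << 'EOF'
- pretrained_model_path: ./ckpt/stable-diffusion-v1-5   # hypothetical path
- control_type: dwpose              # or depth_zoe / depth (midas)
- use_pnp: True
- inject_step: 5                    # >10 degrades multi-region editing
- vis_cross_attn: False
- cluster_inversion_feature: False
- EOF
- ```
-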
- ### 😍 Editing your video
-
- ```bash
- bash test.sh
- # or
- CUDA_VISIBLE_DEVICES=0 accelerate launch test.py --config /path/to/the/config
- ```
-
- ## 🚀 Multi-Grained Video Editing Results
-
- ### 🌈 Multi-Grained Definition
- You can reproduce the multi-grained definition results with the following commands:
- ```bash
- CUDA_VISIBLE_DEVICES=0 accelerate launch test.py --config config/class_level/running_two_man/man2spider.yaml  # class-level
- # config/instance_level/running_two_man/4cls_spider_polar.yaml  # instance-level
- # config/part_level/adding_new_object/run_two_man/spider_polar_sunglass.yaml  # part-level
- ```
- <table class="center">
- <tr>
- <td width=25% style="text-align:center;">source video</td>
- <td width=25% style="text-align:center;">class level</td>
- <td width=25% style="text-align:center;">instance level</td>
- <td width=25% style="text-align:center;">part level</td>
- </tr>
- <tr>
- <td><img src="./assets/teaser/run_two_man.gif"></td>
- <td><img src="./assets/teaser/class_level_0.gif"></td>
- <td><img src="./assets/teaser/instance_level.gif"></td>
- <td><img src="./assets/teaser/part_level.gif"></td>
- </tr>
- </table>
-
- ## 💃 Instance-level Video Editing
- You can reproduce instance-level video editing results with the following command:
- ```bash
- CUDA_VISIBLE_DEVICES=0 accelerate launch test.py --config config/instance_level/running_two_man/running_3cls_iron_spider.yaml
- ```
-
- <table class="center">
- <tr>
- <td width=50% style="text-align:center;">running_two_man/3cls_iron_spider.yaml</td>
- <td width=50% style="text-align:center;">2_monkeys/2cls_teddy_bear_koala.yaml</td>
- </tr>
- <tr>
- <td><img src="assets/instance-level/left_iron_right_spider.gif"></td>
- <td><img src="assets/instance-level/teddy_koala.gif"></td>
- </tr>
- <tr>
- <td width=50% style="text-align:center;">badminton/2cls_wonder_woman_spiderman.yaml</td>
- <td width=50% style="text-align:center;">soap-box/soap-box.yaml</td>
- </tr>
- <tr>
- <td><img src="assets/instance-level/badminton.gif"></td>
- <td><img src="assets/teaser/soap-box.gif"></td>
- </tr>
- <tr>
- <td width=50% style="text-align:center;">2_cats/4cls_panda_vs_poddle.yaml</td>
- <td width=50% style="text-align:center;">2_cars/left_firetruck_right_bus.yaml</td>
- </tr>
- <tr>
- <td><img src="assets/instance-level/panda_vs_poddle.gif"></td>
- <td><img src="assets/instance-level/2cars.gif"></td>
- </tr>
- </table>
-
- ## 🕺 Part-level Video Editing
- You can reproduce part-level video editing results with the following command:
- ```bash
- CUDA_VISIBLE_DEVICES=0 accelerate launch test.py --config config/part_level/modification/man_text_message/blue_shirt.yaml
- ```
-
- <table class="center">
- <tr>
- <td><img src="assets/part-level/man_text_message.gif"></td>
- <td><img src="assets/part-level/blue-shirt.gif"></td>
- <td><img src="assets/part-level/black-suit.gif"></td>
- <td><img src="assets/part-level/cat_flower.gif"></td>
- <td><img src="assets/part-level/ginger_head.gif"></td>
- <td><img src="assets/part-level/ginger_body.gif"></td>
- </tr>
- <tr>
- <td width=15% style="text-align:center;">source video</td>
- <td width=15% style="text-align:center;">blue shirt</td>
- <td width=15% style="text-align:center;">black suit</td>
- <td width=15% style="text-align:center;">source video</td>
- <td width=15% style="text-align:center;">ginger head</td>
- <td width=15% style="text-align:center;">ginger body</td>
- </tr>
- <tr>
- <td><img src="assets/part-level/man_text_message.gif"></td>
- <td><img src="assets/part-level/superman.gif"></td>
- <td><img src="assets/part-level/superman+cap.gif"></td>
- <td><img src="assets/part-level/spin-ball.gif"></td>
- <td><img src="assets/part-level/superman_spin.gif"></td>
- <td><img src="assets/part-level/super_sunglass_spin.gif"></td>
- </tr>
- <tr>
- <td width=15% style="text-align:center;">source video</td>
- <td width=15% style="text-align:center;">superman</td>
- <td width=15% style="text-align:center;">superman + cap</td>
- <td width=15% style="text-align:center;">source video</td>
- <td width=15% style="text-align:center;">superman</td>
- <td width=15% style="text-align:center;">superman + sunglasses</td>
- </tr>
- </table>
-
- ## 🥳 Class-level Video Editing
- You can reproduce class-level video editing results with the following command:
- ```bash
- CUDA_VISIBLE_DEVICES=0 accelerate launch test.py --config config/class_level/wolf/wolf.yaml
- ```
-
- <table class="center">
- <tr>
- <td><img src="assets/class-level/wolf.gif"></td>
- <td><img src="assets/class-level/pig.gif"></td>
- <td><img src="assets/class-level/husky.gif"></td>
- <td><img src="assets/class-level/bear.gif"></td>
- <td><img src="assets/class-level/tiger.gif"></td>
- </tr>
- <tr>
- <td width=15% style="text-align:center;">input</td>
- <td width=15% style="text-align:center;">pig</td>
- <td width=15% style="text-align:center;">husky</td>
- <td width=15% style="text-align:center;">bear</td>
- <td width=15% style="text-align:center;">tiger</td>
- </tr>
- <tr>
- <td><img src="assets/class-level/tennis.gif"></td>
- <td><img src="assets/class-level/tennis_1cls.gif"></td>
- <td><img src="assets/class-level/tennis_3cls.gif"></td>
- <td><img src="assets/class-level/car-1.gif"></td>
- <td><img src="assets/class-level/posche.gif"></td>
- </tr>
- <tr>
- <td width=15% style="text-align:center;">input</td>
- <td width=15% style="text-align:center;">iron man</td>
- <td width=15% style="text-align:center;">Batman + snow court + iced wall</td>
- <td width=15% style="text-align:center;">input</td>
- <td width=15% style="text-align:center;">Porsche</td>
- </tr>
- </table>
-
-
- ## Solely Edit Specific Subjects, Keeping the Background Unchanged
- You can edit only specific subjects while keeping the background unchanged, using the following commands:
- ```bash
- CUDA_VISIBLE_DEVICES=0 accelerate launch test.py --config config/instance_level/soely_edit/only_left.yaml
- # --config config/instance_level/soely_edit/only_right.yaml
- # --config config/instance_level/soely_edit/joint_edit.yaml
- ```
-
- <table class="center">
- <tr>
- <td><img src="assets/soely_edit/input.gif"></td>
- <td><img src="assets/soely_edit/left.gif"></td>
- <td><img src="assets/soely_edit/right.gif"></td>
- <td><img src="assets/soely_edit/joint.gif"></td>
- </tr>
- <tr>
- <td width=25% style="text-align:center;">source video</td>
- <td width=25% style="text-align:center;">left→Iron Man</td>
- <td width=25% style="text-align:center;">right→Spiderman</td>
- <td width=25% style="text-align:center;">joint edit</td>
- </tr>
- </table>
-
- ## 🔍 Visualize Cross-Attention Weights
- You can visualize the cross-attention weights of an edit, using the following command:
- ```bash
- # set vis_cross_attn: True in your config
- CUDA_VISIBLE_DEVICES=0 accelerate launch test.py --config config/instance_level/running_two_man/3cls_spider_polar_vis_weight.yaml
- ```
-
- <table class="center">
- <tr>
- <td><img src="assets/soely_edit/input.gif"></td>
- <td><img src="assets/vis/edit.gif"></td>
- <td><img src="assets/vis/spiderman_weight.gif"></td>
- <td><img src="assets/vis/bear_weight.gif"></td>
- <td><img src="assets/vis/cherry_weight.gif"></td>
- </tr>
- <tr>
- <td width=20% style="text-align:center;">source video</td>
- <td width=20% style="text-align:center;">left→spiderman, right→polar bear, trees→cherry blossoms</td>
- <td width=20% style="text-align:center;">spiderman weight</td>
- <td width=20% style="text-align:center;">bear weight</td>
- <td width=20% style="text-align:center;">cherry weight</td>
- </tr>
- </table>
-
- ## ✏️ Citation
- If you find this project helpful, please feel free to leave a star ⭐️ and cite our paper:
- ```bibtex
- @article{yang2025videograin,
-   title={VideoGrain: Modulating Space-Time Attention for Multi-grained Video Editing},
-   author={Yang, Xiangpeng and Zhu, Linchao and Fan, Hehe and Yang, Yi},
-   journal={arXiv preprint arXiv:2502.17258},
-   year={2025}
- }
- ```
-
- ## 📞 Contact Authors
- Xiangpeng Yang [@knightyxp](https://github.com/knightyxp), email: [email protected]/[email protected]
-
- ## ✨ Acknowledgements
-
- - This code builds on [diffusers](https://github.com/huggingface/diffusers) and [FateZero](https://github.com/ChenyangQiQi/FateZero). Thanks for open-sourcing!
- - We would like to thank [AK (@_akhaliq)](https://x.com/_akhaliq/status/1894254599223017622) and the Gradio team for the recommendation!
-
-
- ## ⭐️ Star History
-
- [![Star History Chart](https://api.star-history.com/svg?repos=knightyxp/VideoGrain&type=Date)](https://star-history.com/#knightyxp/VideoGrain&Date)
+ ---
+ title: VideoGrain
+ emoji: 🔥
+ colorFrom: gray
+ colorTo: pink
+ sdk: gradio
+ sdk_version: 4.10.0
+ app_file: app.py
+ pinned: false
+ license: mit
+ ---
+ Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference