Update README.md
README.md CHANGED
@@ -25,51 +25,19 @@ This repo contains official PyTorch model definitions, pre-trained weights and in
## 🔥🔥🔥 News!!
* Mar 07, 2025: 🔥 We have fixed the bug in our open-source version that caused ID changes. Please try the new model weights of [HunyuanVideo-I2V](https://huggingface.co/tencent/HunyuanVideo-I2V) to ensure full visual consistency in the first frame and produce higher-quality videos.
* Mar 06, 2025: 👋 We release the inference code and model weights of HunyuanVideo-I2V. [Download](https://github.com/Tencent/HunyuanVideo-I2V/blob/main/ckpts/README.md).
- <!-- ### First Frame Consistency Demo
- | Reference Image | Generated Video |
- |:----------------:|:----------------:|
- | <img src="https://github.com/user-attachments/assets/83e7a097-ffca-40db-9c72-be01d866aa7d" width="80%"> | <video src="https://github.com/user-attachments/assets/f81d2c88-bb1a-43f8-b40f-1ccc20774563" width="100%"> </video> |
- | <img src="https://github.com/user-attachments/assets/c385a11f-60c7-4919-b0f1-bc5e715f673c" width="80%"> | <video src="https://github.com/user-attachments/assets/0c29ede9-0481-4d40-9c67-a4b6267fdc2d" width="100%"> </video> |
- | <img src="https://github.com/user-attachments/assets/5763f5eb-0be5-4b36-866a-5199e31c5802" width="95%"> | <video src="https://github.com/user-attachments/assets/a8da0a1b-ba7d-45a4-a901-5d213ceaf50e" width="100%"> </video> |
- -->
- <!-- ### Customizable I2V LoRA Demo
-
- | I2V LoRA Effect | Reference Image | Generated Video |
- |:---------------:|:--------------------------------:|:----------------:|
- | Hair growth | <img src="./assets/demo/i2v_lora/imgs/hair_growth.png" width="40%"> | <video src="https://github.com/user-attachments/assets/06b998ae-bbde-4c1f-96cb-a25a9197d5cb" width="100%"> </video> |
- | Embrace | <img src="./assets/demo/i2v_lora/imgs/embrace.png" width="40%"> | <video src="https://github.com/user-attachments/assets/f8c99eb1-2a43-489a-ba02-6bd50a6dd260" width="100%"> </video> |
- <!-- | Hair growth | <img src="./assets/demo/i2v_lora/imgs/hair_growth.png" width="40%"> | <video src="https://github.com/user-attachments/assets/06b998ae-bbde-4c1f-96cb-a25a9197d5cb" width="100%" poster="./assets/demo/i2v_lora/imgs/hair_growth.png"> </video> |
- | Embrace | <img src="./assets/demo/i2v_lora/imgs/embrace.png" width="40%"> | <video src="https://github.com/user-attachments/assets/f8c99eb1-2a43-489a-ba02-6bd50a6dd260" width="100%" poster="./assets/demo/i2v_lora/imgs/hair_growth.png"> </video> | -->
-
- <!-- ## 🧩 Community Contributions -->
-
- <!-- If you develop/use HunyuanVideo-I2V in your projects, welcome to let us know. -->
-
- <!-- - ComfyUI-Kijai (FP8 Inference, V2V and IP2V Generation): [ComfyUI-HunyuanVideoWrapper](https://github.com/kijai/ComfyUI-HunyuanVideoWrapper) by [Kijai](https://github.com/kijai) -->
- <!-- - ComfyUI-Native (Native Support): [ComfyUI-HunyuanVideo](https://comfyanonymous.github.io/ComfyUI_examples/hunyuan_video/) by [ComfyUI Official](https://github.com/comfyanonymous/ComfyUI) -->
-
- <!-- - FastVideo (Consistency Distilled Model and Sliding Tile Attention): [FastVideo](https://github.com/hao-ai-lab/FastVideo) and [Sliding Tile Attention](https://hao-ai-lab.github.io/blogs/sta/) by [Hao AI Lab](https://hao-ai-lab.github.io/)
- - HunyuanVideo-gguf (GGUF Version and Quantization): [HunyuanVideo-gguf](https://huggingface.co/city96/HunyuanVideo-gguf) by [city96](https://huggingface.co/city96)
- - Enhance-A-Video (Better Generated Video for Free): [Enhance-A-Video](https://github.com/NUS-HPC-AI-Lab/Enhance-A-Video) by [NUS-HPC-AI-Lab](https://ai.comp.nus.edu.sg/)
- - TeaCache (Cache-based Acceleration): [TeaCache](https://github.com/LiewFeng/TeaCache) by [Feng Liu](https://github.com/LiewFeng)
- - HunyuanVideoGP (GPU-Poor version): [HunyuanVideoGP](https://github.com/deepbeepmeep/HunyuanVideoGP) by [DeepBeepMeep](https://github.com/deepbeepmeep)
- -->
-
-
-

## 📑 Open-source Plan
- HunyuanVideo-I2V (Image-to-Video Model)
  - [x] Inference
  - [x] Checkpoints
  - [x] ComfyUI
  - [x] Lora training scripts
-   - [
-   - [ ] Diffusers
-   - [ ] FP8 Quantized weight

## Contents
- [**HunyuanVideo-I2V** 🌅](#hunyuanvideo-i2v-)
@@ -91,6 +59,8 @@ This repo contains official PyTorch model definitions, pre-trained weights and in
- [Training data construction](#training-data-construction)
- [Training](#training)
- [Inference](#inference)
- [🔗 BibTeX](#-bibtex)
- [Acknowledgements](#acknowledgements)
---

@@ -107,6 +77,7 @@ The overall architecture of our system is designed to maximize the synergy betwe

## 📜 Requirements

The following table shows the requirements for running the HunyuanVideo-I2V model (batch size = 1) to generate videos:

@@ -153,6 +124,9 @@ python -m pip install -r requirements.txt

# 5. Install flash attention v2 for acceleration (requires CUDA 11.8 or above)
python -m pip install ninja
python -m pip install git+https://github.com/Dao-AILab/flash-attention.git@v2.6.3
```

If you run into a floating point exception (core dump) on a specific GPU type, you may try the following solutions:

@@ -167,8 +141,8 @@ Additionally, HunyuanVideo-I2V also provides a pre-built Docker image. Use the f

```shell
# For CUDA 12.4 (updated to avoid floating point exception)
- docker pull hunyuanvideo/hunyuanvideo-i2v:
- docker run -itd --gpus all --init --net=host --uts=host --ipc=host --name hunyuanvideo-i2v --security-opt=seccomp=unconfined --ulimit=stack=67108864 --ulimit=memlock=-1 --privileged hunyuanvideo/hunyuanvideo-i2v:
```

@@ -341,6 +315,98 @@ We list some LoRA-specific configurations for easy usage:

| `--lora-scale` | 1.0 | Fusion scale for LoRA model. |
| `--lora-path` | "" | Weight path for LoRA model. |

## 🔗 BibTeX

@@ -365,6 +431,8 @@ We would like to thank the contributors to the [SD3](https://huggingface.co/stab

We also thank the Tencent Hunyuan Multimodal team for their help with the text encoder.

<!-- ## Github Star History
<a href="https://star-history.com/#Tencent/HunyuanVideo&Date">
<picture>
## 🔥🔥🔥 News!!
+ * Mar 13, 2025: 🎉 We release the parallel inference code for HunyuanVideo-I2V powered by [xDiT](https://github.com/xdit-project/xDiT).
* Mar 07, 2025: 🔥 We have fixed the bug in our open-source version that caused ID changes. Please try the new model weights of [HunyuanVideo-I2V](https://huggingface.co/tencent/HunyuanVideo-I2V) to ensure full visual consistency in the first frame and produce higher-quality videos.
* Mar 06, 2025: 👋 We release the inference code and model weights of HunyuanVideo-I2V. [Download](https://github.com/Tencent/HunyuanVideo-I2V/blob/main/ckpts/README.md).

## 📑 Open-source Plan
- HunyuanVideo-I2V (Image-to-Video Model)
  - [x] Inference
  - [x] Checkpoints
  - [x] ComfyUI
  - [x] Lora training scripts
+   - [x] Multi-GPU Sequence Parallel inference (faster inference speed on more GPUs)
+   - [ ] Diffusers

## Contents
- [**HunyuanVideo-I2V** 🌅](#hunyuanvideo-i2v-)
- [Training data construction](#training-data-construction)
- [Training](#training)
- [Inference](#inference)
+ - [🚀 Parallel Inference on Multiple GPUs by xDiT](#-parallel-inference-on-multiple-gpus-by-xdit)
+ - [Using Command Line](#using-command-line-1)
- [🔗 BibTeX](#-bibtex)
- [Acknowledgements](#acknowledgements)
---

+
## 📜 Requirements

The following table shows the requirements for running the HunyuanVideo-I2V model (batch size = 1) to generate videos:

# 5. Install flash attention v2 for acceleration (requires CUDA 11.8 or above)
python -m pip install ninja
python -m pip install git+https://github.com/Dao-AILab/flash-attention.git@v2.6.3
+
+ # 6. Install xDiT for parallel inference (torch 2.4.0 and flash-attn 2.6.3 are recommended)
+ python -m pip install xfuser==0.4.0
```
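
Once the dependencies are in place, a one-line import check can confirm that the accelerated attention and xDiT packages resolve correctly. This is an optional sketch; it assumes both packages expose a `__version__` attribute:

```bash
# Optional sanity check: both imports should succeed and print the pinned versions
# (expected: flash-attn 2.6.3 and xfuser 0.4.0 per the install steps above).
python -c "import flash_attn, xfuser; print(flash_attn.__version__, xfuser.__version__)"
```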

If you run into a floating point exception (core dump) on a specific GPU type, you may try the following solutions:

```shell
# For CUDA 12.4 (updated to avoid floating point exception)
+ docker pull hunyuanvideo/hunyuanvideo-i2v:cuda12
+ docker run -itd --gpus all --init --net=host --uts=host --ipc=host --name hunyuanvideo-i2v --security-opt=seccomp=unconfined --ulimit=stack=67108864 --ulimit=memlock=-1 --privileged hunyuanvideo/hunyuanvideo-i2v:cuda12
```
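
Because the container is launched detached (`-itd`), attach a shell to it to work inside:

```shell
# Open an interactive shell in the container started by the command above.
docker exec -it hunyuanvideo-i2v bash
```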

| `--lora-scale` | 1.0 | Fusion scale for LoRA model. |
| `--lora-path` | "" | Weight path for LoRA model. |
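
As a usage sketch for these two options, a hypothetical single-GPU command might look like the following. The LoRA weight path is a placeholder, the other flags mirror the sample command in the parallel-inference section below, and if the full options table defines an enabling switch (e.g. a `--use-lora` flag, assumed here), it must be added as well:

```bash
# Hypothetical sketch: fuse a trained LoRA at full strength during sampling.
# --lora-path is a placeholder; point it at your trained weights.
python3 sample_image2video.py \
    --model HYVideo-T/2 \
    --i2v-mode \
    --i2v-image-path ./assets/demo/i2v/imgs/0.jpg \
    --prompt "An Asian man with short hair in black tactical uniform and white clothes waves a firework stick." \
    --infer-steps 50 \
    --video-length 129 \
    --save-path ./results \
    --lora-scale 1.0 \
    --lora-path ./ckpts/your_lora_weights.safetensors
```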

+ ## 🚀 Parallel Inference on Multiple GPUs by xDiT
+
+ [xDiT](https://github.com/xdit-project/xDiT) is a scalable inference engine for Diffusion Transformers (DiTs) on multi-GPU clusters.
+ It has provided low-latency parallel inference solutions for a variety of DiT models, including mochi-1, CogVideoX, Flux.1, SD3, etc. This repo adopts the [Unified Sequence Parallelism (USP)](https://arxiv.org/abs/2405.07719) APIs for parallel inference of the HunyuanVideo-I2V model.
+
+ ### Using Command Line
+
+ For example, to generate a video with 8 GPUs, you can use the following command:
+
+ ```bash
+ cd HunyuanVideo-I2V
+
+ torchrun --nproc_per_node=8 sample_image2video.py \
+     --model HYVideo-T/2 \
+     --prompt "An Asian man with short hair in black tactical uniform and white clothes waves a firework stick." \
+     --i2v-mode \
+     --i2v-image-path ./assets/demo/i2v/imgs/0.jpg \
+     --i2v-resolution 720p \
+     --i2v-stability \
+     --infer-steps 50 \
+     --video-length 129 \
+     --flow-reverse \
+     --flow-shift 7.0 \
+     --seed 0 \
+     --embedded-cfg-scale 6.0 \
+     --save-path ./results \
+     --ulysses-degree 8 \
+     --ring-degree 1 \
+     --video-size 1280 720 \
+     --xdit-adaptive-size
+ ```
+
+ You can change `--ulysses-degree` and `--ring-degree` to control the parallel configuration for the best performance.
+ Note that you need to set `--video-size`, since xDiT's acceleration mechanism places requirements on the size of the generated video.
+ To prevent black padding after the original image height/width is converted to the target height/width, you can use `--xdit-adaptive-size`.
+ The valid parallel configurations are shown in the following table.
+
+ <details>
+ <summary>Supported Parallel Configurations (Click to expand)</summary>
+
+ | --video-size | --video-length | --ulysses-degree x --ring-degree | --nproc_per_node |
+ |----------------------|----------------|----------------------------------|------------------|
+ | 1280 720 or 720 1280 | 129 | 8x1,4x2,2x4,1x8 | 8 |
+ | 1280 720 or 720 1280 | 129 | 1x5 | 5 |
+ | 1280 720 or 720 1280 | 129 | 4x1,2x2,1x4 | 4 |
+ | 1280 720 or 720 1280 | 129 | 3x1,1x3 | 3 |
+ | 1280 720 or 720 1280 | 129 | 2x1,1x2 | 2 |
+ | 1104 832 or 832 1104 | 129 | 4x1,2x2,1x4 | 4 |
+ | 1104 832 or 832 1104 | 129 | 3x1,1x3 | 3 |
+ | 1104 832 or 832 1104 | 129 | 2x1,1x2 | 2 |
+ | 960 960 | 129 | 6x1,3x2,2x3,1x6 | 6 |
+ | 960 960 | 129 | 4x1,2x2,1x4 | 4 |
+ | 960 960 | 129 | 3x1,1x3 | 3 |
+ | 960 960 | 129 | 1x2,2x1 | 2 |
+ | 960 544 or 544 960 | 129 | 6x1,3x2,2x3,1x6 | 6 |
+ | 960 544 or 544 960 | 129 | 4x1,2x2,1x4 | 4 |
+ | 960 544 or 544 960 | 129 | 3x1,1x3 | 3 |
+ | 960 544 or 544 960 | 129 | 1x2,2x1 | 2 |
+ | 832 624 or 624 832 | 129 | 4x1,2x2,1x4 | 4 |
+ | 832 624 or 624 832 | 129 | 3x1,1x3 | 3 |
+ | 832 624 or 624 832 | 129 | 2x1,1x2 | 2 |
+ | 720 720 | 129 | 1x5 | 5 |
+ | 720 720 | 129 | 3x1,1x3 | 3 |
+
+ </details>
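
As a concrete reading of the table: a 4-GPU run at 960x960 can use the 2x2 entry, splitting the sequence parallelism evenly between Ulysses and ring attention. A sketch of that invocation follows, showing only the parallelism-relevant flags; in practice, carry over the sampling flags from the 8-GPU example above:

```bash
# 4 GPUs, hybrid USP: --ulysses-degree x --ring-degree must equal --nproc_per_node (2 x 2 = 4).
torchrun --nproc_per_node=4 sample_image2video.py \
    --model HYVideo-T/2 \
    --i2v-mode \
    --i2v-image-path ./assets/demo/i2v/imgs/0.jpg \
    --prompt "An Asian man with short hair in black tactical uniform and white clothes waves a firework stick." \
    --video-size 960 960 \
    --video-length 129 \
    --ulysses-degree 2 \
    --ring-degree 2 \
    --xdit-adaptive-size \
    --save-path ./results
```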
+
+
+ <p align="center">
+ <table align="center">
+ <thead>
+ <tr>
+     <th colspan="4">Latency (sec) for 1280x720, 129 frames, 50 steps, by number of GPUs</th>
+ </tr>
+ <tr>
+     <th>1</th>
+     <th>2</th>
+     <th>4</th>
+     <th>8</th>
+ </tr>
+ </thead>
+ <tbody>
+ <tr>
+     <td>1904.08</td>
+     <td>934.09 (2.04x)</td>
+     <td>514.08 (3.70x)</td>
+     <td>337.58 (5.64x)</td>
+ </tr>
+ </tbody>
+ </table>
+ </p>
+

## 🔗 BibTeX

We also thank the Tencent Hunyuan Multimodal team for their help with the text encoder.
+
+
<!-- ## Github Star History
<a href="https://star-history.com/#Tencent/HunyuanVideo&Date">
<picture>