James Zhou committed
Commit · 9c301e6
1 Parent(s): 1062761
[update] readme
README.md CHANGED
@@ -28,7 +28,7 @@ extra_gated_eu_disallowed: true

 [](https://szczesnys.github.io/hunyuanvideo-foley)
 [](https://github.com/Tencent-Hunyuan/HunyuanVideo-Foley)
-[](https://arxiv.org/abs/
+[](https://arxiv.org/abs/2508.16930)
 [](https://huggingface.co/tencent/HunyuanVideo-Foley)

 </div>
@@ -291,8 +291,14 @@ pip install -r requirements.txt

 <div style="background: #d1ecf1; padding: 15px; border-radius: 8px; border-left: 4px solid #17a2b8; margin: 10px 0;">

-**Model weights
-
+**Download Model weights from Huggingface**
+```bash
+# using git-lfs
+git clone https://huggingface.co/tencent/HunyuanVideo-Foley
+
+# using huggingface-cli
+huggingface-cli download tencent/HunyuanVideo-Foley
+```

 </div>

@@ -311,7 +317,7 @@ Generate Foley audio for a single video file with text description:

 ```bash
 python3 infer.py \
-    --
+    --model_path PRETRAINED_MODEL_PATH_DIR \
     --config_path ./configs/hunyuanvideo-foley-xxl.yaml \
     --single_video video_path \
     --single_prompt "audio description" \
@@ -328,7 +334,7 @@ Process multiple videos using a CSV file with video paths and descriptions:

 ```bash
 python3 infer.py \
-    --model_path
+    --model_path PRETRAINED_MODEL_PATH_DIR \
     --config_path ./configs/hunyuanvideo-foley-xxl.yaml \
     --csv_path assets/test.csv \
     --output_dir OUTPUT_DIR
@@ -343,6 +349,7 @@ Launch a user-friendly Gradio web interface for easy interaction:
 </div>

 ```bash
+export HIFI_FOLEY_MODEL_PATH=PRETRAINED_MODEL_PATH_DIR
 python3 gradio_app.py
 ```

@@ -363,11 +370,14 @@ If you find **HunyuanVideo-Foley** useful for your research, please consider cit
 </div>

 ```bibtex
-@
-
-
-
-
+@misc{shan2025hunyuanvideofoleymultimodaldiffusionrepresentation,
+title={HunyuanVideo-Foley: Multimodal Diffusion with Representation Alignment for High-Fidelity Foley Audio Generation},
+author={Sizhe Shan and Qiulin Li and Yutao Cui and Miles Yang and Yuehai Wang and Qun Yang and Jin Zhou and Zhao Zhong},
+year={2025},
+eprint={2508.16930},
+archivePrefix={arXiv},
+primaryClass={eess.AS},
+url={https://arxiv.org/abs/2508.16930},
 }
 ```

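After this change both entry points need to be told where the downloaded weights live: `infer.py` through `--model_path`, and `gradio_app.py` through the `HIFI_FOLEY_MODEL_PATH` environment variable. The following is a minimal sketch of a guard that checks the directory before either command is launched. The function name `resolve_model_path` is hypothetical and not part of the repository, and it only verifies that the directory exists, not that it contains valid weights.

```bash
#!/usr/bin/env bash
# Hypothetical helper, not part of the HunyuanVideo-Foley repository.
# Prints the weights directory on stdout, or fails with a message on stderr.
# Precedence: first argument, then the HIFI_FOLEY_MODEL_PATH variable.
resolve_model_path() {
    local dir="${1:-${HIFI_FOLEY_MODEL_PATH:-}}"
    if [ -z "$dir" ]; then
        echo "error: pass a directory or set HIFI_FOLEY_MODEL_PATH" >&2
        return 1
    fi
    if [ ! -d "$dir" ]; then
        echo "error: '$dir' is not a directory" >&2
        return 1
    fi
    printf '%s\n' "$dir"
}
```

One possible use is `MODEL_DIR="$(resolve_model_path PRETRAINED_MODEL_PATH_DIR)" || exit 1`, followed by either `--model_path "$MODEL_DIR"` on the `infer.py` command line or `export HIFI_FOLEY_MODEL_PATH="$MODEL_DIR"` before `python3 gradio_app.py`. Failing early here avoids starting Python only to hit a missing-path error later.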