James Zhou committed on
Commit
9c301e6
·
1 Parent(s): 1062761

[update] readme

  1. README.md +20 -10
README.md CHANGED
@@ -28,7 +28,7 @@ extra_gated_eu_disallowed: true
 
  [![Project Page](https://img.shields.io/badge/🌐_Project-Page-green.svg?style=for-the-badge)](https://szczesnys.github.io/hunyuanvideo-foley)
  [![Code](https://img.shields.io/badge/Code-GitHub-blue.svg?style=for-the-badge&logo=github)](https://github.com/Tencent-Hunyuan/HunyuanVideo-Foley)
- [![Paper](https://img.shields.io/badge/Paper-arXiv-red.svg?style=for-the-badge&logo=arxiv)](https://arxiv.org/abs/2506.17201)
  [![Model](https://img.shields.io/badge/Model-Huggingface-yellow.svg?style=for-the-badge&logo=huggingface)](https://huggingface.co/tencent/HunyuanVideo-Foley)
 
  </div>
@@ -291,8 +291,14 @@ pip install -r requirements.txt
 
  <div style="background: #d1ecf1; padding: 15px; border-radius: 8px; border-left: 4px solid #17a2b8; margin: 10px 0;">
 
- 🔗 **Model weights and detailed download instructions will be available soon!**
- <!-- The details of download pretrained models are shown [here](ckpts/README.md). -->
 
  </div>
 
@@ -311,7 +317,7 @@ Generate Foley audio for a single video file with text description:
 
  ```bash
  python3 infer.py \
- --model-path MODEL_PATH_DIR \
  --config_path ./configs/hunyuanvideo-foley-xxl.yaml \
  --single_video video_path \
  --single_prompt "audio description" \
@@ -328,7 +334,7 @@ Process multiple videos using a CSV file with video paths and descriptions:
 
  ```bash
  python3 infer.py \
- --model_path MODEL_PATH_DIR \
  --config_path ./configs/hunyuanvideo-foley-xxl.yaml \
  --csv_path assets/test.csv \
  --output_dir OUTPUT_DIR
@@ -343,6 +349,7 @@ Launch a user-friendly Gradio web interface for easy interaction:
  </div>
 
  ```bash
  python3 gradio_app.py
  ```
 
@@ -363,11 +370,14 @@ If you find **HunyuanVideo-Foley** useful for your research, please consider cit
  </div>
 
  ```bibtex
- @article{hunyuanvideo-foley2025,
- title={HunyuanVideo-Foley: Multimodal Diffusion with Representation Alignment for High-Fidelity Foley Audio Generation},
- author={Sizhe Shan and Qiulin Li and Yutao Cui and Miles Yang and Zhao Zhong and Yuehai Wang and Qun Yang and Jin Zhou},
- journal={arXiv preprint arXiv:2506.17201},
- year={2025}
  }
  ```
 
 
 
  [![Project Page](https://img.shields.io/badge/🌐_Project-Page-green.svg?style=for-the-badge)](https://szczesnys.github.io/hunyuanvideo-foley)
  [![Code](https://img.shields.io/badge/Code-GitHub-blue.svg?style=for-the-badge&logo=github)](https://github.com/Tencent-Hunyuan/HunyuanVideo-Foley)
+ [![Paper](https://img.shields.io/badge/Paper-arXiv-red.svg?style=for-the-badge&logo=arxiv)](https://arxiv.org/abs/2508.16930)
  [![Model](https://img.shields.io/badge/Model-Huggingface-yellow.svg?style=for-the-badge&logo=huggingface)](https://huggingface.co/tencent/HunyuanVideo-Foley)
 
  </div>
 
 
  <div style="background: #d1ecf1; padding: 15px; border-radius: 8px; border-left: 4px solid #17a2b8; margin: 10px 0;">
 
+ 🔗 **Download Model weights from Huggingface**
+ ```bash
+ # using git-lfs
+ git clone https://huggingface.co/tencent/HunyuanVideo-Foley
+
+ # using huggingface-cli
+ huggingface-cli download tencent/HunyuanVideo-Foley
+ ```
 
  </div>
 
 
 
  ```bash
  python3 infer.py \
+ --model_path PRETRAINED_MODEL_PATH_DIR \
  --config_path ./configs/hunyuanvideo-foley-xxl.yaml \
  --single_video video_path \
  --single_prompt "audio description" \
 
 
  ```bash
  python3 infer.py \
+ --model_path PRETRAINED_MODEL_PATH_DIR \
  --config_path ./configs/hunyuanvideo-foley-xxl.yaml \
  --csv_path assets/test.csv \
  --output_dir OUTPUT_DIR
 
  </div>
 
  ```bash
+ export HIFI_FOLEY_MODEL_PATH=PRETRAINED_MODEL_PATH_DIR
  python3 gradio_app.py
  ```
 
 
  </div>
 
  ```bibtex
+ @misc{shan2025hunyuanvideofoleymultimodaldiffusionrepresentation,
+ title={HunyuanVideo-Foley: Multimodal Diffusion with Representation Alignment for High-Fidelity Foley Audio Generation},
+ author={Sizhe Shan and Qiulin Li and Yutao Cui and Miles Yang and Yuehai Wang and Qun Yang and Jin Zhou and Zhao Zhong},
+ year={2025},
+ eprint={2508.16930},
+ archivePrefix={arXiv},
+ primaryClass={eess.AS},
+ url={https://arxiv.org/abs/2508.16930},
  }
  ```
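
After this change, the inference commands and the Gradio launch all point at the same downloaded weights directory (`PRETRAINED_MODEL_PATH_DIR`). The following is a minimal shell sketch of how that directory might be prepared and checked once, then reused. The helper names `check_model_dir`, `download_weights`, and `launch_gradio` are hypothetical and not part of the repository, and the sketch assumes the installed `huggingface-cli` supports the `--local-dir` option.

```bash
#!/usr/bin/env bash
# Hypothetical helpers for illustration; none of these exist in the repository.

# Return 0 if $1 is an existing directory that contains at least one entry.
# Otherwise print a reason to stderr and return 1.
check_model_dir() {
  local dir="$1"
  if [ -z "$dir" ]; then
    echo "model path is not set" >&2
    return 1
  fi
  if [ ! -d "$dir" ]; then
    echo "model path is not a directory: $dir" >&2
    return 1
  fi
  if [ -z "$(ls -A "$dir")" ]; then
    echo "model directory has no files: $dir" >&2
    return 1
  fi
  return 0
}

# Download the weights into a directory chosen by the caller.
# Assumption: --local-dir is available in the installed huggingface-cli.
download_weights() {
  local dir="$1"
  huggingface-cli download tencent/HunyuanVideo-Foley --local-dir "$dir"
}

# Launch the Gradio app only after the directory passes the check.
# HIFI_FOLEY_MODEL_PATH is the variable exported in the README change above.
launch_gradio() {
  local dir="$1"
  check_model_dir "$dir" || return 1
  HIFI_FOLEY_MODEL_PATH="$dir" python3 gradio_app.py
}
```

A possible usage would be `download_weights ./pretrained && launch_gradio ./pretrained`, with the same directory then passed to `infer.py` via `--model_path`. The directory name `./pretrained` is only an example.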