Commit 1d9bce3 by ginipick (verified) · 1 parent: dbad390

Delete ORIGINAL_README.md

Files changed (1): ORIGINAL_README.md +0 -163
ORIGINAL_README.md DELETED
@@ -1,163 +0,0 @@
<p align="center">
<img src="./assets/logo/白底.png" width="400" />
</p>

<p align="center">
<a href="https://map-yue.github.io/">Demo 🎢</a> &nbsp;|&nbsp; 📑 <a href="">Paper (coming soon)</a>
<br>
<a href="https://huggingface.co/m-a-p/YuE-s1-7B-anneal-en-cot">YuE-s1-7B-anneal-en-cot 🤗</a> &nbsp;|&nbsp; <a href="https://huggingface.co/m-a-p/YuE-s1-7B-anneal-en-icl">YuE-s1-7B-anneal-en-icl 🤗</a> &nbsp;|&nbsp; <a href="https://huggingface.co/m-a-p/YuE-s1-7B-anneal-jp-kr-cot">YuE-s1-7B-anneal-jp-kr-cot 🤗</a>
<br>
<a href="https://huggingface.co/m-a-p/YuE-s1-7B-anneal-jp-kr-icl">YuE-s1-7B-anneal-jp-kr-icl 🤗</a> &nbsp;|&nbsp; <a href="https://huggingface.co/m-a-p/YuE-s1-7B-anneal-zh-cot">YuE-s1-7B-anneal-zh-cot 🤗</a> &nbsp;|&nbsp; <a href="https://huggingface.co/m-a-p/YuE-s1-7B-anneal-zh-icl">YuE-s1-7B-anneal-zh-icl 🤗</a>
<br>
<a href="https://huggingface.co/m-a-p/YuE-s2-1B-general">YuE-s2-1B-general 🤗</a> &nbsp;|&nbsp; <a href="https://huggingface.co/m-a-p/YuE-upsampler">YuE-upsampler 🤗</a>
</p>

---
Our model's name is **YuE (乐)**. In Chinese, the word means "music" and "happiness." Some of you may find words that start with "Yu" hard to pronounce; if so, you can just call it "yeah." We wrote a song featuring our model's name.

<audio controls src="https://cdn-uploads.huggingface.co/production/uploads/6555e8d8a0c34cd61a6b9ce3/rG-ELxMyzDU7zH-inB9DV.mpga"></audio>

YuE is a groundbreaking series of open-source foundation models for music generation, specifically for transforming lyrics into full songs (lyrics2song). It can generate a complete song lasting several minutes. Each song contains both a catchy vocal track and a complementary accompaniment, and together they form a polished, cohesive result. YuE can model diverse genres and vocal styles. Below are examples of songs in the pop and metal genres; for more styles, please visit the demo page.

Pop: Quiet Evening
<audio controls src="https://cdn-uploads.huggingface.co/production/uploads/640701cb4dc5f2846c91d4eb/gnBULaFjcUyXYzzIwXLZq.mpga"></audio>
Metal: Step Back
<audio controls src="https://cdn-uploads.huggingface.co/production/uploads/6555e8d8a0c34cd61a6b9ce3/kmCwl4GRS70UYDEELL-Tn.mpga"></audio>

## News and Updates

* **2025.01.26 🔥**: We have released the **YuE** series.

<br>

## Requirements

Python 3.8 or later is recommended.

Install the dependencies with the following command:

```bash
pip install -r requirements.txt
```
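
Before installing, a quick version check can prevent a failed install. The sketch below compares the interpreter's version against the 3.8 recommendation. It assumes GNU `sort -V` (version sort) is available and that `python3` is the interpreter you will use. `version_ge` and `check_python` are hypothetical helper names and are not part of the YuE repository.

```bash
#!/usr/bin/env bash
# Hypothetical pre-install check (not part of the YuE repo).
# Assumes GNU coreutils `sort -V` for version-aware ordering.

# Succeeds when version $1 >= version $2, e.g. version_ge 3.10.12 3.8
version_ge() {
  [ "$(printf '%s\n%s\n' "$2" "$1" | sort -V | head -n1)" = "$2" ]
}

# Prints the interpreter's version and fails if it is older than 3.8.
check_python() {
  local py="${1:-python3}" ver
  ver="$("$py" -c 'import platform; print(platform.python_version())')" || return 1
  if version_ge "$ver" 3.8; then
    echo "Python $ver OK"
  else
    echo "Python $ver found; 3.8 or later is recommended" >&2
    return 1
  fi
}

# Example: check_python python3 && pip install -r requirements.txt
```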

### **Important: Install FlashAttention 2**
To save GPU memory, **FlashAttention 2 is mandatory**. Without it, long sequences will cause out-of-memory (OOM) errors, especially on GPUs with limited memory. Install it with the following command:
```bash
pip install flash-attn --no-build-isolation
```
Before installing FlashAttention, make sure your CUDA environment is set up correctly. For example, if you are using CUDA 11.8:
- If you use a module system:
  ```bash
  module load cuda11.8/toolkit/11.8.0
  ```
- Otherwise, configure CUDA manually in your shell:
  ```bash
  export PATH=/usr/local/cuda-11.8/bin:$PATH
  export LD_LIBRARY_PATH=/usr/local/cuda-11.8/lib64:$LD_LIBRARY_PATH
  ```
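
If you switch between toolkits often, you can wrap the two `export` lines in a function. This sketch assumes toolkits are installed under `/usr/local/cuda-<version>`, which can be overridden with a hypothetical `CUDA_ROOT` variable. It also sets `CUDA_HOME`, which PyTorch's C++/CUDA extension builder (`torch.utils.cpp_extension`) consults when compiling packages such as `flash-attn`. The name `use_cuda` is illustrative.

```bash
#!/usr/bin/env bash
# Illustrative helper: point the shell at one CUDA toolkit before building flash-attn.
# Assumes the /usr/local/cuda-<version> layout; CUDA_ROOT is a hypothetical override.
use_cuda() {
  local version="$1"
  local root="${CUDA_ROOT:-/usr/local}/cuda-${version}"
  if [ ! -d "$root" ]; then
    # Warn but continue, so the resulting variables can still be inspected.
    echo "warning: $root does not exist" >&2
  fi
  export CUDA_HOME="$root"       # read by torch.utils.cpp_extension
  export PATH="$root/bin:$PATH"  # puts this toolkit's nvcc first
  # Append the old value only if it was non-empty, avoiding a trailing ':'.
  export LD_LIBRARY_PATH="$root/lib64${LD_LIBRARY_PATH:+:$LD_LIBRARY_PATH}"
}

# Example:
#   use_cuda 11.8 && nvcc --version && pip install flash-attn --no-build-isolation
```

The `${LD_LIBRARY_PATH:+...}` expansion matters: a trailing empty entry in `LD_LIBRARY_PATH` would make the dynamic loader also search the current directory.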

---

## GPU Memory Usage and Sessions

YuE needs substantial GPU memory to generate long sequences. The recommended configurations are:

- **GPUs with 24 GB of memory or less**: run **up to 2 sessions** concurrently to avoid out-of-memory (OOM) errors.
- **Full-song generation** (many sessions, e.g., 4 or more): use **at least 80 GB of GPU memory**. You can reach this by combining multiple GPUs with tensor parallelism.

The interface lets you set the number of sessions. By default, it runs **2 sessions** to keep memory usage manageable.
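
You can turn the thresholds above into a quick check. The sketch below uses `nvidia-smi --query-gpu=memory.total --format=csv,noheader,nounits` (which reports MiB) to sum total memory across the visible GPUs, then maps that total to a session count. The guidance only covers the two endpoints: 24 GB or less means 2 sessions, and 80 GB or more means 4 or more. This sketch therefore conservatively returns 2 for anything below 80 GB. Treating the summed memory as a single pool also assumes tensor parallelism is actually enabled. The helper names are hypothetical.

```bash
#!/usr/bin/env bash
# Hypothetical helpers; thresholds follow the GPU memory guidance above.

# Total memory across all visible GPUs, in GB (MiB / 1024, rounded to nearest).
total_gpu_mem_gb() {
  nvidia-smi --query-gpu=memory.total --format=csv,noheader,nounits |
    awk '{ sum += $1 } END { printf "%d\n", (sum + 512) / 1024 }'
}

# Suggested concurrent session count for $1 GB of GPU memory.
# Below 80 GB the conservative limit of 2 is used, because the
# 24-80 GB range is not covered by the guidance.
suggest_sessions() {
  local mem_gb="$1"
  if [ "$mem_gb" -ge 80 ]; then
    echo 4
  else
    echo 2
  fi
}

# Example: suggest_sessions "$(total_gpu_mem_gb)"
```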

---

## Quickstart

```bash
# Make sure you have git-lfs installed (https://git-lfs.com)
git lfs install
git clone https://github.com/multimodal-art-projection/YuE.git

cd YuE/inference/
git clone https://huggingface.co/m-a-p/xcodec_mini_infer
```
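
If `git lfs install` is skipped, the clone of `xcodec_mini_infer` can silently contain small Git LFS pointer files instead of the actual checkpoint weights. A pointer file is a short text file whose first line is `version https://git-lfs.github.com/spec/v1`. The sketch below lists such files so you know to run `git lfs pull` inside the repository. The function name is hypothetical. The 1024-byte size cutoff is an assumption, since pointer files are normally far smaller.

```bash
#!/usr/bin/env bash
# Hypothetical check for un-fetched Git LFS files after cloning.

# Print every file under $1 whose first line is a Git LFS pointer header.
find_lfs_pointers() {
  local dir="${1:-.}"
  # -size -1024c means "fewer than 1024 bytes"; the c suffix avoids find's
  # round-up-to-unit behaviour, which makes -size -1k match only empty files.
  find "$dir" -type f -size -1024c ! -path '*/.git/*' -print0 |
    while IFS= read -r -d '' f; do
      if head -n 1 "$f" | grep -q '^version https://git-lfs.github.com/spec/v1'; then
        printf '%s\n' "$f"
      fi
    done
}

# Example (run from YuE/inference/):
#   if [ -n "$(find_lfs_pointers xcodec_mini_infer)" ]; then
#     (cd xcodec_mini_infer && git lfs pull)
#   fi
```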

Here's a quick guide to generating music with **YuE** using 🤗 Transformers. Before running the code, make sure your environment is properly set up and all dependencies are installed.

### Running the Script

In the following example, customize the genre tags and lyrics in the files passed via `--genre_txt` and `--lyrics_txt`, then run the script to generate a song with **YuE**.

Note: To generate a full song, set `--run_n_segments` to the number of lyric sections. You can also increase `--stage2_batch_size` if you have GPU memory to spare.

```bash
cd YuE/inference/
python infer.py \
    --stage1_model m-a-p/YuE-s1-7B-anneal-en-cot \
    --stage2_model m-a-p/YuE-s2-1B-general \
    --genre_txt prompt_examples/genre.txt \
    --lyrics_txt prompt_examples/lyrics.txt \
    --run_n_segments 2 \
    --stage2_batch_size 4 \
    --output_dir ./output \
    --cuda_idx 0 \
    --max_new_tokens 3000
```
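
Counting sections by hand is error-prone. The lyrics file format is not spelled out here, so the sketch below assumes that each section in `lyrics.txt` starts with a label line in square brackets, such as `[verse]` or `[chorus]`. It counts those lines to fill in `--run_n_segments`. `count_lyric_sections` is a hypothetical helper.

```bash
#!/usr/bin/env bash
# Hypothetical helper. Assumption: every lyric section begins with a line that
# contains only a bracketed label, e.g. "[verse]" or "[chorus]".
count_lyric_sections() {
  # grep -c prints the match count (including 0) but exits 1 when nothing
  # matches, so `|| true` keeps callers running under `set -e`.
  grep -cE '^[[:space:]]*\[[^]]+\][[:space:]]*$' "$1" || true
}

# Example (from YuE/inference/, mirroring the command above):
#   n="$(count_lyric_sections prompt_examples/lyrics.txt)"
#   python infer.py \
#       --stage1_model m-a-p/YuE-s1-7B-anneal-en-cot \
#       --stage2_model m-a-p/YuE-s2-1B-general \
#       --genre_txt prompt_examples/genre.txt \
#       --lyrics_txt prompt_examples/lyrics.txt \
#       --run_n_segments "$n" \
#       --stage2_batch_size 4 \
#       --output_dir ./output \
#       --cuda_idx 0 \
#       --max_new_tokens 3000
```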

To use an audio prompt, enable `--use_audio_prompt` and provide the audio file:
```bash
cd YuE/inference/
python infer.py \
    --stage1_model m-a-p/YuE-s1-7B-anneal-en-icl \
    --stage2_model m-a-p/YuE-s2-1B-general \
    --genre_txt prompt_examples/genre.txt \
    --lyrics_txt prompt_examples/lyrics.txt \
    --run_n_segments 2 \
    --stage2_batch_size 4 \
    --output_dir ./output \
    --cuda_idx 0 \
    --max_new_tokens 3000 \
    --use_audio_prompt \
    --audio_prompt_path {YOUR_AUDIO_FILE} \
    --prompt_start_time 0 \
    --prompt_end_time 30
```
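
The two commands differ only in the stage-1 checkpoint (`-cot` versus `-icl`) and the audio-prompt flags, so one script can switch between them. This sketch collects the arguments in a bash array so that paths containing spaces survive intact. `build_infer_args` is a hypothetical helper. It reuses the flags and values from the commands above, including `--use_audio_prompt`.

```bash
#!/usr/bin/env bash
# Hypothetical wrapper around the two infer.py invocations above.
# Prints one argument per line. Passing an audio file path switches to the ICL
# checkpoint and adds the audio-prompt flags (0-30 s window, as in the example).
build_infer_args() {
  local audio="${1:-}"
  local -a common=(
    --stage2_model m-a-p/YuE-s2-1B-general
    --genre_txt prompt_examples/genre.txt
    --lyrics_txt prompt_examples/lyrics.txt
    --run_n_segments 2
    --stage2_batch_size 4
    --output_dir ./output
    --cuda_idx 0
    --max_new_tokens 3000
  )
  local -a args
  if [ -n "$audio" ]; then
    args=(--stage1_model m-a-p/YuE-s1-7B-anneal-en-icl "${common[@]}"
          --use_audio_prompt --audio_prompt_path "$audio"
          --prompt_start_time 0 --prompt_end_time 30)
  else
    args=(--stage1_model m-a-p/YuE-s1-7B-anneal-en-cot "${common[@]}")
  fi
  printf '%s\n' "${args[@]}"
}

# Example (bash 4+ for mapfile), from YuE/inference/:
#   mapfile -t args < <(build_infer_args "my prompt.mp3")
#   python infer.py "${args[@]}"
```

Printing one argument per line and reading the output back with `mapfile -t` keeps each argument as a single word. Plain word-splitting of a command string would not.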

---

### **Execution Time**
On an **H800 GPU**, generating 30 s of audio takes **150 seconds**.
On an **RTX 4090 GPU**, generating 30 s of audio takes approximately **360 seconds**.
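
For rough planning, these figures work out to about 5 seconds of compute per second of audio on an H800 (150 / 30) and about 12 on an RTX 4090 (360 / 30). The sketch below extrapolates linearly from those ratios. That linearity is an assumption, because runtime also depends on settings such as `--stage2_batch_size`. `estimate_runtime` is a hypothetical helper.

```bash
#!/usr/bin/env bash
# Hypothetical estimate: linear extrapolation from the timings above
# (H800: 150 s per 30 s of audio; RTX 4090: ~360 s per 30 s of audio).
estimate_runtime() {
  local gpu="$1" audio_s="$2" per30
  case "$gpu" in
    h800) per30=150 ;;
    4090) per30=360 ;;
    *) echo "unknown GPU: $gpu (expected h800 or 4090)" >&2; return 1 ;;
  esac
  # Integer arithmetic; the result is truncated to whole seconds.
  echo $(( audio_s * per30 / 30 ))
}

# Example: a 3-minute song (180 s)
#   estimate_runtime h800 180   # 900 seconds (~15 min)
#   estimate_runtime 4090 180   # 2160 seconds (~36 min)
```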

**Tips:**
1. `genres` should include details such as instruments, genre, mood, vocal timbre, and vocal gender.
2. Match the length of each `lyrics` segment to the `--max_new_tokens` value. For example, with `--max_new_tokens` set to 3000, a segment can last at most about 30 seconds, so make sure your lyrics fit within that time.
3. If you use an audio prompt, a duration of around 30 s works well.
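
Tip 2 implies a budget of roughly 100 tokens per second of audio (3000 tokens ≈ 30 s). The sketch below scales that ratio linearly in both directions. The ratio is inferred from this single example, not from a documented constant. The helper names are hypothetical.

```bash
#!/usr/bin/env bash
# Hypothetical helpers based on the tip's example: 3000 tokens ~ 30 s,
# i.e. about 100 tokens per second of audio (an inferred, approximate ratio).
TOKENS_PER_SECOND=100

# --max_new_tokens needed for a segment of $1 seconds.
tokens_for_seconds() {
  echo $(( $1 * TOKENS_PER_SECOND ))
}

# Approximate maximum segment length (whole seconds) for a token budget of $1.
seconds_for_tokens() {
  echo $(( $1 / TOKENS_PER_SECOND ))
}

# Example: tokens_for_seconds 45   -> 4500
#          seconds_for_tokens 3000 -> 30
```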

---

### Notice
1. A suitable [Genre] tag consists of five components: genre, instrument, mood, gender, and timbre. Include all five where possible, separated by spaces. Timbre values should include the word "vocal" (e.g., "bright vocal").

2. Although our tags have an open vocabulary, we provide the 200 most commonly used [tags](./wav_top_200_tags.json). Selecting tags from this list is recommended for more stable results.

3. The order of the tags is flexible. For example, a stable genre control string might look like: "[Genre] inspiring female uplifting pop airy vocal electronic bright vocal vocal."

4. We have also introduced the "Mandarin" and "Cantonese" tags to distinguish between the two, since their lyrics are often similar.
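
You can check the five-component rule mechanically before writing `genre.txt`. This sketch joins the components with spaces and rejects a timbre that lacks "vocal". For simplicity it requires all five components, although the notice only asks for them where possible. It does not consult `wav_top_200_tags.json`, and it prints only the tags, without the "[Genre]" label. `make_genre_tags` is a hypothetical helper.

```bash
#!/usr/bin/env bash
# Hypothetical helper: build a space-separated genre tag string from the five
# components (genre, instrument, mood, gender, timbre).
make_genre_tags() {
  if [ "$#" -ne 5 ]; then
    echo "usage: make_genre_tags GENRE INSTRUMENT MOOD GENDER TIMBRE" >&2
    return 2
  fi
  # Timbre values should contain the word "vocal", e.g. "bright vocal".
  case "$5" in
    *vocal*) ;;
    *) echo "timbre should contain \"vocal\", e.g. \"bright vocal\"" >&2; return 1 ;;
  esac
  printf '%s %s %s %s %s\n' "$1" "$2" "$3" "$4" "$5"
}

# Example:
#   make_genre_tags pop piano uplifting female "bright vocal" > prompt_examples/genre.txt
```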

## License Agreement

Creative Commons Attribution-NonCommercial 4.0 (CC BY-NC 4.0)

---

## Citation

If you find our paper and code useful in your research, please consider giving us a star :star: and a citation :pencil: :)

```BibTeX
@misc{yuan2025yue,
  title={YuE: Open Music Foundation Models for Full-Song Generation},
  author={Ruibin Yuan and Hanfeng Lin and Shawn Guo and Ge Zhang and Jiahao Pan and Yongyi Zang and Haohe Liu and Xingjian Du and Xeron Du and Zhen Ye and Tianyu Zheng and Yinghao Ma and Minghao Liu and Lijun Yu and Zeyue Tian and Ziya Zhou and Liumeng Xue and Xingwei Qu and Yizhi Li and Tianhao Shen and Ziyang Ma and Shangda Wu and Jun Zhan and Chunhui Wang and Yatian Wang and Xiaohuan Zhou and Xiaowei Chi and Xinyue Zhang and Zhenzhu Yang and Yiming Liang and Xiangzhou Wang and Shansong Liu and Lingrui Mei and Peng Li and Yong Chen and Chenghua Lin and Xie Chen and Gus Xia and Zhaoxiang Zhang and Chao Zhang and Wenhu Chen and Xinyu Zhou and Xipeng Qiu and Roger Dannenberg and Jiaheng Liu and Jian Yang and Stephen Huang and Wei Xue and Xu Tan and Yike Guo},
  howpublished={\url{https://github.com/multimodal-art-projection/YuE}},
  year={2025},
  note={GitHub repository}
}
```
<br>