ZeyuXie committed on
Commit
f5b1a19
·
verified ·
1 Parent(s): 28368fe

Update README.md

  1. README.md +16 -2
README.md CHANGED
@@ -2,7 +2,9 @@
  license: apache-2.0
  ---
  # PicoAudio: Enabling Precise Timing and Frequency Controllability of Audio Events in Text-to-audio Generation
- [![arXiv](https://img.shields.io/badge/arXiv-2308.05734-brightgreen.svg?style=flat-square)](https://arxiv.org/abs/2407.02869v2)[![githubio](https://img.shields.io/badge/GitHub.io-Audio_Samples-blue?logo=Github&style=flat-square)](https://zeyuxie29.github.io/PicoAudio.github.io/)[![Hugging Face Spaces](https://img.shields.io/badge/%F0%9F%A4%97%20Hugging%20Face-Spaces-blue)](https://huggingface.co/spaces/ZeyuXie/PicoAudio)
  
  **Bullet contribution**:
  * A data simulation pipeline tailored specifically for controllable audio generation frameworks;
@@ -28,4 +30,16 @@ where:
  * *"filepath"* indicates the path to the audio file.
  * *"frequencyCaption"* contains information about the occurrence frequency.
  * *"onoffCaption"* contains on- & off-set information.
- * For test file *"test-frequency-control_onoffFromGpt_{}.json"*, the *"onoffCaption"* is derived from *"frequencyCaption"* transformed by GPT-4, which is used for evaluation in the frequency control task.
  license: apache-2.0
  ---
  # PicoAudio: Enabling Precise Timing and Frequency Controllability of Audio Events in Text-to-audio Generation
+ [![arXiv](https://img.shields.io/badge/arXiv-2308.05734-brightgreen.svg?style=flat-square)](https://arxiv.org/abs/2407.02869v2)
+ [![githubio](https://img.shields.io/badge/GitHub.io-Audio_Samples-blue?logo=Github&style=flat-square)](https://zeyuxie29.github.io/PicoAudio.github.io/)
+ [![Hugging Face Spaces](https://img.shields.io/badge/%F0%9F%A4%97%20Hugging%20Face-Spaces-blue)](https://huggingface.co/spaces/ZeyuXie/PicoAudio)
  
  **Bullet contribution**:
  * A data simulation pipeline tailored specifically for controllable audio generation frameworks;
 
  * *"filepath"* indicates the path to the audio file.
  * *"frequencyCaption"* contains information about the occurrence frequency.
  * *"onoffCaption"* contains on- & off-set information.
+ * For test file *"test-frequency-control_onoffFromGpt_{}.json"*, the *"onoffCaption"* is derived from *"frequencyCaption"* transformed by GPT-4, which is used for evaluation in the frequency control task.
+
+ ## Training
+ Download data into the *"data"* folder.
+ The training and inference code can be found in the *"picoaudio"* folder.
+ ```shell
+ cd picoaudio
+ pip install -r requirements.txt
+ ```
+ To start training:
+ ```shell
+ accelerate launch runner/controllable_train.py
+ ```
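
For readers who want to inspect the downloaded metadata before training, the three fields described in the README (*"filepath"*, *"frequencyCaption"*, *"onoffCaption"*) could be read with a small helper like the one below. This is only a sketch and not part of the commit: the README does not say how the JSON files are laid out, so the code assumes either a single JSON array of objects or one JSON object per line, and the function names `load_records` and `summarize` are hypothetical.

```python
import json
from pathlib import Path

# Field names taken from the README; everything else here is an assumption.
FIELDS = ("filepath", "frequencyCaption", "onoffCaption")


def load_records(path):
    """Load metadata records from a JSON array file or a JSON Lines file.

    The on-disk layout is an assumption: the README only names the fields.
    """
    text = Path(path).read_text(encoding="utf-8")
    try:
        data = json.loads(text)
        # Wrap a single object so callers always receive a list.
        return data if isinstance(data, list) else [data]
    except json.JSONDecodeError:
        # Fall back to one JSON object per non-empty line.
        return [json.loads(line) for line in text.splitlines() if line.strip()]


def summarize(record):
    """Return only the documented fields; a missing field becomes None."""
    return {key: record.get(key) for key in FIELDS}
```

Calling `summarize(r)` on each record returned by `load_records(...)` for a file in the *"data"* folder gives a quick check that every entry carries a path and both captions before `accelerate launch` is run.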