ZeyuXie/PicoAudio

ZeyuXie committed on Jul 18, 2024

Commit f5b1a19 (verified) · 1 parent: 28368fe

Update README.md

Files changed (1): README.md

@@ -2,7 +2,9 @@
 license: apache-2.0
 ---
 # PicoAudio: Enabling Precise Timing and Frequency Controllability of Audio Events in Text-to-audio Generation
-[![arXiv](https://img.shields.io/badge/arXiv-2308.05734-brightgreen.svg?style=flat-square)](https://arxiv.org/abs/2407.02869v2)[![githubio](https://img.shields.io/badge/GitHub.io-Audio_Samples-blue?logo=Github&style=flat-square)](https://zeyuxie29.github.io/PicoAudio.github.io/)[![Hugging Face Spaces](https://img.shields.io/badge/%F0%9F%A4%97%20Hugging%20Face-Spaces-blue)](https://huggingface.co/spaces/ZeyuXie/PicoAudio)
+[![arXiv](https://img.shields.io/badge/arXiv-2308.05734-brightgreen.svg?style=flat-square)](https://arxiv.org/abs/2407.02869v2)
+[![githubio](https://img.shields.io/badge/GitHub.io-Audio_Samples-blue?logo=Github&style=flat-square)](https://zeyuxie29.github.io/PicoAudio.github.io/)
+[![Hugging Face Spaces](https://img.shields.io/badge/%F0%9F%A4%97%20Hugging%20Face-Spaces-blue)](https://huggingface.co/spaces/ZeyuXie/PicoAudio)
 **Bullet contribution**:
 * A data simulation pipeline tailored specifically for controllable audio generation frameworks;
@@ -28,4 +30,16 @@ where:
 * *"filepath"* indicates the path to the audio file.
 * *"frequencyCaption"* contains information about the occurrence frequency.
 * *"onoffCaption"* contains on- & off-set information.
-* For test file *"test-frequency-control_onoffFromGpt_{}.json"*, the *"onoffCaption"* is derived from *"frequencyCaption"* transformed by GPT-4, which is used for evaluation in the frequency control task.
+* For test file *"test-frequency-control_onoffFromGpt_{}.json"*, the *"onoffCaption"* is derived from *"frequencyCaption"* transformed by GPT-4, which is used for evaluation in the frequency control task.
+## Training
+Download data into the *"data"* folder.
+The training and inference code can be found in the *"picoaudio"* folder.
+```shell
+cd picoaudio
+pip install -r requirements.txt
+```
+To start training:
+```shell
+accelerate launch runner/controllable_train.py
+```
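The README names three metadata fields (*"filepath"*, *"frequencyCaption"*, *"onoffCaption"*), but this diff does not show the file layout or the caption syntax. The following is a minimal Python sketch for inspecting the data before training. It rests on two assumptions that are not confirmed by the source: that each metadata file is a JSON list of objects carrying those three keys, and that *"onoffCaption"* follows a hypothetical `"<event> at <on>-<off>, <on>-<off> and <event> at ..."` pattern with times in seconds. The function names `load_metadata` and `parse_onoff_caption` are illustrative and do not come from the PicoAudio code.

```python
import json
import re
from pathlib import Path

# ASSUMPTION: a metadata file is a JSON list of objects, each holding
# "filepath", "frequencyCaption" and "onoffCaption".
REQUIRED_KEYS = {"filepath", "frequencyCaption", "onoffCaption"}


def load_metadata(path):
    """Load caption entries and check that every entry has the expected keys."""
    with Path(path).open(encoding="utf-8") as f:
        entries = json.load(f)
    for i, entry in enumerate(entries):
        missing = REQUIRED_KEYS - entry.keys()
        if missing:
            raise KeyError(f"entry {i} is missing {sorted(missing)}")
    return entries


# HYPOTHETICAL caption syntax, e.g.
# "dog barking at 0.5-2.0, 4.1-5.3 and man speaking at 3.0-6.0"
_SPAN = re.compile(r"(\d+(?:\.\d+)?)-(\d+(?:\.\d+)?)")


def parse_onoff_caption(caption):
    """Map each event name to a list of (onset, offset) pairs in seconds."""
    events = {}
    for clause in caption.split(" and "):
        # Everything before the last " at " is the event name;
        # everything after it holds the onset-offset spans.
        name, sep, spans = clause.rpartition(" at ")
        if not sep:
            raise ValueError(f"cannot parse clause: {clause!r}")
        pairs = [(float(on), float(off)) for on, off in _SPAN.findall(spans)]
        events.setdefault(name.strip(), []).extend(pairs)
    return events


if __name__ == "__main__":
    print(parse_onoff_caption(
        "dog barking at 0.5-2.0, 4.1-5.3 and man speaking at 3.0-6.0"
    ))
```

Splitting on `" and "` keeps the sketch short but would misparse event names that themselves contain "and"; a real loader should follow whatever caption format the released data actually uses.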