Commit bf889ce by ZeyuXie · verified · 1 Parent(s): ad2d705

Update README.md

  1. README.md +35 -3
README.md CHANGED
@@ -1,3 +1,35 @@
- ---
- license: apache-2.0
- ---
+ # PicoAudio: Enabling Precise Timing and Frequency Controllability of Audio Events in Text-to-audio Generation
+ [![arXiv](https://img.shields.io/badge/arXiv-2407.02869-brightgreen.svg?style=flat-square)](https://arxiv.org/abs/2407.02869v2)
+ [![githubio](https://img.shields.io/badge/GitHub.io-Audio_Samples-blue?logo=Github&style=flat-square)](https://zeyuxie29.github.io/PicoAudio.github.io/)
+ [![Hugging Face Spaces](https://img.shields.io/badge/%F0%9F%A4%97%20Hugging%20Face-Spaces-blue)](https://huggingface.co/spaces/ZeyuXie/PicoAudio)
+
+ **Main contributions**:
+ * A data simulation pipeline tailored specifically for controllable audio generation frameworks;
+ * A timing-controllable audio generation framework that enables precise control over the timing and frequency of sound events;
+ * Arbitrary precise timing-related control, achieved by integrating large language models.
+
+ ## Inference
+ You can try the demo at [Huggingface Online Inference](https://huggingface.co/spaces/ZeyuXie/PicoAudio) and [Github Demo](https://zeyuxie29.github.io/PicoAudio.github.io).
+ Alternatively, you can use the *"inference.py"* script provided at [Huggingface Inference](https://huggingface.co/spaces/ZeyuXie/PicoAudio/tree/main) to generate audio.
+ Huggingface Online Inference uses Gemini as a preprocessor; we also provide a GPT preprocessing script consistent with the paper in *"llm_preprocess.py"*.
+
+ ## Simulated Dataset
+ Simulated data can be downloaded from [GoogleDrive](https://drive.google.com/file/d/1oez7kzFFhqU9JZQhqJdDshXrRQczBmlp/view?usp=sharing) or from [BaiduNetDisk](https://pan.baidu.com/s/1rGrcjtQCEYFpr3o6y9wI8Q?pwd=pico) with the extraction code "pico".
+ The metadata is stored in *"data/meta_data/{}.json"*. One instance looks as follows:
+ ```python
+ {
+     "filepath": "data/multi_event_test/syn_1.wav",
+     "onoffCaption": "cat meowing at 0.5-2.0, 3.0-4.5 and whistling at 5.0-6.5 and explosion at 7.0-8.0, 8.5-9.5",
+     "frequencyCaption": "cat meowing two times and whistling one times and explosion two times"
+ }
+ ```
+ where:
+ * *"filepath"* is the path to the audio file.
+ * *"frequencyCaption"* describes how many times each event occurs.
+ * *"onoffCaption"* gives the onset and offset times of each event occurrence.
+ * For the test file *"test-frequency-control_onoffFromGpt_{}.json"*, the *"onoffCaption"* is derived from the *"frequencyCaption"* by GPT-4 and is used for evaluation in the frequency control task.
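As an illustration of how the two caption fields relate, the sketch below parses an *"onoffCaption"* into per-event on/off pairs and counts the segments, which can then be compared against the *"frequencyCaption"*. It is not part of the repository: the helper names `parse_onoff_caption` and `count_occurrences` are hypothetical, and the caption grammar is only inferred from the single instance shown above, so other captions may need a more tolerant parser.

```python
import re

# Assumed caption shape, inferred from the one example instance:
#   "<event> at <on>-<off>[, <on>-<off>...] and <event> at ..."
_SEGMENT = r"\d+(?:\.\d+)?-\d+(?:\.\d+)?"
_EVENT = re.compile(rf"(?:^| and )(.+?) at ({_SEGMENT}(?:, {_SEGMENT})*)")


def parse_onoff_caption(caption):
    """Return {event name: [(onset, offset), ...]} for one onoffCaption string."""
    events = {}
    for name, spans in _EVENT.findall(caption):
        pairs = []
        for span in spans.split(", "):
            onset, offset = span.split("-")
            pairs.append((float(onset), float(offset)))
        events.setdefault(name, []).extend(pairs)
    return events


def count_occurrences(events):
    """Count on/off segments per event, comparable to frequencyCaption."""
    return {name: len(pairs) for name, pairs in events.items()}


# The instance from the README, copied verbatim.
instance = {
    "filepath": "data/multi_event_test/syn_1.wav",
    "onoffCaption": "cat meowing at 0.5-2.0, 3.0-4.5 and whistling at 5.0-6.5 "
                    "and explosion at 7.0-8.0, 8.5-9.5",
    "frequencyCaption": "cat meowing two times and whistling one times "
                        "and explosion two times",
}

events = parse_onoff_caption(instance["onoffCaption"])
counts = count_occurrences(events)
print(events)
print(counts)
```

For this instance the counts come out as two, one, and two segments, matching the wording of the *"frequencyCaption"*.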
+
+
+ ---
+ license: apache-2.0
+ ---