Update README.md
README.md
CHANGED
````diff
@@ -8,17 +8,15 @@ pipeline_tag: text-to-audio
 tags:
 - text-to-audio
 ---
-#
+# Tango 2: Aligning Diffusion-based Text-to-Audio Generative Models through Direct Preference Optimization
 
-**
+🎵 We developed **Tango 2** building upon **Tango** for text-to-audio generation. **Tango 2** was initialized with the **Tango-full-ft** checkpoint and underwent alignment training using DPO on **audio-alpaca**, a dataset of pairwise audio preferences. 🎶
 
-📣 We are releasing [**Tango-Full-FT-Audiocaps**](https://huggingface.co/declare-lab/tango-full-ft-audiocaps), which was first pre-trained on [**TangoPromptBank**](https://huggingface.co/datasets/declare-lab/TangoPromptBank), a collection of diverse text-audio pairs. We later fine-tuned this checkpoint on AudioCaps. This checkpoint obtained state-of-the-art results for text-to-audio generation on AudioCaps.
 
 ## Code
 
 Our code is released here: [https://github.com/declare-lab/tango](https://github.com/declare-lab/tango)
 
-We uploaded several **TANGO**-generated samples here: [https://tango-web.github.io/](https://tango-web.github.io/)
 
 Please follow the instructions in the repository for installation, usage and experiments.
 
@@ -63,10 +61,4 @@ prompts = [
 ]
 audios = tango.generate_for_batch(prompts, samples=2)
 ```
-This will generate two samples for each of the three text prompts.
-
-## Limitations
-
-TANGO is trained on the small AudioCaps dataset, so it may not generate good audio samples for concepts it has not seen in training (e.g. _singing_). For the same reason, TANGO is not always able to finely control its generations via the textual prompt. For example, the generations for the prompts _Chopping tomatoes on a wooden table_ and _Chopping potatoes on a metal table_ are very similar; _Chopping vegetables on a table_ also produces similar audio samples. Training text-to-audio generation models on larger datasets is thus required for the model to learn the composition of textual concepts and varied text-audio mappings.
-
-We are training another version of TANGO on larger datasets to enhance its generalization, compositional and controllable generation ability.
+This will generate two samples for each of the three text prompts.
````
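The batch call at the end of the diff returns two clips per prompt. A minimal sketch of that output layout, with a hypothetical stand-in for the model (no checkpoint download needed; `fake_generate_for_batch` is illustrative, not the real API):

```python
# Hypothetical stand-in for tango.generate_for_batch: one entry per
# prompt, `samples` placeholder clips per entry.
def fake_generate_for_batch(prompts, samples=2):
    return [[f"clip for {p!r} #{s}" for s in range(samples)] for p in prompts]

prompts = ["a dog barking", "rain falling on a roof", "a car engine revving"]
audios = fake_generate_for_batch(prompts, samples=2)
# three prompts in, two samples out for each: audios[i][j] is the j-th
# clip generated for prompts[i]
assert len(audios) == 3 and all(len(a) == 2 for a in audios)
```

With the real `tango` package from the linked repository, each `audios[i][j]` would be an audio waveform rather than a placeholder string.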
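The new description says Tango 2 was aligned with DPO on pairwise audio preferences. A minimal sketch of the generic DPO objective, assuming per-example log-likelihoods for the preferred ("winner") and rejected ("loser") audio under the policy and a frozen reference model; Tango 2's actual training adapts this to diffusion losses, so see the repository for the real implementation:

```python
import math

def dpo_loss(policy_logp_w, policy_logp_l, ref_logp_w, ref_logp_l, beta=0.1):
    # Margin: how much more the policy prefers the winner over the loser,
    # relative to the frozen reference model.
    margin = beta * ((policy_logp_w - ref_logp_w) - (policy_logp_l - ref_logp_l))
    # -log(sigmoid(margin)), written stably as log(1 + exp(-margin)).
    return math.log1p(math.exp(-margin))
```

The loss shrinks as the policy's preference for the winner grows beyond the reference model's, which is the alignment pressure the description refers to.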