---
license: mit
tags:
- small audio-language model
- ALM
- audio
- music
- sound events
- audio reasoning
- audio captioning
- audio question answering
- zero-shot
- audio-text
---
# Mellow

[[`Paper`]()] [[`GitHub`](https://github.com/soham97/Mellow)] [[`Checkpoint`](https://huggingface.co/soham97/Mellow)]

Mellow is a small Audio-Language Model that takes two audio clips and a text prompt as input and produces free-form text as output. It has 167M parameters, is trained on ~155 hours of audio (AudioCaps and Clotho), and achieves SoTA performance on several tasks while using 50x fewer parameters.

The composition of the ReasonAQA dataset is shown in Table. The training set is restricted to AudioCaps and Clotho audio files, and testing is performed on the following tasks: Audio Entailment, Audio Difference, ClothoAQA, Clotho MCQ, Clotho Detail, AudioCaps MCQ, and AudioCaps Detail.
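
The sketch below shows one way to check the train/test split described above. It confirms that training records reference only AudioCaps or Clotho audio and counts test records per evaluation task. It is a minimal sketch under several assumptions. The JSONs are assumed to load as lists of flat objects with hypothetical `source` and `task` fields. The task labels are assumed to match the names used in this README. The real ReasonAQA field names and values may differ.

```python
from collections import Counter
from typing import Iterable, Mapping

# Evaluation tasks as named in this README; the labels stored in the
# ReasonAQA JSONs may be spelled differently (assumption).
EVAL_TASKS = frozenset({
    "Audio Entailment", "Audio Difference", "ClothoAQA", "Clotho MCQ",
    "Clotho Detail", "AudioCaps MCQ", "AudioCaps Detail",
})
# Training audio is restricted to these two sources.
TRAIN_SOURCES = frozenset({"AudioCaps", "Clotho"})


def disallowed_train_records(records: Iterable[Mapping[str, str]],
                             source_key: str = "source") -> list[Mapping[str, str]]:
    """Return training records whose audio source is neither AudioCaps nor Clotho.

    `source_key` is a hypothetical field name, not a documented ReasonAQA key.
    """
    return [r for r in records if r[source_key] not in TRAIN_SOURCES]


def count_test_tasks(records: Iterable[Mapping[str, str]],
                     task_key: str = "task") -> Counter:
    """Count test records per evaluation task, skipping labels not in EVAL_TASKS.

    `task_key` is a hypothetical field name, not a documented ReasonAQA key.
    """
    return Counter(r[task_key] for r in records if r[task_key] in EVAL_TASKS)
```

An empty list from `disallowed_train_records` means the AudioCaps/Clotho restriction holds for the source labels you map in.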

- The ReasonAQA JSONs can be downloaded from Google Drive: [ReasonAQA JSONs](https://drive.google.com/file/d/1WPKgafYw2ZCifElEtHn_k3DkcVGjesqB/view?usp=sharing)
- The audio files can be downloaded from their respective hosting websites: [Clotho](https://zenodo.org/records/4783391) and [AudioCaps](https://github.com/cdjkim/audiocaps)
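
After downloading, a quick consistency check can catch audio files that a JSON references but that are missing locally. This is a minimal sketch that assumes each record stores its two audio paths under hypothetical `filepath1`/`filepath2` keys. It also assumes those paths are relative to a local audio root. Adjust `PATH_KEYS` and the root to match the actual schema and directory layout.

```python
import json
from pathlib import Path
from typing import Iterable, Mapping

# Hypothetical key names: the actual ReasonAQA JSON fields may differ.
PATH_KEYS = ("filepath1", "filepath2")


def load_records(json_path: Path) -> list[dict]:
    """Load a ReasonAQA-style JSON file, assumed to be a list of objects."""
    with json_path.open(encoding="utf-8") as f:
        return json.load(f)


def missing_audio(records: Iterable[Mapping[str, str]], audio_root: Path,
                  path_keys: tuple[str, ...] = PATH_KEYS) -> list[Path]:
    """Return each referenced audio path under `audio_root` that is not a file.

    Paths are assumed to be relative to `audio_root`; each distinct path is
    checked and reported at most once, in first-seen order.
    """
    missing: list[Path] = []
    seen: set[str] = set()
    for record in records:
        for key in path_keys:
            rel = record.get(key)
            if not rel or rel in seen:
                continue  # key absent/empty, or path already checked
            seen.add(rel)
            full = audio_root / rel
            if not full.is_file():
                missing.append(full)
    return missing
```

For example, `missing_audio(load_records(Path("test.json")), Path("data/audio"))` returns an empty list once every referenced file is in place. Here `test.json` and `data/audio` are placeholder paths.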
## Limitation