first

Browse files
- README.md (+7 -11)
- resource/data.png (+0 -3)

README.md CHANGED
@@ -35,8 +35,11 @@ conda create -n mellow python=3.10.14 && \
 conda activate mellow && \
 pip install -r requirements.txt
 ```
-
-
+
+2. To test that the setup is complete, run:
+```shell
+python example.py
+```
 
 ## Usage
 The MellowWrapper class allows easy interaction with the model. To use the wrapper, the required inputs are:
@@ -63,8 +66,8 @@ mellow = Mellow(config="<choice of config>", model_path="<model weights>", device
 
 # setup mellow
 mellow = MellowWrapper(
-    config="
-    model = "v0
+    config="v0",
+    model = "v0",
     device=device,
     use_cuda=cuda,
)
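Read together, the hunk above and the context lines of the next hunk describe the full wrapper flow. Below is a minimal end-to-end sketch assembled from those snippets; it is not the repository's verbatim `example.py`. The import path, the device-setup lines, the structure of `examples`, and the `temperature` value (truncated in the next hunk's header) are assumptions, not confirmed by this diff:

```python
# Minimal usage sketch assembled from the snippets in this diff.
import torch

from mellow import MellowWrapper  # assumed import path

# setup cuda and device (assumed; not shown in this diff)
cuda = torch.cuda.is_available()
device = 0 if cuda else "cpu"

# setup mellow -- config/model values match the + lines above
mellow = MellowWrapper(
    config="v0",
    model="v0",
    device=device,
    use_cuda=cuda,
)

# hypothetical input: audio clips paired with a question;
# the real expected structure of `examples` may differ
examples = [
    ["path/to/audio1.wav", "path/to/audio2.wav",
     "what can be inferred from these two recordings?"],
]

# generation call mirrors the next hunk's context line;
# the temperature value is assumed (truncated in the extraction)
response = mellow.generate(examples=examples, max_len=300, top_p=0.8, temperature=1.0)
print(f"\noutput: {response}")
```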
@@ -89,13 +92,6 @@ response = mellow.generate(examples=examples, max_len=300, top_p=0.8, temperature=
 print(f"\noutput: {response}")
 ```
 
-## ReasonAQA
-The composition of the ReasonAQA dataset is shown in the figure below. The training set is restricted to AudioCaps and Clotho audio files, and testing is performed on the following tasks: Audio Entailment, Audio Difference, ClothoAQA, Clotho MCQ, Clotho Detail, AudioCaps MCQ, and AudioCaps Detail.
-
-
-- The ReasonAQA JSONs can be downloaded here: [ReasonAQA JSONs](https://drive.google.com/file/d/1WPKgafYw2ZCifElEtHn_k3DkcVGjesqB/view?usp=sharing)
-- The audio files can be downloaded from their respective hosting websites: [Clotho](https://zenodo.org/records/4783391) and [AudioCaps](https://github.com/cdjkim/audiocaps)
-
 ## Limitation
 With Mellow, we aim to showcase that small audio-language models can engage in reasoning. As a research prototype, Mellow has not been trained at scale on publicly available audio datasets, resulting in a limited understanding of audio concepts. Therefore, we advise caution when considering its use in production settings. Ultimately, we hope this work inspires researchers to explore small audio-language models for multitask capabilities, complementing ongoing research on general-purpose audio assistants.
 
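The removed ReasonAQA section pointed at downloadable JSONs without documenting their schema. A hedged inspection sketch is below, useful before writing any loader; the filename is hypothetical, and the code discovers the record fields rather than assuming them:

```python
# Hypothetical first look at a downloaded ReasonAQA JSON file.
# The filename is made up for illustration; the diff only gives
# download links, not the schema, so inspect before writing a loader.
import json
from pathlib import Path

path = Path("reasonaqa/audio_entailment_test.json")  # hypothetical path
with path.open() as f:
    data = json.load(f)

print(f"{path.name}: top-level type is {type(data).__name__}")
if isinstance(data, list) and data and isinstance(data[0], dict):
    # list of record dicts: report the count and the first record's fields
    print(f"{len(data)} records; fields: {sorted(data[0].keys())}")
elif isinstance(data, dict):
    # dict keyed by id or split: report a sample of the top-level keys
    print(f"top-level keys: {sorted(data.keys())[:10]}")
```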
resource/data.png DELETED (Git LFS)