soham97 committed
Commit fce207b · 1 Parent(s): 6b09558
Files changed (2):
  1. README.md +7 -11
  2. resource/data.png +0 -3
README.md CHANGED
@@ -35,8 +35,11 @@ conda create -n mellow python=3.10.14 && \
  conda activate mellow && \
  pip install -r requirements.txt
  ```
- 2. Download Mellow weights: [checkpoint \[drive\]]()
- 3. Move the `v0.ckpt` under `config` folder
+
+ 2. To test that the setup is complete, run:
+ ```shell
+ python example.py
+ ```

  ## Usage
  The MellowWrapper class allows easy interaction with the model. To use the wrapper, inputs required are:
@@ -63,8 +66,8 @@ mellow = Mellow(config="<choice of config>", model_path="<model weights", device

  # setup mellow
  mellow = MellowWrapper(
-     config="conf.yaml",
-     model = "v0.ckpt",
+     config="v0",
+     model = "v0",
      device=device,
      use_cuda=cuda,
  )
@@ -89,13 +92,6 @@ response = mellow.generate(examples=examples, max_len=300, top_p=0.8, temperatur
  print(f"\noutput: {response}")
  ```

- ## ReasonAQA
- The composition of the ReasonAQA dataset is shown in Table. The training set is restricted to AudioCaps and Clotho audio files and the testing is performed on 6 tasks - Audio Entailment, Audio Difference, ClothoAQA, Clotho MCQ, Clotho Detail, AudioCaps MCQ and AudioCaps Detail.
-
- ![alt text](resource/data.png)
- - The ReasonAQA JSONs can be downloaded from Zenodo: [checkpoint](https://drive.google.com/file/d/1WPKgafYw2ZCifElEtHn_k3DkcVGjesqB/view?usp=sharing)
- - The audio files can be downloaded from their respective hosting website: [Clotho](https://zenodo.org/records/4783391) and [AudioCaps](https://github.com/cdjkim/audiocaps)
-
  ## Limitation
  With Mellow, we aim to showcase that small audio-language models can engage in reasoning. As a research prototype, Mellow has not been trained at scale on publicly available audio datasets, resulting in a limited understanding of audio concepts. Therefore, we advise caution when considering its use in production settings. Ultimately, we hope this work inspires researchers to explore small audio-language models for multitask capabilities, complementing ongoing research on general-purpose audio assistants.
 
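For readers who want to try the updated API, the following sketch stitches the snippets from this commit into one script. It is a minimal sketch, not the repository's own `example.py`: the import path, the torch-based device selection, the structure of `examples`, and the `temperature` value (truncated in the hunk header) are assumptions; only the `MellowWrapper(config="v0", model="v0", ...)` arguments and the `generate(...)` keyword arguments come from the diff itself.

```python
import torch

# Hypothetical import path; the diff does not show where MellowWrapper lives.
from mellow_wrapper import MellowWrapper

# Device selection assumed from the device=device, use_cuda=cuda arguments.
cuda = torch.cuda.is_available()
device = "cuda" if cuda else "cpu"

# Setup mellow with the name-based config/model selection this commit introduces.
mellow = MellowWrapper(
    config="v0",
    model="v0",
    device=device,
    use_cuda=cuda,
)

# Assumed example format: two audio file paths plus a text prompt per entry.
examples = [
    ["path/to/first.wav", "path/to/second.wav", "What differs between the two audios?"],
]

# max_len and top_p come from the README snippet; temperature=1.0 is assumed.
response = mellow.generate(examples=examples, max_len=300, top_p=0.8, temperature=1.0)
print(f"\noutput: {response}")
```

Note how the commit replaces the manual checkpoint download (moving `v0.ckpt` under `config`) with name-based selection (`config="v0", model="v0"`), which is why no weights path appears in the sketch.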
resource/data.png DELETED

Git LFS Details

  • SHA256: 0e4d4dc0b0699031235bea278f7a0dc226a767f3501718a1b6f7253c5e8f1682
  • Pointer size: 131 Bytes
  • Size of remote file: 492 kB