soham97/mellow

small audio-language model

audio reasoning

audio captioning

audio question answering


soham97 committed on Mar 10

Commit 5bb426c · 1 Parent(s): 0e6a16e

readme update

Files changed (2)

README.md +1 -2
config.json +0 -0

README.md CHANGED

@@ -13,7 +13,7 @@ tags:
   - audio-text
 ---
 # Mellow
-[[`Paper`]()] [[`GitHub`](https://github.com/soham97/Mellow)] [[`🤗Checkpoint`](https://huggingface.co/soham97/Mellow)] [[`Zenodo`](https://huggingface.co/soham97/Mellow)]
+[[`📑Paper`]()] [[`⚙️GitHub`](https://github.com/soham97/Mellow)] [[`🤗Checkpoint`](https://huggingface.co/soham97/Mellow)] [[`📊Zenodo`](https://zenodo.org/records/15002886)]
 Mellow is a small Audio-Language Model that takes in two audios and a text prompt as input and produces free-form text as output. It is a 167M parameter model and trained on ~155 hours of audio (AudioCaps and Clotho), and achieves SoTA performance on different tasks with 50x fewer parameters.
@@ -62,7 +62,6 @@ from mellow import MellowWrapper
 # setup cuda and device
 cuda = torch.cuda.is_available()
 device = 0 if cuda else "cpu"
-mellow = Mellow(config="<choice of config>", model_path="<model weights", device=device, cuda=cuda)
 # setup mellow
 mellow = MellowWrapper(

config.json ADDED

File without changes
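
The unchanged context lines in the README hunk pick a device before constructing the wrapper. A minimal sketch of that selection rule on its own, runnable without a GPU, might look like the following. The helper name `pick_device` is hypothetical and not part of the Mellow package, and the fallback to CPU when `torch` is not installed is an assumption added for illustration.

```python
def pick_device(cuda_available: bool):
    """Mirror the README rule: GPU index 0 if CUDA is available, else "cpu".

    `pick_device` is a hypothetical helper used only for this sketch.
    """
    return 0 if cuda_available else "cpu"


# Assumption: fall back to CPU when torch is not installed.
try:
    import torch

    cuda = torch.cuda.is_available()
except ImportError:
    cuda = False

device = pick_device(cuda)
print(f"cuda={cuda}, device={device!r}")
```

The resulting `cuda` and `device` values correspond to the variables the README sets up before the `MellowWrapper(` call, whose arguments are truncated in this diff and are not reproduced here.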