---
title: Music Descriptor CPU
emoji: 🚀
colorFrom: green
colorTo: green
sdk: gradio
sdk_version: 5.1.0
app_file: app.py
pinned: true
license: cc-by-nc-4.0
short_description: CPU version
---
<!-- Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference -->
# Demo Introduction
This is an example of using the [MERT-v1-95M](https://huggingface.co/m-a-p/MERT-v1-95M) model as the backbone to conduct multiple music understanding tasks with its universal representation.
The tasks include EMO, GS, MTGInstrument, MTGGenre, MTGTop50, MTGMood, NSynthI, NSynthP, VocalSetS, VocalSetT.
More models can be found on the [m-a-p organization page](https://huggingface.co/m-a-p).
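As a rough illustration of the extraction step, the sketch below follows the usage shown on the MERT model card; the dummy `waveform` array and the plain mean pooling are placeholders, and the task-specific heads used in `app.py` are not shown:

```python
import numpy as np
import torch
from transformers import AutoModel, Wav2Vec2FeatureExtractor

model = AutoModel.from_pretrained("m-a-p/MERT-v1-95M", trust_remote_code=True)
processor = Wav2Vec2FeatureExtractor.from_pretrained(
    "m-a-p/MERT-v1-95M", trust_remote_code=True
)

# Dummy 4-second mono clip at the extractor's expected rate (24 kHz for this model).
waveform = np.random.randn(processor.sampling_rate * 4).astype(np.float32)

inputs = processor(waveform, sampling_rate=processor.sampling_rate, return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs, output_hidden_states=True)

# Stack the per-layer hidden states; downstream task heads typically pool
# over these (here a plain mean, as a placeholder).
all_layers = torch.stack(outputs.hidden_states)  # (layers, batch, time, dim)
representation = all_layers.mean(dim=(0, 2))     # (batch, dim)
```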
# Known Issues
## Audio Format Support
Theoretically, all audio formats supported by [torchaudio.load()](https://pytorch.org/audio/stable/torchaudio.html#torchaudio.load) can be used in the demo. These include, but are not limited to, `WAV, AMB, MP3, FLAC`.
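As a quick reference, a file can be decoded like this (the file name is a placeholder; the resulting tensor still needs resampling to the model's expected rate):

```python
import torchaudio

# torchaudio.load() returns a (channels, samples) float tensor
# plus the file's native sample rate.
waveform, sample_rate = torchaudio.load("example.mp3")
print(waveform.shape, sample_rate)
```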
## Audio Input Length
Due to the **hardware limitation** of the machine hosting this demo (2 CPUs and 16 GB of RAM), only **the first 4 seconds** of audio are used!
We expect to resolve this in the future by applying for more community-supported GPU resources or by using other audio encoding strategies.
For now, if you want to run the demo with longer audio, you can clone this Space and deploy it with a GPU.
The code will automatically use the GPU for inference if one is detected via `torch.cuda.is_available()`.
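Below is a minimal sketch of both behaviours described above (truncation to the first 4 seconds and device selection); the file path and variable names are illustrative, and the actual preprocessing in `app.py` may differ:

```python
import torch
import torchaudio

waveform, sample_rate = torchaudio.load("example.wav")  # placeholder path
waveform = waveform.mean(dim=0)         # mix down to mono
waveform = waveform[: 4 * sample_rate]  # keep only the first 4 seconds

# The same device-selection logic the demo relies on.
device = "cuda" if torch.cuda.is_available() else "cpu"
waveform = waveform.to(device)          # the model must be moved to the same device
```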