Intro
Silence is first removed from the 99 recordings, using the annotations to identify and drop the regions that carry no technique label. Each recording is then cut into fixed-length 3-second segments, and any segment shorter than 3 seconds is zero-padded to full length. Zero padding is chosen over circular padding because the task is frame-level detection, where wrapped-around audio would introduce extraneous information. The dataset is split at the recording level, so all segments from one recording fall in the same subset: the 99 recordings are partitioned into training, validation, and test sets in a 79:10:10 ratio (roughly 8:1:1).
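The segmentation and split steps described above can be sketched as follows. This is a minimal illustration and not the authors' preprocessing code: the function names, the random shuffle, and the seed are assumptions, and silence removal is left out because the annotation format is not described here.

```python
import numpy as np


def segment_with_zero_padding(audio, sample_rate, clip_seconds=3.0):
    """Cut a 1-D signal into fixed-length clips, zero padding the last clip."""
    clip_len = int(round(clip_seconds * sample_rate))
    n_clips = int(np.ceil(len(audio) / clip_len))
    # Allocate zeros up front so the tail of the final clip stays silent.
    padded = np.zeros(n_clips * clip_len, dtype=audio.dtype)
    padded[: len(audio)] = audio
    return padded.reshape(n_clips, clip_len)


def split_recordings(recording_ids, n_val=10, n_test=10, seed=0):
    """Split at the recording level so no recording spans two subsets.

    The shuffle and seed are hypothetical; the published split may be fixed
    in a different way.
    """
    rng = np.random.default_rng(seed)
    shuffled = rng.permutation(len(recording_ids)).tolist()
    ids = [recording_ids[i] for i in shuffled]
    test = ids[:n_test]
    val = ids[n_test : n_test + n_val]
    train = ids[n_test + n_val :]
    return train, val, test


if __name__ == "__main__":
    train, val, test = split_recordings(list(range(99)))
    print(len(train), len(val), len(test))
```

With 99 recording IDs and the default sizes, the remaining 79 recordings form the training set, which matches the 79:10:10 ratio stated above.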
Demo
https://huggingface.co/spaces/ccmusic-database/Guzheng_Tech99
Usage
from modelscope import snapshot_download
model_dir = snapshot_download("ccmusic-database/Guzheng_Tech99")
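Since the layout of the downloaded snapshot is not documented here, a reasonable first step is to inspect what `model_dir` contains. The helper below is a hypothetical convenience, not part of modelscope; it only uses the standard library and makes no assumption about file names.

```python
from pathlib import Path


def list_snapshot_files(model_dir):
    """Return the relative paths of all files under model_dir, sorted."""
    root = Path(model_dir)
    return sorted(
        p.relative_to(root).as_posix() for p in root.rglob("*") if p.is_file()
    )
```

Calling `list_snapshot_files(model_dir)` after the download shows which weight and configuration files are available before any loading code is written.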
Maintenance
git clone [email protected]:ccmusic-database/Guzheng_Tech99
cd Guzheng_Tech99
Results
| Backbone | Mel | CQT | Chroma |
|---|---|---|---|
| ViT-B-16 | 0.705 | 0.518 | 0.508 |
| Swin-T | 0.849 | 0.783 | 0.766 |
| VGG19 | 0.862 | 0.799 | 0.665 |
| EfficientNet-V2-L | 0.783 | 0.812 | 0.697 |
| ConvNeXt-B | 0.849 | 0.849 | 0.805 |
| ResNet101 | 0.638 | 0.830 | 0.707 |
| SqueezeNet1.1 | 0.831 | 0.814 | 0.780 |
| Average | 0.788 | 0.772 | 0.704 |
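The Average row can be reproduced from the per-backbone scores, and the same data shows which backbone scores highest for each input feature. The snippet below is only a reading aid: the values are copied from the table, the metric is not named in this section, and the helper names are illustrative.

```python
FEATURES = ("Mel", "CQT", "Chroma")

# Scores copied from the results table (Average row excluded).
SCORES = {
    "ViT-B-16": (0.705, 0.518, 0.508),
    "Swin-T": (0.849, 0.783, 0.766),
    "VGG19": (0.862, 0.799, 0.665),
    "EfficientNet-V2-L": (0.783, 0.812, 0.697),
    "ConvNeXt-B": (0.849, 0.849, 0.805),
    "ResNet101": (0.638, 0.830, 0.707),
    "SqueezeNet1.1": (0.831, 0.814, 0.780),
}


def column_averages(table):
    """Mean score per feature, rounded to three decimals like the table."""
    n = len(table)
    return tuple(
        round(sum(row[i] for row in table.values()) / n, 3)
        for i in range(len(FEATURES))
    )


def best_backbones(table):
    """Name of the highest-scoring backbone for each feature."""
    return {
        feat: max(table, key=lambda name: table[name][i])
        for i, feat in enumerate(FEATURES)
    }


if __name__ == "__main__":
    print(dict(zip(FEATURES, column_averages(SCORES))))
    print(best_backbones(SCORES))
```

On these numbers, Mel has the highest average (0.788), VGG19 scores highest on Mel, and ConvNeXt-B scores highest on both CQT and Chroma.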
Dataset
https://huggingface.co/datasets/ccmusic-database/Guzheng_Tech99
Mirror
https://www.modelscope.cn/models/ccmusic-database/Guzheng_Tech99
Evaluation
https://github.com/monetjoe/ccmusic_eval/tree/tech99
Cite
@dataset{zhaorui_liu_2021_5676893,
author = {Monan Zhou and Shenyang Xu and Zhaorui Liu and Zhaowen Wang and Feng Yu and Wei Li and Baoqiang Han},
title = {CCMusic: an Open and Diverse Database for Chinese Music Information Retrieval Research},
month = {mar},
year = {2024},
publisher = {HuggingFace},
version = {1.2},
url = {https://huggingface.co/ccmusic-database}
}