---
license: mit
datasets:
- ccmusic-database/Guzheng_Tech99
language:
- zh
metrics:
- accuracy
pipeline_tag: audio-classification
tags:
- music
---

# Intro
Silence is first removed from the 99 recordings. This is done from the annotations: any span without a technique annotation is cut. Each recording is then uniformly split into fixed-length segments of 3 seconds, and clips shorter than 3 seconds are zero-padded. Zero padding is used instead of circular padding because this is a frame-level detection task, where circular padding would introduce extraneous information. The dataset is split at the recording level: the 99 recordings are partitioned into training, validation, and test subsets in a 79:10:10 ratio (roughly 8:1:1).
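
The segmentation and split described above can be sketched as follows. This is a minimal illustration, not the code used to build the dataset: the function names `segment_with_zero_padding` and `split_recordings`, the random shuffling, and the seed are assumptions, and silence removal is taken to have happened already.

```python
import random

import numpy as np


def segment_with_zero_padding(audio, sample_rate, clip_seconds=3.0):
    """Cut a 1-D waveform into fixed-length clips, zero-padding the last one.

    Hypothetical helper: assumes silence has already been removed.
    """
    clip_len = int(round(sample_rate * clip_seconds))
    n_clips = -(-len(audio) // clip_len)  # ceiling division
    padded = np.zeros(n_clips * clip_len, dtype=audio.dtype)
    padded[: len(audio)] = audio  # the tail stays zero, not wrapped around
    return padded.reshape(n_clips, clip_len)


def split_recordings(recording_ids, n_val=10, n_test=10, seed=0):
    """Split at the recording level so clips of one recording never straddle subsets.

    The shuffle and seed are assumptions made for this sketch.
    """
    ids = list(recording_ids)
    random.Random(seed).shuffle(ids)
    test = ids[:n_test]
    val = ids[n_test : n_test + n_val]
    train = ids[n_test + n_val :]
    return train, val, test


# Usage: a 7-second dummy signal at 100 Hz becomes three 3-second clips.
clips = segment_with_zero_padding(np.ones(700, dtype=np.float32), sample_rate=100)
print(clips.shape)  # (3, 300)

train, val, test = split_recordings(range(99))
print(len(train), len(val), len(test))  # 79 10 10
```

Splitting recording IDs before segmenting keeps every clip of a recording in the same subset, which is what a recording-level split implies.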

## Demo (inference code)
<https://huggingface.co/spaces/ccmusic-database/Guzheng_Tech99>

## Usage
```python
from huggingface_hub import snapshot_download

# Download the model repository into the local cache and return its path.
model_dir = snapshot_download("ccmusic-database/Guzheng_Tech99")
```

## Maintenance
```bash
GIT_LFS_SKIP_SMUDGE=1 git clone git@hf.co:ccmusic-database/Guzheng_Tech99
cd Guzheng_Tech99
```

## Results
|     Backbone      |    Mel    |    CQT    |  Chroma   |
| :---------------: | :-------: | :-------: | :-------: |
|     ViT-B-16      |   0.705   |   0.518   |   0.508   |
|      Swin-T       | **0.849** | **0.783** | **0.766** |
|                   |           |           |           |
|       VGG19       | **0.862** |   0.799   |   0.665   |
| EfficientNet-V2-L |   0.783   |   0.812   |   0.697   |
|    ConvNeXt-B     |   0.849   | **0.849** | **0.805** |
|     ResNet101     |   0.638   |   0.830   |   0.707   |
|   SqueezeNet1.1   |   0.831   |   0.814   |   0.780   |
|      Average      |   0.788   |   0.772   |   0.704   |
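
The Average row can be reproduced as the arithmetic mean over the seven backbones. The short check below copies the scores from the table; the dictionary layout and variable names are illustrative only.

```python
# Scores copied from the Results table; column order is Mel, CQT, Chroma.
results = {
    "ViT-B-16": (0.705, 0.518, 0.508),
    "Swin-T": (0.849, 0.783, 0.766),
    "VGG19": (0.862, 0.799, 0.665),
    "EfficientNet-V2-L": (0.783, 0.812, 0.697),
    "ConvNeXt-B": (0.849, 0.849, 0.805),
    "ResNet101": (0.638, 0.830, 0.707),
    "SqueezeNet1.1": (0.831, 0.814, 0.780),
}
features = ("Mel", "CQT", "Chroma")

# Column-wise mean over all backbones, rounded to three decimals like the table.
averages = {
    name: round(sum(row[i] for row in results.values()) / len(results), 3)
    for i, name in enumerate(features)
}

# Backbone with the highest score for each input feature.
best = {
    name: max(results, key=lambda backbone: results[backbone][i])
    for i, name in enumerate(features)
}

print(averages)  # {'Mel': 0.788, 'CQT': 0.772, 'Chroma': 0.704}
print(best)      # {'Mel': 'VGG19', 'CQT': 'ConvNeXt-B', 'Chroma': 'ConvNeXt-B'}
```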

## Dataset
<https://huggingface.co/datasets/ccmusic-database/Guzheng_Tech99>

## Mirror
<https://www.modelscope.cn/models/ccmusic-database/Guzheng_Tech99>

## Evaluation
<https://github.com/monetjoe/ccmusic_eval/tree/tech99>

## Cite
```bibtex
@article{Zhou-2025,
  author  = {Monan Zhou and Shenyang Xu and Zhaorui Liu and Zhaowen Wang and Feng Yu and Wei Li and Baoqiang Han},
  title   = {CCMusic: An Open and Diverse Database for Chinese Music Information Retrieval Research},
  journal = {Transactions of the International Society for Music Information Retrieval},
  volume  = {8},
  number  = {1},
  pages   = {22--38},
  month   = {Mar},
  year    = {2025},
  url     = {https://doi.org/10.5334/tismir.194},
  doi     = {10.5334/tismir.194}
}
```