DBD-research-group/ConvNeXT-Base-BirdSet-XCL

Image Classification

lrauch committed on Sep 5, 2024

Commit cbc23a7 · verified · 1 parent: 854b150

Update README.md

Files changed (1)

README.md +7 -8

@@ -8,6 +8,13 @@ tags: []
 ConvNext trained on the XCL dataset from BirdSet, covering 9736 bird species from Xeno-Canto. Please refer to the [BirdSet Paper](https://arxiv.org/pdf/2403.10380) and the
 [BirdSet Repository](https://github.com/DBD-research-group/BirdSet/tree/main) for further information.
 - The model is trained on 5-second clips of bird vocalizations.
 - num_channels: 1
 - pretrained checkpoint: facebook/convnext-base-224-22k
@@ -17,14 +24,6 @@ ConvNext trained on the XCL dataset from BirdSet, covering 9736 bird species fro
 - melscale: n_mels: 128, n_stft: 513
 - dbscale: top_db: 80
-### Model Details
-ConvNeXT is a pure convolutional model (ConvNet), inspired by the design of Vision Transformers, that claims to outperform them.
-## How to use
-The BirdSet data needs a custom processor that is available in the BirdSet repository. The model does not have a processor available.
-The model accepts a mono image (spectrogram) as input (e.g., `torch.Size([16, 1, 128, 1024])`)
 ```python
 import torch
 from transformers import AutoModelForImageClassification

 ConvNext trained on the XCL dataset from BirdSet, covering 9736 bird species from Xeno-Canto. Please refer to the [BirdSet Paper](https://arxiv.org/pdf/2403.10380) and the
 [BirdSet Repository](https://github.com/DBD-research-group/BirdSet/tree/main) for further information.
+### Model Details
+ConvNeXT is a pure convolutional model (ConvNet), inspired by the design of Vision Transformers, that claims to outperform them.
+## How to use
+The BirdSet data needs a custom processor that is available in the BirdSet repository. The model does not have a processor available.
+The model accepts a mono image (spectrogram) as input (e.g., `torch.Size([16, 1, 128, 1024])`)
 - The model is trained on 5-second clips of bird vocalizations.
 - num_channels: 1
 - pretrained checkpoint: facebook/convnext-base-224-22k
 - melscale: n_mels: 128, n_stft: 513
 - dbscale: top_db: 80
 ```python
 import torch
 from transformers import AutoModelForImageClassification