---
tags:
- pytorch_model_hub_mixin
- model_hub_mixin
- gender-classification
- VoxCeleb
license: mit
datasets:
- ProgramComputer/voxceleb
---
# Voice gender classifier
- This repo contains the inference code for a pretrained human voice gender classifier.
- You can also try the 🤗 [Hugging Face online demo](https://huggingface.co/spaces/JaesungHuh/voice-gender-classifier).
## Installation
First, clone the original [GitHub repository](https://github.com/JaesungHuh/voice-gender-classifier)
```
git clone https://github.com/JaesungHuh/voice-gender-classifier.git
```
and install the packages via pip.
```
cd voice-gender-classifier
pip install -r requirements.txt
```
## Usage
```
import torch

from model import ECAPA_gender

# Download the model directly from the Hugging Face model hub
model = ECAPA_gender.from_pretrained("JaesungHuh/voice-gender-classifier")
model.eval()

# Use the GPU if one is available, otherwise fall back to the CPU
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model.to(device)

# Load the audio file and use the predict function to get the output directly
example_file = "data/00001.wav"
with torch.no_grad():
    output = model.predict(example_file, device=device)
print("Gender : ", output)
```
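
To classify several files at once, the single-file call above can be wrapped in a small loop. The following is only a sketch: `predict_many` is a hypothetical helper, not part of the repository, and it assumes that `model.predict` accepts a file path and returns a label, as in the usage example.

```
from pathlib import Path
from typing import Any, Callable, Dict, Union


def predict_many(
    predict_fn: Callable[[str], Any],
    folder: Union[str, Path],
    pattern: str = "*.wav",
) -> Dict[str, Any]:
    """Hypothetical helper: apply predict_fn to every matching file in folder."""
    results = {}
    # Sort the paths so the output order is stable across runs
    for path in sorted(Path(folder).glob(pattern)):
        results[path.name] = predict_fn(str(path))
    return results


# Assumed usage with the model and device from the example above:
# with torch.no_grad():
#     results = predict_many(lambda p: model.predict(p, device=device), "data")
# for name, gender in results.items():
#     print(name, gender)
```

Passing the prediction function in as an argument keeps the helper independent of the model, so it can be tried with a stub before loading the real weights.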
## Pretrained weights
If you need the pretrained weights, you can download them [here](https://drive.google.com/file/d/1ojtaa6VyUhEM49F7uEyvsLSVN3T8bbPI/view?usp=sharing).
## Training details
State-of-the-art speaker verification models already produce good representations of a speaker's gender.
I used the pretrained ECAPA-TDNN from [TaoRuijie's](https://github.com/TaoRuijie/ECAPA-TDNN) repository, added one linear layer to make a two-class classifier, and fine-tuned the model on the VoxCeleb2 dev set.
The model achieved **98.7%** accuracy on the VoxCeleb1 identification test split.
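
The idea of adding one linear layer on top of a speaker encoder can be sketched roughly as follows. This is an illustration only, not the repository's `ECAPA_gender` class: `GenderHead` and the stand-in encoder are hypothetical, and the embedding size of 192 is an assumption about the encoder's output.

```
import torch
import torch.nn as nn


class GenderHead(nn.Module):
    """Hypothetical wrapper: speaker encoder followed by one linear layer."""

    def __init__(self, encoder: nn.Module, embed_dim: int = 192, num_classes: int = 2):
        super().__init__()
        self.encoder = encoder
        # Single linear layer mapping the speaker embedding to two class logits
        self.classifier = nn.Linear(embed_dim, num_classes)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.classifier(self.encoder(x))


# Stand-in encoder so the sketch runs without the real ECAPA-TDNN weights
dummy_encoder = nn.Sequential(nn.Linear(16000, 192), nn.ReLU())
sketch = GenderHead(dummy_encoder)

# Batch of 4 fake one-second waveforms (assuming 16 kHz audio)
logits = sketch(torch.randn(4, 16000))
print(logits.shape)  # torch.Size([4, 2])
```

In a real fine-tuning setup the stand-in encoder would be replaced by the pretrained ECAPA-TDNN, and the logits would be trained with a standard classification loss such as cross-entropy.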
## Caveat
Note that the training dataset used for this model (VoxCeleb) may not represent the global human population. Please watch for unintended biases when using this model.
## Reference
- [Original GitHub repository](https://github.com/JaesungHuh/voice-gender-classifier)
- I modified the model architecture from [TaoRuijie's](https://github.com/TaoRuijie/ECAPA-TDNN) repository.
- For more details about ECAPA-TDNN, check the [paper](https://arxiv.org/abs/2005.07143).