---
tags:
- pytorch_model_hub_mixin
- model_hub_mixin
- gender-classification
- VoxCeleb
license: mit
datasets:
- ProgramComputer/voxceleb
---

# Voice gender classifier
- This repository contains inference code for a pretrained human voice gender classifier.
- You can also try the 🤗 [Hugging Face online demo](https://huggingface.co/spaces/JaesungHuh/voice-gender-classifier).

## Installation
First, clone the original [GitHub repository](https://github.com/JaesungHuh/voice-gender-classifier):
```
git clone https://github.com/JaesungHuh/voice-gender-classifier.git
```

Then install the required packages with pip:

```
cd voice-gender-classifier
pip install -r requirements.txt
```

## Usage
```
import torch

from model import ECAPA_gender

# Download the pretrained model directly from the Hugging Face Hub
model = ECAPA_gender.from_pretrained("JaesungHuh/voice-gender-classifier")
model.eval()

# Use the GPU if one is available, otherwise fall back to the CPU
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model.to(device)

# Pass the path of an audio file to predict() to get the predicted label directly
example_file = "data/00001.wav"
with torch.no_grad():
    output = model.predict(example_file, device=device)
    print("Gender:", output)
```
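To classify several recordings at once, one option is to loop over a folder and call `predict` on each file. The sketch below is a minimal, hypothetical helper (`classify_folder` is not part of the repository). It takes the prediction function as a parameter so it can be paired with `model.predict` from the snippet above, and it assumes the recordings use the `.wav` extension.

```python
from pathlib import Path
from typing import Callable, Dict


def classify_folder(folder: str, predict_fn: Callable[[str], str]) -> Dict[str, str]:
    """Run predict_fn on every .wav file in `folder` (hypothetical helper).

    Returns a mapping from file name to predicted label, in sorted file order.
    """
    results: Dict[str, str] = {}
    for wav_path in sorted(Path(folder).glob("*.wav")):
        # predict_fn receives the path as a string, the same way model.predict does
        results[wav_path.name] = predict_fn(str(wav_path))
    return results


# Usage with the model loaded above (assumes `model` and `device` already exist):
# with torch.no_grad():
#     labels = classify_folder("data", lambda p: model.predict(p, device=device))
#     for name, gender in labels.items():
#         print(name, gender)
```

Passing a callable rather than the model itself keeps the helper independent of PyTorch, so it can be tested with a stand-in predictor.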

## Pretrained weights
If you need the pretrained weights as a standalone file, you can download them [here](https://drive.google.com/file/d/1ojtaa6VyUhEM49F7uEyvsLSVN3T8bbPI/view?usp=sharing).

## Training details
State-of-the-art speaker verification models already produce embeddings that capture the speaker's gender well.

I used the pretrained ECAPA-TDNN from [TaoRuijie's](https://github.com/TaoRuijie/ECAPA-TDNN) repository, added one linear layer to turn it into a two-class classifier, and fine-tuned the model on the VoxCeleb2 dev set.
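Conceptually, the added linear layer maps a speaker embedding to two scores, one per class, and the predicted gender is the class with the higher score. The plain-Python sketch below illustrates only that final step. The embedding size, the weight values, and the index-to-label mapping (`0 → "male"`, `1 → "female"`) are illustrative assumptions, not values read from the trained model.

```python
import math
from typing import List, Sequence, Tuple

# Illustrative index-to-label mapping (an assumption, not read from the checkpoint)
LABELS = ("male", "female")


def linear(embedding: Sequence[float],
           weights: Sequence[Sequence[float]],
           bias: Sequence[float]) -> List[float]:
    """Compute logits = W @ e + b for a 2 x D weight matrix W."""
    return [sum(w * x for w, x in zip(row, embedding)) + b
            for row, b in zip(weights, bias)]


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax: subtract the max logit before exponentiating."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def classify(embedding: Sequence[float],
             weights: Sequence[Sequence[float]],
             bias: Sequence[float]) -> Tuple[str, float]:
    """Return the predicted label and its softmax probability."""
    probs = softmax(linear(embedding, weights, bias))
    best = max(range(len(probs)), key=probs.__getitem__)
    return LABELS[best], probs[best]


# Toy example with a 3-dimensional embedding (real speaker embeddings are much larger)
embedding = [0.5, -1.0, 2.0]
weights = [[1.0, 0.0, 0.5],    # weights producing the class-0 score
           [-1.0, 0.5, 0.0]]   # weights producing the class-1 score
bias = [0.0, 0.1]
print(classify(embedding, weights, bias))
```

In the actual model this step would be a trained PyTorch layer applied on top of the ECAPA-TDNN embedding; the sketch just makes the arithmetic explicit.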

The model achieved **98.7%** accuracy on the VoxCeleb1 identification test split.
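Assuming accuracy here means the fraction of test utterances whose predicted label matches the reference label (the exact evaluation protocol is not spelled out above), it can be computed as in this minimal sketch. The function name `accuracy` and the toy labels are illustrative only.

```python
from typing import Sequence


def accuracy(predictions: Sequence[str], labels: Sequence[str]) -> float:
    """Fraction of predictions that match the reference labels."""
    if len(predictions) != len(labels):
        raise ValueError("predictions and labels must have the same length")
    if not labels:
        raise ValueError("cannot compute accuracy on an empty set")
    correct = sum(p == y for p, y in zip(predictions, labels))
    return correct / len(labels)


# Toy example: 3 of 4 predictions match, so accuracy is 0.75
print(accuracy(["male", "female", "male", "female"],
               ["male", "female", "female", "female"]))
```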

## Caveat
Note that the training data used for this model (VoxCeleb) may not be representative of the global human population. Please be aware of possible unintended biases when using this model.

## Reference
- [Original GitHub repository](https://github.com/JaesungHuh/voice-gender-classifier)
- I modified the model architecture from [TaoRuijie's](https://github.com/TaoRuijie/ECAPA-TDNN) repository.
- For more details about ECAPA-TDNN, check the [paper](https://arxiv.org/abs/2005.07143).