metadata
language:
- en
license: apache-2.0
tags:
- hyperion
- audio
- speech
- speaker-recognition
- x-vector
- thin-resnet34
datasets:
- voxceleb
metrics:
- eer
- min_dcf-p=0.05
- min_dcf-p=0.01
model-index:
- name: >-
voxceleb-v1.1-fbank80_stmn_lresnet34_e256_arcs30m0.3_do0_adam_lr0.05_b512.v1
results:
- task:
type: speaker-verification
name: Speaker Verification
dataset:
type: voxceleb1
name: VoxCeleb1
args: Train on VoxCeleb2-dev
metrics:
- type: eer
value: 2.11
name: EER Vox1-O
- type: min_dcf-p=0.05
value: 0.135
name: Minimum DCF Vox1-O prior=0.05
- type: min_dcf-p=0.01
value: 0.208
name: Minimum DCF Vox1-O prior=0.01
- type: eer
value: 1.93
name: EER Vox1-E
- type: min_dcf-p=0.05
value: 0.121
name: Minimum DCF Vox1-E prior=0.05
- type: min_dcf-p=0.01
value: 0.204
name: Minimum DCF Vox1-E Original prior=0.01
- type: eer
value: 3.21
name: EER Vox1-H
- type: min_dcf-p=0.05
value: 0.19
name: Minimum DCF Vox1-H prior=0.05
- type: min_dcf-p=0.01
value: 0.298
name: Minimum DCF Vox1-H Original prior=0.01
Hyperion Toolkit Speaker Verification pre-trained Model
Model Configuration
This model was trained using the recipe voxceleb/v1.1.
The configuration for this model is defined in config_fbank80_stmn_lresnet34_arcs30m0.3_adam_lr0.05_amp.v1.sh.
This is an x-vector model with:
- 80 log-Mel filter banks with short-time mean normalization.
- ThinResNet34 (aka Light ResNet34) encoder.
- Mean+Stddev pooling.
- AAM-softmax loss (m=0.3, s=30).
- Mixed-precision training.
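To make three of the components listed above concrete, here is a minimal pure-Python sketch of short-time mean normalization, mean+stddev pooling, and the AAM-softmax margin. It is an illustration only and is not taken from the Hyperion code base: the function names, the sliding-window length, and the use of the population standard deviation are assumptions, and only the values m=0.3 and s=30 come from this model card.

```python
import math


def short_time_mean_norm(frames, window=3):
    """Subtract a centered sliding-window mean from every frame.

    `frames` is a list of equal-length feature vectors (one per time step).
    The window length is a hypothetical value chosen for the example.
    """
    n = len(frames)
    dim = len(frames[0])
    half = window // 2
    normalized = []
    for t in range(n):
        lo, hi = max(0, t - half), min(n, t + half + 1)
        mean = [sum(frames[k][d] for k in range(lo, hi)) / (hi - lo)
                for d in range(dim)]
        normalized.append([frames[t][d] - mean[d] for d in range(dim)])
    return normalized


def mean_std_pool(frames):
    """Collapse a variable-length sequence into one fixed-size vector:
    per-dimension mean concatenated with per-dimension standard deviation.
    The population (divide-by-N) estimator is an assumption."""
    n = len(frames)
    dim = len(frames[0])
    mean = [sum(f[d] for f in frames) / n for d in range(dim)]
    std = [math.sqrt(sum((f[d] - mean[d]) ** 2 for f in frames) / n)
           for d in range(dim)]
    return mean + std


def aam_softmax_logits(cosines, target, m=0.3, s=30.0):
    """Apply an additive angular margin to the target class only.

    `cosines[j]` is the cosine between the embedding and class weight j.
    The target logit becomes s * cos(theta + m); all others are s * cos(theta).
    Practical implementations also guard the case theta + m > pi,
    which this sketch omits.
    """
    logits = []
    for j, c in enumerate(cosines):
        c = max(-1.0, min(1.0, c))  # keep acos in its valid domain
        if j == target:
            c = math.cos(math.acos(c) + m)
        logits.append(s * c)
    return logits


if __name__ == "__main__":
    feats = [[1.0, 2.0], [3.0, 6.0]]
    print(mean_std_pool(feats))
    print(aam_softmax_logits([0.8, 0.1, -0.2], target=0))
```

The margin lowers the target-class logit during training, so the network has to pull embeddings of the same speaker closer together in angle to keep the loss small. At test time only the pooled embedding is used, and trials are scored without the margin.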