license: mit
language:
- en
pipeline_tag: automatic-speech-recognition
# About
This model was created to support experiments evaluating phonetic transcription
with the Buckeye corpus as part of https://github.com/ginic/multipa.
It is a version of facebook/wav2vec2-large-xlsr-53 fine-tuned on a specific subset of the Buckeye corpus.
For details about specific model parameters, see this model's config.json or the
training scripts in the scripts/buckeye_experiments folder of the GitHub repository.
# Experiment Details
Still training with a total amount of data equal to half the full training data (4000 examples), vary the gender split between 30% female / 70% male and 70% female / 30% male, but draw examples from all individuals. Train 5 models for each gender split, using the same model parameters but different data seeds.
Goals:
- Determine how differences in the gender split of the training data affect performance
Params to vary:
- percent female (--percent_female) [0.3, 0.7]
- training seed (--train_seed)
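
The experiment design above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the repository's actual training code: the helper names (`split_counts`, `sample_subset`, `experiment_grid`), the `"gender"` field with values `"f"`/`"m"`, and the seed values 0 to 4 are all hypothetical, and 4000 is read here as the size of the sampled subset. The sketch also samples within each gender group only, so it does not by itself guarantee that every individual speaker is represented.

```python
import random
from itertools import product

TOTAL_EXAMPLES = 4000          # assumed size of the sampled training subset
PERCENT_FEMALE = [0.3, 0.7]    # values listed for --percent_female
TRAIN_SEEDS = [0, 1, 2, 3, 4]  # hypothetical seed values; the card only says 5 per split


def split_counts(total, percent_female):
    """Return (n_female, n_male) for a given total and female fraction."""
    n_female = round(total * percent_female)
    return n_female, total - n_female


def sample_subset(examples, total, percent_female, seed):
    """Draw a gender-balanced subset from a list of dicts with a 'gender' key.

    The 'gender' key and its 'f'/'m' values are an assumed schema.
    """
    rng = random.Random(seed)  # local RNG so each seed gives a reproducible subset
    female = [e for e in examples if e["gender"] == "f"]
    male = [e for e in examples if e["gender"] == "m"]
    n_female, n_male = split_counts(total, percent_female)
    subset = rng.sample(female, n_female) + rng.sample(male, n_male)
    rng.shuffle(subset)
    return subset


def experiment_grid():
    """Enumerate every (percent_female, train_seed) combination to train."""
    return [
        {"percent_female": p, "train_seed": s}
        for p, s in product(PERCENT_FEMALE, TRAIN_SEEDS)
    ]
```

With two gender splits and five seeds, `experiment_grid()` yields ten configurations, one per model to train.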