About

This model was created to support experiments for evaluating phonetic transcription with the Buckeye corpus as part of https://github.com/ginic/multipa. It is a version of facebook/wav2vec2-large-xlsr-53 fine-tuned on a specific subset of the Buckeye corpus. For details about specific model parameters, please view the config.json here or the training scripts in the scripts/buckeye_experiments folder of the GitHub repository.

Experiment Details

This experiment starts from the best-performing model found in the hyperparameter tuning experiments (batch size, learning rate, base model to fine-tune). The model is retrained with the same parameters while the random seed used to select training data is varied, keeping an even 50/50 gender split, to measure whether the choice of training data has a statistically significant effect on performance.
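
The seeded, gender-balanced data selection described above can be sketched roughly as follows. This is a minimal illustration, not the code used in the repository: the function name `sample_balanced`, the `"gender"` field, and the dictionary record layout are all assumptions made for the example. See the scripts/buckeye_experiments folder for the actual training setup.

```python
import random
from collections import defaultdict


def sample_balanced(examples, n_total, data_seed, key="gender"):
    """Draw n_total examples, half from each of two groups, with a seeded RNG.

    `examples` is a list of dicts; `key` names the (hypothetical) field that
    holds the speaker's gender label.
    """
    if n_total % 2 != 0:
        raise ValueError("n_total must be even for a 50/50 split")

    # Bucket the examples by their gender label.
    groups = defaultdict(list)
    for ex in examples:
        groups[ex[key]].append(ex)
    if len(groups) != 2:
        raise ValueError("expected exactly two groups for a 50/50 split")

    per_group = n_total // 2
    rng = random.Random(data_seed)  # same seed gives the same selection

    selected = []
    for label in sorted(groups):  # sorted so iteration order is deterministic
        if len(groups[label]) < per_group:
            raise ValueError(f"not enough examples in group {label!r}")
        selected.extend(rng.sample(groups[label], per_group))

    rng.shuffle(selected)
    return selected
```

Calling the function with different `data_seed` values while holding every other training parameter fixed produces differently composed training sets that all keep the same gender makeup, which is the comparison this experiment relies on.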

Goals:

Choose initial hyperparameters (batch size, learning rate, base model to fine-tune) based on validation set performance
Establish whether varying the training data while keeping the same gender makeup changes test set performance to a statistically significant degree (first data_seed experiment)