CTC-DRO XLSR-based ASR model - set 5

This repository contains a CTC-DRO XLSR-based automatic speech recognition (ASR) model trained with ESPnet.
The model was trained on balanced training data from set 5.

Intended Use

This model is intended for automatic speech recognition. To run inference, load the provided checkpoint (valid.loss.best.pth) and configuration file (config.yaml) through ESPnet's Speech2Text interface:

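import soundfile as sf
from espnet2.bin.asr_inference import Speech2Text

# Paths to the training configuration and checkpoint shipped with this repository.
asr_train_config = "ctc-dro_xlsr_set_5/config.yaml"
asr_model_file = "ctc-dro_xlsr_set_5/valid.loss.best.pth"

# Build the inference wrapper from the local config and checkpoint.
model = Speech2Text.from_pretrained(
    asr_train_config=asr_train_config,
    asr_model_file=asr_model_file,
)

# Read the waveform; its sample rate should match the rate used in training.
speech, sample_rate = sf.read("input.wav")

# The model returns an n-best list; take the text of the top hypothesis.
text, *_ = model(speech)[0]

print("Recognized text:", text)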
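
Before running inference, it can help to confirm that the input audio has the format the model expects. The following is a minimal sketch using only the Python standard library. It assumes 16 kHz mono input, which is the rate XLSR-style models are pretrained on; this repository does not state the rate explicitly, so check config.yaml before relying on it. The helper names inspect_wav and check_wav are hypothetical and not part of ESPnet.

```python
import wave

# Assumption: 16 kHz, the rate XLSR-style front ends are pretrained on.
# Verify against config.yaml before relying on this value.
EXPECTED_RATE = 16000


def inspect_wav(path):
    """Return (sample_rate, channels, duration_seconds) from a PCM WAV header."""
    with wave.open(path, "rb") as wav:
        rate = wav.getframerate()
        channels = wav.getnchannels()
        duration = wav.getnframes() / rate
    return rate, channels, duration


def check_wav(path, expected_rate=EXPECTED_RATE):
    """Return a list of problems found; an empty list means the file looks fine."""
    rate, channels, _ = inspect_wav(path)
    problems = []
    if rate != expected_rate:
        problems.append(f"sample rate is {rate} Hz, expected {expected_rate} Hz")
    if channels != 1:
        problems.append(f"file has {channels} channels, expected mono")
    return problems
```

Calling check_wav("input.wav") returns an empty list when the file matches the assumed format. The standard wave module reads uncompressed PCM WAV only; other formats would need soundfile or a similar library.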

How to Use

Clone this repository.
Use ESPnet’s inference scripts with the provided config.yaml and the checkpoint file valid.loss.best.pth.
Ensure any external resources referenced in config.yaml are available at the indicated relative paths.
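
The last step can be partly automated. The sketch below walks a loaded configuration and reports string values that look like file references but do not exist on disk. It rests on several assumptions: the suffix list is a guess at what counts as a file reference, relative paths are resolved against a base_dir you supply (typically the directory inference is run from), and the function names are hypothetical rather than part of ESPnet. PyYAML is imported only inside load_config.

```python
from pathlib import Path

# Assumption: string values ending in one of these suffixes are file references.
PATH_SUFFIXES = (".pth", ".pt", ".yaml", ".yml", ".txt", ".model", ".npz", ".json")


def iter_strings(node):
    """Yield every string value found in a nested dict/list structure."""
    if isinstance(node, dict):
        for value in node.values():
            yield from iter_strings(value)
    elif isinstance(node, (list, tuple)):
        for item in node:
            yield from iter_strings(item)
    elif isinstance(node, str):
        yield node


def find_missing_paths(config, base_dir="."):
    """Return path-like string values in config that do not exist under base_dir."""
    base = Path(base_dir)
    missing = []
    for value in iter_strings(config):
        if value.endswith(PATH_SUFFIXES) and not (base / value).exists():
            missing.append(value)
    return missing


def load_config(path):
    """Load a YAML file; PyYAML is imported lazily so the helpers above stay stdlib-only."""
    import yaml

    with open(path, encoding="utf-8") as handle:
        return yaml.safe_load(handle)
```

For example, find_missing_paths(load_config("ctc-dro_xlsr_set_5/config.yaml")) lists the referenced files that cannot be found relative to the current directory. An empty result does not guarantee that every resource is present, since the check only covers values matching the assumed suffixes.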