CTC-DRO XLSR-based ASR model - set 5

This repository contains a CTC-DRO XLSR-based automatic speech recognition (ASR) model trained with ESPnet.
The model was trained on balanced training data from set 5.

Intended Use

This model is intended for ASR. Users can run inference using the provided checkpoint (valid.loss.best.pth) and configuration file (config.yaml):

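import soundfile as sf
from espnet2.bin.asr_inference import Speech2Text

# Paths to the training configuration and checkpoint shipped with this repository.
asr_train_config = "ctc-dro_xlsr_set_5/config.yaml"
asr_model_file = "ctc-dro_xlsr_set_5/valid.loss.best.pth"

# Build the inference wrapper from the local config and checkpoint.
model = Speech2Text.from_pretrained(
    asr_train_config=asr_train_config,
    asr_model_file=asr_model_file
)

# Load the waveform. The sampling rate returned by sf.read is discarded here,
# so the file should already match the rate the model was trained on.
speech, _ = sf.read("input.wav")

# The model returns a list of hypotheses, best first; the first field of each is the text.
text, *_ = model(speech)[0]

print("Recognized text:", text)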
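Because the snippet above discards the sampling rate, a mismatched input file would be passed to the model without warning. The following is a minimal sketch of a pre-flight check that uses only the standard library. The expected values (16 kHz, mono) are assumptions based on how XLSR encoders are commonly used, not something stated in this repository, so confirm them against config.yaml. The helper names describe_wav and check_wav are hypothetical.

```python
import wave

# Assumptions, not taken from this repository: confirm against config.yaml.
EXPECTED_RATE = 16_000
EXPECTED_CHANNELS = 1


def describe_wav(path):
    """Return (sample_rate, channels, duration_seconds) from a PCM WAV header."""
    with wave.open(str(path), "rb") as wav:
        rate = wav.getframerate()
        channels = wav.getnchannels()
        duration = wav.getnframes() / rate
    return rate, channels, duration


def check_wav(path, expected_rate=EXPECTED_RATE, expected_channels=EXPECTED_CHANNELS):
    """Return a list of problems found in the header; an empty list means it looks fine."""
    rate, channels, _ = describe_wav(path)
    problems = []
    if rate != expected_rate:
        problems.append(f"sample rate is {rate} Hz, expected {expected_rate} Hz")
    if channels != expected_channels:
        problems.append(f"{channels} channels, expected {expected_channels}")
    return problems
```

Calling check_wav("input.wav") before inference returns an empty list when the header matches the expected values. The standard-library wave module only reads uncompressed PCM WAV files, so other formats would need a different reader such as soundfile.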

How to Use

  1. Clone this repository.
  2. Use ESPnet’s inference scripts with the provided config.yaml and checkpoint file.
  3. Ensure any external resources referenced in config.yaml are available at the indicated relative paths.
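Step 3 can be partly automated. The sketch below walks a parsed configuration and reports string values that look like file paths but do not exist relative to a base directory. It is a heuristic under stated assumptions: the suffix list in PATH_SUFFIXES is a guess at what such a config might reference, and the helper names (iter_strings, find_missing_paths, load_config) are hypothetical, not part of ESPnet.

```python
from pathlib import Path

# Assumption: values ending in one of these suffixes are treated as file references.
PATH_SUFFIXES = {".pth", ".pt", ".npz", ".model", ".txt", ".json", ".yaml"}


def iter_strings(node):
    """Yield every string value found in nested dicts, lists, and tuples."""
    if isinstance(node, str):
        yield node
    elif isinstance(node, dict):
        for value in node.values():
            yield from iter_strings(value)
    elif isinstance(node, (list, tuple)):
        for item in node:
            yield from iter_strings(item)


def find_missing_paths(config, base_dir="."):
    """Return the sorted path-like values in config that do not exist under base_dir."""
    base = Path(base_dir)
    missing = set()
    for value in iter_strings(config):
        if Path(value).suffix.lower() in PATH_SUFFIXES and not (base / value).exists():
            missing.add(value)
    return sorted(missing)


def load_config(path):
    """Parse a YAML file into Python objects (requires PyYAML, imported lazily)."""
    import yaml

    with open(path, encoding="utf-8") as f:
        return yaml.safe_load(f)
```

For example, find_missing_paths(load_config("ctc-dro_xlsr_set_5/config.yaml"), base_dir=".") lists candidate files that are absent when inference is run from the current directory. An empty result does not guarantee that every resource is present, since references without a recognized suffix are not checked.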