CTC-DRO XLSR-based ASR model - set 5
This repository contains a CTC-DRO XLSR-based automatic speech recognition (ASR) model trained with ESPnet.
The model was trained on balanced training data from set 5.
Intended Use
This model is intended for ASR. Users can run inference using the provided checkpoint (valid.loss.best.pth) and configuration file (config.yaml):
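import soundfile as sf
from espnet2.bin.asr_inference import Speech2Text

# Paths to the configuration and checkpoint shipped in this repository.
asr_train_config = "ctc-dro_xlsr_set_5/config.yaml"
asr_model_file = "ctc-dro_xlsr_set_5/valid.loss.best.pth"

# Build the inference wrapper from the local files.
model = Speech2Text.from_pretrained(
    asr_train_config=asr_train_config,
    asr_model_file=asr_model_file,
)

# Read the audio and keep the text of the top-ranked hypothesis.
speech, _ = sf.read("input.wav")
text, *_ = model(speech)[0]
print("Recognized text:", text)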
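XLSR-style encoders are pretrained on 16 kHz audio, so it can be worth checking an input file before passing it to the model. The sketch below uses only the Python standard library. The 16 kHz mono target is an assumption that should be confirmed against config.yaml, and `check_wav` is a hypothetical helper, not part of ESPnet.

```python
import wave

# Assumed target format; confirm against config.yaml before relying on it.
EXPECTED_RATE = 16000
EXPECTED_CHANNELS = 1


def check_wav(path, rate=EXPECTED_RATE, channels=EXPECTED_CHANNELS):
    """Return a list of human-readable problems found in a PCM WAV file."""
    with wave.open(path, "rb") as wav:
        found_rate = wav.getframerate()
        found_channels = wav.getnchannels()
        n_frames = wav.getnframes()

    problems = []
    if found_rate != rate:
        problems.append(f"sample rate is {found_rate} Hz, expected {rate} Hz")
    if found_channels != channels:
        problems.append(f"{found_channels} channels, expected {channels}")
    if n_frames == 0:
        problems.append("file contains no audio frames")
    return problems
```

Calling `check_wav("input.wav")` returns an empty list when the file matches the assumed format. A non-empty list suggests resampling or downmixing the audio first.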
How to Use
- Clone this repository.
- Use ESPnet’s inference scripts with the provided config.yaml and checkpoint file.
- Ensure any external resources referenced in config.yaml are available at the indicated relative paths.
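One way to check the relative paths in config.yaml is to walk the parsed configuration and report file-like values that do not exist on disk. This is a minimal sketch under stated assumptions: the suffix list is a guess at what counts as a file reference, the helper names are hypothetical, and the base directory that the paths are relative to must be supplied by the user. Loading the file relies on PyYAML, which is imported lazily.

```python
from pathlib import Path

# Heuristic (an assumption): strings with these suffixes are file references.
PATH_SUFFIXES = (".pth", ".pt", ".yaml", ".yml", ".txt", ".model", ".npz")


def iter_strings(node):
    """Yield every string value found in nested dicts and lists."""
    if isinstance(node, str):
        yield node
    elif isinstance(node, dict):
        for value in node.values():
            yield from iter_strings(value)
    elif isinstance(node, (list, tuple)):
        for item in node:
            yield from iter_strings(item)


def find_missing_paths(config, base_dir="."):
    """Return relative file-like strings in config that are absent under base_dir."""
    base = Path(base_dir)
    missing = []
    for value in iter_strings(config):
        candidate = Path(value)
        # Skip values that do not look like files, and absolute paths.
        if not value.endswith(PATH_SUFFIXES) or candidate.is_absolute():
            continue
        if not (base / candidate).exists():
            missing.append(value)
    return missing


def load_config(path):
    """Parse a YAML file; PyYAML must be installed for this step."""
    import yaml

    with open(path, encoding="utf-8") as handle:
        return yaml.safe_load(handle)
```

For example, `find_missing_paths(load_config("ctc-dro_xlsr_set_5/config.yaml"), base_dir=".")` lists the referenced files that cannot be found from the current directory. An empty list means every detected reference resolved, though values without a listed suffix are not checked.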