MahmoudAshraf/mms-300m-1130-forced-aligner


MahmoudAshraf committed on May 19, 2024

Commit 3bd2831 · verified · 1 Parent(s): e88adb8

Update README.md

Files changed (1)

README.md +73 -1


@@ -162,4 +162,76 @@ license: cc-by-nc-4.0
 tags:
 - mms
 - wav2vec2
----

+---
+# Forced Alignment with Hugging Face CTC Models
+This Python package provides an efficient way to perform forced alignment between text and audio using Hugging Face's pretrained models. It also features an improved implementation that uses much less memory than the TorchAudio forced alignment API.
+The model checkpoint uploaded here is a conversion from torchaudio to HF Transformers of the MMS-300M checkpoint trained on a forced alignment dataset.
+## Installation
+```bash
+pip install git+https://github.com/MahmoudAshraf97/ctc-forced-aligner.git
+```
+## Usage
+```python
+import torch
+from ctc_forced_aligner import (
+    load_audio,
+    load_alignment_model,
+    generate_emissions,
+    preprocess_text,
+    get_alignments,
+    get_spans,
+    postprocess_results,
+)
+audio_path = "your/audio/path"
+text_path = "your/text/path"
+language = "eng"  # placeholder: ISO 639-3 code of the text's language
+device = "cuda" if torch.cuda.is_available() else "cpu"
+batch_size = 16  # placeholder value; lower it if memory runs out
+# load the model first: load_audio needs its dtype and device
+alignment_model, alignment_tokenizer, alignment_dictionary = load_alignment_model(
+    device,
+    dtype=torch.float16 if device == "cuda" else torch.float32,
+    model_path="MahmoudAshraf/mms-300m-1130-forced-aligner"
+)
+# also compatible with other Wav2Vec2 Checkpoints such as
+# "jonatasgrosman/wav2vec2-large-xlsr-53-arabic"
+audio_waveform = load_audio(audio_path, alignment_model.dtype, alignment_model.device)
+with open(text_path, "r") as f:
+    lines = f.readlines()
+text = "".join(line for line in lines).replace("\n", " ").strip()
+emissions, stride = generate_emissions(
+    alignment_model, audio_waveform, batch_size=batch_size
+)
+# romanization should be enabled when using multilingual models
+# it should be changed to `False` when using models that support the
+# native vocabulary of the text
+tokens_starred, text_starred = preprocess_text(
+    text,
+    romanize=True,
+    language=language,
+)
+segments, blank_id = get_alignments(
+    emissions,
+    tokens_starred,
+    alignment_dictionary,
+)
+spans = get_spans(tokens_starred, segments, alignment_tokenizer.decode(blank_id))
+word_timestamps = postprocess_results(text_starred, spans, stride)
+```
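The word-level result of the usage snippet can then be printed or exported. The sketch below is not part of the commit. It assumes each entry of `word_timestamps` is a dict with `start` and `end` times in seconds and a `text` field; that layout is an assumption, not something this README states. `format_timestamp`, `to_srt`, and the sample data are hypothetical names used only for illustration.

```python
def format_timestamp(seconds: float) -> str:
    """Format a time in seconds as an SRT timestamp (HH:MM:SS,mmm)."""
    total_ms = int(round(seconds * 1000))
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def to_srt(word_timestamps: list[dict]) -> str:
    """Render one SRT cue per aligned word."""
    cues = []
    for index, word in enumerate(word_timestamps, start=1):
        start = format_timestamp(word["start"])
        end = format_timestamp(word["end"])
        cues.append(f"{index}\n{start} --> {end}\n{word['text']}\n")
    return "\n".join(cues)


# Hypothetical sample standing in for the output of postprocess_results;
# the key names are assumed, not taken from the README.
sample = [
    {"start": 0.0, "end": 0.42, "text": "hello"},
    {"start": 0.5, "end": 1.25, "text": "world"},
]
print(to_srt(sample))
```

If the real entries use different keys, only the three dictionary lookups in `to_srt` need to change.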