---
license: cc0-1.0
datasets:
- mah92/Khadijah-FA_EN-Public-Phone-Audio-Dataset
language:
- fa
- en
pipeline_tag: text-to-speech
---

# In the name of God, the Most Gracious, the Most Merciful: the key to the door of the Wise One's treasury

# Model Card for Khadijah(SA)

This is the first Persian/English text-to-speech model built on the new Matcha-TTS model. It is much faster than VITS and gives better results.

It works best with the UNIVERSAL_V1_22050Hz HiFi-GAN vocoder.

You can try the model [here](https://huggingface.co/spaces/k2-fsa/text-to-speech) under the Persian+English entry. Enjoy!

## Usage with the Sherpa-onnx repo

Remember to add metadata to the ONNX file, as in:

https://github.com/k2-fsa/icefall/blob/master/egs/ljspeech/TTS/matcha/export_onnx.py#L174

## Usage with the Matcha-TTS repo

1) In `matcha/text/cleaners.py`, set the language in the `phonemizer.backend.EspeakBackend` part:

```
    language="fa",
```

2) Install piper-phonemize: `pip install piper-phonemize`

3) In `cleaners.py`, add the `persian_cleaners_piper` function below:

```
import piper_phonemize

def persian_cleaners_piper(text):
    """Pipeline for Persian text, including abbreviation expansion. + punctuation + stress"""
    #text = convert_to_ascii(text)
    text = lowercase(text)
    text = expand_abbreviations(text)
    # phonemize_espeak returns one phoneme list per sentence; only the first is used here
    phonemes = "".join(piper_phonemize.phonemize_espeak(text=text, voice="fa")[0])
    phonemes = collapse_whitespace(phonemes)

    # Remove unwanted symbols (e.g., '1')
    unwanted_symbols = {'1', '-'}  # Add any other unwanted symbols here
    filtered_phonemes = "".join([char for char in phonemes if char not in unwanted_symbols])

    return filtered_phonemes
```

4) In the line that calls `text_to_sequence` (in the upstream repo it is in `matcha/cli.py`, not in `cleaners.py`), switch to the new cleaner:

```
    intersperse(text_to_sequence(text, ["persian_cleaners_piper"])[0], 0),
```

5) Also set the cleaner in `configs/data/custom.yaml`: `cleaners: [persian_cleaners_piper]`

6) Replace the contents of `symbols.py` with the following, adjusting the path to your own copy of the tokens file:

```
def read_tokens():
    tokens = []
    with open("/home/oem/Basir/TTS/Matcha/Matcha-TTS/configs/tokens/tokens_sherpa_with_fa.txt", "r", encoding="utf-8") as f:
        for line in f:
            # Remove the newline character at the end
            line = line.rstrip("\n")
            # Split into token and number, preserving whitespace
            if " " in line:
                token = line[:line.index(" ")]  # Extract everything before the first space
                if len(token) == 0:  # The token is the space character itself
                    token = ' '
            else:
                token = line  # If there's no space, the entire line is the token
            tokens.append(token)
    return tokens

symbols = read_tokens()
```

7) If `save_figure_to_numpy` raises errors, change it to:

```
import numpy as np
import matplotlib.pyplot as plt
from PIL import Image
import io

def save_figure_to_numpy(fig):
    # Render the figure to an in-memory PNG and read it back as an array
    buf = io.BytesIO()
    fig.savefig(buf, format='png', bbox_inches='tight', pad_inches=0)
    buf.seek(0)

    img = Image.open(buf)
    data = np.array(img)

    buf.close()
    return data
```

8) After exporting to ONNX, add the Sherpa metadata if you want to use the model with Sherpa:

```
python3 ./add_sherpa_metadata_to_matcha.py
```

## Training results

![Training Results](khadijah-22050.png)

## Credits

Trained by Ali Mahmoudi (@mah92)

Special thanks to Masoud Azizi (@Mablue), Amirreza Ramezani (@brightening-eyes), and Dr. Hamid Jafari (Khaneh Noor Iranian Basir).
Special thanks to the people from the @ttsfarsi channel.

I should also thank @csukuangfj from Xiaomi Corporation for all the help and care in the icefall and sherpa-onnx repos.

And we are nothing, except through the mercy of our Lord.
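
A closing note on step 6 of the Matcha-TTS usage: the token-file parsing can be tried on a small sample before pointing it at the real file. The sketch below is a variant of `read_tokens` that takes the path as an argument instead of hard-coding it. The helper name `read_tokens_from` and the three sample lines are hypothetical and are not taken from the actual `tokens_sherpa_with_fa.txt`; the only thing assumed about the format is the "token, space, number" layout that the original comments describe.

```python
import os
import tempfile


def read_tokens_from(path):
    """Return the token column of a 'token<space>number' file, one token per line."""
    tokens = []
    with open(path, "r", encoding="utf-8") as f:
        for line in f:
            line = line.rstrip("\n")
            if " " in line:
                # Everything before the first space is the token.
                token = line[: line.index(" ")]
                if len(token) == 0:
                    # The line starts with a space, so the token is the space itself.
                    token = " "
            else:
                # No number column: the whole line is the token.
                token = line
            tokens.append(token)
    return tokens


# Hypothetical sample: "_" with number 0, "a" with number 1, and the space token with number 2.
sample = "_ 0\na 1\n  2\n"
with tempfile.NamedTemporaryFile(
    "w", suffix=".txt", delete=False, encoding="utf-8"
) as tmp:
    tmp.write(sample)
    sample_path = tmp.name

symbols = read_tokens_from(sample_path)
os.remove(sample_path)
print(symbols)
```

Note that this parser discards the number column. If the upstream `matcha/text/__init__.py` is unchanged, ids are assigned by position in `symbols`, so the file should list its tokens in id order.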