|
--- |
|
library_name: transformers |
|
language: |
|
- en |
|
license: mit |
|
base_model: microsoft/speecht5_tts |
|
tags: |
|
- generated_from_trainer |
|
datasets: |
|
- custom |
|
model-index: |
|
- name: SpeechT5 TTS Technical Train2
|
results: [] |
|
--- |
|
|
|
|
| **Page** | **Link** |
|----------|----------|
| **Marathi TTS GitHub repo** | [speechT5_marathi_finetuned](https://github.com/dawarepranav/speechT5_marathi_finetuned-) |
| **Hugging Face English technical TTS** | [speecht5_tts_technical_train2](https://huggingface.co/pranavdaware/speecht5_tts_technical_train2) |
| **Hugging Face Marathi TTS** | [speecht5_tts_marathi_train2](https://huggingface.co/pranavdaware/speecht5_tts_marathi_train2) |
| **Report** | [A Technical Report](https://github.com/dawarepranav/speecht5_tts_english_technical_data/blob/main/A%20Technical%20Report.docx) |
|
# 🤗 SpeechT5 TTS Technical Train2
|
|
|
This model is a fine-tuned version of [microsoft/speecht5_tts](https://huggingface.co/microsoft/speecht5_tts), trained on a custom dataset for *Text-to-Speech (TTS)* tasks.
|
|
|
🎯 *Key Metric:*
|
- *Loss* on the evaluation set: 0.3763 |
|
|
|
📢 *Listen to the generated sample:*
|
|
|
The input text is: "Hello, few technical terms I used while fine tuning are API and REST and CUDA and TTS."
|
|
|
<audio controls src="https://cdn-uploads.huggingface.co/production/uploads/66f64964584cae45b5494560/JYJmDNPHnBRLuvqGTJQSu.wav"></audio> |
|
|
|
--- |
|
|
|
## 📖 Model Description
|
|
|
The *SpeechT5 TTS Technical Train2* model is built on the *SpeechT5* architecture and fine-tuned for speech synthesis (TTS). Fine-tuning focused on improving the naturalness and clarity of the audio generated from text.
|
|
|
🔗 *Base Model*: [Microsoft SpeechT5](https://huggingface.co/microsoft/speecht5_tts)

📊 *Dataset*: Custom (details in the Training Data section below)
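
### 🎧 How to Use

A minimal inference sketch using the standard `transformers` SpeechT5 API. The speaker x-vector source (`Matthijs/cmu-arctic-xvectors`) is an assumption, since the card does not state which speaker embeddings were used during fine-tuning:

```python
import torch
import soundfile as sf
from datasets import load_dataset
from transformers import SpeechT5ForTextToSpeech, SpeechT5HifiGan, SpeechT5Processor

# Load the fine-tuned model, its processor, and the HiFi-GAN vocoder
processor = SpeechT5Processor.from_pretrained("pranavdaware/speecht5_tts_technical_train2")
model = SpeechT5ForTextToSpeech.from_pretrained("pranavdaware/speecht5_tts_technical_train2")
vocoder = SpeechT5HifiGan.from_pretrained("microsoft/speecht5_hifigan")

inputs = processor(
    text="Hello, few technical terms I used while fine tuning are API and REST and CUDA and TTS.",
    return_tensors="pt",
)

# SpeechT5 conditions generation on a speaker x-vector; the public CMU ARCTIC
# x-vectors are a common choice (an assumption, not confirmed by the card)
embeddings = load_dataset("Matthijs/cmu-arctic-xvectors", split="validation")
speaker_embeddings = torch.tensor(embeddings[7306]["xvector"]).unsqueeze(0)

# Generate the waveform and save it as a 16 kHz WAV file
speech = model.generate_speech(inputs["input_ids"], speaker_embeddings, vocoder=vocoder)
sf.write("output.wav", speech.numpy(), samplerate=16000)
```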
|
|
|
--- |
|
|
|
## 🔧 Intended Uses & Limitations
|
|
|
### ✅ *Primary Use Cases:*

- *Text-to-Speech (TTS)* for technical interview texts.

- *Virtual Assistants*.
|
|
|
|
|
### ❌ *Limitations:*

- Best suited for English TTS tasks.

- Requires further fine-tuning on a larger dataset.
|
|
|
--- |
|
|
|
## 📅 Training Data
|
|
|
The model was fine-tuned on a *custom dataset* curated to improve TTS output. The dataset consists of varied text that helps the model generate more natural speech, making it suitable for TTS applications.
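
The card does not publish the preprocessing code; below is a minimal sketch following the standard SpeechT5 fine-tuning recipe, assuming a dataset with `text` and 16 kHz `audio` columns (both column names are illustrative):

```python
from transformers import SpeechT5Processor

processor = SpeechT5Processor.from_pretrained("microsoft/speecht5_tts")

def prepare_example(example):
    # Tokenize the input text and convert the target waveform into the
    # log-mel spectrogram labels that SpeechT5 learns to predict.
    audio = example["audio"]  # assumed: 16 kHz mono waveform
    processed = processor(
        text=example["text"],
        audio_target=audio["array"],
        sampling_rate=audio["sampling_rate"],
        return_attention_mask=False,
    )
    processed["labels"] = processed["labels"][0]  # drop the batch dimension
    return processed

# dataset = dataset.map(prepare_example, remove_columns=dataset.column_names)
```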
|
|
|
### ⚙️ *Hyperparameters:*
|
|
|
The model was trained with the following hyperparameters; a corresponding configuration sketch follows the list:
|
|
|
- *Learning Rate*: 1e-05 |
|
- *Train Batch Size*: 16 |
|
- *Eval Batch Size*: 8 |
|
- *Seed*: 42 |
|
- *Gradient Accumulation Steps*: 2 |
|
- *Total Train Batch Size*: 32 |
|
- *Optimizer*: AdamW (betas=(0.9, 0.999), epsilon=1e-08) |
|
- *LR Scheduler Type*: Linear |
|
- *Warmup Steps*: 50 |
|
- *Training Steps*: 500 |
|
- *Mixed Precision Training*: Native AMP |
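
In `transformers`, these settings correspond roughly to the `Seq2SeqTrainingArguments` below. This is a reconstruction from the list above, not the original training script; `output_dir` and the evaluation, saving, and logging cadence are assumptions (the results table suggests evaluation every 100 steps):

```python
from transformers import Seq2SeqTrainingArguments

training_args = Seq2SeqTrainingArguments(
    output_dir="speecht5_tts_technical_train2",  # illustrative name
    learning_rate=1e-5,
    per_device_train_batch_size=16,
    per_device_eval_batch_size=8,
    gradient_accumulation_steps=2,  # effective train batch size: 16 * 2 = 32
    warmup_steps=50,
    max_steps=500,
    lr_scheduler_type="linear",
    seed=42,
    fp16=True,                      # native AMP mixed precision
    eval_strategy="steps",          # assumed: eval every 100 steps per the table
    eval_steps=100,
    save_steps=100,                 # assumption
    logging_steps=25,               # assumption
)
```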
|
|
|
### 📊 *Training Results:*
|
| Training Loss | Epoch | Step | Validation Loss |
|:-------------:|:-----:|:----:|:---------------:|
| 1.1921        | 100.0 | 100  | 0.4136          |
| 0.8435        | 200.0 | 200  | 0.3791          |
| 0.8294        | 300.0 | 300  | 0.3766          |
| 0.7959        | 400.0 | 400  | 0.3744          |
| 0.7918        | 500.0 | 500  | 0.3763          |
|
|
|
|
|
### 📦 Framework Versions
|
|
|
- *Transformers*: 4.46.0.dev0 |
|
- *PyTorch*: 2.4.1+cu121 |
|
- *Datasets*: 3.0.2 |
|
- *Tokenizers*: 0.20.1