---
library_name: transformers
language:
- en
license: mit
base_model: microsoft/speecht5_tts
tags:
- generated_from_trainer
datasets:
- custom
model-index:
- name: SpeechT5 TTS Technical Train2
results: []
---
| **PAGE** | **LINK** |
|----------|----------|
| **MARATHI TTS GITHUB REPO** | [MARATHI TTS REPO](https://github.com/dawarepranav/speechT5_marathi_finetuned-) |
| **HUGGING FACE ENGLISH TECHNICAL MODEL** | [HUGGING FACE TECHNICAL DATA](https://huggingface.co/pranavdaware/speecht5_tts_technical_train2) |
| **HUGGING FACE MARATHI TTS** | [HUGGING FACE MARATHI TTS](https://huggingface.co/pranavdaware/speecht5_tts_marathi_train2) |
| **REPORT** | [REPORT](https://github.com/dawarepranav/speechT5_tts_english_technical_data/blob/main/A%20Technical%20Report.docx) |
# SpeechT5 TTS Technical Train2
This model is a fine-tuned version of [microsoft/speecht5_tts](https://huggingface.co/microsoft/speecht5_tts) using a custom dataset, specifically trained for *Text-to-Speech (TTS)* tasks.
*Key Metric:*
- *Loss* on the evaluation set: 0.3763

*Listen to a generated sample:*
The sample text is: "Hello, few technical terms I used while fine-tuning are API and REST and CUDA and TTS."
<audio controls src="https://cdn-uploads.huggingface.co/production/uploads/66f64964584cae45b5494560/JYJmDNPHnBRLuvqGTJQSu.wav"></audio>
---
## Model Description
The *SpeechT5 TTS Technical Train2* is built on the *SpeechT5* architecture and was fine-tuned for speech synthesis (TTS). The fine-tuning focused on improving the naturalness and clarity of the generated audio from text.
- *Base Model*: [Microsoft SpeechT5](https://huggingface.co/microsoft/speecht5_tts)
- *Dataset*: Custom (specific details to be provided)
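
A minimal inference sketch using the standard SpeechT5 pipeline from Transformers. The example sentence and the `Matthijs/cmu-arctic-xvectors` speaker-embedding source are illustrative choices, not part of this model's training setup; any 512-dim x-vector works:

```python
import torch
import soundfile as sf
from datasets import load_dataset
from transformers import SpeechT5ForTextToSpeech, SpeechT5HifiGan, SpeechT5Processor

model_id = "pranavdaware/speecht5_tts_technical_train2"
processor = SpeechT5Processor.from_pretrained(model_id)
model = SpeechT5ForTextToSpeech.from_pretrained(model_id)
vocoder = SpeechT5HifiGan.from_pretrained("microsoft/speecht5_hifigan")

text = "Few technical terms I used while fine-tuning are API and REST and CUDA and TTS."
inputs = processor(text=text, return_tensors="pt")

# SpeechT5 conditions generation on a 512-dim speaker x-vector.
# The CMU Arctic x-vectors dataset is one convenient public source.
xvectors = load_dataset("Matthijs/cmu-arctic-xvectors", split="validation")
speaker_embeddings = torch.tensor(xvectors[7306]["xvector"]).unsqueeze(0)

speech = model.generate_speech(inputs["input_ids"], speaker_embeddings, vocoder=vocoder)
sf.write("output.wav", speech.numpy(), samplerate=16000)
```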
---
## Intended Uses & Limitations
### Primary Use Cases
- *Text-to-Speech (TTS)* for technical interview texts.
- *Virtual Assistants* that need to speak technical vocabulary aloud.
### Limitations
- Best suited for English TTS tasks.
- Requires further fine-tuning on a larger dataset for broader coverage.
---
## Training Data
The model was fine-tuned on a *custom dataset* curated to improve TTS output. It consists of varied technical text that helps the model generate more natural speech, making it suitable for TTS applications.
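
Since the dataset itself is not published, the sketch below only illustrates how text–audio pairs are typically prepared for SpeechT5 fine-tuning; the column names (`text`, `audio`) are assumptions about the custom dataset's layout:

```python
from transformers import SpeechT5Processor

processor = SpeechT5Processor.from_pretrained("microsoft/speecht5_tts")

def prepare_example(example):
    """Turn one text/audio pair into SpeechT5 training features."""
    audio = example["audio"]  # assumed column: {"array": ..., "sampling_rate": 16000}
    features = processor(
        text=example["text"],         # assumed column holding the transcript
        audio_target=audio["array"],  # target waveform -> log-mel spectrogram labels
        sampling_rate=audio["sampling_rate"],
        return_attention_mask=False,
    )
    features["labels"] = features["labels"][0]  # drop the batch dimension
    return features

# dataset = dataset.map(prepare_example, remove_columns=dataset.column_names)
```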
### Hyperparameters
The model was trained with the following hyperparameters (a `Seq2SeqTrainingArguments` sketch follows the list):
- *Learning Rate*: 1e-05
- *Train Batch Size*: 16
- *Eval Batch Size*: 8
- *Seed*: 42
- *Gradient Accumulation Steps*: 2
- *Total Train Batch Size*: 32
- *Optimizer*: AdamW (betas=(0.9, 0.999), epsilon=1e-08)
- *LR Scheduler Type*: Linear
- *Warmup Steps*: 50
- *Training Steps*: 500
- *Mixed Precision Training*: Native AMP
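
For reference, these settings map onto `Seq2SeqTrainingArguments` roughly as sketched below; the `output_dir` name and the evaluation/save cadence (every 100 steps, inferred from the results table) are assumptions:

```python
from transformers import Seq2SeqTrainingArguments

training_args = Seq2SeqTrainingArguments(
    output_dir="speecht5_tts_technical_train2",  # assumed name
    learning_rate=1e-5,
    per_device_train_batch_size=16,
    per_device_eval_batch_size=8,
    gradient_accumulation_steps=2,  # 16 x 2 = effective train batch size 32
    warmup_steps=50,
    max_steps=500,
    seed=42,
    lr_scheduler_type="linear",
    fp16=True,             # native AMP mixed precision
    eval_strategy="steps", # inferred: validation loss logged every 100 steps
    eval_steps=100,
    save_steps=100,
    label_names=["labels"],
)
```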
### Training Results
| Training Loss | Epoch | Step | Validation Loss |
|:-------------------:|:-------:|:-------:|:-----------------:|
| 1.1921 | 100.0 | 100 | 0.4136 |
| 0.8435 | 200.0 | 200 | 0.3791 |
| 0.8294 | 300.0 | 300 | 0.3766 |
| 0.7959 | 400.0 | 400 | 0.3744 |
| 0.7918 | 500.0 | 500 | 0.3763 |
### Framework Versions
- *Transformers*: 4.46.0.dev0
- *PyTorch*: 2.4.1+cu121
- *Datasets*: 3.0.2
- *Tokenizers*: 0.20.1