omvishesh/SpeechT5_interview


	---
	library_name: transformers
	license: mit
	base_model: microsoft/speecht5_tts
	tags:
	- generated_from_trainer
	model-index:
	- name: speecht5_finetuned_interview
	  results: []
	---

	<!-- This model card has been generated automatically according to the information the Trainer had access to. You
	should probably proofread and complete it, then remove this comment. -->

	# speecht5_finetuned_interview

	This model is a fine-tuned version of [microsoft/speecht5_tts](https://huggingface.co/microsoft/speecht5_tts) on a custom dataset of 170 synthetic audio samples of common interview lines.
	It achieves the following results on the evaluation set:
	- Loss: 0.4392

	## Model description

	SpeechT5 is a unified encoder-decoder framework for speech and text, inspired by T5. The base
	checkpoint used here, microsoft/speecht5_tts, is its text-to-speech (TTS) variant: given input text
	and a speaker embedding, it predicts a mel spectrogram, which a vocoder then converts to a waveform.
	This fine-tuned model targets interview-style speech and may suit applications such as virtual
	assistants and audiobook production.
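The text-to-spectrogram-to-waveform flow described above can be sketched as follows. This is a minimal sketch under stated assumptions, not the author's inference code: the repo id `omvishesh/SpeechT5_interview` is taken from the page header, the processor is loaded from the base checkpoint, `microsoft/speecht5_hifigan` is assumed as the vocoder, and the card does not say which speaker embeddings were used, so the caller has to supply one (SpeechT5 expects a 512-dimensional x-vector of shape `(1, 512)`).

```python
# Sketch only: model/vocoder ids and the speaker embedding source are assumptions.
MODEL_ID = "omvishesh/SpeechT5_interview"   # assumed from the page header
BASE_ID = "microsoft/speecht5_tts"          # base checkpoint named in the card
VOCODER_ID = "microsoft/speecht5_hifigan"   # assumed companion vocoder
SAMPLE_RATE = 16000                         # SpeechT5 works with 16 kHz audio


def synthesize(text, speaker_embedding):
    """Return a 1-D waveform tensor for `text`.

    `speaker_embedding` is a float tensor of shape (1, 512); the card does
    not state where the training embeddings came from.
    """
    # Heavy imports are kept local so the constants above can be reused
    # without loading torch/transformers.
    import torch
    from transformers import (
        SpeechT5ForTextToSpeech,
        SpeechT5HifiGan,
        SpeechT5Processor,
    )

    processor = SpeechT5Processor.from_pretrained(BASE_ID)
    model = SpeechT5ForTextToSpeech.from_pretrained(MODEL_ID)
    vocoder = SpeechT5HifiGan.from_pretrained(VOCODER_ID)

    inputs = processor(text=text, return_tensors="pt")
    with torch.no_grad():
        # With a vocoder supplied, generate_speech returns audio samples
        # instead of the intermediate spectrogram.
        waveform = model.generate_speech(
            inputs["input_ids"], speaker_embedding, vocoder=vocoder
        )
    return waveform
```

The returned tensor can be written to disk with any audio library at `SAMPLE_RATE`, for example `soundfile.write("out.wav", waveform.numpy(), SAMPLE_RATE)`.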


	## Intended uses & limitations

	More information needed

	## Training and evaluation data

	The model was trained on a custom dataset of 170 audio samples of commonly asked interview lines. The audio is synthetic: it was generated with Amazon Polly, using several of its voices. The samples were curated to cover a variety of speech styles, accents, and phonetic structures, with the aim of helping the model generalize.
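For readers who want to build a similar dataset, one way to generate such clips is sketched below. It is illustrative only: the card does not say which Polly voices, output format, or scripts the author used. The helper names `polly_request` and `synthesize_to_wav` are hypothetical, and the 16 kHz PCM setting is chosen because SpeechT5 works with 16 kHz audio. Running the AWS call requires `boto3` and valid credentials.

```python
import wave

SAMPLE_RATE = 16000  # assumption: matches the 16 kHz audio SpeechT5 expects


def polly_request(text, voice_id):
    """Build keyword arguments for Polly's synthesize_speech call.

    The voice id is supplied by the caller; the voices actually used for
    this dataset are not listed in the card.
    """
    return {
        "Text": text,
        "VoiceId": voice_id,
        "OutputFormat": "pcm",           # raw 16-bit little-endian mono samples
        "SampleRate": str(SAMPLE_RATE),  # Polly takes the rate as a string
    }


def synthesize_to_wav(text, voice_id, path):
    """Call Amazon Polly and wrap the returned PCM bytes in a WAV file."""
    import boto3  # local import: only needed when actually calling AWS

    polly = boto3.client("polly")
    response = polly.synthesize_speech(**polly_request(text, voice_id))
    pcm = response["AudioStream"].read()

    with wave.open(path, "wb") as wav:
        wav.setnchannels(1)            # mono
        wav.setsampwidth(2)            # 16-bit samples
        wav.setframerate(SAMPLE_RATE)
        wav.writeframes(pcm)
```

Looping this over a list of interview lines and a handful of voice ids would yield paired text and audio files that can be loaded with the `datasets` library.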



	### Training hyperparameters

	The following hyperparameters were used during training:
	- learning_rate: 2e-05
	- train_batch_size: 2
	- eval_batch_size: 1
	- seed: 42
	- gradient_accumulation_steps: 4
	- total_train_batch_size: 8
	- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
	- lr_scheduler_type: linear
	- lr_scheduler_warmup_steps: 40
	- training_steps: 250
	- mixed_precision_training: Native AMP
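The list above maps onto `transformers` training arguments roughly as in the sketch below. It assumes `Seq2SeqTrainingArguments` was used and a single device (the card states neither), and `output_dir` is a hypothetical placeholder. The `lr_at` helper is a plain-Python restatement of a linear schedule with warmup, included to show what `lr_scheduler_type: linear` with 40 warmup steps and 250 training steps implies. It is not a library function.

```python
# Hyperparameters from the card, expressed as TrainingArguments keyword names.
TRAINING_KWARGS = {
    "learning_rate": 2e-5,
    "per_device_train_batch_size": 2,
    "per_device_eval_batch_size": 1,
    "seed": 42,
    "gradient_accumulation_steps": 4,
    "lr_scheduler_type": "linear",
    "warmup_steps": 40,
    "max_steps": 250,
    "fp16": True,  # "Native AMP" in the card; needs a CUDA device
}

# Effective batch size per optimizer step, assuming one device.
total_train_batch_size = (
    TRAINING_KWARGS["per_device_train_batch_size"]
    * TRAINING_KWARGS["gradient_accumulation_steps"]
)


def lr_at(step, base_lr=2e-5, warmup=40, total=250):
    """Learning rate at optimizer step `step`: linear warmup, linear decay to 0."""
    if step < warmup:
        return base_lr * step / warmup
    return base_lr * max(0.0, (total - step) / (total - warmup))


def build_training_args(output_dir="speecht5_finetuned_interview"):
    """Construct the arguments object (output_dir is a hypothetical name)."""
    from transformers import Seq2SeqTrainingArguments  # local: heavy import

    return Seq2SeqTrainingArguments(output_dir=output_dir, **TRAINING_KWARGS)
```

Under these assumptions the learning rate climbs linearly to 2e-05 at step 40, falls to half that by step 145, and reaches zero at step 250.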

	### Training results

	| Training Loss | Epoch   | Step | Validation Loss |
	|:-------------:|:-------:|:----:|:---------------:|
	| 0.6078        | 2.2535  | 40   | 0.4783          |
	| 0.5393        | 4.5070  | 80   | 0.4533          |
	| 0.4864        | 6.7606  | 120  | 0.4480          |
	| 0.4846        | 9.0141  | 160  | 0.4493          |
	| 0.4628        | 11.2676 | 200  | 0.4383          |
	| 0.4731        | 13.5211 | 240  | 0.4392          |
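The Epoch column can be cross-checked against the hyperparameters with a little arithmetic. The sketch below assumes the logged epoch equals `step * gradient_accumulation_steps / batches_per_epoch`; this is an inference from the numbers in the table, not something the card states. It also picks out the step with the lowest validation loss.

```python
GRAD_ACCUM = 4   # gradient_accumulation_steps from the card
TRAIN_BATCH = 2  # train_batch_size from the card

# (step, epoch, validation_loss) rows copied from the results table.
LOG = [
    (40, 2.2535, 0.4783),
    (80, 4.5070, 0.4533),
    (120, 6.7606, 0.4480),
    (160, 9.0141, 0.4493),
    (200, 11.2676, 0.4383),
    (240, 13.5211, 0.4392),
]

# Batches per epoch implied by each row (assumed relation, see lead-in).
batches_per_epoch = [round(step * GRAD_ACCUM / epoch) for step, epoch, _ in LOG]

# With batch size 2, n batches cover between 2n - 1 and 2n samples.
n_batches = batches_per_epoch[0]
train_size_range = (TRAIN_BATCH * n_batches - 1, TRAIN_BATCH * n_batches)

# Evaluation point with the lowest validation loss.
best_step, _, best_loss = min(LOG, key=lambda row: row[2])

print(batches_per_epoch, train_size_range, best_step, best_loss)
```

Every row implies the same number of batches per epoch, which suggests roughly 141 to 142 of the 170 samples were used for training. The lowest validation loss in the table is at step 200, while the headline loss of 0.4392 corresponds to the last logged evaluation at step 240.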


	### Framework versions

	- Transformers 4.45.0.dev0
	- PyTorch 2.4.1+cu118
	- Datasets 3.0.0
	- Tokenizers 0.20.0