Update README.md

README.md CHANGED

@@ -34,8 +34,9 @@ enhancing the fidelity of transcriptions in scenarios where punctuation is crucial
 **Finetuning dataset:** [MuST-C-en_ar](https://www.kaggle.com/datasets/sebaeymohamed/must-c-en-ar)
 
 ## Key Features:
-Punctuation Sensitivity
+**Punctuation Sensitivity:** The model is specifically engineered to be highly sensitive to punctuation nuances in spoken English, ensuring
 accurate representation of the speaker's intended meaning.
+**New Vocabulary:** The vocabulary was changed from character level to piece level, with a vocabulary size of 500 pieces.
 
 
 ## Usage
@@ -98,28 +99,28 @@ recordings from English TED Talks, which are automatically aligned at the sentence level
 |Parameter|Value|
 |-|-|
 |per_device_train_batch_size|6|
-|per_device_eval_batch_size|
+|per_device_eval_batch_size|16|
-|gradient_accumulation_steps|
+|gradient_accumulation_steps|12|
 |eval_accumulation_steps|16|
 |dataloader_num_workers|2|
-|learning_rate|
+|learning_rate|5e-5|
 |adafactor|True|
-|weight_decay|0.
+|weight_decay|0.08989525|
-|max_grad_norm|0.
+|max_grad_norm|0.58585|
-|num_train_epochs|
+|num_train_epochs|5|
-|
+|warmup_ratio|0.7|
 |lr_scheduler_type|constant_with_warmup|
 |fp16|True|
 |gradient_checkpointing|True|
 |sortish_sampler|True|
 
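The hyperparameter names in the table read like Hugging Face `transformers` training arguments. As a sketch (an assumption of this write-up, not something the commit states), the table can be collected into a plain dict and, with `transformers` installed, unpacked into `Seq2SeqTrainingArguments`; the `output_dir` value below is illustrative.

```python
# Hyperparameters copied from the table above, as keyword arguments.
# Assumption: these correspond to Hugging Face transformers
# Seq2SeqTrainingArguments fields; the commit itself only lists the values.
training_args = {
    "per_device_train_batch_size": 6,
    "per_device_eval_batch_size": 16,
    "gradient_accumulation_steps": 12,
    "eval_accumulation_steps": 16,
    "dataloader_num_workers": 2,
    "learning_rate": 5e-5,
    "adafactor": True,
    "weight_decay": 0.08989525,
    "max_grad_norm": 0.58585,
    "num_train_epochs": 5,
    "warmup_ratio": 0.7,
    "lr_scheduler_type": "constant_with_warmup",
    "fp16": True,
    "gradient_checkpointing": True,
    "sortish_sampler": True,
}

# Effective batch size per optimizer step (single device):
# 6 samples/device * 12 accumulation steps = 72.
effective_batch = (training_args["per_device_train_batch_size"]
                   * training_args["gradient_accumulation_steps"])

# With transformers installed, the dict could be unpacked directly:
# from transformers import Seq2SeqTrainingArguments
# args = Seq2SeqTrainingArguments(output_dir="out", **training_args)
```

Note that with gradient accumulation, each optimizer step effectively sees 6 × 12 = 72 samples.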
 ##### Results
-**Train loss:** 0.
+**Train loss:** 0.8925
 |Split|Word Error Rate (%)|
 |-|-|
-|dev|
+|dev|44.8|
-|tst-HE|
+|tst-HE|39.1|
-|tst-COMMON|43.
+|tst-COMMON|43.2|
 
 
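The splits above are scored with word error rate (WER): the word-level edit distance between hypothesis and reference, divided by the number of reference words. A minimal self-contained sketch (the `wer` helper is illustrative, not part of this repo; real evaluations typically use a library such as `jiwer`):

```python
def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: word-level Levenshtein distance over reference length."""
    ref, hyp = reference.split(), hypothesis.split()
    # prev[j] holds the edit distance between ref[:i-1] and hyp[:j],
    # updated one row at a time (rolling-array dynamic programming).
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, 1):
        cur = [i]
        for j, h in enumerate(hyp, 1):
            cur.append(min(prev[j] + 1,               # deletion
                           cur[j - 1] + 1,            # insertion
                           prev[j - 1] + (r != h)))   # substitution (0 if equal)
        prev = cur
    return prev[-1] / len(ref)
```

For example, `wer("a b c", "a x c")` is 1/3: one substitution over three reference words, i.e. roughly the 0.43 (43.2%) reported for tst-COMMON scaled to a whole test set.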
## Citation