seba3y committed · Commit 3ce9234 · Parent(s): d54c10d

Update README.md

Files changed (1): README.md (+122, −0)
README.md CHANGED
metrics:
- wer
pipeline_tag: automatic-speech-recognition
datasets:
- MuST-C-en_ar
library_name: transformers
tags:
- audio
- automatic-speech-recognition
- speech
- speech2text
- asr
- ASR-punctuation-sensitive
- encoder-decoder-for-asr
---

# speecht5-asr-punctuation-sensitive

This model is part of SotoMedia's Automatic Video Dubbing project, which aims to build the first open-source video dubbing technology across a diverse range of languages. You can find more details about the project and our pipeline [here](https://github.com/ElsebaiyMohamed/Modablag).

## Description

The **speecht5-asr-punctuation-sensitive** model is an Automatic Speech Recognition (ASR) system designed to transcribe spoken English while maintaining a high level of awareness of punctuation. The model is trained to accurately recognize and preserve punctuation marks, enhancing the fidelity of transcriptions in scenarios where punctuation is crucial for conveying meaning.

- **Model type:** transformer encoder-decoder
- **Language:** English
- **Base model:** SpeechT5-ASR [checkpoint](https://huggingface.co/microsoft/speecht5_asr)
- **Finetuning dataset:** [MuST-C-en_ar](https://www.kaggle.com/datasets/sebaeymohamed/must-c-en-ar)

## Key Features

**Punctuation sensitivity:** the model is specifically trained to be highly sensitive to punctuation nuances in spoken English, ensuring an accurate representation of the speaker's intended meaning.

## Usage

```py
# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("automatic-speech-recognition", model="seba3y/speecht5-asr-punctuation-sensitive")
```
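
The pipeline handles audio decoding and resampling internally, so transcription is a single call. A minimal sketch, assuming a local recording at the hypothetical path `sample.wav`:

```py
# "sample.wav" is a placeholder path; the pipeline decodes the file and
# resamples it to the 16 kHz rate the model expects.
result = pipe("sample.wav")
print(result["text"])  # punctuated transcription
```
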
```py
# Load model directly
from transformers import AutoProcessor, AutoModelForSpeechSeq2Seq

processor = AutoProcessor.from_pretrained("seba3y/speecht5-asr-punctuation-sensitive")
model = AutoModelForSpeechSeq2Seq.from_pretrained("seba3y/speecht5-asr-punctuation-sensitive")
```
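
With the processor and model loaded directly, transcription runs through `generate`. A minimal sketch, assuming a 16 kHz mono waveform read with the `soundfile` library from the hypothetical path `sample.wav`:

```py
import soundfile as sf

# Read a 16 kHz mono waveform; "sample.wav" is a placeholder path.
speech, sampling_rate = sf.read("sample.wav")

# Convert the raw waveform to model inputs, then decode autoregressively.
# max_length is an arbitrary cap chosen for illustration.
inputs = processor(audio=speech, sampling_rate=sampling_rate, return_tensors="pt")
predicted_ids = model.generate(**inputs, max_length=200)

# Map the generated token ids back to punctuated text.
transcription = processor.batch_decode(predicted_ids, skip_special_tokens=True)[0]
print(transcription)
```
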
## Finetuning & Evaluation Details

### Dataset

MuST-C is a multilingual speech translation corpus whose size and quality facilitate the training of end-to-end systems for spoken language translation (SLT) from English into several target languages. For each target language, MuST-C comprises several hundred hours of audio recordings from English TED Talks, automatically aligned at the sentence level with their manual transcriptions and translations.

**Data splits:**

|Split|Talks|Sentences|Words (src)|Words (tgt)|Duration|
|-|-|-|-|-|-|
|train|2412|212085|4520522|4000457|1667744.17 s (463h15m44s)|
|dev|11|1073|24274|21387|8914.57 s (2h28m34s)|
|tst-COMMON|27|2019|41955|36443|14679.72 s (4h04m39s)|
|tst-HE|12|578|13080|10912|5211.84 s (1h26m51s)|

#### Hyperparameters

|Parameter|Value|
|-|-|
|per_device_train_batch_size|6|
|per_device_eval_batch_size|10|
|gradient_accumulation_steps|20|
|eval_accumulation_steps|16|
|dataloader_num_workers|2|
|learning_rate|7e-5|
|adafactor|True|
|weight_decay|0.1|
|max_grad_norm|0.9|
|num_train_epochs|2.15|
|warmup_steps|2000|
|lr_scheduler_type|constant_with_warmup|
|fp16|True|
|gradient_checkpointing|True|
|sortish_sampler|True|

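These settings map directly onto `Seq2SeqTrainingArguments` in transformers. A minimal sketch of how the table above could be expressed in code; `output_dir` is a placeholder, since it is not reported here:

```py
from transformers import Seq2SeqTrainingArguments

training_args = Seq2SeqTrainingArguments(
    output_dir="speecht5-asr-punctuation-sensitive",  # placeholder
    per_device_train_batch_size=6,
    per_device_eval_batch_size=10,
    gradient_accumulation_steps=20,
    eval_accumulation_steps=16,
    dataloader_num_workers=2,
    learning_rate=7e-5,
    adafactor=True,                 # Adafactor optimizer instead of AdamW
    weight_decay=0.1,
    max_grad_norm=0.9,
    num_train_epochs=2.15,
    warmup_steps=2000,
    lr_scheduler_type="constant_with_warmup",
    fp16=True,                      # mixed-precision training
    gradient_checkpointing=True,    # trades compute for memory
    sortish_sampler=True,           # groups similar-length samples per batch
)
```
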
#### Results

**Train loss:** 0.4429

|Split|Word Error Rate (%)|
|-|-|
|dev|51.6|
|tst-HE|40.2|
|tst-COMMON|43.01|

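The WER figures above can be recomputed with the `evaluate` library. Because WER compares whitespace-delimited tokens, punctuation attached to a word counts toward errors, which matches this model's punctuation-sensitive objective. A minimal sketch with hypothetical transcripts:

```py
import evaluate

wer_metric = evaluate.load("wer")

# Hypothetical outputs: "world!" vs. "world." is one substitution,
# so WER = 1 error / 2 reference words = 0.5.
predictions = ["hello, world!"]
references = ["hello, world."]
print(wer_metric.compute(predictions=predictions, references=references))
```
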
## Citation

- MuST-C dataset
```
@inproceedings{mustc19,
    author = {Di Gangi, Mattia Antonino and Cattoni, Roldano and Bentivogli, Luisa and Negri, Matteo and Turchi, Marco},
    title = {{MuST-C: a Multilingual Speech Translation Corpus}},
    booktitle = {Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 2 (Short Papers)},
    address = {Minneapolis, MN, USA},
    month = {June},
    year = {2019}
}
```
- SpeechT5-ASR
```
@inproceedings{ao-etal-2022-speecht5,
    title = {{S}peech{T}5: Unified-Modal Encoder-Decoder Pre-Training for Spoken Language Processing},
    author = {Ao, Junyi and Wang, Rui and Zhou, Long and Wang, Chengyi and Ren, Shuo and Wu, Yu and Liu, Shujie and Ko, Tom and Li, Qing and Zhang, Yu and Wei, Zhihua and Qian, Yao and Li, Jinyu and Wei, Furu},
    booktitle = {Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)},
    month = {May},
    year = {2022},
    pages = {5723--5738}
}
```