Automatic Speech Recognition
ESPnet
multilingual
audio
speech-translation
pyf98 committed on
Commit dd2ab3c · verified · 1 Parent(s): 6ab2ab2

Update README.md

Files changed (1)
  1. README.md +47 -14
README.md CHANGED
@@ -26,22 +26,55 @@ OWSM v3 has 889M parameters and is trained on 180k hours of public speech data.
  For more details and results, please check out our [paper](https://arxiv.org/abs/2309.13876) (Peng et al., ASRU 2023).
 
 
- ### Citing OWSM and ESPnet
+ ## Citations
+
+ #### OWSM-CTC
 
  ```BibTex
- @article{peng2023owsm,
- title={Reproducing Whisper-Style Training Using an Open-Source Toolkit and Publicly Available Data},
- author={Yifan Peng and Jinchuan Tian and Brian Yan and Dan Berrebbi and Xuankai Chang and Xinjian Li and Jiatong Shi and Siddhant Arora and William Chen and Roshan Sharma and Wangyou Zhang and Yui Sudo and Muhammad Shakeel and Jee-weon Jung and Soumi Maiti and Shinji Watanabe},
- journal={arXiv preprint arXiv:2309.13876},
- year={2023}
+ @inproceedings{owsm-ctc,
+ title = "{OWSM}-{CTC}: An Open Encoder-Only Speech Foundation Model for Speech Recognition, Translation, and Language Identification",
+ author = "Peng, Yifan and
+ Sudo, Yui and
+ Shakeel, Muhammad and
+ Watanabe, Shinji",
+ booktitle = "Proceedings of the Annual Meeting of the Association for Computational Linguistics (ACL)",
+ year = "2024",
+ month= {8},
+ url = "https://aclanthology.org/2024.acl-long.549",
+ }
+ ```
+
+ #### OWSM v3.1 and v3.2
+
+ ```BibTex
+ @inproceedings{owsm-v32,
+ title={On the Effects of Heterogeneous Data Sources on Speech-to-Text Foundation Models},
+ author={Jinchuan Tian and Yifan Peng and William Chen and Kwanghee Choi and Karen Livescu and Shinji Watanabe},
+ booktitle={Proceedings of the Annual Conference of the International Speech Communication Association (INTERSPEECH)},
+ year={2024},
+ month={9},
+ pdf="https://arxiv.org/pdf/2406.09282"
  }
- @inproceedings{watanabe2018espnet,
- author={Shinji Watanabe and Takaaki Hori and Shigeki Karita and Tomoki Hayashi and Jiro Nishitoba and Yuya Unno and Nelson Yalta and Jahn Heymann and Matthew Wiesner and Nanxin Chen and Adithya Renduchintala and Tsubasa Ochiai},
- title={{ESPnet}: End-to-End Speech Processing Toolkit},
- year={2018},
- booktitle={Proceedings of Interspeech},
- pages={2207--2211},
- doi={10.21437/Interspeech.2018-1456},
- url={http://dx.doi.org/10.21437/Interspeech.2018-1456}
+ @inproceedings{owsm-v31,
+ title={{OWSM v3.1: Better and Faster Open Whisper-Style Speech Models based on E-Branchformer}},
+ author={Yifan Peng and Jinchuan Tian and William Chen and Siddhant Arora and Brian Yan and Yui Sudo and Muhammad Shakeel and Kwanghee Choi and Jiatong Shi and Xuankai Chang and Jee-weon Jung and Shinji Watanabe},
+ booktitle={Proceedings of the Annual Conference of the International Speech Communication Association (INTERSPEECH)},
+ year={2024},
+ month={9},
+ pdf="https://arxiv.org/pdf/2401.16658",
  }
  ```
+
+ #### Initial OWSM (v1, v2, v3)
+
+ ```BibTex
+ @inproceedings{owsm,
+ title={Reproducing Whisper-Style Training Using An Open-Source Toolkit And Publicly Available Data},
+ author={Yifan Peng and Jinchuan Tian and Brian Yan and Dan Berrebbi and Xuankai Chang and Xinjian Li and Jiatong Shi and Siddhant Arora and William Chen and Roshan Sharma and Wangyou Zhang and Yui Sudo and Muhammad Shakeel and Jee-weon Jung and Soumi Maiti and Shinji Watanabe},
+ booktitle={Proceedings of the IEEE Automatic Speech Recognition and Understanding Workshop (ASRU)},
+ year={2023},
+ month={12},
+ pdf="https://arxiv.org/pdf/2309.13876",
+ }
+ ```
+