Update README.md
README.md
CHANGED
@@ -1,3 +1,35 @@
----
-license: cc-by-nc-4.0
----
+---
+license: cc-by-nc-4.0
+---
+
+# CapSpeech
+
+DataSet used for the paper: ***CapSpeech: Enabling Downstream Applications in Style-Captioned Text-to-Speech***
+
+Please refer to [CapSpeech](https://github.com/WangHelin1997/CapSpeech) repo for more details.
+
+## Overview
+
+🔥 CapSpeech is a new benchmark designed for style-captioned TTS (**CapTTS**) tasks, including style-captioned text-to-speech synthesis with sound effects (**CapTTS-SE**), accent-captioned TTS (**AccCapTTS**), emotion-captioned TTS (**EmoCapTTS**) and text-to-speech synthesis for chat agent (**AgentTTS**).
+CapSpeech comprises over **10 million machine-annotated** audio-caption pairs and nearly **0.36 million human-annotated** audio-caption pairs. **3 new speech datasets** are specifically designed for the CapTTS-SE and AgentTTS tasks to enhance the benchmark’s coverage of real-world scenarios.
+
+
+
+## License
+
+⚠️ All resources are under the [CC BY-NC 4.0](https://creativecommons.org/licenses/by-nc/4.0/) license.
+
+## Citation
+
+If you use the models, please cite our work as follows:
+```bibtex
+@misc{wang2025capspeechenablingdownstreamapplications,
+ title={CapSpeech: Enabling Downstream Applications in Style-Captioned Text-to-Speech},
+ author={Helin Wang and Jiarui Hai and Dading Chong and Karan Thakkar and Tiantian Feng and Dongchao Yang and Junhyeok Lee and Laureano Moro Velazquez and Jesus Villalba and Zengyi Qin and Shrikanth Narayanan and Mounya Elhiali and Najim Dehak},
+ year={2025},
+ eprint={2506.02863},
+ archivePrefix={arXiv},
+ primaryClass={eess.AS},
+ url={https://arxiv.org/abs/2506.02863},
+}
+```