AshwinSankar committed on
Commit eea07ad · verified · 1 Parent(s): 4d17da3

Update README.md

Files changed (1)
  1. README.md +145 -3
README.md CHANGED
@@ -1,3 +1,145 @@
- ---
- license: cc-by-4.0
- ---
+ ---
+ license: cc-by-4.0
+ language:
+ - as
+ - bn
+ - brx
+ - doi
+ - kn
+ - mai
+ - ml
+ - mr
+ - ne
+ - pa
+ - sa
+ - ta
+ - te
+ ---
+ # VITS TTS for Indian Languages
+
+ This repository contains a VITS-based text-to-speech (TTS) model fine-tuned for 13 Indian languages. It provides 20 predefined speakers and 17 style IDs covering speaking styles and emotions, making it suitable for use cases such as conversational AI and audiobooks.
+
+ ---
+
+ ## Model Overview
+
+ The model `ai4bharat/vits_rasa_13` is based on the VITS architecture and supports the following features:
+ - **Languages**: 13 Indian languages (listed under Languages Supported).
+ - **Styles**: Speaking styles and emotions, each selected by a style ID.
+ - **Speaker IDs**: 20 predefined speaker profiles covering male and female voices.
+
+ ---
+
+ ## Installation
+
+ ```bash
+ pip install transformers torch
+ ```
+
+ ---
+
+ ## Usage
+
+ Here's a quick example to get started:
+
+ ```python
+ from transformers import AutoModel, AutoTokenizer
+
+ model = AutoModel.from_pretrained("ai4bharat/vits_rasa_13", trust_remote_code=True).to("cuda")  # needs a CUDA GPU; use "cpu" otherwise
+ tokenizer = AutoTokenizer.from_pretrained("ai4bharat/vits_rasa_13", trust_remote_code=True)
+
+ text = "ਕੀ ਮੈਂ ਇਸ ਹਫਤੇ ਦੇ ਅੰਤ ਵਿੱਚ ਰੁੱਝਿਆ ਹੋਇਆ ਹਾਂ?"  # Example text in Punjabi
+ speaker_id = 16  # PAN_M (see Speaker IDs)
+ style_id = 0  # ALEXA (see Style IDs)
+
+ inputs = tokenizer(text=text, return_tensors="pt").to("cuda")  # keep inputs on the same device as the model
+ outputs = model(inputs['input_ids'], speaker_id=speaker_id, emotion_id=style_id)  # the style ID is passed as emotion_id
+ print(outputs.waveform.shape)
+ ```
+
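The waveform returned in the Usage example can be written to disk for listening. The sketch below uses only the Python standard library. `save_wav` is a hypothetical helper, not part of this repository, and the 22050 Hz sample rate is a placeholder assumption: the README does not state the model's output rate, so check the checkpoint's configuration before relying on it. A synthetic tone stands in for `outputs.waveform` so the snippet runs without downloading the model.

```python
import math
import os
import struct
import tempfile
import wave


def save_wav(samples, path, sample_rate):
    """Write mono float samples in [-1.0, 1.0] to a 16-bit PCM WAV file.

    Hypothetical helper for illustration; not part of the model repository.
    """
    # Clip each sample to the valid range, then scale to signed 16-bit integers.
    pcm = [int(round(max(-1.0, min(1.0, float(s))) * 32767)) for s in samples]
    with wave.open(path, "wb") as wav_file:
        wav_file.setnchannels(1)   # mono
        wav_file.setsampwidth(2)   # 2 bytes = 16 bits per sample
        wav_file.setframerate(sample_rate)
        # WAV stores PCM data little-endian, hence the "<" in the format string.
        wav_file.writeframes(struct.pack("<%dh" % len(pcm), *pcm))


# Placeholder rate (assumption); use the rate the checkpoint actually produces.
SAMPLE_RATE = 22050

# Stand-in for the model output: a 0.1 s, 440 Hz tone. With the model, the
# samples might come from outputs.waveform.squeeze().cpu().tolist(), assuming
# the waveform holds floats in [-1, 1].
tone = [0.5 * math.sin(2 * math.pi * 440 * n / SAMPLE_RATE)
        for n in range(SAMPLE_RATE // 10)]

out_path = os.path.join(tempfile.mkdtemp(), "sample.wav")
save_wav(tone, out_path, SAMPLE_RATE)
print("wrote", out_path)
```

Passing the sample rate explicitly, rather than hard-coding it in the helper, keeps the function usable once the real rate is known.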
+ ---
+
+ ## Speaker IDs
+
+ | Speaker Name | ID |
+ |--------------|------|
+ | ASM_F | 0 |
+ | ASM_M | 1 |
+ | BEN_F | 2 |
+ | BEN_M | 3 |
+ | BRX_F | 4 |
+ | BRX_M | 5 |
+ | DOI_F | 6 |
+ | DOI_M | 7 |
+ | KAN_F | 8 |
+ | KAN_M | 9 |
+ | MAI_M | 10 |
+ | MAL_F | 11 |
+ | MAR_F | 12 |
+ | MAR_M | 13 |
+ | NEP_F | 14 |
+ | PAN_F | 15 |
+ | PAN_M | 16 |
+ | SAN_M | 17 |
+ | TAM_F | 18 |
+ | TEL_F | 19 |
+
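To avoid hard-coding integers, the Speaker IDs table can be mirrored as a lookup. This is a sketch only: `SPEAKER_IDS` and `speakers_for` are hypothetical names, not something the model repository exports. Reading each name as a three-letter language tag followed by `_F` or `_M` for the voice's gender is an assumption drawn from the table.

```python
# Speaker name -> ID, copied from the Speaker IDs table.
# Hypothetical helper data; the model repository does not export this dict.
SPEAKER_IDS = {
    "ASM_F": 0, "ASM_M": 1, "BEN_F": 2, "BEN_M": 3, "BRX_F": 4,
    "BRX_M": 5, "DOI_F": 6, "DOI_M": 7, "KAN_F": 8, "KAN_M": 9,
    "MAI_M": 10, "MAL_F": 11, "MAR_F": 12, "MAR_M": 13, "NEP_F": 14,
    "PAN_F": 15, "PAN_M": 16, "SAN_M": 17, "TAM_F": 18, "TEL_F": 19,
}


def speakers_for(prefix):
    """Return {speaker_name: id} for names that start with a tag such as "PAN"."""
    tag = prefix.strip().upper() + "_"
    return {name: sid for name, sid in SPEAKER_IDS.items() if name.startswith(tag)}


print(speakers_for("pan"))  # {'PAN_F': 15, 'PAN_M': 16}
print(speakers_for("mai"))  # {'MAI_M': 10}
```

Some tags have only one voice in the table (for example `MAI_M` or `TAM_F`), so looking up the available speakers first is safer than assuming both an `_F` and an `_M` entry exist.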
+ ---
+
+ ## Languages Supported
+
+ | Language |
+ |------------|
+ | Assamese |
+ | Bengali |
+ | Bodo |
+ | Dogri |
+ | Kannada |
+ | Maithili |
+ | Malayalam |
+ | Marathi |
+ | Nepali |
+ | Punjabi |
+ | Sanskrit |
+ | Tamil |
+ | Telugu |
+
+ ---
+
+ ## Style IDs
+
+ | Style Name | ID |
+ |-------------|------|
+ | ALEXA | 0 |
+ | ANGER | 1 |
+ | BB | 2 |
+ | BOOK | 3 |
+ | CONV | 4 |
+ | DIGI | 5 |
+ | DISGUST | 6 |
+ | FEAR | 7 |
+ | HAPPY | 8 |
+ | INDIC | 9 |
+ | INDICTTS | 9 |
+ | NEWS | 10 |
+ | NAMES | 11 |
+ | SAD | 12 |
+ | SANGRAH | 13 |
+ | SURPRISE | 14 |
+ | UMANG | 15 |
+ | WIKI | 16 |
+
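The Style IDs table can be mirrored in the same way, so a style is chosen by name and the resulting integer is passed as `emotion_id`, as in the Usage example. This is a sketch under assumptions: `STYLE_IDS` and `style_id` are hypothetical names, and the values are copied from the table as written, including the two names that share ID 9.

```python
# Style name -> ID, copied from the Style IDs table.
# INDIC and INDICTTS both map to 9, exactly as the table lists them.
# Hypothetical helper data; the model repository does not export this dict.
STYLE_IDS = {
    "ALEXA": 0, "ANGER": 1, "BB": 2, "BOOK": 3, "CONV": 4, "DIGI": 5,
    "DISGUST": 6, "FEAR": 7, "HAPPY": 8, "INDIC": 9, "INDICTTS": 9,
    "NEWS": 10, "NAMES": 11, "SAD": 12, "SANGRAH": 13, "SURPRISE": 14,
    "UMANG": 15, "WIKI": 16,
}


def style_id(name):
    """Look up a style ID by name, ignoring case and surrounding whitespace."""
    key = name.strip().upper()
    if key not in STYLE_IDS:
        raise ValueError(f"Unknown style {name!r}; expected one of {sorted(STYLE_IDS)}")
    return STYLE_IDS[key]


print(style_id("happy"))                          # 8
print(style_id("INDIC") == style_id("INDICTTS"))  # True
```

Raising on an unknown name surfaces typos early, before an unintended integer reaches the model call.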
+ ---
+
+ ## Citation
+
+ If you use this model in your research, please cite:
+
+ ```bibtex
+ @article{ai4bharat_vits_rasa_13,
+   title={VITS TTS for Indian Languages},
+   author={AI4Bharat Team},
+   year={2024},
+   publisher={Hugging Face}
+ }
+ ```