sajalmandrekar committed on
Commit
4cf908c
·
1 Parent(s): 215df76

uploaded readme

Files changed (1)
  1. README.md +226 -3
README.md CHANGED
@@ -1,3 +1,226 @@
1
- ---
2
- license: mit
3
- ---
1
+ # TranslateKar - English to Konkani (& vice-versa) Language Translator
2
+
3
+ Developed by: `Sajal Mandrekar` and `Shreya Deepak Pai`
4
+
5
+ Dataset generated by: `Atit Naik, Saylee Phadte, Sajal and Shreya`
6
+
7
+ A neural machine translator for Konkani-to-English translation and vice versa. It uses the Transformer architecture, implemented with TensorFlow and Keras.
8
+
9
+ ## Table of contents
10
+
11
+ 1. [Prerequisite](#prerequisite)
12
+ 2. [Test translations using the saved model](#test-translations-using-the-saved-model)
13
+ 3. [Example Translations](#example-translations)
14
+ 4. [Evaluation: BLEU Score](#evaluation-bleu-score)
15
+ 5. [Building BERT Vocabulary](#building-vocabulary)
16
+ 6. [Training model from scratch](#training-model-from-scratch)
17
+ 7. [Using Pretrained weights](#using-pretrained-weights)
18
+ 8. [Terms and Conditions of use](#terms-and-conditions)
19
+
20
+
21
+ ## Prerequisite
22
+
23
+ * Make sure your Python version is between 3.8 and 3.11 (to prevent dependency issues)
24
+
25
+ * (Optional) Create a virtual environment:
26
+   * `python3 -m venv .myenv`
27
+   * `source ./.myenv/bin/activate`
28
+
29
+ * Install the libraries using pip: `python3 -m pip install -r requirements.txt`
30
+
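The version requirement above can be checked before installing anything. The following is a minimal sketch, not part of the repository: the helper name `is_supported` is hypothetical, and the bounds are simply the 3.8 to 3.11 range stated in this README.

```python
import sys

# Supported range as stated in this README: Python 3.8 through 3.11 (inclusive).
MIN_VERSION = (3, 8)
MAX_VERSION = (3, 11)


def is_supported(version=None):
    """Return True if (major, minor) lies within the supported range."""
    if version is None:
        version = sys.version_info[:2]
    return MIN_VERSION <= tuple(version[:2]) <= MAX_VERSION


if __name__ == "__main__":
    major, minor = sys.version_info[:2]
    status = "supported" if is_supported() else "outside the 3.8-3.11 range"
    print(f"Python {major}.{minor} is {status}")
```

Running this with the interpreter you plan to use for the virtual environment tells you whether `requirements.txt` is likely to install cleanly.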
31
+
32
+ ## Test translations using the saved model
33
+
34
+ Simply run: `python3 run_saved_model.py`
35
+
36
+ This opens a prompt that lets you select a model (English to Konkani or Konkani to English) or specify the path to one. Once the model loads successfully, you can enter input text and it returns the translation.
37
+
38
+ ## Example translations
39
+
40
+ #### English to Konkani (T_BASE_EK_07_07)
41
+
42
+ Random inputs:
43
+ ```
44
+ source: what is your name?
45
+ expected: तुमचें नांव किदें?
46
+ predicted: तुमचें नांव कितें ?
47
+
48
+ source: he likes to play cricket
49
+ expected: ताका क्रिकेट खेळपाक आवडटा
50
+ predicted: ताका क्रिकेट खेळपाक आवडटा
51
+
52
+ source: Ramesh is a very kind person
53
+ expected: रमेश हो एक बरोच दयाळ मनीस
54
+ predicted: रमेश हो एक सामको दयाळू मनीस
55
+
56
+ source: Goa is my favourite tourist destination
57
+ expected: गोंय हें म्हजें आवडीचें पर्यटन थळ
58
+ predicted: गोंय हें म्हजें आवडीचें पर्यटन थळ
59
+ ```
60
+
61
+ Quotes from famous people:
62
+ ```
63
+ source: Some Quotes from famous people:
64
+ predicted: नामनेच्या लोकांचीं कांय कोटीां : १ .
65
+
66
+ source: ""The only way to do great work is to love what you do."" - Steve Jobs
67
+ predicted: "" व्हडलें काम करपाचो एकूच मार्ग म्हणल्यार तुमी जें करतात ताचो मोग करप . ""
68
+
69
+ source: ""In the end, it's not the years in your life that count. It's the life in your years."" - Abraham Lincoln
70
+ predicted: "" शेवटाक , तुमच्या जिवितांत वर्सां न्हय , जीं संख्या . तुमच्या वर्सांनी जिवीत . "" अब्राहम लिंकन
71
+
72
+ source: ""Success is not final, failure is not fatal: It is the courage to continue that counts."" - Winston Churchill
73
+ predicted: "" यशस्वी जावप हें निमाणें न्हय , अपेस घातक न्हय : तें चालूच दवरप हें धैर्य . "" विन्स्टन न्यायालयाक
74
+
75
+ source: ""It does not matter how slowly you go as long as you do not stop."" - Confucius
76
+ predicted: "" जो मेरेन तुमी थांबवपा इतले ल्हवू ल्हवू वतात ताका कसलोच फरक पडना . "" - द्रॅल्फ्लोव्हल
77
+
78
+ source: ""The greatest glory in living lies not in never falling, but in rising every time we fall."" - Nelson Mandela
79
+ predicted: "" जिणेंत सगळ्यांत व्हडलो वैभव केन्नाच पडना , पूण दर खेपे आमी पडटात तेन्ना वाडपाक फट उलयता . "" नेल्सन मंडेला
80
+
81
+ source: ""The only limit to our realization of tomorrow will be our doubts of today."" - Franklin D. Roosevelt
82
+ predicted: फाल्यां आमच्या साक्षात्काराक एकूच मर्यादा म्हळ्यार आयच्या आमचो दुबाव आसतलो . "" - फ्रँकलिन डी .
83
+
84
+ source: ""Believe you can and you're halfway there."" - Theodore Roosevelt
85
+ predicted: "" विस्वास दवरात तुमी शक्य आसात आनी तुमी अर्द्या वाटेर आसात . "" - थिओडोर रूव्हॉल्ट्ट .
86
+
87
+ source: ""You miss 100% of the shots you don't take."" - Wayne Gretzky
88
+ predicted: "" तुमी घेनात ते १०० % शॉट तुमी चुकतात . "" - वेन ग्रेत्झकी
89
+
90
+ source: ""Don't watch the clock; do what it does. Keep going."" - Sam Levenson
91
+ predicted: "" घड्याळ पळोवंक नाकात ; जें चलता तें करात . "" - सॅम लेव्हेनसन
92
+ ```
93
+
94
+ #### Konkani to English (T_BASE_KE_17_07)
95
+
96
+ Random inputs:
97
+ ```
98
+ source: तुमचें नांव कितें?
99
+ expected: what is your name?
100
+ predicted: What is your name ?
101
+
102
+ source: ताका क्रिकेट खेळपाक आवडटा
103
+ expected: he likes to play cricket
104
+ predicted: He likes to play cricket
105
+
106
+ source: रमेश हो एक बरोच दयाळ मनीस
107
+ expected: Ramesh is a very kind person
108
+ predicted: Ramesh is a very compassionate person
109
+
110
+ source: गोंय हें म्हजें आवडीचें पर्यटन थळ
111
+ expected: Goa is my favourite tourist destination
112
+ predicted: Goa is my favourite tourist destination
113
+ ```
114
+
115
+ Miscellaneous inputs:
116
+ ```
117
+ Input: हाय! म्हजें नांव सजल मांद्रेकर
118
+ Output: High ! My name is Sajal Mandekar
119
+
120
+ Input: हांव फार्मगुडीच्या गोंय अभियांत्रिकी महाविद्यालयाचो विद्यार्थी
121
+ Output: I am a student of Goa Engineering College , farmgudi
122
+
123
+ Input: हांव संगणक अभियांत्रिकी शिकतां
124
+ Output: I am learning computer engineering
125
+
126
+ Input: मनशाक फकत एकूच गजाल जाय आनी ती तिरस्कार करपा सारकी
127
+ Output: A person needs only one thing and that is contemptable
128
+
129
+ Input: आज रातीं कितें करता?
130
+ Output: What does it do tonight ?
131
+ ```
132
+
133
+ ## Evaluation: BLEU Score
134
+
135
+ * English to Konkani:
136
+   * model codename: T_BASE_EK_07_07
137
+   * BLEU-4 score: **_29.03%_**
138
+
139
+ * Konkani to English:
140
+   * model codename: T_BASE_KE_17_07
141
+   * BLEU-4 score: **_23.20%_**
142
+
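For readers unfamiliar with the metric, the sketch below shows what a BLEU-4 score measures: the geometric mean of clipped 1- to 4-gram precisions, multiplied by a brevity penalty. This README does not say which tool or tokenization produced the scores above, so the function `corpus_bleu4` is a hypothetical, unsmoothed, single-reference illustration and should not be expected to reproduce those numbers.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Return a Counter of all n-grams (as tuples) in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def corpus_bleu4(references, hypotheses):
    """Corpus-level BLEU-4 with one reference per hypothesis, no smoothing.

    Both arguments are lists of token lists. Returns a value in [0, 1].
    """
    matches = [0] * 4  # clipped n-gram matches, for n = 1..4
    totals = [0] * 4   # n-grams present in the hypotheses, for n = 1..4
    ref_len = hyp_len = 0

    for ref, hyp in zip(references, hypotheses):
        ref_len += len(ref)
        hyp_len += len(hyp)
        for n in range(1, 5):
            ref_counts = ngrams(ref, n)
            hyp_counts = ngrams(hyp, n)
            # Clip each hypothesis n-gram count by its count in the reference.
            matches[n - 1] += sum(min(c, ref_counts[g]) for g, c in hyp_counts.items())
            totals[n - 1] += sum(hyp_counts.values())

    if hyp_len == 0 or min(matches) == 0:
        return 0.0  # without smoothing, any zero precision gives BLEU = 0

    # Geometric mean of the four modified precisions.
    log_precision = sum(math.log(m / t) for m, t in zip(matches, totals)) / 4
    # The brevity penalty punishes hypotheses shorter than the references.
    brevity = 1.0 if hyp_len > ref_len else math.exp(1 - ref_len / hyp_len)
    return brevity * math.exp(log_precision)


# One sentence pair taken from the Konkani-to-English examples in this README.
refs = ["Ramesh is a very kind person".split()]
hyps = ["Ramesh is a very compassionate person".split()]
print(f"BLEU-4: {corpus_bleu4(refs, hyps):.4f}")
```

A single substituted word lowers every n-gram precision at once, which is why BLEU-4 drops quickly even for translations that read as acceptable.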
143
+
144
+ ## Building vocabulary
145
+
146
+ * **This requires you to have a dataset!** The code uses the BERT tokenizer (a WordPiece tokenizer) to generate the vocabulary. Note that this is a very CPU/GPU-intensive task and can take a long time depending on your system's performance.
147
+
148
+ * Run: `python3 building_vocabulary.py`
149
+
150
+ * Specify the path to your dataset and the maximum size of the vocabulary
151
+
152
+ * The script generates the vocabulary file, adding the `.vocab` extension to the dataset's file name
153
+
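To see why the vocabulary size matters, it helps to look at how a WordPiece vocabulary is applied to text. The sketch below is an illustration only: it shows the standard greedy longest-match-first splitting used by BERT-style tokenizers, not how `building_vocabulary.py` learns the vocabulary, and both the function `wordpiece_tokenize` and the toy vocabulary are invented for this example.

```python
def wordpiece_tokenize(word, vocab, unk_token="[UNK]"):
    """Split one word into WordPiece tokens by greedy longest-match-first."""
    tokens = []
    start = 0
    while start < len(word):
        end = len(word)
        piece = None
        # Try the longest remaining substring first, then shrink it.
        while start < end:
            candidate = word[start:end]
            if start > 0:
                candidate = "##" + candidate  # continuation pieces carry a "##" prefix
            if candidate in vocab:
                piece = candidate
                break
            end -= 1
        if piece is None:
            return [unk_token]  # no piece matches: the whole word is unknown
        tokens.append(piece)
        start = end
    return tokens


# Toy vocabulary, invented for this example (not taken from the project's .vocab files).
toy_vocab = {"[UNK]", "play", "##ing", "##ed", "cricket", "##s"}
print(wordpiece_tokenize("playing", toy_vocab))   # ['play', '##ing']
print(wordpiece_tokenize("crickets", toy_vocab))  # ['cricket', '##s']
print(wordpiece_tokenize("goa", toy_vocab))       # ['[UNK]']
```

A larger vocabulary leaves fewer words split into many pieces or mapped to `[UNK]`, at the cost of a larger embedding table in the model.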
154
+
155
+ ## Training model from scratch
156
+
157
+ * Prerequisites:
158
+   * A parallel corpus in two separate files
159
+   * Two separate vocabulary files for source and target languages
160
+
161
+ * Modify the configuration file `config.env` to set the dataset paths, vocabulary, epochs and architecture (keep the defaults if you want to use the BASE configuration)
162
+
163
+ * Train the model: `python3 transformer_train.py config.env`
164
+
165
+
166
+ ## Using Pretrained weights
167
+
168
+ * Open the `config.env` file and modify the variables to specify your dataset files and model name/path (example shown below):
169
+ ```
170
+ # -----Configurations of the Transformer model----- #
171
+
172
+ # Model name
173
+ MODEL_NAME=TRANS_BASE_EK
174
+
175
+ ## Path to training data of source language
176
+ CONTEXT_DATA_PATH=dataset/FULL_DATA.en
177
+
178
+ ## Path to training data of target language
179
+ TARGET_DATA_PATH=dataset/FULL_DATA.gom
180
+
181
+ ## Path to vocabulary of source language
182
+ CONTEXT_TOKEN_PATH=vocabulary/bert_en.vocab
183
+
184
+ ## Path to vocabulary data of target language
185
+ TARGET_TOKEN_PATH=vocabulary/bert_gom.vocab
186
+
187
+ # Reloading weights from pretrained model (Comment out or leave empty or set to 'None' if not using)
188
+ WEIGHTS_PATH=trained_models/T_BASE_EK_07_07/checkpoints/best_model.weights.hdf5
189
+ ```
190
+
191
+ * Make sure that architecture variables like `NUM_LAYERS`, `DFF`, etc. match the architecture of the pretrained model weights (specified in `config.env` inside the `checkpoints` directory)
192
+
193
+ * Set the number of epochs using the `epochs` variable
194
+
195
+ * To start training, run: `python3 transformer_train.py config.env`
196
+
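Before resuming training, the architecture check described above can be done programmatically. The sketch below assumes `config.env` is a plain `KEY=VALUE` file with `#` comments, as in the example in this section. How `transformer_train.py` actually reads the file is not shown in this README, so `parse_env` and `architecture_mismatches` are hypothetical helpers, and the `NUM_LAYERS` and `DFF` values are made up for illustration.

```python
def parse_env(text):
    """Parse KEY=VALUE lines, skipping blank lines and '#' comments."""
    config = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, sep, value = line.partition("=")
        if sep:  # ignore lines that contain no '='
            config[key.strip()] = value.strip()
    return config


def architecture_mismatches(new_cfg, pretrained_cfg, keys=("NUM_LAYERS", "DFF")):
    """Return the architecture keys whose values differ between two configs."""
    return [k for k in keys if new_cfg.get(k) != pretrained_cfg.get(k)]


# Sample text in the style of config.env; NUM_LAYERS and DFF values are illustrative.
sample = """
# Model name
MODEL_NAME=TRANS_BASE_EK
## Path to training data of source language
CONTEXT_DATA_PATH=dataset/FULL_DATA.en
NUM_LAYERS=6
DFF=2048
"""
cfg = parse_env(sample)
print(cfg["MODEL_NAME"])  # TRANS_BASE_EK
print(architecture_mismatches(cfg, {"NUM_LAYERS": "6", "DFF": "1024"}))  # ['DFF']
```

In practice you would read both your working `config.env` and the copy stored in the `checkpoints` directory, then compare them; an empty list means the listed architecture variables agree.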
197
+
198
+ ## Terms and Conditions
199
+
200
+ **Disclaimer: Use of this Service and Information**
201
+
202
+ The following terms and conditions govern your use of this service ("TranslateKar"). By using the Service, you agree to these terms and conditions in full. If you disagree with these terms and conditions or any part of them, you must not use this Service.
203
+
204
+ **No Liability for Accuracy of Information**
205
+
206
+ The information provided by this Service is for general informational purposes only. While we strive to provide accurate and up-to-date information, we make no representations or warranties of any kind, express or implied, about the completeness, accuracy, reliability, suitability, or availability with respect to the Service or the information, products, services, or related graphics contained on the Service for any purpose. Any reliance you place on such information is therefore strictly at your own risk.
207
+
208
+ **No Professional Advice**
209
+
210
+ The information provided by this Service is not intended as professional advice. You should not rely on the information as an alternative to professional advice. If you have any specific questions about any matter, you should consult a professional.
211
+
212
+ **No Warranty**
213
+
214
+ We do not warrant or represent:
215
+
216
+ 1. the completeness or accuracy of the information published on this Service;
217
+ 2. that the material on this Service is up to date; or
218
+ 3. that the Service or any service on the Service will remain available.
219
+
220
+ **Limitations of Liability**
221
+
222
+ In no event will we be liable for any loss or damage, including, without limitation, indirect or consequential loss or damage, or any loss or damage whatsoever arising from loss of data or profits arising out of, or in connection with, the use of this Service.
223
+
224
+ **Links to Other Websites**
225
+
226
+ Through this Service, you may be able to link to other websites which are not under our control. We have no control over the nature, content, and availability of those sites. The inclusion of any links does not necessarily imply a recommendation or an endorsement of the views expressed within them.