Muhammad-Arham committed on
Commit 8b8247f · verified · 1 Parent(s): b017d29

Upload 2 files

Files changed (2)
  1. SMS-Spam-detection.ipynb +684 -0
  2. vectorizer.pkl +3 -0
SMS-Spam-detection.ipynb ADDED
@@ -0,0 +1,684 @@
1
+ {
2
+ "cells": [
3
+ {
4
+ "cell_type": "code",
5
+ "execution_count": 1,
6
+ "metadata": {},
7
+ "outputs": [],
8
+ "source": [
9
+ "import numpy as np\n",
10
+ "import pandas as pd"
11
+ ]
12
+ },
13
+ {
14
+ "cell_type": "code",
15
+ "execution_count": 6,
16
+ "metadata": {},
17
+ "outputs": [
18
+ {
19
+ "data": {
20
+ "text/html": [
21
+ "<div>\n",
22
+ "<style scoped>\n",
23
+ " .dataframe tbody tr th:only-of-type {\n",
24
+ " vertical-align: middle;\n",
25
+ " }\n",
26
+ "\n",
27
+ " .dataframe tbody tr th {\n",
28
+ " vertical-align: top;\n",
29
+ " }\n",
30
+ "\n",
31
+ " .dataframe thead th {\n",
32
+ " text-align: right;\n",
33
+ " }\n",
34
+ "</style>\n",
35
+ "<table border=\"1\" class=\"dataframe\">\n",
36
+ " <thead>\n",
37
+ " <tr style=\"text-align: right;\">\n",
38
+ " <th></th>\n",
39
+ " <th>v1</th>\n",
40
+ " <th>v2</th>\n",
41
+ " <th>Unnamed: 2</th>\n",
42
+ " <th>Unnamed: 3</th>\n",
43
+ " <th>Unnamed: 4</th>\n",
44
+ " </tr>\n",
45
+ " </thead>\n",
46
+ " <tbody>\n",
47
+ " <tr>\n",
48
+ " <th>1820</th>\n",
49
+ " <td>ham</td>\n",
50
+ " <td>I'll probably be by tomorrow (or even later to...</td>\n",
51
+ " <td>NaN</td>\n",
52
+ " <td>NaN</td>\n",
53
+ " <td>NaN</td>\n",
54
+ " </tr>\n",
55
+ " <tr>\n",
56
+ " <th>4348</th>\n",
57
+ " <td>ham</td>\n",
58
+ " <td>ÌÏ bot notes oredi... Cos i juz rem i got...</td>\n",
59
+ " <td>NaN</td>\n",
60
+ " <td>NaN</td>\n",
61
+ " <td>NaN</td>\n",
62
+ " </tr>\n",
63
+ " <tr>\n",
64
+ " <th>1553</th>\n",
65
+ " <td>ham</td>\n",
66
+ " <td>Ok how you dear. Did you call chechi</td>\n",
67
+ " <td>NaN</td>\n",
68
+ " <td>NaN</td>\n",
69
+ " <td>NaN</td>\n",
70
+ " </tr>\n",
71
+ " <tr>\n",
72
+ " <th>3395</th>\n",
73
+ " <td>spam</td>\n",
74
+ " <td>URGENT! Your Mobile number has been awarded wi...</td>\n",
75
+ " <td>NaN</td>\n",
76
+ " <td>NaN</td>\n",
77
+ " <td>NaN</td>\n",
78
+ " </tr>\n",
79
+ " <tr>\n",
80
+ " <th>2415</th>\n",
81
+ " <td>ham</td>\n",
82
+ " <td>Huh means computational science... Y they like...</td>\n",
83
+ " <td>NaN</td>\n",
84
+ " <td>NaN</td>\n",
85
+ " <td>NaN</td>\n",
86
+ " </tr>\n",
87
+ " </tbody>\n",
88
+ "</table>\n",
89
+ "</div>"
90
+ ],
91
+ "text/plain": [
92
+ " v1 v2 Unnamed: 2 \\\n",
93
+ "1820 ham I'll probably be by tomorrow (or even later to... NaN \n",
94
+ "4348 ham ÌÏ bot notes oredi... Cos i juz rem i got... NaN \n",
95
+ "1553 ham Ok how you dear. Did you call chechi NaN \n",
96
+ "3395 spam URGENT! Your Mobile number has been awarded wi... NaN \n",
97
+ "2415 ham Huh means computational science... Y they like... NaN \n",
98
+ "\n",
99
+ " Unnamed: 3 Unnamed: 4 \n",
100
+ "1820 NaN NaN \n",
101
+ "4348 NaN NaN \n",
102
+ "1553 NaN NaN \n",
103
+ "3395 NaN NaN \n",
104
+ "2415 NaN NaN "
105
+ ]
106
+ },
107
+ "execution_count": 6,
108
+ "metadata": {},
109
+ "output_type": "execute_result"
110
+ }
111
+ ],
112
+ "source": [
113
+ "df = pd.read_csv('spam.csv', encoding='ISO-8859-1')\n",
114
+ "df.sample(5)"
115
+ ]
116
+ },
117
+ {
118
+ "cell_type": "code",
119
+ "execution_count": 7,
120
+ "metadata": {},
121
+ "outputs": [
122
+ {
123
+ "data": {
124
+ "text/plain": [
125
+ "(5572, 5)"
126
+ ]
127
+ },
128
+ "execution_count": 7,
129
+ "metadata": {},
130
+ "output_type": "execute_result"
131
+ }
132
+ ],
133
+ "source": [
134
+ "df.shape"
135
+ ]
136
+ },
137
+ {
138
+ "cell_type": "markdown",
139
+ "metadata": {},
140
+ "source": [
141
+ "Steps included in this project:\n",
142
+ "1. Data Cleaning\n",
143
+ "2. EDA (Exploratory Data Analysis)\n",
144
+ "3. Text preprocessing\n",
145
+ "4. Model building\n",
146
+ "5. Evaluation\n",
147
+ "6. Improvements based on the evaluation\n",
148
+ "7. Website\n",
149
+ "8. Deploy"
150
+ ]
151
+ },
152
+ {
153
+ "cell_type": "markdown",
154
+ "metadata": {},
155
+ "source": [
156
+ "**1. Data Cleaning**"
157
+ ]
158
+ },
159
+ {
160
+ "cell_type": "code",
161
+ "execution_count": null,
162
+ "metadata": {},
163
+ "outputs": [
164
+ {
165
+ "name": "stdout",
166
+ "output_type": "stream",
167
+ "text": [
168
+ "<class 'pandas.core.frame.DataFrame'>\n",
169
+ "RangeIndex: 5572 entries, 0 to 5571\n",
170
+ "Data columns (total 5 columns):\n",
171
+ " # Column Non-Null Count Dtype \n",
172
+ "--- ------ -------------- ----- \n",
173
+ " 0 v1 5572 non-null object\n",
174
+ " 1 v2 5572 non-null object\n",
175
+ " 2 Unnamed: 2 50 non-null object\n",
176
+ " 3 Unnamed: 3 12 non-null object\n",
177
+ " 4 Unnamed: 4 6 non-null object\n",
178
+ "dtypes: object(5)\n",
179
+ "memory usage: 217.8+ KB\n"
180
+ ]
181
+ }
182
+ ],
183
+ "source": [
184
+ "\n",
185
+ "df.info()"
186
+ ]
187
+ },
188
+ {
189
+ "cell_type": "code",
190
+ "execution_count": 12,
191
+ "metadata": {},
192
+ "outputs": [],
193
+ "source": [
194
+ "#drop last three columns\n",
195
+ "df.drop(columns=['Unnamed: 2','Unnamed: 3','Unnamed: 4'], inplace=True)"
196
+ ]
197
+ },
198
+ {
199
+ "cell_type": "code",
200
+ "execution_count": 13,
201
+ "metadata": {},
202
+ "outputs": [
203
+ {
204
+ "data": {
205
+ "text/html": [
206
+ "<div>\n",
207
+ "<style scoped>\n",
208
+ " .dataframe tbody tr th:only-of-type {\n",
209
+ " vertical-align: middle;\n",
210
+ " }\n",
211
+ "\n",
212
+ " .dataframe tbody tr th {\n",
213
+ " vertical-align: top;\n",
214
+ " }\n",
215
+ "\n",
216
+ " .dataframe thead th {\n",
217
+ " text-align: right;\n",
218
+ " }\n",
219
+ "</style>\n",
220
+ "<table border=\"1\" class=\"dataframe\">\n",
221
+ " <thead>\n",
222
+ " <tr style=\"text-align: right;\">\n",
223
+ " <th></th>\n",
224
+ " <th>v1</th>\n",
225
+ " <th>v2</th>\n",
226
+ " </tr>\n",
227
+ " </thead>\n",
228
+ " <tbody>\n",
229
+ " <tr>\n",
230
+ " <th>807</th>\n",
231
+ " <td>ham</td>\n",
232
+ " <td>Boooo you always work. Just quit.</td>\n",
233
+ " </tr>\n",
234
+ " <tr>\n",
235
+ " <th>1913</th>\n",
236
+ " <td>ham</td>\n",
237
+ " <td>You want to go?</td>\n",
238
+ " </tr>\n",
239
+ " <tr>\n",
240
+ " <th>4365</th>\n",
241
+ " <td>ham</td>\n",
242
+ " <td>Mm yes dear look how i am hugging you both. :-P</td>\n",
243
+ " </tr>\n",
244
+ " <tr>\n",
245
+ " <th>776</th>\n",
246
+ " <td>ham</td>\n",
247
+ " <td>Why don't you go tell your friend you're not s...</td>\n",
248
+ " </tr>\n",
249
+ " <tr>\n",
250
+ " <th>814</th>\n",
251
+ " <td>spam</td>\n",
252
+ " <td>U were outbid by simonwatson5120 on the Shinco...</td>\n",
253
+ " </tr>\n",
254
+ " </tbody>\n",
255
+ "</table>\n",
256
+ "</div>"
257
+ ],
258
+ "text/plain": [
259
+ " v1 v2\n",
260
+ "807 ham Boooo you always work. Just quit.\n",
261
+ "1913 ham You want to go? \n",
262
+ "4365 ham Mm yes dear look how i am hugging you both. :-P\n",
263
+ "776 ham Why don't you go tell your friend you're not s...\n",
264
+ "814 spam U were outbid by simonwatson5120 on the Shinco..."
265
+ ]
266
+ },
267
+ "execution_count": 13,
268
+ "metadata": {},
269
+ "output_type": "execute_result"
270
+ }
271
+ ],
272
+ "source": [
273
+ "df.sample(5)"
274
+ ]
275
+ },
276
+ {
277
+ "cell_type": "code",
278
+ "execution_count": 14,
279
+ "metadata": {},
280
+ "outputs": [
281
+ {
282
+ "data": {
283
+ "text/html": [
284
+ "<div>\n",
285
+ "<style scoped>\n",
286
+ " .dataframe tbody tr th:only-of-type {\n",
287
+ " vertical-align: middle;\n",
288
+ " }\n",
289
+ "\n",
290
+ " .dataframe tbody tr th {\n",
291
+ " vertical-align: top;\n",
292
+ " }\n",
293
+ "\n",
294
+ " .dataframe thead th {\n",
295
+ " text-align: right;\n",
296
+ " }\n",
297
+ "</style>\n",
298
+ "<table border=\"1\" class=\"dataframe\">\n",
299
+ " <thead>\n",
300
+ " <tr style=\"text-align: right;\">\n",
301
+ " <th></th>\n",
302
+ " <th>target</th>\n",
303
+ " <th>text</th>\n",
304
+ " </tr>\n",
305
+ " </thead>\n",
306
+ " <tbody>\n",
307
+ " <tr>\n",
308
+ " <th>4113</th>\n",
309
+ " <td>ham</td>\n",
310
+ " <td>Where are you ? What do you do ? How can you s...</td>\n",
311
+ " </tr>\n",
312
+ " <tr>\n",
313
+ " <th>4244</th>\n",
314
+ " <td>ham</td>\n",
315
+ " <td>Is toshiba portege m100 gd?</td>\n",
316
+ " </tr>\n",
317
+ " <tr>\n",
318
+ " <th>3799</th>\n",
319
+ " <td>spam</td>\n",
320
+ " <td>We tried to contact you re your reply to our o...</td>\n",
321
+ " </tr>\n",
322
+ " <tr>\n",
323
+ " <th>1075</th>\n",
324
+ " <td>ham</td>\n",
325
+ " <td>Oi. Ami parchi na re. Kicchu kaaj korte iccha ...</td>\n",
326
+ " </tr>\n",
327
+ " <tr>\n",
328
+ " <th>1560</th>\n",
329
+ " <td>ham</td>\n",
330
+ " <td>Just got some gas money, any chance you and th...</td>\n",
331
+ " </tr>\n",
332
+ " </tbody>\n",
333
+ "</table>\n",
334
+ "</div>"
335
+ ],
336
+ "text/plain": [
337
+ " target text\n",
338
+ "4113 ham Where are you ? What do you do ? How can you s...\n",
339
+ "4244 ham Is toshiba portege m100 gd?\n",
340
+ "3799 spam We tried to contact you re your reply to our o...\n",
341
+ "1075 ham Oi. Ami parchi na re. Kicchu kaaj korte iccha ...\n",
342
+ "1560 ham Just got some gas money, any chance you and th..."
343
+ ]
344
+ },
345
+ "execution_count": 14,
346
+ "metadata": {},
347
+ "output_type": "execute_result"
348
+ }
349
+ ],
350
+ "source": [
351
+ "#Renaming the columns\n",
352
+ "df.rename(columns={'v1':'target','v2':'text'}, inplace=True)\n",
353
+ "df.sample(5)"
354
+ ]
355
+ },
356
+ {
357
+ "cell_type": "code",
358
+ "execution_count": 15,
359
+ "metadata": {},
360
+ "outputs": [],
361
+ "source": [
362
+ "from sklearn.preprocessing import LabelEncoder\n",
363
+ "encoder = LabelEncoder()"
364
+ ]
365
+ },
366
+ {
367
+ "cell_type": "code",
368
+ "execution_count": 17,
369
+ "metadata": {},
370
+ "outputs": [],
371
+ "source": [
372
+ "df['target'] = encoder.fit_transform(df['target'])"
373
+ ]
374
+ },
375
+ {
376
+ "cell_type": "code",
377
+ "execution_count": 18,
378
+ "metadata": {},
379
+ "outputs": [
380
+ {
381
+ "data": {
382
+ "text/html": [
383
+ "<div>\n",
384
+ "<style scoped>\n",
385
+ " .dataframe tbody tr th:only-of-type {\n",
386
+ " vertical-align: middle;\n",
387
+ " }\n",
388
+ "\n",
389
+ " .dataframe tbody tr th {\n",
390
+ " vertical-align: top;\n",
391
+ " }\n",
392
+ "\n",
393
+ " .dataframe thead th {\n",
394
+ " text-align: right;\n",
395
+ " }\n",
396
+ "</style>\n",
397
+ "<table border=\"1\" class=\"dataframe\">\n",
398
+ " <thead>\n",
399
+ " <tr style=\"text-align: right;\">\n",
400
+ " <th></th>\n",
401
+ " <th>target</th>\n",
402
+ " <th>text</th>\n",
403
+ " </tr>\n",
404
+ " </thead>\n",
405
+ " <tbody>\n",
406
+ " <tr>\n",
407
+ " <th>0</th>\n",
408
+ " <td>0</td>\n",
409
+ " <td>Go until jurong point, crazy.. Available only ...</td>\n",
410
+ " </tr>\n",
411
+ " <tr>\n",
412
+ " <th>1</th>\n",
413
+ " <td>0</td>\n",
414
+ " <td>Ok lar... Joking wif u oni...</td>\n",
415
+ " </tr>\n",
416
+ " <tr>\n",
417
+ " <th>2</th>\n",
418
+ " <td>1</td>\n",
419
+ " <td>Free entry in 2 a wkly comp to win FA Cup fina...</td>\n",
420
+ " </tr>\n",
421
+ " <tr>\n",
422
+ " <th>3</th>\n",
423
+ " <td>0</td>\n",
424
+ " <td>U dun say so early hor... U c already then say...</td>\n",
425
+ " </tr>\n",
426
+ " <tr>\n",
427
+ " <th>4</th>\n",
428
+ " <td>0</td>\n",
429
+ " <td>Nah I don't think he goes to usf, he lives aro...</td>\n",
430
+ " </tr>\n",
431
+ " </tbody>\n",
432
+ "</table>\n",
433
+ "</div>"
434
+ ],
435
+ "text/plain": [
436
+ " target text\n",
437
+ "0 0 Go until jurong point, crazy.. Available only ...\n",
438
+ "1 0 Ok lar... Joking wif u oni...\n",
439
+ "2 1 Free entry in 2 a wkly comp to win FA Cup fina...\n",
440
+ "3 0 U dun say so early hor... U c already then say...\n",
441
+ "4 0 Nah I don't think he goes to usf, he lives aro..."
442
+ ]
443
+ },
444
+ "execution_count": 18,
445
+ "metadata": {},
446
+ "output_type": "execute_result"
447
+ }
448
+ ],
449
+ "source": [
450
+ "df.head()"
451
+ ]
452
+ },
453
+ {
454
+ "cell_type": "code",
455
+ "execution_count": null,
456
+ "metadata": {},
457
+ "outputs": [
458
+ {
459
+ "data": {
460
+ "text/plain": [
461
+ "target 0\n",
462
+ "text 0\n",
463
+ "dtype: int64"
464
+ ]
465
+ },
466
+ "execution_count": 19,
467
+ "metadata": {},
468
+ "output_type": "execute_result"
469
+ }
470
+ ],
471
+ "source": [
472
+ "# Count missing values per column\n",
473
+ "df.isnull().sum()"
474
+ ]
475
+ },
476
+ {
477
+ "cell_type": "code",
478
+ "execution_count": 23,
479
+ "metadata": {},
480
+ "outputs": [],
481
+ "source": [
482
+ "#Check for duplicate values.\n",
483
+ "df.duplicated().sum()\n",
484
+ "\n",
485
+ "#Remove duplicates\n",
486
+ "df = df.drop_duplicates(keep = 'first')"
487
+ ]
488
+ },
489
+ {
490
+ "cell_type": "code",
491
+ "execution_count": 24,
492
+ "metadata": {},
493
+ "outputs": [
494
+ {
495
+ "data": {
496
+ "text/plain": [
497
+ "0"
498
+ ]
499
+ },
500
+ "execution_count": 24,
501
+ "metadata": {},
502
+ "output_type": "execute_result"
503
+ }
504
+ ],
505
+ "source": [
506
+ "df.duplicated().sum()"
507
+ ]
508
+ },
509
+ {
510
+ "cell_type": "code",
511
+ "execution_count": 26,
512
+ "metadata": {},
513
+ "outputs": [
514
+ {
515
+ "data": {
516
+ "text/plain": [
517
+ "(5169, 2)"
518
+ ]
519
+ },
520
+ "execution_count": 26,
521
+ "metadata": {},
522
+ "output_type": "execute_result"
523
+ }
524
+ ],
525
+ "source": [
526
+ "df.shape"
527
+ ]
528
+ },
529
+ {
530
+ "cell_type": "markdown",
531
+ "metadata": {},
532
+ "source": [
533
+ "**2. EDA** (Exploratory Data Analysis)"
534
+ ]
535
+ },
536
+ {
537
+ "cell_type": "code",
538
+ "execution_count": 27,
539
+ "metadata": {},
540
+ "outputs": [
541
+ {
542
+ "data": {
543
+ "text/html": [
544
+ "<div>\n",
545
+ "<style scoped>\n",
546
+ " .dataframe tbody tr th:only-of-type {\n",
547
+ " vertical-align: middle;\n",
548
+ " }\n",
549
+ "\n",
550
+ " .dataframe tbody tr th {\n",
551
+ " vertical-align: top;\n",
552
+ " }\n",
553
+ "\n",
554
+ " .dataframe thead th {\n",
555
+ " text-align: right;\n",
556
+ " }\n",
557
+ "</style>\n",
558
+ "<table border=\"1\" class=\"dataframe\">\n",
559
+ " <thead>\n",
560
+ " <tr style=\"text-align: right;\">\n",
561
+ " <th></th>\n",
562
+ " <th>target</th>\n",
563
+ " <th>text</th>\n",
564
+ " </tr>\n",
565
+ " </thead>\n",
566
+ " <tbody>\n",
567
+ " <tr>\n",
568
+ " <th>0</th>\n",
569
+ " <td>0</td>\n",
570
+ " <td>Go until jurong point, crazy.. Available only ...</td>\n",
571
+ " </tr>\n",
572
+ " <tr>\n",
573
+ " <th>1</th>\n",
574
+ " <td>0</td>\n",
575
+ " <td>Ok lar... Joking wif u oni...</td>\n",
576
+ " </tr>\n",
577
+ " <tr>\n",
578
+ " <th>2</th>\n",
579
+ " <td>1</td>\n",
580
+ " <td>Free entry in 2 a wkly comp to win FA Cup fina...</td>\n",
581
+ " </tr>\n",
582
+ " <tr>\n",
583
+ " <th>3</th>\n",
584
+ " <td>0</td>\n",
585
+ " <td>U dun say so early hor... U c already then say...</td>\n",
586
+ " </tr>\n",
587
+ " <tr>\n",
588
+ " <th>4</th>\n",
589
+ " <td>0</td>\n",
590
+ " <td>Nah I don't think he goes to usf, he lives aro...</td>\n",
591
+ " </tr>\n",
592
+ " </tbody>\n",
593
+ "</table>\n",
594
+ "</div>"
595
+ ],
596
+ "text/plain": [
597
+ " target text\n",
598
+ "0 0 Go until jurong point, crazy.. Available only ...\n",
599
+ "1 0 Ok lar... Joking wif u oni...\n",
600
+ "2 1 Free entry in 2 a wkly comp to win FA Cup fina...\n",
601
+ "3 0 U dun say so early hor... U c already then say...\n",
602
+ "4 0 Nah I don't think he goes to usf, he lives aro..."
603
+ ]
604
+ },
605
+ "execution_count": 27,
606
+ "metadata": {},
607
+ "output_type": "execute_result"
608
+ }
609
+ ],
610
+ "source": [
611
+ "#How many messages are spam and ham?\n",
612
+ "\n",
613
+ "df.head()"
614
+ ]
615
+ },
616
+ {
617
+ "cell_type": "code",
618
+ "execution_count": 29,
619
+ "metadata": {},
620
+ "outputs": [
621
+ {
622
+ "data": {
623
+ "text/plain": [
624
+ "target\n",
625
+ "0 4516\n",
626
+ "1 653\n",
627
+ "Name: count, dtype: int64"
628
+ ]
629
+ },
630
+ "execution_count": 29,
631
+ "metadata": {},
632
+ "output_type": "execute_result"
633
+ }
634
+ ],
635
+ "source": [
636
+ "df['target'].value_counts()"
637
+ ]
638
+ },
639
+ {
640
+ "cell_type": "code",
641
+ "execution_count": null,
642
+ "metadata": {},
643
+ "outputs": [
644
+ {
645
+ "ename": "AttributeError",
646
+ "evalue": "module 'matplotlib' has no attribute 'pie'",
647
+ "output_type": "error",
648
+ "traceback": [
649
+ "\u001b[1;31m---------------------------------------------------------------------------\u001b[0m",
650
+ "\u001b[1;31mAttributeError\u001b[0m Traceback (most recent call last)",
651
+ "Cell \u001b[1;32mIn[34], line 2\u001b[0m\n\u001b[0;32m 1\u001b[0m \u001b[38;5;28;01mimport\u001b[39;00m \u001b[38;5;21;01mmatplotlib\u001b[39;00m \u001b[38;5;28;01mas\u001b[39;00m \u001b[38;5;21;01mplt\u001b[39;00m\n\u001b[1;32m----> 2\u001b[0m \u001b[43mplt\u001b[49m\u001b[38;5;241;43m.\u001b[39;49m\u001b[43mpie\u001b[49m(df[\u001b[38;5;124m'\u001b[39m\u001b[38;5;124mtarget\u001b[39m\u001b[38;5;124m'\u001b[39m]\u001b[38;5;241m.\u001b[39mvalue_counts(), labels\u001b[38;5;241m=\u001b[39m[\u001b[38;5;124m'\u001b[39m\u001b[38;5;124mham\u001b[39m\u001b[38;5;124m'\u001b[39m,\u001b[38;5;124m'\u001b[39m\u001b[38;5;124mspam\u001b[39m\u001b[38;5;124m'\u001b[39m],autopct \u001b[38;5;241m=\u001b[39m \u001b[38;5;124m\"\u001b[39m\u001b[38;5;132;01m%0.2f\u001b[39;00m\u001b[38;5;124m\"\u001b[39m)\n",
652
+ "File \u001b[1;32mc:\\Users\\Muhammad Arham\\AppData\\Local\\Programs\\Python\\Python311\\Lib\\site-packages\\matplotlib\\_api\\__init__.py:217\u001b[0m, in \u001b[0;36mcaching_module_getattr.<locals>.__getattr__\u001b[1;34m(name)\u001b[0m\n\u001b[0;32m 215\u001b[0m \u001b[38;5;28;01mif\u001b[39;00m name \u001b[38;5;129;01min\u001b[39;00m props:\n\u001b[0;32m 216\u001b[0m \u001b[38;5;28;01mreturn\u001b[39;00m props[name]\u001b[38;5;241m.\u001b[39m\u001b[38;5;21m__get__\u001b[39m(instance)\n\u001b[1;32m--> 217\u001b[0m \u001b[38;5;28;01mraise\u001b[39;00m \u001b[38;5;167;01mAttributeError\u001b[39;00m(\n\u001b[0;32m 218\u001b[0m \u001b[38;5;124mf\u001b[39m\u001b[38;5;124m\"\u001b[39m\u001b[38;5;124mmodule \u001b[39m\u001b[38;5;132;01m{\u001b[39;00m\u001b[38;5;28mcls\u001b[39m\u001b[38;5;241m.\u001b[39m\u001b[38;5;18m__module__\u001b[39m\u001b[38;5;132;01m!r}\u001b[39;00m\u001b[38;5;124m has no attribute \u001b[39m\u001b[38;5;132;01m{\u001b[39;00mname\u001b[38;5;132;01m!r}\u001b[39;00m\u001b[38;5;124m\"\u001b[39m)\n",
653
+ "\u001b[1;31mAttributeError\u001b[0m: module 'matplotlib' has no attribute 'pie'"
654
+ ]
655
+ }
656
+ ],
657
+ "source": [
658
+ "import matplotlib as plt\n",
659
+ "plt.pie(df['target'].value_counts(), labels=['ham','spam'],autopct =\"%0.2f\")"
660
+ ]
661
+ }
662
+ ],
663
+ "metadata": {
664
+ "kernelspec": {
665
+ "display_name": "Python 3",
666
+ "language": "python",
667
+ "name": "python3"
668
+ },
669
+ "language_info": {
670
+ "codemirror_mode": {
671
+ "name": "ipython",
672
+ "version": 3
673
+ },
674
+ "file_extension": ".py",
675
+ "mimetype": "text/x-python",
676
+ "name": "python",
677
+ "nbconvert_exporter": "python",
678
+ "pygments_lexer": "ipython3",
679
+ "version": "3.11.4"
680
+ }
681
+ },
682
+ "nbformat": 4,
683
+ "nbformat_minor": 2
684
+ }
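A note on the final plotting cell: `pie` is a function of the `matplotlib.pyplot` submodule, not of the top-level `matplotlib` package, so the plotting interface has to be imported as `matplotlib.pyplot`. A minimal sketch of the class-balance chart follows. It hard-codes the counts reported by `df['target'].value_counts()` in the notebook (4516 ham, 653 spam) so that it runs without `spam.csv`. The `Agg` backend and the chart title are assumptions for illustration and are not part of the notebook.

```python
import matplotlib

# Assumption: headless backend so the sketch runs as a plain script.
# Inside Jupyter this line can be dropped; the figure renders inline.
matplotlib.use("Agg")

import matplotlib.pyplot as plt  # pyplot, not the bare matplotlib package

# Class counts copied from the notebook's value_counts() output
# (0 = ham, 1 = spam after label encoding).
counts = [4516, 653]
labels = ["ham", "spam"]

fig, ax = plt.subplots()
# With autopct set, pie() also returns the percentage text objects.
wedges, texts, autotexts = ax.pie(counts, labels=labels, autopct="%0.2f")
ax.set_title("Class balance after removing duplicates")  # illustrative title

percentages = [t.get_text() for t in autotexts]
print(percentages)
```

The split is roughly 87% ham to 13% spam, so the dataset is imbalanced. That is worth keeping in mind at the evaluation step, where accuracy alone can look good for a model that rarely predicts spam.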
vectorizer.pkl ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:e472dd91abcfa77fd9d449e841ff7f1867da101eb05fbd0b113f2c378ba5d495
3
+ size 95008
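The three added lines of `vectorizer.pkl` are a Git LFS pointer rather than the pickle itself: the pointer records the spec version, the SHA-256 digest (`oid`), and the byte size of the real file. Below is a sketch of how a downloaded copy could be checked against this pointer, assuming the real file has already been fetched to a local path. `parse_lfs_pointer` and `matches_pointer` are hypothetical helper names, not part of any Git LFS tooling.

```python
import hashlib
from pathlib import Path

# Pointer contents copied from the diff above.
POINTER_TEXT = """\
version https://git-lfs.github.com/spec/v1
oid sha256:e472dd91abcfa77fd9d449e841ff7f1867da101eb05fbd0b113f2c378ba5d495
size 95008
"""


def parse_lfs_pointer(text: str) -> dict:
    """Split each 'key value' line of a Git LFS pointer into a dict."""
    fields = dict(line.split(" ", 1) for line in text.strip().splitlines())
    algorithm, digest = fields["oid"].split(":", 1)
    return {
        "version": fields["version"],
        "algorithm": algorithm,
        "digest": digest,
        "size": int(fields["size"]),
    }


def matches_pointer(path: Path, pointer: dict) -> bool:
    """Return True if the file's size and digest equal the pointer's."""
    data = path.read_bytes()
    digest = hashlib.new(pointer["algorithm"], data).hexdigest()
    return len(data) == pointer["size"] and digest == pointer["digest"]


pointer = parse_lfs_pointer(POINTER_TEXT)
print(pointer["algorithm"], pointer["size"])
```

A call such as `matches_pointer(Path("vectorizer.pkl"), pointer)` would then confirm that the local file is the one this commit refers to. Unpickling is a separate step: `pickle.load` can execute code embedded in the file, so it is only appropriate for files from a trusted source. The filename suggests a fitted text vectorizer, but the notebook in this commit does not yet create one, so the pickle's contents are not confirmed here.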