Upload 110 files
This view is limited to 50 files because it contains too many changes.
- .gitattributes +33 -0
- Data Analitics/FAQ Data Analytics.txt +215 -0
- Data Analitics/Week 0/L0-Module-Admin.pdf +3 -0
- Data Analitics/Week 1/Additional Reading and Links Week 1.txt +25 -0
- Data Analitics/Week 1/L1-Introduction.pdf +3 -0
- Data Analitics/Week 1/Lab1-Introduction.pdf +3 -0
- Data Analitics/Week 1/TU257_Lab1-Introduction-Solution.ipynb +337 -0
- Data Analitics/Week 1/TU257_Lab1-Introduction.ipynb +235 -0
- Data Analitics/Week 10/L10-Clustering-Data.pdf +3 -0
- Data Analitics/Week 10/Lab10-Clustering-Data.pdf +3 -0
- Data Analitics/Week 10/TU257-Lab10-1-Clustering-Demo.ipynb +0 -0
- Data Analitics/Week 10/TU257-Lab10-2-Clustering-DBScan-Demo.ipynb +0 -0
- Data Analitics/Week 10/Week 10 Additional Reading and links.txt +23 -0
- Data Analitics/Week 11/L11-Text-Mining.pdf +3 -0
- Data Analitics/Week 11/Lab11-Text-Mining (1).pdf +3 -0
- Data Analitics/Week 11/Lab11-Text-Mining.pdf +3 -0
- Data Analitics/Week 11/TU257-Lab11-1-Demo.ipynb +0 -0
- Data Analitics/Week 11/TU257-Lab11-2-Text-ML-Predictions.ipynb +0 -0
- Data Analitics/Week 11/Week 11 Additional reading and links.txt +22 -0
- Data Analitics/Week 11/review_polarity.tar.gz +3 -0
- Data Analitics/Week 2/Week 2 Material Complementar.txt +29 -0
- Data Analitics/Week 3/### Week 3 Additional Reading.txt +8 -0
- Data Analitics/Week 3/.ipynb_checkpoints/Lab2-1-checkpoint.ipynb +149 -0
- Data Analitics/Week 3/.ipynb_checkpoints/TU257-Lab2-1-Automated-Data-Profiling-checkpoint.ipynb +0 -0
- Data Analitics/Week 3/.ipynb_checkpoints/TU257-Lab2-2-Data-Exploration-checkpoint.ipynb +1024 -0
- Data Analitics/Week 3/.ipynb_checkpoints/Video_Games_Sales_as_at_22_Dec_2016-checkpoint.csv +0 -0
- Data Analitics/Week 3/.ipynb_checkpoints/train-checkpoint.csv +892 -0
- Data Analitics/Week 3/1230-Article Text-1227-1-10-20080129.pdf +3 -0
- Data Analitics/Week 3/L2-Data-Science-Life-Cycle.pdf +3 -0
- Data Analitics/Week 3/Lab1-Introduction.pdf +3 -0
- Data Analitics/Week 3/Lab2-1.ipynb +0 -0
- Data Analitics/Week 3/Lab2-Data-Understanding.pdf +3 -0
- Data Analitics/Week 3/TU257-Lab2-2-Data-Exploration.ipynb +1083 -0
- Data Analitics/Week 3/TU257_Lab1-Introduction.ipynb +235 -0
- Data Analitics/Week 3/Video_Games_Sales_as_at_22_Dec_2016.csv +0 -0
- Data Analitics/Week 3/train.csv +892 -0
- Data Analitics/Week 4/.ipynb_checkpoints/TU257-Lab3-1-DataExploration-checkpoint.ipynb +0 -0
- Data Analitics/Week 4/.ipynb_checkpoints/TU257-Lab3-2-Data-Transformations-checkpoint.ipynb +0 -0
- Data Analitics/Week 4/.ipynb_checkpoints/TU257-Lab3-3-Scaling-Data-checkpoint.ipynb +1996 -0
- Data Analitics/Week 4/.ipynb_checkpoints/TU257-Lab3-4-Correlation-checkpoint.ipynb +0 -0
- Data Analitics/Week 4/.ipynb_checkpoints/TU257-Lab3-5-Sampling-and-Unbalanced-checkpoint.ipynb +538 -0
- Data Analitics/Week 4/L3-Data-Exploration-Preparation.pptx +3 -0
- Data Analitics/Week 4/Lab3-Data-Preparation.pptx +0 -0
- Data Analitics/Week 4/TU257-Lab3-1-DataExploration.ipynb +0 -0
- Data Analitics/Week 4/TU257-Lab3-2-Data-Transformations.ipynb +0 -0
- Data Analitics/Week 4/TU257-Lab3-3-Scaling-Data.ipynb +1996 -0
- Data Analitics/Week 4/TU257-Lab3-4-Correlation.ipynb +0 -0
- Data Analitics/Week 4/TU257-Lab3-5-Sampling-and-Unbalanced.ipynb +538 -0
- Data Analitics/Week 4/Video_Games_Sales_as_at_22_Dec_2016.csv +0 -0
- Data Analitics/Week 4/Week 4 Complementary Material.txt +23 -0
.gitattributes
CHANGED
@@ -33,3 +33,36 @@ saved_model/**/* filter=lfs diff=lfs merge=lfs -text
33 |
*.zip filter=lfs diff=lfs merge=lfs -text
|
34 |
*.zst filter=lfs diff=lfs merge=lfs -text
|
35 |
*tfevents* filter=lfs diff=lfs merge=lfs -text
|
36 |
+
Data[[:space:]]Analitics/assessment/.ipynb_checkpoints/Assignment-A-checkpoint.pdf filter=lfs diff=lfs merge=lfs -text
|
37 |
+
Data[[:space:]]Analitics/assessment/Assignment-A.pdf filter=lfs diff=lfs merge=lfs -text
|
38 |
+
Data[[:space:]]Analitics/assessment/Bank-Research-Paper[[:space:]](1).pdf filter=lfs diff=lfs merge=lfs -text
|
39 |
+
Data[[:space:]]Analitics/Week[[:space:]]0/L0-Module-Admin.pdf filter=lfs diff=lfs merge=lfs -text
|
40 |
+
Data[[:space:]]Analitics/Week[[:space:]]1/L1-Introduction.pdf filter=lfs diff=lfs merge=lfs -text
|
41 |
+
Data[[:space:]]Analitics/Week[[:space:]]1/Lab1-Introduction.pdf filter=lfs diff=lfs merge=lfs -text
|
42 |
+
Data[[:space:]]Analitics/Week[[:space:]]10/L10-Clustering-Data.pdf filter=lfs diff=lfs merge=lfs -text
|
43 |
+
Data[[:space:]]Analitics/Week[[:space:]]10/Lab10-Clustering-Data.pdf filter=lfs diff=lfs merge=lfs -text
|
44 |
+
Data[[:space:]]Analitics/Week[[:space:]]11/L11-Text-Mining.pdf filter=lfs diff=lfs merge=lfs -text
|
45 |
+
Data[[:space:]]Analitics/Week[[:space:]]11/Lab11-Text-Mining[[:space:]](1).pdf filter=lfs diff=lfs merge=lfs -text
|
46 |
+
Data[[:space:]]Analitics/Week[[:space:]]11/Lab11-Text-Mining.pdf filter=lfs diff=lfs merge=lfs -text
|
47 |
+
Data[[:space:]]Analitics/Week[[:space:]]3/1230-Article[[:space:]]Text-1227-1-10-20080129.pdf filter=lfs diff=lfs merge=lfs -text
|
48 |
+
Data[[:space:]]Analitics/Week[[:space:]]3/L2-Data-Science-Life-Cycle.pdf filter=lfs diff=lfs merge=lfs -text
|
49 |
+
Data[[:space:]]Analitics/Week[[:space:]]3/Lab1-Introduction.pdf filter=lfs diff=lfs merge=lfs -text
|
50 |
+
Data[[:space:]]Analitics/Week[[:space:]]3/Lab2-Data-Understanding.pdf filter=lfs diff=lfs merge=lfs -text
|
51 |
+
Data[[:space:]]Analitics/Week[[:space:]]4/L3-Data-Exploration-Preparation.pptx filter=lfs diff=lfs merge=lfs -text
|
52 |
+
Data[[:space:]]Analitics/Week[[:space:]]5/1-Assessment-Advice.pdf filter=lfs diff=lfs merge=lfs -text
|
53 |
+
Data[[:space:]]Analitics/Week[[:space:]]5/Assessment-A-Guidance.pdf filter=lfs diff=lfs merge=lfs -text
|
54 |
+
Data[[:space:]]Analitics/Week[[:space:]]5/decistion_tree.png filter=lfs diff=lfs merge=lfs -text
|
55 |
+
Data[[:space:]]Analitics/Week[[:space:]]5/L5-Classification-Part-1.pdf filter=lfs diff=lfs merge=lfs -text
|
56 |
+
Data[[:space:]]Analitics/Week[[:space:]]5/Lab5-Classification-Part-1.pdf filter=lfs diff=lfs merge=lfs -text
|
57 |
+
Data[[:space:]]Analitics/Week[[:space:]]6/1-Assessment-Advice.pdf filter=lfs diff=lfs merge=lfs -text
|
58 |
+
Data[[:space:]]Analitics/Week[[:space:]]6/Assessment-A-Guidance.pdf filter=lfs diff=lfs merge=lfs -text
|
59 |
+
Data[[:space:]]Analitics/Week[[:space:]]6/L6-Classification-Part-2.pdf filter=lfs diff=lfs merge=lfs -text
|
60 |
+
Data[[:space:]]Analitics/Week[[:space:]]6/Lab6-Classification-Part-2.pdf filter=lfs diff=lfs merge=lfs -text
|
61 |
+
Data[[:space:]]Analitics/Week[[:space:]]7/.ipynb_checkpoints/Assessment-A-Guidance-checkpoint.pdf filter=lfs diff=lfs merge=lfs -text
|
62 |
+
Data[[:space:]]Analitics/Week[[:space:]]7/Assessment-A-Guidance.pdf filter=lfs diff=lfs merge=lfs -text
|
63 |
+
Data[[:space:]]Analitics/Week[[:space:]]7/L7-Tuning-AutoML.pdf filter=lfs diff=lfs merge=lfs -text
|
64 |
+
Data[[:space:]]Analitics/Week[[:space:]]7/Lab7-Tuning-AutoML.pdf filter=lfs diff=lfs merge=lfs -text
|
65 |
+
Data[[:space:]]Analitics/Week[[:space:]]9/L9-Association-Rules.pdf filter=lfs diff=lfs merge=lfs -text
|
66 |
+
Data[[:space:]]Analitics/Week[[:space:]]9/Lab9-Association-Rules[[:space:]](1).pdf filter=lfs diff=lfs merge=lfs -text
|
67 |
+
Data[[:space:]]Analitics/Week[[:space:]]9/Lab9-Association-Rules.pdf filter=lfs diff=lfs merge=lfs -text
|
68 |
+
Data[[:space:]]Analitics/Week[[:space:]]9/Online-Retail.csv filter=lfs diff=lfs merge=lfs -text
|
Data Analitics/FAQ Data Analytics.txt
ADDED
@@ -0,0 +1,215 @@
1 |
+
FAQ for TU257 Data Analytics
|
2 |
+
This page will contain questions I've received about the module, the topics, labs, and assignments.
|
3 |
+
|
4 |
+
It can be challenging to ask questions, in class or after class etc. This FAQ attempts to address these many challenges and allows for the sharing of knowledge.
|
5 |
+
|
6 |
+
Students can contact me with their questions. I will attempt to respond promptly with answers. To assist with knowledge sharing with the whole class, I will post the questions I receive on this page, along with the answers. [The name of the student asking the question will not be listed.]
|
7 |
+
|
8 |
+
IMPORTANT: When I say I will attempt to respond promptly, it means I will endeavor to respond within a day or two of getting the question. If I don't respond as quickly as you would like, just remember I have other classes, tasks and roles to perform each day. If I haven't responded within three days, then yes, please get onto me and gently remind me :-)
|
9 |
+
|
10 |
+
IMPORTANT: There will be a delay to any questions asked during weekends, holidays and vacation periods. Questions will be answered after such periods.
|
11 |
+
|
12 |
+
If you see a week with No questions, that means no one asked me a question.
|
13 |
+
|
14 |
+
IMPORTANT: It is important that students check this page regularly for new Q&A. There will be no notices posted to the class group when new Q&A are added.
|
15 |
+
|
16 |
+
Week 0 - Course Admin
|
17 |
+
|
18 |
+
Q: Is there an exam?
|
19 |
+
|
20 |
+
A: Nope, no exam. There are two assessments. The combination of marks from these makes up your final mark for the module. See the Module Introduction & Admin for the breakdown of these marks.
|
21 |
+
|
22 |
+
Q: How quickly will we get feedback on the assessment?
|
23 |
+
|
24 |
+
A: Typically within 2-3 working weeks. Depending on dates/timing, etc., it might be a little longer. Feedback consists of a short paragraph on your assessment highlighting good things and things that needed some additional work.
|
25 |
+
|
26 |
+
Week 1 - Introduction
|
27 |
+
|
28 |
+
Q: Will we be coding or doing lab work during Week 1?
|
29 |
+
|
30 |
+
A: There will be a bit of an overlap with your other module, with similar tasks to complete: getting your environment set up and installing software. These tasks aren't very complex, but some people might have minor issues. It's important to get these resolved ASAP.
|
31 |
+
|
32 |
+
Week 2 - Bank Holiday
|
33 |
+
|
34 |
+
Q: Do we have a class or work to do this week?
|
35 |
+
|
36 |
+
A: There is NO class and technically NO work you need to complete. If you'd like to do some learning, I have a webpage with links to some tutorials on using Pandas to process data. Check out the link to the webpage. NB: You don't have to do this.
|
37 |
+
|
38 |
+
Week 3
|
39 |
+
|
40 |
+
Q: I cannot get the library to install in Anaconda. Is there another way to do this?
|
41 |
+
|
42 |
+
A: Yes. Run the following command on the command line. It will do the install, and the library will then appear in your Anaconda environment:
|
43 |
+
|
44 |
+
conda install -c conda-forge ydata-profiling
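Once the command has finished, one quick way to confirm that the notebook kernel can see the library is to ask Python whether it can locate the module. This is only a minimal sketch: the helper name `is_installed` is made up for illustration, and it assumes the import name of the package is `ydata_profiling` (with an underscore).

```python
import importlib.util


def is_installed(module_name: str) -> bool:
    """Return True if Python can locate the named top-level module."""
    return importlib.util.find_spec(module_name) is not None


# "ydata_profiling" is the assumed import name for the ydata-profiling package.
for name in ("ydata_profiling", "pandas"):
    status = "installed" if is_installed(name) else "NOT found"
    print(f"{name}: {status}")
```

If a library is reported as not found, restarting the Jupyter kernel and running the cell again is a reasonable first step.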
|
45 |
+
|
46 |
+
Q: I get an error when I change the directory to the location of the data file
|
47 |
+
|
48 |
+
A: This means you have typed the full directory path to the file incorrectly. Double check the full path and make sure the path in the notebook matches. If you get an error message it means there is a typo in the path. Keep checking until it works.
|
49 |
+
|
50 |
+
Q: Should I use / or \ in the directory path?
|
51 |
+
|
52 |
+
A: It shouldn't matter if you use / in the path; this normally works on every operating system. But if you get an error message (and you are using a Windows machine), just change it to \. If you are using a Mac you can use /.
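One thing to watch for when switching to \ on Windows: inside an ordinary Python string the backslash starts an escape sequence (for example \n is a newline), so the path can silently change. A raw string (prefix r) or the standard `pathlib` module avoids this. The sketch below uses made-up folder and file names purely for illustration.

```python
from pathlib import Path, PureWindowsPath

# In a normal string, "\n" is read as a single newline character ...
plain = "C:\new"
# ... while a raw string keeps the backslash and the letter n.
raw = r"C:\new"
print(len(plain), len(raw))

# pathlib builds a path with the right separator for the current machine.
# "data" and "train.csv" are hypothetical names.
data_file = Path("data") / "train.csv"
print(data_file)

# On Windows, forward slashes and backslashes describe the same location.
win_forward = PureWindowsPath("C:/Users/student/data/train.csv")
win_backward = PureWindowsPath(r"C:\Users\student\data\train.csv")
print(win_forward == win_backward)
```

The resulting `Path` object can be passed directly to functions such as `pd.read_csv`.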
|
53 |
+
|
54 |
+
Week 4
|
55 |
+
|
56 |
+
Q: My chart/graph doesn't display. All I get is an empty cell. What is wrong and how do I fix it?
|
57 |
+
|
58 |
+
A: There is nothing wrong. Go to the cell containing the code to display the chart/graph and run the cell for a second time. It should now display it.
|
59 |
+
|
60 |
+
Week 5
|
61 |
+
|
62 |
+
Week 6
|
63 |
+
|
64 |
+
Assignment A
|
65 |
+
|
66 |
+
Q: I'm trying to load the data set into a pandas dataframe, but it isn't loading the header correctly. Can you suggest what I need to do to make this work correctly for me?
|
67 |
+
A: The read_csv function assumes the file is in CSV format. By default, the column separator for CSV is a comma, but any character can be used as a separator. See the documentation for this function https://pandas.pydata.org/docs/reference/api/pandas.read_csv.html
|
68 |
+
|
69 |
+
For example, if the data set has a semicolon separator, you can use the following.
|
70 |
+
|
71 |
+
# import dataset
|
72 |
+
|
73 |
+
df = pd.read_csv('.........', sep=';')
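To see the effect of the separator without needing a file on disk, here is a small self-contained sketch. The three column names and the values are invented for illustration and are not taken from the assignment data set.

```python
import io

import pandas as pd

# A tiny semicolon-separated "file" held in memory (made-up data).
raw_text = "age;job;balance\n30;admin;1200\n45;technician;-50\n"

# Default separator (comma): the whole header line becomes ONE column name.
wrong = pd.read_csv(io.StringIO(raw_text))

# Telling pandas the separator is ';' gives three proper columns.
right = pd.read_csv(io.StringIO(raw_text), sep=";")

print(wrong.columns.tolist())
print(right.columns.tolist())
```

If the header looks like one long string containing the separator character, that is usually the sign that `sep` needs to be set.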
|
74 |
+
|
75 |
+
Q: Is there a sample report/solution you can give us, for example, a sample from last year?
|
76 |
+
|
77 |
+
A: The problem with providing sample solutions for assignments is, I'll just get 20+ reports which are exactly the same as the sample. Effectively it will be a copy and paste from the sample. There is no learning experience with doing this and for this reason NO sample solutions will be provided. I have given you a template notebook you can use. This contains a structured layout of sections you will typically have to work through. These sections also correspond to the marking scheme. Each student/team is responsible for completing these sections with code and clear documentation covering the work completed and discussions of your work and findings. This makes every report/notebook/assignment submission individual to each student/team.
|
78 |
+
|
79 |
+
Q: Is there a marking rubric?
|
80 |
+
|
81 |
+
A: The marking scheme is provided on the assignment handout. Although it isn't in a rubric format, it does give you the available marks for each section. Marks will be awarded on a range from zero to the maximum mark for a section, and this range of marks will be based on the depth of detail, analysis, and discussions completed for each section. For example, if you write a few lines of code and only give a short general comment for these, then you will get the minimum of marks. Similarly, if you just reuse the code from the Demo Notebooks I've supplied and provide no additional comments/discussions/analysis, then you will be looking at potential marks in the lower ranges, possibly getting zero marks. On the other hand, if you write lots of code, the code is well documented, and you provide in-depth analysis of the results and perform additional analysis, then you might be looking at a mark towards the upper range of marks.
|
82 |
+
|
83 |
+
Q: Do we need to write a separate report, in addition to the notebook?
|
84 |
+
|
85 |
+
A: There is no need for a separate report document. Everything should be in the notebook and it is this notebook you should submit.
|
86 |
+
|
87 |
+
Q: Does each person need to submit an assignment on BrightSpace?
|
88 |
+
|
89 |
+
A: Only one submission from a team should be submitted on BrightSpace. The name, student number, etc must be clearly given at the top of each notebook. Feedback will consist of a short paragraph and a mark. This will be entered into BrightSpace, and it is the responsibility of the person receiving this, to share it with other students in the team. The short feedback paragraph will focus on areas in your submission that needed more work (i.e. areas where you lost marks). You can learn from this, and try to address these for Assignment-2.
|
90 |
+
|
91 |
+
Q: I'm a little confused if I need to use Up Sampling or Under Sampling. Can you help guide us on this?
|
92 |
+
|
93 |
+
A: Up/Down Sampling allows you to create a modified data set. Most of the classification algorithms like to see a similar number of cases/records for each value in the Target variable. In some instances, they like to see a 50:50 split (for binary classification). In most data sets and work-related problems for classification, one of the values in the Target variable is a small percentage of the overall data set. To overcome this small percentage, Up Sampling can be used to increase the number of cases/records for this particular value of the Target variable. This will typically be used when dealing with small data sets, like what we have for the Assignment. When you have LARGE data sets (Big Data), consisting of several millions or tens of millions of cases/records, you might want to use Down Sampling to bring the data set down to a manageable size to work with and for the algorithms to work, without running out of computer memory (RAM).
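As a rough illustration of Up Sampling, the sketch below duplicates randomly chosen minority-class rows (sampling with replacement) until both classes have the same number of records. It uses plain pandas; the column names `feature` and `target` and the 8:2 split are invented for the example, and this is one possible approach rather than the one required for the assignment.

```python
import pandas as pd

# Made-up, unbalanced data set: 8 records of class 0, 2 records of class 1.
df = pd.DataFrame({
    "feature": range(10),
    "target": [0] * 8 + [1] * 2,
})

majority = df[df["target"] == 0]
minority = df[df["target"] == 1]

# Sample the minority class WITH replacement until it matches the majority.
minority_upsampled = minority.sample(
    n=len(majority), replace=True, random_state=42
)

# Combine and shuffle so the classes are not grouped together.
balanced = (
    pd.concat([majority, minority_upsampled])
    .sample(frac=1, random_state=42)
    .reset_index(drop=True)
)

print(df["target"].value_counts().to_dict())
print(balanced["target"].value_counts().to_dict())
```

A common precaution is to resample only the training portion of the data, so that duplicated records do not also appear in the test set.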
|
94 |
+
|
95 |
+
Q: Is the assignment based on what is covered in weeks 5+6?
|
96 |
+
|
97 |
+
A: The assessment work is based on everything we have covered up to and including week 7.
|
98 |
+
|
99 |
+
Q: Should we spend any time understanding the domain knowledge of the dataset?
|
100 |
+
|
101 |
+
A: You don't have to spend a lot of time on this. Everyone probably has some knowledge of the domains for the two problem data sets. Use this knowledge and think about what you, as an employee, might tell your manager about the work you did.
|
102 |
+
|
103 |
+
Q: If columns are correlated should we remove them?
|
104 |
+
|
105 |
+
A: Have a look back over the notes/demo relating to correlation analysis. It's also vital to have a look at the meta-data or the descriptive information that's provided with the data set. This will give you some hints about certain attributes/features in the data set.
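For reference, one generic way to inspect correlations in pandas is to compute the correlation matrix and list the pairs of columns above some cut-off. This is only a sketch on invented data: the column names and the 0.95 threshold are arbitrary assumptions, and what to do with a highly correlated pair still depends on the notes and the data set description.

```python
import pandas as pd

# Invented data: "spend_copy" is almost a duplicate of "spend".
df = pd.DataFrame({
    "spend": [10, 20, 30, 40, 50, 60],
    "spend_copy": [11, 21, 31, 41, 51, 61],
    "visits": [3, 1, 4, 1, 5, 9],
})

# Absolute Pearson correlation between every pair of numeric columns.
corr = df.corr().abs()

# Arbitrary cut-off chosen for this illustration only.
threshold = 0.95

# Walk the upper triangle so each pair is reported once.
pairs = [
    (a, b, round(float(corr.loc[a, b]), 3))
    for i, a in enumerate(corr.columns)
    for b in corr.columns[i + 1:]
    if corr.loc[a, b] > threshold
]
print(pairs)
```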
|
106 |
+
|
107 |
+
Q: I ran Naive Bayes and Decision Tree on the data (Supermarket case study) and got 100% accuracy on both. Is that beginner's luck? totally random?
|
108 |
+
|
109 |
+
A: Yes that's a bit of beginner's luck or perhaps bad luck. It is commonly referred to as having an 'overfitted model'. I've mentioned this several times in class. Have a look at the previous question (above) about correlated data and have a look at the notes and examples that cover this. Hopefully, this is a good hint towards what you might have to do. Also, check out the documentation for the data set and any related articles.
|
110 |
+
|
111 |
+
Q: Is it ok to pick 2-3 algorithms to build a model?
|
112 |
+
|
113 |
+
A: Remember the quote from George Box, "All models are wrong, but some are useful". Something to consider is which algorithms you can test and evaluate, and whether they are sufficient.
|
114 |
+
|
115 |
+
Q: For problem set 2, should we perform any data manipulations based on the date the data was extracted or the current date?
|
116 |
+
|
117 |
+
A: It should be based on the date the data was extracted. You need to use this to correctly calculate any additional information you think is necessary. If you used the current date (e.g. today's date) then all calculations would be incorrect. Additionally, all calculations would give a different outcome depending on whether you run those calculations today, tomorrow, or the day after, etc. FYI, the date of extraction is 23rd February, 1998.
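To illustrate why a fixed reference date matters, the sketch below computes a duration in whole years relative to the extraction date instead of today's date. The function name `years_between` and the sample date are hypothetical; only the extraction date (23rd February 1998) comes from the answer above.

```python
from datetime import date

# Extraction date stated in the answer above.
EXTRACTION_DATE = date(1998, 2, 23)


def years_between(start: date, end: date = EXTRACTION_DATE) -> int:
    """Whole years elapsed from start to end."""
    years = end.year - start.year
    # Subtract one if the anniversary has not yet been reached in the end year.
    if (end.month, end.day) < (start.month, start.day):
        years -= 1
    return years


# Hypothetical date value from a record in the data set.
some_date = date(1990, 6, 15)
print(years_between(some_date))
```

Because the reference date is a constant, the result is the same no matter when the notebook is run.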
|
118 |
+
|
119 |
+
Assessment - General Questions & Comments
|
120 |
+
|
121 |
+
Q: What feedback will we get?
|
122 |
+
|
123 |
+
A: For a group assessment, one person should submit the assignment on BrightSpace. Feedback will consist of a short paragraph and a mark. This will be entered into BrightSpace for the assignment submission. The student receiving it should share it with the other members of the team. For individual assessments, feedback will be similar with a paragraph and mark.
|
124 |
+
|
125 |
+
Q: How much detail should we include in the assignment for each topic?
|
126 |
+
|
127 |
+
A: It's important to remember when completing this assignment you explain your work, why you are doing a task, what the outcomes are, what they mean, how it feeds into the next step/cell etc. Document all of this as code comments and MarkDown. This helps the reader to see and understand what you are doing and (most importantly) why you are doing it, and (even more importantly) you can explain the outcomes and what they mean. This will be useful for potential employers who might ask to see examples of your work. The more detail you include beyond the copy & paste of the code examples I've given, the better it will be for you and the more marks I can award. Copy&Paste of what I've given will not gain many marks.
|
128 |
+
|
129 |
+
Q: Can I get regular feedback on my assessment work, during the weeks leading up to the Due Date?
|
130 |
+
|
131 |
+
A: All the materials necessary to complete the assessment have been covered. Your task is to apply these to the specified problem/use case. This is standard for assessments. If you have a specific question about the problem/use case, this will be addressed and shared with the class and on this FAQ page. Questions such as "is this code correct", "is this the correct answer", "what is missing", etc. will not be answered.
|
132 |
+
|
133 |
+
Q: Can I get a higher mark/grade?
|
134 |
+
|
135 |
+
A: The final mark/grade for the assessment has been carefully considered and will be in keeping with the marking and standards of previous students on this module and with other similar modules. All marks are classed as provisional and subject to change as per TU Dublin General Assessment Regulations. All assessments and marks/grades are reviewed by an External Examiner to ensure the standard of marking and feedback.
|
136 |
+
|
137 |
+
Check the TU Dublin General Assessment Regulations.
|
138 |
+
|
139 |
+
Some students expect to receive a very high mark for their work, typically in the 80%-100% range. This is an incorrect assumption. Marks are awarded on a 0%-100% range. Only a small percentage of students will attain a mark of >70%. Most students will achieve a mark between 50% and 69%.
|
140 |
+
|
141 |
+
|
142 |
+
Q: I'm not happy with the mark/grade I've received. What can I do?
|
143 |
+
|
144 |
+
A: Check the TU Dublin General Assessment Regulations for details on what you can do regarding this. The procedure for appealing your grade is outlined along with the fees for requesting this.
|
145 |
+
|
146 |
+
Q: I attempted a section of the assessment and I didn't get full marks, Why?
|
147 |
+
|
148 |
+
A: Marks are awarded on a sliding scale and are based on several factors such as completeness, correctness, depth of detail, explanations, etc. If you didn't get full marks then some details were probably missing.
|
149 |
+
|
150 |
+
Week 7
|
151 |
+
|
152 |
+
Q: When I first ran TPOT with the default hyperparameter settings, all 5 generations gave the same CV score (see cells 51 & 52 below). When I changed some of the hyperparameters in the TPOT algorithm, it appeared to execute without any issue but came back with the following message: "Pipeline encountered that has previously been evaluated during the optimization process. Using the score from the previous evaluation."
|
153 |
+
|
154 |
+
A: Check out this post on the TPOT GitHub repository, which contains details about this message and what can be done: https://github.com/EpistasisLab/tpot/issues/927
|
155 |
+
|
156 |
+
Q: AutoML has inbuilt steps to perform Data Select and Data Transformations. Does this mean I don't have to do any of these steps manually?
|
157 |
+
|
158 |
+
A: Data cleaning and preparation still need to be performed before using the dataset for AutoML. AutoML tools perform some additional Feature Engineering and Feature Selection. For example, with Feature Selection they will run a variety of statistical tests to select which features to include in each iteration. This typically involves selecting a random subset of the features to include in the model.
|
159 |
+
|
160 |
+
Week 8
|
161 |
+
|
162 |
+
|
163 |
+
Week 9
|
164 |
+
|
165 |
+
Q: I cannot install or find the mlxtend library in Anaconda. How can I install it?
|
166 |
+
|
167 |
+
A: If you go to the Anaconda webpage for this library you will find a command to run in a Terminal/Command Line window. You had to do something similar when installing the tpot library for AutoML.
|
168 |
+
|
169 |
+
conda install -c conda-forge mlxtend
|
170 |
+
|
171 |
+
You might need to close Anaconda and reopen it (or refresh the environment); the library should then appear in your list of installed libraries. If you had Jupyter Notebook open, you will need to restart the kernel (see menu option).
|
172 |
+
|
173 |
+
Assignment B (See section above on Assessment - General Questions & Comments)
|
174 |
+
|
175 |
+
Q: How many references would be enough for each notebook?
|
176 |
+
|
177 |
+
A: The answer to everything in IT is "it depends". It depends on the topic you are covering and how much detail is included. This can be a useful section to illustrate to employers your reading on the topic and your understanding of the topic. I'd suggest having a minimum of three references. One of these can be the reference to the data set you are using. But consider adding 4, 5, 6 references, up to a maximum of 8.
|
178 |
+
|
179 |
+
Q: For Assignment B, is it OK to use the dataset for Assignment A that we didn't work on?
|
180 |
+
|
181 |
+
A: The assignment handout, see section Important, mentions you "should not include any of the datasets and examples used throughout the module". This will include the data set not used in Assignment A.
|
182 |
+
|
183 |
+
The only exception to this is for AutoML.
|
184 |
+
|
185 |
+
Q: How many different data sets should we use for Assessment B?
|
186 |
+
|
187 |
+
A: You need to use two different data sets, given the nature of the topics.
|
188 |
+
|
189 |
+
Q: How can we share the datasets we used in the assignment with you?
|
190 |
+
|
191 |
+
A: There are a number of ways to do this. If you downloaded the data set from the internet using Python code (using pandas or otherwise) then just leave that code in your notebook submission. If you discover there are download limits or rate limits, or the download
|
192 |
+
can be temperamental, you can give a Dropbox or Google Drive link. An alternative is to include a ZIP file with your submission in Brightspace.
|
193 |
+
|
194 |
+
Week 10
|
195 |
+
|
196 |
+
Q: I've seen examples using PCA (and other approaches) for Clustering. Should we use this (for our assignment)?
|
197 |
+
|
198 |
+
A: PCA, or Principal Component Analysis, is a dimensionality reduction method. It is complex to use and to understand what it produces, and because of this PCA and other similar approaches/algorithms are outside the scope of this module. If you decide to continue your studies to higher levels you will encounter PCA etc. in other modules such as Machine Learning, where the various challenges of using it will be explored.
|
199 |
+
|
200 |
+
For Assessment-B, you don't have to use PCA, and using it may limit your ability to explain your work and discoveries in the data.
|
201 |
+
|
202 |
+
Week 11
|
203 |
+
|
204 |
+
Week 12
|
205 |
+
|
206 |
+
Q: Is there a class/topic for this week?
|
207 |
+
|
208 |
+
A: There is a topic for this week but most people will be busy finishing Assessment-B and maybe you are still working on your final assessment in your other module. It will be a busy week. I've recorded videos of the lecture and you can view these at any time, along with the additional readings. The topic is not assessed. Instead of covering the topic live during our scheduled class time, we will have a Q&A session for Assessment-B. I think most people found the same session for Assessment-A useful earlier in the semester. We'll do something similar this week.
|
209 |
+
|
210 |
+
Week 13
|
211 |
+
|
212 |
+
Q: Is there a class/topic for this week?
|
213 |
+
|
214 |
+
A: There is no class this week as people will be busy completing and submitting Assessment-B.
|
215 |
+
|
Data Analitics/Week 0/L0-Module-Admin.pdf
ADDED
@@ -0,0 +1,3 @@
1 |
+
version https://git-lfs.github.com/spec/v1
|
2 |
+
oid sha256:9e9b39ea967801255fb55ac119de3fa7fb1b7f385f03b7cab03bc99ed7eac359
|
3 |
+
size 1036622
|
Data Analitics/Week 1/Additional Reading and Links Week 1.txt
ADDED
@@ -0,0 +1,25 @@
1 |
+
### Below are the "Additional Reading" links that complement the Week 1 material:
|
2 |
+
|
3 |
+
https://oralytics.com/2019/04/18/data-sets-for-analytics/
|
4 |
+
|
5 |
+
https://practicalanalytics.wordpress.com/predictive-analytics-101/
|
6 |
+
|
7 |
+
https://www.datasciencecentral.com/?s=&q=Data+science+use+cases
|
8 |
+
|
9 |
+
https://www.gartner.com/en/newsroom
|
10 |
+
|
11 |
+
https://www.dropbox.com/scl/fi/yp3s9zs37d6x2xvh8zvov/it-has-26-words-for-data-mining.pd?rlkey=oad38rok3znuqabfv9u4qhscc&e=1
|
12 |
+
|
13 |
+
https://www.dropbox.com/scl/fi/nacqj76um27pc0ree2q0c/DM-in-Business.pdf?rlkey=py6t5uffbpy5fx82v7ofsnlxl&e=1
|
14 |
+
|
15 |
+
https://www.dropbox.com/scl/fi/u9ibgc1dvrrlav11j5jl8/fayyad96from.pdf?rlkey=kjhcseyvn3t4l4vgsz1qiw7d1&e=1
|
16 |
+
|
17 |
+
|
18 |
+
### Below are the links to the YouTube videos on the material, most of them made by Professor Brendan:
|
19 |
+
|
20 |
+
https://youtu.be/nmsrSWad254
|
21 |
+
|
22 |
+
https://youtu.be/iLL_HxzpsDA
|
23 |
+
|
24 |
+
https://youtu.be/wSgX8gCZwsg
|
25 |
+
|
Data Analitics/Week 1/L1-Introduction.pdf
ADDED
@@ -0,0 +1,3 @@
1 |
+
version https://git-lfs.github.com/spec/v1
|
2 |
+
oid sha256:7a24bf9a9286897fd21207a31dfea803abdc730f4f2f1f5de47a9bd42ff6610d
|
3 |
+
size 3711053
|
Data Analitics/Week 1/Lab1-Introduction.pdf
ADDED
@@ -0,0 +1,3 @@
1 |
+
version https://git-lfs.github.com/spec/v1
|
2 |
+
oid sha256:483029e96a11d38f5e1416d7808bbd3764d4d7f8c66e5f36bba32c94798eae85
|
3 |
+
size 2934417
|
Data Analitics/Week 1/TU257_Lab1-Introduction-Solution.ipynb
ADDED
@@ -0,0 +1,337 @@
1 |
+
{
|
2 |
+
"cells": [
|
3 |
+
{
|
4 |
+
"cell_type": "code",
|
5 |
+
"execution_count": 12,
|
6 |
+
"metadata": {},
|
7 |
+
"outputs": [
|
8 |
+
{
|
9 |
+
"name": "stdout",
|
10 |
+
"output_type": "stream",
|
11 |
+
"text": [
|
12 |
+
"3.7.3 (v3.7.3:ef4ec6ed12, Mar 25 2019, 16:52:21) \n",
|
13 |
+
"[Clang 6.0 (clang-600.0.57)]\n",
|
14 |
+
"sys.version_info(major=3, minor=7, micro=3, releaselevel='final', serial=0)\n"
|
15 |
+
]
|
16 |
+
}
|
17 |
+
],
|
18 |
+
"source": [
|
19 |
+
"import sys\n",
|
20 |
+
"import platform\n",
|
21 |
+
"\n",
|
22 |
+
"#Print Python version details\n",
|
23 |
+
"print(sys.version)\n",
|
24 |
+
"print(sys.version_info)\n"
|
25 |
+
]
|
26 |
+
},
|
27 |
+
{
|
28 |
+
"cell_type": "code",
|
29 |
+
"execution_count": 13,
|
30 |
+
"metadata": {},
|
31 |
+
"outputs": [
|
32 |
+
{
|
33 |
+
"name": "stdout",
|
34 |
+
"output_type": "stream",
|
35 |
+
"text": [
|
36 |
+
"x86_64\n",
|
37 |
+
"Darwin Kernel Version 19.6.0: Tue Jun 21 21:18:39 PDT 2022; root:xnu-6153.141.66~1/RELEASE_X86_64\n",
|
38 |
+
"Darwin\n",
|
39 |
+
"i386\n"
|
40 |
+
]
|
41 |
+
}
|
42 |
+
],
|
43 |
+
"source": [
|
44 |
+
"#Print details about your computer\n",
|
45 |
+
"\n",
|
46 |
+
"print(platform.machine())\n",
|
47 |
+
"print(platform.version())\n",
|
48 |
+
"print(platform.system())\n",
|
49 |
+
"print(platform.processor())"
|
50 |
+
]
|
51 |
+
},
|
52 |
+
{
|
53 |
+
"cell_type": "code",
|
54 |
+
"execution_count": 3,
|
55 |
+
"metadata": {},
|
56 |
+
"outputs": [],
|
57 |
+
"source": [
|
58 |
+
"#Write code to print your name"
|
59 |
+
]
|
60 |
+
},
|
61 |
+
{
|
62 |
+
"cell_type": "code",
|
63 |
+
"execution_count": 14,
|
64 |
+
"metadata": {},
|
65 |
+
"outputs": [
|
66 |
+
{
|
67 |
+
"name": "stdout",
|
68 |
+
"output_type": "stream",
|
69 |
+
"text": [
|
70 |
+
"Brendan Tierney\n"
|
71 |
+
]
|
72 |
+
}
|
73 |
+
],
|
74 |
+
"source": [
|
75 |
+
"print('Brendan Tierney')"
|
76 |
+
]
|
77 |
+
},
|
78 |
+
{
|
79 |
+
"cell_type": "code",
|
80 |
+
"execution_count": 4,
|
81 |
+
"metadata": {},
|
82 |
+
"outputs": [],
|
83 |
+
"source": [
|
84 |
+
"#Create a variable containing your name, and print it to the screen"
|
85 |
+
]
|
86 |
+
},
|
87 |
+
{
|
88 |
+
"cell_type": "code",
|
89 |
+
"execution_count": 16,
|
90 |
+
"metadata": {},
|
91 |
+
"outputs": [
|
92 |
+
{
|
93 |
+
"name": "stdout",
|
94 |
+
"output_type": "stream",
|
95 |
+
"text": [
|
96 |
+
"My name is Brendan Tierney\n"
|
97 |
+
]
|
98 |
+
}
|
99 |
+
],
|
100 |
+
"source": [
|
101 |
+
"name='Brendan Tierney'\n",
|
102 |
+
"print('My name is ', name)"
|
103 |
+
]
|
104 |
+
},
|
105 |
+
{
|
106 |
+
"cell_type": "code",
|
107 |
+
"execution_count": 3,
|
108 |
+
"metadata": {},
|
109 |
+
"outputs": [
|
110 |
+
{
|
111 |
+
"name": "stdout",
|
112 |
+
"output_type": "stream",
|
113 |
+
"text": [
|
114 |
+
"Enter your name:\n",
|
115 |
+
"brendan\n",
|
116 |
+
"Hello, brendan how are you today?\n"
|
117 |
+
]
|
118 |
+
}
|
119 |
+
],
|
120 |
+
"source": [
|
121 |
+
"#Use the input command to allow a user to enter a value\n",
|
122 |
+
"#for example, use the input command to ask the user to enter a string\n",
|
123 |
+
"#save the inputted value to a variable\n",
|
124 |
+
"#print the variable\n",
|
125 |
+
"print('Enter your name:')\n",
|
126 |
+
"x = input()\n",
|
127 |
+
"print('Hello, ' + x + ' how are you today?') "
|
128 |
+
]
|
129 |
+
},
|
130 |
+
{
|
131 |
+
"cell_type": "code",
|
132 |
+
"execution_count": null,
|
133 |
+
"metadata": {},
|
134 |
+
"outputs": [],
|
135 |
+
"source": []
|
136 |
+
},
|
137 |
+
{
|
138 |
+
"cell_type": "code",
|
139 |
+
"execution_count": 6,
|
140 |
+
"metadata": {},
|
141 |
+
"outputs": [],
|
142 |
+
"source": [
|
143 |
+
"#Use an IF condition to decide what message to print\n",
|
144 |
+
"#Use the input command to ask the user to enter a number\n",
|
145 |
+
"#Use the IF condition to determine if the number is positive or negative"
|
146 |
+
]
|
147 |
+
},
|
148 |
+
{
|
149 |
+
"cell_type": "code",
|
150 |
+
"execution_count": 15,
|
151 |
+
"metadata": {},
|
152 |
+
"outputs": [
|
153 |
+
{
|
154 |
+
"name": "stdout",
|
155 |
+
"output_type": "stream",
|
156 |
+
"text": [
|
157 |
+
"Enter a number :\n",
|
158 |
+
"1\n",
|
159 |
+
"The number 1 is positive\n"
|
160 |
+
]
|
161 |
+
}
|
162 |
+
],
|
163 |
+
"source": [
|
164 |
+
"print('Enter a number :')\n",
|
165 |
+
"x = input()\n",
|
166 |
+
"if int(x) > 0:\n",
|
167 |
+
"    print('The number ', x, ' is positive')\n",
|
168 |
+
"elif int(x) < 0:\n",
|
169 |
+
"    print('The number ', x, ' is negative')\n",
|
170 |
+
"else:\n",
|
171 |
+
"    print('The number is zero')\n",
|
172 |
+
" "
|
173 |
+
]
|
174 |
+
},
|
175 |
+
{
|
176 |
+
"cell_type": "code",
|
177 |
+
"execution_count": 7,
|
178 |
+
"metadata": {},
|
179 |
+
"outputs": [],
|
180 |
+
"source": [
|
181 |
+
"#Add an 'else' condition to the above IF condition\n",
|
182 |
+
"#You can decide what the statement should be, what message should be printed?\n"
|
183 |
+
]
|
184 |
+
},
|
185 |
+
{
|
186 |
+
"cell_type": "code",
|
187 |
+
"execution_count": null,
|
188 |
+
"metadata": {},
|
189 |
+
"outputs": [],
|
190 |
+
"source": []
|
191 |
+
},
|
192 |
+
{
|
193 |
+
"cell_type": "code",
|
194 |
+
"execution_count": 8,
|
195 |
+
"metadata": {},
|
196 |
+
"outputs": [
|
197 |
+
{
|
198 |
+
"name": "stdout",
|
199 |
+
"output_type": "stream",
|
200 |
+
"text": [
|
201 |
+
"0\n",
|
202 |
+
"1\n",
|
203 |
+
"2\n",
|
204 |
+
"3\n",
|
205 |
+
"4\n"
|
206 |
+
]
|
207 |
+
}
|
208 |
+
],
|
209 |
+
"source": [
|
210 |
+
"#the following will print a range of values\n",
|
211 |
+
"for i in range(0,5):\n",
|
212 |
+
"    print(i)"
|
213 |
+
]
|
214 |
+
},
|
215 |
+
{
|
216 |
+
"cell_type": "code",
|
217 |
+
"execution_count": 9,
|
218 |
+
"metadata": {},
|
219 |
+
"outputs": [],
|
220 |
+
"source": [
|
221 |
+
"#Take a copy of this code and paste it into the next cell\n",
|
222 |
+
"#Change the code to add two variables to contain a number, assign a number to these variables\n",
|
223 |
+
"#Change the code in the 'for' loop to print the numbers between the values in the two variables"
|
224 |
+
]
|
225 |
+
},
|
226 |
+
{
|
227 |
+
"cell_type": "code",
|
228 |
+
"execution_count": 16,
|
229 |
+
"metadata": {},
|
230 |
+
"outputs": [
|
231 |
+
{
|
232 |
+
"name": "stdout",
|
233 |
+
"output_type": "stream",
|
234 |
+
"text": [
|
235 |
+
"11\n",
|
236 |
+
"12\n",
|
237 |
+
"13\n"
|
238 |
+
]
|
239 |
+
}
|
240 |
+
],
|
241 |
+
"source": [
|
242 |
+
"x = 11\n",
|
243 |
+
"y = 14\n",
|
244 |
+
"for i in range(x, y):\n",
|
245 |
+
"    print(i)"
|
246 |
+
]
|
247 |
+
},
|
248 |
+
{
|
249 |
+
"cell_type": "code",
|
250 |
+
"execution_count": 10,
|
251 |
+
"metadata": {},
|
252 |
+
"outputs": [],
|
253 |
+
"source": [
|
254 |
+
"#Take a copy of the code in the previous cell\n",
|
255 |
+
"#Expand it to ask the user to input two values.\n",
|
256 |
+
"#Change the loop to print the values between these two values"
|
257 |
+
]
|
258 |
+
},
|
259 |
+
{
|
260 |
+
"cell_type": "code",
|
261 |
+
"execution_count": 20,
|
262 |
+
"metadata": {},
|
263 |
+
"outputs": [
|
264 |
+
{
|
265 |
+
"name": "stdout",
|
266 |
+
"output_type": "stream",
|
267 |
+
"text": [
|
268 |
+
"Enter the Start number: \n",
|
269 |
+
"31\n",
|
270 |
+
"Enter the End number: \n",
|
271 |
+
"44\n",
|
272 |
+
"----------\n",
|
273 |
+
"31\n",
|
274 |
+
"32\n",
|
275 |
+
"33\n",
|
276 |
+
"34\n",
|
277 |
+
"35\n",
|
278 |
+
"36\n",
|
279 |
+
"37\n",
|
280 |
+
"38\n",
|
281 |
+
"39\n",
|
282 |
+
"40\n",
|
283 |
+
"41\n",
|
284 |
+
"42\n",
|
285 |
+
"43\n"
|
286 |
+
]
|
287 |
+
}
|
288 |
+
],
|
289 |
+
"source": [
|
290 |
+
"print('Enter the Start number: ')\n",
|
291 |
+
"x = input()\n",
|
292 |
+
"print('Enter the End number: ')\n",
|
293 |
+
"y = input()\n",
|
294 |
+
"print('----------')\n",
|
295 |
+
"for i in range(int(x), int(y)):\n",
|
296 |
+
"    print(i)"
|
297 |
+
]
|
298 |
+
},
|
299 |
+
{
|
300 |
+
"cell_type": "code",
|
301 |
+
"execution_count": 11,
|
302 |
+
"metadata": {},
|
303 |
+
"outputs": [],
|
304 |
+
"source": [
|
305 |
+
"#what else can you do?"
|
306 |
+
]
|
307 |
+
},
|
308 |
+
{
|
309 |
+
"cell_type": "code",
|
310 |
+
"execution_count": null,
|
311 |
+
"metadata": {},
|
312 |
+
"outputs": [],
|
313 |
+
"source": []
|
314 |
+
}
|
315 |
+
],
|
316 |
+
"metadata": {
|
317 |
+
"kernelspec": {
|
318 |
+
"display_name": "Python 3",
|
319 |
+
"language": "python",
|
320 |
+
"name": "python3"
|
321 |
+
},
|
322 |
+
"language_info": {
|
323 |
+
"codemirror_mode": {
|
324 |
+
"name": "ipython",
|
325 |
+
"version": 3
|
326 |
+
},
|
327 |
+
"file_extension": ".py",
|
328 |
+
"mimetype": "text/x-python",
|
329 |
+
"name": "python",
|
330 |
+
"nbconvert_exporter": "python",
|
331 |
+
"pygments_lexer": "ipython3",
|
332 |
+
"version": "3.7.3"
|
333 |
+
}
|
334 |
+
},
|
335 |
+
"nbformat": 4,
|
336 |
+
"nbformat_minor": 4
|
337 |
+
}
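The sign-check cell in the notebook above calls `int(x)` directly on whatever the user types, so non-numeric text would raise a `ValueError`. The following is a minimal sketch of one way to guard against that, not part of the lab solution itself. The helper name `describe_sign` and the wording of the messages are assumptions chosen for illustration.

```python
def describe_sign(text):
    """Classify user-entered text as positive, negative, zero, or invalid."""
    try:
        number = int(text.strip())
    except ValueError:
        # int() raises ValueError for input such as 'abc' or '1.5'
        return f"'{text}' is not a whole number"
    if number > 0:
        return f"The number {number} is positive"
    if number < 0:
        return f"The number {number} is negative"
    # Only zero reaches this point
    return "The number is zero"


# Fixed sample strings stand in for values that input() would return
for sample in ["1", "-7", "0", "abc"]:
    print(describe_sign(sample))
```

Returning the message instead of printing inside the function keeps the logic easy to check. In a notebook cell, `print(describe_sign(input()))` would connect it to real user input.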
|
Data Analitics/Week 1/TU257_Lab1-Introduction.ipynb
ADDED
@@ -0,0 +1,235 @@
1 |
+
{
|
2 |
+
"cells": [
|
3 |
+
{
|
4 |
+
"cell_type": "code",
|
5 |
+
"execution_count": 1,
|
6 |
+
"metadata": {},
|
7 |
+
"outputs": [
|
8 |
+
{
|
9 |
+
"name": "stdout",
|
10 |
+
"output_type": "stream",
|
11 |
+
"text": [
|
12 |
+
"3.7.3 (v3.7.3:ef4ec6ed12, Mar 25 2019, 16:52:21) \n",
|
13 |
+
"[Clang 6.0 (clang-600.0.57)]\n",
|
14 |
+
"sys.version_info(major=3, minor=7, micro=3, releaselevel='final', serial=0)\n"
|
15 |
+
]
|
16 |
+
}
|
17 |
+
],
|
18 |
+
"source": [
|
19 |
+
"import sys\n",
|
20 |
+
"import platform\n",
|
21 |
+
"\n",
|
22 |
+
"#Print Python version details\n",
|
23 |
+
"print(sys.version)\n",
|
24 |
+
"print(sys.version_info)\n"
|
25 |
+
]
|
26 |
+
},
|
27 |
+
{
|
28 |
+
"cell_type": "code",
|
29 |
+
"execution_count": 2,
|
30 |
+
"metadata": {},
|
31 |
+
"outputs": [
|
32 |
+
{
|
33 |
+
"name": "stdout",
|
34 |
+
"output_type": "stream",
|
35 |
+
"text": [
|
36 |
+
"x86_64\n",
|
37 |
+
"Darwin Kernel Version 19.6.0: Tue Jun 21 21:18:39 PDT 2022; root:xnu-6153.141.66~1/RELEASE_X86_64\n",
|
38 |
+
"Darwin\n",
|
39 |
+
"i386\n"
|
40 |
+
]
|
41 |
+
}
|
42 |
+
],
|
43 |
+
"source": [
|
44 |
+
"#Print details about your computer\n",
|
45 |
+
"\n",
|
46 |
+
"print(platform.machine())\n",
|
47 |
+
"print(platform.version())\n",
|
48 |
+
"print(platform.system())\n",
|
49 |
+
"print(platform.processor())"
|
50 |
+
]
|
51 |
+
},
|
52 |
+
{
|
53 |
+
"cell_type": "code",
|
54 |
+
"execution_count": null,
|
55 |
+
"metadata": {},
|
56 |
+
"outputs": [],
|
57 |
+
"source": [
|
58 |
+
"#Write code to print your name"
|
59 |
+
]
|
60 |
+
},
|
61 |
+
{
|
62 |
+
"cell_type": "code",
|
63 |
+
"execution_count": null,
|
64 |
+
"metadata": {},
|
65 |
+
"outputs": [],
|
66 |
+
"source": []
|
67 |
+
},
|
68 |
+
{
|
69 |
+
"cell_type": "code",
|
70 |
+
"execution_count": null,
|
71 |
+
"metadata": {},
|
72 |
+
"outputs": [],
|
73 |
+
"source": [
|
74 |
+
"#Create a variable containing your name, and print it to the screen"
|
75 |
+
]
|
76 |
+
},
|
77 |
+
{
|
78 |
+
"cell_type": "code",
|
79 |
+
"execution_count": null,
|
80 |
+
"metadata": {},
|
81 |
+
"outputs": [],
|
82 |
+
"source": []
|
83 |
+
},
|
84 |
+
{
|
85 |
+
"cell_type": "code",
|
86 |
+
"execution_count": null,
|
87 |
+
"metadata": {},
|
88 |
+
"outputs": [],
|
89 |
+
"source": [
|
90 |
+
"#Use the input command to allow a user to enter a value\n",
|
91 |
+
"#for example, use the input command to ask the user to enter a string\n",
|
92 |
+
"#save the inputted value to a variable\n",
|
93 |
+
"#print the variable"
|
94 |
+
]
|
95 |
+
},
|
96 |
+
{
|
97 |
+
"cell_type": "code",
|
98 |
+
"execution_count": null,
|
99 |
+
"metadata": {},
|
100 |
+
"outputs": [],
|
101 |
+
"source": []
|
102 |
+
},
|
103 |
+
{
|
104 |
+
"cell_type": "code",
|
105 |
+
"execution_count": null,
|
106 |
+
"metadata": {},
|
107 |
+
"outputs": [],
|
108 |
+
"source": [
|
109 |
+
"#Use an IF condition to decide what message to print\n",
|
110 |
+
"#Use the input command to ask the user to enter a number\n",
|
111 |
+
"#Use the IF condition to determine if the number is positive or negative"
|
112 |
+
]
|
113 |
+
},
|
114 |
+
{
|
115 |
+
"cell_type": "code",
|
116 |
+
"execution_count": null,
|
117 |
+
"metadata": {},
|
118 |
+
"outputs": [],
|
119 |
+
"source": []
|
120 |
+
},
|
121 |
+
{
|
122 |
+
"cell_type": "code",
|
123 |
+
"execution_count": null,
|
124 |
+
"metadata": {},
|
125 |
+
"outputs": [],
|
126 |
+
"source": [
|
127 |
+
"#Add an 'else' condition to the above IF condition\n",
|
128 |
+
"#You can decide what the statement should be, what message should be printed?\n"
|
129 |
+
]
|
130 |
+
},
|
131 |
+
{
|
132 |
+
"cell_type": "code",
|
133 |
+
"execution_count": null,
|
134 |
+
"metadata": {},
|
135 |
+
"outputs": [],
|
136 |
+
"source": []
|
137 |
+
},
|
138 |
+
{
|
139 |
+
"cell_type": "code",
|
140 |
+
"execution_count": 2,
|
141 |
+
"metadata": {},
|
142 |
+
"outputs": [
|
143 |
+
{
|
144 |
+
"name": "stdout",
|
145 |
+
"output_type": "stream",
|
146 |
+
"text": [
|
147 |
+
"0\n",
|
148 |
+
"1\n",
|
149 |
+
"2\n",
|
150 |
+
"3\n",
|
151 |
+
"4\n"
|
152 |
+
]
|
153 |
+
}
|
154 |
+
],
|
155 |
+
"source": [
|
156 |
+
"#the following will print a range of values\n",
|
157 |
+
"for i in range(0,5):\n",
|
158 |
+
"    print(i)"
|
159 |
+
]
|
160 |
+
},
|
161 |
+
{
|
162 |
+
"cell_type": "code",
|
163 |
+
"execution_count": null,
|
164 |
+
"metadata": {},
|
165 |
+
"outputs": [],
|
166 |
+
"source": [
|
167 |
+
"#Take a copy of this code and paste it into the next cell\n",
|
168 |
+
"#Change the code to add two variables to contain a number, assign a number to these variables\n",
|
169 |
+
"#Change the code in the 'for' loop to print the numbers between the values in the two variables"
|
170 |
+
]
|
171 |
+
},
|
172 |
+
{
|
173 |
+
"cell_type": "code",
|
174 |
+
"execution_count": null,
|
175 |
+
"metadata": {},
|
176 |
+
"outputs": [],
|
177 |
+
"source": []
|
178 |
+
},
|
179 |
+
{
|
180 |
+
"cell_type": "code",
|
181 |
+
"execution_count": null,
|
182 |
+
"metadata": {},
|
183 |
+
"outputs": [],
|
184 |
+
"source": [
|
185 |
+
"#Take a copy of the code in the previous cell\n",
|
186 |
+
"#Expand it to ask the user to input two values.\n",
|
187 |
+
"#Change the loop to print the values between these two values"
|
188 |
+
]
|
189 |
+
},
|
190 |
+
{
|
191 |
+
"cell_type": "code",
|
192 |
+
"execution_count": null,
|
193 |
+
"metadata": {},
|
194 |
+
"outputs": [],
|
195 |
+
"source": []
|
196 |
+
},
|
197 |
+
{
|
198 |
+
"cell_type": "code",
|
199 |
+
"execution_count": null,
|
200 |
+
"metadata": {},
|
201 |
+
"outputs": [],
|
202 |
+
"source": [
|
203 |
+
"#what else can you do?"
|
204 |
+
]
|
205 |
+
},
|
206 |
+
{
|
207 |
+
"cell_type": "code",
|
208 |
+
"execution_count": null,
|
209 |
+
"metadata": {},
|
210 |
+
"outputs": [],
|
211 |
+
"source": []
|
212 |
+
}
|
213 |
+
],
|
214 |
+
"metadata": {
|
215 |
+
"kernelspec": {
|
216 |
+
"display_name": "Python 3",
|
217 |
+
"language": "python",
|
218 |
+
"name": "python3"
|
219 |
+
},
|
220 |
+
"language_info": {
|
221 |
+
"codemirror_mode": {
|
222 |
+
"name": "ipython",
|
223 |
+
"version": 3
|
224 |
+
},
|
225 |
+
"file_extension": ".py",
|
226 |
+
"mimetype": "text/x-python",
|
227 |
+
"name": "python",
|
228 |
+
"nbconvert_exporter": "python",
|
229 |
+
"pygments_lexer": "ipython3",
|
230 |
+
"version": "3.7.3"
|
231 |
+
}
|
232 |
+
},
|
233 |
+
"nbformat": 4,
|
234 |
+
"nbformat_minor": 4
|
235 |
+
}
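The loop exercises in this notebook ask for the numbers "between" two values. One detail worth keeping in mind is that Python's `range(start, stop)` excludes `stop`, so `range(11, 14)` yields 11, 12 and 13. The sketch below is one possible way to handle the two-input exercise. The function name `numbers_between`, the `inclusive` flag, and the choice to swap reversed bounds are all assumptions for illustration, not the expected lab answer.

```python
def numbers_between(start_text, end_text, inclusive=False):
    """Return the integers from start to end, given both bounds as text."""
    # input() always returns strings, so convert before using range()
    start = int(start_text)
    end = int(end_text)
    if start > end:
        # Assumption: tolerate reversed bounds by swapping them
        start, end = end, start
    # range() stops before its second argument; add 1 to include the end value
    stop = end + 1 if inclusive else end
    return list(range(start, stop))


# Strings are used here in place of values typed by a user
for value in numbers_between("11", "14"):
    print(value)
```

Calling `numbers_between("11", "14", inclusive=True)` would also include 14, which may be closer to what a user expects from "between these two values".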
|
Data Analitics/Week 10/L10-Clustering-Data.pdf
ADDED
@@ -0,0 +1,3 @@
1 |
+
version https://git-lfs.github.com/spec/v1
|
2 |
+
oid sha256:1bcef526c6bc5e2ee4395eeaf3313956ed549cd8ea91b030c37caf97b668a1b3
|
3 |
+
size 4691849
|
Data Analitics/Week 10/Lab10-Clustering-Data.pdf
ADDED
@@ -0,0 +1,3 @@
1 |
+
version https://git-lfs.github.com/spec/v1
|
2 |
+
oid sha256:caa26cb042538e1f17832c5681a029770fad12e102ecfefc0288496d7608496b
|
3 |
+
size 100254
|
Data Analitics/Week 10/TU257-Lab10-1-Clustering-Demo.ipynb
ADDED
The diff for this file is too large to render.
See raw diff
|
|
Data Analitics/Week 10/TU257-Lab10-2-Clustering-DBScan-Demo.ipynb
ADDED
The diff for this file is too large to render.
See raw diff
|
|
Data Analitics/Week 10/Week 10 Additional Reading and links.txt
ADDED
@@ -0,0 +1,23 @@
1 |
+
### Week 10 Additional Reading and Links
|
2 |
+
|
3 |
+
### YouTube links for the lab classes:
|
4 |
+
|
5 |
+
https://youtu.be/f0wYy9SsuVA
|
6 |
+
|
7 |
+
https://youtu.be/RWLmcq1ykVw
|
8 |
+
|
9 |
+
https://youtu.be/N4beiC6sNmk
|
10 |
+
|
11 |
+
### Additional Reading:
|
12 |
+
|
13 |
+
https://oralytics.com/2019/04/18/data-sets-for-analytics/
|
14 |
+
|
15 |
+
https://oralytics.com/2021/10/18/dbscan-clustering-in-python/
|
16 |
+
|
17 |
+
https://www.kdnuggets.com/2023/05/clustering-scikitlearn-tutorial-unsupervised-learning.html
|
18 |
+
|
19 |
+
https://www.kdnuggets.com/2023/05/principal-component-analysis-pca-scikitlearn.html
|
20 |
+
|
21 |
+
https://drive.google.com/file/d/13Wyub3clCDMzt5Z8i1zhM7xcTSsE_v2B/view
|
22 |
+
|
23 |
+
https://drive.google.com/file/d/1kRDtiM3e_sl_FqJH2tLlg4XvDg1FVJkT/view
|
Data Analitics/Week 11/L11-Text-Mining.pdf
ADDED
@@ -0,0 +1,3 @@
1 |
+
version https://git-lfs.github.com/spec/v1
|
2 |
+
oid sha256:0e43275de38fbb61d964280a3deb7ae49f6f63ee7616aeeddcf58337224cfa73
|
3 |
+
size 1196030
|
Data Analitics/Week 11/Lab11-Text-Mining (1).pdf
ADDED
@@ -0,0 +1,3 @@
1 |
+
version https://git-lfs.github.com/spec/v1
|
2 |
+
oid sha256:2b05ea9b2454e0190cee970d577a57e0dfa85626218fd633b75c264a0282a32f
|
3 |
+
size 290828
|
Data Analitics/Week 11/Lab11-Text-Mining.pdf
ADDED
@@ -0,0 +1,3 @@
1 |
+
version https://git-lfs.github.com/spec/v1
|
2 |
+
oid sha256:2b05ea9b2454e0190cee970d577a57e0dfa85626218fd633b75c264a0282a32f
|
3 |
+
size 290828
|
Data Analitics/Week 11/TU257-Lab11-1-Demo.ipynb
ADDED
The diff for this file is too large to render.
See raw diff
|
|
Data Analitics/Week 11/TU257-Lab11-2-Text-ML-Predictions.ipynb
ADDED
The diff for this file is too large to render.
See raw diff
|
|
Data Analitics/Week 11/Week 11 Additional reading and links.txt
ADDED
@@ -0,0 +1,22 @@
1 |
+
### Additional reading and links
|
2 |
+
|
3 |
+
### Additional Links
|
4 |
+
|
5 |
+
https://oralytics.com/2021/11/01/combining-nlp-and-machine-learning-for-document-classification/
|
6 |
+
|
7 |
+
https://oralytics.com/2020/01/21/ge2020-analysing-party-manifestos-using-python/
|
8 |
+
|
9 |
+
https://oralytics.com/2020/01/24/ge2020-comparing-party-manifestos-to-2016/
|
10 |
+
|
11 |
+
https://oralytics.com/2019/04/18/data-sets-for-analytics/
|
12 |
+
|
13 |
+
https://www.kdnuggets.com/2015/01/text-analysis-101-document-classification.html
|
14 |
+
|
15 |
+
|
16 |
+
### YouTube links:
|
17 |
+
|
18 |
+
https://youtu.be/4vHwcbFKM2Q
|
19 |
+
|
20 |
+
https://youtu.be/xfphUjk9jz0
|
21 |
+
|
22 |
+
https://youtu.be/k-QMStX-ZWw
|
Data Analitics/Week 11/review_polarity.tar.gz
ADDED
@@ -0,0 +1,3 @@
1 |
+
version https://git-lfs.github.com/spec/v1
|
2 |
+
oid sha256:fc0dccc2671af5db3c5d8f81f77a1ebfec953ecdd422334062df61ede36b2179
|
3 |
+
size 3127238
|
Data Analitics/Week 2/Week 2 Material Complementar.txt
ADDED
@@ -0,0 +1,29 @@
1 |
+
### Week 2 was a holiday, so there was no class. However, there are important and relevant materials for the subject.
|
2 |
+
|
3 |
+
### Below is a 4-hour course on Pandas. Link:
|
4 |
+
|
5 |
+
https://www.kaggle.com/learn/pandas
|
6 |
+
|
7 |
+
|
8 |
+
### Below is the link to a lesson on "Exploratory Data Analysis with Pandas":
|
9 |
+
|
10 |
+
|
11 |
+
https://www.kaggle.com/code/kashnitsky/topic-1-exploratory-data-analysis-with-pandas
|
12 |
+
|
13 |
+
### Below is a lesson on "Visual Data Analysis in Python":
|
14 |
+
|
15 |
+
https://www.kaggle.com/code/kashnitsky/topic-2-visual-data-analysis-in-python
|
16 |
+
|
17 |
+
|
18 |
+
|
19 |
+
### Some YouTube video links that complement the subject:
|
20 |
+
|
21 |
+
https://youtu.be/X3paOmcrTjQ
|
22 |
+
|
23 |
+
https://youtu.be/yZvFH7B6gKI
|
24 |
+
|
25 |
+
https://youtu.be/lYWt-aCnE2U
|
26 |
+
|
27 |
+
https://youtu.be/lIFLeHDanmA
|
28 |
+
|
29 |
+
https://youtu.be/C_Q_L0wdPNg
|
Data Analitics/Week 3/### Week 3 Additional Reading.txt
ADDED
@@ -0,0 +1,8 @@
1 |
+
### Week 3 Additional Reading:
|
2 |
+
|
3 |
+
https://www.dropbox.com/scl/fi/pile8auga8sux1c7eb00y/Berry-Chapter-1.pdf?rlkey=ioba8cbfie5cc2bxj8p6owqgz&e=1
|
4 |
+
|
5 |
+
https://www.dropbox.com/scl/fi/nacqj76um27pc0ree2q0c/DM-in-Business.pdf?rlkey=py6t5uffbpy5fx82v7ofsnlxl&e=1
|
6 |
+
|
7 |
+
https://b-tierney.com/wp-content/uploads/2018/07/CRISPDM.pdf
|
8 |
+
|
Data Analitics/Week 3/.ipynb_checkpoints/Lab2-1-checkpoint.ipynb
ADDED
@@ -0,0 +1,149 @@
1 |
+
{
|
2 |
+
"cells": [
|
3 |
+
{
|
4 |
+
"cell_type": "code",
|
5 |
+
"execution_count": 8,
|
6 |
+
"metadata": {},
|
7 |
+
"outputs": [
|
8 |
+
{
|
9 |
+
"data": {
|
10 |
+
"text/html": [
|
11 |
+
"<style>.container { width:85% !important; }</style>"
|
12 |
+
],
|
13 |
+
"text/plain": [
|
14 |
+
"<IPython.core.display.HTML object>"
|
15 |
+
]
|
16 |
+
},
|
17 |
+
"metadata": {},
|
18 |
+
"output_type": "display_data"
|
19 |
+
}
|
20 |
+
],
|
21 |
+
"source": [
|
22 |
+
"from IPython.display import display, HTML\n",
|
23 |
+
"display(HTML(\"<style>.container { width:85% !important; }</style>\"))"
|
24 |
+
]
|
25 |
+
},
|
26 |
+
{
|
27 |
+
"cell_type": "code",
|
28 |
+
"execution_count": 1,
|
29 |
+
"metadata": {},
|
30 |
+
"outputs": [
|
31 |
+
{
|
32 |
+
"ename": "SyntaxError",
|
33 |
+
"evalue": "(unicode error) 'unicodeescape' codec can't decode bytes in position 2-3: truncated \\UXXXXXXXX escape (403486649.py, line 4)",
|
34 |
+
"output_type": "error",
|
35 |
+
"traceback": [
|
36 |
+
"\u001b[1;36m Cell \u001b[1;32mIn[1], line 4\u001b[1;36m\u001b[0m\n\u001b[1;33m df = pd.read_csv(\"C:\\Users\\Rafael\\Documents\\DataScience\\Data Analitics\\Week 3\\TU257-Lab2-1-Automated-Data-Profiling.ipynb\")\u001b[0m\n\u001b[1;37m ^\u001b[0m\n\u001b[1;31mSyntaxError\u001b[0m\u001b[1;31m:\u001b[0m (unicode error) 'unicodeescape' codec can't decode bytes in position 2-3: truncated \\UXXXXXXXX escape\n"
|
37 |
+
]
|
38 |
+
}
|
39 |
+
],
|
40 |
+
"source": [
|
41 |
+
"import pandas as pd\n",
|
42 |
+
"\n",
|
43 |
+
"#Change this next command to the location of train.csv on your Computer\n",
|
44 |
+
"df = pd.read_csv(\"C:\\Users\\Rafael\\Documents\\DataScience\\Data Analitics\\Week 3\\TU257-Lab2-1-Automated-Data-Profiling.ipynb\")\n",
|
45 |
+
"#df = pd.read_csv(\"C:\\Studies\\TU257\\DataAnalytics\\Week2\\train.csv\")\n",
|
46 |
+
"df.head(8)"
|
47 |
+
]
|
48 |
+
},
|
49 |
+
{
|
50 |
+
"cell_type": "code",
|
51 |
+
"execution_count": null,
|
52 |
+
"metadata": {},
|
53 |
+
"outputs": [],
|
54 |
+
"source": [
|
55 |
+
"df2 = df.iloc[:,[1,2,4,5,6,7,8,10,11]]\n",
|
56 |
+
"df2.head(8)"
|
57 |
+
]
|
58 |
+
},
|
59 |
+
{
|
60 |
+
"cell_type": "code",
|
61 |
+
"execution_count": null,
|
62 |
+
"metadata": {},
|
63 |
+
"outputs": [],
|
64 |
+
"source": [
|
65 |
+
"df2.describe()"
|
66 |
+
]
|
67 |
+
},
|
68 |
+
{
|
69 |
+
"cell_type": "code",
|
70 |
+
"execution_count": null,
|
71 |
+
"metadata": {},
|
72 |
+
"outputs": [],
|
73 |
+
"source": [
|
74 |
+
"df2.describe().transpose()"
|
75 |
+
]
|
76 |
+
},
|
77 |
+
{
|
78 |
+
"cell_type": "code",
|
79 |
+
"execution_count": null,
|
80 |
+
"metadata": {},
|
81 |
+
"outputs": [],
|
82 |
+
"source": [
|
83 |
+
"#Make sure to install 'ydata_profiling' library before running the following\n",
|
84 |
+
"#see Lab Notes\n",
|
85 |
+
"\n",
|
86 |
+
"from ydata_profiling import ProfileReport\n",
|
87 |
+
"\n",
|
88 |
+
"profile = ProfileReport(df2, title=\"Profiling Report\")\n",
|
89 |
+
"profile"
|
90 |
+
]
|
91 |
+
},
|
92 |
+
{
|
93 |
+
"cell_type": "code",
|
94 |
+
"execution_count": null,
|
95 |
+
"metadata": {},
|
96 |
+
"outputs": [],
|
97 |
+
"source": [
|
98 |
+
"#Can you save the Data Profile Report to a file?\n",
|
99 |
+
"#Check the package Github site for examples (link to this is in the Lab Notes)\n",
|
100 |
+
"# https://github.com/ydataai/ydata-profiling\n",
|
101 |
+
"# Scroll to the bottom of the main GitHub page for examples of saving the report\n"
|
102 |
+
]
|
103 |
+
},
|
104 |
+
{
|
105 |
+
"cell_type": "code",
|
106 |
+
"execution_count": null,
|
107 |
+
"metadata": {},
|
108 |
+
"outputs": [],
|
109 |
+
"source": [
|
110 |
+
"#Enter the code here\n"
|
111 |
+
]
|
112 |
+
},
|
113 |
+
{
|
114 |
+
"cell_type": "code",
|
115 |
+
"execution_count": null,
|
116 |
+
"metadata": {},
|
117 |
+
"outputs": [],
|
118 |
+
"source": []
|
119 |
+
},
|
120 |
+
{
|
121 |
+
"cell_type": "markdown",
|
122 |
+
"metadata": {},
|
123 |
+
"source": [
|
124 |
+
"### See lots more examples of using this library/package for analysing datasets on the Github page. Scroll to bottom of main page to get the links"
|
125 |
+
]
|
126 |
+
}
|
127 |
+
],
|
128 |
+
"metadata": {
|
129 |
+
"kernelspec": {
|
130 |
+
"display_name": "Python 3 (ipykernel)",
|
131 |
+
"language": "python",
|
132 |
+
"name": "python3"
|
133 |
+
},
|
134 |
+
"language_info": {
|
135 |
+
"codemirror_mode": {
|
136 |
+
"name": "ipython",
|
137 |
+
"version": 3
|
138 |
+
},
|
139 |
+
"file_extension": ".py",
|
140 |
+
"mimetype": "text/x-python",
|
141 |
+
"name": "python",
|
142 |
+
"nbconvert_exporter": "python",
|
143 |
+
"pygments_lexer": "ipython3",
|
144 |
+
"version": "3.12.9"
|
145 |
+
}
|
146 |
+
},
|
147 |
+
"nbformat": 4,
|
148 |
+
"nbformat_minor": 4
|
149 |
+
}
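The second cell of this checkpoint notebook records a `SyntaxError` ("truncated \UXXXXXXXX escape"). In an ordinary Python string literal, `\U` starts an eight-digit Unicode escape, so a Windows path beginning with `C:\Users` cannot be parsed. The path in that cell also ends in `.ipynb` rather than pointing at a CSV file. The sketch below shows three common ways to write such a path safely, using only the standard library. The folder `C:\Users\example\Documents` is a hypothetical placeholder, not the real location of `train.csv`.

```python
from pathlib import PureWindowsPath

# Hypothetical location; replace it with the real folder on your machine.

# Option 1: a raw string, where backslashes are not treated as escapes
raw_path = r"C:\Users\example\Documents\train.csv"

# Option 2: forward slashes, which Windows also accepts in file paths
forward_path = "C:/Users/example/Documents/train.csv"

# Option 3: doubled backslashes in a normal string literal
escaped_path = "C:\\Users\\example\\Documents\\train.csv"

# pathlib can join folder and file name without manual separators
csv_path = PureWindowsPath("C:/Users/example/Documents") / "train.csv"

print(raw_path)
print(csv_path)
```

Any of these strings could then be passed to `pd.read_csv(...)` in place of the failing literal. Inside a real notebook on Windows, `pathlib.Path` would be the usual choice instead of `PureWindowsPath`, which is used here only so the sketch behaves the same on any operating system.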
|
Data Analitics/Week 3/.ipynb_checkpoints/TU257-Lab2-1-Automated-Data-Profiling-checkpoint.ipynb
ADDED
The diff for this file is too large to render.
See raw diff
|
|
Data Analitics/Week 3/.ipynb_checkpoints/TU257-Lab2-2-Data-Exploration-checkpoint.ipynb
ADDED
@@ -0,0 +1,1024 @@
1 |
+
{
|
2 |
+
"cells": [
|
3 |
+
{
|
4 |
+
"cell_type": "markdown",
|
5 |
+
"metadata": {},
|
6 |
+
"source": [
|
7 |
+
"TU258 - Lab 2-2 - Data Exploration\n",
|
8 |
+
"\n",
|
9 |
+
"This lab gives an example of using Pandas dataframes for analysing data\n"
|
10 |
+
]
|
11 |
+
},
|
12 |
+
{
|
13 |
+
"cell_type": "code",
|
14 |
+
"execution_count": 1,
|
15 |
+
"metadata": {},
|
16 |
+
"outputs": [],
|
17 |
+
"source": [
|
18 |
+
"#import pandas\n",
|
19 |
+
"import pandas as pd\n"
|
20 |
+
]
|
21 |
+
},
|
22 |
+
{
|
23 |
+
"cell_type": "code",
|
24 |
+
"execution_count": 2,
|
25 |
+
"metadata": {},
|
26 |
+
"outputs": [],
|
27 |
+
"source": [
|
28 |
+
"# read a CSV file into a pandas DataFrame\n",
|
29 |
+
"videoReview = pd.read_csv('/Users/brendan.tierney/Dropbox/4-Datasets/Video_Games_Sales_as_at_22_Dec_2016.csv')"
|
30 |
+
]
|
31 |
+
},
|
32 |
+
{
|
33 |
+
"cell_type": "code",
|
34 |
+
"execution_count": 3,
|
35 |
+
"metadata": {},
|
36 |
+
"outputs": [
|
37 |
+
{
|
38 |
+
"name": "stdout",
|
39 |
+
"output_type": "stream",
|
40 |
+
"text": [
|
41 |
+
"# print first 3 rows\n",
|
42 |
+
" Name Platform Year_of_Release Genre Publisher NA_Sales \\\n",
|
43 |
+
"0 Wii Sports Wii 2006.0 Sports Nintendo 41.36 \n",
|
44 |
+
"1 Super Mario Bros. NES 1985.0 Platform Nintendo 29.08 \n",
|
45 |
+
"2 Mario Kart Wii Wii 2008.0 Racing Nintendo 15.68 \n",
|
46 |
+
"\n",
|
47 |
+
" EU_Sales JP_Sales Other_Sales Global_Sales Critic_Score Critic_Count \\\n",
|
48 |
+
"0 28.96 3.77 8.45 82.53 76.0 51.0 \n",
|
49 |
+
"1 3.58 6.81 0.77 40.24 NaN NaN \n",
|
50 |
+
"2 12.76 3.79 3.29 35.52 82.0 73.0 \n",
|
51 |
+
"\n",
|
52 |
+
" User_Score User_Count Developer Rating \n",
|
53 |
+
"0 8 322.0 Nintendo E \n",
|
54 |
+
"1 NaN NaN NaN NaN \n",
|
55 |
+
"2 8.3 709.0 Nintendo E \n"
|
56 |
+
]
|
57 |
+
}
|
58 |
+
],
|
59 |
+
"source": [
|
60 |
+
"print('# print first 3 rows')\n",
|
61 |
+
"print(videoReview[:3])\n",
|
62 |
+
"\n"
|
63 |
+
]
|
64 |
+
},
|
65 |
+
{
|
66 |
+
"cell_type": "code",
|
67 |
+
"execution_count": 4,
|
68 |
+
"metadata": {},
|
69 |
+
"outputs": [
|
70 |
+
{
|
71 |
+
"data": {
|
72 |
+
"text/html": [
|
73 |
+
"<div>\n",
|
74 |
+
"<style scoped>\n",
|
75 |
+
" .dataframe tbody tr th:only-of-type {\n",
|
76 |
+
" vertical-align: middle;\n",
|
77 |
+
" }\n",
|
78 |
+
"\n",
|
79 |
+
" .dataframe tbody tr th {\n",
|
80 |
+
" vertical-align: top;\n",
|
81 |
+
" }\n",
|
82 |
+
"\n",
|
83 |
+
" .dataframe thead th {\n",
|
84 |
+
" text-align: right;\n",
|
85 |
+
" }\n",
|
86 |
+
"</style>\n",
|
87 |
+
"<table border=\"1\" class=\"dataframe\">\n",
|
88 |
+
" <thead>\n",
|
89 |
+
" <tr style=\"text-align: right;\">\n",
|
90 |
+
" <th></th>\n",
|
91 |
+
" <th>Name</th>\n",
|
92 |
+
" <th>Platform</th>\n",
|
93 |
+
" <th>Year_of_Release</th>\n",
|
94 |
+
" <th>Genre</th>\n",
|
95 |
+
" <th>Publisher</th>\n",
|
96 |
+
" <th>NA_Sales</th>\n",
|
97 |
+
" <th>EU_Sales</th>\n",
|
98 |
+
" <th>JP_Sales</th>\n",
|
99 |
+
" <th>Other_Sales</th>\n",
|
100 |
+
" <th>Global_Sales</th>\n",
|
101 |
+
" <th>Critic_Score</th>\n",
|
102 |
+
" <th>Critic_Count</th>\n",
|
103 |
+
" <th>User_Score</th>\n",
|
104 |
+
" <th>User_Count</th>\n",
|
105 |
+
" <th>Developer</th>\n",
|
106 |
+
" <th>Rating</th>\n",
|
107 |
+
" </tr>\n",
|
108 |
+
" </thead>\n",
|
109 |
+
" <tbody>\n",
|
110 |
+
" <tr>\n",
|
111 |
+
" <th>0</th>\n",
|
112 |
+
" <td>Wii Sports</td>\n",
|
113 |
+
" <td>Wii</td>\n",
|
114 |
+
" <td>2006.0</td>\n",
|
115 |
+
" <td>Sports</td>\n",
|
116 |
+
" <td>Nintendo</td>\n",
|
117 |
+
" <td>41.36</td>\n",
|
118 |
+
" <td>28.96</td>\n",
|
119 |
+
" <td>3.77</td>\n",
|
120 |
+
" <td>8.45</td>\n",
|
121 |
+
" <td>82.53</td>\n",
|
122 |
+
" <td>76.0</td>\n",
|
123 |
+
" <td>51.0</td>\n",
|
124 |
+
" <td>8</td>\n",
|
125 |
+
" <td>322.0</td>\n",
|
126 |
+
" <td>Nintendo</td>\n",
|
127 |
+
" <td>E</td>\n",
|
128 |
+
" </tr>\n",
|
129 |
+
" <tr>\n",
|
130 |
+
" <th>1</th>\n",
|
131 |
+
" <td>Super Mario Bros.</td>\n",
|
132 |
+
" <td>NES</td>\n",
|
133 |
+
" <td>1985.0</td>\n",
|
134 |
+
" <td>Platform</td>\n",
|
135 |
+
" <td>Nintendo</td>\n",
|
136 |
+
" <td>29.08</td>\n",
|
137 |
+
" <td>3.58</td>\n",
|
138 |
+
" <td>6.81</td>\n",
|
139 |
+
" <td>0.77</td>\n",
|
140 |
+
" <td>40.24</td>\n",
|
141 |
+
" <td>NaN</td>\n",
|
142 |
+
" <td>NaN</td>\n",
|
143 |
+
" <td>NaN</td>\n",
|
144 |
+
" <td>NaN</td>\n",
|
145 |
+
" <td>NaN</td>\n",
|
146 |
+
" <td>NaN</td>\n",
|
147 |
+
" </tr>\n",
|
148 |
+
" <tr>\n",
|
149 |
+
" <th>2</th>\n",
|
150 |
+
" <td>Mario Kart Wii</td>\n",
|
151 |
+
" <td>Wii</td>\n",
|
152 |
+
" <td>2008.0</td>\n",
|
153 |
+
" <td>Racing</td>\n",
|
154 |
+
" <td>Nintendo</td>\n",
|
155 |
+
" <td>15.68</td>\n",
|
156 |
+
" <td>12.76</td>\n",
|
157 |
+
" <td>3.79</td>\n",
|
158 |
+
" <td>3.29</td>\n",
|
159 |
+
" <td>35.52</td>\n",
|
160 |
+
" <td>82.0</td>\n",
|
161 |
+
" <td>73.0</td>\n",
|
162 |
+
" <td>8.3</td>\n",
|
163 |
+
" <td>709.0</td>\n",
|
164 |
+
" <td>Nintendo</td>\n",
|
165 |
+
" <td>E</td>\n",
|
166 |
+
" </tr>\n",
|
167 |
+
" </tbody>\n",
|
168 |
+
"</table>\n",
|
169 |
+
"</div>"
|
170 |
+
],
|
171 |
+
"text/plain": [
|
172 |
+
" Name Platform Year_of_Release Genre Publisher NA_Sales \\\n",
|
173 |
+
"0 Wii Sports Wii 2006.0 Sports Nintendo 41.36 \n",
|
174 |
+
"1 Super Mario Bros. NES 1985.0 Platform Nintendo 29.08 \n",
|
175 |
+
"2 Mario Kart Wii Wii 2008.0 Racing Nintendo 15.68 \n",
|
176 |
+
"\n",
|
177 |
+
" EU_Sales JP_Sales Other_Sales Global_Sales Critic_Score Critic_Count \\\n",
|
178 |
+
"0 28.96 3.77 8.45 82.53 76.0 51.0 \n",
|
179 |
+
"1 3.58 6.81 0.77 40.24 NaN NaN \n",
|
180 |
+
"2 12.76 3.79 3.29 35.52 82.0 73.0 \n",
|
181 |
+
"\n",
|
182 |
+
" User_Score User_Count Developer Rating \n",
|
183 |
+
"0 8 322.0 Nintendo E \n",
|
184 |
+
"1 NaN NaN NaN NaN \n",
|
185 |
+
"2 8.3 709.0 Nintendo E "
|
186 |
+
]
|
187 |
+
},
|
188 |
+
"execution_count": 4,
|
189 |
+
"metadata": {},
|
190 |
+
"output_type": "execute_result"
|
191 |
+
}
|
192 |
+
],
|
193 |
+
"source": [
|
194 |
+
"videoReview.head(3)"
|
195 |
+
]
|
196 |
+
},
|
197 |
+
{
|
198 |
+
"cell_type": "code",
|
199 |
+
"execution_count": 5,
|
200 |
+
"metadata": {},
|
201 |
+
"outputs": [
|
202 |
+
{
|
203 |
+
"name": "stdout",
|
204 |
+
"output_type": "stream",
|
205 |
+
"text": [
|
206 |
+
"----------\n",
|
207 |
+
"# print columns\n",
|
208 |
+
"0 Wii Sports\n",
|
209 |
+
"1 Super Mario Bros.\n",
|
210 |
+
"2 Mario Kart Wii\n",
|
211 |
+
"3 Wii Sports Resort\n",
|
212 |
+
"4 Pokemon Red/Pokemon Blue\n",
|
213 |
+
" ... \n",
|
214 |
+
"16714 Samurai Warriors: Sanada Maru\n",
|
215 |
+
"16715 LMA Manager 2007\n",
|
216 |
+
"16716 Haitaka no Psychedelica\n",
|
217 |
+
"16717 Spirits & Spells\n",
|
218 |
+
"16718 Winning Post 8 2016\n",
|
219 |
+
"Name: Name, Length: 16719, dtype: object\n"
|
220 |
+
]
|
221 |
+
}
|
222 |
+
],
|
223 |
+
"source": [
|
224 |
+
"print('----------')\n",
|
225 |
+
"print('# print columns')\n",
|
226 |
+
"print(videoReview['Name'])\n",
|
227 |
+
"\n"
|
228 |
+
]
|
229 |
+
},
|
230 |
+
{
|
231 |
+
"cell_type": "code",
|
232 |
+
"execution_count": 6,
|
233 |
+
"metadata": {},
|
234 |
+
"outputs": [
|
235 |
+
{
|
236 |
+
"data": {
|
237 |
+
"text/plain": [
|
238 |
+
"0 Wii Sports\n",
|
239 |
+
"1 Super Mario Bros.\n",
|
240 |
+
"2 Mario Kart Wii\n",
|
241 |
+
"3 Wii Sports Resort\n",
|
242 |
+
"4 Pokemon Red/Pokemon Blue\n",
|
243 |
+
" ... \n",
|
244 |
+
"16714 Samurai Warriors: Sanada Maru\n",
|
245 |
+
"16715 LMA Manager 2007\n",
|
246 |
+
"16716 Haitaka no Psychedelica\n",
|
247 |
+
"16717 Spirits & Spells\n",
|
248 |
+
"16718 Winning Post 8 2016\n",
|
249 |
+
"Name: Name, Length: 16719, dtype: object"
|
250 |
+
]
|
251 |
+
},
|
252 |
+
"execution_count": 6,
|
253 |
+
"metadata": {},
|
254 |
+
"output_type": "execute_result"
|
255 |
+
}
|
256 |
+
],
|
257 |
+
"source": [
|
258 |
+
"videoReview['Name']"
|
259 |
+
]
|
260 |
+
},
|
261 |
+
{
|
262 |
+
"cell_type": "code",
|
263 |
+
"execution_count": 7,
|
264 |
+
"metadata": {},
|
265 |
+
"outputs": [
|
266 |
+
{
|
267 |
+
"name": "stdout",
|
268 |
+
"output_type": "stream",
|
269 |
+
"text": [
|
270 |
+
"----------\n",
|
271 |
+
"# print columns, first 5 rows\n",
|
272 |
+
"0 Wii Sports\n",
|
273 |
+
"1 Super Mario Bros.\n",
|
274 |
+
"2 Mario Kart Wii\n",
|
275 |
+
"3 Wii Sports Resort\n",
|
276 |
+
"4 Pokemon Red/Pokemon Blue\n",
|
277 |
+
"Name: Name, dtype: object\n"
|
278 |
+
]
|
279 |
+
}
|
280 |
+
],
|
281 |
+
"source": [
|
282 |
+
"print('----------')\n",
|
283 |
+
"print('# print columns, first 5 rows')\n",
|
284 |
+
"print(videoReview['Name'][:5])\n",
|
285 |
+
"\n",
|
286 |
+
"#videoReview['Name'].head(5)\n",
|
287 |
+
"#videoReview['Name'].tail(5)\n"
|
288 |
+
]
|
289 |
+
},
|
290 |
+
{
|
291 |
+
"cell_type": "code",
|
292 |
+
"execution_count": 8,
|
293 |
+
"metadata": {},
|
294 |
+
"outputs": [
|
295 |
+
{
|
296 |
+
"name": "stdout",
|
297 |
+
"output_type": "stream",
|
298 |
+
"text": [
|
299 |
+
"----------\n",
|
300 |
+
"# Platform #\n",
|
301 |
+
"Platform\n",
|
302 |
+
"PS2 2161\n",
|
303 |
+
"DS 2152\n",
|
304 |
+
"PS3 1331\n",
|
305 |
+
"Wii 1320\n",
|
306 |
+
"X360 1262\n",
|
307 |
+
"PSP 1209\n",
|
308 |
+
"PS 1197\n",
|
309 |
+
"PC 974\n",
|
310 |
+
"XB 824\n",
|
311 |
+
"GBA 822\n",
|
312 |
+
"GC 556\n",
|
313 |
+
"3DS 520\n",
|
314 |
+
"PSV 432\n",
|
315 |
+
"PS4 393\n",
|
316 |
+
"N64 319\n",
|
317 |
+
"XOne 247\n",
|
318 |
+
"SNES 239\n",
|
319 |
+
"SAT 173\n",
|
320 |
+
"WiiU 147\n",
|
321 |
+
"2600 133\n",
|
322 |
+
"NES 98\n",
|
323 |
+
"GB 98\n",
|
324 |
+
"DC 52\n",
|
325 |
+
"GEN 29\n",
|
326 |
+
"NG 12\n",
|
327 |
+
"SCD 6\n",
|
328 |
+
"WS 6\n",
|
329 |
+
"3DO 3\n",
|
330 |
+
"TG16 2\n",
|
331 |
+
"GG 1\n",
|
332 |
+
"PCFX 1\n",
|
333 |
+
"Name: count, dtype: int64\n"
|
334 |
+
]
|
335 |
+
}
|
336 |
+
],
|
337 |
+
"source": [
|
338 |
+
"print('----------')\n",
|
339 |
+
"print('# Platform #')\n",
|
340 |
+
"print(videoReview['Platform'].value_counts())\n",
|
341 |
+
"\n"
|
342 |
+
]
|
343 |
+
},
|
344 |
+
{
|
345 |
+
"cell_type": "code",
|
346 |
+
"execution_count": 9,
|
347 |
+
"metadata": {},
|
348 |
+
"outputs": [
|
349 |
+
{
|
350 |
+
"name": "stdout",
|
351 |
+
"output_type": "stream",
|
352 |
+
"text": [
|
353 |
+
"----------\n",
|
354 |
+
"#shape\n",
|
355 |
+
"Number of rows = 16719\n",
|
356 |
+
"Number of columns = 16\n",
|
357 |
+
"Shape = (16719, 16)\n"
|
358 |
+
]
|
359 |
+
}
|
360 |
+
],
|
361 |
+
"source": [
|
362 |
+
"print('----------')\n",
|
363 |
+
"print('#shape')\n",
|
364 |
+
"print('Number of rows = ', videoReview.shape[0])\n",
|
365 |
+
"print('Number of columns = ', videoReview.shape[1])\n",
|
366 |
+
"print('Shape = ', videoReview.shape)\n",
|
367 |
+
"\n"
|
368 |
+
]
|
369 |
+
},
|
370 |
+
{
|
371 |
+
"cell_type": "code",
|
372 |
+
"execution_count": 10,
|
373 |
+
"metadata": {},
|
374 |
+
"outputs": [
|
375 |
+
{
|
376 |
+
"name": "stdout",
|
377 |
+
"output_type": "stream",
|
378 |
+
"text": [
|
379 |
+
"----------\n",
|
380 |
+
"# Name #\n",
|
381 |
+
"Name\n",
|
382 |
+
"Need for Speed: Most Wanted 12\n",
|
383 |
+
"FIFA 14 9\n",
|
384 |
+
"Ratatouille 9\n",
|
385 |
+
"LEGO Marvel Super Heroes 9\n",
|
386 |
+
"Madden NFL 07 9\n",
|
387 |
+
" ..\n",
|
388 |
+
"Jewels of the Tropical Lost Island 1\n",
|
389 |
+
"Sherlock Holmes and the Mystery of Osborne House 1\n",
|
390 |
+
"The King of Fighters '95 (CD) 1\n",
|
391 |
+
"Megamind: Mega Team Unite 1\n",
|
392 |
+
"Haitaka no Psychedelica 1\n",
|
393 |
+
"Name: count, Length: 11562, dtype: int64\n"
|
394 |
+
]
|
395 |
+
}
|
396 |
+
],
|
397 |
+
"source": [
|
398 |
+
"print('----------')\n",
|
399 |
+
"print('# Name #')\n",
|
400 |
+
"print(videoReview['Name'].value_counts())\n",
|
401 |
+
"\n"
|
402 |
+
]
|
403 |
+
},
|
404 |
+
{
|
405 |
+
"cell_type": "code",
|
406 |
+
"execution_count": 11,
|
407 |
+
"metadata": {},
|
408 |
+
"outputs": [
|
409 |
+
{
|
410 |
+
"name": "stdout",
|
411 |
+
"output_type": "stream",
|
412 |
+
"text": [
|
413 |
+
"----------\n",
|
414 |
+
"#head\n",
|
415 |
+
" Name Platform Year_of_Release Genre Publisher \\\n",
|
416 |
+
"0 Wii Sports Wii 2006.0 Sports Nintendo \n",
|
417 |
+
"1 Super Mario Bros. NES 1985.0 Platform Nintendo \n",
|
418 |
+
"2 Mario Kart Wii Wii 2008.0 Racing Nintendo \n",
|
419 |
+
"3 Wii Sports Resort Wii 2009.0 Sports Nintendo \n",
|
420 |
+
"4 Pokemon Red/Pokemon Blue GB 1996.0 Role-Playing Nintendo \n",
|
421 |
+
"\n",
|
422 |
+
" NA_Sales EU_Sales JP_Sales Other_Sales Global_Sales Critic_Score \\\n",
|
423 |
+
"0 41.36 28.96 3.77 8.45 82.53 76.0 \n",
|
424 |
+
"1 29.08 3.58 6.81 0.77 40.24 NaN \n",
|
425 |
+
"2 15.68 12.76 3.79 3.29 35.52 82.0 \n",
|
426 |
+
"3 15.61 10.93 3.28 2.95 32.77 80.0 \n",
|
427 |
+
"4 11.27 8.89 10.22 1.00 31.37 NaN \n",
|
428 |
+
"\n",
|
429 |
+
" Critic_Count User_Score User_Count Developer Rating \n",
|
430 |
+
"0 51.0 8 322.0 Nintendo E \n",
|
431 |
+
"1 NaN NaN NaN NaN NaN \n",
|
432 |
+
"2 73.0 8.3 709.0 Nintendo E \n",
|
433 |
+
"3 73.0 8 192.0 Nintendo E \n",
|
434 |
+
"4 NaN NaN NaN NaN NaN \n",
|
435 |
+
" Name Platform Year_of_Release Genre Publisher \\\n",
|
436 |
+
"0 Wii Sports Wii 2006.0 Sports Nintendo \n",
|
437 |
+
"1 Super Mario Bros. NES 1985.0 Platform Nintendo \n",
|
438 |
+
"2 Mario Kart Wii Wii 2008.0 Racing Nintendo \n",
|
439 |
+
"3 Wii Sports Resort Wii 2009.0 Sports Nintendo \n",
|
440 |
+
"4 Pokemon Red/Pokemon Blue GB 1996.0 Role-Playing Nintendo \n",
|
441 |
+
"5 Tetris GB 1989.0 Puzzle Nintendo \n",
|
442 |
+
"6 New Super Mario Bros. DS 2006.0 Platform Nintendo \n",
|
443 |
+
"7 Wii Play Wii 2006.0 Misc Nintendo \n",
|
444 |
+
"\n",
|
445 |
+
" NA_Sales EU_Sales JP_Sales Other_Sales Global_Sales Critic_Score \\\n",
|
446 |
+
"0 41.36 28.96 3.77 8.45 82.53 76.0 \n",
|
447 |
+
"1 29.08 3.58 6.81 0.77 40.24 NaN \n",
|
448 |
+
"2 15.68 12.76 3.79 3.29 35.52 82.0 \n",
|
449 |
+
"3 15.61 10.93 3.28 2.95 32.77 80.0 \n",
|
450 |
+
"4 11.27 8.89 10.22 1.00 31.37 NaN \n",
|
451 |
+
"5 23.20 2.26 4.22 0.58 30.26 NaN \n",
|
452 |
+
"6 11.28 9.14 6.50 2.88 29.80 89.0 \n",
|
453 |
+
"7 13.96 9.18 2.93 2.84 28.92 58.0 \n",
|
454 |
+
"\n",
|
455 |
+
" Critic_Count User_Score User_Count Developer Rating \n",
|
456 |
+
"0 51.0 8 322.0 Nintendo E \n",
|
457 |
+
"1 NaN NaN NaN NaN NaN \n",
|
458 |
+
"2 73.0 8.3 709.0 Nintendo E \n",
|
459 |
+
"3 73.0 8 192.0 Nintendo E \n",
|
460 |
+
"4 NaN NaN NaN NaN NaN \n",
|
461 |
+
"5 NaN NaN NaN NaN NaN \n",
|
462 |
+
"6 65.0 8.5 431.0 Nintendo E \n",
|
463 |
+
"7 41.0 6.6 129.0 Nintendo E \n"
|
464 |
+
]
|
465 |
+
}
|
466 |
+
],
|
467 |
+
"source": [
|
468 |
+
"print('----------')\n",
|
469 |
+
"print('#head')\n",
|
470 |
+
"print(videoReview.head())\n",
|
471 |
+
"print(videoReview.head(8))\n",
|
472 |
+
"\n"
|
473 |
+
]
|
474 |
+
},
|
475 |
+
{
|
476 |
+
"cell_type": "code",
|
477 |
+
"execution_count": 12,
|
478 |
+
"metadata": {},
|
479 |
+
"outputs": [
|
480 |
+
{
|
481 |
+
"name": "stdout",
|
482 |
+
"output_type": "stream",
|
483 |
+
"text": [
|
484 |
+
"----------\n",
|
485 |
+
"#tail\n",
|
486 |
+
" Name Platform Year_of_Release Genre \\\n",
|
487 |
+
"16714 Samurai Warriors: Sanada Maru PS3 2016.0 Action \n",
|
488 |
+
"16715 LMA Manager 2007 X360 2006.0 Sports \n",
|
489 |
+
"16716 Haitaka no Psychedelica PSV 2016.0 Adventure \n",
|
490 |
+
"16717 Spirits & Spells GBA 2003.0 Platform \n",
|
491 |
+
"16718 Winning Post 8 2016 PSV 2016.0 Simulation \n",
|
492 |
+
"\n",
|
493 |
+
" Publisher NA_Sales EU_Sales JP_Sales Other_Sales Global_Sales \\\n",
|
494 |
+
"16714 Tecmo Koei 0.00 0.00 0.01 0.0 0.01 \n",
|
495 |
+
"16715 Codemasters 0.00 0.01 0.00 0.0 0.01 \n",
|
496 |
+
"16716 Idea Factory 0.00 0.00 0.01 0.0 0.01 \n",
|
497 |
+
"16717 Wanadoo 0.01 0.00 0.00 0.0 0.01 \n",
|
498 |
+
"16718 Tecmo Koei 0.00 0.00 0.01 0.0 0.01 \n",
|
499 |
+
"\n",
|
500 |
+
" Critic_Score Critic_Count User_Score User_Count Developer Rating \n",
|
501 |
+
"16714 NaN NaN NaN NaN NaN NaN \n",
|
502 |
+
"16715 NaN NaN NaN NaN NaN NaN \n",
|
503 |
+
"16716 NaN NaN NaN NaN NaN NaN \n",
|
504 |
+
"16717 NaN NaN NaN NaN NaN NaN \n",
|
505 |
+
"16718 NaN NaN NaN NaN NaN NaN \n",
|
506 |
+
" Name Platform \\\n",
|
507 |
+
"16711 Aiyoku no Eustia PSV \n",
|
508 |
+
"16712 Woody Woodpecker in Crazy Castle 5 GBA \n",
|
509 |
+
"16713 SCORE International Baja 1000: The Official Game PS2 \n",
|
510 |
+
"16714 Samurai Warriors: Sanada Maru PS3 \n",
|
511 |
+
"16715 LMA Manager 2007 X360 \n",
|
512 |
+
"16716 Haitaka no Psychedelica PSV \n",
|
513 |
+
"16717 Spirits & Spells GBA \n",
|
514 |
+
"16718 Winning Post 8 2016 PSV \n",
|
515 |
+
"\n",
|
516 |
+
" Year_of_Release Genre Publisher NA_Sales EU_Sales \\\n",
|
517 |
+
"16711 2014.0 Misc dramatic create 0.00 0.00 \n",
|
518 |
+
"16712 2002.0 Platform Kemco 0.01 0.00 \n",
|
519 |
+
"16713 2008.0 Racing Activision 0.00 0.00 \n",
|
520 |
+
"16714 2016.0 Action Tecmo Koei 0.00 0.00 \n",
|
521 |
+
"16715 2006.0 Sports Codemasters 0.00 0.01 \n",
|
522 |
+
"16716 2016.0 Adventure Idea Factory 0.00 0.00 \n",
|
523 |
+
"16717 2003.0 Platform Wanadoo 0.01 0.00 \n",
|
524 |
+
"16718 2016.0 Simulation Tecmo Koei 0.00 0.00 \n",
|
525 |
+
"\n",
|
526 |
+
" JP_Sales Other_Sales Global_Sales Critic_Score Critic_Count \\\n",
|
527 |
+
"16711 0.01 0.0 0.01 NaN NaN \n",
|
528 |
+
"16712 0.00 0.0 0.01 NaN NaN \n",
|
529 |
+
"16713 0.00 0.0 0.01 NaN NaN \n",
|
530 |
+
"16714 0.01 0.0 0.01 NaN NaN \n",
|
531 |
+
"16715 0.00 0.0 0.01 NaN NaN \n",
|
532 |
+
"16716 0.01 0.0 0.01 NaN NaN \n",
|
533 |
+
"16717 0.00 0.0 0.01 NaN NaN \n",
|
534 |
+
"16718 0.01 0.0 0.01 NaN NaN \n",
|
535 |
+
"\n",
|
536 |
+
" User_Score User_Count Developer Rating \n",
|
537 |
+
"16711 NaN NaN NaN NaN \n",
|
538 |
+
"16712 NaN NaN NaN NaN \n",
|
539 |
+
"16713 NaN NaN NaN NaN \n",
|
540 |
+
"16714 NaN NaN NaN NaN \n",
|
541 |
+
"16715 NaN NaN NaN NaN \n",
|
542 |
+
"16716 NaN NaN NaN NaN \n",
|
543 |
+
"16717 NaN NaN NaN NaN \n",
|
544 |
+
"16718 NaN NaN NaN NaN \n"
|
545 |
+
]
|
546 |
+
}
|
547 |
+
],
|
548 |
+
"source": [
|
549 |
+
"print('----------')\n",
|
550 |
+
"print('#tail')\n",
|
551 |
+
"print(videoReview.tail())\n",
|
552 |
+
"print(videoReview.tail(8))\n",
|
553 |
+
"\n"
|
554 |
+
]
|
555 |
+
},
|
556 |
+
{
|
557 |
+
"cell_type": "code",
|
558 |
+
"execution_count": 13,
|
559 |
+
"metadata": {},
|
560 |
+
"outputs": [
|
561 |
+
{
|
562 |
+
"name": "stdout",
|
563 |
+
"output_type": "stream",
|
564 |
+
"text": [
|
565 |
+
"----------\n",
|
566 |
+
"#Describe\n",
|
567 |
+
" Year_of_Release NA_Sales EU_Sales JP_Sales \\\n",
|
568 |
+
"count 16450.000000 16719.000000 16719.000000 16719.000000 \n",
|
569 |
+
"mean 2006.487356 0.263330 0.145025 0.077602 \n",
|
570 |
+
"std 5.878995 0.813514 0.503283 0.308818 \n",
|
571 |
+
"min 1980.000000 0.000000 0.000000 0.000000 \n",
|
572 |
+
"25% 2003.000000 0.000000 0.000000 0.000000 \n",
|
573 |
+
"50% 2007.000000 0.080000 0.020000 0.000000 \n",
|
574 |
+
"75% 2010.000000 0.240000 0.110000 0.040000 \n",
|
575 |
+
"max 2020.000000 41.360000 28.960000 10.220000 \n",
|
576 |
+
"\n",
|
577 |
+
" Other_Sales Global_Sales Critic_Score Critic_Count User_Count \n",
|
578 |
+
"count 16719.000000 16719.000000 8137.000000 8137.000000 7590.000000 \n",
|
579 |
+
"mean 0.047332 0.533543 68.967679 26.360821 162.229908 \n",
|
580 |
+
"std 0.186710 1.547935 13.938165 18.980495 561.282326 \n",
|
581 |
+
"min 0.000000 0.010000 13.000000 3.000000 4.000000 \n",
|
582 |
+
"25% 0.000000 0.060000 60.000000 12.000000 10.000000 \n",
|
583 |
+
"50% 0.010000 0.170000 71.000000 21.000000 24.000000 \n",
|
584 |
+
"75% 0.030000 0.470000 79.000000 36.000000 81.000000 \n",
|
585 |
+
"max 10.570000 82.530000 98.000000 113.000000 10665.000000 \n",
|
586 |
+
"count 16719\n",
|
587 |
+
"unique 31\n",
|
588 |
+
"top PS2\n",
|
589 |
+
"freq 2161\n",
|
590 |
+
"Name: Platform, dtype: object\n",
|
591 |
+
"count 16450.000000\n",
|
592 |
+
"mean 2006.487356\n",
|
593 |
+
"std 5.878995\n",
|
594 |
+
"min 1980.000000\n",
|
595 |
+
"25% 2003.000000\n",
|
596 |
+
"50% 2007.000000\n",
|
597 |
+
"75% 2010.000000\n",
|
598 |
+
"max 2020.000000\n",
|
599 |
+
"Name: Year_of_Release, dtype: float64\n"
|
600 |
+
]
|
601 |
+
}
|
602 |
+
],
|
603 |
+
"source": [
|
604 |
+
"print('----------')\n",
|
605 |
+
"print('#Describe')\n",
|
606 |
+
"print(videoReview.describe()) # summary statistics for numeric columns: count, mean, std, min, quartiles, max\n",
|
607 |
+
"print(videoReview['Platform'].describe())\n",
|
608 |
+
"print(videoReview['Year_of_Release'].describe())\n",
|
609 |
+
"\n"
|
610 |
+
]
|
611 |
+
},
|
612 |
+
{
|
613 |
+
"cell_type": "code",
|
614 |
+
"execution_count": 14,
|
615 |
+
"metadata": {},
|
616 |
+
"outputs": [
|
617 |
+
{
|
618 |
+
"name": "stdout",
|
619 |
+
"output_type": "stream",
|
620 |
+
"text": [
|
621 |
+
"----------\n",
|
622 |
+
"#info\n",
|
623 |
+
"<class 'pandas.core.frame.DataFrame'>\n",
|
624 |
+
"RangeIndex: 16719 entries, 0 to 16718\n",
|
625 |
+
"Data columns (total 16 columns):\n",
|
626 |
+
" # Column Non-Null Count Dtype \n",
|
627 |
+
"--- ------ -------------- ----- \n",
|
628 |
+
" 0 Name 16717 non-null object \n",
|
629 |
+
" 1 Platform 16719 non-null object \n",
|
630 |
+
" 2 Year_of_Release 16450 non-null float64\n",
|
631 |
+
" 3 Genre 16717 non-null object \n",
|
632 |
+
" 4 Publisher 16665 non-null object \n",
|
633 |
+
" 5 NA_Sales 16719 non-null float64\n",
|
634 |
+
" 6 EU_Sales 16719 non-null float64\n",
|
635 |
+
" 7 JP_Sales 16719 non-null float64\n",
|
636 |
+
" 8 Other_Sales 16719 non-null float64\n",
|
637 |
+
" 9 Global_Sales 16719 non-null float64\n",
|
638 |
+
" 10 Critic_Score 8137 non-null float64\n",
|
639 |
+
" 11 Critic_Count 8137 non-null float64\n",
|
640 |
+
" 12 User_Score 10015 non-null object \n",
|
641 |
+
" 13 User_Count 7590 non-null float64\n",
|
642 |
+
" 14 Developer 10096 non-null object \n",
|
643 |
+
" 15 Rating 9950 non-null object \n",
|
644 |
+
"dtypes: float64(9), object(7)\n",
|
645 |
+
"memory usage: 2.0+ MB\n",
|
646 |
+
"None\n"
|
647 |
+
]
|
648 |
+
}
|
649 |
+
],
|
650 |
+
"source": [
|
651 |
+
"print('----------')\n",
|
652 |
+
"print('#info')\n",
|
653 |
+
"print(videoReview.info()) # info() prints dtypes and memory usage itself and returns None, hence the trailing 'None'"
|
654 |
+
]
|
655 |
+
},
|
656 |
+
{
|
657 |
+
"cell_type": "code",
|
658 |
+
"execution_count": 15,
|
659 |
+
"metadata": {},
|
660 |
+
"outputs": [
|
661 |
+
{
|
662 |
+
"name": "stdout",
|
663 |
+
"output_type": "stream",
|
664 |
+
"text": [
|
665 |
+
"----------\n",
|
666 |
+
"Transpose - Describe\n",
|
667 |
+
" count mean std min 25% 50% \\\n",
|
668 |
+
"Year_of_Release 16450.0 2006.487356 5.878995 1980.00 2003.00 2007.00 \n",
|
669 |
+
"NA_Sales 16719.0 0.263330 0.813514 0.00 0.00 0.08 \n",
|
670 |
+
"EU_Sales 16719.0 0.145025 0.503283 0.00 0.00 0.02 \n",
|
671 |
+
"JP_Sales 16719.0 0.077602 0.308818 0.00 0.00 0.00 \n",
|
672 |
+
"Other_Sales 16719.0 0.047332 0.186710 0.00 0.00 0.01 \n",
|
673 |
+
"Global_Sales 16719.0 0.533543 1.547935 0.01 0.06 0.17 \n",
|
674 |
+
"Critic_Score 8137.0 68.967679 13.938165 13.00 60.00 71.00 \n",
|
675 |
+
"Critic_Count 8137.0 26.360821 18.980495 3.00 12.00 21.00 \n",
|
676 |
+
"User_Count 7590.0 162.229908 561.282326 4.00 10.00 24.00 \n",
|
677 |
+
"\n",
|
678 |
+
" 75% max \n",
|
679 |
+
"Year_of_Release 2010.00 2020.00 \n",
|
680 |
+
"NA_Sales 0.24 41.36 \n",
|
681 |
+
"EU_Sales 0.11 28.96 \n",
|
682 |
+
"JP_Sales 0.04 10.22 \n",
|
683 |
+
"Other_Sales 0.03 10.57 \n",
|
684 |
+
"Global_Sales 0.47 82.53 \n",
|
685 |
+
"Critic_Score 79.00 98.00 \n",
|
686 |
+
"Critic_Count 36.00 113.00 \n",
|
687 |
+
"User_Count 81.00 10665.00 \n"
|
688 |
+
]
|
689 |
+
}
|
690 |
+
],
|
691 |
+
"source": [
|
692 |
+
"print('----------')\n",
|
693 |
+
"print('Transpose - Describe')\n",
|
694 |
+
"print(videoReview.describe().transpose())\n",
|
695 |
+
"\n"
|
696 |
+
]
|
697 |
+
},
|
698 |
+
{
|
699 |
+
"cell_type": "code",
|
700 |
+
"execution_count": 16,
|
701 |
+
"metadata": {},
|
702 |
+
"outputs": [
|
703 |
+
{
|
704 |
+
"name": "stdout",
|
705 |
+
"output_type": "stream",
|
706 |
+
"text": [
|
707 |
+
"----------\n",
|
708 |
+
"Iterate some rows from DF\n",
|
709 |
+
"#### Printing row ####\n",
|
710 |
+
"Wii Sports\n",
|
711 |
+
"#### Printing row ####\n",
|
712 |
+
"Super Mario Bros.\n",
|
713 |
+
"#### Printing row ####\n",
|
714 |
+
"Mario Kart Wii\n",
|
715 |
+
"#### Printing row ####\n",
|
716 |
+
"Wii Sports Resort\n",
|
717 |
+
"#### Printing row ####\n",
|
718 |
+
"Pokemon Red/Pokemon Blue\n",
|
719 |
+
"#### Printing row ####\n",
|
720 |
+
"Tetris\n",
|
721 |
+
"#### Printing row ####\n",
|
722 |
+
"New Super Mario Bros.\n",
|
723 |
+
"#### Printing row ####\n",
|
724 |
+
"Wii Play\n",
|
725 |
+
"#### Printing row ####\n",
|
726 |
+
"New Super Mario Bros. Wii\n"
|
727 |
+
]
|
728 |
+
}
|
729 |
+
],
|
730 |
+
"source": [
|
731 |
+
"print('----------')\n",
|
732 |
+
"print('Iterate some rows from DF')\n",
|
733 |
+
"for i, row in videoReview[:9].iterrows():\n",
|
734 |
+
" print('#### Printing row ####')\n",
|
735 |
+
" print(row['Name'])\n",
|
736 |
+
"\n"
|
737 |
+
]
|
738 |
+
},
|
739 |
+
{
|
740 |
+
"cell_type": "code",
|
741 |
+
"execution_count": 17,
|
742 |
+
"metadata": {},
|
743 |
+
"outputs": [
|
744 |
+
{
|
745 |
+
"name": "stdout",
|
746 |
+
"output_type": "stream",
|
747 |
+
"text": [
|
748 |
+
"----------\n",
|
749 |
+
"Group by Year, Platform by Count : \n",
|
750 |
+
" Name Genre Publisher NA_Sales EU_Sales \\\n",
|
751 |
+
"Year_of_Release Platform \n",
|
752 |
+
"1980.0 2600 9 9 9 9 9 \n",
|
753 |
+
"1981.0 2600 46 46 46 46 46 \n",
|
754 |
+
"1982.0 2600 36 36 36 36 36 \n",
|
755 |
+
"1983.0 2600 11 11 11 11 11 \n",
|
756 |
+
" NES 6 6 6 6 6 \n",
|
757 |
+
"... ... ... ... ... ... \n",
|
758 |
+
"2016.0 X360 13 13 13 13 13 \n",
|
759 |
+
" XOne 87 87 87 87 87 \n",
|
760 |
+
"2017.0 PS4 1 1 1 1 1 \n",
|
761 |
+
" PSV 2 2 2 2 2 \n",
|
762 |
+
"2020.0 DS 1 1 1 1 1 \n",
|
763 |
+
"\n",
|
764 |
+
" JP_Sales Other_Sales Global_Sales Critic_Score \\\n",
|
765 |
+
"Year_of_Release Platform \n",
|
766 |
+
"1980.0 2600 9 9 9 0 \n",
|
767 |
+
"1981.0 2600 46 46 46 0 \n",
|
768 |
+
"1982.0 2600 36 36 36 0 \n",
|
769 |
+
"1983.0 2600 11 11 11 0 \n",
|
770 |
+
" NES 6 6 6 0 \n",
|
771 |
+
"... ... ... ... ... \n",
|
772 |
+
"2016.0 X360 13 13 13 0 \n",
|
773 |
+
" XOne 87 87 87 60 \n",
|
774 |
+
"2017.0 PS4 1 1 1 0 \n",
|
775 |
+
" PSV 2 2 2 0 \n",
|
776 |
+
"2020.0 DS 1 1 1 0 \n",
|
777 |
+
"\n",
|
778 |
+
" Critic_Count User_Score User_Count Developer \\\n",
|
779 |
+
"Year_of_Release Platform \n",
|
780 |
+
"1980.0 2600 0 0 0 0 \n",
|
781 |
+
"1981.0 2600 0 0 0 0 \n",
|
782 |
+
"1982.0 2600 0 0 0 0 \n",
|
783 |
+
"1983.0 2600 0 0 0 0 \n",
|
784 |
+
" NES 0 0 0 0 \n",
|
785 |
+
"... ... ... ... ... \n",
|
786 |
+
"2016.0 X360 0 12 7 12 \n",
|
787 |
+
" XOne 60 74 66 74 \n",
|
788 |
+
"2017.0 PS4 0 0 0 0 \n",
|
789 |
+
" PSV 0 0 0 0 \n",
|
790 |
+
"2020.0 DS 0 1 0 1 \n",
|
791 |
+
"\n",
|
792 |
+
" Rating \n",
|
793 |
+
"Year_of_Release Platform \n",
|
794 |
+
"1980.0 2600 0 \n",
|
795 |
+
"1981.0 2600 0 \n",
|
796 |
+
"1982.0 2600 0 \n",
|
797 |
+
"1983.0 2600 0 \n",
|
798 |
+
" NES 0 \n",
|
799 |
+
"... ... \n",
|
800 |
+
"2016.0 X360 12 \n",
|
801 |
+
" XOne 71 \n",
|
802 |
+
"2017.0 PS4 0 \n",
|
803 |
+
" PSV 0 \n",
|
804 |
+
"2020.0 DS 1 \n",
|
805 |
+
"\n",
|
806 |
+
"[241 rows x 14 columns]\n"
|
807 |
+
]
|
808 |
+
}
|
809 |
+
],
|
810 |
+
"source": [
|
811 |
+
"# Grouping and aggregating a pandas DataFrame\n",
|
812 |
+
"print('----------')\n",
|
813 |
+
"print('Group by Year, Platform by Count : ')\n",
|
814 |
+
"print(videoReview.groupby(['Year_of_Release','Platform']).count())\n",
|
815 |
+
"\n"
|
816 |
+
]
|
817 |
+
},
|
818 |
+
{
|
819 |
+
"cell_type": "code",
|
820 |
+
"execution_count": 18,
|
821 |
+
"metadata": {},
|
822 |
+
"outputs": [
|
823 |
+
{
|
824 |
+
"name": "stdout",
|
825 |
+
"output_type": "stream",
|
826 |
+
"text": [
|
827 |
+
"Group by: for Year=2016, group by Platform and sum Global Sales\n",
|
828 |
+
"Platform\n",
|
829 |
+
"3DS 15.14\n",
|
830 |
+
"PC 5.27\n",
|
831 |
+
"PS3 3.58\n",
|
832 |
+
"PS4 69.29\n",
|
833 |
+
"PSV 4.27\n",
|
834 |
+
"Wii 0.18\n",
|
835 |
+
"WiiU 4.58\n",
|
836 |
+
"X360 1.52\n",
|
837 |
+
"XOne 26.27\n",
|
838 |
+
"Name: Global_Sales, dtype: float64\n"
|
839 |
+
]
|
840 |
+
}
|
841 |
+
],
|
842 |
+
"source": [
|
843 |
+
"print('Group by: for Year=2016, group by Platform and sum Global Sales')\n",
|
844 |
+
"print(videoReview[videoReview.Year_of_Release==2016.0].groupby('Platform')['Global_Sales'].sum())"
|
845 |
+
]
|
846 |
+
},
|
847 |
+
{
|
848 |
+
"cell_type": "code",
|
849 |
+
"execution_count": 19,
|
850 |
+
"metadata": {},
|
851 |
+
"outputs": [
|
852 |
+
{
|
853 |
+
"name": "stdout",
|
854 |
+
"output_type": "stream",
|
855 |
+
"text": [
|
856 |
+
"----------\n",
|
857 |
+
"Sorting and Ordering\n",
|
858 |
+
" Name Platform Year_of_Release Genre Publisher \\\n",
|
859 |
+
"15 Wii Fit Plus Wii 2009.0 Sports Nintendo \n",
|
860 |
+
"8 New Super Mario Bros. Wii Wii 2009.0 Platform Nintendo \n",
|
861 |
+
"7 Wii Play Wii 2006.0 Misc Nintendo \n",
|
862 |
+
"3 Wii Sports Resort Wii 2009.0 Sports Nintendo \n",
|
863 |
+
"2 Mario Kart Wii Wii 2008.0 Racing Nintendo \n",
|
864 |
+
"0 Wii Sports Wii 2006.0 Sports Nintendo \n",
|
865 |
+
"\n",
|
866 |
+
" NA_Sales EU_Sales JP_Sales Other_Sales Global_Sales Critic_Score \\\n",
|
867 |
+
"15 9.01 8.49 2.53 1.77 21.79 80.0 \n",
|
868 |
+
"8 14.44 6.94 4.70 2.24 28.32 87.0 \n",
|
869 |
+
"7 13.96 9.18 2.93 2.84 28.92 58.0 \n",
|
870 |
+
"3 15.61 10.93 3.28 2.95 32.77 80.0 \n",
|
871 |
+
"2 15.68 12.76 3.79 3.29 35.52 82.0 \n",
|
872 |
+
"0 41.36 28.96 3.77 8.45 82.53 76.0 \n",
|
873 |
+
"\n",
|
874 |
+
" Critic_Count User_Score User_Count Developer Rating \n",
|
875 |
+
"15 33.0 7.4 52.0 Nintendo E \n",
|
876 |
+
"8 80.0 8.4 594.0 Nintendo E \n",
|
877 |
+
"7 41.0 6.6 129.0 Nintendo E \n",
|
878 |
+
"3 73.0 8 192.0 Nintendo E \n",
|
879 |
+
"2 73.0 8.3 709.0 Nintendo E \n",
|
880 |
+
"0 51.0 8 322.0 Nintendo E \n"
|
881 |
+
]
|
882 |
+
}
|
883 |
+
],
|
884 |
+
"source": [
|
885 |
+
"# More pandas functions: filtering and sorting\n",
|
886 |
+
"print('----------')\n",
|
887 |
+
"print('Sorting and Ordering')\n",
|
888 |
+
"df = videoReview[(videoReview.Platform=='Wii') & (videoReview.NA_Sales>9)]\n",
|
889 |
+
"print(df.sort_values('Global_Sales', ascending=True))"
|
890 |
+
]
|
891 |
+
},
|
892 |
+
{
|
893 |
+
"cell_type": "code",
|
894 |
+
"execution_count": 20,
|
895 |
+
"metadata": {},
|
896 |
+
"outputs": [],
|
897 |
+
"source": [
|
898 |
+
"# Write a pandas DataFrame to a CSV file\n",
|
899 |
+
"df.to_csv('/Users/brendan.tierney/video_games_wii.csv', sep=',')"
|
900 |
+
]
|
901 |
+
},
|
902 |
+
{
|
903 |
+
"cell_type": "code",
|
904 |
+
"execution_count": 21,
|
905 |
+
"metadata": {},
|
906 |
+
"outputs": [],
|
907 |
+
"source": [
|
908 |
+
"#Go inspect the CSV file created.\n",
|
909 |
+
"#Is it what you expected?\n",
|
910 |
+
"#Could the output be formatted differently?\n",
|
911 |
+
"#If so, look up the Pandas 'to_csv' function to see what you can change\n"
|
912 |
+
]
|
913 |
+
},
|
914 |
+
{
|
915 |
+
"cell_type": "code",
|
916 |
+
"execution_count": 22,
|
917 |
+
"metadata": {},
|
918 |
+
"outputs": [
|
919 |
+
{
|
920 |
+
"name": "stdout",
|
921 |
+
"output_type": "stream",
|
922 |
+
"text": [
|
923 |
+
"----------\n",
|
924 |
+
"Plotting - Histogram\n"
|
925 |
+
]
|
926 |
+
},
|
927 |
+
{
|
928 |
+
"data": {
|
929 |
+
"text/plain": [
|
930 |
+
"<Axes: ylabel='Frequency'>"
|
931 |
+
]
|
932 |
+
},
|
933 |
+
"execution_count": 22,
|
934 |
+
"metadata": {},
|
935 |
+
"output_type": "execute_result"
|
936 |
+
},
|
937 |
+
{
|
938 |
+
"data": {
|
939 |
+
"image/png": "",
|
940 |
+
"text/plain": [
|
941 |
+
"<Figure size 640x480 with 1 Axes>"
|
942 |
+
]
|
943 |
+
},
|
944 |
+
"metadata": {},
|
945 |
+
"output_type": "display_data"
|
946 |
+
}
|
947 |
+
],
|
948 |
+
"source": [
|
949 |
+
"# Create graphs from a pandas DataFrame\n",
|
950 |
+
"#plotting\n",
|
951 |
+
"print('----------')\n",
|
952 |
+
"print('Plotting - Histogram')\n",
|
953 |
+
"videoReview['Year_of_Release'].plot(kind='hist')"
|
954 |
+
]
|
955 |
+
},
|
956 |
+
{
|
957 |
+
"cell_type": "code",
|
958 |
+
"execution_count": 23,
|
959 |
+
"metadata": {},
|
960 |
+
"outputs": [],
|
961 |
+
"source": [
|
962 |
+
"#If the chart does not appear in the above cell, just go back to that cell and rerun. It should appear now."
|
963 |
+
]
|
964 |
+
},
|
965 |
+
{
|
966 |
+
"cell_type": "code",
|
967 |
+
"execution_count": 24,
|
968 |
+
"metadata": {},
|
969 |
+
"outputs": [],
|
970 |
+
"source": [
|
971 |
+
"#Can you create any other plots?"
|
972 |
+
]
|
973 |
+
},
|
974 |
+
{
|
975 |
+
"cell_type": "code",
|
976 |
+
"execution_count": null,
|
977 |
+
"metadata": {},
|
978 |
+
"outputs": [],
|
979 |
+
"source": []
|
980 |
+
},
|
981 |
+
{
|
982 |
+
"cell_type": "code",
|
983 |
+
"execution_count": null,
|
984 |
+
"metadata": {},
|
985 |
+
"outputs": [],
|
986 |
+
"source": []
|
987 |
+
},
|
988 |
+
{
|
989 |
+
"cell_type": "code",
|
990 |
+
"execution_count": null,
|
991 |
+
"metadata": {},
|
992 |
+
"outputs": [],
|
993 |
+
"source": []
|
994 |
+
},
|
995 |
+
{
|
996 |
+
"cell_type": "code",
|
997 |
+
"execution_count": null,
|
998 |
+
"metadata": {},
|
999 |
+
"outputs": [],
|
1000 |
+
"source": []
|
1001 |
+
}
|
1002 |
+
],
|
1003 |
+
"metadata": {
|
1004 |
+
"kernelspec": {
|
1005 |
+
"display_name": "Python 3 (ipykernel)",
|
1006 |
+
"language": "python",
|
1007 |
+
"name": "python3"
|
1008 |
+
},
|
1009 |
+
"language_info": {
|
1010 |
+
"codemirror_mode": {
|
1011 |
+
"name": "ipython",
|
1012 |
+
"version": 3
|
1013 |
+
},
|
1014 |
+
"file_extension": ".py",
|
1015 |
+
"mimetype": "text/x-python",
|
1016 |
+
"name": "python",
|
1017 |
+
"nbconvert_exporter": "python",
|
1018 |
+
"pygments_lexer": "ipython3",
|
1019 |
+
"version": "3.12.4"
|
1020 |
+
}
|
1021 |
+
},
|
1022 |
+
"nbformat": 4,
|
1023 |
+
"nbformat_minor": 4
|
1024 |
+
}
|
Data Analitics/Week 3/.ipynb_checkpoints/Video_Games_Sales_as_at_22_Dec_2016-checkpoint.csv
ADDED
The diff for this file is too large to render.
See raw diff
|
|
Data Analitics/Week 3/.ipynb_checkpoints/train-checkpoint.csv
ADDED
@@ -0,0 +1,892 @@
|
1 |
+
PassengerId,Survived,Pclass,Name,Sex,Age,SibSp,Parch,Ticket,Fare,Cabin,Embarked
|
2 |
+
1,0,3,"Braund, Mr. Owen Harris",male,22,1,0,A/5 21171,7.25,,S
|
3 |
+
2,1,1,"Cumings, Mrs. John Bradley (Florence Briggs Thayer)",female,38,1,0,PC 17599,71.2833,C85,C
|
4 |
+
3,1,3,"Heikkinen, Miss. Laina",female,26,0,0,STON/O2. 3101282,7.925,,S
|
5 |
+
4,1,1,"Futrelle, Mrs. Jacques Heath (Lily May Peel)",female,35,1,0,113803,53.1,C123,S
|
6 |
+
5,0,3,"Allen, Mr. William Henry",male,35,0,0,373450,8.05,,S
|
7 |
+
6,0,3,"Moran, Mr. James",male,,0,0,330877,8.4583,,Q
|
8 |
+
7,0,1,"McCarthy, Mr. Timothy J",male,54,0,0,17463,51.8625,E46,S
|
9 |
+
8,0,3,"Palsson, Master. Gosta Leonard",male,2,3,1,349909,21.075,,S
|
10 |
+
9,1,3,"Johnson, Mrs. Oscar W (Elisabeth Vilhelmina Berg)",female,27,0,2,347742,11.1333,,S
|
11 |
+
10,1,2,"Nasser, Mrs. Nicholas (Adele Achem)",female,14,1,0,237736,30.0708,,C
|
12 |
+
11,1,3,"Sandstrom, Miss. Marguerite Rut",female,4,1,1,PP 9549,16.7,G6,S
|
13 |
+
12,1,1,"Bonnell, Miss. Elizabeth",female,58,0,0,113783,26.55,C103,S
|
14 |
+
13,0,3,"Saundercock, Mr. William Henry",male,20,0,0,A/5. 2151,8.05,,S
|
15 |
+
14,0,3,"Andersson, Mr. Anders Johan",male,39,1,5,347082,31.275,,S
|
16 |
+
15,0,3,"Vestrom, Miss. Hulda Amanda Adolfina",female,14,0,0,350406,7.8542,,S
|
17 |
+
16,1,2,"Hewlett, Mrs. (Mary D Kingcome) ",female,55,0,0,248706,16,,S
|
18 |
+
17,0,3,"Rice, Master. Eugene",male,2,4,1,382652,29.125,,Q
|
19 |
+
18,1,2,"Williams, Mr. Charles Eugene",male,,0,0,244373,13,,S
|
20 |
+
19,0,3,"Vander Planke, Mrs. Julius (Emelia Maria Vandemoortele)",female,31,1,0,345763,18,,S
|
21 |
+
20,1,3,"Masselmani, Mrs. Fatima",female,,0,0,2649,7.225,,C
|
22 |
+
21,0,2,"Fynney, Mr. Joseph J",male,35,0,0,239865,26,,S
|
23 |
+
22,1,2,"Beesley, Mr. Lawrence",male,34,0,0,248698,13,D56,S
|
24 |
+
23,1,3,"McGowan, Miss. Anna ""Annie""",female,15,0,0,330923,8.0292,,Q
|
25 |
+
24,1,1,"Sloper, Mr. William Thompson",male,28,0,0,113788,35.5,A6,S
|
26 |
+
25,0,3,"Palsson, Miss. Torborg Danira",female,8,3,1,349909,21.075,,S
|
27 |
+
26,1,3,"Asplund, Mrs. Carl Oscar (Selma Augusta Emilia Johansson)",female,38,1,5,347077,31.3875,,S
|
28 |
+
27,0,3,"Emir, Mr. Farred Chehab",male,,0,0,2631,7.225,,C
|
29 |
+
28,0,1,"Fortune, Mr. Charles Alexander",male,19,3,2,19950,263,C23 C25 C27,S
|
30 |
+
29,1,3,"O'Dwyer, Miss. Ellen ""Nellie""",female,,0,0,330959,7.8792,,Q
|
31 |
+
30,0,3,"Todoroff, Mr. Lalio",male,,0,0,349216,7.8958,,S
|
32 |
+
31,0,1,"Uruchurtu, Don. Manuel E",male,40,0,0,PC 17601,27.7208,,C
|
33 |
+
32,1,1,"Spencer, Mrs. William Augustus (Marie Eugenie)",female,,1,0,PC 17569,146.5208,B78,C
|
34 |
+
33,1,3,"Glynn, Miss. Mary Agatha",female,,0,0,335677,7.75,,Q
|
35 |
+
34,0,2,"Wheadon, Mr. Edward H",male,66,0,0,C.A. 24579,10.5,,S
|
36 |
+
35,0,1,"Meyer, Mr. Edgar Joseph",male,28,1,0,PC 17604,82.1708,,C
|
37 |
+
36,0,1,"Holverson, Mr. Alexander Oskar",male,42,1,0,113789,52,,S
|
38 |
+
37,1,3,"Mamee, Mr. Hanna",male,,0,0,2677,7.2292,,C
|
39 |
+
38,0,3,"Cann, Mr. Ernest Charles",male,21,0,0,A./5. 2152,8.05,,S
|
40 |
+
39,0,3,"Vander Planke, Miss. Augusta Maria",female,18,2,0,345764,18,,S
|
41 |
+
40,1,3,"Nicola-Yarred, Miss. Jamila",female,14,1,0,2651,11.2417,,C
|
42 |
+
41,0,3,"Ahlin, Mrs. Johan (Johanna Persdotter Larsson)",female,40,1,0,7546,9.475,,S
|
43 |
+
42,0,2,"Turpin, Mrs. William John Robert (Dorothy Ann Wonnacott)",female,27,1,0,11668,21,,S
|
44 |
+
43,0,3,"Kraeff, Mr. Theodor",male,,0,0,349253,7.8958,,C
|
45 |
+
44,1,2,"Laroche, Miss. Simonne Marie Anne Andree",female,3,1,2,SC/Paris 2123,41.5792,,C
|
46 |
+
45,1,3,"Devaney, Miss. Margaret Delia",female,19,0,0,330958,7.8792,,Q
|
47 |
+
46,0,3,"Rogers, Mr. William John",male,,0,0,S.C./A.4. 23567,8.05,,S
|
48 |
+
47,0,3,"Lennon, Mr. Denis",male,,1,0,370371,15.5,,Q
|
49 |
+
48,1,3,"O'Driscoll, Miss. Bridget",female,,0,0,14311,7.75,,Q
|
50 |
+
49,0,3,"Samaan, Mr. Youssef",male,,2,0,2662,21.6792,,C
|
51 |
+
50,0,3,"Arnold-Franchi, Mrs. Josef (Josefine Franchi)",female,18,1,0,349237,17.8,,S
|
52 |
+
51,0,3,"Panula, Master. Juha Niilo",male,7,4,1,3101295,39.6875,,S
|
53 |
+
52,0,3,"Nosworthy, Mr. Richard Cater",male,21,0,0,A/4. 39886,7.8,,S
|
54 |
+
53,1,1,"Harper, Mrs. Henry Sleeper (Myna Haxtun)",female,49,1,0,PC 17572,76.7292,D33,C
|
55 |
+
54,1,2,"Faunthorpe, Mrs. Lizzie (Elizabeth Anne Wilkinson)",female,29,1,0,2926,26,,S
|
56 |
+
55,0,1,"Ostby, Mr. Engelhart Cornelius",male,65,0,1,113509,61.9792,B30,C
|
57 |
+
56,1,1,"Woolner, Mr. Hugh",male,,0,0,19947,35.5,C52,S
|
58 |
+
57,1,2,"Rugg, Miss. Emily",female,21,0,0,C.A. 31026,10.5,,S
|
59 |
+
58,0,3,"Novel, Mr. Mansouer",male,28.5,0,0,2697,7.2292,,C
|
60 |
+
59,1,2,"West, Miss. Constance Mirium",female,5,1,2,C.A. 34651,27.75,,S
|
61 |
+
60,0,3,"Goodwin, Master. William Frederick",male,11,5,2,CA 2144,46.9,,S
|
62 |
+
61,0,3,"Sirayanian, Mr. Orsen",male,22,0,0,2669,7.2292,,C
|
63 |
+
62,1,1,"Icard, Miss. Amelie",female,38,0,0,113572,80,B28,
|
64 |
+
63,0,1,"Harris, Mr. Henry Birkhardt",male,45,1,0,36973,83.475,C83,S
|
65 |
+
64,0,3,"Skoog, Master. Harald",male,4,3,2,347088,27.9,,S
|
66 |
+
65,0,1,"Stewart, Mr. Albert A",male,,0,0,PC 17605,27.7208,,C
|
67 |
+
66,1,3,"Moubarek, Master. Gerios",male,,1,1,2661,15.2458,,C
|
68 |
+
67,1,2,"Nye, Mrs. (Elizabeth Ramell)",female,29,0,0,C.A. 29395,10.5,F33,S
|
69 |
+
68,0,3,"Crease, Mr. Ernest James",male,19,0,0,S.P. 3464,8.1583,,S
|
70 |
+
69,1,3,"Andersson, Miss. Erna Alexandra",female,17,4,2,3101281,7.925,,S
|
71 |
+
70,0,3,"Kink, Mr. Vincenz",male,26,2,0,315151,8.6625,,S
|
72 |
+
71,0,2,"Jenkin, Mr. Stephen Curnow",male,32,0,0,C.A. 33111,10.5,,S
|
73 |
+
72,0,3,"Goodwin, Miss. Lillian Amy",female,16,5,2,CA 2144,46.9,,S
|
74 |
+
73,0,2,"Hood, Mr. Ambrose Jr",male,21,0,0,S.O.C. 14879,73.5,,S
|
75 |
+
74,0,3,"Chronopoulos, Mr. Apostolos",male,26,1,0,2680,14.4542,,C
|
76 |
+
75,1,3,"Bing, Mr. Lee",male,32,0,0,1601,56.4958,,S
|
77 |
+
76,0,3,"Moen, Mr. Sigurd Hansen",male,25,0,0,348123,7.65,F G73,S
|
78 |
+
77,0,3,"Staneff, Mr. Ivan",male,,0,0,349208,7.8958,,S
|
79 |
+
78,0,3,"Moutal, Mr. Rahamin Haim",male,,0,0,374746,8.05,,S
|
80 |
+
79,1,2,"Caldwell, Master. Alden Gates",male,0.83,0,2,248738,29,,S
|
81 |
+
80,1,3,"Dowdell, Miss. Elizabeth",female,30,0,0,364516,12.475,,S
|
82 |
+
81,0,3,"Waelens, Mr. Achille",male,22,0,0,345767,9,,S
|
83 |
+
82,1,3,"Sheerlinck, Mr. Jan Baptist",male,29,0,0,345779,9.5,,S
|
84 |
+
83,1,3,"McDermott, Miss. Brigdet Delia",female,,0,0,330932,7.7875,,Q
|
85 |
+
84,0,1,"Carrau, Mr. Francisco M",male,28,0,0,113059,47.1,,S
|
86 |
+
85,1,2,"Ilett, Miss. Bertha",female,17,0,0,SO/C 14885,10.5,,S
|
87 |
+
86,1,3,"Backstrom, Mrs. Karl Alfred (Maria Mathilda Gustafsson)",female,33,3,0,3101278,15.85,,S
|
88 |
+
87,0,3,"Ford, Mr. William Neal",male,16,1,3,W./C. 6608,34.375,,S
|
89 |
+
88,0,3,"Slocovski, Mr. Selman Francis",male,,0,0,SOTON/OQ 392086,8.05,,S
|
90 |
+
89,1,1,"Fortune, Miss. Mabel Helen",female,23,3,2,19950,263,C23 C25 C27,S
|
91 |
+
90,0,3,"Celotti, Mr. Francesco",male,24,0,0,343275,8.05,,S
|
92 |
+
91,0,3,"Christmann, Mr. Emil",male,29,0,0,343276,8.05,,S
|
93 |
+
92,0,3,"Andreasson, Mr. Paul Edvin",male,20,0,0,347466,7.8542,,S
|
94 |
+
93,0,1,"Chaffee, Mr. Herbert Fuller",male,46,1,0,W.E.P. 5734,61.175,E31,S
|
95 |
+
94,0,3,"Dean, Mr. Bertram Frank",male,26,1,2,C.A. 2315,20.575,,S
|
96 |
+
95,0,3,"Coxon, Mr. Daniel",male,59,0,0,364500,7.25,,S
|
97 |
+
96,0,3,"Shorney, Mr. Charles Joseph",male,,0,0,374910,8.05,,S
|
98 |
+
97,0,1,"Goldschmidt, Mr. George B",male,71,0,0,PC 17754,34.6542,A5,C
|
99 |
+
98,1,1,"Greenfield, Mr. William Bertram",male,23,0,1,PC 17759,63.3583,D10 D12,C
|
100 |
+
99,1,2,"Doling, Mrs. John T (Ada Julia Bone)",female,34,0,1,231919,23,,S
|
101 |
+
100,0,2,"Kantor, Mr. Sinai",male,34,1,0,244367,26,,S
|
102 |
+
101,0,3,"Petranec, Miss. Matilda",female,28,0,0,349245,7.8958,,S
|
103 |
+
102,0,3,"Petroff, Mr. Pastcho (""Pentcho"")",male,,0,0,349215,7.8958,,S
|
104 |
+
103,0,1,"White, Mr. Richard Frasar",male,21,0,1,35281,77.2875,D26,S
|
105 |
+
104,0,3,"Johansson, Mr. Gustaf Joel",male,33,0,0,7540,8.6542,,S
|
106 |
+
105,0,3,"Gustafsson, Mr. Anders Vilhelm",male,37,2,0,3101276,7.925,,S
|
107 |
+
106,0,3,"Mionoff, Mr. Stoytcho",male,28,0,0,349207,7.8958,,S
|
108 |
+
107,1,3,"Salkjelsvik, Miss. Anna Kristine",female,21,0,0,343120,7.65,,S
|
109 |
+
108,1,3,"Moss, Mr. Albert Johan",male,,0,0,312991,7.775,,S
|
110 |
+
109,0,3,"Rekic, Mr. Tido",male,38,0,0,349249,7.8958,,S
|
111 |
+
110,1,3,"Moran, Miss. Bertha",female,,1,0,371110,24.15,,Q
|
112 |
+
111,0,1,"Porter, Mr. Walter Chamberlain",male,47,0,0,110465,52,C110,S
|
113 |
+
112,0,3,"Zabour, Miss. Hileni",female,14.5,1,0,2665,14.4542,,C
|
114 |
+
113,0,3,"Barton, Mr. David John",male,22,0,0,324669,8.05,,S
|
115 |
+
114,0,3,"Jussila, Miss. Katriina",female,20,1,0,4136,9.825,,S
|
116 |
+
115,0,3,"Attalah, Miss. Malake",female,17,0,0,2627,14.4583,,C
|
117 |
+
116,0,3,"Pekoniemi, Mr. Edvard",male,21,0,0,STON/O 2. 3101294,7.925,,S
|
118 |
+
117,0,3,"Connors, Mr. Patrick",male,70.5,0,0,370369,7.75,,Q
|
119 |
+
118,0,2,"Turpin, Mr. William John Robert",male,29,1,0,11668,21,,S
|
120 |
+
119,0,1,"Baxter, Mr. Quigg Edmond",male,24,0,1,PC 17558,247.5208,B58 B60,C
|
121 |
+
120,0,3,"Andersson, Miss. Ellis Anna Maria",female,2,4,2,347082,31.275,,S
|
122 |
+
121,0,2,"Hickman, Mr. Stanley George",male,21,2,0,S.O.C. 14879,73.5,,S
|
123 |
+
122,0,3,"Moore, Mr. Leonard Charles",male,,0,0,A4. 54510,8.05,,S
|
124 |
+
123,0,2,"Nasser, Mr. Nicholas",male,32.5,1,0,237736,30.0708,,C
|
125 |
+
124,1,2,"Webber, Miss. Susan",female,32.5,0,0,27267,13,E101,S
|
126 |
+
125,0,1,"White, Mr. Percival Wayland",male,54,0,1,35281,77.2875,D26,S
|
127 |
+
126,1,3,"Nicola-Yarred, Master. Elias",male,12,1,0,2651,11.2417,,C
|
128 |
+
127,0,3,"McMahon, Mr. Martin",male,,0,0,370372,7.75,,Q
|
129 |
+
128,1,3,"Madsen, Mr. Fridtjof Arne",male,24,0,0,C 17369,7.1417,,S
|
130 |
+
129,1,3,"Peter, Miss. Anna",female,,1,1,2668,22.3583,F E69,C
|
131 |
+
130,0,3,"Ekstrom, Mr. Johan",male,45,0,0,347061,6.975,,S
|
132 |
+
131,0,3,"Drazenoic, Mr. Jozef",male,33,0,0,349241,7.8958,,C
|
133 |
+
132,0,3,"Coelho, Mr. Domingos Fernandeo",male,20,0,0,SOTON/O.Q. 3101307,7.05,,S
|
134 |
+
133,0,3,"Robins, Mrs. Alexander A (Grace Charity Laury)",female,47,1,0,A/5. 3337,14.5,,S
|
135 |
+
134,1,2,"Weisz, Mrs. Leopold (Mathilde Francoise Pede)",female,29,1,0,228414,26,,S
|
136 |
+
135,0,2,"Sobey, Mr. Samuel James Hayden",male,25,0,0,C.A. 29178,13,,S
|
137 |
+
136,0,2,"Richard, Mr. Emile",male,23,0,0,SC/PARIS 2133,15.0458,,C
|
138 |
+
137,1,1,"Newsom, Miss. Helen Monypeny",female,19,0,2,11752,26.2833,D47,S
|
139 |
+
138,0,1,"Futrelle, Mr. Jacques Heath",male,37,1,0,113803,53.1,C123,S
|
140 |
+
139,0,3,"Osen, Mr. Olaf Elon",male,16,0,0,7534,9.2167,,S
|
141 |
+
140,0,1,"Giglio, Mr. Victor",male,24,0,0,PC 17593,79.2,B86,C
|
142 |
+
141,0,3,"Boulos, Mrs. Joseph (Sultana)",female,,0,2,2678,15.2458,,C
|
143 |
+
142,1,3,"Nysten, Miss. Anna Sofia",female,22,0,0,347081,7.75,,S
|
144 |
+
143,1,3,"Hakkarainen, Mrs. Pekka Pietari (Elin Matilda Dolck)",female,24,1,0,STON/O2. 3101279,15.85,,S
|
145 |
+
144,0,3,"Burke, Mr. Jeremiah",male,19,0,0,365222,6.75,,Q
|
146 |
+
145,0,2,"Andrew, Mr. Edgardo Samuel",male,18,0,0,231945,11.5,,S
|
147 |
+
146,0,2,"Nicholls, Mr. Joseph Charles",male,19,1,1,C.A. 33112,36.75,,S
|
148 |
+
147,1,3,"Andersson, Mr. August Edvard (""Wennerstrom"")",male,27,0,0,350043,7.7958,,S
|
149 |
+
148,0,3,"Ford, Miss. Robina Maggie ""Ruby""",female,9,2,2,W./C. 6608,34.375,,S
|
150 |
+
149,0,2,"Navratil, Mr. Michel (""Louis M Hoffman"")",male,36.5,0,2,230080,26,F2,S
|
151 |
+
150,0,2,"Byles, Rev. Thomas Roussel Davids",male,42,0,0,244310,13,,S
|
152 |
+
151,0,2,"Bateman, Rev. Robert James",male,51,0,0,S.O.P. 1166,12.525,,S
|
153 |
+
152,1,1,"Pears, Mrs. Thomas (Edith Wearne)",female,22,1,0,113776,66.6,C2,S
|
154 |
+
153,0,3,"Meo, Mr. Alfonzo",male,55.5,0,0,A.5. 11206,8.05,,S
|
155 |
+
154,0,3,"van Billiard, Mr. Austin Blyler",male,40.5,0,2,A/5. 851,14.5,,S
|
156 |
+
155,0,3,"Olsen, Mr. Ole Martin",male,,0,0,Fa 265302,7.3125,,S
|
157 |
+
156,0,1,"Williams, Mr. Charles Duane",male,51,0,1,PC 17597,61.3792,,C
|
158 |
+
157,1,3,"Gilnagh, Miss. Katherine ""Katie""",female,16,0,0,35851,7.7333,,Q
|
159 |
+
158,0,3,"Corn, Mr. Harry",male,30,0,0,SOTON/OQ 392090,8.05,,S
|
160 |
+
159,0,3,"Smiljanic, Mr. Mile",male,,0,0,315037,8.6625,,S
|
161 |
+
160,0,3,"Sage, Master. Thomas Henry",male,,8,2,CA. 2343,69.55,,S
|
162 |
+
161,0,3,"Cribb, Mr. John Hatfield",male,44,0,1,371362,16.1,,S
|
163 |
+
162,1,2,"Watt, Mrs. James (Elizabeth ""Bessie"" Inglis Milne)",female,40,0,0,C.A. 33595,15.75,,S
|
164 |
+
163,0,3,"Bengtsson, Mr. John Viktor",male,26,0,0,347068,7.775,,S
|
165 |
+
164,0,3,"Calic, Mr. Jovo",male,17,0,0,315093,8.6625,,S
|
166 |
+
165,0,3,"Panula, Master. Eino Viljami",male,1,4,1,3101295,39.6875,,S
|
167 |
+
166,1,3,"Goldsmith, Master. Frank John William ""Frankie""",male,9,0,2,363291,20.525,,S
|
168 |
+
167,1,1,"Chibnall, Mrs. (Edith Martha Bowerman)",female,,0,1,113505,55,E33,S
|
169 |
+
168,0,3,"Skoog, Mrs. William (Anna Bernhardina Karlsson)",female,45,1,4,347088,27.9,,S
|
170 |
+
169,0,1,"Baumann, Mr. John D",male,,0,0,PC 17318,25.925,,S
|
171 |
+
170,0,3,"Ling, Mr. Lee",male,28,0,0,1601,56.4958,,S
|
172 |
+
171,0,1,"Van der hoef, Mr. Wyckoff",male,61,0,0,111240,33.5,B19,S
|
173 |
+
172,0,3,"Rice, Master. Arthur",male,4,4,1,382652,29.125,,Q
|
174 |
+
173,1,3,"Johnson, Miss. Eleanor Ileen",female,1,1,1,347742,11.1333,,S
|
175 |
+
174,0,3,"Sivola, Mr. Antti Wilhelm",male,21,0,0,STON/O 2. 3101280,7.925,,S
|
176 |
+
175,0,1,"Smith, Mr. James Clinch",male,56,0,0,17764,30.6958,A7,C
|
177 |
+
176,0,3,"Klasen, Mr. Klas Albin",male,18,1,1,350404,7.8542,,S
|
178 |
+
177,0,3,"Lefebre, Master. Henry Forbes",male,,3,1,4133,25.4667,,S
|
179 |
+
178,0,1,"Isham, Miss. Ann Elizabeth",female,50,0,0,PC 17595,28.7125,C49,C
|
180 |
+
179,0,2,"Hale, Mr. Reginald",male,30,0,0,250653,13,,S
|
181 |
+
180,0,3,"Leonard, Mr. Lionel",male,36,0,0,LINE,0,,S
|
182 |
+
181,0,3,"Sage, Miss. Constance Gladys",female,,8,2,CA. 2343,69.55,,S
|
183 |
+
182,0,2,"Pernot, Mr. Rene",male,,0,0,SC/PARIS 2131,15.05,,C
|
184 |
+
183,0,3,"Asplund, Master. Clarence Gustaf Hugo",male,9,4,2,347077,31.3875,,S
|
185 |
+
184,1,2,"Becker, Master. Richard F",male,1,2,1,230136,39,F4,S
|
186 |
+
185,1,3,"Kink-Heilmann, Miss. Luise Gretchen",female,4,0,2,315153,22.025,,S
|
187 |
+
186,0,1,"Rood, Mr. Hugh Roscoe",male,,0,0,113767,50,A32,S
|
188 |
+
187,1,3,"O'Brien, Mrs. Thomas (Johanna ""Hannah"" Godfrey)",female,,1,0,370365,15.5,,Q
|
189 |
+
188,1,1,"Romaine, Mr. Charles Hallace (""Mr C Rolmane"")",male,45,0,0,111428,26.55,,S
|
190 |
+
189,0,3,"Bourke, Mr. John",male,40,1,1,364849,15.5,,Q
|
191 |
+
190,0,3,"Turcin, Mr. Stjepan",male,36,0,0,349247,7.8958,,S
|
192 |
+
191,1,2,"Pinsky, Mrs. (Rosa)",female,32,0,0,234604,13,,S
|
193 |
+
192,0,2,"Carbines, Mr. William",male,19,0,0,28424,13,,S
|
194 |
+
193,1,3,"Andersen-Jensen, Miss. Carla Christine Nielsine",female,19,1,0,350046,7.8542,,S
|
195 |
+
194,1,2,"Navratil, Master. Michel M",male,3,1,1,230080,26,F2,S
|
196 |
+
195,1,1,"Brown, Mrs. James Joseph (Margaret Tobin)",female,44,0,0,PC 17610,27.7208,B4,C
|
197 |
+
196,1,1,"Lurette, Miss. Elise",female,58,0,0,PC 17569,146.5208,B80,C
|
198 |
+
197,0,3,"Mernagh, Mr. Robert",male,,0,0,368703,7.75,,Q
|
199 |
+
198,0,3,"Olsen, Mr. Karl Siegwart Andreas",male,42,0,1,4579,8.4042,,S
|
200 |
+
199,1,3,"Madigan, Miss. Margaret ""Maggie""",female,,0,0,370370,7.75,,Q
|
201 |
+
200,0,2,"Yrois, Miss. Henriette (""Mrs Harbeck"")",female,24,0,0,248747,13,,S
|
202 |
+
201,0,3,"Vande Walle, Mr. Nestor Cyriel",male,28,0,0,345770,9.5,,S
|
203 |
+
202,0,3,"Sage, Mr. Frederick",male,,8,2,CA. 2343,69.55,,S
|
204 |
+
203,0,3,"Johanson, Mr. Jakob Alfred",male,34,0,0,3101264,6.4958,,S
|
205 |
+
204,0,3,"Youseff, Mr. Gerious",male,45.5,0,0,2628,7.225,,C
|
206 |
+
205,1,3,"Cohen, Mr. Gurshon ""Gus""",male,18,0,0,A/5 3540,8.05,,S
|
207 |
+
206,0,3,"Strom, Miss. Telma Matilda",female,2,0,1,347054,10.4625,G6,S
|
208 |
+
207,0,3,"Backstrom, Mr. Karl Alfred",male,32,1,0,3101278,15.85,,S
|
209 |
+
208,1,3,"Albimona, Mr. Nassef Cassem",male,26,0,0,2699,18.7875,,C
|
210 |
+
209,1,3,"Carr, Miss. Helen ""Ellen""",female,16,0,0,367231,7.75,,Q
|
211 |
+
210,1,1,"Blank, Mr. Henry",male,40,0,0,112277,31,A31,C
|
212 |
+
211,0,3,"Ali, Mr. Ahmed",male,24,0,0,SOTON/O.Q. 3101311,7.05,,S
|
213 |
+
212,1,2,"Cameron, Miss. Clear Annie",female,35,0,0,F.C.C. 13528,21,,S
|
214 |
+
213,0,3,"Perkin, Mr. John Henry",male,22,0,0,A/5 21174,7.25,,S
|
215 |
+
214,0,2,"Givard, Mr. Hans Kristensen",male,30,0,0,250646,13,,S
|
216 |
+
215,0,3,"Kiernan, Mr. Philip",male,,1,0,367229,7.75,,Q
|
217 |
+
216,1,1,"Newell, Miss. Madeleine",female,31,1,0,35273,113.275,D36,C
|
218 |
+
217,1,3,"Honkanen, Miss. Eliina",female,27,0,0,STON/O2. 3101283,7.925,,S
|
219 |
+
218,0,2,"Jacobsohn, Mr. Sidney Samuel",male,42,1,0,243847,27,,S
|
220 |
+
219,1,1,"Bazzani, Miss. Albina",female,32,0,0,11813,76.2917,D15,C
|
221 |
+
220,0,2,"Harris, Mr. Walter",male,30,0,0,W/C 14208,10.5,,S
|
222 |
+
221,1,3,"Sunderland, Mr. Victor Francis",male,16,0,0,SOTON/OQ 392089,8.05,,S
|
223 |
+
222,0,2,"Bracken, Mr. James H",male,27,0,0,220367,13,,S
|
224 |
+
223,0,3,"Green, Mr. George Henry",male,51,0,0,21440,8.05,,S
|
225 |
+
224,0,3,"Nenkoff, Mr. Christo",male,,0,0,349234,7.8958,,S
|
226 |
+
225,1,1,"Hoyt, Mr. Frederick Maxfield",male,38,1,0,19943,90,C93,S
|
227 |
+
226,0,3,"Berglund, Mr. Karl Ivar Sven",male,22,0,0,PP 4348,9.35,,S
|
228 |
+
227,1,2,"Mellors, Mr. William John",male,19,0,0,SW/PP 751,10.5,,S
|
229 |
+
228,0,3,"Lovell, Mr. John Hall (""Henry"")",male,20.5,0,0,A/5 21173,7.25,,S
|
230 |
+
229,0,2,"Fahlstrom, Mr. Arne Jonas",male,18,0,0,236171,13,,S
|
231 |
+
230,0,3,"Lefebre, Miss. Mathilde",female,,3,1,4133,25.4667,,S
|
232 |
+
231,1,1,"Harris, Mrs. Henry Birkhardt (Irene Wallach)",female,35,1,0,36973,83.475,C83,S
|
233 |
+
232,0,3,"Larsson, Mr. Bengt Edvin",male,29,0,0,347067,7.775,,S
|
234 |
+
233,0,2,"Sjostedt, Mr. Ernst Adolf",male,59,0,0,237442,13.5,,S
|
235 |
+
234,1,3,"Asplund, Miss. Lillian Gertrud",female,5,4,2,347077,31.3875,,S
|
236 |
+
235,0,2,"Leyson, Mr. Robert William Norman",male,24,0,0,C.A. 29566,10.5,,S
|
237 |
+
236,0,3,"Harknett, Miss. Alice Phoebe",female,,0,0,W./C. 6609,7.55,,S
|
238 |
+
237,0,2,"Hold, Mr. Stephen",male,44,1,0,26707,26,,S
|
239 |
+
238,1,2,"Collyer, Miss. Marjorie ""Lottie""",female,8,0,2,C.A. 31921,26.25,,S
|
240 |
+
239,0,2,"Pengelly, Mr. Frederick William",male,19,0,0,28665,10.5,,S
|
241 |
+
240,0,2,"Hunt, Mr. George Henry",male,33,0,0,SCO/W 1585,12.275,,S
|
242 |
+
241,0,3,"Zabour, Miss. Thamine",female,,1,0,2665,14.4542,,C
|
243 |
+
242,1,3,"Murphy, Miss. Katherine ""Kate""",female,,1,0,367230,15.5,,Q
|
244 |
+
243,0,2,"Coleridge, Mr. Reginald Charles",male,29,0,0,W./C. 14263,10.5,,S
|
245 |
+
244,0,3,"Maenpaa, Mr. Matti Alexanteri",male,22,0,0,STON/O 2. 3101275,7.125,,S
|
246 |
+
245,0,3,"Attalah, Mr. Sleiman",male,30,0,0,2694,7.225,,C
|
247 |
+
246,0,1,"Minahan, Dr. William Edward",male,44,2,0,19928,90,C78,Q
|
248 |
+
247,0,3,"Lindahl, Miss. Agda Thorilda Viktoria",female,25,0,0,347071,7.775,,S
|
249 |
+
248,1,2,"Hamalainen, Mrs. William (Anna)",female,24,0,2,250649,14.5,,S
|
250 |
+
249,1,1,"Beckwith, Mr. Richard Leonard",male,37,1,1,11751,52.5542,D35,S
|
251 |
+
250,0,2,"Carter, Rev. Ernest Courtenay",male,54,1,0,244252,26,,S
|
252 |
+
251,0,3,"Reed, Mr. James George",male,,0,0,362316,7.25,,S
|
253 |
+
252,0,3,"Strom, Mrs. Wilhelm (Elna Matilda Persson)",female,29,1,1,347054,10.4625,G6,S
|
254 |
+
253,0,1,"Stead, Mr. William Thomas",male,62,0,0,113514,26.55,C87,S
|
255 |
+
254,0,3,"Lobb, Mr. William Arthur",male,30,1,0,A/5. 3336,16.1,,S
|
256 |
+
255,0,3,"Rosblom, Mrs. Viktor (Helena Wilhelmina)",female,41,0,2,370129,20.2125,,S
|
257 |
+
256,1,3,"Touma, Mrs. Darwis (Hanne Youssef Razi)",female,29,0,2,2650,15.2458,,C
|
258 |
+
257,1,1,"Thorne, Mrs. Gertrude Maybelle",female,,0,0,PC 17585,79.2,,C
|
259 |
+
258,1,1,"Cherry, Miss. Gladys",female,30,0,0,110152,86.5,B77,S
|
260 |
+
259,1,1,"Ward, Miss. Anna",female,35,0,0,PC 17755,512.3292,,C
|
261 |
+
260,1,2,"Parrish, Mrs. (Lutie Davis)",female,50,0,1,230433,26,,S
|
262 |
+
261,0,3,"Smith, Mr. Thomas",male,,0,0,384461,7.75,,Q
|
263 |
+
262,1,3,"Asplund, Master. Edvin Rojj Felix",male,3,4,2,347077,31.3875,,S
|
264 |
+
263,0,1,"Taussig, Mr. Emil",male,52,1,1,110413,79.65,E67,S
|
265 |
+
264,0,1,"Harrison, Mr. William",male,40,0,0,112059,0,B94,S
|
266 |
+
265,0,3,"Henry, Miss. Delia",female,,0,0,382649,7.75,,Q
|
267 |
+
266,0,2,"Reeves, Mr. David",male,36,0,0,C.A. 17248,10.5,,S
|
268 |
+
267,0,3,"Panula, Mr. Ernesti Arvid",male,16,4,1,3101295,39.6875,,S
|
269 |
+
268,1,3,"Persson, Mr. Ernst Ulrik",male,25,1,0,347083,7.775,,S
|
270 |
+
269,1,1,"Graham, Mrs. William Thompson (Edith Junkins)",female,58,0,1,PC 17582,153.4625,C125,S
|
271 |
+
270,1,1,"Bissette, Miss. Amelia",female,35,0,0,PC 17760,135.6333,C99,S
|
272 |
+
271,0,1,"Cairns, Mr. Alexander",male,,0,0,113798,31,,S
|
273 |
+
272,1,3,"Tornquist, Mr. William Henry",male,25,0,0,LINE,0,,S
|
274 |
+
273,1,2,"Mellinger, Mrs. (Elizabeth Anne Maidment)",female,41,0,1,250644,19.5,,S
|
275 |
+
274,0,1,"Natsch, Mr. Charles H",male,37,0,1,PC 17596,29.7,C118,C
|
276 |
+
275,1,3,"Healy, Miss. Hanora ""Nora""",female,,0,0,370375,7.75,,Q
|
277 |
+
276,1,1,"Andrews, Miss. Kornelia Theodosia",female,63,1,0,13502,77.9583,D7,S
|
278 |
+
277,0,3,"Lindblom, Miss. Augusta Charlotta",female,45,0,0,347073,7.75,,S
|
279 |
+
278,0,2,"Parkes, Mr. Francis ""Frank""",male,,0,0,239853,0,,S
|
280 |
+
279,0,3,"Rice, Master. Eric",male,7,4,1,382652,29.125,,Q
|
281 |
+
280,1,3,"Abbott, Mrs. Stanton (Rosa Hunt)",female,35,1,1,C.A. 2673,20.25,,S
|
282 |
+
281,0,3,"Duane, Mr. Frank",male,65,0,0,336439,7.75,,Q
|
283 |
+
282,0,3,"Olsson, Mr. Nils Johan Goransson",male,28,0,0,347464,7.8542,,S
|
284 |
+
283,0,3,"de Pelsmaeker, Mr. Alfons",male,16,0,0,345778,9.5,,S
|
285 |
+
284,1,3,"Dorking, Mr. Edward Arthur",male,19,0,0,A/5. 10482,8.05,,S
|
286 |
+
285,0,1,"Smith, Mr. Richard William",male,,0,0,113056,26,A19,S
|
287 |
+
286,0,3,"Stankovic, Mr. Ivan",male,33,0,0,349239,8.6625,,C
|
288 |
+
287,1,3,"de Mulder, Mr. Theodore",male,30,0,0,345774,9.5,,S
|
289 |
+
288,0,3,"Naidenoff, Mr. Penko",male,22,0,0,349206,7.8958,,S
|
290 |
+
289,1,2,"Hosono, Mr. Masabumi",male,42,0,0,237798,13,,S
|
291 |
+
290,1,3,"Connolly, Miss. Kate",female,22,0,0,370373,7.75,,Q
|
292 |
+
291,1,1,"Barber, Miss. Ellen ""Nellie""",female,26,0,0,19877,78.85,,S
|
293 |
+
292,1,1,"Bishop, Mrs. Dickinson H (Helen Walton)",female,19,1,0,11967,91.0792,B49,C
|
294 |
+
293,0,2,"Levy, Mr. Rene Jacques",male,36,0,0,SC/Paris 2163,12.875,D,C
|
295 |
+
294,0,3,"Haas, Miss. Aloisia",female,24,0,0,349236,8.85,,S
|
296 |
+
295,0,3,"Mineff, Mr. Ivan",male,24,0,0,349233,7.8958,,S
|
297 |
+
296,0,1,"Lewy, Mr. Ervin G",male,,0,0,PC 17612,27.7208,,C
|
298 |
+
297,0,3,"Hanna, Mr. Mansour",male,23.5,0,0,2693,7.2292,,C
|
299 |
+
298,0,1,"Allison, Miss. Helen Loraine",female,2,1,2,113781,151.55,C22 C26,S
|
300 |
+
299,1,1,"Saalfeld, Mr. Adolphe",male,,0,0,19988,30.5,C106,S
|
301 |
+
300,1,1,"Baxter, Mrs. James (Helene DeLaudeniere Chaput)",female,50,0,1,PC 17558,247.5208,B58 B60,C
|
302 |
+
301,1,3,"Kelly, Miss. Anna Katherine ""Annie Kate""",female,,0,0,9234,7.75,,Q
|
303 |
+
302,1,3,"McCoy, Mr. Bernard",male,,2,0,367226,23.25,,Q
|
304 |
+
303,0,3,"Johnson, Mr. William Cahoone Jr",male,19,0,0,LINE,0,,S
|
305 |
+
304,1,2,"Keane, Miss. Nora A",female,,0,0,226593,12.35,E101,Q
|
306 |
+
305,0,3,"Williams, Mr. Howard Hugh ""Harry""",male,,0,0,A/5 2466,8.05,,S
|
307 |
+
306,1,1,"Allison, Master. Hudson Trevor",male,0.92,1,2,113781,151.55,C22 C26,S
|
308 |
+
307,1,1,"Fleming, Miss. Margaret",female,,0,0,17421,110.8833,,C
|
309 |
+
308,1,1,"Penasco y Castellana, Mrs. Victor de Satode (Maria Josefa Perez de Soto y Vallejo)",female,17,1,0,PC 17758,108.9,C65,C
|
310 |
+
309,0,2,"Abelson, Mr. Samuel",male,30,1,0,P/PP 3381,24,,C
|
311 |
+
310,1,1,"Francatelli, Miss. Laura Mabel",female,30,0,0,PC 17485,56.9292,E36,C
|
312 |
+
311,1,1,"Hays, Miss. Margaret Bechstein",female,24,0,0,11767,83.1583,C54,C
|
313 |
+
312,1,1,"Ryerson, Miss. Emily Borie",female,18,2,2,PC 17608,262.375,B57 B59 B63 B66,C
|
314 |
+
313,0,2,"Lahtinen, Mrs. William (Anna Sylfven)",female,26,1,1,250651,26,,S
|
315 |
+
314,0,3,"Hendekovic, Mr. Ignjac",male,28,0,0,349243,7.8958,,S
|
316 |
+
315,0,2,"Hart, Mr. Benjamin",male,43,1,1,F.C.C. 13529,26.25,,S
|
317 |
+
316,1,3,"Nilsson, Miss. Helmina Josefina",female,26,0,0,347470,7.8542,,S
|
318 |
+
317,1,2,"Kantor, Mrs. Sinai (Miriam Sternin)",female,24,1,0,244367,26,,S
|
319 |
+
318,0,2,"Moraweck, Dr. Ernest",male,54,0,0,29011,14,,S
|
320 |
+
319,1,1,"Wick, Miss. Mary Natalie",female,31,0,2,36928,164.8667,C7,S
|
321 |
+
320,1,1,"Spedden, Mrs. Frederic Oakley (Margaretta Corning Stone)",female,40,1,1,16966,134.5,E34,C
|
322 |
+
321,0,3,"Dennis, Mr. Samuel",male,22,0,0,A/5 21172,7.25,,S
|
323 |
+
322,0,3,"Danoff, Mr. Yoto",male,27,0,0,349219,7.8958,,S
|
324 |
+
323,1,2,"Slayter, Miss. Hilda Mary",female,30,0,0,234818,12.35,,Q
|
325 |
+
324,1,2,"Caldwell, Mrs. Albert Francis (Sylvia Mae Harbaugh)",female,22,1,1,248738,29,,S
|
326 |
+
325,0,3,"Sage, Mr. George John Jr",male,,8,2,CA. 2343,69.55,,S
|
327 |
+
326,1,1,"Young, Miss. Marie Grice",female,36,0,0,PC 17760,135.6333,C32,C
|
328 |
+
327,0,3,"Nysveen, Mr. Johan Hansen",male,61,0,0,345364,6.2375,,S
|
329 |
+
328,1,2,"Ball, Mrs. (Ada E Hall)",female,36,0,0,28551,13,D,S
|
330 |
+
329,1,3,"Goldsmith, Mrs. Frank John (Emily Alice Brown)",female,31,1,1,363291,20.525,,S
|
331 |
+
330,1,1,"Hippach, Miss. Jean Gertrude",female,16,0,1,111361,57.9792,B18,C
|
332 |
+
331,1,3,"McCoy, Miss. Agnes",female,,2,0,367226,23.25,,Q
|
333 |
+
332,0,1,"Partner, Mr. Austen",male,45.5,0,0,113043,28.5,C124,S
|
334 |
+
333,0,1,"Graham, Mr. George Edward",male,38,0,1,PC 17582,153.4625,C91,S
|
335 |
+
334,0,3,"Vander Planke, Mr. Leo Edmondus",male,16,2,0,345764,18,,S
|
336 |
+
335,1,1,"Frauenthal, Mrs. Henry William (Clara Heinsheimer)",female,,1,0,PC 17611,133.65,,S
|
337 |
+
336,0,3,"Denkoff, Mr. Mitto",male,,0,0,349225,7.8958,,S
|
338 |
+
337,0,1,"Pears, Mr. Thomas Clinton",male,29,1,0,113776,66.6,C2,S
|
339 |
+
338,1,1,"Burns, Miss. Elizabeth Margaret",female,41,0,0,16966,134.5,E40,C
|
340 |
+
339,1,3,"Dahl, Mr. Karl Edwart",male,45,0,0,7598,8.05,,S
|
341 |
+
340,0,1,"Blackwell, Mr. Stephen Weart",male,45,0,0,113784,35.5,T,S
|
342 |
+
341,1,2,"Navratil, Master. Edmond Roger",male,2,1,1,230080,26,F2,S
|
343 |
+
342,1,1,"Fortune, Miss. Alice Elizabeth",female,24,3,2,19950,263,C23 C25 C27,S
|
344 |
+
343,0,2,"Collander, Mr. Erik Gustaf",male,28,0,0,248740,13,,S
|
345 |
+
344,0,2,"Sedgwick, Mr. Charles Frederick Waddington",male,25,0,0,244361,13,,S
|
346 |
+
345,0,2,"Fox, Mr. Stanley Hubert",male,36,0,0,229236,13,,S
|
347 |
+
346,1,2,"Brown, Miss. Amelia ""Mildred""",female,24,0,0,248733,13,F33,S
|
348 |
+
347,1,2,"Smith, Miss. Marion Elsie",female,40,0,0,31418,13,,S
|
349 |
+
348,1,3,"Davison, Mrs. Thomas Henry (Mary E Finck)",female,,1,0,386525,16.1,,S
|
350 |
+
349,1,3,"Coutts, Master. William Loch ""William""",male,3,1,1,C.A. 37671,15.9,,S
|
351 |
+
350,0,3,"Dimic, Mr. Jovan",male,42,0,0,315088,8.6625,,S
|
352 |
+
351,0,3,"Odahl, Mr. Nils Martin",male,23,0,0,7267,9.225,,S
|
353 |
+
352,0,1,"Williams-Lambert, Mr. Fletcher Fellows",male,,0,0,113510,35,C128,S
|
354 |
+
353,0,3,"Elias, Mr. Tannous",male,15,1,1,2695,7.2292,,C
|
355 |
+
354,0,3,"Arnold-Franchi, Mr. Josef",male,25,1,0,349237,17.8,,S
|
356 |
+
355,0,3,"Yousif, Mr. Wazli",male,,0,0,2647,7.225,,C
|
357 |
+
356,0,3,"Vanden Steen, Mr. Leo Peter",male,28,0,0,345783,9.5,,S
|
358 |
+
357,1,1,"Bowerman, Miss. Elsie Edith",female,22,0,1,113505,55,E33,S
|
359 |
+
358,0,2,"Funk, Miss. Annie Clemmer",female,38,0,0,237671,13,,S
|
360 |
+
359,1,3,"McGovern, Miss. Mary",female,,0,0,330931,7.8792,,Q
|
361 |
+
360,1,3,"Mockler, Miss. Helen Mary ""Ellie""",female,,0,0,330980,7.8792,,Q
|
362 |
+
361,0,3,"Skoog, Mr. Wilhelm",male,40,1,4,347088,27.9,,S
|
363 |
+
362,0,2,"del Carlo, Mr. Sebastiano",male,29,1,0,SC/PARIS 2167,27.7208,,C
|
364 |
+
363,0,3,"Barbara, Mrs. (Catherine David)",female,45,0,1,2691,14.4542,,C
|
365 |
+
364,0,3,"Asim, Mr. Adola",male,35,0,0,SOTON/O.Q. 3101310,7.05,,S
|
366 |
+
365,0,3,"O'Brien, Mr. Thomas",male,,1,0,370365,15.5,,Q
|
367 |
+
366,0,3,"Adahl, Mr. Mauritz Nils Martin",male,30,0,0,C 7076,7.25,,S
|
368 |
+
367,1,1,"Warren, Mrs. Frank Manley (Anna Sophia Atkinson)",female,60,1,0,110813,75.25,D37,C
|
369 |
+
368,1,3,"Moussa, Mrs. (Mantoura Boulos)",female,,0,0,2626,7.2292,,C
|
370 |
+
369,1,3,"Jermyn, Miss. Annie",female,,0,0,14313,7.75,,Q
|
371 |
+
370,1,1,"Aubart, Mme. Leontine Pauline",female,24,0,0,PC 17477,69.3,B35,C
|
372 |
+
371,1,1,"Harder, Mr. George Achilles",male,25,1,0,11765,55.4417,E50,C
|
373 |
+
372,0,3,"Wiklund, Mr. Jakob Alfred",male,18,1,0,3101267,6.4958,,S
|
374 |
+
373,0,3,"Beavan, Mr. William Thomas",male,19,0,0,323951,8.05,,S
|
375 |
+
374,0,1,"Ringhini, Mr. Sante",male,22,0,0,PC 17760,135.6333,,C
|
376 |
+
375,0,3,"Palsson, Miss. Stina Viola",female,3,3,1,349909,21.075,,S
|
377 |
+
376,1,1,"Meyer, Mrs. Edgar Joseph (Leila Saks)",female,,1,0,PC 17604,82.1708,,C
|
378 |
+
377,1,3,"Landergren, Miss. Aurora Adelia",female,22,0,0,C 7077,7.25,,S
|
379 |
+
378,0,1,"Widener, Mr. Harry Elkins",male,27,0,2,113503,211.5,C82,C
|
380 |
+
379,0,3,"Betros, Mr. Tannous",male,20,0,0,2648,4.0125,,C
|
381 |
+
380,0,3,"Gustafsson, Mr. Karl Gideon",male,19,0,0,347069,7.775,,S
|
382 |
+
381,1,1,"Bidois, Miss. Rosalie",female,42,0,0,PC 17757,227.525,,C
|
383 |
+
382,1,3,"Nakid, Miss. Maria (""Mary"")",female,1,0,2,2653,15.7417,,C
|
384 |
+
383,0,3,"Tikkanen, Mr. Juho",male,32,0,0,STON/O 2. 3101293,7.925,,S
|
385 |
+
384,1,1,"Holverson, Mrs. Alexander Oskar (Mary Aline Towner)",female,35,1,0,113789,52,,S
|
386 |
+
385,0,3,"Plotcharsky, Mr. Vasil",male,,0,0,349227,7.8958,,S
|
387 |
+
386,0,2,"Davies, Mr. Charles Henry",male,18,0,0,S.O.C. 14879,73.5,,S
|
388 |
+
387,0,3,"Goodwin, Master. Sidney Leonard",male,1,5,2,CA 2144,46.9,,S
|
389 |
+
388,1,2,"Buss, Miss. Kate",female,36,0,0,27849,13,,S
|
390 |
+
389,0,3,"Sadlier, Mr. Matthew",male,,0,0,367655,7.7292,,Q
|
391 |
+
390,1,2,"Lehmann, Miss. Bertha",female,17,0,0,SC 1748,12,,C
|
392 |
+
391,1,1,"Carter, Mr. William Ernest",male,36,1,2,113760,120,B96 B98,S
|
393 |
+
392,1,3,"Jansson, Mr. Carl Olof",male,21,0,0,350034,7.7958,,S
|
394 |
+
393,0,3,"Gustafsson, Mr. Johan Birger",male,28,2,0,3101277,7.925,,S
|
395 |
+
394,1,1,"Newell, Miss. Marjorie",female,23,1,0,35273,113.275,D36,C
|
396 |
+
395,1,3,"Sandstrom, Mrs. Hjalmar (Agnes Charlotta Bengtsson)",female,24,0,2,PP 9549,16.7,G6,S
|
397 |
+
396,0,3,"Johansson, Mr. Erik",male,22,0,0,350052,7.7958,,S
|
398 |
+
397,0,3,"Olsson, Miss. Elina",female,31,0,0,350407,7.8542,,S
|
399 |
+
398,0,2,"McKane, Mr. Peter David",male,46,0,0,28403,26,,S
|
400 |
+
399,0,2,"Pain, Dr. Alfred",male,23,0,0,244278,10.5,,S
|
401 |
+
400,1,2,"Trout, Mrs. William H (Jessie L)",female,28,0,0,240929,12.65,,S
|
402 |
+
401,1,3,"Niskanen, Mr. Juha",male,39,0,0,STON/O 2. 3101289,7.925,,S
|
403 |
+
402,0,3,"Adams, Mr. John",male,26,0,0,341826,8.05,,S
|
404 |
+
403,0,3,"Jussila, Miss. Mari Aina",female,21,1,0,4137,9.825,,S
|
405 |
+
404,0,3,"Hakkarainen, Mr. Pekka Pietari",male,28,1,0,STON/O2. 3101279,15.85,,S
|
406 |
+
405,0,3,"Oreskovic, Miss. Marija",female,20,0,0,315096,8.6625,,S
|
407 |
+
406,0,2,"Gale, Mr. Shadrach",male,34,1,0,28664,21,,S
|
408 |
+
407,0,3,"Widegren, Mr. Carl/Charles Peter",male,51,0,0,347064,7.75,,S
|
409 |
+
408,1,2,"Richards, Master. William Rowe",male,3,1,1,29106,18.75,,S
|
410 |
+
409,0,3,"Birkeland, Mr. Hans Martin Monsen",male,21,0,0,312992,7.775,,S
|
411 |
+
410,0,3,"Lefebre, Miss. Ida",female,,3,1,4133,25.4667,,S
|
412 |
+
411,0,3,"Sdycoff, Mr. Todor",male,,0,0,349222,7.8958,,S
|
413 |
+
412,0,3,"Hart, Mr. Henry",male,,0,0,394140,6.8583,,Q
|
414 |
+
413,1,1,"Minahan, Miss. Daisy E",female,33,1,0,19928,90,C78,Q
|
415 |
+
414,0,2,"Cunningham, Mr. Alfred Fleming",male,,0,0,239853,0,,S
|
416 |
+
415,1,3,"Sundman, Mr. Johan Julian",male,44,0,0,STON/O 2. 3101269,7.925,,S
|
417 |
+
416,0,3,"Meek, Mrs. Thomas (Annie Louise Rowley)",female,,0,0,343095,8.05,,S
|
418 |
+
417,1,2,"Drew, Mrs. James Vivian (Lulu Thorne Christian)",female,34,1,1,28220,32.5,,S
|
419 |
+
418,1,2,"Silven, Miss. Lyyli Karoliina",female,18,0,2,250652,13,,S
|
420 |
+
419,0,2,"Matthews, Mr. William John",male,30,0,0,28228,13,,S
|
421 |
+
420,0,3,"Van Impe, Miss. Catharina",female,10,0,2,345773,24.15,,S
|
422 |
+
421,0,3,"Gheorgheff, Mr. Stanio",male,,0,0,349254,7.8958,,C
|
423 |
+
422,0,3,"Charters, Mr. David",male,21,0,0,A/5. 13032,7.7333,,Q
|
424 |
+
423,0,3,"Zimmerman, Mr. Leo",male,29,0,0,315082,7.875,,S
|
425 |
+
424,0,3,"Danbom, Mrs. Ernst Gilbert (Anna Sigrid Maria Brogren)",female,28,1,1,347080,14.4,,S
|
426 |
+
425,0,3,"Rosblom, Mr. Viktor Richard",male,18,1,1,370129,20.2125,,S
|
427 |
+
426,0,3,"Wiseman, Mr. Phillippe",male,,0,0,A/4. 34244,7.25,,S
|
428 |
+
427,1,2,"Clarke, Mrs. Charles V (Ada Maria Winfield)",female,28,1,0,2003,26,,S
|
429 |
+
428,1,2,"Phillips, Miss. Kate Florence (""Mrs Kate Louise Phillips Marshall"")",female,19,0,0,250655,26,,S
|
430 |
+
429,0,3,"Flynn, Mr. James",male,,0,0,364851,7.75,,Q
|
431 |
+
430,1,3,"Pickard, Mr. Berk (Berk Trembisky)",male,32,0,0,SOTON/O.Q. 392078,8.05,E10,S
|
432 |
+
431,1,1,"Bjornstrom-Steffansson, Mr. Mauritz Hakan",male,28,0,0,110564,26.55,C52,S
|
433 |
+
432,1,3,"Thorneycroft, Mrs. Percival (Florence Kate White)",female,,1,0,376564,16.1,,S
|
434 |
+
433,1,2,"Louch, Mrs. Charles Alexander (Alice Adelaide Slow)",female,42,1,0,SC/AH 3085,26,,S
|
435 |
+
434,0,3,"Kallio, Mr. Nikolai Erland",male,17,0,0,STON/O 2. 3101274,7.125,,S
|
436 |
+
435,0,1,"Silvey, Mr. William Baird",male,50,1,0,13507,55.9,E44,S
|
437 |
+
436,1,1,"Carter, Miss. Lucile Polk",female,14,1,2,113760,120,B96 B98,S
|
438 |
+
437,0,3,"Ford, Miss. Doolina Margaret ""Daisy""",female,21,2,2,W./C. 6608,34.375,,S
|
439 |
+
438,1,2,"Richards, Mrs. Sidney (Emily Hocking)",female,24,2,3,29106,18.75,,S
|
440 |
+
439,0,1,"Fortune, Mr. Mark",male,64,1,4,19950,263,C23 C25 C27,S
|
441 |
+
440,0,2,"Kvillner, Mr. Johan Henrik Johannesson",male,31,0,0,C.A. 18723,10.5,,S
|
442 |
+
441,1,2,"Hart, Mrs. Benjamin (Esther Ada Bloomfield)",female,45,1,1,F.C.C. 13529,26.25,,S
|
443 |
+
442,0,3,"Hampe, Mr. Leon",male,20,0,0,345769,9.5,,S
|
444 |
+
443,0,3,"Petterson, Mr. Johan Emil",male,25,1,0,347076,7.775,,S
|
445 |
+
444,1,2,"Reynaldo, Ms. Encarnacion",female,28,0,0,230434,13,,S
|
446 |
+
445,1,3,"Johannesen-Bratthammer, Mr. Bernt",male,,0,0,65306,8.1125,,S
|
447 |
+
446,1,1,"Dodge, Master. Washington",male,4,0,2,33638,81.8583,A34,S
|
448 |
+
447,1,2,"Mellinger, Miss. Madeleine Violet",female,13,0,1,250644,19.5,,S
|
449 |
+
448,1,1,"Seward, Mr. Frederic Kimber",male,34,0,0,113794,26.55,,S
|
450 |
+
449,1,3,"Baclini, Miss. Marie Catherine",female,5,2,1,2666,19.2583,,C
|
451 |
+
450,1,1,"Peuchen, Major. Arthur Godfrey",male,52,0,0,113786,30.5,C104,S
|
452 |
+
451,0,2,"West, Mr. Edwy Arthur",male,36,1,2,C.A. 34651,27.75,,S
|
453 |
+
452,0,3,"Hagland, Mr. Ingvald Olai Olsen",male,,1,0,65303,19.9667,,S
|
454 |
+
453,0,1,"Foreman, Mr. Benjamin Laventall",male,30,0,0,113051,27.75,C111,C
|
455 |
+
454,1,1,"Goldenberg, Mr. Samuel L",male,49,1,0,17453,89.1042,C92,C
|
456 |
+
455,0,3,"Peduzzi, Mr. Joseph",male,,0,0,A/5 2817,8.05,,S
|
457 |
+
456,1,3,"Jalsevac, Mr. Ivan",male,29,0,0,349240,7.8958,,C
|
458 |
+
457,0,1,"Millet, Mr. Francis Davis",male,65,0,0,13509,26.55,E38,S
|
459 |
+
458,1,1,"Kenyon, Mrs. Frederick R (Marion)",female,,1,0,17464,51.8625,D21,S
|
460 |
+
459,1,2,"Toomey, Miss. Ellen",female,50,0,0,F.C.C. 13531,10.5,,S
|
461 |
+
460,0,3,"O'Connor, Mr. Maurice",male,,0,0,371060,7.75,,Q
|
462 |
+
461,1,1,"Anderson, Mr. Harry",male,48,0,0,19952,26.55,E12,S
|
463 |
+
462,0,3,"Morley, Mr. William",male,34,0,0,364506,8.05,,S
|
464 |
+
463,0,1,"Gee, Mr. Arthur H",male,47,0,0,111320,38.5,E63,S
|
465 |
+
464,0,2,"Milling, Mr. Jacob Christian",male,48,0,0,234360,13,,S
|
466 |
+
465,0,3,"Maisner, Mr. Simon",male,,0,0,A/S 2816,8.05,,S
|
467 |
+
466,0,3,"Goncalves, Mr. Manuel Estanslas",male,38,0,0,SOTON/O.Q. 3101306,7.05,,S
|
468 |
+
467,0,2,"Campbell, Mr. William",male,,0,0,239853,0,,S
|
469 |
+
468,0,1,"Smart, Mr. John Montgomery",male,56,0,0,113792,26.55,,S
|
470 |
+
469,0,3,"Scanlan, Mr. James",male,,0,0,36209,7.725,,Q
|
471 |
+
470,1,3,"Baclini, Miss. Helene Barbara",female,0.75,2,1,2666,19.2583,,C
|
472 |
+
471,0,3,"Keefe, Mr. Arthur",male,,0,0,323592,7.25,,S
|
473 |
+
472,0,3,"Cacic, Mr. Luka",male,38,0,0,315089,8.6625,,S
|
474 |
+
473,1,2,"West, Mrs. Edwy Arthur (Ada Mary Worth)",female,33,1,2,C.A. 34651,27.75,,S
|
475 |
+
474,1,2,"Jerwan, Mrs. Amin S (Marie Marthe Thuillard)",female,23,0,0,SC/AH Basle 541,13.7917,D,C
|
476 |
+
475,0,3,"Strandberg, Miss. Ida Sofia",female,22,0,0,7553,9.8375,,S
|
477 |
+
476,0,1,"Clifford, Mr. George Quincy",male,,0,0,110465,52,A14,S
|
478 |
+
477,0,2,"Renouf, Mr. Peter Henry",male,34,1,0,31027,21,,S
|
479 |
+
478,0,3,"Braund, Mr. Lewis Richard",male,29,1,0,3460,7.0458,,S
|
480 |
+
479,0,3,"Karlsson, Mr. Nils August",male,22,0,0,350060,7.5208,,S
|
481 |
+
480,1,3,"Hirvonen, Miss. Hildur E",female,2,0,1,3101298,12.2875,,S
|
482 |
+
481,0,3,"Goodwin, Master. Harold Victor",male,9,5,2,CA 2144,46.9,,S
|
483 |
+
482,0,2,"Frost, Mr. Anthony Wood ""Archie""",male,,0,0,239854,0,,S
|
484 |
+
483,0,3,"Rouse, Mr. Richard Henry",male,50,0,0,A/5 3594,8.05,,S
|
485 |
+
484,1,3,"Turkula, Mrs. (Hedwig)",female,63,0,0,4134,9.5875,,S
|
486 |
+
485,1,1,"Bishop, Mr. Dickinson H",male,25,1,0,11967,91.0792,B49,C
|
487 |
+
486,0,3,"Lefebre, Miss. Jeannie",female,,3,1,4133,25.4667,,S
|
488 |
+
487,1,1,"Hoyt, Mrs. Frederick Maxfield (Jane Anne Forby)",female,35,1,0,19943,90,C93,S
|
489 |
+
488,0,1,"Kent, Mr. Edward Austin",male,58,0,0,11771,29.7,B37,C
|
490 |
+
489,0,3,"Somerton, Mr. Francis William",male,30,0,0,A.5. 18509,8.05,,S
|
491 |
+
490,1,3,"Coutts, Master. Eden Leslie ""Neville""",male,9,1,1,C.A. 37671,15.9,,S
|
492 |
+
491,0,3,"Hagland, Mr. Konrad Mathias Reiersen",male,,1,0,65304,19.9667,,S
|
493 |
+
492,0,3,"Windelov, Mr. Einar",male,21,0,0,SOTON/OQ 3101317,7.25,,S
|
494 |
+
493,0,1,"Molson, Mr. Harry Markland",male,55,0,0,113787,30.5,C30,S
|
495 |
+
494,0,1,"Artagaveytia, Mr. Ramon",male,71,0,0,PC 17609,49.5042,,C
|
496 |
+
495,0,3,"Stanley, Mr. Edward Roland",male,21,0,0,A/4 45380,8.05,,S
|
497 |
+
496,0,3,"Yousseff, Mr. Gerious",male,,0,0,2627,14.4583,,C
|
498 |
+
497,1,1,"Eustis, Miss. Elizabeth Mussey",female,54,1,0,36947,78.2667,D20,C
|
499 |
+
498,0,3,"Shellard, Mr. Frederick William",male,,0,0,C.A. 6212,15.1,,S
|
500 |
+
499,0,1,"Allison, Mrs. Hudson J C (Bessie Waldo Daniels)",female,25,1,2,113781,151.55,C22 C26,S
|
501 |
+
500,0,3,"Svensson, Mr. Olof",male,24,0,0,350035,7.7958,,S
|
502 |
+
501,0,3,"Calic, Mr. Petar",male,17,0,0,315086,8.6625,,S
|
503 |
+
502,0,3,"Canavan, Miss. Mary",female,21,0,0,364846,7.75,,Q
|
504 |
+
503,0,3,"O'Sullivan, Miss. Bridget Mary",female,,0,0,330909,7.6292,,Q
|
505 |
+
504,0,3,"Laitinen, Miss. Kristina Sofia",female,37,0,0,4135,9.5875,,S
|
506 |
+
505,1,1,"Maioni, Miss. Roberta",female,16,0,0,110152,86.5,B79,S
|
507 |
+
506,0,1,"Penasco y Castellana, Mr. Victor de Satode",male,18,1,0,PC 17758,108.9,C65,C
|
508 |
+
507,1,2,"Quick, Mrs. Frederick Charles (Jane Richards)",female,33,0,2,26360,26,,S
|
509 |
+
508,1,1,"Bradley, Mr. George (""George Arthur Brayton"")",male,,0,0,111427,26.55,,S
|
510 |
+
509,0,3,"Olsen, Mr. Henry Margido",male,28,0,0,C 4001,22.525,,S
|
511 |
+
510,1,3,"Lang, Mr. Fang",male,26,0,0,1601,56.4958,,S
|
512 |
+
511,1,3,"Daly, Mr. Eugene Patrick",male,29,0,0,382651,7.75,,Q
|
513 |
+
512,0,3,"Webber, Mr. James",male,,0,0,SOTON/OQ 3101316,8.05,,S
|
514 |
+
513,1,1,"McGough, Mr. James Robert",male,36,0,0,PC 17473,26.2875,E25,S
|
515 |
+
514,1,1,"Rothschild, Mrs. Martin (Elizabeth L. Barrett)",female,54,1,0,PC 17603,59.4,,C
|
516 |
+
515,0,3,"Coleff, Mr. Satio",male,24,0,0,349209,7.4958,,S
|
517 |
+
516,0,1,"Walker, Mr. William Anderson",male,47,0,0,36967,34.0208,D46,S
|
518 |
+
517,1,2,"Lemore, Mrs. (Amelia Milley)",female,34,0,0,C.A. 34260,10.5,F33,S
|
519 |
+
518,0,3,"Ryan, Mr. Patrick",male,,0,0,371110,24.15,,Q
|
520 |
+
519,1,2,"Angle, Mrs. William A (Florence ""Mary"" Agnes Hughes)",female,36,1,0,226875,26,,S
|
521 |
+
520,0,3,"Pavlovic, Mr. Stefo",male,32,0,0,349242,7.8958,,S
|
522 |
+
521,1,1,"Perreault, Miss. Anne",female,30,0,0,12749,93.5,B73,S
|
523 |
+
522,0,3,"Vovk, Mr. Janko",male,22,0,0,349252,7.8958,,S
|
524 |
+
523,0,3,"Lahoud, Mr. Sarkis",male,,0,0,2624,7.225,,C
|
525 |
+
524,1,1,"Hippach, Mrs. Louis Albert (Ida Sophia Fischer)",female,44,0,1,111361,57.9792,B18,C
|
526 |
+
525,0,3,"Kassem, Mr. Fared",male,,0,0,2700,7.2292,,C
|
527 |
+
526,0,3,"Farrell, Mr. James",male,40.5,0,0,367232,7.75,,Q
|
528 |
+
527,1,2,"Ridsdale, Miss. Lucy",female,50,0,0,W./C. 14258,10.5,,S
|
529 |
+
528,0,1,"Farthing, Mr. John",male,,0,0,PC 17483,221.7792,C95,S
|
530 |
+
529,0,3,"Salonen, Mr. Johan Werner",male,39,0,0,3101296,7.925,,S
|
531 |
+
530,0,2,"Hocking, Mr. Richard George",male,23,2,1,29104,11.5,,S
|
532 |
+
531,1,2,"Quick, Miss. Phyllis May",female,2,1,1,26360,26,,S
|
533 |
+
532,0,3,"Toufik, Mr. Nakli",male,,0,0,2641,7.2292,,C
|
534 |
+
533,0,3,"Elias, Mr. Joseph Jr",male,17,1,1,2690,7.2292,,C
|
535 |
+
534,1,3,"Peter, Mrs. Catherine (Catherine Rizk)",female,,0,2,2668,22.3583,,C
|
536 |
+
535,0,3,"Cacic, Miss. Marija",female,30,0,0,315084,8.6625,,S
|
537 |
+
536,1,2,"Hart, Miss. Eva Miriam",female,7,0,2,F.C.C. 13529,26.25,,S
|
538 |
+
537,0,1,"Butt, Major. Archibald Willingham",male,45,0,0,113050,26.55,B38,S
|
539 |
+
538,1,1,"LeRoy, Miss. Bertha",female,30,0,0,PC 17761,106.425,,C
|
540 |
+
539,0,3,"Risien, Mr. Samuel Beard",male,,0,0,364498,14.5,,S
|
541 |
+
540,1,1,"Frolicher, Miss. Hedwig Margaritha",female,22,0,2,13568,49.5,B39,C
|
542 |
+
541,1,1,"Crosby, Miss. Harriet R",female,36,0,2,WE/P 5735,71,B22,S
|
543 |
+
542,0,3,"Andersson, Miss. Ingeborg Constanzia",female,9,4,2,347082,31.275,,S
|
544 |
+
543,0,3,"Andersson, Miss. Sigrid Elisabeth",female,11,4,2,347082,31.275,,S
|
545 |
+
544,1,2,"Beane, Mr. Edward",male,32,1,0,2908,26,,S
|
546 |
+
545,0,1,"Douglas, Mr. Walter Donald",male,50,1,0,PC 17761,106.425,C86,C
|
547 |
+
546,0,1,"Nicholson, Mr. Arthur Ernest",male,64,0,0,693,26,,S
|
548 |
+
547,1,2,"Beane, Mrs. Edward (Ethel Clarke)",female,19,1,0,2908,26,,S
|
549 |
+
548,1,2,"Padro y Manent, Mr. Julian",male,,0,0,SC/PARIS 2146,13.8625,,C
|
550 |
+
549,0,3,"Goldsmith, Mr. Frank John",male,33,1,1,363291,20.525,,S
|
551 |
+
550,1,2,"Davies, Master. John Morgan Jr",male,8,1,1,C.A. 33112,36.75,,S
|
552 |
+
551,1,1,"Thayer, Mr. John Borland Jr",male,17,0,2,17421,110.8833,C70,C
|
553 |
+
552,0,2,"Sharp, Mr. Percival James R",male,27,0,0,244358,26,,S
|
554 |
+
553,0,3,"O'Brien, Mr. Timothy",male,,0,0,330979,7.8292,,Q
|
555 |
+
554,1,3,"Leeni, Mr. Fahim (""Philip Zenni"")",male,22,0,0,2620,7.225,,C
|
556 |
+
555,1,3,"Ohman, Miss. Velin",female,22,0,0,347085,7.775,,S
|
557 |
+
556,0,1,"Wright, Mr. George",male,62,0,0,113807,26.55,,S
|
558 |
+
557,1,1,"Duff Gordon, Lady. (Lucille Christiana Sutherland) (""Mrs Morgan"")",female,48,1,0,11755,39.6,A16,C
|
559 |
+
558,0,1,"Robbins, Mr. Victor",male,,0,0,PC 17757,227.525,,C
|
560 |
+
559,1,1,"Taussig, Mrs. Emil (Tillie Mandelbaum)",female,39,1,1,110413,79.65,E67,S
|
561 |
+
560,1,3,"de Messemaeker, Mrs. Guillaume Joseph (Emma)",female,36,1,0,345572,17.4,,S
|
562 |
+
561,0,3,"Morrow, Mr. Thomas Rowan",male,,0,0,372622,7.75,,Q
|
563 |
+
562,0,3,"Sivic, Mr. Husein",male,40,0,0,349251,7.8958,,S
|
564 |
+
563,0,2,"Norman, Mr. Robert Douglas",male,28,0,0,218629,13.5,,S
|
565 |
+
564,0,3,"Simmons, Mr. John",male,,0,0,SOTON/OQ 392082,8.05,,S
|
566 |
+
565,0,3,"Meanwell, Miss. (Marion Ogden)",female,,0,0,SOTON/O.Q. 392087,8.05,,S
|
567 |
+
566,0,3,"Davies, Mr. Alfred J",male,24,2,0,A/4 48871,24.15,,S
|
568 |
+
567,0,3,"Stoytcheff, Mr. Ilia",male,19,0,0,349205,7.8958,,S
|
569 |
+
568,0,3,"Palsson, Mrs. Nils (Alma Cornelia Berglund)",female,29,0,4,349909,21.075,,S
|
570 |
+
569,0,3,"Doharr, Mr. Tannous",male,,0,0,2686,7.2292,,C
|
571 |
+
570,1,3,"Jonsson, Mr. Carl",male,32,0,0,350417,7.8542,,S
|
572 |
+
571,1,2,"Harris, Mr. George",male,62,0,0,S.W./PP 752,10.5,,S
|
573 |
+
572,1,1,"Appleton, Mrs. Edward Dale (Charlotte Lamson)",female,53,2,0,11769,51.4792,C101,S
|
574 |
+
573,1,1,"Flynn, Mr. John Irwin (""Irving"")",male,36,0,0,PC 17474,26.3875,E25,S
|
575 |
+
574,1,3,"Kelly, Miss. Mary",female,,0,0,14312,7.75,,Q
|
576 |
+
575,0,3,"Rush, Mr. Alfred George John",male,16,0,0,A/4. 20589,8.05,,S
|
577 |
+
576,0,3,"Patchett, Mr. George",male,19,0,0,358585,14.5,,S
|
578 |
+
577,1,2,"Garside, Miss. Ethel",female,34,0,0,243880,13,,S
|
579 |
+
578,1,1,"Silvey, Mrs. William Baird (Alice Munger)",female,39,1,0,13507,55.9,E44,S
|
580 |
+
579,0,3,"Caram, Mrs. Joseph (Maria Elias)",female,,1,0,2689,14.4583,,C
|
581 |
+
580,1,3,"Jussila, Mr. Eiriik",male,32,0,0,STON/O 2. 3101286,7.925,,S
|
582 |
+
581,1,2,"Christy, Miss. Julie Rachel",female,25,1,1,237789,30,,S
|
583 |
+
582,1,1,"Thayer, Mrs. John Borland (Marian Longstreth Morris)",female,39,1,1,17421,110.8833,C68,C
|
584 |
+
583,0,2,"Downton, Mr. William James",male,54,0,0,28403,26,,S
|
585 |
+
584,0,1,"Ross, Mr. John Hugo",male,36,0,0,13049,40.125,A10,C
|
586 |
+
585,0,3,"Paulner, Mr. Uscher",male,,0,0,3411,8.7125,,C
|
587 |
+
586,1,1,"Taussig, Miss. Ruth",female,18,0,2,110413,79.65,E68,S
|
588 |
+
587,0,2,"Jarvis, Mr. John Denzil",male,47,0,0,237565,15,,S
|
589 |
+
588,1,1,"Frolicher-Stehli, Mr. Maxmillian",male,60,1,1,13567,79.2,B41,C
|
590 |
+
589,0,3,"Gilinski, Mr. Eliezer",male,22,0,0,14973,8.05,,S
|
591 |
+
590,0,3,"Murdlin, Mr. Joseph",male,,0,0,A./5. 3235,8.05,,S
|
592 |
+
591,0,3,"Rintamaki, Mr. Matti",male,35,0,0,STON/O 2. 3101273,7.125,,S
|
593 |
+
592,1,1,"Stephenson, Mrs. Walter Bertram (Martha Eustis)",female,52,1,0,36947,78.2667,D20,C
|
594 |
+
593,0,3,"Elsbury, Mr. William James",male,47,0,0,A/5 3902,7.25,,S
|
595 |
+
594,0,3,"Bourke, Miss. Mary",female,,0,2,364848,7.75,,Q
|
596 |
+
595,0,2,"Chapman, Mr. John Henry",male,37,1,0,SC/AH 29037,26,,S
|
597 |
+
596,0,3,"Van Impe, Mr. Jean Baptiste",male,36,1,1,345773,24.15,,S
|
598 |
+
597,1,2,"Leitch, Miss. Jessie Wills",female,,0,0,248727,33,,S
|
599 |
+
598,0,3,"Johnson, Mr. Alfred",male,49,0,0,LINE,0,,S
|
600 |
+
599,0,3,"Boulos, Mr. Hanna",male,,0,0,2664,7.225,,C
|
601 |
+
600,1,1,"Duff Gordon, Sir. Cosmo Edmund (""Mr Morgan"")",male,49,1,0,PC 17485,56.9292,A20,C
|
602 |
+
601,1,2,"Jacobsohn, Mrs. Sidney Samuel (Amy Frances Christy)",female,24,2,1,243847,27,,S
|
603 |
+
602,0,3,"Slabenoff, Mr. Petco",male,,0,0,349214,7.8958,,S
|
604 |
+
603,0,1,"Harrington, Mr. Charles H",male,,0,0,113796,42.4,,S
|
605 |
+
604,0,3,"Torber, Mr. Ernst William",male,44,0,0,364511,8.05,,S
|
606 |
+
605,1,1,"Homer, Mr. Harry (""Mr E Haven"")",male,35,0,0,111426,26.55,,C
|
607 |
+
606,0,3,"Lindell, Mr. Edvard Bengtsson",male,36,1,0,349910,15.55,,S
|
608 |
+
607,0,3,"Karaic, Mr. Milan",male,30,0,0,349246,7.8958,,S
|
609 |
+
608,1,1,"Daniel, Mr. Robert Williams",male,27,0,0,113804,30.5,,S
|
610 |
+
609,1,2,"Laroche, Mrs. Joseph (Juliette Marie Louise Lafargue)",female,22,1,2,SC/Paris 2123,41.5792,,C
|
611 |
+
610,1,1,"Shutes, Miss. Elizabeth W",female,40,0,0,PC 17582,153.4625,C125,S
|
612 |
+
611,0,3,"Andersson, Mrs. Anders Johan (Alfrida Konstantia Brogren)",female,39,1,5,347082,31.275,,S
|
613 |
+
612,0,3,"Jardin, Mr. Jose Neto",male,,0,0,SOTON/O.Q. 3101305,7.05,,S
|
614 |
+
613,1,3,"Murphy, Miss. Margaret Jane",female,,1,0,367230,15.5,,Q
|
615 |
+
614,0,3,"Horgan, Mr. John",male,,0,0,370377,7.75,,Q
|
616 |
+
615,0,3,"Brocklebank, Mr. William Alfred",male,35,0,0,364512,8.05,,S
|
617 |
+
616,1,2,"Herman, Miss. Alice",female,24,1,2,220845,65,,S
|
618 |
+
617,0,3,"Danbom, Mr. Ernst Gilbert",male,34,1,1,347080,14.4,,S
|
619 |
+
618,0,3,"Lobb, Mrs. William Arthur (Cordelia K Stanlick)",female,26,1,0,A/5. 3336,16.1,,S
|
620 |
+
619,1,2,"Becker, Miss. Marion Louise",female,4,2,1,230136,39,F4,S
|
621 |
+
620,0,2,"Gavey, Mr. Lawrence",male,26,0,0,31028,10.5,,S
|
622 |
+
621,0,3,"Yasbeck, Mr. Antoni",male,27,1,0,2659,14.4542,,C
|
623 |
+
622,1,1,"Kimball, Mr. Edwin Nelson Jr",male,42,1,0,11753,52.5542,D19,S
|
624 |
+
623,1,3,"Nakid, Mr. Sahid",male,20,1,1,2653,15.7417,,C
|
625 |
+
624,0,3,"Hansen, Mr. Henry Damsgaard",male,21,0,0,350029,7.8542,,S
|
626 |
+
625,0,3,"Bowen, Mr. David John ""Dai""",male,21,0,0,54636,16.1,,S
|
627 |
+
626,0,1,"Sutton, Mr. Frederick",male,61,0,0,36963,32.3208,D50,S
|
628 |
+
627,0,2,"Kirkland, Rev. Charles Leonard",male,57,0,0,219533,12.35,,Q
|
629 |
+
628,1,1,"Longley, Miss. Gretchen Fiske",female,21,0,0,13502,77.9583,D9,S
|
630 |
+
629,0,3,"Bostandyeff, Mr. Guentcho",male,26,0,0,349224,7.8958,,S
|
631 |
+
630,0,3,"O'Connell, Mr. Patrick D",male,,0,0,334912,7.7333,,Q
|
632 |
+
631,1,1,"Barkworth, Mr. Algernon Henry Wilson",male,80,0,0,27042,30,A23,S
|
633 |
+
632,0,3,"Lundahl, Mr. Johan Svensson",male,51,0,0,347743,7.0542,,S
|
634 |
+
633,1,1,"Stahelin-Maeglin, Dr. Max",male,32,0,0,13214,30.5,B50,C
|
635 |
+
634,0,1,"Parr, Mr. William Henry Marsh",male,,0,0,112052,0,,S
|
636 |
+
635,0,3,"Skoog, Miss. Mabel",female,9,3,2,347088,27.9,,S
|
637 |
+
636,1,2,"Davis, Miss. Mary",female,28,0,0,237668,13,,S
|
638 |
+
637,0,3,"Leinonen, Mr. Antti Gustaf",male,32,0,0,STON/O 2. 3101292,7.925,,S
|
639 |
+
638,0,2,"Collyer, Mr. Harvey",male,31,1,1,C.A. 31921,26.25,,S
|
640 |
+
639,0,3,"Panula, Mrs. Juha (Maria Emilia Ojala)",female,41,0,5,3101295,39.6875,,S
|
641 |
+
640,0,3,"Thorneycroft, Mr. Percival",male,,1,0,376564,16.1,,S
|
642 |
+
641,0,3,"Jensen, Mr. Hans Peder",male,20,0,0,350050,7.8542,,S
|
643 |
+
642,1,1,"Sagesser, Mlle. Emma",female,24,0,0,PC 17477,69.3,B35,C
|
644 |
+
643,0,3,"Skoog, Miss. Margit Elizabeth",female,2,3,2,347088,27.9,,S
|
645 |
+
644,1,3,"Foo, Mr. Choong",male,,0,0,1601,56.4958,,S
|
646 |
+
645,1,3,"Baclini, Miss. Eugenie",female,0.75,2,1,2666,19.2583,,C
|
647 |
+
646,1,1,"Harper, Mr. Henry Sleeper",male,48,1,0,PC 17572,76.7292,D33,C
|
648 |
+
647,0,3,"Cor, Mr. Liudevit",male,19,0,0,349231,7.8958,,S
|
649 |
+
648,1,1,"Simonius-Blumer, Col. Oberst Alfons",male,56,0,0,13213,35.5,A26,C
|
650 |
+
649,0,3,"Willey, Mr. Edward",male,,0,0,S.O./P.P. 751,7.55,,S
|
651 |
+
650,1,3,"Stanley, Miss. Amy Zillah Elsie",female,23,0,0,CA. 2314,7.55,,S
|
652 |
+
651,0,3,"Mitkoff, Mr. Mito",male,,0,0,349221,7.8958,,S
|
653 |
+
652,1,2,"Doling, Miss. Elsie",female,18,0,1,231919,23,,S
|
654 |
+
653,0,3,"Kalvik, Mr. Johannes Halvorsen",male,21,0,0,8475,8.4333,,S
|
655 |
+
654,1,3,"O'Leary, Miss. Hanora ""Norah""",female,,0,0,330919,7.8292,,Q
|
656 |
+
655,0,3,"Hegarty, Miss. Hanora ""Nora""",female,18,0,0,365226,6.75,,Q
|
657 |
+
656,0,2,"Hickman, Mr. Leonard Mark",male,24,2,0,S.O.C. 14879,73.5,,S
|
658 |
+
657,0,3,"Radeff, Mr. Alexander",male,,0,0,349223,7.8958,,S
|
659 |
+
658,0,3,"Bourke, Mrs. John (Catherine)",female,32,1,1,364849,15.5,,Q
|
660 |
+
659,0,2,"Eitemiller, Mr. George Floyd",male,23,0,0,29751,13,,S
|
661 |
+
660,0,1,"Newell, Mr. Arthur Webster",male,58,0,2,35273,113.275,D48,C
|
662 |
+
661,1,1,"Frauenthal, Dr. Henry William",male,50,2,0,PC 17611,133.65,,S
|
663 |
+
662,0,3,"Badt, Mr. Mohamed",male,40,0,0,2623,7.225,,C
|
664 |
+
663,0,1,"Colley, Mr. Edward Pomeroy",male,47,0,0,5727,25.5875,E58,S
|
665 |
+
664,0,3,"Coleff, Mr. Peju",male,36,0,0,349210,7.4958,,S
|
666 |
+
665,1,3,"Lindqvist, Mr. Eino William",male,20,1,0,STON/O 2. 3101285,7.925,,S
|
667 |
+
666,0,2,"Hickman, Mr. Lewis",male,32,2,0,S.O.C. 14879,73.5,,S
|
668 |
+
667,0,2,"Butler, Mr. Reginald Fenton",male,25,0,0,234686,13,,S
|
669 |
+
668,0,3,"Rommetvedt, Mr. Knud Paust",male,,0,0,312993,7.775,,S
|
670 |
+
669,0,3,"Cook, Mr. Jacob",male,43,0,0,A/5 3536,8.05,,S
|
671 |
+
670,1,1,"Taylor, Mrs. Elmer Zebley (Juliet Cummins Wright)",female,,1,0,19996,52,C126,S
|
672 |
+
671,1,2,"Brown, Mrs. Thomas William Solomon (Elizabeth Catherine Ford)",female,40,1,1,29750,39,,S
|
673 |
+
672,0,1,"Davidson, Mr. Thornton",male,31,1,0,F.C. 12750,52,B71,S
|
674 |
+
673,0,2,"Mitchell, Mr. Henry Michael",male,70,0,0,C.A. 24580,10.5,,S
|
675 |
+
674,1,2,"Wilhelms, Mr. Charles",male,31,0,0,244270,13,,S
|
676 |
+
675,0,2,"Watson, Mr. Ennis Hastings",male,,0,0,239856,0,,S
|
677 |
+
676,0,3,"Edvardsson, Mr. Gustaf Hjalmar",male,18,0,0,349912,7.775,,S
|
678 |
+
677,0,3,"Sawyer, Mr. Frederick Charles",male,24.5,0,0,342826,8.05,,S
|
679 |
+
678,1,3,"Turja, Miss. Anna Sofia",female,18,0,0,4138,9.8417,,S
|
680 |
+
679,0,3,"Goodwin, Mrs. Frederick (Augusta Tyler)",female,43,1,6,CA 2144,46.9,,S
|
681 |
+
680,1,1,"Cardeza, Mr. Thomas Drake Martinez",male,36,0,1,PC 17755,512.3292,B51 B53 B55,C
|
682 |
+
681,0,3,"Peters, Miss. Katie",female,,0,0,330935,8.1375,,Q
|
683 |
+
682,1,1,"Hassab, Mr. Hammad",male,27,0,0,PC 17572,76.7292,D49,C
|
684 |
+
683,0,3,"Olsvigen, Mr. Thor Anderson",male,20,0,0,6563,9.225,,S
|
685 |
+
684,0,3,"Goodwin, Mr. Charles Edward",male,14,5,2,CA 2144,46.9,,S
|
686 |
+
685,0,2,"Brown, Mr. Thomas William Solomon",male,60,1,1,29750,39,,S
|
687 |
+
686,0,2,"Laroche, Mr. Joseph Philippe Lemercier",male,25,1,2,SC/Paris 2123,41.5792,,C
|
688 |
+
687,0,3,"Panula, Mr. Jaako Arnold",male,14,4,1,3101295,39.6875,,S
|
689 |
+
688,0,3,"Dakic, Mr. Branko",male,19,0,0,349228,10.1708,,S
|
690 |
+
689,0,3,"Fischer, Mr. Eberhard Thelander",male,18,0,0,350036,7.7958,,S
|
691 |
+
690,1,1,"Madill, Miss. Georgette Alexandra",female,15,0,1,24160,211.3375,B5,S
|
692 |
+
691,1,1,"Dick, Mr. Albert Adrian",male,31,1,0,17474,57,B20,S
|
693 |
+
692,1,3,"Karun, Miss. Manca",female,4,0,1,349256,13.4167,,C
|
694 |
+
693,1,3,"Lam, Mr. Ali",male,,0,0,1601,56.4958,,S
|
695 |
+
694,0,3,"Saad, Mr. Khalil",male,25,0,0,2672,7.225,,C
|
696 |
+
695,0,1,"Weir, Col. John",male,60,0,0,113800,26.55,,S
|
697 |
+
696,0,2,"Chapman, Mr. Charles Henry",male,52,0,0,248731,13.5,,S
|
698 |
+
697,0,3,"Kelly, Mr. James",male,44,0,0,363592,8.05,,S
|
699 |
+
698,1,3,"Mullens, Miss. Katherine ""Katie""",female,,0,0,35852,7.7333,,Q
|
700 |
+
699,0,1,"Thayer, Mr. John Borland",male,49,1,1,17421,110.8833,C68,C
|
701 |
+
700,0,3,"Humblen, Mr. Adolf Mathias Nicolai Olsen",male,42,0,0,348121,7.65,F G63,S
|
702 |
+
701,1,1,"Astor, Mrs. John Jacob (Madeleine Talmadge Force)",female,18,1,0,PC 17757,227.525,C62 C64,C
|
703 |
+
702,1,1,"Silverthorne, Mr. Spencer Victor",male,35,0,0,PC 17475,26.2875,E24,S
|
704 |
+
703,0,3,"Barbara, Miss. Saiide",female,18,0,1,2691,14.4542,,C
|
705 |
+
704,0,3,"Gallagher, Mr. Martin",male,25,0,0,36864,7.7417,,Q
|
706 |
+
705,0,3,"Hansen, Mr. Henrik Juul",male,26,1,0,350025,7.8542,,S
|
707 |
+
706,0,2,"Morley, Mr. Henry Samuel (""Mr Henry Marshall"")",male,39,0,0,250655,26,,S
|
708 |
+
707,1,2,"Kelly, Mrs. Florence ""Fannie""",female,45,0,0,223596,13.5,,S
|
709 |
+
708,1,1,"Calderhead, Mr. Edward Pennington",male,42,0,0,PC 17476,26.2875,E24,S
|
710 |
+
709,1,1,"Cleaver, Miss. Alice",female,22,0,0,113781,151.55,,S
|
711 |
+
710,1,3,"Moubarek, Master. Halim Gonios (""William George"")",male,,1,1,2661,15.2458,,C
|
712 |
+
711,1,1,"Mayne, Mlle. Berthe Antonine (""Mrs de Villiers"")",female,24,0,0,PC 17482,49.5042,C90,C
|
713 |
+
712,0,1,"Klaber, Mr. Herman",male,,0,0,113028,26.55,C124,S
|
714 |
+
713,1,1,"Taylor, Mr. Elmer Zebley",male,48,1,0,19996,52,C126,S
|
715 |
+
714,0,3,"Larsson, Mr. August Viktor",male,29,0,0,7545,9.4833,,S
|
716 |
+
715,0,2,"Greenberg, Mr. Samuel",male,52,0,0,250647,13,,S
|
717 |
+
716,0,3,"Soholt, Mr. Peter Andreas Lauritz Andersen",male,19,0,0,348124,7.65,F G73,S
|
718 |
+
717,1,1,"Endres, Miss. Caroline Louise",female,38,0,0,PC 17757,227.525,C45,C
|
719 |
+
718,1,2,"Troutt, Miss. Edwina Celia ""Winnie""",female,27,0,0,34218,10.5,E101,S
|
720 |
+
719,0,3,"McEvoy, Mr. Michael",male,,0,0,36568,15.5,,Q
|
721 |
+
720,0,3,"Johnson, Mr. Malkolm Joackim",male,33,0,0,347062,7.775,,S
|
722 |
+
721,1,2,"Harper, Miss. Annie Jessie ""Nina""",female,6,0,1,248727,33,,S
|
723 |
+
722,0,3,"Jensen, Mr. Svend Lauritz",male,17,1,0,350048,7.0542,,S
|
724 |
+
723,0,2,"Gillespie, Mr. William Henry",male,34,0,0,12233,13,,S
|
725 |
+
724,0,2,"Hodges, Mr. Henry Price",male,50,0,0,250643,13,,S
|
726 |
+
725,1,1,"Chambers, Mr. Norman Campbell",male,27,1,0,113806,53.1,E8,S
|
727 |
+
726,0,3,"Oreskovic, Mr. Luka",male,20,0,0,315094,8.6625,,S
|
728 |
+
727,1,2,"Renouf, Mrs. Peter Henry (Lillian Jefferys)",female,30,3,0,31027,21,,S
|
729 |
+
728,1,3,"Mannion, Miss. Margareth",female,,0,0,36866,7.7375,,Q
|
730 |
+
729,0,2,"Bryhl, Mr. Kurt Arnold Gottfrid",male,25,1,0,236853,26,,S
|
731 |
+
730,0,3,"Ilmakangas, Miss. Pieta Sofia",female,25,1,0,STON/O2. 3101271,7.925,,S
|
732 |
+
731,1,1,"Allen, Miss. Elisabeth Walton",female,29,0,0,24160,211.3375,B5,S
|
733 |
+
732,0,3,"Hassan, Mr. Houssein G N",male,11,0,0,2699,18.7875,,C
|
734 |
+
733,0,2,"Knight, Mr. Robert J",male,,0,0,239855,0,,S
|
735 |
+
734,0,2,"Berriman, Mr. William John",male,23,0,0,28425,13,,S
|
736 |
+
735,0,2,"Troupiansky, Mr. Moses Aaron",male,23,0,0,233639,13,,S
|
737 |
+
736,0,3,"Williams, Mr. Leslie",male,28.5,0,0,54636,16.1,,S
|
738 |
+
737,0,3,"Ford, Mrs. Edward (Margaret Ann Watson)",female,48,1,3,W./C. 6608,34.375,,S
|
739 |
+
738,1,1,"Lesurer, Mr. Gustave J",male,35,0,0,PC 17755,512.3292,B101,C
|
740 |
+
739,0,3,"Ivanoff, Mr. Kanio",male,,0,0,349201,7.8958,,S
|
741 |
+
740,0,3,"Nankoff, Mr. Minko",male,,0,0,349218,7.8958,,S
|
742 |
+
741,1,1,"Hawksford, Mr. Walter James",male,,0,0,16988,30,D45,S
|
743 |
+
742,0,1,"Cavendish, Mr. Tyrell William",male,36,1,0,19877,78.85,C46,S
|
744 |
+
743,1,1,"Ryerson, Miss. Susan Parker ""Suzette""",female,21,2,2,PC 17608,262.375,B57 B59 B63 B66,C
|
745 |
+
744,0,3,"McNamee, Mr. Neal",male,24,1,0,376566,16.1,,S
|
746 |
+
745,1,3,"Stranden, Mr. Juho",male,31,0,0,STON/O 2. 3101288,7.925,,S
|
747 |
+
746,0,1,"Crosby, Capt. Edward Gifford",male,70,1,1,WE/P 5735,71,B22,S
|
748 |
+
747,0,3,"Abbott, Mr. Rossmore Edward",male,16,1,1,C.A. 2673,20.25,,S
|
749 |
+
748,1,2,"Sinkkonen, Miss. Anna",female,30,0,0,250648,13,,S
|
750 |
+
749,0,1,"Marvin, Mr. Daniel Warner",male,19,1,0,113773,53.1,D30,S
|
751 |
+
750,0,3,"Connaghton, Mr. Michael",male,31,0,0,335097,7.75,,Q
|
752 |
+
751,1,2,"Wells, Miss. Joan",female,4,1,1,29103,23,,S
|
753 |
+
752,1,3,"Moor, Master. Meier",male,6,0,1,392096,12.475,E121,S
|
754 |
+
753,0,3,"Vande Velde, Mr. Johannes Joseph",male,33,0,0,345780,9.5,,S
|
755 |
+
754,0,3,"Jonkoff, Mr. Lalio",male,23,0,0,349204,7.8958,,S
|
756 |
+
755,1,2,"Herman, Mrs. Samuel (Jane Laver)",female,48,1,2,220845,65,,S
|
757 |
+
756,1,2,"Hamalainen, Master. Viljo",male,0.67,1,1,250649,14.5,,S
|
758 |
+
757,0,3,"Carlsson, Mr. August Sigfrid",male,28,0,0,350042,7.7958,,S
|
759 |
+
758,0,2,"Bailey, Mr. Percy Andrew",male,18,0,0,29108,11.5,,S
|
760 |
+
759,0,3,"Theobald, Mr. Thomas Leonard",male,34,0,0,363294,8.05,,S
|
761 |
+
760,1,1,"Rothes, the Countess. of (Lucy Noel Martha Dyer-Edwards)",female,33,0,0,110152,86.5,B77,S
|
762 |
+
761,0,3,"Garfirth, Mr. John",male,,0,0,358585,14.5,,S
|
763 |
+
762,0,3,"Nirva, Mr. Iisakki Antino Aijo",male,41,0,0,SOTON/O2 3101272,7.125,,S
|
764 |
+
763,1,3,"Barah, Mr. Hanna Assi",male,20,0,0,2663,7.2292,,C
|
765 |
+
764,1,1,"Carter, Mrs. William Ernest (Lucile Polk)",female,36,1,2,113760,120,B96 B98,S
|
766 |
+
765,0,3,"Eklund, Mr. Hans Linus",male,16,0,0,347074,7.775,,S
|
767 |
+
766,1,1,"Hogeboom, Mrs. John C (Anna Andrews)",female,51,1,0,13502,77.9583,D11,S
|
768 |
+
767,0,1,"Brewe, Dr. Arthur Jackson",male,,0,0,112379,39.6,,C
|
769 |
+
768,0,3,"Mangan, Miss. Mary",female,30.5,0,0,364850,7.75,,Q
|
770 |
+
769,0,3,"Moran, Mr. Daniel J",male,,1,0,371110,24.15,,Q
|
771 |
+
770,0,3,"Gronnestad, Mr. Daniel Danielsen",male,32,0,0,8471,8.3625,,S
|
772 |
+
771,0,3,"Lievens, Mr. Rene Aime",male,24,0,0,345781,9.5,,S
|
773 |
+
772,0,3,"Jensen, Mr. Niels Peder",male,48,0,0,350047,7.8542,,S
|
774 |
+
773,0,2,"Mack, Mrs. (Mary)",female,57,0,0,S.O./P.P. 3,10.5,E77,S
|
775 |
+
774,0,3,"Elias, Mr. Dibo",male,,0,0,2674,7.225,,C
|
776 |
+
775,1,2,"Hocking, Mrs. Elizabeth (Eliza Needs)",female,54,1,3,29105,23,,S
|
777 |
+
776,0,3,"Myhrman, Mr. Pehr Fabian Oliver Malkolm",male,18,0,0,347078,7.75,,S
|
778 |
+
777,0,3,"Tobin, Mr. Roger",male,,0,0,383121,7.75,F38,Q
|
779 |
+
778,1,3,"Emanuel, Miss. Virginia Ethel",female,5,0,0,364516,12.475,,S
|
780 |
+
779,0,3,"Kilgannon, Mr. Thomas J",male,,0,0,36865,7.7375,,Q
|
781 |
+
780,1,1,"Robert, Mrs. Edward Scott (Elisabeth Walton McMillan)",female,43,0,1,24160,211.3375,B3,S
|
782 |
+
781,1,3,"Ayoub, Miss. Banoura",female,13,0,0,2687,7.2292,,C
|
783 |
+
782,1,1,"Dick, Mrs. Albert Adrian (Vera Gillespie)",female,17,1,0,17474,57,B20,S
|
784 |
+
783,0,1,"Long, Mr. Milton Clyde",male,29,0,0,113501,30,D6,S
|
785 |
+
784,0,3,"Johnston, Mr. Andrew G",male,,1,2,W./C. 6607,23.45,,S
|
786 |
+
785,0,3,"Ali, Mr. William",male,25,0,0,SOTON/O.Q. 3101312,7.05,,S
|
787 |
+
786,0,3,"Harmer, Mr. Abraham (David Lishin)",male,25,0,0,374887,7.25,,S
|
788 |
+
787,1,3,"Sjoblom, Miss. Anna Sofia",female,18,0,0,3101265,7.4958,,S
|
789 |
+
788,0,3,"Rice, Master. George Hugh",male,8,4,1,382652,29.125,,Q
|
790 |
+
789,1,3,"Dean, Master. Bertram Vere",male,1,1,2,C.A. 2315,20.575,,S
|
791 |
+
790,0,1,"Guggenheim, Mr. Benjamin",male,46,0,0,PC 17593,79.2,B82 B84,C
|
792 |
+
791,0,3,"Keane, Mr. Andrew ""Andy""",male,,0,0,12460,7.75,,Q
|
793 |
+
792,0,2,"Gaskell, Mr. Alfred",male,16,0,0,239865,26,,S
|
794 |
+
793,0,3,"Sage, Miss. Stella Anna",female,,8,2,CA. 2343,69.55,,S
|
795 |
+
794,0,1,"Hoyt, Mr. William Fisher",male,,0,0,PC 17600,30.6958,,C
|
796 |
+
795,0,3,"Dantcheff, Mr. Ristiu",male,25,0,0,349203,7.8958,,S
|
797 |
+
796,0,2,"Otter, Mr. Richard",male,39,0,0,28213,13,,S
|
798 |
+
797,1,1,"Leader, Dr. Alice (Farnham)",female,49,0,0,17465,25.9292,D17,S
|
799 |
+
798,1,3,"Osman, Mrs. Mara",female,31,0,0,349244,8.6833,,S
|
800 |
+
799,0,3,"Ibrahim Shawah, Mr. Yousseff",male,30,0,0,2685,7.2292,,C
|
801 |
+
800,0,3,"Van Impe, Mrs. Jean Baptiste (Rosalie Paula Govaert)",female,30,1,1,345773,24.15,,S
|
802 |
+
801,0,2,"Ponesell, Mr. Martin",male,34,0,0,250647,13,,S
|
803 |
+
802,1,2,"Collyer, Mrs. Harvey (Charlotte Annie Tate)",female,31,1,1,C.A. 31921,26.25,,S
|
804 |
+
803,1,1,"Carter, Master. William Thornton II",male,11,1,2,113760,120,B96 B98,S
|
805 |
+
804,1,3,"Thomas, Master. Assad Alexander",male,0.42,0,1,2625,8.5167,,C
|
806 |
+
805,1,3,"Hedman, Mr. Oskar Arvid",male,27,0,0,347089,6.975,,S
|
807 |
+
806,0,3,"Johansson, Mr. Karl Johan",male,31,0,0,347063,7.775,,S
|
808 |
+
807,0,1,"Andrews, Mr. Thomas Jr",male,39,0,0,112050,0,A36,S
|
809 |
+
808,0,3,"Pettersson, Miss. Ellen Natalia",female,18,0,0,347087,7.775,,S
|
810 |
+
809,0,2,"Meyer, Mr. August",male,39,0,0,248723,13,,S
|
811 |
+
810,1,1,"Chambers, Mrs. Norman Campbell (Bertha Griggs)",female,33,1,0,113806,53.1,E8,S
|
812 |
+
811,0,3,"Alexander, Mr. William",male,26,0,0,3474,7.8875,,S
|
813 |
+
812,0,3,"Lester, Mr. James",male,39,0,0,A/4 48871,24.15,,S
|
814 |
+
813,0,2,"Slemen, Mr. Richard James",male,35,0,0,28206,10.5,,S
|
815 |
+
814,0,3,"Andersson, Miss. Ebba Iris Alfrida",female,6,4,2,347082,31.275,,S
|
816 |
+
815,0,3,"Tomlin, Mr. Ernest Portage",male,30.5,0,0,364499,8.05,,S
|
817 |
+
816,0,1,"Fry, Mr. Richard",male,,0,0,112058,0,B102,S
|
818 |
+
817,0,3,"Heininen, Miss. Wendla Maria",female,23,0,0,STON/O2. 3101290,7.925,,S
|
819 |
+
818,0,2,"Mallet, Mr. Albert",male,31,1,1,S.C./PARIS 2079,37.0042,,C
|
820 |
+
819,0,3,"Holm, Mr. John Fredrik Alexander",male,43,0,0,C 7075,6.45,,S
|
821 |
+
820,0,3,"Skoog, Master. Karl Thorsten",male,10,3,2,347088,27.9,,S
|
822 |
+
821,1,1,"Hays, Mrs. Charles Melville (Clara Jennings Gregg)",female,52,1,1,12749,93.5,B69,S
|
823 |
+
822,1,3,"Lulic, Mr. Nikola",male,27,0,0,315098,8.6625,,S
|
824 |
+
823,0,1,"Reuchlin, Jonkheer. John George",male,38,0,0,19972,0,,S
|
825 |
+
824,1,3,"Moor, Mrs. (Beila)",female,27,0,1,392096,12.475,E121,S
|
826 |
+
825,0,3,"Panula, Master. Urho Abraham",male,2,4,1,3101295,39.6875,,S
|
827 |
+
826,0,3,"Flynn, Mr. John",male,,0,0,368323,6.95,,Q
|
828 |
+
827,0,3,"Lam, Mr. Len",male,,0,0,1601,56.4958,,S
|
829 |
+
828,1,2,"Mallet, Master. Andre",male,1,0,2,S.C./PARIS 2079,37.0042,,C
|
830 |
+
829,1,3,"McCormack, Mr. Thomas Joseph",male,,0,0,367228,7.75,,Q
|
831 |
+
830,1,1,"Stone, Mrs. George Nelson (Martha Evelyn)",female,62,0,0,113572,80,B28,
|
832 |
+
831,1,3,"Yasbeck, Mrs. Antoni (Selini Alexander)",female,15,1,0,2659,14.4542,,C
|
833 |
+
832,1,2,"Richards, Master. George Sibley",male,0.83,1,1,29106,18.75,,S
|
834 |
+
833,0,3,"Saad, Mr. Amin",male,,0,0,2671,7.2292,,C
|
835 |
+
834,0,3,"Augustsson, Mr. Albert",male,23,0,0,347468,7.8542,,S
|
836 |
+
835,0,3,"Allum, Mr. Owen George",male,18,0,0,2223,8.3,,S
|
837 |
+
836,1,1,"Compton, Miss. Sara Rebecca",female,39,1,1,PC 17756,83.1583,E49,C
|
838 |
+
837,0,3,"Pasic, Mr. Jakob",male,21,0,0,315097,8.6625,,S
|
839 |
+
838,0,3,"Sirota, Mr. Maurice",male,,0,0,392092,8.05,,S
|
840 |
+
839,1,3,"Chip, Mr. Chang",male,32,0,0,1601,56.4958,,S
|
841 |
+
840,1,1,"Marechal, Mr. Pierre",male,,0,0,11774,29.7,C47,C
|
842 |
+
841,0,3,"Alhomaki, Mr. Ilmari Rudolf",male,20,0,0,SOTON/O2 3101287,7.925,,S
|
843 |
+
842,0,2,"Mudd, Mr. Thomas Charles",male,16,0,0,S.O./P.P. 3,10.5,,S
|
844 |
+
843,1,1,"Serepeca, Miss. Augusta",female,30,0,0,113798,31,,C
|
845 |
+
844,0,3,"Lemberopolous, Mr. Peter L",male,34.5,0,0,2683,6.4375,,C
|
846 |
+
845,0,3,"Culumovic, Mr. Jeso",male,17,0,0,315090,8.6625,,S
|
847 |
+
846,0,3,"Abbing, Mr. Anthony",male,42,0,0,C.A. 5547,7.55,,S
|
848 |
+
847,0,3,"Sage, Mr. Douglas Bullen",male,,8,2,CA. 2343,69.55,,S
|
849 |
+
848,0,3,"Markoff, Mr. Marin",male,35,0,0,349213,7.8958,,C
|
850 |
+
849,0,2,"Harper, Rev. John",male,28,0,1,248727,33,,S
|
851 |
+
850,1,1,"Goldenberg, Mrs. Samuel L (Edwiga Grabowska)",female,,1,0,17453,89.1042,C92,C
|
852 |
+
851,0,3,"Andersson, Master. Sigvard Harald Elias",male,4,4,2,347082,31.275,,S
|
853 |
+
852,0,3,"Svensson, Mr. Johan",male,74,0,0,347060,7.775,,S
|
854 |
+
853,0,3,"Boulos, Miss. Nourelain",female,9,1,1,2678,15.2458,,C
|
855 |
+
854,1,1,"Lines, Miss. Mary Conover",female,16,0,1,PC 17592,39.4,D28,S
|
856 |
+
855,0,2,"Carter, Mrs. Ernest Courtenay (Lilian Hughes)",female,44,1,0,244252,26,,S
|
857 |
+
856,1,3,"Aks, Mrs. Sam (Leah Rosen)",female,18,0,1,392091,9.35,,S
|
858 |
+
857,1,1,"Wick, Mrs. George Dennick (Mary Hitchcock)",female,45,1,1,36928,164.8667,,S
|
859 |
+
858,1,1,"Daly, Mr. Peter Denis ",male,51,0,0,113055,26.55,E17,S
|
860 |
+
859,1,3,"Baclini, Mrs. Solomon (Latifa Qurban)",female,24,0,3,2666,19.2583,,C
|
861 |
+
860,0,3,"Razi, Mr. Raihed",male,,0,0,2629,7.2292,,C
|
862 |
+
861,0,3,"Hansen, Mr. Claus Peter",male,41,2,0,350026,14.1083,,S
|
863 |
+
862,0,2,"Giles, Mr. Frederick Edward",male,21,1,0,28134,11.5,,S
|
864 |
+
863,1,1,"Swift, Mrs. Frederick Joel (Margaret Welles Barron)",female,48,0,0,17466,25.9292,D17,S
|
865 |
+
864,0,3,"Sage, Miss. Dorothy Edith ""Dolly""",female,,8,2,CA. 2343,69.55,,S
|
866 |
+
865,0,2,"Gill, Mr. John William",male,24,0,0,233866,13,,S
|
867 |
+
866,1,2,"Bystrom, Mrs. (Karolina)",female,42,0,0,236852,13,,S
|
868 |
+
867,1,2,"Duran y More, Miss. Asuncion",female,27,1,0,SC/PARIS 2149,13.8583,,C
|
869 |
+
868,0,1,"Roebling, Mr. Washington Augustus II",male,31,0,0,PC 17590,50.4958,A24,S
|
870 |
+
869,0,3,"van Melkebeke, Mr. Philemon",male,,0,0,345777,9.5,,S
|
871 |
+
870,1,3,"Johnson, Master. Harold Theodor",male,4,1,1,347742,11.1333,,S
|
872 |
+
871,0,3,"Balkic, Mr. Cerin",male,26,0,0,349248,7.8958,,S
|
873 |
+
872,1,1,"Beckwith, Mrs. Richard Leonard (Sallie Monypeny)",female,47,1,1,11751,52.5542,D35,S
|
874 |
+
873,0,1,"Carlsson, Mr. Frans Olof",male,33,0,0,695,5,B51 B53 B55,S
|
875 |
+
874,0,3,"Vander Cruyssen, Mr. Victor",male,47,0,0,345765,9,,S
|
876 |
+
875,1,2,"Abelson, Mrs. Samuel (Hannah Wizosky)",female,28,1,0,P/PP 3381,24,,C
|
877 |
+
876,1,3,"Najib, Miss. Adele Kiamie ""Jane""",female,15,0,0,2667,7.225,,C
|
878 |
+
877,0,3,"Gustafsson, Mr. Alfred Ossian",male,20,0,0,7534,9.8458,,S
|
879 |
+
878,0,3,"Petroff, Mr. Nedelio",male,19,0,0,349212,7.8958,,S
|
880 |
+
879,0,3,"Laleff, Mr. Kristo",male,,0,0,349217,7.8958,,S
|
881 |
+
880,1,1,"Potter, Mrs. Thomas Jr (Lily Alexenia Wilson)",female,56,0,1,11767,83.1583,C50,C
|
882 |
+
881,1,2,"Shelley, Mrs. William (Imanita Parrish Hall)",female,25,0,1,230433,26,,S
|
883 |
+
882,0,3,"Markun, Mr. Johann",male,33,0,0,349257,7.8958,,S
|
884 |
+
883,0,3,"Dahlberg, Miss. Gerda Ulrika",female,22,0,0,7552,10.5167,,S
|
885 |
+
884,0,2,"Banfield, Mr. Frederick James",male,28,0,0,C.A./SOTON 34068,10.5,,S
|
886 |
+
885,0,3,"Sutehall, Mr. Henry Jr",male,25,0,0,SOTON/OQ 392076,7.05,,S
|
887 |
+
886,0,3,"Rice, Mrs. William (Margaret Norton)",female,39,0,5,382652,29.125,,Q
|
888 |
+
887,0,2,"Montvila, Rev. Juozas",male,27,0,0,211536,13,,S
|
889 |
+
888,1,1,"Graham, Miss. Margaret Edith",female,19,0,0,112053,30,B42,S
|
890 |
+
889,0,3,"Johnston, Miss. Catherine Helen ""Carrie""",female,,1,2,W./C. 6607,23.45,,S
|
891 |
+
890,1,1,"Behr, Mr. Karl Howell",male,26,0,0,111369,30,C148,C
|
892 |
+
891,0,3,"Dooley, Mr. Patrick",male,32,0,0,370376,7.75,,Q
|
Data Analitics/Week 3/1230-Article Text-1227-1-10-20080129.pdf
ADDED
@@ -0,0 +1,3 @@
|
1 |
+
version https://git-lfs.github.com/spec/v1
|
2 |
+
oid sha256:d1e6f780ee44d6ab6f73f8e05ff66e96df7c413e1680b31c768fc36b71825d71
|
3 |
+
size 341360
|
Data Analitics/Week 3/L2-Data-Science-Life-Cycle.pdf
ADDED
@@ -0,0 +1,3 @@
|
1 |
+
version https://git-lfs.github.com/spec/v1
|
2 |
+
oid sha256:835d3a570a76a16cba5ec8b44c57bf31439d4435a72880d620919509215b1648
|
3 |
+
size 2109539
|
Data Analitics/Week 3/Lab1-Introduction.pdf
ADDED
@@ -0,0 +1,3 @@
|
1 |
+
version https://git-lfs.github.com/spec/v1
|
2 |
+
oid sha256:483029e96a11d38f5e1416d7808bbd3764d4d7f8c66e5f36bba32c94798eae85
|
3 |
+
size 2934417
|
Data Analitics/Week 3/Lab2-1.ipynb
ADDED
The diff for this file is too large to render.
See raw diff
|
|
Data Analitics/Week 3/Lab2-Data-Understanding.pdf
ADDED
@@ -0,0 +1,3 @@
|
1 |
+
version https://git-lfs.github.com/spec/v1
|
2 |
+
oid sha256:544aa775cbbe89140a45a8a2162d23cd34b4ba506f9fad5a90f85eed3f121ba6
|
3 |
+
size 750777
|
Data Analitics/Week 3/TU257-Lab2-2-Data-Exploration.ipynb
ADDED
@@ -0,0 +1,1083 @@
|
1 |
+
{
|
2 |
+
"cells": [
|
3 |
+
{
|
4 |
+
"cell_type": "markdown",
|
5 |
+
"metadata": {},
|
6 |
+
"source": [
|
7 |
+
"TU257 - Lab 2-2 - Data Exploration\n",
|
8 |
+
"\n",
|
9 |
+
"This lab gives an example of using pandas DataFrames to analyse data\n"
|
10 |
+
]
|
11 |
+
},
|
12 |
+
{
|
13 |
+
"cell_type": "code",
|
14 |
+
"execution_count": 13,
|
15 |
+
"metadata": {},
|
16 |
+
"outputs": [],
|
17 |
+
"source": [
|
18 |
+
"#import pandas\n",
|
19 |
+
"import pandas as pd\n"
|
20 |
+
]
|
21 |
+
},
|
22 |
+
{
|
23 |
+
"cell_type": "code",
|
24 |
+
"execution_count": 14,
|
25 |
+
"metadata": {},
|
26 |
+
"outputs": [],
|
27 |
+
"source": [
|
28 |
+
"# Read a CSV file into a pandas DataFrame\n",
|
29 |
+
"videoReview = pd.read_csv(r\"C:\\Users\\Rafael\\Documents\\DataScience\\Data Analitics\\Week 3\\Video_Games_Sales_as_at_22_Dec_2016.csv\")"
|
30 |
+
]
|
31 |
+
},
|
32 |
+
{
|
33 |
+
"cell_type": "code",
|
34 |
+
"execution_count": 15,
|
35 |
+
"metadata": {},
|
36 |
+
"outputs": [
|
37 |
+
{
|
38 |
+
"name": "stdout",
|
39 |
+
"output_type": "stream",
|
40 |
+
"text": [
|
41 |
+
"# print first 3 rows\n",
|
42 |
+
" Name Platform Year_of_Release Genre Publisher NA_Sales \\\n",
|
43 |
+
"0 Wii Sports Wii 2006.0 Sports Nintendo 41.36 \n",
|
44 |
+
"1 Super Mario Bros. NES 1985.0 Platform Nintendo 29.08 \n",
|
45 |
+
"2 Mario Kart Wii Wii 2008.0 Racing Nintendo 15.68 \n",
|
46 |
+
"\n",
|
47 |
+
" EU_Sales JP_Sales Other_Sales Global_Sales Critic_Score Critic_Count \\\n",
|
48 |
+
"0 28.96 3.77 8.45 82.53 76.0 51.0 \n",
|
49 |
+
"1 3.58 6.81 0.77 40.24 NaN NaN \n",
|
50 |
+
"2 12.76 3.79 3.29 35.52 82.0 73.0 \n",
|
51 |
+
"\n",
|
52 |
+
" User_Score User_Count Developer Rating \n",
|
53 |
+
"0 8 322.0 Nintendo E \n",
|
54 |
+
"1 NaN NaN NaN NaN \n",
|
55 |
+
"2 8.3 709.0 Nintendo E \n"
|
56 |
+
]
|
57 |
+
}
|
58 |
+
],
|
59 |
+
"source": [
|
60 |
+
"print('# print first 3 rows')\n",
|
61 |
+
"print(videoReview[:3])\n",
|
62 |
+
"\n"
|
63 |
+
]
|
64 |
+
},
|
65 |
+
{
|
66 |
+
"cell_type": "code",
|
67 |
+
"execution_count": 16,
|
68 |
+
"metadata": {},
|
69 |
+
"outputs": [
|
70 |
+
{
|
71 |
+
"data": {
|
72 |
+
"text/html": [
|
73 |
+
"<div>\n",
|
74 |
+
"<style scoped>\n",
|
75 |
+
" .dataframe tbody tr th:only-of-type {\n",
|
76 |
+
" vertical-align: middle;\n",
|
77 |
+
" }\n",
|
78 |
+
"\n",
|
79 |
+
" .dataframe tbody tr th {\n",
|
80 |
+
" vertical-align: top;\n",
|
81 |
+
" }\n",
|
82 |
+
"\n",
|
83 |
+
" .dataframe thead th {\n",
|
84 |
+
" text-align: right;\n",
|
85 |
+
" }\n",
|
86 |
+
"</style>\n",
|
87 |
+
"<table border=\"1\" class=\"dataframe\">\n",
|
88 |
+
" <thead>\n",
|
89 |
+
" <tr style=\"text-align: right;\">\n",
|
90 |
+
" <th></th>\n",
|
91 |
+
" <th>Name</th>\n",
|
92 |
+
" <th>Platform</th>\n",
|
93 |
+
" <th>Year_of_Release</th>\n",
|
94 |
+
" <th>Genre</th>\n",
|
95 |
+
" <th>Publisher</th>\n",
|
96 |
+
" <th>NA_Sales</th>\n",
|
97 |
+
" <th>EU_Sales</th>\n",
|
98 |
+
" <th>JP_Sales</th>\n",
|
99 |
+
" <th>Other_Sales</th>\n",
|
100 |
+
" <th>Global_Sales</th>\n",
|
101 |
+
" <th>Critic_Score</th>\n",
|
102 |
+
" <th>Critic_Count</th>\n",
|
103 |
+
" <th>User_Score</th>\n",
|
104 |
+
" <th>User_Count</th>\n",
|
105 |
+
" <th>Developer</th>\n",
|
106 |
+
" <th>Rating</th>\n",
|
107 |
+
" </tr>\n",
|
108 |
+
" </thead>\n",
|
109 |
+
" <tbody>\n",
|
110 |
+
" <tr>\n",
|
111 |
+
" <th>0</th>\n",
|
112 |
+
" <td>Wii Sports</td>\n",
|
113 |
+
" <td>Wii</td>\n",
|
114 |
+
" <td>2006.0</td>\n",
|
115 |
+
" <td>Sports</td>\n",
|
116 |
+
" <td>Nintendo</td>\n",
|
117 |
+
" <td>41.36</td>\n",
|
118 |
+
" <td>28.96</td>\n",
|
119 |
+
" <td>3.77</td>\n",
|
120 |
+
" <td>8.45</td>\n",
|
121 |
+
" <td>82.53</td>\n",
|
122 |
+
" <td>76.0</td>\n",
|
123 |
+
" <td>51.0</td>\n",
|
124 |
+
" <td>8</td>\n",
|
125 |
+
" <td>322.0</td>\n",
|
126 |
+
" <td>Nintendo</td>\n",
|
127 |
+
" <td>E</td>\n",
|
128 |
+
" </tr>\n",
|
129 |
+
" <tr>\n",
|
130 |
+
" <th>1</th>\n",
|
131 |
+
" <td>Super Mario Bros.</td>\n",
|
132 |
+
" <td>NES</td>\n",
|
133 |
+
" <td>1985.0</td>\n",
|
134 |
+
" <td>Platform</td>\n",
|
135 |
+
" <td>Nintendo</td>\n",
|
136 |
+
" <td>29.08</td>\n",
|
137 |
+
" <td>3.58</td>\n",
|
138 |
+
" <td>6.81</td>\n",
|
139 |
+
" <td>0.77</td>\n",
|
140 |
+
" <td>40.24</td>\n",
|
141 |
+
" <td>NaN</td>\n",
|
142 |
+
" <td>NaN</td>\n",
|
143 |
+
" <td>NaN</td>\n",
|
144 |
+
" <td>NaN</td>\n",
|
145 |
+
" <td>NaN</td>\n",
|
146 |
+
" <td>NaN</td>\n",
|
147 |
+
" </tr>\n",
|
148 |
+
" <tr>\n",
|
149 |
+
" <th>2</th>\n",
|
150 |
+
" <td>Mario Kart Wii</td>\n",
|
151 |
+
" <td>Wii</td>\n",
|
152 |
+
" <td>2008.0</td>\n",
|
153 |
+
" <td>Racing</td>\n",
|
154 |
+
" <td>Nintendo</td>\n",
|
155 |
+
" <td>15.68</td>\n",
|
156 |
+
" <td>12.76</td>\n",
|
157 |
+
" <td>3.79</td>\n",
|
158 |
+
" <td>3.29</td>\n",
|
159 |
+
" <td>35.52</td>\n",
|
160 |
+
" <td>82.0</td>\n",
|
161 |
+
" <td>73.0</td>\n",
|
162 |
+
" <td>8.3</td>\n",
|
163 |
+
" <td>709.0</td>\n",
|
164 |
+
" <td>Nintendo</td>\n",
|
165 |
+
" <td>E</td>\n",
|
166 |
+
" </tr>\n",
|
167 |
+
" </tbody>\n",
|
168 |
+
"</table>\n",
|
169 |
+
"</div>"
|
170 |
+
],
|
171 |
+
"text/plain": [
|
172 |
+
" Name Platform Year_of_Release Genre Publisher NA_Sales \\\n",
|
173 |
+
"0 Wii Sports Wii 2006.0 Sports Nintendo 41.36 \n",
|
174 |
+
"1 Super Mario Bros. NES 1985.0 Platform Nintendo 29.08 \n",
|
175 |
+
"2 Mario Kart Wii Wii 2008.0 Racing Nintendo 15.68 \n",
|
176 |
+
"\n",
|
177 |
+
" EU_Sales JP_Sales Other_Sales Global_Sales Critic_Score Critic_Count \\\n",
|
178 |
+
"0 28.96 3.77 8.45 82.53 76.0 51.0 \n",
|
179 |
+
"1 3.58 6.81 0.77 40.24 NaN NaN \n",
|
180 |
+
"2 12.76 3.79 3.29 35.52 82.0 73.0 \n",
|
181 |
+
"\n",
|
182 |
+
" User_Score User_Count Developer Rating \n",
|
183 |
+
"0 8 322.0 Nintendo E \n",
|
184 |
+
"1 NaN NaN NaN NaN \n",
|
185 |
+
"2 8.3 709.0 Nintendo E "
|
186 |
+
]
|
187 |
+
},
|
188 |
+
"execution_count": 16,
|
189 |
+
"metadata": {},
|
190 |
+
"output_type": "execute_result"
|
191 |
+
}
|
192 |
+
],
|
193 |
+
"source": [
|
194 |
+
"videoReview.head(3)"
|
195 |
+
]
|
196 |
+
},
|
197 |
+
{
|
198 |
+
"cell_type": "code",
|
199 |
+
"execution_count": 17,
|
200 |
+
"metadata": {},
|
201 |
+
"outputs": [
|
202 |
+
{
|
203 |
+
"name": "stdout",
|
204 |
+
"output_type": "stream",
|
205 |
+
"text": [
|
206 |
+
"----------\n",
|
207 |
+
"# print columns\n",
|
208 |
+
"0 Wii Sports\n",
|
209 |
+
"1 Super Mario Bros.\n",
|
210 |
+
"2 Mario Kart Wii\n",
|
211 |
+
"3 Wii Sports Resort\n",
|
212 |
+
"4 Pokemon Red/Pokemon Blue\n",
|
213 |
+
" ... \n",
|
214 |
+
"16714 Samurai Warriors: Sanada Maru\n",
|
215 |
+
"16715 LMA Manager 2007\n",
|
216 |
+
"16716 Haitaka no Psychedelica\n",
|
217 |
+
"16717 Spirits & Spells\n",
|
218 |
+
"16718 Winning Post 8 2016\n",
|
219 |
+
"Name: Name, Length: 16719, dtype: object\n"
|
220 |
+
]
|
221 |
+
}
|
222 |
+
],
|
223 |
+
"source": [
|
224 |
+
"print('----------')\n",
|
225 |
+
"print('# print columns')\n",
|
226 |
+
"print(videoReview['Name'])\n",
|
227 |
+
"\n"
|
228 |
+
]
|
229 |
+
},
|
230 |
+
{
|
231 |
+
"cell_type": "code",
|
232 |
+
"execution_count": 18,
|
233 |
+
"metadata": {},
|
234 |
+
"outputs": [
|
235 |
+
{
|
236 |
+
"data": {
|
237 |
+
"text/plain": [
|
238 |
+
"0 Wii Sports\n",
|
239 |
+
"1 Super Mario Bros.\n",
|
240 |
+
"2 Mario Kart Wii\n",
|
241 |
+
"3 Wii Sports Resort\n",
|
242 |
+
"4 Pokemon Red/Pokemon Blue\n",
|
243 |
+
" ... \n",
|
244 |
+
"16714 Samurai Warriors: Sanada Maru\n",
|
245 |
+
"16715 LMA Manager 2007\n",
|
246 |
+
"16716 Haitaka no Psychedelica\n",
|
247 |
+
"16717 Spirits & Spells\n",
|
248 |
+
"16718 Winning Post 8 2016\n",
|
249 |
+
"Name: Name, Length: 16719, dtype: object"
|
250 |
+
]
|
251 |
+
},
|
252 |
+
"execution_count": 18,
|
253 |
+
"metadata": {},
|
254 |
+
"output_type": "execute_result"
|
255 |
+
}
|
256 |
+
],
|
257 |
+
"source": [
|
258 |
+
"videoReview['Name']"
|
259 |
+
]
|
260 |
+
},
|
261 |
+
{
|
262 |
+
"cell_type": "code",
|
263 |
+
"execution_count": 19,
|
264 |
+
"metadata": {},
|
265 |
+
"outputs": [
|
266 |
+
{
|
267 |
+
"name": "stdout",
|
268 |
+
"output_type": "stream",
|
269 |
+
"text": [
|
270 |
+
"----------\n",
|
271 |
+
"# print columns, first 5 rows\n",
|
272 |
+
"0 Wii Sports\n",
|
273 |
+
"1 Super Mario Bros.\n",
|
274 |
+
"2 Mario Kart Wii\n",
|
275 |
+
"3 Wii Sports Resort\n",
|
276 |
+
"4 Pokemon Red/Pokemon Blue\n",
|
277 |
+
"Name: Name, dtype: object\n"
|
278 |
+
]
|
279 |
+
}
|
280 |
+
],
|
281 |
+
"source": [
|
282 |
+
"print('----------')\n",
|
283 |
+
"print('# print columns, first 5 rows')\n",
|
284 |
+
"print(videoReview['Name'][:5])\n",
|
285 |
+
"\n",
|
286 |
+
"#videoReview['Name'].head(5)\n",
|
287 |
+
"#videoReview['Name'].tail(5)\n"
|
288 |
+
]
|
289 |
+
},
|
290 |
+
{
|
291 |
+
"cell_type": "code",
|
292 |
+
"execution_count": 20,
|
293 |
+
"metadata": {},
|
294 |
+
"outputs": [
|
295 |
+
{
|
296 |
+
"name": "stdout",
|
297 |
+
"output_type": "stream",
|
298 |
+
"text": [
|
299 |
+
"----------\n",
|
300 |
+
"# Platform #\n",
|
301 |
+
"Platform\n",
|
302 |
+
"PS2 2161\n",
|
303 |
+
"DS 2152\n",
|
304 |
+
"PS3 1331\n",
|
305 |
+
"Wii 1320\n",
|
306 |
+
"X360 1262\n",
|
307 |
+
"PSP 1209\n",
|
308 |
+
"PS 1197\n",
|
309 |
+
"PC 974\n",
|
310 |
+
"XB 824\n",
|
311 |
+
"GBA 822\n",
|
312 |
+
"GC 556\n",
|
313 |
+
"3DS 520\n",
|
314 |
+
"PSV 432\n",
|
315 |
+
"PS4 393\n",
|
316 |
+
"N64 319\n",
|
317 |
+
"XOne 247\n",
|
318 |
+
"SNES 239\n",
|
319 |
+
"SAT 173\n",
|
320 |
+
"WiiU 147\n",
|
321 |
+
"2600 133\n",
|
322 |
+
"NES 98\n",
|
323 |
+
"GB 98\n",
|
324 |
+
"DC 52\n",
|
325 |
+
"GEN 29\n",
|
326 |
+
"NG 12\n",
|
327 |
+
"SCD 6\n",
|
328 |
+
"WS 6\n",
|
329 |
+
"3DO 3\n",
|
330 |
+
"TG16 2\n",
|
331 |
+
"GG 1\n",
|
332 |
+
"PCFX 1\n",
|
333 |
+
"Name: count, dtype: int64\n"
|
334 |
+
]
|
335 |
+
}
|
336 |
+
],
|
337 |
+
"source": [
|
338 |
+
"print('----------')\n",
|
339 |
+
"print('# Platform #')\n",
|
340 |
+
"print(videoReview['Platform'].value_counts())\n",
|
341 |
+
"\n"
|
342 |
+
]
|
343 |
+
},
|
344 |
+
{
|
345 |
+
"cell_type": "code",
|
346 |
+
"execution_count": 21,
|
347 |
+
"metadata": {},
|
348 |
+
"outputs": [
|
349 |
+
{
|
350 |
+
"name": "stdout",
|
351 |
+
"output_type": "stream",
|
352 |
+
"text": [
|
353 |
+
"----------\n",
|
354 |
+
"#shape\n",
|
355 |
+
"Number of rows = 16719\n",
|
356 |
+
"Number of columns = 16\n",
|
357 |
+
"Shape = (16719, 16)\n"
|
358 |
+
]
|
359 |
+
}
|
360 |
+
],
|
361 |
+
"source": [
|
362 |
+
"print('----------')\n",
|
363 |
+
"print('#shape')\n",
|
364 |
+
"print('Number of rows = ', videoReview.shape[0])\n",
|
365 |
+
"print('Number of columns = ', videoReview.shape[1])\n",
|
366 |
+
"print('Shape = ', videoReview.shape)\n",
|
367 |
+
"\n"
|
368 |
+
]
|
369 |
+
},
|
370 |
+
{
|
371 |
+
"cell_type": "code",
|
372 |
+
"execution_count": 22,
|
373 |
+
"metadata": {},
|
374 |
+
"outputs": [
|
375 |
+
{
|
376 |
+
"name": "stdout",
|
377 |
+
"output_type": "stream",
|
378 |
+
"text": [
|
379 |
+
"----------\n",
|
380 |
+
"# Name #\n",
|
381 |
+
"Name\n",
|
382 |
+
"Need for Speed: Most Wanted 12\n",
|
383 |
+
"FIFA 14 9\n",
|
384 |
+
"Ratatouille 9\n",
|
385 |
+
"LEGO Marvel Super Heroes 9\n",
|
386 |
+
"Madden NFL 07 9\n",
|
387 |
+
" ..\n",
|
388 |
+
"Jewels of the Tropical Lost Island 1\n",
|
389 |
+
"Sherlock Holmes and the Mystery of Osborne House 1\n",
|
390 |
+
"The King of Fighters '95 (CD) 1\n",
|
391 |
+
"Megamind: Mega Team Unite 1\n",
|
392 |
+
"Haitaka no Psychedelica 1\n",
|
393 |
+
"Name: count, Length: 11562, dtype: int64\n"
|
394 |
+
]
|
395 |
+
}
|
396 |
+
],
|
397 |
+
"source": [
|
398 |
+
"print('----------')\n",
|
399 |
+
"print('# Name #')\n",
|
400 |
+
"print(videoReview['Name'].value_counts())\n",
|
401 |
+
"\n"
|
402 |
+
]
|
403 |
+
},
|
404 |
+
{
|
405 |
+
"cell_type": "code",
|
406 |
+
"execution_count": 23,
|
407 |
+
"metadata": {},
|
408 |
+
"outputs": [
|
409 |
+
{
|
410 |
+
"name": "stdout",
|
411 |
+
"output_type": "stream",
|
412 |
+
"text": [
|
413 |
+
"----------\n",
|
414 |
+
"#head\n",
|
415 |
+
" Name Platform Year_of_Release Genre Publisher \\\n",
|
416 |
+
"0 Wii Sports Wii 2006.0 Sports Nintendo \n",
|
417 |
+
"1 Super Mario Bros. NES 1985.0 Platform Nintendo \n",
|
418 |
+
"2 Mario Kart Wii Wii 2008.0 Racing Nintendo \n",
|
419 |
+
"3 Wii Sports Resort Wii 2009.0 Sports Nintendo \n",
|
420 |
+
"4 Pokemon Red/Pokemon Blue GB 1996.0 Role-Playing Nintendo \n",
|
421 |
+
"\n",
|
422 |
+
" NA_Sales EU_Sales JP_Sales Other_Sales Global_Sales Critic_Score \\\n",
|
423 |
+
"0 41.36 28.96 3.77 8.45 82.53 76.0 \n",
|
424 |
+
"1 29.08 3.58 6.81 0.77 40.24 NaN \n",
|
425 |
+
"2 15.68 12.76 3.79 3.29 35.52 82.0 \n",
|
426 |
+
"3 15.61 10.93 3.28 2.95 32.77 80.0 \n",
|
427 |
+
"4 11.27 8.89 10.22 1.00 31.37 NaN \n",
|
428 |
+
"\n",
|
429 |
+
" Critic_Count User_Score User_Count Developer Rating \n",
|
430 |
+
"0 51.0 8 322.0 Nintendo E \n",
|
431 |
+
"1 NaN NaN NaN NaN NaN \n",
|
432 |
+
"2 73.0 8.3 709.0 Nintendo E \n",
|
433 |
+
"3 73.0 8 192.0 Nintendo E \n",
|
434 |
+
"4 NaN NaN NaN NaN NaN \n",
|
435 |
+
" Name Platform Year_of_Release Genre Publisher \\\n",
|
436 |
+
"0 Wii Sports Wii 2006.0 Sports Nintendo \n",
|
437 |
+
"1 Super Mario Bros. NES 1985.0 Platform Nintendo \n",
|
438 |
+
"2 Mario Kart Wii Wii 2008.0 Racing Nintendo \n",
|
439 |
+
"3 Wii Sports Resort Wii 2009.0 Sports Nintendo \n",
|
440 |
+
"4 Pokemon Red/Pokemon Blue GB 1996.0 Role-Playing Nintendo \n",
|
441 |
+
"5 Tetris GB 1989.0 Puzzle Nintendo \n",
|
442 |
+
"6 New Super Mario Bros. DS 2006.0 Platform Nintendo \n",
|
443 |
+
"7 Wii Play Wii 2006.0 Misc Nintendo \n",
|
444 |
+
"\n",
|
445 |
+
" NA_Sales EU_Sales JP_Sales Other_Sales Global_Sales Critic_Score \\\n",
|
446 |
+
"0 41.36 28.96 3.77 8.45 82.53 76.0 \n",
|
447 |
+
"1 29.08 3.58 6.81 0.77 40.24 NaN \n",
|
448 |
+
"2 15.68 12.76 3.79 3.29 35.52 82.0 \n",
|
449 |
+
"3 15.61 10.93 3.28 2.95 32.77 80.0 \n",
|
450 |
+
"4 11.27 8.89 10.22 1.00 31.37 NaN \n",
|
451 |
+
"5 23.20 2.26 4.22 0.58 30.26 NaN \n",
|
452 |
+
"6 11.28 9.14 6.50 2.88 29.80 89.0 \n",
|
453 |
+
"7 13.96 9.18 2.93 2.84 28.92 58.0 \n",
|
454 |
+
"\n",
|
455 |
+
" Critic_Count User_Score User_Count Developer Rating \n",
|
456 |
+
"0 51.0 8 322.0 Nintendo E \n",
|
457 |
+
"1 NaN NaN NaN NaN NaN \n",
|
458 |
+
"2 73.0 8.3 709.0 Nintendo E \n",
|
459 |
+
"3 73.0 8 192.0 Nintendo E \n",
|
460 |
+
"4 NaN NaN NaN NaN NaN \n",
|
461 |
+
"5 NaN NaN NaN NaN NaN \n",
|
462 |
+
"6 65.0 8.5 431.0 Nintendo E \n",
|
463 |
+
"7 41.0 6.6 129.0 Nintendo E \n"
|
464 |
+
]
|
465 |
+
}
|
466 |
+
],
|
467 |
+
"source": [
|
468 |
+
"print('----------')\n",
|
469 |
+
"print('#head')\n",
|
470 |
+
"print(videoReview.head())\n",
|
471 |
+
"print(videoReview.head(8))\n",
|
472 |
+
"\n"
|
473 |
+
]
|
474 |
+
},
|
475 |
+
{
|
476 |
+
"cell_type": "code",
|
477 |
+
"execution_count": 24,
|
478 |
+
"metadata": {},
|
479 |
+
"outputs": [
|
480 |
+
{
|
481 |
+
"name": "stdout",
|
482 |
+
"output_type": "stream",
|
483 |
+
"text": [
|
484 |
+
"----------\n",
|
485 |
+
"#tail\n",
|
486 |
+
" Name Platform Year_of_Release Genre \\\n",
|
487 |
+
"16714 Samurai Warriors: Sanada Maru PS3 2016.0 Action \n",
|
488 |
+
"16715 LMA Manager 2007 X360 2006.0 Sports \n",
|
489 |
+
"16716 Haitaka no Psychedelica PSV 2016.0 Adventure \n",
|
490 |
+
"16717 Spirits & Spells GBA 2003.0 Platform \n",
|
491 |
+
"16718 Winning Post 8 2016 PSV 2016.0 Simulation \n",
|
492 |
+
"\n",
|
493 |
+
" Publisher NA_Sales EU_Sales JP_Sales Other_Sales Global_Sales \\\n",
|
494 |
+
"16714 Tecmo Koei 0.00 0.00 0.01 0.0 0.01 \n",
|
495 |
+
"16715 Codemasters 0.00 0.01 0.00 0.0 0.01 \n",
|
496 |
+
"16716 Idea Factory 0.00 0.00 0.01 0.0 0.01 \n",
|
497 |
+
"16717 Wanadoo 0.01 0.00 0.00 0.0 0.01 \n",
|
498 |
+
"16718 Tecmo Koei 0.00 0.00 0.01 0.0 0.01 \n",
|
499 |
+
"\n",
|
500 |
+
" Critic_Score Critic_Count User_Score User_Count Developer Rating \n",
|
501 |
+
"16714 NaN NaN NaN NaN NaN NaN \n",
|
502 |
+
"16715 NaN NaN NaN NaN NaN NaN \n",
|
503 |
+
"16716 NaN NaN NaN NaN NaN NaN \n",
|
504 |
+
"16717 NaN NaN NaN NaN NaN NaN \n",
|
505 |
+
"16718 NaN NaN NaN NaN NaN NaN \n",
|
506 |
+
" Name Platform \\\n",
|
507 |
+
"16711 Aiyoku no Eustia PSV \n",
|
508 |
+
"16712 Woody Woodpecker in Crazy Castle 5 GBA \n",
|
509 |
+
"16713 SCORE International Baja 1000: The Official Game PS2 \n",
|
510 |
+
"16714 Samurai Warriors: Sanada Maru PS3 \n",
|
511 |
+
"16715 LMA Manager 2007 X360 \n",
|
512 |
+
"16716 Haitaka no Psychedelica PSV \n",
|
513 |
+
"16717 Spirits & Spells GBA \n",
|
514 |
+
"16718 Winning Post 8 2016 PSV \n",
|
515 |
+
"\n",
|
516 |
+
" Year_of_Release Genre Publisher NA_Sales EU_Sales \\\n",
|
517 |
+
"16711 2014.0 Misc dramatic create 0.00 0.00 \n",
|
518 |
+
"16712 2002.0 Platform Kemco 0.01 0.00 \n",
|
519 |
+
"16713 2008.0 Racing Activision 0.00 0.00 \n",
|
520 |
+
"16714 2016.0 Action Tecmo Koei 0.00 0.00 \n",
|
521 |
+
"16715 2006.0 Sports Codemasters 0.00 0.01 \n",
|
522 |
+
"16716 2016.0 Adventure Idea Factory 0.00 0.00 \n",
|
523 |
+
"16717 2003.0 Platform Wanadoo 0.01 0.00 \n",
|
524 |
+
"16718 2016.0 Simulation Tecmo Koei 0.00 0.00 \n",
|
525 |
+
"\n",
|
526 |
+
" JP_Sales Other_Sales Global_Sales Critic_Score Critic_Count \\\n",
|
527 |
+
"16711 0.01 0.0 0.01 NaN NaN \n",
|
528 |
+
"16712 0.00 0.0 0.01 NaN NaN \n",
|
529 |
+
"16713 0.00 0.0 0.01 NaN NaN \n",
|
530 |
+
"16714 0.01 0.0 0.01 NaN NaN \n",
|
531 |
+
"16715 0.00 0.0 0.01 NaN NaN \n",
|
532 |
+
"16716 0.01 0.0 0.01 NaN NaN \n",
|
533 |
+
"16717 0.00 0.0 0.01 NaN NaN \n",
|
534 |
+
"16718 0.01 0.0 0.01 NaN NaN \n",
|
535 |
+
"\n",
|
536 |
+
" User_Score User_Count Developer Rating \n",
|
537 |
+
"16711 NaN NaN NaN NaN \n",
|
538 |
+
"16712 NaN NaN NaN NaN \n",
|
539 |
+
"16713 NaN NaN NaN NaN \n",
|
540 |
+
"16714 NaN NaN NaN NaN \n",
|
541 |
+
"16715 NaN NaN NaN NaN \n",
|
542 |
+
"16716 NaN NaN NaN NaN \n",
|
543 |
+
"16717 NaN NaN NaN NaN \n",
|
544 |
+
"16718 NaN NaN NaN NaN \n"
|
545 |
+
]
|
546 |
+
}
|
547 |
+
],
|
548 |
+
"source": [
|
549 |
+
"print('----------')\n",
|
550 |
+
"print('#tail')\n",
|
551 |
+
"print(videoReview.tail())\n",
|
552 |
+
"print(videoReview.tail(8))\n",
|
553 |
+
"\n"
|
554 |
+
]
|
555 |
+
},
|
556 |
+
{
|
557 |
+
"cell_type": "code",
|
558 |
+
"execution_count": 25,
|
559 |
+
"metadata": {},
|
560 |
+
"outputs": [
|
561 |
+
{
|
562 |
+
"name": "stdout",
|
563 |
+
"output_type": "stream",
|
564 |
+
"text": [
|
565 |
+
"----------\n",
|
566 |
+
"#Describe\n",
|
567 |
+
" Year_of_Release NA_Sales EU_Sales JP_Sales \\\n",
|
568 |
+
"count 16450.000000 16719.000000 16719.000000 16719.000000 \n",
|
569 |
+
"mean 2006.487356 0.263330 0.145025 0.077602 \n",
|
570 |
+
"std 5.878995 0.813514 0.503283 0.308818 \n",
|
571 |
+
"min 1980.000000 0.000000 0.000000 0.000000 \n",
|
572 |
+
"25% 2003.000000 0.000000 0.000000 0.000000 \n",
|
573 |
+
"50% 2007.000000 0.080000 0.020000 0.000000 \n",
|
574 |
+
"75% 2010.000000 0.240000 0.110000 0.040000 \n",
|
575 |
+
"max 2020.000000 41.360000 28.960000 10.220000 \n",
|
576 |
+
"\n",
|
577 |
+
" Other_Sales Global_Sales Critic_Score Critic_Count User_Count \n",
|
578 |
+
"count 16719.000000 16719.000000 8137.000000 8137.000000 7590.000000 \n",
|
579 |
+
"mean 0.047332 0.533543 68.967679 26.360821 162.229908 \n",
|
580 |
+
"std 0.186710 1.547935 13.938165 18.980495 561.282326 \n",
|
581 |
+
"min 0.000000 0.010000 13.000000 3.000000 4.000000 \n",
|
582 |
+
"25% 0.000000 0.060000 60.000000 12.000000 10.000000 \n",
|
583 |
+
"50% 0.010000 0.170000 71.000000 21.000000 24.000000 \n",
|
584 |
+
"75% 0.030000 0.470000 79.000000 36.000000 81.000000 \n",
|
585 |
+
"max 10.570000 82.530000 98.000000 113.000000 10665.000000 \n",
|
586 |
+
"count 16719\n",
|
587 |
+
"unique 31\n",
|
588 |
+
"top PS2\n",
|
589 |
+
"freq 2161\n",
|
590 |
+
"Name: Platform, dtype: object\n",
|
591 |
+
"count 16450.000000\n",
|
592 |
+
"mean 2006.487356\n",
|
593 |
+
"std 5.878995\n",
|
594 |
+
"min 1980.000000\n",
|
595 |
+
"25% 2003.000000\n",
|
596 |
+
"50% 2007.000000\n",
|
597 |
+
"75% 2010.000000\n",
|
598 |
+
"max 2020.000000\n",
|
599 |
+
"Name: Year_of_Release, dtype: float64\n"
|
600 |
+
]
|
601 |
+
}
|
602 |
+
],
|
603 |
+
"source": [
|
604 |
+
"print('----------')\n",
|
605 |
+
"print('#Describe')\n",
|
606 |
+
"print(videoReview.describe()) # summary statistics (count, mean, std, min, quartiles, max) for the numeric columns\n",
|
607 |
+
"print(videoReview['Platform'].describe())\n",
|
608 |
+
"print(videoReview['Year_of_Release'].describe())\n",
|
609 |
+
"\n"
|
610 |
+
]
|
611 |
+
},
|
612 |
+
{
|
613 |
+
"cell_type": "code",
|
614 |
+
"execution_count": 26,
|
615 |
+
"metadata": {},
|
616 |
+
"outputs": [
|
617 |
+
{
|
618 |
+
"name": "stdout",
|
619 |
+
"output_type": "stream",
|
620 |
+
"text": [
|
621 |
+
"----------\n",
|
622 |
+
"#info\n",
|
623 |
+
"<class 'pandas.core.frame.DataFrame'>\n",
|
624 |
+
"RangeIndex: 16719 entries, 0 to 16718\n",
|
625 |
+
"Data columns (total 16 columns):\n",
|
626 |
+
" # Column Non-Null Count Dtype \n",
|
627 |
+
"--- ------ -------------- ----- \n",
|
628 |
+
" 0 Name 16717 non-null object \n",
|
629 |
+
" 1 Platform 16719 non-null object \n",
|
630 |
+
" 2 Year_of_Release 16450 non-null float64\n",
|
631 |
+
" 3 Genre 16717 non-null object \n",
|
632 |
+
" 4 Publisher 16665 non-null object \n",
|
633 |
+
" 5 NA_Sales 16719 non-null float64\n",
|
634 |
+
" 6 EU_Sales 16719 non-null float64\n",
|
635 |
+
" 7 JP_Sales 16719 non-null float64\n",
|
636 |
+
" 8 Other_Sales 16719 non-null float64\n",
|
637 |
+
" 9 Global_Sales 16719 non-null float64\n",
|
638 |
+
" 10 Critic_Score 8137 non-null float64\n",
|
639 |
+
" 11 Critic_Count 8137 non-null float64\n",
|
640 |
+
" 12 User_Score 10015 non-null object \n",
|
641 |
+
" 13 User_Count 7590 non-null float64\n",
|
642 |
+
" 14 Developer 10096 non-null object \n",
|
643 |
+
" 15 Rating 9950 non-null object \n",
|
644 |
+
"dtypes: float64(9), object(7)\n",
|
645 |
+
"memory usage: 2.0+ MB\n"
|
647 |
+
]
|
648 |
+
}
|
649 |
+
],
|
650 |
+
"source": [
|
651 |
+
"print('----------')\n",
|
652 |
+
"print('#info')\n",
|
653 |
+
"videoReview.info() # prints dtypes, non-null counts and memory footprint itself; it returns None, so it is not wrapped in print()"
|
654 |
+
]
|
655 |
+
},
|
656 |
+
{
|
657 |
+
"cell_type": "code",
|
658 |
+
"execution_count": 27,
|
659 |
+
"metadata": {},
|
660 |
+
"outputs": [
|
661 |
+
{
|
662 |
+
"name": "stdout",
|
663 |
+
"output_type": "stream",
|
664 |
+
"text": [
|
665 |
+
"----------\n",
|
666 |
+
"Transpose - Describe\n",
|
667 |
+
" count mean std min 25% 50% \\\n",
|
668 |
+
"Year_of_Release 16450.0 2006.487356 5.878995 1980.00 2003.00 2007.00 \n",
|
669 |
+
"NA_Sales 16719.0 0.263330 0.813514 0.00 0.00 0.08 \n",
|
670 |
+
"EU_Sales 16719.0 0.145025 0.503283 0.00 0.00 0.02 \n",
|
671 |
+
"JP_Sales 16719.0 0.077602 0.308818 0.00 0.00 0.00 \n",
|
672 |
+
"Other_Sales 16719.0 0.047332 0.186710 0.00 0.00 0.01 \n",
|
673 |
+
"Global_Sales 16719.0 0.533543 1.547935 0.01 0.06 0.17 \n",
|
674 |
+
"Critic_Score 8137.0 68.967679 13.938165 13.00 60.00 71.00 \n",
|
675 |
+
"Critic_Count 8137.0 26.360821 18.980495 3.00 12.00 21.00 \n",
|
676 |
+
"User_Count 7590.0 162.229908 561.282326 4.00 10.00 24.00 \n",
|
677 |
+
"\n",
|
678 |
+
" 75% max \n",
|
679 |
+
"Year_of_Release 2010.00 2020.00 \n",
|
680 |
+
"NA_Sales 0.24 41.36 \n",
|
681 |
+
"EU_Sales 0.11 28.96 \n",
|
682 |
+
"JP_Sales 0.04 10.22 \n",
|
683 |
+
"Other_Sales 0.03 10.57 \n",
|
684 |
+
"Global_Sales 0.47 82.53 \n",
|
685 |
+
"Critic_Score 79.00 98.00 \n",
|
686 |
+
"Critic_Count 36.00 113.00 \n",
|
687 |
+
"User_Count 81.00 10665.00 \n"
|
688 |
+
]
|
689 |
+
}
|
690 |
+
],
|
691 |
+
"source": [
|
692 |
+
"print('----------')\n",
|
693 |
+
"print('Transpose - Describe')\n",
|
694 |
+
"print(videoReview.describe().transpose())\n",
|
695 |
+
"\n"
|
696 |
+
]
|
697 |
+
},
|
698 |
+
{
|
699 |
+
"cell_type": "code",
|
700 |
+
"execution_count": 28,
|
701 |
+
"metadata": {},
|
702 |
+
"outputs": [
|
703 |
+
{
|
704 |
+
"name": "stdout",
|
705 |
+
"output_type": "stream",
|
706 |
+
"text": [
|
707 |
+
"----------\n",
|
708 |
+
"Iterate some rows from DF\n",
|
709 |
+
"#### Printing row ####\n",
|
710 |
+
"Wii Sports\n",
|
711 |
+
"#### Printing row ####\n",
|
712 |
+
"Super Mario Bros.\n",
|
713 |
+
"#### Printing row ####\n",
|
714 |
+
"Mario Kart Wii\n",
|
715 |
+
"#### Printing row ####\n",
|
716 |
+
"Wii Sports Resort\n",
|
717 |
+
"#### Printing row ####\n",
|
718 |
+
"Pokemon Red/Pokemon Blue\n",
|
719 |
+
"#### Printing row ####\n",
|
720 |
+
"Tetris\n",
|
721 |
+
"#### Printing row ####\n",
|
722 |
+
"New Super Mario Bros.\n",
|
723 |
+
"#### Printing row ####\n",
|
724 |
+
"Wii Play\n",
|
725 |
+
"#### Printing row ####\n",
|
726 |
+
"New Super Mario Bros. Wii\n"
|
727 |
+
]
|
728 |
+
}
|
729 |
+
],
|
730 |
+
"source": [
|
731 |
+
"print('----------')\n",
|
732 |
+
"print('Iterate some rows from DF')\n",
|
733 |
+
"for i, row in videoReview[:9].iterrows():\n",
|
734 |
+
" print('#### Printing row ####')\n",
|
735 |
+
" print(row['Name'])\n",
|
736 |
+
"\n"
|
737 |
+
]
|
738 |
+
},
|
739 |
+
{
|
740 |
+
"cell_type": "code",
|
741 |
+
"execution_count": 29,
|
742 |
+
"metadata": {},
|
743 |
+
"outputs": [
|
744 |
+
{
|
745 |
+
"name": "stdout",
|
746 |
+
"output_type": "stream",
|
747 |
+
"text": [
|
748 |
+
"----------\n",
|
749 |
+
"Group by Year, Platform by Count : \n",
|
750 |
+
" Name Genre Publisher NA_Sales EU_Sales \\\n",
|
751 |
+
"Year_of_Release Platform \n",
|
752 |
+
"1980.0 2600 9 9 9 9 9 \n",
|
753 |
+
"1981.0 2600 46 46 46 46 46 \n",
|
754 |
+
"1982.0 2600 36 36 36 36 36 \n",
|
755 |
+
"1983.0 2600 11 11 11 11 11 \n",
|
756 |
+
" NES 6 6 6 6 6 \n",
|
757 |
+
"... ... ... ... ... ... \n",
|
758 |
+
"2016.0 X360 13 13 13 13 13 \n",
|
759 |
+
" XOne 87 87 87 87 87 \n",
|
760 |
+
"2017.0 PS4 1 1 1 1 1 \n",
|
761 |
+
" PSV 2 2 2 2 2 \n",
|
762 |
+
"2020.0 DS 1 1 1 1 1 \n",
|
763 |
+
"\n",
|
764 |
+
" JP_Sales Other_Sales Global_Sales Critic_Score \\\n",
|
765 |
+
"Year_of_Release Platform \n",
|
766 |
+
"1980.0 2600 9 9 9 0 \n",
|
767 |
+
"1981.0 2600 46 46 46 0 \n",
|
768 |
+
"1982.0 2600 36 36 36 0 \n",
|
769 |
+
"1983.0 2600 11 11 11 0 \n",
|
770 |
+
" NES 6 6 6 0 \n",
|
771 |
+
"... ... ... ... ... \n",
|
772 |
+
"2016.0 X360 13 13 13 0 \n",
|
773 |
+
" XOne 87 87 87 60 \n",
|
774 |
+
"2017.0 PS4 1 1 1 0 \n",
|
775 |
+
" PSV 2 2 2 0 \n",
|
776 |
+
"2020.0 DS 1 1 1 0 \n",
|
777 |
+
"\n",
|
778 |
+
" Critic_Count User_Score User_Count Developer \\\n",
|
779 |
+
"Year_of_Release Platform \n",
|
780 |
+
"1980.0 2600 0 0 0 0 \n",
|
781 |
+
"1981.0 2600 0 0 0 0 \n",
|
782 |
+
"1982.0 2600 0 0 0 0 \n",
|
783 |
+
"1983.0 2600 0 0 0 0 \n",
|
784 |
+
" NES 0 0 0 0 \n",
|
785 |
+
"... ... ... ... ... \n",
|
786 |
+
"2016.0 X360 0 12 7 12 \n",
|
787 |
+
" XOne 60 74 66 74 \n",
|
788 |
+
"2017.0 PS4 0 0 0 0 \n",
|
789 |
+
" PSV 0 0 0 0 \n",
|
790 |
+
"2020.0 DS 0 1 0 1 \n",
|
791 |
+
"\n",
|
792 |
+
" Rating \n",
|
793 |
+
"Year_of_Release Platform \n",
|
794 |
+
"1980.0 2600 0 \n",
|
795 |
+
"1981.0 2600 0 \n",
|
796 |
+
"1982.0 2600 0 \n",
|
797 |
+
"1983.0 2600 0 \n",
|
798 |
+
" NES 0 \n",
|
799 |
+
"... ... \n",
|
800 |
+
"2016.0 X360 12 \n",
|
801 |
+
" XOne 71 \n",
|
802 |
+
"2017.0 PS4 0 \n",
|
803 |
+
" PSV 0 \n",
|
804 |
+
"2020.0 DS 1 \n",
|
805 |
+
"\n",
|
806 |
+
"[241 rows x 14 columns]\n"
|
807 |
+
]
|
808 |
+
}
|
809 |
+
],
|
810 |
+
"source": [
|
811 |
+
"#Subsetting and ordering Pandas\n",
|
812 |
+
"print('----------')\n",
|
813 |
+
"print('Group by Year, Platform by Count : ')\n",
|
814 |
+
"print(videoReview.groupby(['Year_of_Release','Platform']).count())\n",
|
815 |
+
"\n"
|
816 |
+
]
|
817 |
+
},
|
818 |
+
{
|
819 |
+
"cell_type": "code",
|
820 |
+
"execution_count": 30,
|
821 |
+
"metadata": {},
|
822 |
+
"outputs": [
|
823 |
+
{
|
824 |
+
"name": "stdout",
|
825 |
+
"output_type": "stream",
|
826 |
+
"text": [
|
827 |
+
"Group by : for Year=2016 group by Platform and count Global Sales\n",
|
828 |
+
"Platform\n",
|
829 |
+
"3DS 15.14\n",
|
830 |
+
"PC 5.27\n",
|
831 |
+
"PS3 3.58\n",
|
832 |
+
"PS4 69.29\n",
|
833 |
+
"PSV 4.27\n",
|
834 |
+
"Wii 0.18\n",
|
835 |
+
"WiiU 4.58\n",
|
836 |
+
"X360 1.52\n",
|
837 |
+
"XOne 26.27\n",
|
838 |
+
"Name: Global_Sales, dtype: float64\n"
|
839 |
+
]
|
840 |
+
}
|
841 |
+
],
|
842 |
+
"source": [
|
843 |
+
"print('Group by : for Year=2016 group by Platform and count Global Sales')\n",
|
844 |
+
"print(videoReview[videoReview.Year_of_Release==2016.0].groupby('Platform')['Global_Sales'].sum())"
|
845 |
+
]
|
846 |
+
},
|
847 |
+
{
|
848 |
+
"cell_type": "code",
|
849 |
+
"execution_count": 31,
|
850 |
+
"metadata": {},
|
851 |
+
"outputs": [
|
852 |
+
{
|
853 |
+
"name": "stdout",
|
854 |
+
"output_type": "stream",
|
855 |
+
"text": [
|
856 |
+
"----------\n",
|
857 |
+
"Sorting and Ordering\n",
|
858 |
+
" Name Platform Year_of_Release Genre Publisher \\\n",
|
859 |
+
"15 Wii Fit Plus Wii 2009.0 Sports Nintendo \n",
|
860 |
+
"8 New Super Mario Bros. Wii Wii 2009.0 Platform Nintendo \n",
|
861 |
+
"7 Wii Play Wii 2006.0 Misc Nintendo \n",
|
862 |
+
"3 Wii Sports Resort Wii 2009.0 Sports Nintendo \n",
|
863 |
+
"2 Mario Kart Wii Wii 2008.0 Racing Nintendo \n",
|
864 |
+
"0 Wii Sports Wii 2006.0 Sports Nintendo \n",
|
865 |
+
"\n",
|
866 |
+
" NA_Sales EU_Sales JP_Sales Other_Sales Global_Sales Critic_Score \\\n",
|
867 |
+
"15 9.01 8.49 2.53 1.77 21.79 80.0 \n",
|
868 |
+
"8 14.44 6.94 4.70 2.24 28.32 87.0 \n",
|
869 |
+
"7 13.96 9.18 2.93 2.84 28.92 58.0 \n",
|
870 |
+
"3 15.61 10.93 3.28 2.95 32.77 80.0 \n",
|
871 |
+
"2 15.68 12.76 3.79 3.29 35.52 82.0 \n",
|
872 |
+
"0 41.36 28.96 3.77 8.45 82.53 76.0 \n",
|
873 |
+
"\n",
|
874 |
+
" Critic_Count User_Score User_Count Developer Rating \n",
|
875 |
+
"15 33.0 7.4 52.0 Nintendo E \n",
|
876 |
+
"8 80.0 8.4 594.0 Nintendo E \n",
|
877 |
+
"7 41.0 6.6 129.0 Nintendo E \n",
|
878 |
+
"3 73.0 8 192.0 Nintendo E \n",
|
879 |
+
"2 73.0 8.3 709.0 Nintendo E \n",
|
880 |
+
"0 51.0 8 322.0 Nintendo E \n"
|
881 |
+
]
|
882 |
+
}
|
883 |
+
],
|
884 |
+
"source": [
|
885 |
+
"#More pandas functions - Sorting & Ordering\n",
|
886 |
+
"print('----------')\n",
|
887 |
+
"print('Sorting and Ordering')\n",
|
888 |
+
"df = videoReview[(videoReview.Platform=='Wii') & (videoReview.NA_Sales>9)]\n",
|
889 |
+
"print(df.sort_values('Global_Sales', ascending=True))"
|
890 |
+
]
|
891 |
+
},
|
892 |
+
{
|
893 |
+
"cell_type": "code",
|
894 |
+
"execution_count": 34,
|
895 |
+
"metadata": {},
|
896 |
+
"outputs": [],
|
897 |
+
"source": [
|
898 |
+
"#Writing a pandas DataFrame to a CSV file\n",
|
899 |
+
"df.to_csv('/Users/Rafael/video_games_wii.csv', sep=',')"
|
900 |
+
]
|
901 |
+
},
|
902 |
+
{
|
903 |
+
"cell_type": "code",
|
904 |
+
"execution_count": 37,
|
905 |
+
"metadata": {},
|
906 |
+
"outputs": [
|
907 |
+
{
|
908 |
+
"name": "stdout",
|
909 |
+
"output_type": "stream",
|
910 |
+
"text": [
|
911 |
+
" Unnamed: 0 Name Platform Year_of_Release Genre \\\n",
|
912 |
+
"0 0 Wii Sports Wii 2006.0 Sports \n",
|
913 |
+
"1 2 Mario Kart Wii Wii 2008.0 Racing \n",
|
914 |
+
"2 3 Wii Sports Resort Wii 2009.0 Sports \n",
|
915 |
+
"3 7 Wii Play Wii 2006.0 Misc \n",
|
916 |
+
"4 8 New Super Mario Bros. Wii Wii 2009.0 Platform \n",
|
917 |
+
"\n",
|
918 |
+
" Publisher NA_Sales EU_Sales JP_Sales Other_Sales Global_Sales \\\n",
|
919 |
+
"0 Nintendo 41.36 28.96 3.77 8.45 82.53 \n",
|
920 |
+
"1 Nintendo 15.68 12.76 3.79 3.29 35.52 \n",
|
921 |
+
"2 Nintendo 15.61 10.93 3.28 2.95 32.77 \n",
|
922 |
+
"3 Nintendo 13.96 9.18 2.93 2.84 28.92 \n",
|
923 |
+
"4 Nintendo 14.44 6.94 4.70 2.24 28.32 \n",
|
924 |
+
"\n",
|
925 |
+
" Critic_Score Critic_Count User_Score User_Count Developer Rating \n",
|
926 |
+
"0 76.0 51.0 8.0 322.0 Nintendo E \n",
|
927 |
+
"1 82.0 73.0 8.3 709.0 Nintendo E \n",
|
928 |
+
"2 80.0 73.0 8.0 192.0 Nintendo E \n",
|
929 |
+
"3 58.0 41.0 6.6 129.0 Nintendo E \n",
|
930 |
+
"4 87.0 80.0 8.4 594.0 Nintendo E \n"
|
931 |
+
]
|
932 |
+
}
|
933 |
+
],
|
934 |
+
"source": [
|
935 |
+
"#Go inspect the CSV file created.\n",
|
936 |
+
"#Is it what you expected?\n",
|
937 |
+
"#Could the output be formatted differently?\n",
|
938 |
+
"#If so, look up the Pandas 'to_csv' function to see what you can change\n",
|
939 |
+
"df_loaded = pd.read_csv('/Users/Rafael/video_games_wii.csv', sep=',')\n",
|
940 |
+
"print(df_loaded.head())"
|
941 |
+
]
|
942 |
+
},
|
943 |
+
{
|
944 |
+
"cell_type": "code",
|
945 |
+
"execution_count": 41,
|
946 |
+
"metadata": {},
|
947 |
+
"outputs": [],
|
960 |
+
"source": [
|
961 |
+
"with open('/Users/Rafael/video_games_wii.csv') as f: print(f.read()) # Get-Content is a PowerShell command, not Python; read the file with Python instead"
|
962 |
+
]
|
963 |
+
},
|
964 |
+
{
|
965 |
+
"cell_type": "code",
|
966 |
+
"execution_count": 40,
|
967 |
+
"metadata": {},
|
968 |
+
"outputs": [],
|
969 |
+
"source": [
|
970 |
+
"df.to_csv('/Users/Rafael/video_games_wii.csv', sep=',', index=False)"
|
971 |
+
]
|
972 |
+
},
|
973 |
+
{
|
974 |
+
"cell_type": "code",
|
975 |
+
"execution_count": 42,
|
976 |
+
"metadata": {},
|
977 |
+
"outputs": [
|
978 |
+
{
|
979 |
+
"name": "stdout",
|
980 |
+
"output_type": "stream",
|
981 |
+
"text": [
|
982 |
+
"----------\n",
|
983 |
+
"Plotting - Histogram\n"
|
984 |
+
]
|
985 |
+
},
|
986 |
+
{
|
987 |
+
"data": {
|
988 |
+
"text/plain": [
|
989 |
+
"<Axes: ylabel='Frequency'>"
|
990 |
+
]
|
991 |
+
},
|
992 |
+
"execution_count": 42,
|
993 |
+
"metadata": {},
|
994 |
+
"output_type": "execute_result"
|
995 |
+
},
|
996 |
+
{
|
997 |
+
"data": {
|
998 |
+
"image/png": "",
|
999 |
+
"text/plain": [
|
1000 |
+
"<Figure size 640x480 with 1 Axes>"
|
1001 |
+
]
|
1002 |
+
},
|
1003 |
+
"metadata": {},
|
1004 |
+
"output_type": "display_data"
|
1005 |
+
}
|
1006 |
+
],
|
1007 |
+
"source": [
|
1008 |
+
"#Creating graphs from a pandas DataFrame\n",
|
1009 |
+
"#plotting\n",
|
1010 |
+
"print('----------')\n",
|
1011 |
+
"print('Plotting - Histogram')\n",
|
1012 |
+
"videoReview['Year_of_Release'].plot(kind='hist')"
|
1013 |
+
]
|
1014 |
+
},
|
1015 |
+
{
|
1016 |
+
"cell_type": "code",
|
1017 |
+
"execution_count": null,
|
1018 |
+
"metadata": {},
|
1019 |
+
"outputs": [],
|
1020 |
+
"source": [
|
1021 |
+
"#If the chart does not appear in the above cell, just go back to that cell and rerun. It should appear now."
|
1022 |
+
]
|
1023 |
+
},
|
1024 |
+
{
|
1025 |
+
"cell_type": "code",
|
1026 |
+
"execution_count": null,
|
1027 |
+
"metadata": {},
|
1028 |
+
"outputs": [],
|
1029 |
+
"source": [
|
1030 |
+
"#Can you create any other plots?"
|
1031 |
+
]
|
1032 |
+
},
|
1033 |
+
{
|
1034 |
+
"cell_type": "code",
|
1035 |
+
"execution_count": null,
|
1036 |
+
"metadata": {},
|
1037 |
+
"outputs": [],
|
1038 |
+
"source": []
|
1039 |
+
},
|
1040 |
+
{
|
1041 |
+
"cell_type": "code",
|
1042 |
+
"execution_count": null,
|
1043 |
+
"metadata": {},
|
1044 |
+
"outputs": [],
|
1045 |
+
"source": []
|
1046 |
+
},
|
1047 |
+
{
|
1048 |
+
"cell_type": "code",
|
1049 |
+
"execution_count": null,
|
1050 |
+
"metadata": {},
|
1051 |
+
"outputs": [],
|
1052 |
+
"source": []
|
1053 |
+
},
|
1054 |
+
{
|
1055 |
+
"cell_type": "code",
|
1056 |
+
"execution_count": null,
|
1057 |
+
"metadata": {},
|
1058 |
+
"outputs": [],
|
1059 |
+
"source": []
|
1060 |
+
}
|
1061 |
+
],
|
1062 |
+
"metadata": {
|
1063 |
+
"kernelspec": {
|
1064 |
+
"display_name": "Python 3 (ipykernel)",
|
1065 |
+
"language": "python",
|
1066 |
+
"name": "python3"
|
1067 |
+
},
|
1068 |
+
"language_info": {
|
1069 |
+
"codemirror_mode": {
|
1070 |
+
"name": "ipython",
|
1071 |
+
"version": 3
|
1072 |
+
},
|
1073 |
+
"file_extension": ".py",
|
1074 |
+
"mimetype": "text/x-python",
|
1075 |
+
"name": "python",
|
1076 |
+
"nbconvert_exporter": "python",
|
1077 |
+
"pygments_lexer": "ipython3",
|
1078 |
+
"version": "3.12.9"
|
1079 |
+
}
|
1080 |
+
},
|
1081 |
+
"nbformat": 4,
|
1082 |
+
"nbformat_minor": 4
|
1083 |
+
}
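The notebook above writes the filtered frame to CSV twice, once with the default settings and once with index=False, and the reloaded file shows an extra 'Unnamed: 0' column in the first case. The following is a minimal sketch of that difference, assuming a tiny made-up frame and an in-memory buffer instead of the real file; the helper name round_trip is hypothetical and not part of the lab.

```python
import io

import pandas as pd

# Tiny hypothetical frame standing in for the filtered Wii rows. The
# non-default index (0, 2, 3) mimics what boolean filtering leaves behind.
df = pd.DataFrame(
    {
        "Name": ["Wii Sports", "Mario Kart Wii", "Wii Sports Resort"],
        "Global_Sales": [82.53, 35.52, 32.77],
    },
    index=[0, 2, 3],
)


def round_trip(frame, **to_csv_kwargs):
    """Write frame to an in-memory CSV and read it straight back."""
    buffer = io.StringIO()
    frame.to_csv(buffer, **to_csv_kwargs)
    buffer.seek(0)
    return pd.read_csv(buffer)


with_index = round_trip(df)                  # index written as an unnamed first column
without_index = round_trip(df, index=False)  # index left out of the file

print(list(with_index.columns))
print(list(without_index.columns))
```

Whether to keep the index depends on whether it carries meaning. Here it is only the row position in the original dataset, so dropping it keeps the file tidy.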
|
Data Analitics/Week 3/TU257_Lab1-Introduction.ipynb
ADDED
@@ -0,0 +1,235 @@
|
1 |
+
{
|
2 |
+
"cells": [
|
3 |
+
{
|
4 |
+
"cell_type": "code",
|
5 |
+
"execution_count": 1,
|
6 |
+
"metadata": {},
|
7 |
+
"outputs": [
|
8 |
+
{
|
9 |
+
"name": "stdout",
|
10 |
+
"output_type": "stream",
|
11 |
+
"text": [
|
12 |
+
"3.7.3 (v3.7.3:ef4ec6ed12, Mar 25 2019, 16:52:21) \n",
|
13 |
+
"[Clang 6.0 (clang-600.0.57)]\n",
|
14 |
+
"sys.version_info(major=3, minor=7, micro=3, releaselevel='final', serial=0)\n"
|
15 |
+
]
|
16 |
+
}
|
17 |
+
],
|
18 |
+
"source": [
|
19 |
+
"import sys\n",
|
20 |
+
"import platform\n",
|
21 |
+
"\n",
|
22 |
+
"#Print Python version details\n",
|
23 |
+
"print(sys.version)\n",
|
24 |
+
"print(sys.version_info)\n"
|
25 |
+
]
|
26 |
+
},
|
27 |
+
{
|
28 |
+
"cell_type": "code",
|
29 |
+
"execution_count": 2,
|
30 |
+
"metadata": {},
|
31 |
+
"outputs": [
|
32 |
+
{
|
33 |
+
"name": "stdout",
|
34 |
+
"output_type": "stream",
|
35 |
+
"text": [
|
36 |
+
"x86_64\n",
|
37 |
+
"Darwin Kernel Version 19.6.0: Tue Jun 21 21:18:39 PDT 2022; root:xnu-6153.141.66~1/RELEASE_X86_64\n",
|
38 |
+
"Darwin\n",
|
39 |
+
"i386\n"
|
40 |
+
]
|
41 |
+
}
|
42 |
+
],
|
43 |
+
"source": [
|
44 |
+
"#Print details about your computer\n",
|
45 |
+
"\n",
|
46 |
+
"print(platform.machine())\n",
|
47 |
+
"print(platform.version())\n",
|
48 |
+
"print(platform.system())\n",
|
49 |
+
"print(platform.processor())"
|
50 |
+
]
|
51 |
+
},
|
52 |
+
{
|
53 |
+
"cell_type": "code",
|
54 |
+
"execution_count": null,
|
55 |
+
"metadata": {},
|
56 |
+
"outputs": [],
|
57 |
+
"source": [
|
58 |
+
"#Write code to print your name"
|
59 |
+
]
|
60 |
+
},
|
61 |
+
{
|
62 |
+
"cell_type": "code",
|
63 |
+
"execution_count": null,
|
64 |
+
"metadata": {},
|
65 |
+
"outputs": [],
|
66 |
+
"source": []
|
67 |
+
},
|
68 |
+
{
|
69 |
+
"cell_type": "code",
|
70 |
+
"execution_count": null,
|
71 |
+
"metadata": {},
|
72 |
+
"outputs": [],
|
73 |
+
"source": [
|
74 |
+
"#Create a variable containing your name, and print it to the screen"
|
75 |
+
]
|
76 |
+
},
|
77 |
+
{
|
78 |
+
"cell_type": "code",
|
79 |
+
"execution_count": null,
|
80 |
+
"metadata": {},
|
81 |
+
"outputs": [],
|
82 |
+
"source": []
|
83 |
+
},
|
84 |
+
{
|
85 |
+
"cell_type": "code",
|
86 |
+
"execution_count": null,
|
87 |
+
"metadata": {},
|
88 |
+
"outputs": [],
|
89 |
+
"source": [
|
90 |
+
"#Use the input command to allow a user to enter a value\n",
|
91 |
+
"#for example, use the input command to ask the user to enter a string\n",
|
92 |
+
"#save the inputted value to a variable\n",
|
93 |
+
"#print the variable"
|
94 |
+
]
|
95 |
+
},
|
96 |
+
{
|
97 |
+
"cell_type": "code",
|
98 |
+
"execution_count": null,
|
99 |
+
"metadata": {},
|
100 |
+
"outputs": [],
|
101 |
+
"source": []
|
102 |
+
},
|
103 |
+
{
|
104 |
+
"cell_type": "code",
|
105 |
+
"execution_count": null,
|
106 |
+
"metadata": {},
|
107 |
+
"outputs": [],
|
108 |
+
"source": [
|
109 |
+
"#Use an IF condition to decide what message to print\n",
|
110 |
+
"#Use the input command to ask the user to enter a number\n",
|
111 |
+
"#Use the IF condition to determine if the number is Positive or Negative"
|
112 |
+
]
|
113 |
+
},
|
114 |
+
{
|
115 |
+
"cell_type": "code",
|
116 |
+
"execution_count": null,
|
117 |
+
"metadata": {},
|
118 |
+
"outputs": [],
|
119 |
+
"source": []
|
120 |
+
},
|
121 |
+
{
|
122 |
+
"cell_type": "code",
|
123 |
+
"execution_count": null,
|
124 |
+
"metadata": {},
|
125 |
+
"outputs": [],
|
126 |
+
"source": [
|
127 |
+
"#Add an 'else' condition to the above IF condition\n",
|
128 |
+
"#You can decide what the statement should be, what message should be printed?\n"
|
129 |
+
]
|
130 |
+
},
|
131 |
+
{
|
132 |
+
"cell_type": "code",
|
133 |
+
"execution_count": null,
|
134 |
+
"metadata": {},
|
135 |
+
"outputs": [],
|
136 |
+
"source": []
|
137 |
+
},
|
138 |
+
{
|
139 |
+
"cell_type": "code",
|
140 |
+
"execution_count": 2,
|
141 |
+
"metadata": {},
|
142 |
+
"outputs": [
|
143 |
+
{
|
144 |
+
"name": "stdout",
|
145 |
+
"output_type": "stream",
|
146 |
+
"text": [
|
147 |
+
"0\n",
|
148 |
+
"1\n",
|
149 |
+
"2\n",
|
150 |
+
"3\n",
|
151 |
+
"4\n"
|
152 |
+
]
|
153 |
+
}
|
154 |
+
],
|
155 |
+
"source": [
|
156 |
+
"#the following will print a range of values\n",
|
157 |
+
"for i in range(0,5):\n",
|
158 |
+
" print(i)"
|
159 |
+
]
|
160 |
+
},
|
161 |
+
{
|
162 |
+
"cell_type": "code",
|
163 |
+
"execution_count": null,
|
164 |
+
"metadata": {},
|
165 |
+
"outputs": [],
|
166 |
+
"source": [
|
167 |
+
"#Take a copy of this code and paste it into the next cell\n",
|
168 |
+
"#Change the code to add two variables to contain a number, assign a number to these variables\n",
|
169 |
+
"#Change the code in the 'for' loop to print the numbers between the values in the two variables"
|
170 |
+
]
|
171 |
+
},
|
172 |
+
{
|
173 |
+
"cell_type": "code",
|
174 |
+
"execution_count": null,
|
175 |
+
"metadata": {},
|
176 |
+
"outputs": [],
|
177 |
+
"source": []
|
178 |
+
},
|
179 |
+
{
|
180 |
+
"cell_type": "code",
|
181 |
+
"execution_count": null,
|
182 |
+
"metadata": {},
|
183 |
+
"outputs": [],
|
184 |
+
"source": [
|
185 |
+
"#Take a copy of the code in the previous cell\n",
|
186 |
+
"#Expand it to ask the user to input two values.\n",
|
187 |
+
"#Change the loop to print the values between these two values"
|
188 |
+
]
|
189 |
+
},
|
190 |
+
{
|
191 |
+
"cell_type": "code",
|
192 |
+
"execution_count": null,
|
193 |
+
"metadata": {},
|
194 |
+
"outputs": [],
|
195 |
+
"source": []
|
196 |
+
},
|
197 |
+
{
|
198 |
+
"cell_type": "code",
|
199 |
+
"execution_count": null,
|
200 |
+
"metadata": {},
|
201 |
+
"outputs": [],
|
202 |
+
"source": [
|
203 |
+
"#what else can you do?"
|
204 |
+
]
|
205 |
+
},
|
206 |
+
{
|
207 |
+
"cell_type": "code",
|
208 |
+
"execution_count": null,
|
209 |
+
"metadata": {},
|
210 |
+
"outputs": [],
|
211 |
+
"source": []
|
212 |
+
}
|
213 |
+
],
|
214 |
+
"metadata": {
|
215 |
+
"kernelspec": {
|
216 |
+
"display_name": "Python 3",
|
217 |
+
"language": "python",
|
218 |
+
"name": "python3"
|
219 |
+
},
|
220 |
+
"language_info": {
|
221 |
+
"codemirror_mode": {
|
222 |
+
"name": "ipython",
|
223 |
+
"version": 3
|
224 |
+
},
|
225 |
+
"file_extension": ".py",
|
226 |
+
"mimetype": "text/x-python",
|
227 |
+
"name": "python",
|
228 |
+
"nbconvert_exporter": "python",
|
229 |
+
"pygments_lexer": "ipython3",
|
230 |
+
"version": "3.7.3"
|
231 |
+
}
|
232 |
+
},
|
233 |
+
"nbformat": 4,
|
234 |
+
"nbformat_minor": 4
|
235 |
+
}
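The exercise cells in the notebook above ask for an if/else on the sign of a number and a for loop over a range bounded by two variables. One possible shape for both is sketched below, assuming hard-coded values rather than input() so that it runs unattended; the function names describe_sign and numbers_between are hypothetical, and this is not the official lab solution.

```python
def describe_sign(number):
    """Return a label for the sign of number; zero gets its own branch."""
    if number > 0:
        return "Positive"
    elif number < 0:
        return "Negative"
    else:
        return "Zero"


def numbers_between(start, stop):
    """Return the integers from start up to, but not including, stop."""
    return list(range(start, stop))


# In the notebook these two values could instead come from the user,
# for example: low = int(input("Enter the first number: "))
low, high = 2, 6

for i in numbers_between(low, high):
    print(i, describe_sign(i))

print(describe_sign(-3))
```

Note that input() returns a string, so a value read from the user needs int() or float() before it can be compared with 0 or passed to range().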
|
Data Analitics/Week 3/Video_Games_Sales_as_at_22_Dec_2016.csv
ADDED
The diff for this file is too large to render.
See raw diff
|
|
Data Analitics/Week 3/train.csv
ADDED
@@ -0,0 +1,892 @@
|
1 |
+
PassengerId,Survived,Pclass,Name,Sex,Age,SibSp,Parch,Ticket,Fare,Cabin,Embarked
|
2 |
+
1,0,3,"Braund, Mr. Owen Harris",male,22,1,0,A/5 21171,7.25,,S
|
3 |
+
2,1,1,"Cumings, Mrs. John Bradley (Florence Briggs Thayer)",female,38,1,0,PC 17599,71.2833,C85,C
|
4 |
+
3,1,3,"Heikkinen, Miss. Laina",female,26,0,0,STON/O2. 3101282,7.925,,S
|
5 |
+
4,1,1,"Futrelle, Mrs. Jacques Heath (Lily May Peel)",female,35,1,0,113803,53.1,C123,S
|
6 |
+
5,0,3,"Allen, Mr. William Henry",male,35,0,0,373450,8.05,,S
|
7 |
+
6,0,3,"Moran, Mr. James",male,,0,0,330877,8.4583,,Q
|
8 |
+
7,0,1,"McCarthy, Mr. Timothy J",male,54,0,0,17463,51.8625,E46,S
|
9 |
+
8,0,3,"Palsson, Master. Gosta Leonard",male,2,3,1,349909,21.075,,S
|
10 |
+
9,1,3,"Johnson, Mrs. Oscar W (Elisabeth Vilhelmina Berg)",female,27,0,2,347742,11.1333,,S
|
11 |
+
10,1,2,"Nasser, Mrs. Nicholas (Adele Achem)",female,14,1,0,237736,30.0708,,C
|
12 |
+
11,1,3,"Sandstrom, Miss. Marguerite Rut",female,4,1,1,PP 9549,16.7,G6,S
|
13 |
+
12,1,1,"Bonnell, Miss. Elizabeth",female,58,0,0,113783,26.55,C103,S
|
14 |
+
13,0,3,"Saundercock, Mr. William Henry",male,20,0,0,A/5. 2151,8.05,,S
|
15 |
+
14,0,3,"Andersson, Mr. Anders Johan",male,39,1,5,347082,31.275,,S
|
16 |
+
15,0,3,"Vestrom, Miss. Hulda Amanda Adolfina",female,14,0,0,350406,7.8542,,S
|
17 |
+
16,1,2,"Hewlett, Mrs. (Mary D Kingcome) ",female,55,0,0,248706,16,,S
|
18 |
+
17,0,3,"Rice, Master. Eugene",male,2,4,1,382652,29.125,,Q
|
19 |
+
18,1,2,"Williams, Mr. Charles Eugene",male,,0,0,244373,13,,S
|
20 |
+
19,0,3,"Vander Planke, Mrs. Julius (Emelia Maria Vandemoortele)",female,31,1,0,345763,18,,S
|
21 |
+
20,1,3,"Masselmani, Mrs. Fatima",female,,0,0,2649,7.225,,C
|
22 |
+
21,0,2,"Fynney, Mr. Joseph J",male,35,0,0,239865,26,,S
|
23 |
+
22,1,2,"Beesley, Mr. Lawrence",male,34,0,0,248698,13,D56,S
|
24 |
+
23,1,3,"McGowan, Miss. Anna ""Annie""",female,15,0,0,330923,8.0292,,Q
|
25 |
+
24,1,1,"Sloper, Mr. William Thompson",male,28,0,0,113788,35.5,A6,S
|
26 |
+
25,0,3,"Palsson, Miss. Torborg Danira",female,8,3,1,349909,21.075,,S
|
27 |
+
26,1,3,"Asplund, Mrs. Carl Oscar (Selma Augusta Emilia Johansson)",female,38,1,5,347077,31.3875,,S
|
28 |
+
27,0,3,"Emir, Mr. Farred Chehab",male,,0,0,2631,7.225,,C
|
29 |
+
28,0,1,"Fortune, Mr. Charles Alexander",male,19,3,2,19950,263,C23 C25 C27,S
|
30 |
+
29,1,3,"O'Dwyer, Miss. Ellen ""Nellie""",female,,0,0,330959,7.8792,,Q
|
31 |
+
30,0,3,"Todoroff, Mr. Lalio",male,,0,0,349216,7.8958,,S
|
32 |
+
31,0,1,"Uruchurtu, Don. Manuel E",male,40,0,0,PC 17601,27.7208,,C
|
33 |
+
32,1,1,"Spencer, Mrs. William Augustus (Marie Eugenie)",female,,1,0,PC 17569,146.5208,B78,C
|
34 |
+
33,1,3,"Glynn, Miss. Mary Agatha",female,,0,0,335677,7.75,,Q
|
35 |
+
34,0,2,"Wheadon, Mr. Edward H",male,66,0,0,C.A. 24579,10.5,,S
|
36 |
+
35,0,1,"Meyer, Mr. Edgar Joseph",male,28,1,0,PC 17604,82.1708,,C
|
37 |
+
36,0,1,"Holverson, Mr. Alexander Oskar",male,42,1,0,113789,52,,S
|
38 |
+
37,1,3,"Mamee, Mr. Hanna",male,,0,0,2677,7.2292,,C
|
39 |
+
38,0,3,"Cann, Mr. Ernest Charles",male,21,0,0,A./5. 2152,8.05,,S
|
40 |
+
39,0,3,"Vander Planke, Miss. Augusta Maria",female,18,2,0,345764,18,,S
|
41 |
+
40,1,3,"Nicola-Yarred, Miss. Jamila",female,14,1,0,2651,11.2417,,C
|
42 |
+
41,0,3,"Ahlin, Mrs. Johan (Johanna Persdotter Larsson)",female,40,1,0,7546,9.475,,S
|
43 |
+
42,0,2,"Turpin, Mrs. William John Robert (Dorothy Ann Wonnacott)",female,27,1,0,11668,21,,S
|
44 |
+
43,0,3,"Kraeff, Mr. Theodor",male,,0,0,349253,7.8958,,C
|
45 |
+
44,1,2,"Laroche, Miss. Simonne Marie Anne Andree",female,3,1,2,SC/Paris 2123,41.5792,,C
|
46 |
+
45,1,3,"Devaney, Miss. Margaret Delia",female,19,0,0,330958,7.8792,,Q
|
47 |
+
46,0,3,"Rogers, Mr. William John",male,,0,0,S.C./A.4. 23567,8.05,,S
|
48 |
+
47,0,3,"Lennon, Mr. Denis",male,,1,0,370371,15.5,,Q
|
49 |
+
48,1,3,"O'Driscoll, Miss. Bridget",female,,0,0,14311,7.75,,Q
|
50 |
+
49,0,3,"Samaan, Mr. Youssef",male,,2,0,2662,21.6792,,C
|
51 |
+
50,0,3,"Arnold-Franchi, Mrs. Josef (Josefine Franchi)",female,18,1,0,349237,17.8,,S
|
52 |
+
51,0,3,"Panula, Master. Juha Niilo",male,7,4,1,3101295,39.6875,,S
|
53 |
+
52,0,3,"Nosworthy, Mr. Richard Cater",male,21,0,0,A/4. 39886,7.8,,S
|
54 |
+
53,1,1,"Harper, Mrs. Henry Sleeper (Myna Haxtun)",female,49,1,0,PC 17572,76.7292,D33,C
|
55 |
+
54,1,2,"Faunthorpe, Mrs. Lizzie (Elizabeth Anne Wilkinson)",female,29,1,0,2926,26,,S
|
56 |
+
55,0,1,"Ostby, Mr. Engelhart Cornelius",male,65,0,1,113509,61.9792,B30,C
|
57 |
+
56,1,1,"Woolner, Mr. Hugh",male,,0,0,19947,35.5,C52,S
|
58 |
+
57,1,2,"Rugg, Miss. Emily",female,21,0,0,C.A. 31026,10.5,,S
|
59 |
+
58,0,3,"Novel, Mr. Mansouer",male,28.5,0,0,2697,7.2292,,C
|
60 |
+
59,1,2,"West, Miss. Constance Mirium",female,5,1,2,C.A. 34651,27.75,,S
|
61 |
+
60,0,3,"Goodwin, Master. William Frederick",male,11,5,2,CA 2144,46.9,,S
|
62 |
+
61,0,3,"Sirayanian, Mr. Orsen",male,22,0,0,2669,7.2292,,C
|
63 |
+
62,1,1,"Icard, Miss. Amelie",female,38,0,0,113572,80,B28,
|
64 |
+
63,0,1,"Harris, Mr. Henry Birkhardt",male,45,1,0,36973,83.475,C83,S
|
65 |
+
64,0,3,"Skoog, Master. Harald",male,4,3,2,347088,27.9,,S
|
66 |
+
65,0,1,"Stewart, Mr. Albert A",male,,0,0,PC 17605,27.7208,,C
|
67 |
+
66,1,3,"Moubarek, Master. Gerios",male,,1,1,2661,15.2458,,C
|
68 |
+
67,1,2,"Nye, Mrs. (Elizabeth Ramell)",female,29,0,0,C.A. 29395,10.5,F33,S
|
69 |
+
68,0,3,"Crease, Mr. Ernest James",male,19,0,0,S.P. 3464,8.1583,,S
|
70 |
+
69,1,3,"Andersson, Miss. Erna Alexandra",female,17,4,2,3101281,7.925,,S
|
71 |
+
70,0,3,"Kink, Mr. Vincenz",male,26,2,0,315151,8.6625,,S
|
72 |
+
71,0,2,"Jenkin, Mr. Stephen Curnow",male,32,0,0,C.A. 33111,10.5,,S
|
73 |
+
72,0,3,"Goodwin, Miss. Lillian Amy",female,16,5,2,CA 2144,46.9,,S
|
74 |
+
73,0,2,"Hood, Mr. Ambrose Jr",male,21,0,0,S.O.C. 14879,73.5,,S
|
75 |
+
74,0,3,"Chronopoulos, Mr. Apostolos",male,26,1,0,2680,14.4542,,C
|
76 |
+
75,1,3,"Bing, Mr. Lee",male,32,0,0,1601,56.4958,,S
|
77 |
+
76,0,3,"Moen, Mr. Sigurd Hansen",male,25,0,0,348123,7.65,F G73,S
|
78 |
+
77,0,3,"Staneff, Mr. Ivan",male,,0,0,349208,7.8958,,S
|
79 |
+
78,0,3,"Moutal, Mr. Rahamin Haim",male,,0,0,374746,8.05,,S
|
80 |
+
79,1,2,"Caldwell, Master. Alden Gates",male,0.83,0,2,248738,29,,S
|
81 |
+
80,1,3,"Dowdell, Miss. Elizabeth",female,30,0,0,364516,12.475,,S
|
82 |
+
81,0,3,"Waelens, Mr. Achille",male,22,0,0,345767,9,,S
|
83 |
+
82,1,3,"Sheerlinck, Mr. Jan Baptist",male,29,0,0,345779,9.5,,S
|
84 |
+
83,1,3,"McDermott, Miss. Brigdet Delia",female,,0,0,330932,7.7875,,Q
|
85 |
+
84,0,1,"Carrau, Mr. Francisco M",male,28,0,0,113059,47.1,,S
|
86 |
+
85,1,2,"Ilett, Miss. Bertha",female,17,0,0,SO/C 14885,10.5,,S
|
87 |
+
86,1,3,"Backstrom, Mrs. Karl Alfred (Maria Mathilda Gustafsson)",female,33,3,0,3101278,15.85,,S
|
88 |
+
87,0,3,"Ford, Mr. William Neal",male,16,1,3,W./C. 6608,34.375,,S
|
89 |
+
88,0,3,"Slocovski, Mr. Selman Francis",male,,0,0,SOTON/OQ 392086,8.05,,S
|
90 |
+
89,1,1,"Fortune, Miss. Mabel Helen",female,23,3,2,19950,263,C23 C25 C27,S
|
91 |
+
90,0,3,"Celotti, Mr. Francesco",male,24,0,0,343275,8.05,,S
|
92 |
+
91,0,3,"Christmann, Mr. Emil",male,29,0,0,343276,8.05,,S
|
93 |
+
92,0,3,"Andreasson, Mr. Paul Edvin",male,20,0,0,347466,7.8542,,S
|
94 |
+
93,0,1,"Chaffee, Mr. Herbert Fuller",male,46,1,0,W.E.P. 5734,61.175,E31,S
|
95 |
+
94,0,3,"Dean, Mr. Bertram Frank",male,26,1,2,C.A. 2315,20.575,,S
|
96 |
+
95,0,3,"Coxon, Mr. Daniel",male,59,0,0,364500,7.25,,S
|
97 |
+
96,0,3,"Shorney, Mr. Charles Joseph",male,,0,0,374910,8.05,,S
|
98 |
+
97,0,1,"Goldschmidt, Mr. George B",male,71,0,0,PC 17754,34.6542,A5,C
|
99 |
+
98,1,1,"Greenfield, Mr. William Bertram",male,23,0,1,PC 17759,63.3583,D10 D12,C
|
100 |
+
99,1,2,"Doling, Mrs. John T (Ada Julia Bone)",female,34,0,1,231919,23,,S
|
101 |
+
100,0,2,"Kantor, Mr. Sinai",male,34,1,0,244367,26,,S
|
102 |
+
101,0,3,"Petranec, Miss. Matilda",female,28,0,0,349245,7.8958,,S
|
103 |
+
102,0,3,"Petroff, Mr. Pastcho (""Pentcho"")",male,,0,0,349215,7.8958,,S
|
104 |
+
103,0,1,"White, Mr. Richard Frasar",male,21,0,1,35281,77.2875,D26,S
|
105 |
+
104,0,3,"Johansson, Mr. Gustaf Joel",male,33,0,0,7540,8.6542,,S
|
106 |
+
105,0,3,"Gustafsson, Mr. Anders Vilhelm",male,37,2,0,3101276,7.925,,S
|
107 |
+
106,0,3,"Mionoff, Mr. Stoytcho",male,28,0,0,349207,7.8958,,S
|
108 |
+
107,1,3,"Salkjelsvik, Miss. Anna Kristine",female,21,0,0,343120,7.65,,S
|
109 |
+
108,1,3,"Moss, Mr. Albert Johan",male,,0,0,312991,7.775,,S
|
110 |
+
109,0,3,"Rekic, Mr. Tido",male,38,0,0,349249,7.8958,,S
|
111 |
+
110,1,3,"Moran, Miss. Bertha",female,,1,0,371110,24.15,,Q
|
112 |
+
111,0,1,"Porter, Mr. Walter Chamberlain",male,47,0,0,110465,52,C110,S
|
113 |
+
112,0,3,"Zabour, Miss. Hileni",female,14.5,1,0,2665,14.4542,,C
|
114 |
+
113,0,3,"Barton, Mr. David John",male,22,0,0,324669,8.05,,S
|
115 |
+
114,0,3,"Jussila, Miss. Katriina",female,20,1,0,4136,9.825,,S
|
116 |
+
115,0,3,"Attalah, Miss. Malake",female,17,0,0,2627,14.4583,,C
|
117 |
+
116,0,3,"Pekoniemi, Mr. Edvard",male,21,0,0,STON/O 2. 3101294,7.925,,S
|
118 |
+
117,0,3,"Connors, Mr. Patrick",male,70.5,0,0,370369,7.75,,Q
|
119 |
+
118,0,2,"Turpin, Mr. William John Robert",male,29,1,0,11668,21,,S
|
120 |
+
119,0,1,"Baxter, Mr. Quigg Edmond",male,24,0,1,PC 17558,247.5208,B58 B60,C
|
121 |
+
120,0,3,"Andersson, Miss. Ellis Anna Maria",female,2,4,2,347082,31.275,,S
|
122 |
+
121,0,2,"Hickman, Mr. Stanley George",male,21,2,0,S.O.C. 14879,73.5,,S
|
123 |
+
122,0,3,"Moore, Mr. Leonard Charles",male,,0,0,A4. 54510,8.05,,S
|
124 |
+
123,0,2,"Nasser, Mr. Nicholas",male,32.5,1,0,237736,30.0708,,C
|
125 |
+
124,1,2,"Webber, Miss. Susan",female,32.5,0,0,27267,13,E101,S
|
126 |
+
125,0,1,"White, Mr. Percival Wayland",male,54,0,1,35281,77.2875,D26,S
|
127 |
+
126,1,3,"Nicola-Yarred, Master. Elias",male,12,1,0,2651,11.2417,,C
|
128 |
+
127,0,3,"McMahon, Mr. Martin",male,,0,0,370372,7.75,,Q
|
129 |
+
128,1,3,"Madsen, Mr. Fridtjof Arne",male,24,0,0,C 17369,7.1417,,S
|
130 |
+
129,1,3,"Peter, Miss. Anna",female,,1,1,2668,22.3583,F E69,C
|
131 |
+
130,0,3,"Ekstrom, Mr. Johan",male,45,0,0,347061,6.975,,S
|
132 |
+
131,0,3,"Drazenoic, Mr. Jozef",male,33,0,0,349241,7.8958,,C
|
133 |
+
132,0,3,"Coelho, Mr. Domingos Fernandeo",male,20,0,0,SOTON/O.Q. 3101307,7.05,,S
|
134 |
+
133,0,3,"Robins, Mrs. Alexander A (Grace Charity Laury)",female,47,1,0,A/5. 3337,14.5,,S
|
135 |
+
134,1,2,"Weisz, Mrs. Leopold (Mathilde Francoise Pede)",female,29,1,0,228414,26,,S
|
136 |
+
135,0,2,"Sobey, Mr. Samuel James Hayden",male,25,0,0,C.A. 29178,13,,S
|
137 |
+
136,0,2,"Richard, Mr. Emile",male,23,0,0,SC/PARIS 2133,15.0458,,C
|
138 |
+
137,1,1,"Newsom, Miss. Helen Monypeny",female,19,0,2,11752,26.2833,D47,S
|
139 |
+
138,0,1,"Futrelle, Mr. Jacques Heath",male,37,1,0,113803,53.1,C123,S
|
140 |
+
139,0,3,"Osen, Mr. Olaf Elon",male,16,0,0,7534,9.2167,,S
|
141 |
+
140,0,1,"Giglio, Mr. Victor",male,24,0,0,PC 17593,79.2,B86,C
|
142 |
+
141,0,3,"Boulos, Mrs. Joseph (Sultana)",female,,0,2,2678,15.2458,,C
|
143 |
+
142,1,3,"Nysten, Miss. Anna Sofia",female,22,0,0,347081,7.75,,S
|
144 |
+
143,1,3,"Hakkarainen, Mrs. Pekka Pietari (Elin Matilda Dolck)",female,24,1,0,STON/O2. 3101279,15.85,,S
|
145 |
+
144,0,3,"Burke, Mr. Jeremiah",male,19,0,0,365222,6.75,,Q
|
146 |
+
145,0,2,"Andrew, Mr. Edgardo Samuel",male,18,0,0,231945,11.5,,S
|
147 |
+
146,0,2,"Nicholls, Mr. Joseph Charles",male,19,1,1,C.A. 33112,36.75,,S
|
148 |
+
147,1,3,"Andersson, Mr. August Edvard (""Wennerstrom"")",male,27,0,0,350043,7.7958,,S
|
149 |
+
148,0,3,"Ford, Miss. Robina Maggie ""Ruby""",female,9,2,2,W./C. 6608,34.375,,S
|
150 |
+
149,0,2,"Navratil, Mr. Michel (""Louis M Hoffman"")",male,36.5,0,2,230080,26,F2,S
|
151 |
+
150,0,2,"Byles, Rev. Thomas Roussel Davids",male,42,0,0,244310,13,,S
|
152 |
+
151,0,2,"Bateman, Rev. Robert James",male,51,0,0,S.O.P. 1166,12.525,,S
|
153 |
+
152,1,1,"Pears, Mrs. Thomas (Edith Wearne)",female,22,1,0,113776,66.6,C2,S
|
154 |
+
153,0,3,"Meo, Mr. Alfonzo",male,55.5,0,0,A.5. 11206,8.05,,S
|
155 |
+
154,0,3,"van Billiard, Mr. Austin Blyler",male,40.5,0,2,A/5. 851,14.5,,S
|
156 |
+
155,0,3,"Olsen, Mr. Ole Martin",male,,0,0,Fa 265302,7.3125,,S
|
157 |
+
156,0,1,"Williams, Mr. Charles Duane",male,51,0,1,PC 17597,61.3792,,C
|
158 |
+
157,1,3,"Gilnagh, Miss. Katherine ""Katie""",female,16,0,0,35851,7.7333,,Q
|
159 |
+
158,0,3,"Corn, Mr. Harry",male,30,0,0,SOTON/OQ 392090,8.05,,S
|
160 |
+
159,0,3,"Smiljanic, Mr. Mile",male,,0,0,315037,8.6625,,S
|
161 |
+
160,0,3,"Sage, Master. Thomas Henry",male,,8,2,CA. 2343,69.55,,S
|
162 |
+
161,0,3,"Cribb, Mr. John Hatfield",male,44,0,1,371362,16.1,,S
|
163 |
+
162,1,2,"Watt, Mrs. James (Elizabeth ""Bessie"" Inglis Milne)",female,40,0,0,C.A. 33595,15.75,,S
|
164 |
+
163,0,3,"Bengtsson, Mr. John Viktor",male,26,0,0,347068,7.775,,S
|
165 |
+
164,0,3,"Calic, Mr. Jovo",male,17,0,0,315093,8.6625,,S
|
166 |
+
165,0,3,"Panula, Master. Eino Viljami",male,1,4,1,3101295,39.6875,,S
|
167 |
+
166,1,3,"Goldsmith, Master. Frank John William ""Frankie""",male,9,0,2,363291,20.525,,S
|
168 |
+
167,1,1,"Chibnall, Mrs. (Edith Martha Bowerman)",female,,0,1,113505,55,E33,S
|
169 |
+
168,0,3,"Skoog, Mrs. William (Anna Bernhardina Karlsson)",female,45,1,4,347088,27.9,,S
|
170 |
+
169,0,1,"Baumann, Mr. John D",male,,0,0,PC 17318,25.925,,S
|
171 |
+
170,0,3,"Ling, Mr. Lee",male,28,0,0,1601,56.4958,,S
|
172 |
+
171,0,1,"Van der hoef, Mr. Wyckoff",male,61,0,0,111240,33.5,B19,S
|
173 |
+
172,0,3,"Rice, Master. Arthur",male,4,4,1,382652,29.125,,Q
|
174 |
+
173,1,3,"Johnson, Miss. Eleanor Ileen",female,1,1,1,347742,11.1333,,S
|
175 |
+
174,0,3,"Sivola, Mr. Antti Wilhelm",male,21,0,0,STON/O 2. 3101280,7.925,,S
|
176 |
+
175,0,1,"Smith, Mr. James Clinch",male,56,0,0,17764,30.6958,A7,C
|
177 |
+
176,0,3,"Klasen, Mr. Klas Albin",male,18,1,1,350404,7.8542,,S
|
178 |
+
177,0,3,"Lefebre, Master. Henry Forbes",male,,3,1,4133,25.4667,,S
|
179 |
+
178,0,1,"Isham, Miss. Ann Elizabeth",female,50,0,0,PC 17595,28.7125,C49,C
|
180 |
+
179,0,2,"Hale, Mr. Reginald",male,30,0,0,250653,13,,S
|
181 |
+
180,0,3,"Leonard, Mr. Lionel",male,36,0,0,LINE,0,,S
|
182 |
+
181,0,3,"Sage, Miss. Constance Gladys",female,,8,2,CA. 2343,69.55,,S
|
183 |
+
182,0,2,"Pernot, Mr. Rene",male,,0,0,SC/PARIS 2131,15.05,,C
|
184 |
+
183,0,3,"Asplund, Master. Clarence Gustaf Hugo",male,9,4,2,347077,31.3875,,S
|
185 |
+
184,1,2,"Becker, Master. Richard F",male,1,2,1,230136,39,F4,S
|
186 |
+
185,1,3,"Kink-Heilmann, Miss. Luise Gretchen",female,4,0,2,315153,22.025,,S
|
187 |
+
186,0,1,"Rood, Mr. Hugh Roscoe",male,,0,0,113767,50,A32,S
|
188 |
+
187,1,3,"O'Brien, Mrs. Thomas (Johanna ""Hannah"" Godfrey)",female,,1,0,370365,15.5,,Q
|
189 |
+
188,1,1,"Romaine, Mr. Charles Hallace (""Mr C Rolmane"")",male,45,0,0,111428,26.55,,S
|
190 |
+
189,0,3,"Bourke, Mr. John",male,40,1,1,364849,15.5,,Q
|
191 |
+
190,0,3,"Turcin, Mr. Stjepan",male,36,0,0,349247,7.8958,,S
|
192 |
+
191,1,2,"Pinsky, Mrs. (Rosa)",female,32,0,0,234604,13,,S
|
193 |
+
192,0,2,"Carbines, Mr. William",male,19,0,0,28424,13,,S
|
194 |
+
193,1,3,"Andersen-Jensen, Miss. Carla Christine Nielsine",female,19,1,0,350046,7.8542,,S
|
195 |
+
194,1,2,"Navratil, Master. Michel M",male,3,1,1,230080,26,F2,S
|
196 |
+
195,1,1,"Brown, Mrs. James Joseph (Margaret Tobin)",female,44,0,0,PC 17610,27.7208,B4,C
|
197 |
+
196,1,1,"Lurette, Miss. Elise",female,58,0,0,PC 17569,146.5208,B80,C
|
198 |
+
197,0,3,"Mernagh, Mr. Robert",male,,0,0,368703,7.75,,Q
|
199 |
+
198,0,3,"Olsen, Mr. Karl Siegwart Andreas",male,42,0,1,4579,8.4042,,S
|
200 |
+
199,1,3,"Madigan, Miss. Margaret ""Maggie""",female,,0,0,370370,7.75,,Q
|
201 |
+
200,0,2,"Yrois, Miss. Henriette (""Mrs Harbeck"")",female,24,0,0,248747,13,,S
|
202 |
+
201,0,3,"Vande Walle, Mr. Nestor Cyriel",male,28,0,0,345770,9.5,,S
|
203 |
+
202,0,3,"Sage, Mr. Frederick",male,,8,2,CA. 2343,69.55,,S
|
204 |
+
203,0,3,"Johanson, Mr. Jakob Alfred",male,34,0,0,3101264,6.4958,,S
|
205 |
+
204,0,3,"Youseff, Mr. Gerious",male,45.5,0,0,2628,7.225,,C
|
206 |
+
205,1,3,"Cohen, Mr. Gurshon ""Gus""",male,18,0,0,A/5 3540,8.05,,S
|
207 |
+
206,0,3,"Strom, Miss. Telma Matilda",female,2,0,1,347054,10.4625,G6,S
|
208 |
+
207,0,3,"Backstrom, Mr. Karl Alfred",male,32,1,0,3101278,15.85,,S
|
209 |
+
208,1,3,"Albimona, Mr. Nassef Cassem",male,26,0,0,2699,18.7875,,C
|
210 |
+
209,1,3,"Carr, Miss. Helen ""Ellen""",female,16,0,0,367231,7.75,,Q
|
211 |
+
210,1,1,"Blank, Mr. Henry",male,40,0,0,112277,31,A31,C
|
212 |
+
211,0,3,"Ali, Mr. Ahmed",male,24,0,0,SOTON/O.Q. 3101311,7.05,,S
|
213 |
+
212,1,2,"Cameron, Miss. Clear Annie",female,35,0,0,F.C.C. 13528,21,,S
|
214 |
+
213,0,3,"Perkin, Mr. John Henry",male,22,0,0,A/5 21174,7.25,,S
|
215 |
+
214,0,2,"Givard, Mr. Hans Kristensen",male,30,0,0,250646,13,,S
|
216 |
+
215,0,3,"Kiernan, Mr. Philip",male,,1,0,367229,7.75,,Q
|
217 |
+
216,1,1,"Newell, Miss. Madeleine",female,31,1,0,35273,113.275,D36,C
|
218 |
+
217,1,3,"Honkanen, Miss. Eliina",female,27,0,0,STON/O2. 3101283,7.925,,S
|
219 |
+
218,0,2,"Jacobsohn, Mr. Sidney Samuel",male,42,1,0,243847,27,,S
|
220 |
+
219,1,1,"Bazzani, Miss. Albina",female,32,0,0,11813,76.2917,D15,C
|
221 |
+
220,0,2,"Harris, Mr. Walter",male,30,0,0,W/C 14208,10.5,,S
|
222 |
+
221,1,3,"Sunderland, Mr. Victor Francis",male,16,0,0,SOTON/OQ 392089,8.05,,S
|
223 |
+
222,0,2,"Bracken, Mr. James H",male,27,0,0,220367,13,,S
|
224 |
+
223,0,3,"Green, Mr. George Henry",male,51,0,0,21440,8.05,,S
|
225 |
+
224,0,3,"Nenkoff, Mr. Christo",male,,0,0,349234,7.8958,,S
|
226 |
+
225,1,1,"Hoyt, Mr. Frederick Maxfield",male,38,1,0,19943,90,C93,S
|
227 |
+
226,0,3,"Berglund, Mr. Karl Ivar Sven",male,22,0,0,PP 4348,9.35,,S
|
228 |
+
227,1,2,"Mellors, Mr. William John",male,19,0,0,SW/PP 751,10.5,,S
|
229 |
+
228,0,3,"Lovell, Mr. John Hall (""Henry"")",male,20.5,0,0,A/5 21173,7.25,,S
|
230 |
+
229,0,2,"Fahlstrom, Mr. Arne Jonas",male,18,0,0,236171,13,,S
|
231 |
+
230,0,3,"Lefebre, Miss. Mathilde",female,,3,1,4133,25.4667,,S
|
232 |
+
231,1,1,"Harris, Mrs. Henry Birkhardt (Irene Wallach)",female,35,1,0,36973,83.475,C83,S
|
233 |
+
232,0,3,"Larsson, Mr. Bengt Edvin",male,29,0,0,347067,7.775,,S
|
234 |
+
233,0,2,"Sjostedt, Mr. Ernst Adolf",male,59,0,0,237442,13.5,,S
|
235 |
+
234,1,3,"Asplund, Miss. Lillian Gertrud",female,5,4,2,347077,31.3875,,S
|
236 |
+
235,0,2,"Leyson, Mr. Robert William Norman",male,24,0,0,C.A. 29566,10.5,,S
|
237 |
+
236,0,3,"Harknett, Miss. Alice Phoebe",female,,0,0,W./C. 6609,7.55,,S
|
238 |
+
237,0,2,"Hold, Mr. Stephen",male,44,1,0,26707,26,,S
|
239 |
+
238,1,2,"Collyer, Miss. Marjorie ""Lottie""",female,8,0,2,C.A. 31921,26.25,,S
|
240 |
+
239,0,2,"Pengelly, Mr. Frederick William",male,19,0,0,28665,10.5,,S
|
241 |
+
240,0,2,"Hunt, Mr. George Henry",male,33,0,0,SCO/W 1585,12.275,,S
|
242 |
+
241,0,3,"Zabour, Miss. Thamine",female,,1,0,2665,14.4542,,C
|
243 |
+
242,1,3,"Murphy, Miss. Katherine ""Kate""",female,,1,0,367230,15.5,,Q
|
244 |
+
243,0,2,"Coleridge, Mr. Reginald Charles",male,29,0,0,W./C. 14263,10.5,,S
|
245 |
+
244,0,3,"Maenpaa, Mr. Matti Alexanteri",male,22,0,0,STON/O 2. 3101275,7.125,,S
|
246 |
+
245,0,3,"Attalah, Mr. Sleiman",male,30,0,0,2694,7.225,,C
|
247 |
+
246,0,1,"Minahan, Dr. William Edward",male,44,2,0,19928,90,C78,Q
|
248 |
+
247,0,3,"Lindahl, Miss. Agda Thorilda Viktoria",female,25,0,0,347071,7.775,,S
|
249 |
+
248,1,2,"Hamalainen, Mrs. William (Anna)",female,24,0,2,250649,14.5,,S
|
250 |
+
249,1,1,"Beckwith, Mr. Richard Leonard",male,37,1,1,11751,52.5542,D35,S
|
251 |
+
250,0,2,"Carter, Rev. Ernest Courtenay",male,54,1,0,244252,26,,S
|
252 |
+
251,0,3,"Reed, Mr. James George",male,,0,0,362316,7.25,,S
|
253 |
+
252,0,3,"Strom, Mrs. Wilhelm (Elna Matilda Persson)",female,29,1,1,347054,10.4625,G6,S
|
254 |
+
253,0,1,"Stead, Mr. William Thomas",male,62,0,0,113514,26.55,C87,S
|
255 |
+
254,0,3,"Lobb, Mr. William Arthur",male,30,1,0,A/5. 3336,16.1,,S
|
256 |
+
255,0,3,"Rosblom, Mrs. Viktor (Helena Wilhelmina)",female,41,0,2,370129,20.2125,,S
|
257 |
+
256,1,3,"Touma, Mrs. Darwis (Hanne Youssef Razi)",female,29,0,2,2650,15.2458,,C
|
258 |
+
257,1,1,"Thorne, Mrs. Gertrude Maybelle",female,,0,0,PC 17585,79.2,,C
|
259 |
+
258,1,1,"Cherry, Miss. Gladys",female,30,0,0,110152,86.5,B77,S
|
260 |
+
259,1,1,"Ward, Miss. Anna",female,35,0,0,PC 17755,512.3292,,C
|
261 |
+
260,1,2,"Parrish, Mrs. (Lutie Davis)",female,50,0,1,230433,26,,S
|
262 |
+
261,0,3,"Smith, Mr. Thomas",male,,0,0,384461,7.75,,Q
|
263 |
+
262,1,3,"Asplund, Master. Edvin Rojj Felix",male,3,4,2,347077,31.3875,,S
|
264 |
+
263,0,1,"Taussig, Mr. Emil",male,52,1,1,110413,79.65,E67,S
|
265 |
+
264,0,1,"Harrison, Mr. William",male,40,0,0,112059,0,B94,S
|
266 |
+
265,0,3,"Henry, Miss. Delia",female,,0,0,382649,7.75,,Q
|
267 |
+
266,0,2,"Reeves, Mr. David",male,36,0,0,C.A. 17248,10.5,,S
|
268 |
+
267,0,3,"Panula, Mr. Ernesti Arvid",male,16,4,1,3101295,39.6875,,S
|
269 |
+
268,1,3,"Persson, Mr. Ernst Ulrik",male,25,1,0,347083,7.775,,S
|
270 |
+
269,1,1,"Graham, Mrs. William Thompson (Edith Junkins)",female,58,0,1,PC 17582,153.4625,C125,S
|
271 |
+
270,1,1,"Bissette, Miss. Amelia",female,35,0,0,PC 17760,135.6333,C99,S
|
272 |
+
271,0,1,"Cairns, Mr. Alexander",male,,0,0,113798,31,,S
|
273 |
+
272,1,3,"Tornquist, Mr. William Henry",male,25,0,0,LINE,0,,S
|
274 |
+
273,1,2,"Mellinger, Mrs. (Elizabeth Anne Maidment)",female,41,0,1,250644,19.5,,S
|
275 |
+
274,0,1,"Natsch, Mr. Charles H",male,37,0,1,PC 17596,29.7,C118,C
|
276 |
+
275,1,3,"Healy, Miss. Hanora ""Nora""",female,,0,0,370375,7.75,,Q
|
277 |
+
276,1,1,"Andrews, Miss. Kornelia Theodosia",female,63,1,0,13502,77.9583,D7,S
|
278 |
+
277,0,3,"Lindblom, Miss. Augusta Charlotta",female,45,0,0,347073,7.75,,S
|
279 |
+
278,0,2,"Parkes, Mr. Francis ""Frank""",male,,0,0,239853,0,,S
|
280 |
+
279,0,3,"Rice, Master. Eric",male,7,4,1,382652,29.125,,Q
|
281 |
+
280,1,3,"Abbott, Mrs. Stanton (Rosa Hunt)",female,35,1,1,C.A. 2673,20.25,,S
|
282 |
+
281,0,3,"Duane, Mr. Frank",male,65,0,0,336439,7.75,,Q
|
283 |
+
282,0,3,"Olsson, Mr. Nils Johan Goransson",male,28,0,0,347464,7.8542,,S
|
284 |
+
283,0,3,"de Pelsmaeker, Mr. Alfons",male,16,0,0,345778,9.5,,S
|
285 |
+
284,1,3,"Dorking, Mr. Edward Arthur",male,19,0,0,A/5. 10482,8.05,,S
|
286 |
+
285,0,1,"Smith, Mr. Richard William",male,,0,0,113056,26,A19,S
|
287 |
+
286,0,3,"Stankovic, Mr. Ivan",male,33,0,0,349239,8.6625,,C
|
288 |
+
287,1,3,"de Mulder, Mr. Theodore",male,30,0,0,345774,9.5,,S
|
289 |
+
288,0,3,"Naidenoff, Mr. Penko",male,22,0,0,349206,7.8958,,S
|
290 |
+
289,1,2,"Hosono, Mr. Masabumi",male,42,0,0,237798,13,,S
|
291 |
+
290,1,3,"Connolly, Miss. Kate",female,22,0,0,370373,7.75,,Q
|
292 |
+
291,1,1,"Barber, Miss. Ellen ""Nellie""",female,26,0,0,19877,78.85,,S
|
293 |
+
292,1,1,"Bishop, Mrs. Dickinson H (Helen Walton)",female,19,1,0,11967,91.0792,B49,C
|
294 |
+
293,0,2,"Levy, Mr. Rene Jacques",male,36,0,0,SC/Paris 2163,12.875,D,C
|
295 |
+
294,0,3,"Haas, Miss. Aloisia",female,24,0,0,349236,8.85,,S
|
296 |
+
295,0,3,"Mineff, Mr. Ivan",male,24,0,0,349233,7.8958,,S
|
297 |
+
296,0,1,"Lewy, Mr. Ervin G",male,,0,0,PC 17612,27.7208,,C
|
298 |
+
297,0,3,"Hanna, Mr. Mansour",male,23.5,0,0,2693,7.2292,,C
|
299 |
+
298,0,1,"Allison, Miss. Helen Loraine",female,2,1,2,113781,151.55,C22 C26,S
|
300 |
+
299,1,1,"Saalfeld, Mr. Adolphe",male,,0,0,19988,30.5,C106,S
|
301 |
+
300,1,1,"Baxter, Mrs. James (Helene DeLaudeniere Chaput)",female,50,0,1,PC 17558,247.5208,B58 B60,C
|
302 |
+
301,1,3,"Kelly, Miss. Anna Katherine ""Annie Kate""",female,,0,0,9234,7.75,,Q
|
303 |
+
302,1,3,"McCoy, Mr. Bernard",male,,2,0,367226,23.25,,Q
|
304 |
+
303,0,3,"Johnson, Mr. William Cahoone Jr",male,19,0,0,LINE,0,,S
|
305 |
+
304,1,2,"Keane, Miss. Nora A",female,,0,0,226593,12.35,E101,Q
|
306 |
+
305,0,3,"Williams, Mr. Howard Hugh ""Harry""",male,,0,0,A/5 2466,8.05,,S
|
307 |
+
306,1,1,"Allison, Master. Hudson Trevor",male,0.92,1,2,113781,151.55,C22 C26,S
|
308 |
+
307,1,1,"Fleming, Miss. Margaret",female,,0,0,17421,110.8833,,C
|
309 |
+
308,1,1,"Penasco y Castellana, Mrs. Victor de Satode (Maria Josefa Perez de Soto y Vallejo)",female,17,1,0,PC 17758,108.9,C65,C
|
310 |
+
309,0,2,"Abelson, Mr. Samuel",male,30,1,0,P/PP 3381,24,,C
|
311 |
+
310,1,1,"Francatelli, Miss. Laura Mabel",female,30,0,0,PC 17485,56.9292,E36,C
|
312 |
+
311,1,1,"Hays, Miss. Margaret Bechstein",female,24,0,0,11767,83.1583,C54,C
|
313 |
+
312,1,1,"Ryerson, Miss. Emily Borie",female,18,2,2,PC 17608,262.375,B57 B59 B63 B66,C
|
314 |
+
313,0,2,"Lahtinen, Mrs. William (Anna Sylfven)",female,26,1,1,250651,26,,S
|
315 |
+
314,0,3,"Hendekovic, Mr. Ignjac",male,28,0,0,349243,7.8958,,S
|
316 |
+
315,0,2,"Hart, Mr. Benjamin",male,43,1,1,F.C.C. 13529,26.25,,S
|
317 |
+
316,1,3,"Nilsson, Miss. Helmina Josefina",female,26,0,0,347470,7.8542,,S
|
318 |
+
317,1,2,"Kantor, Mrs. Sinai (Miriam Sternin)",female,24,1,0,244367,26,,S
|
319 |
+
318,0,2,"Moraweck, Dr. Ernest",male,54,0,0,29011,14,,S
|
320 |
+
319,1,1,"Wick, Miss. Mary Natalie",female,31,0,2,36928,164.8667,C7,S
|
321 |
+
320,1,1,"Spedden, Mrs. Frederic Oakley (Margaretta Corning Stone)",female,40,1,1,16966,134.5,E34,C
|
322 |
+
321,0,3,"Dennis, Mr. Samuel",male,22,0,0,A/5 21172,7.25,,S
|
323 |
+
322,0,3,"Danoff, Mr. Yoto",male,27,0,0,349219,7.8958,,S
|
324 |
+
323,1,2,"Slayter, Miss. Hilda Mary",female,30,0,0,234818,12.35,,Q
|
325 |
+
324,1,2,"Caldwell, Mrs. Albert Francis (Sylvia Mae Harbaugh)",female,22,1,1,248738,29,,S
|
326 |
+
325,0,3,"Sage, Mr. George John Jr",male,,8,2,CA. 2343,69.55,,S
|
327 |
+
326,1,1,"Young, Miss. Marie Grice",female,36,0,0,PC 17760,135.6333,C32,C
|
328 |
+
327,0,3,"Nysveen, Mr. Johan Hansen",male,61,0,0,345364,6.2375,,S
|
329 |
+
328,1,2,"Ball, Mrs. (Ada E Hall)",female,36,0,0,28551,13,D,S
|
330 |
+
329,1,3,"Goldsmith, Mrs. Frank John (Emily Alice Brown)",female,31,1,1,363291,20.525,,S
|
331 |
+
330,1,1,"Hippach, Miss. Jean Gertrude",female,16,0,1,111361,57.9792,B18,C
|
332 |
+
331,1,3,"McCoy, Miss. Agnes",female,,2,0,367226,23.25,,Q
|
333 |
+
332,0,1,"Partner, Mr. Austen",male,45.5,0,0,113043,28.5,C124,S
|
334 |
+
333,0,1,"Graham, Mr. George Edward",male,38,0,1,PC 17582,153.4625,C91,S
|
335 |
+
334,0,3,"Vander Planke, Mr. Leo Edmondus",male,16,2,0,345764,18,,S
|
336 |
+
335,1,1,"Frauenthal, Mrs. Henry William (Clara Heinsheimer)",female,,1,0,PC 17611,133.65,,S
|
337 |
+
336,0,3,"Denkoff, Mr. Mitto",male,,0,0,349225,7.8958,,S
|
338 |
+
337,0,1,"Pears, Mr. Thomas Clinton",male,29,1,0,113776,66.6,C2,S
|
339 |
+
338,1,1,"Burns, Miss. Elizabeth Margaret",female,41,0,0,16966,134.5,E40,C
|
340 |
+
339,1,3,"Dahl, Mr. Karl Edwart",male,45,0,0,7598,8.05,,S
|
341 |
+
340,0,1,"Blackwell, Mr. Stephen Weart",male,45,0,0,113784,35.5,T,S
|
342 |
+
341,1,2,"Navratil, Master. Edmond Roger",male,2,1,1,230080,26,F2,S
|
343 |
+
342,1,1,"Fortune, Miss. Alice Elizabeth",female,24,3,2,19950,263,C23 C25 C27,S
|
344 |
+
343,0,2,"Collander, Mr. Erik Gustaf",male,28,0,0,248740,13,,S
|
345 |
+
344,0,2,"Sedgwick, Mr. Charles Frederick Waddington",male,25,0,0,244361,13,,S
|
346 |
+
345,0,2,"Fox, Mr. Stanley Hubert",male,36,0,0,229236,13,,S
|
347 |
+
346,1,2,"Brown, Miss. Amelia ""Mildred""",female,24,0,0,248733,13,F33,S
|
348 |
+
347,1,2,"Smith, Miss. Marion Elsie",female,40,0,0,31418,13,,S
|
349 |
+
348,1,3,"Davison, Mrs. Thomas Henry (Mary E Finck)",female,,1,0,386525,16.1,,S
|
350 |
+
349,1,3,"Coutts, Master. William Loch ""William""",male,3,1,1,C.A. 37671,15.9,,S
|
351 |
+
350,0,3,"Dimic, Mr. Jovan",male,42,0,0,315088,8.6625,,S
|
352 |
+
351,0,3,"Odahl, Mr. Nils Martin",male,23,0,0,7267,9.225,,S
|
353 |
+
352,0,1,"Williams-Lambert, Mr. Fletcher Fellows",male,,0,0,113510,35,C128,S
|
354 |
+
353,0,3,"Elias, Mr. Tannous",male,15,1,1,2695,7.2292,,C
|
355 |
+
354,0,3,"Arnold-Franchi, Mr. Josef",male,25,1,0,349237,17.8,,S
|
356 |
+
355,0,3,"Yousif, Mr. Wazli",male,,0,0,2647,7.225,,C
|
357 |
+
356,0,3,"Vanden Steen, Mr. Leo Peter",male,28,0,0,345783,9.5,,S
|
358 |
+
357,1,1,"Bowerman, Miss. Elsie Edith",female,22,0,1,113505,55,E33,S
|
359 |
+
358,0,2,"Funk, Miss. Annie Clemmer",female,38,0,0,237671,13,,S
|
360 |
+
359,1,3,"McGovern, Miss. Mary",female,,0,0,330931,7.8792,,Q
|
361 |
+
360,1,3,"Mockler, Miss. Helen Mary ""Ellie""",female,,0,0,330980,7.8792,,Q
|
362 |
+
361,0,3,"Skoog, Mr. Wilhelm",male,40,1,4,347088,27.9,,S
|
363 |
+
362,0,2,"del Carlo, Mr. Sebastiano",male,29,1,0,SC/PARIS 2167,27.7208,,C
|
364 |
+
363,0,3,"Barbara, Mrs. (Catherine David)",female,45,0,1,2691,14.4542,,C
|
365 |
+
364,0,3,"Asim, Mr. Adola",male,35,0,0,SOTON/O.Q. 3101310,7.05,,S
|
366 |
+
365,0,3,"O'Brien, Mr. Thomas",male,,1,0,370365,15.5,,Q
|
367 |
+
366,0,3,"Adahl, Mr. Mauritz Nils Martin",male,30,0,0,C 7076,7.25,,S
|
368 |
+
367,1,1,"Warren, Mrs. Frank Manley (Anna Sophia Atkinson)",female,60,1,0,110813,75.25,D37,C
|
369 |
+
368,1,3,"Moussa, Mrs. (Mantoura Boulos)",female,,0,0,2626,7.2292,,C
|
370 |
+
369,1,3,"Jermyn, Miss. Annie",female,,0,0,14313,7.75,,Q
|
371 |
+
370,1,1,"Aubart, Mme. Leontine Pauline",female,24,0,0,PC 17477,69.3,B35,C
|
372 |
+
371,1,1,"Harder, Mr. George Achilles",male,25,1,0,11765,55.4417,E50,C
|
373 |
+
372,0,3,"Wiklund, Mr. Jakob Alfred",male,18,1,0,3101267,6.4958,,S
|
374 |
+
373,0,3,"Beavan, Mr. William Thomas",male,19,0,0,323951,8.05,,S
|
375 |
+
374,0,1,"Ringhini, Mr. Sante",male,22,0,0,PC 17760,135.6333,,C
|
376 |
+
375,0,3,"Palsson, Miss. Stina Viola",female,3,3,1,349909,21.075,,S
|
377 |
+
376,1,1,"Meyer, Mrs. Edgar Joseph (Leila Saks)",female,,1,0,PC 17604,82.1708,,C
|
378 |
+
377,1,3,"Landergren, Miss. Aurora Adelia",female,22,0,0,C 7077,7.25,,S
|
379 |
+
378,0,1,"Widener, Mr. Harry Elkins",male,27,0,2,113503,211.5,C82,C
|
380 |
+
379,0,3,"Betros, Mr. Tannous",male,20,0,0,2648,4.0125,,C
|
381 |
+
380,0,3,"Gustafsson, Mr. Karl Gideon",male,19,0,0,347069,7.775,,S
|
382 |
+
381,1,1,"Bidois, Miss. Rosalie",female,42,0,0,PC 17757,227.525,,C
|
383 |
+
382,1,3,"Nakid, Miss. Maria (""Mary"")",female,1,0,2,2653,15.7417,,C
|
384 |
+
383,0,3,"Tikkanen, Mr. Juho",male,32,0,0,STON/O 2. 3101293,7.925,,S
|
385 |
+
384,1,1,"Holverson, Mrs. Alexander Oskar (Mary Aline Towner)",female,35,1,0,113789,52,,S
|
386 |
+
385,0,3,"Plotcharsky, Mr. Vasil",male,,0,0,349227,7.8958,,S
|
387 |
+
386,0,2,"Davies, Mr. Charles Henry",male,18,0,0,S.O.C. 14879,73.5,,S
|
388 |
+
387,0,3,"Goodwin, Master. Sidney Leonard",male,1,5,2,CA 2144,46.9,,S
|
389 |
+
388,1,2,"Buss, Miss. Kate",female,36,0,0,27849,13,,S
|
390 |
+
389,0,3,"Sadlier, Mr. Matthew",male,,0,0,367655,7.7292,,Q
|
391 |
+
390,1,2,"Lehmann, Miss. Bertha",female,17,0,0,SC 1748,12,,C
|
392 |
+
391,1,1,"Carter, Mr. William Ernest",male,36,1,2,113760,120,B96 B98,S
|
393 |
+
392,1,3,"Jansson, Mr. Carl Olof",male,21,0,0,350034,7.7958,,S
|
394 |
+
393,0,3,"Gustafsson, Mr. Johan Birger",male,28,2,0,3101277,7.925,,S
|
395 |
+
394,1,1,"Newell, Miss. Marjorie",female,23,1,0,35273,113.275,D36,C
|
396 |
+
395,1,3,"Sandstrom, Mrs. Hjalmar (Agnes Charlotta Bengtsson)",female,24,0,2,PP 9549,16.7,G6,S
|
397 |
+
396,0,3,"Johansson, Mr. Erik",male,22,0,0,350052,7.7958,,S
|
398 |
+
397,0,3,"Olsson, Miss. Elina",female,31,0,0,350407,7.8542,,S
|
399 |
+
398,0,2,"McKane, Mr. Peter David",male,46,0,0,28403,26,,S
|
400 |
+
399,0,2,"Pain, Dr. Alfred",male,23,0,0,244278,10.5,,S
|
401 |
+
400,1,2,"Trout, Mrs. William H (Jessie L)",female,28,0,0,240929,12.65,,S
|
402 |
+
401,1,3,"Niskanen, Mr. Juha",male,39,0,0,STON/O 2. 3101289,7.925,,S
|
403 |
+
402,0,3,"Adams, Mr. John",male,26,0,0,341826,8.05,,S
|
404 |
+
403,0,3,"Jussila, Miss. Mari Aina",female,21,1,0,4137,9.825,,S
|
405 |
+
404,0,3,"Hakkarainen, Mr. Pekka Pietari",male,28,1,0,STON/O2. 3101279,15.85,,S
|
406 |
+
405,0,3,"Oreskovic, Miss. Marija",female,20,0,0,315096,8.6625,,S
|
407 |
+
406,0,2,"Gale, Mr. Shadrach",male,34,1,0,28664,21,,S
|
408 |
+
407,0,3,"Widegren, Mr. Carl/Charles Peter",male,51,0,0,347064,7.75,,S
|
409 |
+
408,1,2,"Richards, Master. William Rowe",male,3,1,1,29106,18.75,,S
|
410 |
+
409,0,3,"Birkeland, Mr. Hans Martin Monsen",male,21,0,0,312992,7.775,,S
|
411 |
+
410,0,3,"Lefebre, Miss. Ida",female,,3,1,4133,25.4667,,S
|
412 |
+
411,0,3,"Sdycoff, Mr. Todor",male,,0,0,349222,7.8958,,S
|
413 |
+
412,0,3,"Hart, Mr. Henry",male,,0,0,394140,6.8583,,Q
|
414 |
+
413,1,1,"Minahan, Miss. Daisy E",female,33,1,0,19928,90,C78,Q
|
415 |
+
414,0,2,"Cunningham, Mr. Alfred Fleming",male,,0,0,239853,0,,S
|
416 |
+
415,1,3,"Sundman, Mr. Johan Julian",male,44,0,0,STON/O 2. 3101269,7.925,,S
|
417 |
+
416,0,3,"Meek, Mrs. Thomas (Annie Louise Rowley)",female,,0,0,343095,8.05,,S
|
418 |
+
417,1,2,"Drew, Mrs. James Vivian (Lulu Thorne Christian)",female,34,1,1,28220,32.5,,S
|
419 |
+
418,1,2,"Silven, Miss. Lyyli Karoliina",female,18,0,2,250652,13,,S
|
420 |
+
419,0,2,"Matthews, Mr. William John",male,30,0,0,28228,13,,S
|
421 |
+
420,0,3,"Van Impe, Miss. Catharina",female,10,0,2,345773,24.15,,S
|
422 |
+
421,0,3,"Gheorgheff, Mr. Stanio",male,,0,0,349254,7.8958,,C
|
423 |
+
422,0,3,"Charters, Mr. David",male,21,0,0,A/5. 13032,7.7333,,Q
|
424 |
+
423,0,3,"Zimmerman, Mr. Leo",male,29,0,0,315082,7.875,,S
|
425 |
+
424,0,3,"Danbom, Mrs. Ernst Gilbert (Anna Sigrid Maria Brogren)",female,28,1,1,347080,14.4,,S
|
426 |
+
425,0,3,"Rosblom, Mr. Viktor Richard",male,18,1,1,370129,20.2125,,S
|
427 |
+
426,0,3,"Wiseman, Mr. Phillippe",male,,0,0,A/4. 34244,7.25,,S
|
428 |
+
427,1,2,"Clarke, Mrs. Charles V (Ada Maria Winfield)",female,28,1,0,2003,26,,S
|
429 |
+
428,1,2,"Phillips, Miss. Kate Florence (""Mrs Kate Louise Phillips Marshall"")",female,19,0,0,250655,26,,S
|
430 |
+
429,0,3,"Flynn, Mr. James",male,,0,0,364851,7.75,,Q
|
431 |
+
430,1,3,"Pickard, Mr. Berk (Berk Trembisky)",male,32,0,0,SOTON/O.Q. 392078,8.05,E10,S
|
432 |
+
431,1,1,"Bjornstrom-Steffansson, Mr. Mauritz Hakan",male,28,0,0,110564,26.55,C52,S
|
433 |
+
432,1,3,"Thorneycroft, Mrs. Percival (Florence Kate White)",female,,1,0,376564,16.1,,S
|
434 |
+
433,1,2,"Louch, Mrs. Charles Alexander (Alice Adelaide Slow)",female,42,1,0,SC/AH 3085,26,,S
|
435 |
+
434,0,3,"Kallio, Mr. Nikolai Erland",male,17,0,0,STON/O 2. 3101274,7.125,,S
|
436 |
+
435,0,1,"Silvey, Mr. William Baird",male,50,1,0,13507,55.9,E44,S
|
437 |
+
436,1,1,"Carter, Miss. Lucile Polk",female,14,1,2,113760,120,B96 B98,S
|
438 |
+
437,0,3,"Ford, Miss. Doolina Margaret ""Daisy""",female,21,2,2,W./C. 6608,34.375,,S
|
439 |
+
438,1,2,"Richards, Mrs. Sidney (Emily Hocking)",female,24,2,3,29106,18.75,,S
|
440 |
+
439,0,1,"Fortune, Mr. Mark",male,64,1,4,19950,263,C23 C25 C27,S
|
441 |
+
440,0,2,"Kvillner, Mr. Johan Henrik Johannesson",male,31,0,0,C.A. 18723,10.5,,S
|
442 |
+
441,1,2,"Hart, Mrs. Benjamin (Esther Ada Bloomfield)",female,45,1,1,F.C.C. 13529,26.25,,S
|
443 |
+
442,0,3,"Hampe, Mr. Leon",male,20,0,0,345769,9.5,,S
|
444 |
+
443,0,3,"Petterson, Mr. Johan Emil",male,25,1,0,347076,7.775,,S
|
445 |
+
444,1,2,"Reynaldo, Ms. Encarnacion",female,28,0,0,230434,13,,S
|
446 |
+
445,1,3,"Johannesen-Bratthammer, Mr. Bernt",male,,0,0,65306,8.1125,,S
|
447 |
+
446,1,1,"Dodge, Master. Washington",male,4,0,2,33638,81.8583,A34,S
|
448 |
+
447,1,2,"Mellinger, Miss. Madeleine Violet",female,13,0,1,250644,19.5,,S
|
449 |
+
448,1,1,"Seward, Mr. Frederic Kimber",male,34,0,0,113794,26.55,,S
|
450 |
+
449,1,3,"Baclini, Miss. Marie Catherine",female,5,2,1,2666,19.2583,,C
|
451 |
+
450,1,1,"Peuchen, Major. Arthur Godfrey",male,52,0,0,113786,30.5,C104,S
|
452 |
+
451,0,2,"West, Mr. Edwy Arthur",male,36,1,2,C.A. 34651,27.75,,S
|
453 |
+
452,0,3,"Hagland, Mr. Ingvald Olai Olsen",male,,1,0,65303,19.9667,,S
|
454 |
+
453,0,1,"Foreman, Mr. Benjamin Laventall",male,30,0,0,113051,27.75,C111,C
|
455 |
+
454,1,1,"Goldenberg, Mr. Samuel L",male,49,1,0,17453,89.1042,C92,C
|
456 |
+
455,0,3,"Peduzzi, Mr. Joseph",male,,0,0,A/5 2817,8.05,,S
|
457 |
+
456,1,3,"Jalsevac, Mr. Ivan",male,29,0,0,349240,7.8958,,C
|
458 |
+
457,0,1,"Millet, Mr. Francis Davis",male,65,0,0,13509,26.55,E38,S
|
459 |
+
458,1,1,"Kenyon, Mrs. Frederick R (Marion)",female,,1,0,17464,51.8625,D21,S
|
460 |
+
459,1,2,"Toomey, Miss. Ellen",female,50,0,0,F.C.C. 13531,10.5,,S
|
461 |
+
460,0,3,"O'Connor, Mr. Maurice",male,,0,0,371060,7.75,,Q
|
462 |
+
461,1,1,"Anderson, Mr. Harry",male,48,0,0,19952,26.55,E12,S
|
463 |
+
462,0,3,"Morley, Mr. William",male,34,0,0,364506,8.05,,S
|
464 |
+
463,0,1,"Gee, Mr. Arthur H",male,47,0,0,111320,38.5,E63,S
|
465 |
+
464,0,2,"Milling, Mr. Jacob Christian",male,48,0,0,234360,13,,S
|
466 |
+
465,0,3,"Maisner, Mr. Simon",male,,0,0,A/S 2816,8.05,,S
|
467 |
+
466,0,3,"Goncalves, Mr. Manuel Estanslas",male,38,0,0,SOTON/O.Q. 3101306,7.05,,S
|
468 |
+
467,0,2,"Campbell, Mr. William",male,,0,0,239853,0,,S
|
469 |
+
468,0,1,"Smart, Mr. John Montgomery",male,56,0,0,113792,26.55,,S
|
470 |
+
469,0,3,"Scanlan, Mr. James",male,,0,0,36209,7.725,,Q
|
471 |
+
470,1,3,"Baclini, Miss. Helene Barbara",female,0.75,2,1,2666,19.2583,,C
|
472 |
+
471,0,3,"Keefe, Mr. Arthur",male,,0,0,323592,7.25,,S
|
473 |
+
472,0,3,"Cacic, Mr. Luka",male,38,0,0,315089,8.6625,,S
|
474 |
+
473,1,2,"West, Mrs. Edwy Arthur (Ada Mary Worth)",female,33,1,2,C.A. 34651,27.75,,S
|
475 |
+
474,1,2,"Jerwan, Mrs. Amin S (Marie Marthe Thuillard)",female,23,0,0,SC/AH Basle 541,13.7917,D,C
|
476 |
+
475,0,3,"Strandberg, Miss. Ida Sofia",female,22,0,0,7553,9.8375,,S
|
477 |
+
476,0,1,"Clifford, Mr. George Quincy",male,,0,0,110465,52,A14,S
|
478 |
+
477,0,2,"Renouf, Mr. Peter Henry",male,34,1,0,31027,21,,S
|
479 |
+
478,0,3,"Braund, Mr. Lewis Richard",male,29,1,0,3460,7.0458,,S
|
480 |
+
479,0,3,"Karlsson, Mr. Nils August",male,22,0,0,350060,7.5208,,S
|
481 |
+
480,1,3,"Hirvonen, Miss. Hildur E",female,2,0,1,3101298,12.2875,,S
|
482 |
+
481,0,3,"Goodwin, Master. Harold Victor",male,9,5,2,CA 2144,46.9,,S
|
483 |
+
482,0,2,"Frost, Mr. Anthony Wood ""Archie""",male,,0,0,239854,0,,S
|
484 |
+
483,0,3,"Rouse, Mr. Richard Henry",male,50,0,0,A/5 3594,8.05,,S
|
485 |
+
484,1,3,"Turkula, Mrs. (Hedwig)",female,63,0,0,4134,9.5875,,S
|
486 |
+
485,1,1,"Bishop, Mr. Dickinson H",male,25,1,0,11967,91.0792,B49,C
|
487 |
+
486,0,3,"Lefebre, Miss. Jeannie",female,,3,1,4133,25.4667,,S
|
488 |
+
487,1,1,"Hoyt, Mrs. Frederick Maxfield (Jane Anne Forby)",female,35,1,0,19943,90,C93,S
|
489 |
+
488,0,1,"Kent, Mr. Edward Austin",male,58,0,0,11771,29.7,B37,C
|
490 |
+
489,0,3,"Somerton, Mr. Francis William",male,30,0,0,A.5. 18509,8.05,,S
|
491 |
+
490,1,3,"Coutts, Master. Eden Leslie ""Neville""",male,9,1,1,C.A. 37671,15.9,,S
|
492 |
+
491,0,3,"Hagland, Mr. Konrad Mathias Reiersen",male,,1,0,65304,19.9667,,S
|
493 |
+
492,0,3,"Windelov, Mr. Einar",male,21,0,0,SOTON/OQ 3101317,7.25,,S
|
494 |
+
493,0,1,"Molson, Mr. Harry Markland",male,55,0,0,113787,30.5,C30,S
|
495 |
+
494,0,1,"Artagaveytia, Mr. Ramon",male,71,0,0,PC 17609,49.5042,,C
|
496 |
+
495,0,3,"Stanley, Mr. Edward Roland",male,21,0,0,A/4 45380,8.05,,S
|
497 |
+
496,0,3,"Yousseff, Mr. Gerious",male,,0,0,2627,14.4583,,C
|
498 |
+
497,1,1,"Eustis, Miss. Elizabeth Mussey",female,54,1,0,36947,78.2667,D20,C
|
499 |
+
498,0,3,"Shellard, Mr. Frederick William",male,,0,0,C.A. 6212,15.1,,S
|
500 |
+
499,0,1,"Allison, Mrs. Hudson J C (Bessie Waldo Daniels)",female,25,1,2,113781,151.55,C22 C26,S
|
501 |
+
500,0,3,"Svensson, Mr. Olof",male,24,0,0,350035,7.7958,,S
|
502 |
+
501,0,3,"Calic, Mr. Petar",male,17,0,0,315086,8.6625,,S
|
503 |
+
502,0,3,"Canavan, Miss. Mary",female,21,0,0,364846,7.75,,Q
|
504 |
+
503,0,3,"O'Sullivan, Miss. Bridget Mary",female,,0,0,330909,7.6292,,Q
|
505 |
+
504,0,3,"Laitinen, Miss. Kristina Sofia",female,37,0,0,4135,9.5875,,S
|
506 |
+
505,1,1,"Maioni, Miss. Roberta",female,16,0,0,110152,86.5,B79,S
|
507 |
+
506,0,1,"Penasco y Castellana, Mr. Victor de Satode",male,18,1,0,PC 17758,108.9,C65,C
|
508 |
+
507,1,2,"Quick, Mrs. Frederick Charles (Jane Richards)",female,33,0,2,26360,26,,S
|
509 |
+
508,1,1,"Bradley, Mr. George (""George Arthur Brayton"")",male,,0,0,111427,26.55,,S
|
510 |
+
509,0,3,"Olsen, Mr. Henry Margido",male,28,0,0,C 4001,22.525,,S
|
511 |
+
510,1,3,"Lang, Mr. Fang",male,26,0,0,1601,56.4958,,S
|
512 |
+
511,1,3,"Daly, Mr. Eugene Patrick",male,29,0,0,382651,7.75,,Q
|
513 |
+
512,0,3,"Webber, Mr. James",male,,0,0,SOTON/OQ 3101316,8.05,,S
|
514 |
+
513,1,1,"McGough, Mr. James Robert",male,36,0,0,PC 17473,26.2875,E25,S
|
515 |
+
514,1,1,"Rothschild, Mrs. Martin (Elizabeth L. Barrett)",female,54,1,0,PC 17603,59.4,,C
|
516 |
+
515,0,3,"Coleff, Mr. Satio",male,24,0,0,349209,7.4958,,S
|
517 |
+
516,0,1,"Walker, Mr. William Anderson",male,47,0,0,36967,34.0208,D46,S
|
518 |
+
517,1,2,"Lemore, Mrs. (Amelia Milley)",female,34,0,0,C.A. 34260,10.5,F33,S
|
519 |
+
518,0,3,"Ryan, Mr. Patrick",male,,0,0,371110,24.15,,Q
|
520 |
+
519,1,2,"Angle, Mrs. William A (Florence ""Mary"" Agnes Hughes)",female,36,1,0,226875,26,,S
|
521 |
+
520,0,3,"Pavlovic, Mr. Stefo",male,32,0,0,349242,7.8958,,S
|
522 |
+
521,1,1,"Perreault, Miss. Anne",female,30,0,0,12749,93.5,B73,S
|
523 |
+
522,0,3,"Vovk, Mr. Janko",male,22,0,0,349252,7.8958,,S
|
524 |
+
523,0,3,"Lahoud, Mr. Sarkis",male,,0,0,2624,7.225,,C
|
525 |
+
524,1,1,"Hippach, Mrs. Louis Albert (Ida Sophia Fischer)",female,44,0,1,111361,57.9792,B18,C
|
526 |
+
525,0,3,"Kassem, Mr. Fared",male,,0,0,2700,7.2292,,C
|
527 |
+
526,0,3,"Farrell, Mr. James",male,40.5,0,0,367232,7.75,,Q
|
528 |
+
527,1,2,"Ridsdale, Miss. Lucy",female,50,0,0,W./C. 14258,10.5,,S
|
529 |
+
528,0,1,"Farthing, Mr. John",male,,0,0,PC 17483,221.7792,C95,S
|
530 |
+
529,0,3,"Salonen, Mr. Johan Werner",male,39,0,0,3101296,7.925,,S
|
531 |
+
530,0,2,"Hocking, Mr. Richard George",male,23,2,1,29104,11.5,,S
|
532 |
+
531,1,2,"Quick, Miss. Phyllis May",female,2,1,1,26360,26,,S
|
533 |
+
532,0,3,"Toufik, Mr. Nakli",male,,0,0,2641,7.2292,,C
|
534 |
+
533,0,3,"Elias, Mr. Joseph Jr",male,17,1,1,2690,7.2292,,C
|
535 |
+
534,1,3,"Peter, Mrs. Catherine (Catherine Rizk)",female,,0,2,2668,22.3583,,C
|
536 |
+
535,0,3,"Cacic, Miss. Marija",female,30,0,0,315084,8.6625,,S
|
537 |
+
536,1,2,"Hart, Miss. Eva Miriam",female,7,0,2,F.C.C. 13529,26.25,,S
|
538 |
+
537,0,1,"Butt, Major. Archibald Willingham",male,45,0,0,113050,26.55,B38,S
|
539 |
+
538,1,1,"LeRoy, Miss. Bertha",female,30,0,0,PC 17761,106.425,,C
|
540 |
+
539,0,3,"Risien, Mr. Samuel Beard",male,,0,0,364498,14.5,,S
|
541 |
+
540,1,1,"Frolicher, Miss. Hedwig Margaritha",female,22,0,2,13568,49.5,B39,C
|
542 |
+
541,1,1,"Crosby, Miss. Harriet R",female,36,0,2,WE/P 5735,71,B22,S
|
543 |
+
542,0,3,"Andersson, Miss. Ingeborg Constanzia",female,9,4,2,347082,31.275,,S
|
544 |
+
543,0,3,"Andersson, Miss. Sigrid Elisabeth",female,11,4,2,347082,31.275,,S
|
545 |
+
544,1,2,"Beane, Mr. Edward",male,32,1,0,2908,26,,S
|
546 |
+
545,0,1,"Douglas, Mr. Walter Donald",male,50,1,0,PC 17761,106.425,C86,C
|
547 |
+
546,0,1,"Nicholson, Mr. Arthur Ernest",male,64,0,0,693,26,,S
|
548 |
+
547,1,2,"Beane, Mrs. Edward (Ethel Clarke)",female,19,1,0,2908,26,,S
|
549 |
+
548,1,2,"Padro y Manent, Mr. Julian",male,,0,0,SC/PARIS 2146,13.8625,,C
|
550 |
+
549,0,3,"Goldsmith, Mr. Frank John",male,33,1,1,363291,20.525,,S
|
551 |
+
550,1,2,"Davies, Master. John Morgan Jr",male,8,1,1,C.A. 33112,36.75,,S
|
552 |
+
551,1,1,"Thayer, Mr. John Borland Jr",male,17,0,2,17421,110.8833,C70,C
|
553 |
+
552,0,2,"Sharp, Mr. Percival James R",male,27,0,0,244358,26,,S
|
554 |
+
553,0,3,"O'Brien, Mr. Timothy",male,,0,0,330979,7.8292,,Q
|
555 |
+
554,1,3,"Leeni, Mr. Fahim (""Philip Zenni"")",male,22,0,0,2620,7.225,,C
|
556 |
+
555,1,3,"Ohman, Miss. Velin",female,22,0,0,347085,7.775,,S
|
557 |
+
556,0,1,"Wright, Mr. George",male,62,0,0,113807,26.55,,S
|
558 |
+
557,1,1,"Duff Gordon, Lady. (Lucille Christiana Sutherland) (""Mrs Morgan"")",female,48,1,0,11755,39.6,A16,C
|
559 |
+
558,0,1,"Robbins, Mr. Victor",male,,0,0,PC 17757,227.525,,C
|
560 |
+
559,1,1,"Taussig, Mrs. Emil (Tillie Mandelbaum)",female,39,1,1,110413,79.65,E67,S
|
561 |
+
560,1,3,"de Messemaeker, Mrs. Guillaume Joseph (Emma)",female,36,1,0,345572,17.4,,S
|
562 |
+
561,0,3,"Morrow, Mr. Thomas Rowan",male,,0,0,372622,7.75,,Q
|
563 |
+
562,0,3,"Sivic, Mr. Husein",male,40,0,0,349251,7.8958,,S
|
564 |
+
563,0,2,"Norman, Mr. Robert Douglas",male,28,0,0,218629,13.5,,S
|
565 |
+
564,0,3,"Simmons, Mr. John",male,,0,0,SOTON/OQ 392082,8.05,,S
|
566 |
+
565,0,3,"Meanwell, Miss. (Marion Ogden)",female,,0,0,SOTON/O.Q. 392087,8.05,,S
|
567 |
+
566,0,3,"Davies, Mr. Alfred J",male,24,2,0,A/4 48871,24.15,,S
|
568 |
+
567,0,3,"Stoytcheff, Mr. Ilia",male,19,0,0,349205,7.8958,,S
|
569 |
+
568,0,3,"Palsson, Mrs. Nils (Alma Cornelia Berglund)",female,29,0,4,349909,21.075,,S
|
570 |
+
569,0,3,"Doharr, Mr. Tannous",male,,0,0,2686,7.2292,,C
|
571 |
+
570,1,3,"Jonsson, Mr. Carl",male,32,0,0,350417,7.8542,,S
|
572 |
+
571,1,2,"Harris, Mr. George",male,62,0,0,S.W./PP 752,10.5,,S
|
573 |
+
572,1,1,"Appleton, Mrs. Edward Dale (Charlotte Lamson)",female,53,2,0,11769,51.4792,C101,S
|
574 |
+
573,1,1,"Flynn, Mr. John Irwin (""Irving"")",male,36,0,0,PC 17474,26.3875,E25,S
|
575 |
+
574,1,3,"Kelly, Miss. Mary",female,,0,0,14312,7.75,,Q
|
576 |
+
575,0,3,"Rush, Mr. Alfred George John",male,16,0,0,A/4. 20589,8.05,,S
|
577 |
+
576,0,3,"Patchett, Mr. George",male,19,0,0,358585,14.5,,S
|
578 |
+
577,1,2,"Garside, Miss. Ethel",female,34,0,0,243880,13,,S
|
579 |
+
578,1,1,"Silvey, Mrs. William Baird (Alice Munger)",female,39,1,0,13507,55.9,E44,S
|
580 |
+
579,0,3,"Caram, Mrs. Joseph (Maria Elias)",female,,1,0,2689,14.4583,,C
|
581 |
+
580,1,3,"Jussila, Mr. Eiriik",male,32,0,0,STON/O 2. 3101286,7.925,,S
|
582 |
+
581,1,2,"Christy, Miss. Julie Rachel",female,25,1,1,237789,30,,S
|
583 |
+
582,1,1,"Thayer, Mrs. John Borland (Marian Longstreth Morris)",female,39,1,1,17421,110.8833,C68,C
|
584 |
+
583,0,2,"Downton, Mr. William James",male,54,0,0,28403,26,,S
|
585 |
+
584,0,1,"Ross, Mr. John Hugo",male,36,0,0,13049,40.125,A10,C
|
586 |
+
585,0,3,"Paulner, Mr. Uscher",male,,0,0,3411,8.7125,,C
|
587 |
+
586,1,1,"Taussig, Miss. Ruth",female,18,0,2,110413,79.65,E68,S
|
588 |
+
587,0,2,"Jarvis, Mr. John Denzil",male,47,0,0,237565,15,,S
|
589 |
+
588,1,1,"Frolicher-Stehli, Mr. Maxmillian",male,60,1,1,13567,79.2,B41,C
|
590 |
+
589,0,3,"Gilinski, Mr. Eliezer",male,22,0,0,14973,8.05,,S
|
591 |
+
590,0,3,"Murdlin, Mr. Joseph",male,,0,0,A./5. 3235,8.05,,S
|
592 |
+
591,0,3,"Rintamaki, Mr. Matti",male,35,0,0,STON/O 2. 3101273,7.125,,S
|
593 |
+
592,1,1,"Stephenson, Mrs. Walter Bertram (Martha Eustis)",female,52,1,0,36947,78.2667,D20,C
|
594 |
+
593,0,3,"Elsbury, Mr. William James",male,47,0,0,A/5 3902,7.25,,S
|
595 |
+
594,0,3,"Bourke, Miss. Mary",female,,0,2,364848,7.75,,Q
|
596 |
+
595,0,2,"Chapman, Mr. John Henry",male,37,1,0,SC/AH 29037,26,,S
|
597 |
+
596,0,3,"Van Impe, Mr. Jean Baptiste",male,36,1,1,345773,24.15,,S
|
598 |
+
597,1,2,"Leitch, Miss. Jessie Wills",female,,0,0,248727,33,,S
|
599 |
+
598,0,3,"Johnson, Mr. Alfred",male,49,0,0,LINE,0,,S
|
600 |
+
599,0,3,"Boulos, Mr. Hanna",male,,0,0,2664,7.225,,C
|
601 |
+
600,1,1,"Duff Gordon, Sir. Cosmo Edmund (""Mr Morgan"")",male,49,1,0,PC 17485,56.9292,A20,C
|
602 |
+
601,1,2,"Jacobsohn, Mrs. Sidney Samuel (Amy Frances Christy)",female,24,2,1,243847,27,,S
|
603 |
+
602,0,3,"Slabenoff, Mr. Petco",male,,0,0,349214,7.8958,,S
|
604 |
+
603,0,1,"Harrington, Mr. Charles H",male,,0,0,113796,42.4,,S
|
605 |
+
604,0,3,"Torber, Mr. Ernst William",male,44,0,0,364511,8.05,,S
|
606 |
+
605,1,1,"Homer, Mr. Harry (""Mr E Haven"")",male,35,0,0,111426,26.55,,C
|
607 |
+
606,0,3,"Lindell, Mr. Edvard Bengtsson",male,36,1,0,349910,15.55,,S
|
608 |
+
607,0,3,"Karaic, Mr. Milan",male,30,0,0,349246,7.8958,,S
|
609 |
+
608,1,1,"Daniel, Mr. Robert Williams",male,27,0,0,113804,30.5,,S
|
610 |
+
609,1,2,"Laroche, Mrs. Joseph (Juliette Marie Louise Lafargue)",female,22,1,2,SC/Paris 2123,41.5792,,C
|
611 |
+
610,1,1,"Shutes, Miss. Elizabeth W",female,40,0,0,PC 17582,153.4625,C125,S
|
612 |
+
611,0,3,"Andersson, Mrs. Anders Johan (Alfrida Konstantia Brogren)",female,39,1,5,347082,31.275,,S
|
613 |
+
612,0,3,"Jardin, Mr. Jose Neto",male,,0,0,SOTON/O.Q. 3101305,7.05,,S
|
614 |
+
613,1,3,"Murphy, Miss. Margaret Jane",female,,1,0,367230,15.5,,Q
|
615 |
+
614,0,3,"Horgan, Mr. John",male,,0,0,370377,7.75,,Q
|
616 |
+
615,0,3,"Brocklebank, Mr. William Alfred",male,35,0,0,364512,8.05,,S
|
617 |
+
616,1,2,"Herman, Miss. Alice",female,24,1,2,220845,65,,S
|
618 |
+
617,0,3,"Danbom, Mr. Ernst Gilbert",male,34,1,1,347080,14.4,,S
|
619 |
+
618,0,3,"Lobb, Mrs. William Arthur (Cordelia K Stanlick)",female,26,1,0,A/5. 3336,16.1,,S
|
620 |
+
619,1,2,"Becker, Miss. Marion Louise",female,4,2,1,230136,39,F4,S
|
621 |
+
620,0,2,"Gavey, Mr. Lawrence",male,26,0,0,31028,10.5,,S
|
622 |
+
621,0,3,"Yasbeck, Mr. Antoni",male,27,1,0,2659,14.4542,,C
|
623 |
+
622,1,1,"Kimball, Mr. Edwin Nelson Jr",male,42,1,0,11753,52.5542,D19,S
|
624 |
+
623,1,3,"Nakid, Mr. Sahid",male,20,1,1,2653,15.7417,,C
|
625 |
+
624,0,3,"Hansen, Mr. Henry Damsgaard",male,21,0,0,350029,7.8542,,S
|
626 |
+
625,0,3,"Bowen, Mr. David John ""Dai""",male,21,0,0,54636,16.1,,S
|
627 |
+
626,0,1,"Sutton, Mr. Frederick",male,61,0,0,36963,32.3208,D50,S
|
628 |
+
627,0,2,"Kirkland, Rev. Charles Leonard",male,57,0,0,219533,12.35,,Q
|
629 |
+
628,1,1,"Longley, Miss. Gretchen Fiske",female,21,0,0,13502,77.9583,D9,S
|
630 |
+
629,0,3,"Bostandyeff, Mr. Guentcho",male,26,0,0,349224,7.8958,,S
|
631 |
+
630,0,3,"O'Connell, Mr. Patrick D",male,,0,0,334912,7.7333,,Q
|
632 |
+
631,1,1,"Barkworth, Mr. Algernon Henry Wilson",male,80,0,0,27042,30,A23,S
|
633 |
+
632,0,3,"Lundahl, Mr. Johan Svensson",male,51,0,0,347743,7.0542,,S
|
634 |
+
633,1,1,"Stahelin-Maeglin, Dr. Max",male,32,0,0,13214,30.5,B50,C
|
635 |
+
634,0,1,"Parr, Mr. William Henry Marsh",male,,0,0,112052,0,,S
|
636 |
+
635,0,3,"Skoog, Miss. Mabel",female,9,3,2,347088,27.9,,S
|
637 |
+
636,1,2,"Davis, Miss. Mary",female,28,0,0,237668,13,,S
|
638 |
+
637,0,3,"Leinonen, Mr. Antti Gustaf",male,32,0,0,STON/O 2. 3101292,7.925,,S
|
639 |
+
638,0,2,"Collyer, Mr. Harvey",male,31,1,1,C.A. 31921,26.25,,S
|
640 |
+
639,0,3,"Panula, Mrs. Juha (Maria Emilia Ojala)",female,41,0,5,3101295,39.6875,,S
|
641 |
+
640,0,3,"Thorneycroft, Mr. Percival",male,,1,0,376564,16.1,,S
|
642 |
+
641,0,3,"Jensen, Mr. Hans Peder",male,20,0,0,350050,7.8542,,S
|
643 |
+
642,1,1,"Sagesser, Mlle. Emma",female,24,0,0,PC 17477,69.3,B35,C
|
644 |
+
643,0,3,"Skoog, Miss. Margit Elizabeth",female,2,3,2,347088,27.9,,S
|
645 |
+
644,1,3,"Foo, Mr. Choong",male,,0,0,1601,56.4958,,S
|
646 |
+
645,1,3,"Baclini, Miss. Eugenie",female,0.75,2,1,2666,19.2583,,C
|
647 |
+
646,1,1,"Harper, Mr. Henry Sleeper",male,48,1,0,PC 17572,76.7292,D33,C
|
648 |
+
647,0,3,"Cor, Mr. Liudevit",male,19,0,0,349231,7.8958,,S
|
649 |
+
648,1,1,"Simonius-Blumer, Col. Oberst Alfons",male,56,0,0,13213,35.5,A26,C
|
650 |
+
649,0,3,"Willey, Mr. Edward",male,,0,0,S.O./P.P. 751,7.55,,S
|
651 |
+
650,1,3,"Stanley, Miss. Amy Zillah Elsie",female,23,0,0,CA. 2314,7.55,,S
|
652 |
+
651,0,3,"Mitkoff, Mr. Mito",male,,0,0,349221,7.8958,,S
|
653 |
+
652,1,2,"Doling, Miss. Elsie",female,18,0,1,231919,23,,S
|
654 |
+
653,0,3,"Kalvik, Mr. Johannes Halvorsen",male,21,0,0,8475,8.4333,,S
|
655 |
+
654,1,3,"O'Leary, Miss. Hanora ""Norah""",female,,0,0,330919,7.8292,,Q
|
656 |
+
655,0,3,"Hegarty, Miss. Hanora ""Nora""",female,18,0,0,365226,6.75,,Q
|
657 |
+
656,0,2,"Hickman, Mr. Leonard Mark",male,24,2,0,S.O.C. 14879,73.5,,S
|
658 |
+
657,0,3,"Radeff, Mr. Alexander",male,,0,0,349223,7.8958,,S
|
659 |
+
658,0,3,"Bourke, Mrs. John (Catherine)",female,32,1,1,364849,15.5,,Q
|
660 |
+
659,0,2,"Eitemiller, Mr. George Floyd",male,23,0,0,29751,13,,S
|
661 |
+
660,0,1,"Newell, Mr. Arthur Webster",male,58,0,2,35273,113.275,D48,C
|
662 |
+
661,1,1,"Frauenthal, Dr. Henry William",male,50,2,0,PC 17611,133.65,,S
|
663 |
+
662,0,3,"Badt, Mr. Mohamed",male,40,0,0,2623,7.225,,C
|
664 |
+
663,0,1,"Colley, Mr. Edward Pomeroy",male,47,0,0,5727,25.5875,E58,S
|
665 |
+
664,0,3,"Coleff, Mr. Peju",male,36,0,0,349210,7.4958,,S
|
666 |
+
665,1,3,"Lindqvist, Mr. Eino William",male,20,1,0,STON/O 2. 3101285,7.925,,S
|
667 |
+
666,0,2,"Hickman, Mr. Lewis",male,32,2,0,S.O.C. 14879,73.5,,S
|
668 |
+
667,0,2,"Butler, Mr. Reginald Fenton",male,25,0,0,234686,13,,S
|
669 |
+
668,0,3,"Rommetvedt, Mr. Knud Paust",male,,0,0,312993,7.775,,S
|
670 |
+
669,0,3,"Cook, Mr. Jacob",male,43,0,0,A/5 3536,8.05,,S
|
671 |
+
670,1,1,"Taylor, Mrs. Elmer Zebley (Juliet Cummins Wright)",female,,1,0,19996,52,C126,S
|
672 |
+
671,1,2,"Brown, Mrs. Thomas William Solomon (Elizabeth Catherine Ford)",female,40,1,1,29750,39,,S
|
673 |
+
672,0,1,"Davidson, Mr. Thornton",male,31,1,0,F.C. 12750,52,B71,S
|
674 |
+
673,0,2,"Mitchell, Mr. Henry Michael",male,70,0,0,C.A. 24580,10.5,,S
|
675 |
+
674,1,2,"Wilhelms, Mr. Charles",male,31,0,0,244270,13,,S
|
676 |
+
675,0,2,"Watson, Mr. Ennis Hastings",male,,0,0,239856,0,,S
|
677 |
+
676,0,3,"Edvardsson, Mr. Gustaf Hjalmar",male,18,0,0,349912,7.775,,S
|
678 |
+
677,0,3,"Sawyer, Mr. Frederick Charles",male,24.5,0,0,342826,8.05,,S
|
679 |
+
678,1,3,"Turja, Miss. Anna Sofia",female,18,0,0,4138,9.8417,,S
|
680 |
+
679,0,3,"Goodwin, Mrs. Frederick (Augusta Tyler)",female,43,1,6,CA 2144,46.9,,S
|
681 |
+
680,1,1,"Cardeza, Mr. Thomas Drake Martinez",male,36,0,1,PC 17755,512.3292,B51 B53 B55,C
|
682 |
+
681,0,3,"Peters, Miss. Katie",female,,0,0,330935,8.1375,,Q
|
683 |
+
682,1,1,"Hassab, Mr. Hammad",male,27,0,0,PC 17572,76.7292,D49,C
|
684 |
+
683,0,3,"Olsvigen, Mr. Thor Anderson",male,20,0,0,6563,9.225,,S
|
685 |
+
684,0,3,"Goodwin, Mr. Charles Edward",male,14,5,2,CA 2144,46.9,,S
|
686 |
+
685,0,2,"Brown, Mr. Thomas William Solomon",male,60,1,1,29750,39,,S
|
687 |
+
686,0,2,"Laroche, Mr. Joseph Philippe Lemercier",male,25,1,2,SC/Paris 2123,41.5792,,C
|
688 |
+
687,0,3,"Panula, Mr. Jaako Arnold",male,14,4,1,3101295,39.6875,,S
|
689 |
+
688,0,3,"Dakic, Mr. Branko",male,19,0,0,349228,10.1708,,S
|
690 |
+
689,0,3,"Fischer, Mr. Eberhard Thelander",male,18,0,0,350036,7.7958,,S
|
691 |
+
690,1,1,"Madill, Miss. Georgette Alexandra",female,15,0,1,24160,211.3375,B5,S
|
692 |
+
691,1,1,"Dick, Mr. Albert Adrian",male,31,1,0,17474,57,B20,S
|
693 |
+
692,1,3,"Karun, Miss. Manca",female,4,0,1,349256,13.4167,,C
|
694 |
+
693,1,3,"Lam, Mr. Ali",male,,0,0,1601,56.4958,,S
|
695 |
+
694,0,3,"Saad, Mr. Khalil",male,25,0,0,2672,7.225,,C
|
696 |
+
695,0,1,"Weir, Col. John",male,60,0,0,113800,26.55,,S
|
697 |
+
696,0,2,"Chapman, Mr. Charles Henry",male,52,0,0,248731,13.5,,S
|
698 |
+
697,0,3,"Kelly, Mr. James",male,44,0,0,363592,8.05,,S
|
699 |
+
698,1,3,"Mullens, Miss. Katherine ""Katie""",female,,0,0,35852,7.7333,,Q
|
700 |
+
699,0,1,"Thayer, Mr. John Borland",male,49,1,1,17421,110.8833,C68,C
|
701 |
+
700,0,3,"Humblen, Mr. Adolf Mathias Nicolai Olsen",male,42,0,0,348121,7.65,F G63,S
|
702 |
+
701,1,1,"Astor, Mrs. John Jacob (Madeleine Talmadge Force)",female,18,1,0,PC 17757,227.525,C62 C64,C
|
703 |
+
702,1,1,"Silverthorne, Mr. Spencer Victor",male,35,0,0,PC 17475,26.2875,E24,S
|
704 |
+
703,0,3,"Barbara, Miss. Saiide",female,18,0,1,2691,14.4542,,C
|
705 |
+
704,0,3,"Gallagher, Mr. Martin",male,25,0,0,36864,7.7417,,Q
|
706 |
+
705,0,3,"Hansen, Mr. Henrik Juul",male,26,1,0,350025,7.8542,,S
|
707 |
+
706,0,2,"Morley, Mr. Henry Samuel (""Mr Henry Marshall"")",male,39,0,0,250655,26,,S
|
708 |
+
707,1,2,"Kelly, Mrs. Florence ""Fannie""",female,45,0,0,223596,13.5,,S
|
709 |
+
708,1,1,"Calderhead, Mr. Edward Pennington",male,42,0,0,PC 17476,26.2875,E24,S
|
710 |
+
709,1,1,"Cleaver, Miss. Alice",female,22,0,0,113781,151.55,,S
|
711 |
+
710,1,3,"Moubarek, Master. Halim Gonios (""William George"")",male,,1,1,2661,15.2458,,C
|
712 |
+
711,1,1,"Mayne, Mlle. Berthe Antonine (""Mrs de Villiers"")",female,24,0,0,PC 17482,49.5042,C90,C
|
713 |
+
712,0,1,"Klaber, Mr. Herman",male,,0,0,113028,26.55,C124,S
|
714 |
+
713,1,1,"Taylor, Mr. Elmer Zebley",male,48,1,0,19996,52,C126,S
|
715 |
+
714,0,3,"Larsson, Mr. August Viktor",male,29,0,0,7545,9.4833,,S
|
716 |
+
715,0,2,"Greenberg, Mr. Samuel",male,52,0,0,250647,13,,S
|
717 |
+
716,0,3,"Soholt, Mr. Peter Andreas Lauritz Andersen",male,19,0,0,348124,7.65,F G73,S
|
718 |
+
717,1,1,"Endres, Miss. Caroline Louise",female,38,0,0,PC 17757,227.525,C45,C
|
719 |
+
718,1,2,"Troutt, Miss. Edwina Celia ""Winnie""",female,27,0,0,34218,10.5,E101,S
|
720 |
+
719,0,3,"McEvoy, Mr. Michael",male,,0,0,36568,15.5,,Q
|
721 |
+
720,0,3,"Johnson, Mr. Malkolm Joackim",male,33,0,0,347062,7.775,,S
|
722 |
+
721,1,2,"Harper, Miss. Annie Jessie ""Nina""",female,6,0,1,248727,33,,S
|
723 |
+
722,0,3,"Jensen, Mr. Svend Lauritz",male,17,1,0,350048,7.0542,,S
|
724 |
+
723,0,2,"Gillespie, Mr. William Henry",male,34,0,0,12233,13,,S
|
725 |
+
724,0,2,"Hodges, Mr. Henry Price",male,50,0,0,250643,13,,S
|
726 |
+
725,1,1,"Chambers, Mr. Norman Campbell",male,27,1,0,113806,53.1,E8,S
|
727 |
+
726,0,3,"Oreskovic, Mr. Luka",male,20,0,0,315094,8.6625,,S
|
728 |
+
727,1,2,"Renouf, Mrs. Peter Henry (Lillian Jefferys)",female,30,3,0,31027,21,,S
|
729 |
+
728,1,3,"Mannion, Miss. Margareth",female,,0,0,36866,7.7375,,Q
|
730 |
+
729,0,2,"Bryhl, Mr. Kurt Arnold Gottfrid",male,25,1,0,236853,26,,S
|
731 |
+
730,0,3,"Ilmakangas, Miss. Pieta Sofia",female,25,1,0,STON/O2. 3101271,7.925,,S
|
732 |
+
731,1,1,"Allen, Miss. Elisabeth Walton",female,29,0,0,24160,211.3375,B5,S
|
733 |
+
732,0,3,"Hassan, Mr. Houssein G N",male,11,0,0,2699,18.7875,,C
|
734 |
+
733,0,2,"Knight, Mr. Robert J",male,,0,0,239855,0,,S
|
735 |
+
734,0,2,"Berriman, Mr. William John",male,23,0,0,28425,13,,S
|
736 |
+
735,0,2,"Troupiansky, Mr. Moses Aaron",male,23,0,0,233639,13,,S
|
737 |
+
736,0,3,"Williams, Mr. Leslie",male,28.5,0,0,54636,16.1,,S
|
738 |
+
737,0,3,"Ford, Mrs. Edward (Margaret Ann Watson)",female,48,1,3,W./C. 6608,34.375,,S
|
739 |
+
738,1,1,"Lesurer, Mr. Gustave J",male,35,0,0,PC 17755,512.3292,B101,C
|
740 |
+
739,0,3,"Ivanoff, Mr. Kanio",male,,0,0,349201,7.8958,,S
|
741 |
+
740,0,3,"Nankoff, Mr. Minko",male,,0,0,349218,7.8958,,S
|
742 |
+
741,1,1,"Hawksford, Mr. Walter James",male,,0,0,16988,30,D45,S
|
743 |
+
742,0,1,"Cavendish, Mr. Tyrell William",male,36,1,0,19877,78.85,C46,S
|
744 |
+
743,1,1,"Ryerson, Miss. Susan Parker ""Suzette""",female,21,2,2,PC 17608,262.375,B57 B59 B63 B66,C
|
745 |
+
744,0,3,"McNamee, Mr. Neal",male,24,1,0,376566,16.1,,S
|
746 |
+
745,1,3,"Stranden, Mr. Juho",male,31,0,0,STON/O 2. 3101288,7.925,,S
|
747 |
+
746,0,1,"Crosby, Capt. Edward Gifford",male,70,1,1,WE/P 5735,71,B22,S
|
748 |
+
747,0,3,"Abbott, Mr. Rossmore Edward",male,16,1,1,C.A. 2673,20.25,,S
|
749 |
+
748,1,2,"Sinkkonen, Miss. Anna",female,30,0,0,250648,13,,S
|
750 |
+
749,0,1,"Marvin, Mr. Daniel Warner",male,19,1,0,113773,53.1,D30,S
|
751 |
+
750,0,3,"Connaghton, Mr. Michael",male,31,0,0,335097,7.75,,Q
|
752 |
+
751,1,2,"Wells, Miss. Joan",female,4,1,1,29103,23,,S
|
753 |
+
752,1,3,"Moor, Master. Meier",male,6,0,1,392096,12.475,E121,S
|
754 |
+
753,0,3,"Vande Velde, Mr. Johannes Joseph",male,33,0,0,345780,9.5,,S
|
755 |
+
754,0,3,"Jonkoff, Mr. Lalio",male,23,0,0,349204,7.8958,,S
|
756 |
+
755,1,2,"Herman, Mrs. Samuel (Jane Laver)",female,48,1,2,220845,65,,S
|
757 |
+
756,1,2,"Hamalainen, Master. Viljo",male,0.67,1,1,250649,14.5,,S
|
758 |
+
757,0,3,"Carlsson, Mr. August Sigfrid",male,28,0,0,350042,7.7958,,S
|
759 |
+
758,0,2,"Bailey, Mr. Percy Andrew",male,18,0,0,29108,11.5,,S
|
760 |
+
759,0,3,"Theobald, Mr. Thomas Leonard",male,34,0,0,363294,8.05,,S
|
761 |
+
760,1,1,"Rothes, the Countess. of (Lucy Noel Martha Dyer-Edwards)",female,33,0,0,110152,86.5,B77,S
|
762 |
+
761,0,3,"Garfirth, Mr. John",male,,0,0,358585,14.5,,S
|
763 |
+
762,0,3,"Nirva, Mr. Iisakki Antino Aijo",male,41,0,0,SOTON/O2 3101272,7.125,,S
|
764 |
+
763,1,3,"Barah, Mr. Hanna Assi",male,20,0,0,2663,7.2292,,C
|
765 |
+
764,1,1,"Carter, Mrs. William Ernest (Lucile Polk)",female,36,1,2,113760,120,B96 B98,S
|
766 |
+
765,0,3,"Eklund, Mr. Hans Linus",male,16,0,0,347074,7.775,,S
|
767 |
+
766,1,1,"Hogeboom, Mrs. John C (Anna Andrews)",female,51,1,0,13502,77.9583,D11,S
|
768 |
+
767,0,1,"Brewe, Dr. Arthur Jackson",male,,0,0,112379,39.6,,C
|
769 |
+
768,0,3,"Mangan, Miss. Mary",female,30.5,0,0,364850,7.75,,Q
|
770 |
+
769,0,3,"Moran, Mr. Daniel J",male,,1,0,371110,24.15,,Q
|
771 |
+
770,0,3,"Gronnestad, Mr. Daniel Danielsen",male,32,0,0,8471,8.3625,,S
|
772 |
+
771,0,3,"Lievens, Mr. Rene Aime",male,24,0,0,345781,9.5,,S
|
773 |
+
772,0,3,"Jensen, Mr. Niels Peder",male,48,0,0,350047,7.8542,,S
|
774 |
+
773,0,2,"Mack, Mrs. (Mary)",female,57,0,0,S.O./P.P. 3,10.5,E77,S
|
775 |
+
774,0,3,"Elias, Mr. Dibo",male,,0,0,2674,7.225,,C
|
776 |
+
775,1,2,"Hocking, Mrs. Elizabeth (Eliza Needs)",female,54,1,3,29105,23,,S
|
777 |
+
776,0,3,"Myhrman, Mr. Pehr Fabian Oliver Malkolm",male,18,0,0,347078,7.75,,S
|
778 |
+
777,0,3,"Tobin, Mr. Roger",male,,0,0,383121,7.75,F38,Q
|
779 |
+
778,1,3,"Emanuel, Miss. Virginia Ethel",female,5,0,0,364516,12.475,,S
|
780 |
+
779,0,3,"Kilgannon, Mr. Thomas J",male,,0,0,36865,7.7375,,Q
|
781 |
+
780,1,1,"Robert, Mrs. Edward Scott (Elisabeth Walton McMillan)",female,43,0,1,24160,211.3375,B3,S
|
782 |
+
781,1,3,"Ayoub, Miss. Banoura",female,13,0,0,2687,7.2292,,C
|
783 |
+
782,1,1,"Dick, Mrs. Albert Adrian (Vera Gillespie)",female,17,1,0,17474,57,B20,S
|
784 |
+
783,0,1,"Long, Mr. Milton Clyde",male,29,0,0,113501,30,D6,S
|
785 |
+
784,0,3,"Johnston, Mr. Andrew G",male,,1,2,W./C. 6607,23.45,,S
|
786 |
+
785,0,3,"Ali, Mr. William",male,25,0,0,SOTON/O.Q. 3101312,7.05,,S
|
787 |
+
786,0,3,"Harmer, Mr. Abraham (David Lishin)",male,25,0,0,374887,7.25,,S
|
788 |
+
787,1,3,"Sjoblom, Miss. Anna Sofia",female,18,0,0,3101265,7.4958,,S
|
789 |
+
788,0,3,"Rice, Master. George Hugh",male,8,4,1,382652,29.125,,Q
|
790 |
+
789,1,3,"Dean, Master. Bertram Vere",male,1,1,2,C.A. 2315,20.575,,S
|
791 |
+
790,0,1,"Guggenheim, Mr. Benjamin",male,46,0,0,PC 17593,79.2,B82 B84,C
|
792 |
+
791,0,3,"Keane, Mr. Andrew ""Andy""",male,,0,0,12460,7.75,,Q
|
793 |
+
792,0,2,"Gaskell, Mr. Alfred",male,16,0,0,239865,26,,S
|
794 |
+
793,0,3,"Sage, Miss. Stella Anna",female,,8,2,CA. 2343,69.55,,S
|
795 |
+
794,0,1,"Hoyt, Mr. William Fisher",male,,0,0,PC 17600,30.6958,,C
|
796 |
+
795,0,3,"Dantcheff, Mr. Ristiu",male,25,0,0,349203,7.8958,,S
|
797 |
+
796,0,2,"Otter, Mr. Richard",male,39,0,0,28213,13,,S
|
798 |
+
797,1,1,"Leader, Dr. Alice (Farnham)",female,49,0,0,17465,25.9292,D17,S
|
799 |
+
798,1,3,"Osman, Mrs. Mara",female,31,0,0,349244,8.6833,,S
|
800 |
+
799,0,3,"Ibrahim Shawah, Mr. Yousseff",male,30,0,0,2685,7.2292,,C
|
801 |
+
800,0,3,"Van Impe, Mrs. Jean Baptiste (Rosalie Paula Govaert)",female,30,1,1,345773,24.15,,S
|
802 |
+
801,0,2,"Ponesell, Mr. Martin",male,34,0,0,250647,13,,S
|
803 |
+
802,1,2,"Collyer, Mrs. Harvey (Charlotte Annie Tate)",female,31,1,1,C.A. 31921,26.25,,S
|
804 |
+
803,1,1,"Carter, Master. William Thornton II",male,11,1,2,113760,120,B96 B98,S
|
805 |
+
804,1,3,"Thomas, Master. Assad Alexander",male,0.42,0,1,2625,8.5167,,C
|
806 |
+
805,1,3,"Hedman, Mr. Oskar Arvid",male,27,0,0,347089,6.975,,S
|
807 |
+
806,0,3,"Johansson, Mr. Karl Johan",male,31,0,0,347063,7.775,,S
|
808 |
+
807,0,1,"Andrews, Mr. Thomas Jr",male,39,0,0,112050,0,A36,S
|
809 |
+
808,0,3,"Pettersson, Miss. Ellen Natalia",female,18,0,0,347087,7.775,,S
|
810 |
+
809,0,2,"Meyer, Mr. August",male,39,0,0,248723,13,,S
|
811 |
+
810,1,1,"Chambers, Mrs. Norman Campbell (Bertha Griggs)",female,33,1,0,113806,53.1,E8,S
|
812 |
+
811,0,3,"Alexander, Mr. William",male,26,0,0,3474,7.8875,,S
|
813 |
+
812,0,3,"Lester, Mr. James",male,39,0,0,A/4 48871,24.15,,S
|
814 |
+
813,0,2,"Slemen, Mr. Richard James",male,35,0,0,28206,10.5,,S
|
815 |
+
814,0,3,"Andersson, Miss. Ebba Iris Alfrida",female,6,4,2,347082,31.275,,S
|
816 |
+
815,0,3,"Tomlin, Mr. Ernest Portage",male,30.5,0,0,364499,8.05,,S
|
817 |
+
816,0,1,"Fry, Mr. Richard",male,,0,0,112058,0,B102,S
|
818 |
+
817,0,3,"Heininen, Miss. Wendla Maria",female,23,0,0,STON/O2. 3101290,7.925,,S
|
819 |
+
818,0,2,"Mallet, Mr. Albert",male,31,1,1,S.C./PARIS 2079,37.0042,,C
|
820 |
+
819,0,3,"Holm, Mr. John Fredrik Alexander",male,43,0,0,C 7075,6.45,,S
|
821 |
+
820,0,3,"Skoog, Master. Karl Thorsten",male,10,3,2,347088,27.9,,S
|
822 |
+
821,1,1,"Hays, Mrs. Charles Melville (Clara Jennings Gregg)",female,52,1,1,12749,93.5,B69,S
|
823 |
+
822,1,3,"Lulic, Mr. Nikola",male,27,0,0,315098,8.6625,,S
|
824 |
+
823,0,1,"Reuchlin, Jonkheer. John George",male,38,0,0,19972,0,,S
|
825 |
+
824,1,3,"Moor, Mrs. (Beila)",female,27,0,1,392096,12.475,E121,S
|
826 |
+
825,0,3,"Panula, Master. Urho Abraham",male,2,4,1,3101295,39.6875,,S
|
827 |
+
826,0,3,"Flynn, Mr. John",male,,0,0,368323,6.95,,Q
|
828 |
+
827,0,3,"Lam, Mr. Len",male,,0,0,1601,56.4958,,S
|
829 |
+
828,1,2,"Mallet, Master. Andre",male,1,0,2,S.C./PARIS 2079,37.0042,,C
|
830 |
+
829,1,3,"McCormack, Mr. Thomas Joseph",male,,0,0,367228,7.75,,Q
|
831 |
+
830,1,1,"Stone, Mrs. George Nelson (Martha Evelyn)",female,62,0,0,113572,80,B28,
|
832 |
+
831,1,3,"Yasbeck, Mrs. Antoni (Selini Alexander)",female,15,1,0,2659,14.4542,,C
|
833 |
+
832,1,2,"Richards, Master. George Sibley",male,0.83,1,1,29106,18.75,,S
|
834 |
+
833,0,3,"Saad, Mr. Amin",male,,0,0,2671,7.2292,,C
|
835 |
+
834,0,3,"Augustsson, Mr. Albert",male,23,0,0,347468,7.8542,,S
|
836 |
+
835,0,3,"Allum, Mr. Owen George",male,18,0,0,2223,8.3,,S
|
837 |
+
836,1,1,"Compton, Miss. Sara Rebecca",female,39,1,1,PC 17756,83.1583,E49,C
|
838 |
+
837,0,3,"Pasic, Mr. Jakob",male,21,0,0,315097,8.6625,,S
|
839 |
+
838,0,3,"Sirota, Mr. Maurice",male,,0,0,392092,8.05,,S
|
840 |
+
839,1,3,"Chip, Mr. Chang",male,32,0,0,1601,56.4958,,S
|
841 |
+
840,1,1,"Marechal, Mr. Pierre",male,,0,0,11774,29.7,C47,C
|
842 |
+
841,0,3,"Alhomaki, Mr. Ilmari Rudolf",male,20,0,0,SOTON/O2 3101287,7.925,,S
|
843 |
+
842,0,2,"Mudd, Mr. Thomas Charles",male,16,0,0,S.O./P.P. 3,10.5,,S
|
844 |
+
843,1,1,"Serepeca, Miss. Augusta",female,30,0,0,113798,31,,C
|
845 |
+
844,0,3,"Lemberopolous, Mr. Peter L",male,34.5,0,0,2683,6.4375,,C
|
846 |
+
845,0,3,"Culumovic, Mr. Jeso",male,17,0,0,315090,8.6625,,S
|
847 |
+
846,0,3,"Abbing, Mr. Anthony",male,42,0,0,C.A. 5547,7.55,,S
|
848 |
+
847,0,3,"Sage, Mr. Douglas Bullen",male,,8,2,CA. 2343,69.55,,S
|
849 |
+
848,0,3,"Markoff, Mr. Marin",male,35,0,0,349213,7.8958,,C
|
850 |
+
849,0,2,"Harper, Rev. John",male,28,0,1,248727,33,,S
|
851 |
+
850,1,1,"Goldenberg, Mrs. Samuel L (Edwiga Grabowska)",female,,1,0,17453,89.1042,C92,C
|
852 |
+
851,0,3,"Andersson, Master. Sigvard Harald Elias",male,4,4,2,347082,31.275,,S
|
853 |
+
852,0,3,"Svensson, Mr. Johan",male,74,0,0,347060,7.775,,S
|
854 |
+
853,0,3,"Boulos, Miss. Nourelain",female,9,1,1,2678,15.2458,,C
|
855 |
+
854,1,1,"Lines, Miss. Mary Conover",female,16,0,1,PC 17592,39.4,D28,S
|
856 |
+
855,0,2,"Carter, Mrs. Ernest Courtenay (Lilian Hughes)",female,44,1,0,244252,26,,S
|
857 |
+
856,1,3,"Aks, Mrs. Sam (Leah Rosen)",female,18,0,1,392091,9.35,,S
|
858 |
+
857,1,1,"Wick, Mrs. George Dennick (Mary Hitchcock)",female,45,1,1,36928,164.8667,,S
|
859 |
+
858,1,1,"Daly, Mr. Peter Denis ",male,51,0,0,113055,26.55,E17,S
|
860 |
+
859,1,3,"Baclini, Mrs. Solomon (Latifa Qurban)",female,24,0,3,2666,19.2583,,C
|
861 |
+
860,0,3,"Razi, Mr. Raihed",male,,0,0,2629,7.2292,,C
|
862 |
+
861,0,3,"Hansen, Mr. Claus Peter",male,41,2,0,350026,14.1083,,S
|
863 |
+
862,0,2,"Giles, Mr. Frederick Edward",male,21,1,0,28134,11.5,,S
|
864 |
+
863,1,1,"Swift, Mrs. Frederick Joel (Margaret Welles Barron)",female,48,0,0,17466,25.9292,D17,S
|
865 |
+
864,0,3,"Sage, Miss. Dorothy Edith ""Dolly""",female,,8,2,CA. 2343,69.55,,S
|
866 |
+
865,0,2,"Gill, Mr. John William",male,24,0,0,233866,13,,S
|
867 |
+
866,1,2,"Bystrom, Mrs. (Karolina)",female,42,0,0,236852,13,,S
|
868 |
+
867,1,2,"Duran y More, Miss. Asuncion",female,27,1,0,SC/PARIS 2149,13.8583,,C
|
869 |
+
868,0,1,"Roebling, Mr. Washington Augustus II",male,31,0,0,PC 17590,50.4958,A24,S
|
870 |
+
869,0,3,"van Melkebeke, Mr. Philemon",male,,0,0,345777,9.5,,S
|
871 |
+
870,1,3,"Johnson, Master. Harold Theodor",male,4,1,1,347742,11.1333,,S
|
872 |
+
871,0,3,"Balkic, Mr. Cerin",male,26,0,0,349248,7.8958,,S
|
873 |
+
872,1,1,"Beckwith, Mrs. Richard Leonard (Sallie Monypeny)",female,47,1,1,11751,52.5542,D35,S
|
874 |
+
873,0,1,"Carlsson, Mr. Frans Olof",male,33,0,0,695,5,B51 B53 B55,S
|
875 |
+
874,0,3,"Vander Cruyssen, Mr. Victor",male,47,0,0,345765,9,,S
|
876 |
+
875,1,2,"Abelson, Mrs. Samuel (Hannah Wizosky)",female,28,1,0,P/PP 3381,24,,C
|
877 |
+
876,1,3,"Najib, Miss. Adele Kiamie ""Jane""",female,15,0,0,2667,7.225,,C
|
878 |
+
877,0,3,"Gustafsson, Mr. Alfred Ossian",male,20,0,0,7534,9.8458,,S
|
879 |
+
878,0,3,"Petroff, Mr. Nedelio",male,19,0,0,349212,7.8958,,S
|
880 |
+
879,0,3,"Laleff, Mr. Kristo",male,,0,0,349217,7.8958,,S
|
881 |
+
880,1,1,"Potter, Mrs. Thomas Jr (Lily Alexenia Wilson)",female,56,0,1,11767,83.1583,C50,C
|
882 |
+
881,1,2,"Shelley, Mrs. William (Imanita Parrish Hall)",female,25,0,1,230433,26,,S
|
883 |
+
882,0,3,"Markun, Mr. Johann",male,33,0,0,349257,7.8958,,S
|
884 |
+
883,0,3,"Dahlberg, Miss. Gerda Ulrika",female,22,0,0,7552,10.5167,,S
|
885 |
+
884,0,2,"Banfield, Mr. Frederick James",male,28,0,0,C.A./SOTON 34068,10.5,,S
|
886 |
+
885,0,3,"Sutehall, Mr. Henry Jr",male,25,0,0,SOTON/OQ 392076,7.05,,S
|
887 |
+
886,0,3,"Rice, Mrs. William (Margaret Norton)",female,39,0,5,382652,29.125,,Q
|
888 |
+
887,0,2,"Montvila, Rev. Juozas",male,27,0,0,211536,13,,S
|
889 |
+
888,1,1,"Graham, Miss. Margaret Edith",female,19,0,0,112053,30,B42,S
|
890 |
+
889,0,3,"Johnston, Miss. Catherine Helen ""Carrie""",female,,1,2,W./C. 6607,23.45,,S
|
891 |
+
890,1,1,"Behr, Mr. Karl Howell",male,26,0,0,111369,30,C148,C
|
892 |
+
891,0,3,"Dooley, Mr. Patrick",male,32,0,0,370376,7.75,,Q
|
Data Analitics/Week 4/.ipynb_checkpoints/TU257-Lab3-1-DataExploration-checkpoint.ipynb
ADDED
The diff for this file is too large to render.
See raw diff
|
|
Data Analitics/Week 4/.ipynb_checkpoints/TU257-Lab3-2-Data-Transformations-checkpoint.ipynb
ADDED
The diff for this file is too large to render.
See raw diff
|
|
Data Analitics/Week 4/.ipynb_checkpoints/TU257-Lab3-3-Scaling-Data-checkpoint.ipynb
ADDED
@@ -0,0 +1,1996 @@
|
1 |
+
{
|
2 |
+
"cells": [
|
3 |
+
{
|
4 |
+
"cell_type": "markdown",
|
5 |
+
"metadata": {},
|
6 |
+
"source": [
|
7 |
+
"### In this demo and lab exercise we will look at scaling numerical data\n",
|
8 |
+
"### The examples given below illustrate using a min-max scaler\n",
|
9 |
+
"\n",
|
10 |
+
"\n",
|
11 |
+
"#### Let's begin by importing the data set\n",
|
12 |
+
"#### Download this from the webpage"
|
13 |
+
]
|
14 |
+
},
|
15 |
+
{
|
16 |
+
"cell_type": "code",
|
17 |
+
"execution_count": 1,
|
18 |
+
"metadata": {},
|
19 |
+
"outputs": [],
|
20 |
+
"source": [
|
21 |
+
"import pandas as pd\n",
|
22 |
+
"import numpy as np \n",
|
23 |
+
"import seaborn as sns\n",
|
24 |
+
"import matplotlib.pyplot as plt\n",
|
25 |
+
"\n",
|
26 |
+
"from sklearn.preprocessing import MinMaxScaler\n"
|
27 |
+
]
|
28 |
+
},
|
29 |
+
{
|
30 |
+
"cell_type": "code",
|
31 |
+
"execution_count": 2,
|
32 |
+
"metadata": {},
|
33 |
+
"outputs": [],
|
34 |
+
"source": [
|
35 |
+
"#read in the data set\n",
|
36 |
+
"#NB: edit the path below so that it points to the directory you are using\n",
|
37 |
+
"data = pd.read_csv('/Users/brendan.tierney/Dropbox/4-Datasets/small_purchases.csv')"
|
38 |
+
]
|
39 |
+
},
|
40 |
+
{
|
41 |
+
"cell_type": "code",
|
42 |
+
"execution_count": 3,
|
43 |
+
"metadata": {},
|
44 |
+
"outputs": [
|
45 |
+
{
|
46 |
+
"name": "stdout",
|
47 |
+
"output_type": "stream",
|
48 |
+
"text": [
|
49 |
+
"<class 'pandas.core.frame.DataFrame'>\n",
|
50 |
+
"RangeIndex: 10 entries, 0 to 9\n",
|
51 |
+
"Data columns (total 4 columns):\n",
|
52 |
+
" # Column Non-Null Count Dtype \n",
|
53 |
+
"--- ------ -------------- ----- \n",
|
54 |
+
" 0 Country 10 non-null object \n",
|
55 |
+
" 1 Age 9 non-null float64\n",
|
56 |
+
" 2 Salary 9 non-null float64\n",
|
57 |
+
" 3 Purchased 10 non-null object \n",
|
58 |
+
"dtypes: float64(2), object(2)\n",
|
59 |
+
"memory usage: 448.0+ bytes\n"
|
60 |
+
]
|
61 |
+
}
|
62 |
+
],
|
63 |
+
"source": [
|
64 |
+
"#Display basic information about the Pandas dataframe\n",
|
65 |
+
"data.info()"
|
66 |
+
]
|
67 |
+
},
|
68 |
+
{
|
69 |
+
"cell_type": "code",
|
70 |
+
"execution_count": 4,
|
71 |
+
"metadata": {},
|
72 |
+
"outputs": [
|
73 |
+
{
|
74 |
+
"data": {
|
75 |
+
"text/plain": [
|
76 |
+
"(10, 4)"
|
77 |
+
]
|
78 |
+
},
|
79 |
+
"execution_count": 4,
|
80 |
+
"metadata": {},
|
81 |
+
"output_type": "execute_result"
|
82 |
+
}
|
83 |
+
],
|
84 |
+
"source": [
|
85 |
+
"#How many rows and columns does the dataframe have?\n",
|
86 |
+
"data.shape"
|
87 |
+
]
|
88 |
+
},
|
89 |
+
{
|
90 |
+
"cell_type": "code",
|
91 |
+
"execution_count": 5,
|
92 |
+
"metadata": {},
|
93 |
+
"outputs": [
|
94 |
+
{
|
95 |
+
"data": {
|
96 |
+
"text/html": [
|
97 |
+
"<div>\n",
|
98 |
+
"<style scoped>\n",
|
99 |
+
" .dataframe tbody tr th:only-of-type {\n",
|
100 |
+
" vertical-align: middle;\n",
|
101 |
+
" }\n",
|
102 |
+
"\n",
|
103 |
+
" .dataframe tbody tr th {\n",
|
104 |
+
" vertical-align: top;\n",
|
105 |
+
" }\n",
|
106 |
+
"\n",
|
107 |
+
" .dataframe thead th {\n",
|
108 |
+
" text-align: right;\n",
|
109 |
+
" }\n",
|
110 |
+
"</style>\n",
|
111 |
+
"<table border=\"1\" class=\"dataframe\">\n",
|
112 |
+
" <thead>\n",
|
113 |
+
" <tr style=\"text-align: right;\">\n",
|
114 |
+
" <th></th>\n",
|
115 |
+
" <th>Country</th>\n",
|
116 |
+
" <th>Age</th>\n",
|
117 |
+
" <th>Salary</th>\n",
|
118 |
+
" <th>Purchased</th>\n",
|
119 |
+
" </tr>\n",
|
120 |
+
" </thead>\n",
|
121 |
+
" <tbody>\n",
|
122 |
+
" <tr>\n",
|
123 |
+
" <th>0</th>\n",
|
124 |
+
" <td>France</td>\n",
|
125 |
+
" <td>44.0</td>\n",
|
126 |
+
" <td>27000.0</td>\n",
|
127 |
+
" <td>No</td>\n",
|
128 |
+
" </tr>\n",
|
129 |
+
" <tr>\n",
|
130 |
+
" <th>1</th>\n",
|
131 |
+
" <td>Spain</td>\n",
|
132 |
+
" <td>27.0</td>\n",
|
133 |
+
" <td>48000.0</td>\n",
|
134 |
+
" <td>Yes</td>\n",
|
135 |
+
" </tr>\n",
|
136 |
+
" <tr>\n",
|
137 |
+
" <th>2</th>\n",
|
138 |
+
" <td>Germany</td>\n",
|
139 |
+
" <td>30.0</td>\n",
|
140 |
+
" <td>54000.0</td>\n",
|
141 |
+
" <td>No</td>\n",
|
142 |
+
" </tr>\n",
|
143 |
+
" <tr>\n",
|
144 |
+
" <th>3</th>\n",
|
145 |
+
" <td>Spain</td>\n",
|
146 |
+
" <td>38.0</td>\n",
|
147 |
+
" <td>61000.0</td>\n",
|
148 |
+
" <td>No</td>\n",
|
149 |
+
" </tr>\n",
|
150 |
+
" <tr>\n",
|
151 |
+
" <th>4</th>\n",
|
152 |
+
" <td>Germany</td>\n",
|
153 |
+
" <td>40.0</td>\n",
|
154 |
+
" <td>NaN</td>\n",
|
155 |
+
" <td>Yes</td>\n",
|
156 |
+
" </tr>\n",
|
157 |
+
" <tr>\n",
|
158 |
+
" <th>5</th>\n",
|
159 |
+
" <td>France</td>\n",
|
160 |
+
" <td>35.0</td>\n",
|
161 |
+
" <td>58000.0</td>\n",
|
162 |
+
" <td>Yes</td>\n",
|
163 |
+
" </tr>\n",
|
164 |
+
" <tr>\n",
|
165 |
+
" <th>6</th>\n",
|
166 |
+
" <td>Spain</td>\n",
|
167 |
+
" <td>NaN</td>\n",
|
168 |
+
" <td>52000.0</td>\n",
|
169 |
+
" <td>No</td>\n",
|
170 |
+
" </tr>\n",
|
171 |
+
" <tr>\n",
|
172 |
+
" <th>7</th>\n",
|
173 |
+
" <td>France</td>\n",
|
174 |
+
" <td>48.0</td>\n",
|
175 |
+
" <td>79000.0</td>\n",
|
176 |
+
" <td>Yes</td>\n",
|
177 |
+
" </tr>\n",
|
178 |
+
" <tr>\n",
|
179 |
+
" <th>8</th>\n",
|
180 |
+
" <td>Germany</td>\n",
|
181 |
+
" <td>50.0</td>\n",
|
182 |
+
" <td>83000.0</td>\n",
|
183 |
+
" <td>No</td>\n",
|
184 |
+
" </tr>\n",
|
185 |
+
" <tr>\n",
|
186 |
+
" <th>9</th>\n",
|
187 |
+
" <td>France</td>\n",
|
188 |
+
" <td>37.0</td>\n",
|
189 |
+
" <td>67000.0</td>\n",
|
190 |
+
" <td>Yes</td>\n",
|
191 |
+
" </tr>\n",
|
192 |
+
" </tbody>\n",
|
193 |
+
"</table>\n",
|
194 |
+
"</div>"
|
195 |
+
],
|
196 |
+
"text/plain": [
|
197 |
+
" Country Age Salary Purchased\n",
|
198 |
+
"0 France 44.0 27000.0 No\n",
|
199 |
+
"1 Spain 27.0 48000.0 Yes\n",
|
200 |
+
"2 Germany 30.0 54000.0 No\n",
|
201 |
+
"3 Spain 38.0 61000.0 No\n",
|
202 |
+
"4 Germany 40.0 NaN Yes\n",
|
203 |
+
"5 France 35.0 58000.0 Yes\n",
|
204 |
+
"6 Spain NaN 52000.0 No\n",
|
205 |
+
"7 France 48.0 79000.0 Yes\n",
|
206 |
+
"8 Germany 50.0 83000.0 No\n",
|
207 |
+
"9 France 37.0 67000.0 Yes"
|
208 |
+
]
|
209 |
+
},
|
210 |
+
"execution_count": 5,
|
211 |
+
"metadata": {},
|
212 |
+
"output_type": "execute_result"
|
213 |
+
}
|
214 |
+
],
|
215 |
+
"source": [
|
216 |
+
"#Display the first 10 rows\n",
|
217 |
+
"#Question: How many rows does the dataframe contain\n",
|
218 |
+
"#Question: Modify the code to display all the data\n",
|
219 |
+
"\n",
|
220 |
+
"data.head(10)"
|
221 |
+
]
|
222 |
+
},
|
223 |
+
{
|
224 |
+
"cell_type": "code",
|
225 |
+
"execution_count": 6,
|
226 |
+
"metadata": {},
|
227 |
+
"outputs": [
|
228 |
+
{
|
229 |
+
"data": {
|
230 |
+
"text/html": [
|
231 |
+
"<div>\n",
|
232 |
+
"<style scoped>\n",
|
233 |
+
" .dataframe tbody tr th:only-of-type {\n",
|
234 |
+
" vertical-align: middle;\n",
|
235 |
+
" }\n",
|
236 |
+
"\n",
|
237 |
+
" .dataframe tbody tr th {\n",
|
238 |
+
" vertical-align: top;\n",
|
239 |
+
" }\n",
|
240 |
+
"\n",
|
241 |
+
" .dataframe thead th {\n",
|
242 |
+
" text-align: right;\n",
|
243 |
+
" }\n",
|
244 |
+
"</style>\n",
|
245 |
+
"<table border=\"1\" class=\"dataframe\">\n",
|
246 |
+
" <thead>\n",
|
247 |
+
" <tr style=\"text-align: right;\">\n",
|
248 |
+
" <th></th>\n",
|
249 |
+
" <th>Age</th>\n",
|
250 |
+
" <th>Salary</th>\n",
|
251 |
+
" </tr>\n",
|
252 |
+
" </thead>\n",
|
253 |
+
" <tbody>\n",
|
254 |
+
" <tr>\n",
|
255 |
+
" <th>count</th>\n",
|
256 |
+
" <td>9.000000</td>\n",
|
257 |
+
" <td>9.000000</td>\n",
|
258 |
+
" </tr>\n",
|
259 |
+
" <tr>\n",
|
260 |
+
" <th>mean</th>\n",
|
261 |
+
" <td>38.777778</td>\n",
|
262 |
+
" <td>58777.777778</td>\n",
|
263 |
+
" </tr>\n",
|
264 |
+
" <tr>\n",
|
265 |
+
" <th>std</th>\n",
|
266 |
+
" <td>7.693793</td>\n",
|
267 |
+
" <td>16820.952543</td>\n",
|
268 |
+
" </tr>\n",
|
269 |
+
" <tr>\n",
|
270 |
+
" <th>min</th>\n",
|
271 |
+
" <td>27.000000</td>\n",
|
272 |
+
" <td>27000.000000</td>\n",
|
273 |
+
" </tr>\n",
|
274 |
+
" <tr>\n",
|
275 |
+
" <th>25%</th>\n",
|
276 |
+
" <td>35.000000</td>\n",
|
277 |
+
" <td>52000.000000</td>\n",
|
278 |
+
" </tr>\n",
|
279 |
+
" <tr>\n",
|
280 |
+
" <th>50%</th>\n",
|
281 |
+
" <td>38.000000</td>\n",
|
282 |
+
" <td>58000.000000</td>\n",
|
283 |
+
" </tr>\n",
|
284 |
+
" <tr>\n",
|
285 |
+
" <th>75%</th>\n",
|
286 |
+
" <td>44.000000</td>\n",
|
287 |
+
" <td>67000.000000</td>\n",
|
288 |
+
" </tr>\n",
|
289 |
+
" <tr>\n",
|
290 |
+
" <th>max</th>\n",
|
291 |
+
" <td>50.000000</td>\n",
|
292 |
+
" <td>83000.000000</td>\n",
|
293 |
+
" </tr>\n",
|
294 |
+
" </tbody>\n",
|
295 |
+
"</table>\n",
|
296 |
+
"</div>"
|
297 |
+
],
|
298 |
+
"text/plain": [
|
299 |
+
" Age Salary\n",
|
300 |
+
"count 9.000000 9.000000\n",
|
301 |
+
"mean 38.777778 58777.777778\n",
|
302 |
+
"std 7.693793 16820.952543\n",
|
303 |
+
"min 27.000000 27000.000000\n",
|
304 |
+
"25% 35.000000 52000.000000\n",
|
305 |
+
"50% 38.000000 58000.000000\n",
|
306 |
+
"75% 44.000000 67000.000000\n",
|
307 |
+
"max 50.000000 83000.000000"
|
308 |
+
]
|
309 |
+
},
|
310 |
+
"execution_count": 6,
|
311 |
+
"metadata": {},
|
312 |
+
"output_type": "execute_result"
|
313 |
+
}
|
314 |
+
],
|
315 |
+
"source": [
|
316 |
+
"#Create summary statistics about the data in the dataframe\n",
|
317 |
+
"#This only provides summary statistics for Numerical data\n",
|
318 |
+
"data.describe()"
|
319 |
+
]
|
320 |
+
},
|
321 |
+
{
|
322 |
+
"cell_type": "code",
|
323 |
+
"execution_count": 7,
|
324 |
+
"metadata": {},
|
325 |
+
"outputs": [
|
326 |
+
{
|
327 |
+
"data": {
|
328 |
+
"text/html": [
|
329 |
+
"<div>\n",
|
330 |
+
"<style scoped>\n",
|
331 |
+
" .dataframe tbody tr th:only-of-type {\n",
|
332 |
+
" vertical-align: middle;\n",
|
333 |
+
" }\n",
|
334 |
+
"\n",
|
335 |
+
" .dataframe tbody tr th {\n",
|
336 |
+
" vertical-align: top;\n",
|
337 |
+
" }\n",
|
338 |
+
"\n",
|
339 |
+
" .dataframe thead th {\n",
|
340 |
+
" text-align: right;\n",
|
341 |
+
" }\n",
|
342 |
+
"</style>\n",
|
343 |
+
"<table border=\"1\" class=\"dataframe\">\n",
|
344 |
+
" <thead>\n",
|
345 |
+
" <tr style=\"text-align: right;\">\n",
|
346 |
+
" <th></th>\n",
|
347 |
+
" <th>count</th>\n",
|
348 |
+
" <th>mean</th>\n",
|
349 |
+
" <th>std</th>\n",
|
350 |
+
" <th>min</th>\n",
|
351 |
+
" <th>25%</th>\n",
|
352 |
+
" <th>50%</th>\n",
|
353 |
+
" <th>75%</th>\n",
|
354 |
+
" <th>max</th>\n",
|
355 |
+
" </tr>\n",
|
356 |
+
" </thead>\n",
|
357 |
+
" <tbody>\n",
|
358 |
+
" <tr>\n",
|
359 |
+
" <th>Age</th>\n",
|
360 |
+
" <td>9.0</td>\n",
|
361 |
+
" <td>38.777778</td>\n",
|
362 |
+
" <td>7.693793</td>\n",
|
363 |
+
" <td>27.0</td>\n",
|
364 |
+
" <td>35.0</td>\n",
|
365 |
+
" <td>38.0</td>\n",
|
366 |
+
" <td>44.0</td>\n",
|
367 |
+
" <td>50.0</td>\n",
|
368 |
+
" </tr>\n",
|
369 |
+
" <tr>\n",
|
370 |
+
" <th>Salary</th>\n",
|
371 |
+
" <td>9.0</td>\n",
|
372 |
+
" <td>58777.777778</td>\n",
|
373 |
+
" <td>16820.952543</td>\n",
|
374 |
+
" <td>27000.0</td>\n",
|
375 |
+
" <td>52000.0</td>\n",
|
376 |
+
" <td>58000.0</td>\n",
|
377 |
+
" <td>67000.0</td>\n",
|
378 |
+
" <td>83000.0</td>\n",
|
379 |
+
" </tr>\n",
|
380 |
+
" </tbody>\n",
|
381 |
+
"</table>\n",
|
382 |
+
"</div>"
|
383 |
+
],
|
384 |
+
"text/plain": [
|
385 |
+
" count mean std min 25% 50% 75% \\\n",
|
386 |
+
"Age 9.0 38.777778 7.693793 27.0 35.0 38.0 44.0 \n",
|
387 |
+
"Salary 9.0 58777.777778 16820.952543 27000.0 52000.0 58000.0 67000.0 \n",
|
388 |
+
"\n",
|
389 |
+
" max \n",
|
390 |
+
"Age 50.0 \n",
|
391 |
+
"Salary 83000.0 "
|
392 |
+
]
|
393 |
+
},
|
394 |
+
"execution_count": 7,
|
395 |
+
"metadata": {},
|
396 |
+
"output_type": "execute_result"
|
397 |
+
}
|
398 |
+
],
|
399 |
+
"source": [
|
400 |
+
"#An alternative way of viewing the summary statistics\n",
|
401 |
+
"data.describe().transpose()"
|
402 |
+
]
|
403 |
+
},
|
404 |
+
{
|
405 |
+
"cell_type": "code",
|
406 |
+
"execution_count": 8,
|
407 |
+
"metadata": {},
|
408 |
+
"outputs": [
|
409 |
+
{
|
410 |
+
"data": {
|
411 |
+
"text/html": [
|
412 |
+
"<div>\n",
|
413 |
+
"<style scoped>\n",
|
414 |
+
" .dataframe tbody tr th:only-of-type {\n",
|
415 |
+
" vertical-align: middle;\n",
|
416 |
+
" }\n",
|
417 |
+
"\n",
|
418 |
+
" .dataframe tbody tr th {\n",
|
419 |
+
" vertical-align: top;\n",
|
420 |
+
" }\n",
|
421 |
+
"\n",
|
422 |
+
" .dataframe thead th {\n",
|
423 |
+
" text-align: right;\n",
|
424 |
+
" }\n",
|
425 |
+
"</style>\n",
|
426 |
+
"<table border=\"1\" class=\"dataframe\">\n",
|
427 |
+
" <thead>\n",
|
428 |
+
" <tr style=\"text-align: right;\">\n",
|
429 |
+
" <th></th>\n",
|
430 |
+
" <th>Country</th>\n",
|
431 |
+
" <th>Age</th>\n",
|
432 |
+
" <th>Salary</th>\n",
|
433 |
+
" <th>Purchased</th>\n",
|
434 |
+
" </tr>\n",
|
435 |
+
" </thead>\n",
|
436 |
+
" <tbody>\n",
|
437 |
+
" <tr>\n",
|
438 |
+
" <th>0</th>\n",
|
439 |
+
" <td>France</td>\n",
|
440 |
+
" <td>44.0</td>\n",
|
441 |
+
" <td>27000.0</td>\n",
|
442 |
+
" <td>No</td>\n",
|
443 |
+
" </tr>\n",
|
444 |
+
" <tr>\n",
|
445 |
+
" <th>1</th>\n",
|
446 |
+
" <td>Spain</td>\n",
|
447 |
+
" <td>27.0</td>\n",
|
448 |
+
" <td>48000.0</td>\n",
|
449 |
+
" <td>Yes</td>\n",
|
450 |
+
" </tr>\n",
|
451 |
+
" <tr>\n",
|
452 |
+
" <th>2</th>\n",
|
453 |
+
" <td>Germany</td>\n",
|
454 |
+
" <td>30.0</td>\n",
|
455 |
+
" <td>54000.0</td>\n",
|
456 |
+
" <td>No</td>\n",
|
457 |
+
" </tr>\n",
|
458 |
+
" <tr>\n",
|
459 |
+
" <th>3</th>\n",
|
460 |
+
" <td>Spain</td>\n",
|
461 |
+
" <td>38.0</td>\n",
|
462 |
+
" <td>61000.0</td>\n",
|
463 |
+
" <td>No</td>\n",
|
464 |
+
" </tr>\n",
|
465 |
+
" <tr>\n",
|
466 |
+
" <th>4</th>\n",
|
467 |
+
" <td>Germany</td>\n",
|
468 |
+
" <td>40.0</td>\n",
|
469 |
+
" <td>NaN</td>\n",
|
470 |
+
" <td>Yes</td>\n",
|
471 |
+
" </tr>\n",
|
472 |
+
" <tr>\n",
|
473 |
+
" <th>5</th>\n",
|
474 |
+
" <td>France</td>\n",
|
475 |
+
" <td>35.0</td>\n",
|
476 |
+
" <td>58000.0</td>\n",
|
477 |
+
" <td>Yes</td>\n",
|
478 |
+
" </tr>\n",
|
479 |
+
" <tr>\n",
|
480 |
+
" <th>6</th>\n",
|
481 |
+
" <td>Spain</td>\n",
|
482 |
+
" <td>NaN</td>\n",
|
483 |
+
" <td>52000.0</td>\n",
|
484 |
+
" <td>No</td>\n",
|
485 |
+
" </tr>\n",
|
486 |
+
" <tr>\n",
|
487 |
+
" <th>7</th>\n",
|
488 |
+
" <td>France</td>\n",
|
489 |
+
" <td>48.0</td>\n",
|
490 |
+
" <td>79000.0</td>\n",
|
491 |
+
" <td>Yes</td>\n",
|
492 |
+
" </tr>\n",
|
493 |
+
" <tr>\n",
|
494 |
+
" <th>8</th>\n",
|
495 |
+
" <td>Germany</td>\n",
|
496 |
+
" <td>50.0</td>\n",
|
497 |
+
" <td>83000.0</td>\n",
|
498 |
+
" <td>No</td>\n",
|
499 |
+
" </tr>\n",
|
500 |
+
" <tr>\n",
|
501 |
+
" <th>9</th>\n",
|
502 |
+
" <td>France</td>\n",
|
503 |
+
" <td>37.0</td>\n",
|
504 |
+
" <td>67000.0</td>\n",
|
505 |
+
" <td>Yes</td>\n",
|
506 |
+
" </tr>\n",
|
507 |
+
" </tbody>\n",
|
508 |
+
"</table>\n",
|
509 |
+
"</div>"
|
510 |
+
],
|
511 |
+
"text/plain": [
|
512 |
+
" Country Age Salary Purchased\n",
|
513 |
+
"0 France 44.0 27000.0 No\n",
|
514 |
+
"1 Spain 27.0 48000.0 Yes\n",
|
515 |
+
"2 Germany 30.0 54000.0 No\n",
|
516 |
+
"3 Spain 38.0 61000.0 No\n",
|
517 |
+
"4 Germany 40.0 NaN Yes\n",
|
518 |
+
"5 France 35.0 58000.0 Yes\n",
|
519 |
+
"6 Spain NaN 52000.0 No\n",
|
520 |
+
"7 France 48.0 79000.0 Yes\n",
|
521 |
+
"8 Germany 50.0 83000.0 No\n",
|
522 |
+
"9 France 37.0 67000.0 Yes"
|
523 |
+
]
|
524 |
+
},
|
525 |
+
"execution_count": 8,
|
526 |
+
"metadata": {},
|
527 |
+
"output_type": "execute_result"
|
528 |
+
}
|
529 |
+
],
|
530 |
+
"source": [
|
531 |
+
"#Display the data from the dataframe\n",
|
532 |
+
"#NB: notice that some values are missing (NaN) -> See question later in the notebook\n",
|
533 |
+
"data"
|
534 |
+
]
|
535 |
+
},
|
536 |
+
{
|
537 |
+
"cell_type": "code",
|
538 |
+
"execution_count": 9,
|
539 |
+
"metadata": {},
|
540 |
+
"outputs": [
|
541 |
+
{
|
542 |
+
"data": {
|
543 |
+
"text/html": [
|
544 |
+
"<div>\n",
|
545 |
+
"<style scoped>\n",
|
546 |
+
" .dataframe tbody tr th:only-of-type {\n",
|
547 |
+
" vertical-align: middle;\n",
|
548 |
+
" }\n",
|
549 |
+
"\n",
|
550 |
+
" .dataframe tbody tr th {\n",
|
551 |
+
" vertical-align: top;\n",
|
552 |
+
" }\n",
|
553 |
+
"\n",
|
554 |
+
" .dataframe thead th {\n",
|
555 |
+
" text-align: right;\n",
|
556 |
+
" }\n",
|
557 |
+
"</style>\n",
|
558 |
+
"<table border=\"1\" class=\"dataframe\">\n",
|
559 |
+
" <thead>\n",
|
560 |
+
" <tr style=\"text-align: right;\">\n",
|
561 |
+
" <th></th>\n",
|
562 |
+
" <th>Country</th>\n",
|
563 |
+
" <th>Age</th>\n",
|
564 |
+
" <th>Salary</th>\n",
|
565 |
+
" <th>Purchased</th>\n",
|
566 |
+
" </tr>\n",
|
567 |
+
" </thead>\n",
|
568 |
+
" <tbody>\n",
|
569 |
+
" <tr>\n",
|
570 |
+
" <th>0</th>\n",
|
571 |
+
" <td>France</td>\n",
|
572 |
+
" <td>0.739130</td>\n",
|
573 |
+
" <td>0.000000</td>\n",
|
574 |
+
" <td>No</td>\n",
|
575 |
+
" </tr>\n",
|
576 |
+
" <tr>\n",
|
577 |
+
" <th>1</th>\n",
|
578 |
+
" <td>Spain</td>\n",
|
579 |
+
" <td>0.000000</td>\n",
|
580 |
+
" <td>0.375000</td>\n",
|
581 |
+
" <td>Yes</td>\n",
|
582 |
+
" </tr>\n",
|
583 |
+
" <tr>\n",
|
584 |
+
" <th>2</th>\n",
|
585 |
+
" <td>Germany</td>\n",
|
586 |
+
" <td>0.130435</td>\n",
|
587 |
+
" <td>0.482143</td>\n",
|
588 |
+
" <td>No</td>\n",
|
589 |
+
" </tr>\n",
|
590 |
+
" <tr>\n",
|
591 |
+
" <th>3</th>\n",
|
592 |
+
" <td>Spain</td>\n",
|
593 |
+
" <td>0.478261</td>\n",
|
594 |
+
" <td>0.607143</td>\n",
|
595 |
+
" <td>No</td>\n",
|
596 |
+
" </tr>\n",
|
597 |
+
" <tr>\n",
|
598 |
+
" <th>4</th>\n",
|
599 |
+
" <td>Germany</td>\n",
|
600 |
+
" <td>0.565217</td>\n",
|
601 |
+
" <td>NaN</td>\n",
|
602 |
+
" <td>Yes</td>\n",
|
603 |
+
" </tr>\n",
|
604 |
+
" <tr>\n",
|
605 |
+
" <th>5</th>\n",
|
606 |
+
" <td>France</td>\n",
|
607 |
+
" <td>0.347826</td>\n",
|
608 |
+
" <td>0.553571</td>\n",
|
609 |
+
" <td>Yes</td>\n",
|
610 |
+
" </tr>\n",
|
611 |
+
" <tr>\n",
|
612 |
+
" <th>6</th>\n",
|
613 |
+
" <td>Spain</td>\n",
|
614 |
+
" <td>NaN</td>\n",
|
615 |
+
" <td>0.446429</td>\n",
|
616 |
+
" <td>No</td>\n",
|
617 |
+
" </tr>\n",
|
618 |
+
" <tr>\n",
|
619 |
+
" <th>7</th>\n",
|
620 |
+
" <td>France</td>\n",
|
621 |
+
" <td>0.913043</td>\n",
|
622 |
+
" <td>0.928571</td>\n",
|
623 |
+
" <td>Yes</td>\n",
|
624 |
+
" </tr>\n",
|
625 |
+
" <tr>\n",
|
626 |
+
" <th>8</th>\n",
|
627 |
+
" <td>Germany</td>\n",
|
628 |
+
" <td>1.000000</td>\n",
|
629 |
+
" <td>1.000000</td>\n",
|
630 |
+
" <td>No</td>\n",
|
631 |
+
" </tr>\n",
|
632 |
+
" <tr>\n",
|
633 |
+
" <th>9</th>\n",
|
634 |
+
" <td>France</td>\n",
|
635 |
+
" <td>0.434783</td>\n",
|
636 |
+
" <td>0.714286</td>\n",
|
637 |
+
" <td>Yes</td>\n",
|
638 |
+
" </tr>\n",
|
639 |
+
" </tbody>\n",
|
640 |
+
"</table>\n",
|
641 |
+
"</div>"
|
642 |
+
],
|
643 |
+
"text/plain": [
|
644 |
+
" Country Age Salary Purchased\n",
|
645 |
+
"0 France 0.739130 0.000000 No\n",
|
646 |
+
"1 Spain 0.000000 0.375000 Yes\n",
|
647 |
+
"2 Germany 0.130435 0.482143 No\n",
|
648 |
+
"3 Spain 0.478261 0.607143 No\n",
|
649 |
+
"4 Germany 0.565217 NaN Yes\n",
|
650 |
+
"5 France 0.347826 0.553571 Yes\n",
|
651 |
+
"6 Spain NaN 0.446429 No\n",
|
652 |
+
"7 France 0.913043 0.928571 Yes\n",
|
653 |
+
"8 Germany 1.000000 1.000000 No\n",
|
654 |
+
"9 France 0.434783 0.714286 Yes"
|
655 |
+
]
|
656 |
+
},
|
657 |
+
"execution_count": 9,
|
658 |
+
"metadata": {},
|
659 |
+
"output_type": "execute_result"
|
660 |
+
}
|
661 |
+
],
|
662 |
+
"source": [
|
663 |
+
"#Setup the MinMaxScaler\n",
|
664 |
+
"#This will only work on Numerical attributes/features\n",
|
665 |
+
"scaler = MinMaxScaler()\n",
|
666 |
+
"\n",
|
667 |
+
"#Apply the scaler to the numerical data\n",
|
668 |
+
"# and save the data back to the dataframe\n",
|
669 |
+
"# overwriting the original values\n",
|
670 |
+
"data[['Age', 'Salary']] = scaler.fit_transform(data[['Age', 'Salary']])\n",
|
671 |
+
"\n",
|
672 |
+
"#Display the dataframe\n",
|
673 |
+
"data"
|
674 |
+
]
|
675 |
+
},
|
676 |
+
{
|
677 |
+
"cell_type": "markdown",
|
678 |
+
"metadata": {},
|
679 |
+
"source": [
|
680 |
+
"### Copy and modify the above code to repace the NaN"
|
681 |
+
]
|
682 |
+
},
|
683 |
+
{
|
684 |
+
"cell_type": "markdown",
|
685 |
+
"metadata": {},
|
686 |
+
"source": [
|
687 |
+
"#### Modify the data set to replace the empty data (NaN) with an appropriate values\n",
|
688 |
+
"#### Rerun the scaler with this updated/modified dataframe\n"
|
689 |
+
]
|
690 |
+
},
|
691 |
+
{
|
692 |
+
"cell_type": "code",
|
693 |
+
"execution_count": null,
|
694 |
+
"metadata": {},
|
695 |
+
"outputs": [],
|
696 |
+
"source": []
|
697 |
+
},
|
698 |
+
{
|
699 |
+
"cell_type": "code",
|
700 |
+
"execution_count": null,
|
701 |
+
"metadata": {},
|
702 |
+
"outputs": [],
|
703 |
+
"source": []
|
704 |
+
},
|
705 |
+
{
|
706 |
+
"cell_type": "code",
|
707 |
+
"execution_count": null,
|
708 |
+
"metadata": {},
|
709 |
+
"outputs": [],
|
710 |
+
"source": []
|
711 |
+
},
|
712 |
+
{
|
713 |
+
"cell_type": "code",
|
714 |
+
"execution_count": null,
|
715 |
+
"metadata": {},
|
716 |
+
"outputs": [],
|
717 |
+
"source": []
|
718 |
+
},
|
719 |
+
{
|
720 |
+
"cell_type": "code",
|
721 |
+
"execution_count": null,
|
722 |
+
"metadata": {},
|
723 |
+
"outputs": [],
|
724 |
+
"source": []
|
725 |
+
},
|
726 |
+
{
|
727 |
+
"cell_type": "code",
|
728 |
+
"execution_count": null,
|
729 |
+
"metadata": {},
|
730 |
+
"outputs": [],
|
731 |
+
"source": [
|
732 |
+
"\n"
|
733 |
+
]
|
734 |
+
},
|
735 |
+
{
|
736 |
+
"cell_type": "code",
|
737 |
+
"execution_count": null,
|
738 |
+
"metadata": {},
|
739 |
+
"outputs": [],
|
740 |
+
"source": []
|
741 |
+
},
|
742 |
+
{
|
743 |
+
"cell_type": "markdown",
|
744 |
+
"metadata": {},
|
745 |
+
"source": [
|
746 |
+
"### Another Example and Exercise - Complete all the steps\n",
|
747 |
+
"\n",
|
748 |
+
"#### Data set = Pima Indian diabetes dataset \n",
|
749 |
+
"#### https://www.kaggle.com/datasets/uciml/pima-indians-diabetes-database"
|
750 |
+
]
|
751 |
+
},
|
752 |
+
{
|
753 |
+
"cell_type": "code",
|
754 |
+
"execution_count": 10,
|
755 |
+
"metadata": {},
|
756 |
+
"outputs": [
|
757 |
+
{
|
758 |
+
"data": {
|
759 |
+
"text/html": [
|
760 |
+
"<div>\n",
|
761 |
+
"<style scoped>\n",
|
762 |
+
" .dataframe tbody tr th:only-of-type {\n",
|
763 |
+
" vertical-align: middle;\n",
|
764 |
+
" }\n",
|
765 |
+
"\n",
|
766 |
+
" .dataframe tbody tr th {\n",
|
767 |
+
" vertical-align: top;\n",
|
768 |
+
" }\n",
|
769 |
+
"\n",
|
770 |
+
" .dataframe thead th {\n",
|
771 |
+
" text-align: right;\n",
|
772 |
+
" }\n",
|
773 |
+
"</style>\n",
|
774 |
+
"<table border=\"1\" class=\"dataframe\">\n",
|
775 |
+
" <thead>\n",
|
776 |
+
" <tr style=\"text-align: right;\">\n",
|
777 |
+
" <th></th>\n",
|
778 |
+
" <th>preg</th>\n",
|
779 |
+
" <th>plas</th>\n",
|
780 |
+
" <th>pres</th>\n",
|
781 |
+
" <th>skin</th>\n",
|
782 |
+
" <th>test</th>\n",
|
783 |
+
" <th>mass</th>\n",
|
784 |
+
" <th>pedi</th>\n",
|
785 |
+
" <th>age</th>\n",
|
786 |
+
" <th>class</th>\n",
|
787 |
+
" </tr>\n",
|
788 |
+
" </thead>\n",
|
789 |
+
" <tbody>\n",
|
790 |
+
" <tr>\n",
|
791 |
+
" <th>0</th>\n",
|
792 |
+
" <td>6</td>\n",
|
793 |
+
" <td>148</td>\n",
|
794 |
+
" <td>72</td>\n",
|
795 |
+
" <td>35</td>\n",
|
796 |
+
" <td>0</td>\n",
|
797 |
+
" <td>33.6</td>\n",
|
798 |
+
" <td>0.627</td>\n",
|
799 |
+
" <td>50</td>\n",
|
800 |
+
" <td>1</td>\n",
|
801 |
+
" </tr>\n",
|
802 |
+
" <tr>\n",
|
803 |
+
" <th>1</th>\n",
|
804 |
+
" <td>1</td>\n",
|
805 |
+
" <td>85</td>\n",
|
806 |
+
" <td>66</td>\n",
|
807 |
+
" <td>29</td>\n",
|
808 |
+
" <td>0</td>\n",
|
809 |
+
" <td>26.6</td>\n",
|
810 |
+
" <td>0.351</td>\n",
|
811 |
+
" <td>31</td>\n",
|
812 |
+
" <td>0</td>\n",
|
813 |
+
" </tr>\n",
|
814 |
+
" <tr>\n",
|
815 |
+
" <th>2</th>\n",
|
816 |
+
" <td>8</td>\n",
|
817 |
+
" <td>183</td>\n",
|
818 |
+
" <td>64</td>\n",
|
819 |
+
" <td>0</td>\n",
|
820 |
+
" <td>0</td>\n",
|
821 |
+
" <td>23.3</td>\n",
|
822 |
+
" <td>0.672</td>\n",
|
823 |
+
" <td>32</td>\n",
|
824 |
+
" <td>1</td>\n",
|
825 |
+
" </tr>\n",
|
826 |
+
" <tr>\n",
|
827 |
+
" <th>3</th>\n",
|
828 |
+
" <td>1</td>\n",
|
829 |
+
" <td>89</td>\n",
|
830 |
+
" <td>66</td>\n",
|
831 |
+
" <td>23</td>\n",
|
832 |
+
" <td>94</td>\n",
|
833 |
+
" <td>28.1</td>\n",
|
834 |
+
" <td>0.167</td>\n",
|
835 |
+
" <td>21</td>\n",
|
836 |
+
" <td>0</td>\n",
|
837 |
+
" </tr>\n",
|
838 |
+
" <tr>\n",
|
839 |
+
" <th>4</th>\n",
|
840 |
+
" <td>0</td>\n",
|
841 |
+
" <td>137</td>\n",
|
842 |
+
" <td>40</td>\n",
|
843 |
+
" <td>35</td>\n",
|
844 |
+
" <td>168</td>\n",
|
845 |
+
" <td>43.1</td>\n",
|
846 |
+
" <td>2.288</td>\n",
|
847 |
+
" <td>33</td>\n",
|
848 |
+
" <td>1</td>\n",
|
849 |
+
" </tr>\n",
|
850 |
+
" </tbody>\n",
|
851 |
+
"</table>\n",
|
852 |
+
"</div>"
|
853 |
+
],
|
854 |
+
"text/plain": [
|
855 |
+
" preg plas pres skin test mass pedi age class\n",
|
856 |
+
"0 6 148 72 35 0 33.6 0.627 50 1\n",
|
857 |
+
"1 1 85 66 29 0 26.6 0.351 31 0\n",
|
858 |
+
"2 8 183 64 0 0 23.3 0.672 32 1\n",
|
859 |
+
"3 1 89 66 23 94 28.1 0.167 21 0\n",
|
860 |
+
"4 0 137 40 35 168 43.1 2.288 33 1"
|
861 |
+
]
|
862 |
+
},
|
863 |
+
"execution_count": 10,
|
864 |
+
"metadata": {},
|
865 |
+
"output_type": "execute_result"
|
866 |
+
}
|
867 |
+
],
|
868 |
+
"source": [
|
869 |
+
"#Import the data set\n",
|
870 |
+
"\n",
|
871 |
+
"import pandas as pd\n",
|
872 |
+
"columns = ['preg', 'plas', 'pres', 'skin', 'test', 'mass', 'pedi', 'age', 'class']\n",
|
873 |
+
"data = pd.read_csv('/Users/brendan.tierney/Dropbox/4-Datasets/pima-indians-diabetes.csv', names=columns)\n",
|
874 |
+
"data.head()"
|
875 |
+
]
|
876 |
+
},
|
877 |
+
{
|
878 |
+
"cell_type": "code",
|
879 |
+
"execution_count": 11,
|
880 |
+
"metadata": {},
|
881 |
+
"outputs": [
|
882 |
+
{
|
883 |
+
"data": {
|
884 |
+
"text/html": [
|
885 |
+
"<div>\n",
|
886 |
+
"<style scoped>\n",
|
887 |
+
" .dataframe tbody tr th:only-of-type {\n",
|
888 |
+
" vertical-align: middle;\n",
|
889 |
+
" }\n",
|
890 |
+
"\n",
|
891 |
+
" .dataframe tbody tr th {\n",
|
892 |
+
" vertical-align: top;\n",
|
893 |
+
" }\n",
|
894 |
+
"\n",
|
895 |
+
" .dataframe thead th {\n",
|
896 |
+
" text-align: right;\n",
|
897 |
+
" }\n",
|
898 |
+
"</style>\n",
|
899 |
+
"<table border=\"1\" class=\"dataframe\">\n",
|
900 |
+
" <thead>\n",
|
901 |
+
" <tr style=\"text-align: right;\">\n",
|
902 |
+
" <th></th>\n",
|
903 |
+
" <th>preg</th>\n",
|
904 |
+
" <th>plas</th>\n",
|
905 |
+
" <th>pres</th>\n",
|
906 |
+
" <th>skin</th>\n",
|
907 |
+
" <th>test</th>\n",
|
908 |
+
" <th>mass</th>\n",
|
909 |
+
" <th>pedi</th>\n",
|
910 |
+
" <th>age</th>\n",
|
911 |
+
" <th>class</th>\n",
|
912 |
+
" </tr>\n",
|
913 |
+
" </thead>\n",
|
914 |
+
" <tbody>\n",
|
915 |
+
" <tr>\n",
|
916 |
+
" <th>count</th>\n",
|
917 |
+
" <td>768.000000</td>\n",
|
918 |
+
" <td>768.000000</td>\n",
|
919 |
+
" <td>768.000000</td>\n",
|
920 |
+
" <td>768.000000</td>\n",
|
921 |
+
" <td>768.000000</td>\n",
|
922 |
+
" <td>768.000000</td>\n",
|
923 |
+
" <td>768.000000</td>\n",
|
924 |
+
" <td>768.000000</td>\n",
|
925 |
+
" <td>768.000000</td>\n",
|
926 |
+
" </tr>\n",
|
927 |
+
" <tr>\n",
|
928 |
+
" <th>mean</th>\n",
|
929 |
+
" <td>3.845052</td>\n",
|
930 |
+
" <td>120.894531</td>\n",
|
931 |
+
" <td>69.105469</td>\n",
|
932 |
+
" <td>20.536458</td>\n",
|
933 |
+
" <td>79.799479</td>\n",
|
934 |
+
" <td>31.992578</td>\n",
|
935 |
+
" <td>0.471876</td>\n",
|
936 |
+
" <td>33.240885</td>\n",
|
937 |
+
" <td>0.348958</td>\n",
|
938 |
+
" </tr>\n",
|
939 |
+
" <tr>\n",
|
940 |
+
" <th>std</th>\n",
|
941 |
+
" <td>3.369578</td>\n",
|
942 |
+
" <td>31.972618</td>\n",
|
943 |
+
" <td>19.355807</td>\n",
|
944 |
+
" <td>15.952218</td>\n",
|
945 |
+
" <td>115.244002</td>\n",
|
946 |
+
" <td>7.884160</td>\n",
|
947 |
+
" <td>0.331329</td>\n",
|
948 |
+
" <td>11.760232</td>\n",
|
949 |
+
" <td>0.476951</td>\n",
|
950 |
+
" </tr>\n",
|
951 |
+
" <tr>\n",
|
952 |
+
" <th>min</th>\n",
|
953 |
+
" <td>0.000000</td>\n",
|
954 |
+
" <td>0.000000</td>\n",
|
955 |
+
" <td>0.000000</td>\n",
|
956 |
+
" <td>0.000000</td>\n",
|
957 |
+
" <td>0.000000</td>\n",
|
958 |
+
" <td>0.000000</td>\n",
|
959 |
+
" <td>0.078000</td>\n",
|
960 |
+
" <td>21.000000</td>\n",
|
961 |
+
" <td>0.000000</td>\n",
|
962 |
+
" </tr>\n",
|
963 |
+
" <tr>\n",
|
964 |
+
" <th>25%</th>\n",
|
965 |
+
" <td>1.000000</td>\n",
|
966 |
+
" <td>99.000000</td>\n",
|
967 |
+
" <td>62.000000</td>\n",
|
968 |
+
" <td>0.000000</td>\n",
|
969 |
+
" <td>0.000000</td>\n",
|
970 |
+
" <td>27.300000</td>\n",
|
971 |
+
" <td>0.243750</td>\n",
|
972 |
+
" <td>24.000000</td>\n",
|
973 |
+
" <td>0.000000</td>\n",
|
974 |
+
" </tr>\n",
|
975 |
+
" <tr>\n",
|
976 |
+
" <th>50%</th>\n",
|
977 |
+
" <td>3.000000</td>\n",
|
978 |
+
" <td>117.000000</td>\n",
|
979 |
+
" <td>72.000000</td>\n",
|
980 |
+
" <td>23.000000</td>\n",
|
981 |
+
" <td>30.500000</td>\n",
|
982 |
+
" <td>32.000000</td>\n",
|
983 |
+
" <td>0.372500</td>\n",
|
984 |
+
" <td>29.000000</td>\n",
|
985 |
+
" <td>0.000000</td>\n",
|
986 |
+
" </tr>\n",
|
987 |
+
" <tr>\n",
|
988 |
+
" <th>75%</th>\n",
|
989 |
+
" <td>6.000000</td>\n",
|
990 |
+
" <td>140.250000</td>\n",
|
991 |
+
" <td>80.000000</td>\n",
|
992 |
+
" <td>32.000000</td>\n",
|
993 |
+
" <td>127.250000</td>\n",
|
994 |
+
" <td>36.600000</td>\n",
|
995 |
+
" <td>0.626250</td>\n",
|
996 |
+
" <td>41.000000</td>\n",
|
997 |
+
" <td>1.000000</td>\n",
|
998 |
+
" </tr>\n",
|
999 |
+
" <tr>\n",
|
1000 |
+
" <th>max</th>\n",
|
1001 |
+
" <td>17.000000</td>\n",
|
1002 |
+
" <td>199.000000</td>\n",
|
1003 |
+
" <td>122.000000</td>\n",
|
1004 |
+
" <td>99.000000</td>\n",
|
1005 |
+
" <td>846.000000</td>\n",
|
1006 |
+
" <td>67.100000</td>\n",
|
1007 |
+
" <td>2.420000</td>\n",
|
1008 |
+
" <td>81.000000</td>\n",
|
1009 |
+
" <td>1.000000</td>\n",
|
1010 |
+
" </tr>\n",
|
1011 |
+
" </tbody>\n",
|
1012 |
+
"</table>\n",
|
1013 |
+
"</div>"
|
1014 |
+
],
|
1015 |
+
"text/plain": [
|
1016 |
+
" preg plas pres skin test mass \\\n",
|
1017 |
+
"count 768.000000 768.000000 768.000000 768.000000 768.000000 768.000000 \n",
|
1018 |
+
"mean 3.845052 120.894531 69.105469 20.536458 79.799479 31.992578 \n",
|
1019 |
+
"std 3.369578 31.972618 19.355807 15.952218 115.244002 7.884160 \n",
|
1020 |
+
"min 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 \n",
|
1021 |
+
"25% 1.000000 99.000000 62.000000 0.000000 0.000000 27.300000 \n",
|
1022 |
+
"50% 3.000000 117.000000 72.000000 23.000000 30.500000 32.000000 \n",
|
1023 |
+
"75% 6.000000 140.250000 80.000000 32.000000 127.250000 36.600000 \n",
|
1024 |
+
"max 17.000000 199.000000 122.000000 99.000000 846.000000 67.100000 \n",
|
1025 |
+
"\n",
|
1026 |
+
" pedi age class \n",
|
1027 |
+
"count 768.000000 768.000000 768.000000 \n",
|
1028 |
+
"mean 0.471876 33.240885 0.348958 \n",
|
1029 |
+
"std 0.331329 11.760232 0.476951 \n",
|
1030 |
+
"min 0.078000 21.000000 0.000000 \n",
|
1031 |
+
"25% 0.243750 24.000000 0.000000 \n",
|
1032 |
+
"50% 0.372500 29.000000 0.000000 \n",
|
1033 |
+
"75% 0.626250 41.000000 1.000000 \n",
|
1034 |
+
"max 2.420000 81.000000 1.000000 "
|
1035 |
+
]
|
1036 |
+
},
|
1037 |
+
"execution_count": 11,
|
1038 |
+
"metadata": {},
|
1039 |
+
"output_type": "execute_result"
|
1040 |
+
}
|
1041 |
+
],
|
1042 |
+
"source": [
|
1043 |
+
"#Create the summary statistics\n",
|
1044 |
+
"data.describe()"
|
1045 |
+
]
|
1046 |
+
},
|
1047 |
+
{
|
1048 |
+
"cell_type": "code",
|
1049 |
+
"execution_count": 12,
|
1050 |
+
"metadata": {},
|
1051 |
+
"outputs": [
|
1052 |
+
{
|
1053 |
+
"data": {
|
1054 |
+
"text/html": [
|
1055 |
+
"<div>\n",
|
1056 |
+
"<style scoped>\n",
|
1057 |
+
" .dataframe tbody tr th:only-of-type {\n",
|
1058 |
+
" vertical-align: middle;\n",
|
1059 |
+
" }\n",
|
1060 |
+
"\n",
|
1061 |
+
" .dataframe tbody tr th {\n",
|
1062 |
+
" vertical-align: top;\n",
|
1063 |
+
" }\n",
|
1064 |
+
"\n",
|
1065 |
+
" .dataframe thead th {\n",
|
1066 |
+
" text-align: right;\n",
|
1067 |
+
" }\n",
|
1068 |
+
"</style>\n",
|
1069 |
+
"<table border=\"1\" class=\"dataframe\">\n",
|
1070 |
+
" <thead>\n",
|
1071 |
+
" <tr style=\"text-align: right;\">\n",
|
1072 |
+
" <th></th>\n",
|
1073 |
+
" <th>count</th>\n",
|
1074 |
+
" <th>mean</th>\n",
|
1075 |
+
" <th>std</th>\n",
|
1076 |
+
" <th>min</th>\n",
|
1077 |
+
" <th>25%</th>\n",
|
1078 |
+
" <th>50%</th>\n",
|
1079 |
+
" <th>75%</th>\n",
|
1080 |
+
" <th>max</th>\n",
|
1081 |
+
" </tr>\n",
|
1082 |
+
" </thead>\n",
|
1083 |
+
" <tbody>\n",
|
1084 |
+
" <tr>\n",
|
1085 |
+
" <th>preg</th>\n",
|
1086 |
+
" <td>768.0</td>\n",
|
1087 |
+
" <td>3.845052</td>\n",
|
1088 |
+
" <td>3.369578</td>\n",
|
1089 |
+
" <td>0.000</td>\n",
|
1090 |
+
" <td>1.00000</td>\n",
|
1091 |
+
" <td>3.0000</td>\n",
|
1092 |
+
" <td>6.00000</td>\n",
|
1093 |
+
" <td>17.00</td>\n",
|
1094 |
+
" </tr>\n",
|
1095 |
+
" <tr>\n",
|
1096 |
+
" <th>plas</th>\n",
|
1097 |
+
" <td>768.0</td>\n",
|
1098 |
+
" <td>120.894531</td>\n",
|
1099 |
+
" <td>31.972618</td>\n",
|
1100 |
+
" <td>0.000</td>\n",
|
1101 |
+
" <td>99.00000</td>\n",
|
1102 |
+
" <td>117.0000</td>\n",
|
1103 |
+
" <td>140.25000</td>\n",
|
1104 |
+
" <td>199.00</td>\n",
|
1105 |
+
" </tr>\n",
|
1106 |
+
" <tr>\n",
|
1107 |
+
" <th>pres</th>\n",
|
1108 |
+
" <td>768.0</td>\n",
|
1109 |
+
" <td>69.105469</td>\n",
|
1110 |
+
" <td>19.355807</td>\n",
|
1111 |
+
" <td>0.000</td>\n",
|
1112 |
+
" <td>62.00000</td>\n",
|
1113 |
+
" <td>72.0000</td>\n",
|
1114 |
+
" <td>80.00000</td>\n",
|
1115 |
+
" <td>122.00</td>\n",
|
1116 |
+
" </tr>\n",
|
1117 |
+
" <tr>\n",
|
1118 |
+
" <th>skin</th>\n",
|
1119 |
+
" <td>768.0</td>\n",
|
1120 |
+
" <td>20.536458</td>\n",
|
1121 |
+
" <td>15.952218</td>\n",
|
1122 |
+
" <td>0.000</td>\n",
|
1123 |
+
" <td>0.00000</td>\n",
|
1124 |
+
" <td>23.0000</td>\n",
|
1125 |
+
" <td>32.00000</td>\n",
|
1126 |
+
" <td>99.00</td>\n",
|
1127 |
+
" </tr>\n",
|
1128 |
+
" <tr>\n",
|
1129 |
+
" <th>test</th>\n",
|
1130 |
+
" <td>768.0</td>\n",
|
1131 |
+
" <td>79.799479</td>\n",
|
1132 |
+
" <td>115.244002</td>\n",
|
1133 |
+
" <td>0.000</td>\n",
|
1134 |
+
" <td>0.00000</td>\n",
|
1135 |
+
" <td>30.5000</td>\n",
|
1136 |
+
" <td>127.25000</td>\n",
|
1137 |
+
" <td>846.00</td>\n",
|
1138 |
+
" </tr>\n",
|
1139 |
+
" <tr>\n",
|
1140 |
+
" <th>mass</th>\n",
|
1141 |
+
" <td>768.0</td>\n",
|
1142 |
+
" <td>31.992578</td>\n",
|
1143 |
+
" <td>7.884160</td>\n",
|
1144 |
+
" <td>0.000</td>\n",
|
1145 |
+
" <td>27.30000</td>\n",
|
1146 |
+
" <td>32.0000</td>\n",
|
1147 |
+
" <td>36.60000</td>\n",
|
1148 |
+
" <td>67.10</td>\n",
|
1149 |
+
" </tr>\n",
|
1150 |
+
" <tr>\n",
|
1151 |
+
" <th>pedi</th>\n",
|
1152 |
+
" <td>768.0</td>\n",
|
1153 |
+
" <td>0.471876</td>\n",
|
1154 |
+
" <td>0.331329</td>\n",
|
1155 |
+
" <td>0.078</td>\n",
|
1156 |
+
" <td>0.24375</td>\n",
|
1157 |
+
" <td>0.3725</td>\n",
|
1158 |
+
" <td>0.62625</td>\n",
|
1159 |
+
" <td>2.42</td>\n",
|
1160 |
+
" </tr>\n",
|
1161 |
+
" <tr>\n",
|
1162 |
+
" <th>age</th>\n",
|
1163 |
+
" <td>768.0</td>\n",
|
1164 |
+
" <td>33.240885</td>\n",
|
1165 |
+
" <td>11.760232</td>\n",
|
1166 |
+
" <td>21.000</td>\n",
|
1167 |
+
" <td>24.00000</td>\n",
|
1168 |
+
" <td>29.0000</td>\n",
|
1169 |
+
" <td>41.00000</td>\n",
|
1170 |
+
" <td>81.00</td>\n",
|
1171 |
+
" </tr>\n",
|
1172 |
+
" <tr>\n",
|
1173 |
+
" <th>class</th>\n",
|
1174 |
+
" <td>768.0</td>\n",
|
1175 |
+
" <td>0.348958</td>\n",
|
1176 |
+
" <td>0.476951</td>\n",
|
1177 |
+
" <td>0.000</td>\n",
|
1178 |
+
" <td>0.00000</td>\n",
|
1179 |
+
" <td>0.0000</td>\n",
|
1180 |
+
" <td>1.00000</td>\n",
|
1181 |
+
" <td>1.00</td>\n",
|
1182 |
+
" </tr>\n",
|
1183 |
+
" </tbody>\n",
|
1184 |
+
"</table>\n",
|
1185 |
+
"</div>"
|
1186 |
+
],
|
1187 |
+
"text/plain": [
|
1188 |
+
" count mean std min 25% 50% 75% \\\n",
|
1189 |
+
"preg 768.0 3.845052 3.369578 0.000 1.00000 3.0000 6.00000 \n",
|
1190 |
+
"plas 768.0 120.894531 31.972618 0.000 99.00000 117.0000 140.25000 \n",
|
1191 |
+
"pres 768.0 69.105469 19.355807 0.000 62.00000 72.0000 80.00000 \n",
|
1192 |
+
"skin 768.0 20.536458 15.952218 0.000 0.00000 23.0000 32.00000 \n",
|
1193 |
+
"test 768.0 79.799479 115.244002 0.000 0.00000 30.5000 127.25000 \n",
|
1194 |
+
"mass 768.0 31.992578 7.884160 0.000 27.30000 32.0000 36.60000 \n",
|
1195 |
+
"pedi 768.0 0.471876 0.331329 0.078 0.24375 0.3725 0.62625 \n",
|
1196 |
+
"age 768.0 33.240885 11.760232 21.000 24.00000 29.0000 41.00000 \n",
|
1197 |
+
"class 768.0 0.348958 0.476951 0.000 0.00000 0.0000 1.00000 \n",
|
1198 |
+
"\n",
|
1199 |
+
" max \n",
|
1200 |
+
"preg 17.00 \n",
|
1201 |
+
"plas 199.00 \n",
|
1202 |
+
"pres 122.00 \n",
|
1203 |
+
"skin 99.00 \n",
|
1204 |
+
"test 846.00 \n",
|
1205 |
+
"mass 67.10 \n",
|
1206 |
+
"pedi 2.42 \n",
|
1207 |
+
"age 81.00 \n",
|
1208 |
+
"class 1.00 "
|
1209 |
+
]
|
1210 |
+
},
|
1211 |
+
"execution_count": 12,
|
1212 |
+
"metadata": {},
|
1213 |
+
"output_type": "execute_result"
|
1214 |
+
}
|
1215 |
+
],
|
1216 |
+
"source": [
|
1217 |
+
"data.describe().transpose()"
|
1218 |
+
]
|
1219 |
+
},
|
1220 |
+
{
|
1221 |
+
"cell_type": "code",
|
1222 |
+
"execution_count": 13,
|
1223 |
+
"metadata": {},
|
1224 |
+
"outputs": [
|
1225 |
+
{
|
1226 |
+
"data": {
|
1227 |
+
"image/png": "\n",
|
1228 |
+
"text/plain": [
|
1229 |
+
"<Figure size 864x2160 with 28 Axes>"
|
1230 |
+
]
|
1231 |
+
},
|
1232 |
+
"metadata": {
|
1233 |
+
"needs_background": "light"
|
1234 |
+
},
|
1235 |
+
"output_type": "display_data"
|
1236 |
+
}
|
1237 |
+
],
|
1238 |
+
"source": [
|
1239 |
+
"data[columns].hist(stacked=False, bins=100, figsize=(12,30), layout=(14,2));"
|
1240 |
+
]
|
1241 |
+
},
|
1242 |
+
{
|
1243 |
+
"cell_type": "code",
|
1244 |
+
"execution_count": 14,
|
1245 |
+
"metadata": {},
|
1246 |
+
"outputs": [
|
1247 |
+
{
|
1248 |
+
"data": {
|
1249 |
+
"text/html": [
|
1250 |
+
"<div>\n",
|
1251 |
+
"<style scoped>\n",
|
1252 |
+
" .dataframe tbody tr th:only-of-type {\n",
|
1253 |
+
" vertical-align: middle;\n",
|
1254 |
+
" }\n",
|
1255 |
+
"\n",
|
1256 |
+
" .dataframe tbody tr th {\n",
|
1257 |
+
" vertical-align: top;\n",
|
1258 |
+
" }\n",
|
1259 |
+
"\n",
|
1260 |
+
" .dataframe thead th {\n",
|
1261 |
+
" text-align: right;\n",
|
1262 |
+
" }\n",
|
1263 |
+
"</style>\n",
|
1264 |
+
"<table border=\"1\" class=\"dataframe\">\n",
|
1265 |
+
" <thead>\n",
|
1266 |
+
" <tr style=\"text-align: right;\">\n",
|
1267 |
+
" <th></th>\n",
|
1268 |
+
" <th>preg</th>\n",
|
1269 |
+
" <th>plas</th>\n",
|
1270 |
+
" <th>pres</th>\n",
|
1271 |
+
" <th>skin</th>\n",
|
1272 |
+
" <th>test</th>\n",
|
1273 |
+
" <th>mass</th>\n",
|
1274 |
+
" <th>pedi</th>\n",
|
1275 |
+
" <th>age</th>\n",
|
1276 |
+
" </tr>\n",
|
1277 |
+
" </thead>\n",
|
1278 |
+
" <tbody>\n",
|
1279 |
+
" <tr>\n",
|
1280 |
+
" <th>0</th>\n",
|
1281 |
+
" <td>6</td>\n",
|
1282 |
+
" <td>148</td>\n",
|
1283 |
+
" <td>72</td>\n",
|
1284 |
+
" <td>35</td>\n",
|
1285 |
+
" <td>0</td>\n",
|
1286 |
+
" <td>33.6</td>\n",
|
1287 |
+
" <td>0.627</td>\n",
|
1288 |
+
" <td>50</td>\n",
|
1289 |
+
" </tr>\n",
|
1290 |
+
" <tr>\n",
|
1291 |
+
" <th>1</th>\n",
|
1292 |
+
" <td>1</td>\n",
|
1293 |
+
" <td>85</td>\n",
|
1294 |
+
" <td>66</td>\n",
|
1295 |
+
" <td>29</td>\n",
|
1296 |
+
" <td>0</td>\n",
|
1297 |
+
" <td>26.6</td>\n",
|
1298 |
+
" <td>0.351</td>\n",
|
1299 |
+
" <td>31</td>\n",
|
1300 |
+
" </tr>\n",
|
1301 |
+
" <tr>\n",
|
1302 |
+
" <th>2</th>\n",
|
1303 |
+
" <td>8</td>\n",
|
1304 |
+
" <td>183</td>\n",
|
1305 |
+
" <td>64</td>\n",
|
1306 |
+
" <td>0</td>\n",
|
1307 |
+
" <td>0</td>\n",
|
1308 |
+
" <td>23.3</td>\n",
|
1309 |
+
" <td>0.672</td>\n",
|
1310 |
+
" <td>32</td>\n",
|
1311 |
+
" </tr>\n",
|
1312 |
+
" <tr>\n",
|
1313 |
+
" <th>3</th>\n",
|
1314 |
+
" <td>1</td>\n",
|
1315 |
+
" <td>89</td>\n",
|
1316 |
+
" <td>66</td>\n",
|
1317 |
+
" <td>23</td>\n",
|
1318 |
+
" <td>94</td>\n",
|
1319 |
+
" <td>28.1</td>\n",
|
1320 |
+
" <td>0.167</td>\n",
|
1321 |
+
" <td>21</td>\n",
|
1322 |
+
" </tr>\n",
|
1323 |
+
" <tr>\n",
|
1324 |
+
" <th>4</th>\n",
|
1325 |
+
" <td>0</td>\n",
|
1326 |
+
" <td>137</td>\n",
|
1327 |
+
" <td>40</td>\n",
|
1328 |
+
" <td>35</td>\n",
|
1329 |
+
" <td>168</td>\n",
|
1330 |
+
" <td>43.1</td>\n",
|
1331 |
+
" <td>2.288</td>\n",
|
1332 |
+
" <td>33</td>\n",
|
1333 |
+
" </tr>\n",
|
1334 |
+
" </tbody>\n",
|
1335 |
+
"</table>\n",
|
1336 |
+
"</div>"
|
1337 |
+
],
|
1338 |
+
"text/plain": [
|
1339 |
+
" preg plas pres skin test mass pedi age\n",
|
1340 |
+
"0 6 148 72 35 0 33.6 0.627 50\n",
|
1341 |
+
"1 1 85 66 29 0 26.6 0.351 31\n",
|
1342 |
+
"2 8 183 64 0 0 23.3 0.672 32\n",
|
1343 |
+
"3 1 89 66 23 94 28.1 0.167 21\n",
|
1344 |
+
"4 0 137 40 35 168 43.1 2.288 33"
|
1345 |
+
]
|
1346 |
+
},
|
1347 |
+
"execution_count": 14,
|
1348 |
+
"metadata": {},
|
1349 |
+
"output_type": "execute_result"
|
1350 |
+
}
|
1351 |
+
],
|
1352 |
+
"source": [
|
1353 |
+
"#The data set contains a Class attribute. \n",
|
1354 |
+
"#This is an indicator variable that is non-descriptive and only indicates if the \n",
|
1355 |
+
"# descriptive data indicates a particular event\n",
|
1356 |
+
"\n",
|
1357 |
+
"#Let's separate the data to into 2 dataframes.\n",
|
1358 |
+
"# - The first will contain the descriptive attributes\n",
|
1359 |
+
"# - The second will contain the indication attribute\n",
|
1360 |
+
"\n",
|
1361 |
+
"#Create a new dataframe (X) to contain the descriptive attributes, droping the indicitor attribute\n",
|
1362 |
+
"X = data.drop('class', axis=1)\n",
|
1363 |
+
"\n",
|
1364 |
+
"#Create a new dataframe (Y) to only contain the indicator attribute\n",
|
1365 |
+
"Y = data['class']\n",
|
1366 |
+
"X.head()"
|
1367 |
+
]
|
1368 |
+
},
|
1369 |
+
{
|
1370 |
+
"cell_type": "code",
|
1371 |
+
"execution_count": 15,
|
1372 |
+
"metadata": {},
|
1373 |
+
"outputs": [
|
1374 |
+
{
|
1375 |
+
"data": {
|
1376 |
+
"text/html": [
|
1377 |
+
"<div>\n",
|
1378 |
+
"<style scoped>\n",
|
1379 |
+
" .dataframe tbody tr th:only-of-type {\n",
|
1380 |
+
" vertical-align: middle;\n",
|
1381 |
+
" }\n",
|
1382 |
+
"\n",
|
1383 |
+
" .dataframe tbody tr th {\n",
|
1384 |
+
" vertical-align: top;\n",
|
1385 |
+
" }\n",
|
1386 |
+
"\n",
|
1387 |
+
" .dataframe thead th {\n",
|
1388 |
+
" text-align: right;\n",
|
1389 |
+
" }\n",
|
1390 |
+
"</style>\n",
|
1391 |
+
"<table border=\"1\" class=\"dataframe\">\n",
|
1392 |
+
" <thead>\n",
|
1393 |
+
" <tr style=\"text-align: right;\">\n",
|
1394 |
+
" <th></th>\n",
|
1395 |
+
" <th>preg</th>\n",
|
1396 |
+
" <th>plas</th>\n",
|
1397 |
+
" <th>pres</th>\n",
|
1398 |
+
" <th>skin</th>\n",
|
1399 |
+
" <th>test</th>\n",
|
1400 |
+
" <th>mass</th>\n",
|
1401 |
+
" <th>pedi</th>\n",
|
1402 |
+
" <th>age</th>\n",
|
1403 |
+
" </tr>\n",
|
1404 |
+
" </thead>\n",
|
1405 |
+
" <tbody>\n",
|
1406 |
+
" <tr>\n",
|
1407 |
+
" <th>0</th>\n",
|
1408 |
+
" <td>0.352941</td>\n",
|
1409 |
+
" <td>0.743719</td>\n",
|
1410 |
+
" <td>0.590164</td>\n",
|
1411 |
+
" <td>0.353535</td>\n",
|
1412 |
+
" <td>0.000000</td>\n",
|
1413 |
+
" <td>0.500745</td>\n",
|
1414 |
+
" <td>0.234415</td>\n",
|
1415 |
+
" <td>0.483333</td>\n",
|
1416 |
+
" </tr>\n",
|
1417 |
+
" <tr>\n",
|
1418 |
+
" <th>1</th>\n",
|
1419 |
+
" <td>0.058824</td>\n",
|
1420 |
+
" <td>0.427136</td>\n",
|
1421 |
+
" <td>0.540984</td>\n",
|
1422 |
+
" <td>0.292929</td>\n",
|
1423 |
+
" <td>0.000000</td>\n",
|
1424 |
+
" <td>0.396423</td>\n",
|
1425 |
+
" <td>0.116567</td>\n",
|
1426 |
+
" <td>0.166667</td>\n",
|
1427 |
+
" </tr>\n",
|
1428 |
+
" <tr>\n",
|
1429 |
+
" <th>2</th>\n",
|
1430 |
+
" <td>0.470588</td>\n",
|
1431 |
+
" <td>0.919598</td>\n",
|
1432 |
+
" <td>0.524590</td>\n",
|
1433 |
+
" <td>0.000000</td>\n",
|
1434 |
+
" <td>0.000000</td>\n",
|
1435 |
+
" <td>0.347243</td>\n",
|
1436 |
+
" <td>0.253629</td>\n",
|
1437 |
+
" <td>0.183333</td>\n",
|
1438 |
+
" </tr>\n",
|
1439 |
+
" <tr>\n",
|
1440 |
+
" <th>3</th>\n",
|
1441 |
+
" <td>0.058824</td>\n",
|
1442 |
+
" <td>0.447236</td>\n",
|
1443 |
+
" <td>0.540984</td>\n",
|
1444 |
+
" <td>0.232323</td>\n",
|
1445 |
+
" <td>0.111111</td>\n",
|
1446 |
+
" <td>0.418778</td>\n",
|
1447 |
+
" <td>0.038002</td>\n",
|
1448 |
+
" <td>0.000000</td>\n",
|
1449 |
+
" </tr>\n",
|
1450 |
+
" <tr>\n",
|
1451 |
+
" <th>4</th>\n",
|
1452 |
+
" <td>0.000000</td>\n",
|
1453 |
+
" <td>0.688442</td>\n",
|
1454 |
+
" <td>0.327869</td>\n",
|
1455 |
+
" <td>0.353535</td>\n",
|
1456 |
+
" <td>0.198582</td>\n",
|
1457 |
+
" <td>0.642325</td>\n",
|
1458 |
+
" <td>0.943638</td>\n",
|
1459 |
+
" <td>0.200000</td>\n",
|
1460 |
+
" </tr>\n",
|
1461 |
+
" </tbody>\n",
|
1462 |
+
"</table>\n",
|
1463 |
+
"</div>"
|
1464 |
+
],
|
1465 |
+
"text/plain": [
|
1466 |
+
" preg plas pres skin test mass pedi \\\n",
|
1467 |
+
"0 0.352941 0.743719 0.590164 0.353535 0.000000 0.500745 0.234415 \n",
|
1468 |
+
"1 0.058824 0.427136 0.540984 0.292929 0.000000 0.396423 0.116567 \n",
|
1469 |
+
"2 0.470588 0.919598 0.524590 0.000000 0.000000 0.347243 0.253629 \n",
|
1470 |
+
"3 0.058824 0.447236 0.540984 0.232323 0.111111 0.418778 0.038002 \n",
|
1471 |
+
"4 0.000000 0.688442 0.327869 0.353535 0.198582 0.642325 0.943638 \n",
|
1472 |
+
"\n",
|
1473 |
+
" age \n",
|
1474 |
+
"0 0.483333 \n",
|
1475 |
+
"1 0.166667 \n",
|
1476 |
+
"2 0.183333 \n",
|
1477 |
+
"3 0.000000 \n",
|
1478 |
+
"4 0.200000 "
|
1479 |
+
]
|
1480 |
+
},
|
1481 |
+
"execution_count": 15,
|
1482 |
+
"metadata": {},
|
1483 |
+
"output_type": "execute_result"
|
1484 |
+
}
|
1485 |
+
],
|
1486 |
+
"source": [
|
1487 |
+
"from sklearn.preprocessing import MinMaxScaler\n",
|
1488 |
+
"X_copy = X.copy() #We create a copy so we can still refer to the original dataframe later\n",
|
1489 |
+
"scaler = MinMaxScaler()\n",
|
1490 |
+
"#Create list of Columns to transform/scale\n",
|
1491 |
+
"X_columns = X.columns\n",
|
1492 |
+
"#Create a new dataframe\n",
|
1493 |
+
"X_scaled = pd.DataFrame(scaler.fit_transform(X_copy), columns=X_columns)\n",
|
1494 |
+
"X_scaled.head()"
|
1495 |
+
]
|
1496 |
+
},
|
1497 |
+
{
|
1498 |
+
"cell_type": "code",
|
1499 |
+
"execution_count": 16,
|
1500 |
+
"metadata": {},
|
1501 |
+
"outputs": [
|
1502 |
+
{
|
1503 |
+
"data": {
|
1504 |
+
"text/html": [
|
1505 |
+
"<div>\n",
|
1506 |
+
"<style scoped>\n",
|
1507 |
+
" .dataframe tbody tr th:only-of-type {\n",
|
1508 |
+
" vertical-align: middle;\n",
|
1509 |
+
" }\n",
|
1510 |
+
"\n",
|
1511 |
+
" .dataframe tbody tr th {\n",
|
1512 |
+
" vertical-align: top;\n",
|
1513 |
+
" }\n",
|
1514 |
+
"\n",
|
1515 |
+
" .dataframe thead th {\n",
|
1516 |
+
" text-align: right;\n",
|
1517 |
+
" }\n",
|
1518 |
+
"</style>\n",
|
1519 |
+
"<table border=\"1\" class=\"dataframe\">\n",
|
1520 |
+
" <thead>\n",
|
1521 |
+
" <tr style=\"text-align: right;\">\n",
|
1522 |
+
" <th></th>\n",
|
1523 |
+
" <th>preg</th>\n",
|
1524 |
+
" <th>plas</th>\n",
|
1525 |
+
" <th>pres</th>\n",
|
1526 |
+
" <th>skin</th>\n",
|
1527 |
+
" <th>test</th>\n",
|
1528 |
+
" <th>mass</th>\n",
|
1529 |
+
" <th>pedi</th>\n",
|
1530 |
+
" <th>age</th>\n",
|
1531 |
+
" <th>class</th>\n",
|
1532 |
+
" </tr>\n",
|
1533 |
+
" </thead>\n",
|
1534 |
+
" <tbody>\n",
|
1535 |
+
" <tr>\n",
|
1536 |
+
" <th>count</th>\n",
|
1537 |
+
" <td>768.000000</td>\n",
|
1538 |
+
" <td>768.000000</td>\n",
|
1539 |
+
" <td>768.000000</td>\n",
|
1540 |
+
" <td>768.000000</td>\n",
|
1541 |
+
" <td>768.000000</td>\n",
|
1542 |
+
" <td>768.000000</td>\n",
|
1543 |
+
" <td>768.000000</td>\n",
|
1544 |
+
" <td>768.000000</td>\n",
|
1545 |
+
" <td>768.000000</td>\n",
|
1546 |
+
" </tr>\n",
|
1547 |
+
" <tr>\n",
|
1548 |
+
" <th>mean</th>\n",
|
1549 |
+
" <td>3.845052</td>\n",
|
1550 |
+
" <td>120.894531</td>\n",
|
1551 |
+
" <td>69.105469</td>\n",
|
1552 |
+
" <td>20.536458</td>\n",
|
1553 |
+
" <td>79.799479</td>\n",
|
1554 |
+
" <td>31.992578</td>\n",
|
1555 |
+
" <td>0.471876</td>\n",
|
1556 |
+
" <td>33.240885</td>\n",
|
1557 |
+
" <td>0.348958</td>\n",
|
1558 |
+
" </tr>\n",
|
1559 |
+
" <tr>\n",
|
1560 |
+
" <th>std</th>\n",
|
1561 |
+
" <td>3.369578</td>\n",
|
1562 |
+
" <td>31.972618</td>\n",
|
1563 |
+
" <td>19.355807</td>\n",
|
1564 |
+
" <td>15.952218</td>\n",
|
1565 |
+
" <td>115.244002</td>\n",
|
1566 |
+
" <td>7.884160</td>\n",
|
1567 |
+
" <td>0.331329</td>\n",
|
1568 |
+
" <td>11.760232</td>\n",
|
1569 |
+
" <td>0.476951</td>\n",
|
1570 |
+
" </tr>\n",
|
1571 |
+
" <tr>\n",
|
1572 |
+
" <th>min</th>\n",
|
1573 |
+
" <td>0.000000</td>\n",
|
1574 |
+
" <td>0.000000</td>\n",
|
1575 |
+
" <td>0.000000</td>\n",
|
1576 |
+
" <td>0.000000</td>\n",
|
1577 |
+
" <td>0.000000</td>\n",
|
1578 |
+
" <td>0.000000</td>\n",
|
1579 |
+
" <td>0.078000</td>\n",
|
1580 |
+
" <td>21.000000</td>\n",
|
1581 |
+
" <td>0.000000</td>\n",
|
1582 |
+
" </tr>\n",
|
1583 |
+
" <tr>\n",
|
1584 |
+
" <th>25%</th>\n",
|
1585 |
+
" <td>1.000000</td>\n",
|
1586 |
+
" <td>99.000000</td>\n",
|
1587 |
+
" <td>62.000000</td>\n",
|
1588 |
+
" <td>0.000000</td>\n",
|
1589 |
+
" <td>0.000000</td>\n",
|
1590 |
+
" <td>27.300000</td>\n",
|
1591 |
+
" <td>0.243750</td>\n",
|
1592 |
+
" <td>24.000000</td>\n",
|
1593 |
+
" <td>0.000000</td>\n",
|
1594 |
+
" </tr>\n",
|
1595 |
+
" <tr>\n",
|
1596 |
+
" <th>50%</th>\n",
|
1597 |
+
" <td>3.000000</td>\n",
|
1598 |
+
" <td>117.000000</td>\n",
|
1599 |
+
" <td>72.000000</td>\n",
|
1600 |
+
" <td>23.000000</td>\n",
|
1601 |
+
" <td>30.500000</td>\n",
|
1602 |
+
" <td>32.000000</td>\n",
|
1603 |
+
" <td>0.372500</td>\n",
|
1604 |
+
" <td>29.000000</td>\n",
|
1605 |
+
" <td>0.000000</td>\n",
|
1606 |
+
" </tr>\n",
|
1607 |
+
" <tr>\n",
|
1608 |
+
" <th>75%</th>\n",
|
1609 |
+
" <td>6.000000</td>\n",
|
1610 |
+
" <td>140.250000</td>\n",
|
1611 |
+
" <td>80.000000</td>\n",
|
1612 |
+
" <td>32.000000</td>\n",
|
1613 |
+
" <td>127.250000</td>\n",
|
1614 |
+
" <td>36.600000</td>\n",
|
1615 |
+
" <td>0.626250</td>\n",
|
1616 |
+
" <td>41.000000</td>\n",
|
1617 |
+
" <td>1.000000</td>\n",
|
1618 |
+
" </tr>\n",
|
1619 |
+
" <tr>\n",
|
1620 |
+
" <th>max</th>\n",
|
1621 |
+
" <td>17.000000</td>\n",
|
1622 |
+
" <td>199.000000</td>\n",
|
1623 |
+
" <td>122.000000</td>\n",
|
1624 |
+
" <td>99.000000</td>\n",
|
1625 |
+
" <td>846.000000</td>\n",
|
1626 |
+
" <td>67.100000</td>\n",
|
1627 |
+
" <td>2.420000</td>\n",
|
1628 |
+
" <td>81.000000</td>\n",
|
1629 |
+
" <td>1.000000</td>\n",
|
1630 |
+
" </tr>\n",
|
1631 |
+
" </tbody>\n",
|
1632 |
+
"</table>\n",
|
1633 |
+
"</div>"
|
1634 |
+
],
|
1635 |
+
"text/plain": [
|
1636 |
+
" preg plas pres skin test mass \\\n",
|
1637 |
+
"count 768.000000 768.000000 768.000000 768.000000 768.000000 768.000000 \n",
|
1638 |
+
"mean 3.845052 120.894531 69.105469 20.536458 79.799479 31.992578 \n",
|
1639 |
+
"std 3.369578 31.972618 19.355807 15.952218 115.244002 7.884160 \n",
|
1640 |
+
"min 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 \n",
|
1641 |
+
"25% 1.000000 99.000000 62.000000 0.000000 0.000000 27.300000 \n",
|
1642 |
+
"50% 3.000000 117.000000 72.000000 23.000000 30.500000 32.000000 \n",
|
1643 |
+
"75% 6.000000 140.250000 80.000000 32.000000 127.250000 36.600000 \n",
|
1644 |
+
"max 17.000000 199.000000 122.000000 99.000000 846.000000 67.100000 \n",
|
1645 |
+
"\n",
|
1646 |
+
" pedi age class \n",
|
1647 |
+
"count 768.000000 768.000000 768.000000 \n",
|
1648 |
+
"mean 0.471876 33.240885 0.348958 \n",
|
1649 |
+
"std 0.331329 11.760232 0.476951 \n",
|
1650 |
+
"min 0.078000 21.000000 0.000000 \n",
|
1651 |
+
"25% 0.243750 24.000000 0.000000 \n",
|
1652 |
+
"50% 0.372500 29.000000 0.000000 \n",
|
1653 |
+
"75% 0.626250 41.000000 1.000000 \n",
|
1654 |
+
"max 2.420000 81.000000 1.000000 "
|
1655 |
+
]
|
1656 |
+
},
|
1657 |
+
"execution_count": 16,
|
1658 |
+
"metadata": {},
|
1659 |
+
"output_type": "execute_result"
|
1660 |
+
}
|
1661 |
+
],
|
1662 |
+
"source": [
|
1663 |
+
"data.describe()"
|
1664 |
+
]
|
1665 |
+
},
|
1666 |
+
{
|
1667 |
+
"cell_type": "code",
|
1668 |
+
"execution_count": 17,
|
1669 |
+
"metadata": {},
|
1670 |
+
"outputs": [
|
1671 |
+
{
|
1672 |
+
"data": {
|
1673 |
+
"text/html": [
|
1674 |
+
"<div>\n",
|
1675 |
+
"<style scoped>\n",
|
1676 |
+
" .dataframe tbody tr th:only-of-type {\n",
|
1677 |
+
" vertical-align: middle;\n",
|
1678 |
+
" }\n",
|
1679 |
+
"\n",
|
1680 |
+
" .dataframe tbody tr th {\n",
|
1681 |
+
" vertical-align: top;\n",
|
1682 |
+
" }\n",
|
1683 |
+
"\n",
|
1684 |
+
" .dataframe thead th {\n",
|
1685 |
+
" text-align: right;\n",
|
1686 |
+
" }\n",
|
1687 |
+
"</style>\n",
|
1688 |
+
"<table border=\"1\" class=\"dataframe\">\n",
|
1689 |
+
" <thead>\n",
|
1690 |
+
" <tr style=\"text-align: right;\">\n",
|
1691 |
+
" <th></th>\n",
|
1692 |
+
" <th>preg</th>\n",
|
1693 |
+
" <th>plas</th>\n",
|
1694 |
+
" <th>pres</th>\n",
|
1695 |
+
" <th>skin</th>\n",
|
1696 |
+
" <th>test</th>\n",
|
1697 |
+
" <th>mass</th>\n",
|
1698 |
+
" <th>pedi</th>\n",
|
1699 |
+
" <th>age</th>\n",
|
1700 |
+
" </tr>\n",
|
1701 |
+
" </thead>\n",
|
1702 |
+
" <tbody>\n",
|
1703 |
+
" <tr>\n",
|
1704 |
+
" <th>count</th>\n",
|
1705 |
+
" <td>768.000000</td>\n",
|
1706 |
+
" <td>768.000000</td>\n",
|
1707 |
+
" <td>768.000000</td>\n",
|
1708 |
+
" <td>768.000000</td>\n",
|
1709 |
+
" <td>768.000000</td>\n",
|
1710 |
+
" <td>768.000000</td>\n",
|
1711 |
+
" <td>768.000000</td>\n",
|
1712 |
+
" <td>768.000000</td>\n",
|
1713 |
+
" </tr>\n",
|
1714 |
+
" <tr>\n",
|
1715 |
+
" <th>mean</th>\n",
|
1716 |
+
" <td>0.226180</td>\n",
|
1717 |
+
" <td>0.607510</td>\n",
|
1718 |
+
" <td>0.566438</td>\n",
|
1719 |
+
" <td>0.207439</td>\n",
|
1720 |
+
" <td>0.094326</td>\n",
|
1721 |
+
" <td>0.476790</td>\n",
|
1722 |
+
" <td>0.168179</td>\n",
|
1723 |
+
" <td>0.204015</td>\n",
|
1724 |
+
" </tr>\n",
|
1725 |
+
" <tr>\n",
|
1726 |
+
" <th>std</th>\n",
|
1727 |
+
" <td>0.198210</td>\n",
|
1728 |
+
" <td>0.160666</td>\n",
|
1729 |
+
" <td>0.158654</td>\n",
|
1730 |
+
" <td>0.161134</td>\n",
|
1731 |
+
" <td>0.136222</td>\n",
|
1732 |
+
" <td>0.117499</td>\n",
|
1733 |
+
" <td>0.141473</td>\n",
|
1734 |
+
" <td>0.196004</td>\n",
|
1735 |
+
" </tr>\n",
|
1736 |
+
" <tr>\n",
|
1737 |
+
" <th>min</th>\n",
|
1738 |
+
" <td>0.000000</td>\n",
|
1739 |
+
" <td>0.000000</td>\n",
|
1740 |
+
" <td>0.000000</td>\n",
|
1741 |
+
" <td>0.000000</td>\n",
|
1742 |
+
" <td>0.000000</td>\n",
|
1743 |
+
" <td>0.000000</td>\n",
|
1744 |
+
" <td>0.000000</td>\n",
|
1745 |
+
" <td>0.000000</td>\n",
|
1746 |
+
" </tr>\n",
|
1747 |
+
" <tr>\n",
|
1748 |
+
" <th>25%</th>\n",
|
1749 |
+
" <td>0.058824</td>\n",
|
1750 |
+
" <td>0.497487</td>\n",
|
1751 |
+
" <td>0.508197</td>\n",
|
1752 |
+
" <td>0.000000</td>\n",
|
1753 |
+
" <td>0.000000</td>\n",
|
1754 |
+
" <td>0.406855</td>\n",
|
1755 |
+
" <td>0.070773</td>\n",
|
1756 |
+
" <td>0.050000</td>\n",
|
1757 |
+
" </tr>\n",
|
1758 |
+
" <tr>\n",
|
1759 |
+
" <th>50%</th>\n",
|
1760 |
+
" <td>0.176471</td>\n",
|
1761 |
+
" <td>0.587940</td>\n",
|
1762 |
+
" <td>0.590164</td>\n",
|
1763 |
+
" <td>0.232323</td>\n",
|
1764 |
+
" <td>0.036052</td>\n",
|
1765 |
+
" <td>0.476900</td>\n",
|
1766 |
+
" <td>0.125747</td>\n",
|
1767 |
+
" <td>0.133333</td>\n",
|
1768 |
+
" </tr>\n",
|
1769 |
+
" <tr>\n",
|
1770 |
+
" <th>75%</th>\n",
|
1771 |
+
" <td>0.352941</td>\n",
|
1772 |
+
" <td>0.704774</td>\n",
|
1773 |
+
" <td>0.655738</td>\n",
|
1774 |
+
" <td>0.323232</td>\n",
|
1775 |
+
" <td>0.150414</td>\n",
|
1776 |
+
" <td>0.545455</td>\n",
|
1777 |
+
" <td>0.234095</td>\n",
|
1778 |
+
" <td>0.333333</td>\n",
|
1779 |
+
" </tr>\n",
|
1780 |
+
" <tr>\n",
|
1781 |
+
" <th>max</th>\n",
|
1782 |
+
" <td>1.000000</td>\n",
|
1783 |
+
" <td>1.000000</td>\n",
|
1784 |
+
" <td>1.000000</td>\n",
|
1785 |
+
" <td>1.000000</td>\n",
|
1786 |
+
" <td>1.000000</td>\n",
|
1787 |
+
" <td>1.000000</td>\n",
|
1788 |
+
" <td>1.000000</td>\n",
|
1789 |
+
" <td>1.000000</td>\n",
|
1790 |
+
" </tr>\n",
|
1791 |
+
" </tbody>\n",
|
1792 |
+
"</table>\n",
|
1793 |
+
"</div>"
|
1794 |
+
],
|
1795 |
+
"text/plain": [
|
1796 |
+
" preg plas pres skin test mass \\\n",
|
1797 |
+
"count 768.000000 768.000000 768.000000 768.000000 768.000000 768.000000 \n",
|
1798 |
+
"mean 0.226180 0.607510 0.566438 0.207439 0.094326 0.476790 \n",
|
1799 |
+
"std 0.198210 0.160666 0.158654 0.161134 0.136222 0.117499 \n",
|
1800 |
+
"min 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 \n",
|
1801 |
+
"25% 0.058824 0.497487 0.508197 0.000000 0.000000 0.406855 \n",
|
1802 |
+
"50% 0.176471 0.587940 0.590164 0.232323 0.036052 0.476900 \n",
|
1803 |
+
"75% 0.352941 0.704774 0.655738 0.323232 0.150414 0.545455 \n",
|
1804 |
+
"max 1.000000 1.000000 1.000000 1.000000 1.000000 1.000000 \n",
|
1805 |
+
"\n",
|
1806 |
+
" pedi age \n",
|
1807 |
+
"count 768.000000 768.000000 \n",
|
1808 |
+
"mean 0.168179 0.204015 \n",
|
1809 |
+
"std 0.141473 0.196004 \n",
|
1810 |
+
"min 0.000000 0.000000 \n",
|
1811 |
+
"25% 0.070773 0.050000 \n",
|
1812 |
+
"50% 0.125747 0.133333 \n",
|
1813 |
+
"75% 0.234095 0.333333 \n",
|
1814 |
+
"max 1.000000 1.000000 "
|
1815 |
+
]
|
1816 |
+
},
|
1817 |
+
"execution_count": 17,
|
1818 |
+
"metadata": {},
|
1819 |
+
"output_type": "execute_result"
|
1820 |
+
}
|
1821 |
+
],
|
1822 |
+
"source": [
|
1823 |
+
"X_scaled.describe()"
|
1824 |
+
]
|
1825 |
+
},
|
1826 |
+
{
|
1827 |
+
"cell_type": "code",
|
1828 |
+
"execution_count": 18,
|
1829 |
+
"metadata": {},
|
1830 |
+
"outputs": [],
|
1831 |
+
"source": [
|
1832 |
+
"# Question: Add code below to create a new dataframe in which only the 'preg' and 'plas' attributes are transformed"
|
1833 |
+
]
|
1834 |
+
},
|
1835 |
+
{
|
1836 |
+
"cell_type": "code",
|
1837 |
+
"execution_count": 19,
|
1838 |
+
"metadata": {},
|
1839 |
+
"outputs": [
|
1840 |
+
{
|
1841 |
+
"data": {
|
1842 |
+
"text/html": [
|
1843 |
+
"<div>\n",
|
1844 |
+
"<style scoped>\n",
|
1845 |
+
" .dataframe tbody tr th:only-of-type {\n",
|
1846 |
+
" vertical-align: middle;\n",
|
1847 |
+
" }\n",
|
1848 |
+
"\n",
|
1849 |
+
" .dataframe tbody tr th {\n",
|
1850 |
+
" vertical-align: top;\n",
|
1851 |
+
" }\n",
|
1852 |
+
"\n",
|
1853 |
+
" .dataframe thead th {\n",
|
1854 |
+
" text-align: right;\n",
|
1855 |
+
" }\n",
|
1856 |
+
"</style>\n",
|
1857 |
+
"<table border=\"1\" class=\"dataframe\">\n",
|
1858 |
+
" <thead>\n",
|
1859 |
+
" <tr style=\"text-align: right;\">\n",
|
1860 |
+
" <th></th>\n",
|
1861 |
+
" <th>preg</th>\n",
|
1862 |
+
" <th>plas</th>\n",
|
1863 |
+
" <th>pres</th>\n",
|
1864 |
+
" <th>skin</th>\n",
|
1865 |
+
" <th>test</th>\n",
|
1866 |
+
" <th>mass</th>\n",
|
1867 |
+
" <th>pedi</th>\n",
|
1868 |
+
" <th>age</th>\n",
|
1869 |
+
" </tr>\n",
|
1870 |
+
" </thead>\n",
|
1871 |
+
" <tbody>\n",
|
1872 |
+
" <tr>\n",
|
1873 |
+
" <th>0</th>\n",
|
1874 |
+
" <td>0.352941</td>\n",
|
1875 |
+
" <td>0.743719</td>\n",
|
1876 |
+
" <td>72</td>\n",
|
1877 |
+
" <td>35</td>\n",
|
1878 |
+
" <td>0</td>\n",
|
1879 |
+
" <td>33.6</td>\n",
|
1880 |
+
" <td>0.627</td>\n",
|
1881 |
+
" <td>50</td>\n",
|
1882 |
+
" </tr>\n",
|
1883 |
+
" <tr>\n",
|
1884 |
+
" <th>1</th>\n",
|
1885 |
+
" <td>0.058824</td>\n",
|
1886 |
+
" <td>0.427136</td>\n",
|
1887 |
+
" <td>66</td>\n",
|
1888 |
+
" <td>29</td>\n",
|
1889 |
+
" <td>0</td>\n",
|
1890 |
+
" <td>26.6</td>\n",
|
1891 |
+
" <td>0.351</td>\n",
|
1892 |
+
" <td>31</td>\n",
|
1893 |
+
" </tr>\n",
|
1894 |
+
" <tr>\n",
|
1895 |
+
" <th>2</th>\n",
|
1896 |
+
" <td>0.470588</td>\n",
|
1897 |
+
" <td>0.919598</td>\n",
|
1898 |
+
" <td>64</td>\n",
|
1899 |
+
" <td>0</td>\n",
|
1900 |
+
" <td>0</td>\n",
|
1901 |
+
" <td>23.3</td>\n",
|
1902 |
+
" <td>0.672</td>\n",
|
1903 |
+
" <td>32</td>\n",
|
1904 |
+
" </tr>\n",
|
1905 |
+
" <tr>\n",
|
1906 |
+
" <th>3</th>\n",
|
1907 |
+
" <td>0.058824</td>\n",
|
1908 |
+
" <td>0.447236</td>\n",
|
1909 |
+
" <td>66</td>\n",
|
1910 |
+
" <td>23</td>\n",
|
1911 |
+
" <td>94</td>\n",
|
1912 |
+
" <td>28.1</td>\n",
|
1913 |
+
" <td>0.167</td>\n",
|
1914 |
+
" <td>21</td>\n",
|
1915 |
+
" </tr>\n",
|
1916 |
+
" <tr>\n",
|
1917 |
+
" <th>4</th>\n",
|
1918 |
+
" <td>0.000000</td>\n",
|
1919 |
+
" <td>0.688442</td>\n",
|
1920 |
+
" <td>40</td>\n",
|
1921 |
+
" <td>35</td>\n",
|
1922 |
+
" <td>168</td>\n",
|
1923 |
+
" <td>43.1</td>\n",
|
1924 |
+
" <td>2.288</td>\n",
|
1925 |
+
" <td>33</td>\n",
|
1926 |
+
" </tr>\n",
|
1927 |
+
" </tbody>\n",
|
1928 |
+
"</table>\n",
|
1929 |
+
"</div>"
|
1930 |
+
],
|
1931 |
+
"text/plain": [
|
1932 |
+
" preg plas pres skin test mass pedi age\n",
|
1933 |
+
"0 0.352941 0.743719 72 35 0 33.6 0.627 50\n",
|
1934 |
+
"1 0.058824 0.427136 66 29 0 26.6 0.351 31\n",
|
1935 |
+
"2 0.470588 0.919598 64 0 0 23.3 0.672 32\n",
|
1936 |
+
"3 0.058824 0.447236 66 23 94 28.1 0.167 21\n",
|
1937 |
+
"4 0.000000 0.688442 40 35 168 43.1 2.288 33"
|
1938 |
+
]
|
1939 |
+
},
|
1940 |
+
"execution_count": 19,
|
1941 |
+
"metadata": {},
|
1942 |
+
"output_type": "execute_result"
|
1943 |
+
}
|
1944 |
+
],
|
1945 |
+
"source": [
|
1946 |
+
"from sklearn.preprocessing import MinMaxScaler\n",
|
1947 |
+
"X_copy = X.copy()\n",
|
1948 |
+
"scaler = MinMaxScaler()\n",
|
1949 |
+
"X_copy[['preg', 'plas']] = scaler.fit_transform(X_copy[['preg', 'plas']])\n",
|
1950 |
+
"X_copy.head()"
|
1951 |
+
]
|
1952 |
+
},
|
1953 |
+
{
|
1954 |
+
"cell_type": "code",
|
1955 |
+
"execution_count": null,
|
1956 |
+
"metadata": {},
|
1957 |
+
"outputs": [],
|
1958 |
+
"source": []
|
1959 |
+
},
|
1960 |
+
{
|
1961 |
+
"cell_type": "code",
|
1962 |
+
"execution_count": null,
|
1963 |
+
"metadata": {},
|
1964 |
+
"outputs": [],
|
1965 |
+
"source": []
|
1966 |
+
},
|
1967 |
+
{
|
1968 |
+
"cell_type": "code",
|
1969 |
+
"execution_count": null,
|
1970 |
+
"metadata": {},
|
1971 |
+
"outputs": [],
|
1972 |
+
"source": []
|
1973 |
+
}
|
1974 |
+
],
|
1975 |
+
"metadata": {
|
1976 |
+
"kernelspec": {
|
1977 |
+
"display_name": "Python 3",
|
1978 |
+
"language": "python",
|
1979 |
+
"name": "python3"
|
1980 |
+
},
|
1981 |
+
"language_info": {
|
1982 |
+
"codemirror_mode": {
|
1983 |
+
"name": "ipython",
|
1984 |
+
"version": 3
|
1985 |
+
},
|
1986 |
+
"file_extension": ".py",
|
1987 |
+
"mimetype": "text/x-python",
|
1988 |
+
"name": "python",
|
1989 |
+
"nbconvert_exporter": "python",
|
1990 |
+
"pygments_lexer": "ipython3",
|
1991 |
+
"version": "3.7.3"
|
1992 |
+
}
|
1993 |
+
},
|
1994 |
+
"nbformat": 4,
|
1995 |
+
"nbformat_minor": 2
|
1996 |
+
}
|
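The last code cell of the notebook above rescales only the `preg` and `plas` columns with `MinMaxScaler`, leaving the other attributes untouched. With its default range, that scaler maps each value to (x - min) / (max - min) per column. The following is a minimal, dependency-free sketch of that same calculation, not the notebook's own code: the helper name `min_max_scale_columns` and the five sample rows are assumptions made for illustration, and because the minimum and maximum come from only these five rows, the results differ from the notebook output, which used all 768 records.

```python
# Dependency-free sketch of min-max scaling applied to selected columns only.
# The helper name and the sample rows are illustrative assumptions; they are
# not part of the notebook.

def min_max_scale_columns(rows, columns):
    """Return a copy of `rows` with each named column rescaled to [0, 1]."""
    scaled = [dict(row) for row in rows]  # copy so the input stays unchanged
    for col in columns:
        values = [row[col] for row in rows]
        lo, hi = min(values), max(values)
        span = hi - lo
        for row in scaled:
            # A constant column has no spread; map it to 0.0 rather than divide by zero.
            row[col] = (row[col] - lo) / span if span else 0.0
    return scaled


# Five rows shaped like the first records shown in the notebook output.
rows = [
    {"preg": 6, "plas": 148, "pres": 72},
    {"preg": 1, "plas": 85, "pres": 66},
    {"preg": 8, "plas": 183, "pres": 64},
    {"preg": 1, "plas": 89, "pres": 66},
    {"preg": 0, "plas": 137, "pres": 40},
]

# Only 'preg' and 'plas' are transformed; 'pres' keeps its original values.
scaled = min_max_scale_columns(rows, ["preg", "plas"])
for row in scaled:
    print(row)
```

Copying the rows before scaling mirrors the `X.copy()` step in the notebook cell, so the original data remains available for comparison.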
Data Analitics/Week 4/.ipynb_checkpoints/TU257-Lab3-4-Correlation-checkpoint.ipynb
ADDED
The diff for this file is too large to render.
See raw diff
|
|
Data Analitics/Week 4/.ipynb_checkpoints/TU257-Lab3-5-Sampling-and-Unbalanced-checkpoint.ipynb
ADDED
@@ -0,0 +1,538 @@
|
1 |
+
{
|
2 |
+
"cells": [
|
3 |
+
{
|
4 |
+
"cell_type": "code",
|
5 |
+
"execution_count": 1,
|
6 |
+
"metadata": {},
|
7 |
+
"outputs": [
|
8 |
+
{
|
9 |
+
"data": {
|
10 |
+
"text/html": [
|
11 |
+
"<div>\n",
|
12 |
+
"<style scoped>\n",
|
13 |
+
" .dataframe tbody tr th:only-of-type {\n",
|
14 |
+
" vertical-align: middle;\n",
|
15 |
+
" }\n",
|
16 |
+
"\n",
|
17 |
+
" .dataframe tbody tr th {\n",
|
18 |
+
" vertical-align: top;\n",
|
19 |
+
" }\n",
|
20 |
+
"\n",
|
21 |
+
" .dataframe thead th {\n",
|
22 |
+
" text-align: right;\n",
|
23 |
+
" }\n",
|
24 |
+
"</style>\n",
|
25 |
+
"<table border=\"1\" class=\"dataframe\">\n",
|
26 |
+
" <thead>\n",
|
27 |
+
" <tr style=\"text-align: right;\">\n",
|
28 |
+
" <th></th>\n",
|
29 |
+
" <th>preg</th>\n",
|
30 |
+
" <th>plas</th>\n",
|
31 |
+
" <th>pres</th>\n",
|
32 |
+
" <th>skin</th>\n",
|
33 |
+
" <th>test</th>\n",
|
34 |
+
" <th>mass</th>\n",
|
35 |
+
" <th>pedi</th>\n",
|
36 |
+
" <th>age</th>\n",
|
37 |
+
" <th>class</th>\n",
|
38 |
+
" </tr>\n",
|
39 |
+
" </thead>\n",
|
40 |
+
" <tbody>\n",
|
41 |
+
" <tr>\n",
|
42 |
+
" <th>0</th>\n",
|
43 |
+
" <td>6</td>\n",
|
44 |
+
" <td>148</td>\n",
|
45 |
+
" <td>72</td>\n",
|
46 |
+
" <td>35</td>\n",
|
47 |
+
" <td>0</td>\n",
|
48 |
+
" <td>33.6</td>\n",
|
49 |
+
" <td>0.627</td>\n",
|
50 |
+
" <td>50</td>\n",
|
51 |
+
" <td>1</td>\n",
|
52 |
+
" </tr>\n",
|
53 |
+
" <tr>\n",
|
54 |
+
" <th>1</th>\n",
|
55 |
+
" <td>1</td>\n",
|
56 |
+
" <td>85</td>\n",
|
57 |
+
" <td>66</td>\n",
|
58 |
+
" <td>29</td>\n",
|
59 |
+
" <td>0</td>\n",
|
60 |
+
" <td>26.6</td>\n",
|
61 |
+
" <td>0.351</td>\n",
|
62 |
+
" <td>31</td>\n",
|
63 |
+
" <td>0</td>\n",
|
64 |
+
" </tr>\n",
|
65 |
+
" <tr>\n",
|
66 |
+
" <th>2</th>\n",
|
67 |
+
" <td>8</td>\n",
|
68 |
+
" <td>183</td>\n",
|
69 |
+
" <td>64</td>\n",
|
70 |
+
" <td>0</td>\n",
|
71 |
+
" <td>0</td>\n",
|
72 |
+
" <td>23.3</td>\n",
|
73 |
+
" <td>0.672</td>\n",
|
74 |
+
" <td>32</td>\n",
|
75 |
+
" <td>1</td>\n",
|
76 |
+
" </tr>\n",
|
77 |
+
" <tr>\n",
|
78 |
+
" <th>3</th>\n",
|
79 |
+
" <td>1</td>\n",
|
80 |
+
" <td>89</td>\n",
|
81 |
+
" <td>66</td>\n",
|
82 |
+
" <td>23</td>\n",
|
83 |
+
" <td>94</td>\n",
|
84 |
+
" <td>28.1</td>\n",
|
85 |
+
" <td>0.167</td>\n",
|
86 |
+
" <td>21</td>\n",
|
87 |
+
" <td>0</td>\n",
|
88 |
+
" </tr>\n",
|
89 |
+
" <tr>\n",
|
90 |
+
" <th>4</th>\n",
|
91 |
+
" <td>0</td>\n",
|
92 |
+
" <td>137</td>\n",
|
93 |
+
" <td>40</td>\n",
|
94 |
+
" <td>35</td>\n",
|
95 |
+
" <td>168</td>\n",
|
96 |
+
" <td>43.1</td>\n",
|
97 |
+
" <td>2.288</td>\n",
|
98 |
+
" <td>33</td>\n",
|
99 |
+
" <td>1</td>\n",
|
100 |
+
" </tr>\n",
|
101 |
+
" </tbody>\n",
|
102 |
+
"</table>\n",
|
103 |
+
"</div>"
|
104 |
+
],
|
105 |
+
"text/plain": [
|
106 |
+
" preg plas pres skin test mass pedi age class\n",
|
107 |
+
"0 6 148 72 35 0 33.6 0.627 50 1\n",
|
108 |
+
"1 1 85 66 29 0 26.6 0.351 31 0\n",
|
109 |
+
"2 8 183 64 0 0 23.3 0.672 32 1\n",
|
110 |
+
"3 1 89 66 23 94 28.1 0.167 21 0\n",
|
111 |
+
"4 0 137 40 35 168 43.1 2.288 33 1"
|
112 |
+
]
|
113 |
+
},
|
114 |
+
"execution_count": 1,
|
115 |
+
"metadata": {},
|
116 |
+
"output_type": "execute_result"
|
117 |
+
}
|
118 |
+
],
|
119 |
+
"source": [
|
120 |
+
"#Import the data set\n",
|
121 |
+
"\n",
|
122 |
+
"import pandas as pd\n",
|
123 |
+
"columns = ['preg', 'plas', 'pres', 'skin', 'test', 'mass', 'pedi', 'age', 'class']\n",
|
124 |
+
"data = pd.read_csv('/Users/brendan.tierney/Dropbox/4-Datasets/pima-indians-diabetes.csv', names=columns)\n",
|
125 |
+
"data.head()"
|
126 |
+
]
|
127 |
+
},
|
128 |
+
{
|
129 |
+
"cell_type": "code",
|
130 |
+
"execution_count": 2,
|
131 |
+
"metadata": {},
|
132 |
+
"outputs": [
|
133 |
+
{
|
134 |
+
"data": {
|
135 |
+
"text/plain": [
|
136 |
+
"(768, 9)"
|
137 |
+
]
|
138 |
+
},
|
139 |
+
"execution_count": 2,
|
140 |
+
"metadata": {},
|
141 |
+
"output_type": "execute_result"
|
142 |
+
}
|
143 |
+
],
|
144 |
+
"source": [
|
145 |
+
"data.shape"
|
146 |
+
]
|
147 |
+
},
|
148 |
+
{
|
149 |
+
"cell_type": "code",
|
150 |
+
"execution_count": 4,
|
151 |
+
"metadata": {},
|
152 |
+
"outputs": [
|
153 |
+
{
|
154 |
+
"data": {
|
155 |
+
"text/plain": [
|
156 |
+
"0 500\n",
|
157 |
+
"1 268\n",
|
158 |
+
"Name: class, dtype: int64"
|
159 |
+
]
|
160 |
+
},
|
161 |
+
"execution_count": 4,
|
162 |
+
"metadata": {},
|
163 |
+
"output_type": "execute_result"
|
164 |
+
}
|
165 |
+
],
|
166 |
+
"source": [
|
167 |
+
"data['class'].value_counts()"
|
168 |
+
]
|
169 |
+
},
|
170 |
+
{
|
171 |
+
"cell_type": "code",
|
172 |
+
"execution_count": 9,
|
173 |
+
"metadata": {},
|
174 |
+
"outputs": [
|
175 |
+
{
|
176 |
+
"data": {
|
177 |
+
"image/png": "iVBORw0KGgoAAAANSUhEUgAAAXcAAAEFCAYAAAAYKqc0AAAAOXRFWHRTb2Z0d2FyZQBNYXRwbG90bGliIHZlcnNpb24zLjUuMSwgaHR0cHM6Ly9tYXRwbG90bGliLm9yZy/YYfK9AAAACXBIWXMAAAsTAAALEwEAmpwYAAAQN0lEQVR4nO3df6zddX3H8edrVECFUaB3FVuwbNQ5zAKaihB/xEk2BefKH8pQp5WwNFkg0TB/dGoUjS64ZAPNnFkzDFX8AUORTpgOUaJG+VEUUESlY7C2Aq3QVpT5A3nvj/MpnNZ7e2/be++hnz4fycn5fD+fz/d836e9ffV7P+d7zklVIUnqy++MugBJ0vQz3CWpQ4a7JHXIcJekDhnuktQhw12SOmS4SxNIMpbkB0mePOpaxpPkgFbf2Khr0ROP4a6RSvLaJGuS/CzJvUn+M8kLZ+G4leSYSaatAC6uqv9r+1yX5K9nuraJ7Hj8qvol8DEGdUrbMdw1MknOBS4E/h6YDxwF/AuwdIRlAYOzYmAZcMk0Puac6XqsIZ8ClrV6pccY7hqJJIcA7wPOrqrPVdXPq+rXVfUfVfXWNueAJBcm+XG7XbgtxJK8Mck3dnjMx87Gk1yc5CNJrkryUJIbkvxBG/ta2+XW9hvDX45T4vOBLVW1vu3zAeBFwD+3ff659X8oybokP01yc5IXDdVzXpLLk1yS5KfAG5McneRrraYvtxovGdrnxCTfTLIlya1JXrKz47f6NgMn7v7fhnpkuGtUTgIOBK7YyZx3Mgit44HjgBOAd+3CMc4A3gscCqwFPgBQVS9u48dV1UFVdek4+/4x8MNtG1X1TuDrwDltn3Pa0E2tvsMYnEX/e5IDhx5nKXA5MBf4ZJtzI3A4cB7w+m0TkywArgLe3x7vLcBnk4zt5PgAdzD485EeY7hrVA4HflJVj+xkzuuA91XVxqraxCCoX7+T+Tu6oqpubMf4JIMQnqq5wEOTTaqqS6rqgap6pKr+ETgA+MOhKd+qqs9X1aPAGPA84N1V9auq+gawemjuXwFXV9XVVfVoVV0DrAFOnaSMh1q90mMMd43KA8C8Sdahnw7cM7R9T+ubqvuG2g8DB+3CvpuBgyeblOQtSe5IsjXJFuAQYN7QlHVD7acDD1bVwxOMPwN4dVuS2dIe74XAEZOUcTCwZbJatW8x3DUq3wJ+CZy2kzk/ZhB42xzV+gB+Djxl20CSp01zfbcBz9yhb7uPUG3r628DTgcOraq5wFYgE+xzL3BYkqcM9R051F4HfKKq5g7dnlpV5493/CF/BNw6heekfYjhrpGoqq3Au4GPJDktyVOSPCnJKUn+oU37NPCudr35vDZ/24uPtwLPTnJ8W+M+bxdLuB/4/Z2M3wjMbevgE+1zMPAIsAmYk+TdwO9O9IBVdQ+DZZbzkuyf5CTglUNTLgFemeRlSfZLcmCSlyRZOFHNrb7DgOt38ly0DzLcNTJtjfpcBi+SbmJw5noO8Pk25f0MwvA24LvAt1sfVfUjBlfbfBm4E9juypkpOA9Y1ZY/Th+ntl8BFzNYB9/mQ8CrkmxO8mHgS8AXgR8xWDL6Bdsvs4zndQxeTH6gPZdLGfwGQ1WtY/AC7Dt4/M/jrTz+73TH4wO8FljVrnmXHhO/rEMaX3vn59eB52x7I9MMHONS4AdV9Z7d2PcABr/BvLiqNk57cdqrGe7SLEryPOBB4H+AP2PwW8pJVfWdUdal/szEO+YkTexpwOcYXAq6Hvgbg10zwTN3SeqQL6hKUocMd0nq0BNizX3evHm1aNGiUZchSXuVm2+++SdVNe7n+T8hwn3RokWsWbNm1GVI0l4lyT0TjbksI0kdMtwlqUOGuyR1yHCXpA4Z7pLUoSmFe5K7k3w3yS1J1rS+w5Jck+TOdn9o60+SDydZm+S2JM+dyScgSfptu3Lm/idVdXxVLWnbK4Brq2oxcG3bBjgFWNxuy4GPTlexkqSp2ZNlma
XAqtZexePfqLMU+HgNXM/gCw8m+5owSdI0muqbmAr4ryQF/GtVrQTmV9W9bfw+YH5rL2D7LyxY3/ruHeojyXIGZ/YcddRRu1f9LFu04qpRl9CVu89/xahLkLo11XB/YVVtSPJ7wDVJfjA8WFXVgn/K2n8QKwGWLFniR1NK0jSa0rJMVW1o9xuBK4ATgPu3Lbe0+23fBLOB7b/0d2HrkyTNkknDPclTkxy8rc3g22O+B6wGlrVpy4ArW3s18IZ21cyJwNah5RtJ0iyYyrLMfOCKJNvmf6qqvpjkJuCyJGcx+HLgbV8yfDVwKrAWeBg4c9qrliTt1KThXlV3AceN0/8AcPI4/QWcPS3VSZJ2i+9QlaQOGe6S1CHDXZI6ZLhLUocMd0nqkOEuSR0y3CWpQ4a7JHXIcJekDhnuktQhw12SOmS4S1KHDHdJ6pDhLkkdMtwlqUOGuyR1yHCXpA4Z7pLUIcNdkjpkuEtShwx3SeqQ4S5JHTLcJalDhrskdchwl6QOGe6S1CHDXZI6ZLhLUocMd0nqkOEuSR0y3CWpQ4a7JHVoyuGeZL8k30nyhbZ9dJIbkqxNcmmS/Vv/AW17bRtfNEO1S5ImsCtn7m8C7hja/iBwQVUdA2wGzmr9ZwGbW/8FbZ4kaRZNKdyTLAReAfxb2w7wUuDyNmUVcFprL23btPGT23xJ0iyZ6pn7hcDbgEfb9uHAlqp6pG2vBxa09gJgHUAb39rmS5JmyaThnuTPgY1VdfN0HjjJ8iRrkqzZtGnTdD60JO3zpnLm/gLgL5LcDXyGwXLMh4C5Sea0OQuBDa29ATgSoI0fAjyw44NW1cqqWlJVS8bGxvboSUiStjdpuFfV31XVwqpaBJwBfKWqXgd8FXhVm7YMuLK1V7dt2vhXqqqmtWpJ0k7tyXXubwfOTbKWwZr6Ra3/IuDw1n8usGLPSpQk7ao5k095XFVdB1zX2ncBJ4wz5xfAq6ehNknSbvIdqpLUIcNdkjpkuEtShwx3SeqQ4S5JHTLcJalDhrskdchwl6QOGe6S1CHDXZI6ZLhLUocMd0nqkOEuSR0y3CWpQ4a7JHXIcJekDhnuktQhw12SOmS4S1KHDHdJ6pDhLkkdMtwlqUOGuyR1yHCXpA4Z7pLUIcNdkjpkuEtShwx3SeqQ4S5JHTLcJalDhrskdchwl6QOTRruSQ5McmOSW5PcnuS9rf/oJDckWZvk0iT7t/4D2vbaNr5ohp+DJGkHUzlz/yXw0qo6DjgeeHmSE4EPAhdU1THAZuCsNv8sYHPrv6DNkyTNoknDvQZ+1jaf1G4FvBS4vPWvAk5r7aVtmzZ+cpJMV8GSpMlNac09yX5JbgE2AtcA/w1sqapH2pT1wILWXgCsA2jjW4HDp7FmSdIkphTuVfWbqjoeWAicADxrTw+cZHmSNUnWbNq0aU8fTpI0ZJeulqmqLcBXgZOAuUnmtKGFwIbW3gAcCdDGDwEeGOexVlbVkqpaMjY2tnvVS5LGNZWrZcaSzG3tJwN/CtzBIORf1aYtA65s7dVtmzb+laqqaaxZkjSJOZNP4QhgVZL9GPxncFlVfSHJ94HPJHk/8B3gojb/IuATSdYCDwJnzEDdkqSdmDTcq+o24Dnj9N/FYP19x/5fAK+eluokSbvFd6hKUocMd0nqkOEuSR0y3CWpQ4a7JHXIcJekDhnuktQhw12SOjSVd6hKeoJbtOKqUZfQlbvPf8WoS9hjnrlLUocMd0nqkOEuSR0y3CWpQ4a7JHXIcJekDhnuktQhw12SOmS4S1KHDHdJ6pDhLkkdMtwlqUOGuyR1yHCXpA4Z7pLUIcNdkjpkuEtShwx3SeqQ4S5JHTLcJalDhrskdchwl6QOGe6S1CHDXZI6NGm4JzkyyVeTfD/J7Une1PoPS3JNkjvb/aGtP0k+nGRtktuSPHemn4QkaXtTOXN/BPjbqjoWOBE4O8mxwArg2qpaDFzbtgFOARa323Lgo9NetSRppyYN96q6t6q+3d
oPAXcAC4ClwKo2bRVwWmsvBT5eA9cDc5McMd2FS5Imtktr7kkWAc8BbgDmV9W9beg+YH5rLwDWDe22vvVJkmbJlMM9yUHAZ4E3V9VPh8eqqoDalQMnWZ5kTZI1mzZt2pVdJUmTmFK4J3kSg2D/ZFV9rnXfv225pd1vbP0bgCOHdl/Y+rZTVSuraklVLRkbG9vd+iVJ45jK1TIBLgLuqKp/GhpaDSxr7WXAlUP9b2hXzZwIbB1avpEkzYI5U5jzAuD1wHeT3NL63gGcD1yW5CzgHuD0NnY1cCqwFngYOHM6C5YkTW7ScK+qbwCZYPjkceYXcPYe1iVJ2gO+Q1WSOmS4S1KHDHdJ6pDhLkkdMtwlqUOGuyR1yHCXpA4Z7pLUIcNdkjpkuEtShwx3SeqQ4S5JHTLcJalDhrskdchwl6QOGe6S1CHDXZI6ZLhLUocMd0nqkOEuSR0y3CWpQ4a7JHXIcJekDhnuktQhw12SOmS4S1KHDHdJ6pDhLkkdMtwlqUOGuyR1yHCXpA4Z7pLUoUnDPcnHkmxM8r2hvsOSXJPkznZ/aOtPkg8nWZvktiTPncniJUnjm8qZ+8XAy3foWwFcW1WLgWvbNsApwOJ2Ww58dHrKlCTtiknDvaq+Bjy4Q/dSYFVrrwJOG+r/eA1cD8xNcsQ01SpJmqLdXXOfX1X3tvZ9wPzWXgCsG5q3vvVJkmbRHr+gWlUF1K7ul2R5kjVJ1mzatGlPy5AkDdndcL9/23JLu9/Y+jcARw7NW9j6fktVrayqJVW1ZGxsbDfLkCSNZ3fDfTWwrLWXAVcO9b+hXTVzIrB1aPlGkjRL5kw2IcmngZcA85KsB94DnA9cluQs4B7g9Db9auBUYC3wMHDmDNQsSZrEpOFeVa+ZYOjkceYWcPaeFiVJ2jO+Q1WSOmS4S1KHDHdJ6pDhLkkdMtwlqUOGuyR1yHCXpA4Z7pLUIcNdkjpkuEtShwx3SeqQ4S5JHTLcJalDhrskdchwl6QOGe6S1CHDXZI6ZLhLUocMd0nqkOEuSR0y3CWpQ4a7JHXIcJekDhnuktQhw12SOmS4S1KHDHdJ6pDhLkkdMtwlqUOGuyR1yHCXpA4Z7pLUIcNdkjo0I+Ge5OVJfphkbZIVM3EMSdLEpj3ck+wHfAQ4BTgWeE2SY6f7OJKkic3EmfsJwNqququqfgV8Blg6A8eRJE1gzgw85gJg3dD2euD5O05KshxY3jZ/luSHM1DLvmoe8JNRFzGZfHDUFWgE/NmcXs+YaGAmwn1KqmolsHJUx+9ZkjVVtWTUdUg78mdz9szEsswG4Mih7YWtT5I0S2Yi3G8CFic5Osn+wBnA6hk4jiRpAtO+LFNVjyQ5B/gSsB/wsaq6fbqPo51yuUtPVP5szpJU1ahrkCRNM9+hKkkdMtwlqUOGuyR1aGTXuWt6JHkWg3cAL2hdG4DVVXXH6KqSNGqeue/Fkrydwcc7BLix3QJ82g9s0xNZkjNHXUPvvFpmL5bkR8Czq+rXO/TvD9xeVYtHU5m0c0n+t6qOGnUdPXNZZu/2KPB04J4d+o9oY9LIJLltoiFg/mzWsi8y3PdubwauTXInj39Y21HAMcA5oypKauYDLwM279Af4JuzX86+xXDfi1XVF5M8k8HHLA+/oHpTVf1mdJVJAHwBOKiqbtlxIMl1s17NPsY1d0nqkFfLSFKHDHdJ6pDhLkkdMtwlqUOGuyR16P8BtIgOjfRADRgAAAAASUVORK5CYII=\n",
|
178 |
+
"text/plain": [
|
179 |
+
"<Figure size 432x288 with 1 Axes>"
|
180 |
+
]
|
181 |
+
},
|
182 |
+
"metadata": {
|
183 |
+
"needs_background": "light"
|
184 |
+
},
|
185 |
+
"output_type": "display_data"
|
186 |
+
}
|
187 |
+
],
|
188 |
+
"source": [
|
189 |
+
"#print bar chart\n",
|
190 |
+
"data['class'].value_counts().plot(kind='bar', title='Count (target)');"
|
191 |
+
]
|
192 |
+
},
|
193 |
+
{
|
194 |
+
"cell_type": "markdown",
|
195 |
+
"metadata": {},
|
196 |
+
"source": [
|
197 |
+
"#### Down Sampling - Majority Class - Using Random Sampling"
|
198 |
+
]
|
199 |
+
},
|
200 |
+
{
|
201 |
+
"cell_type": "code",
|
202 |
+
"execution_count": 14,
|
203 |
+
"metadata": {},
|
204 |
+
"outputs": [
|
205 |
+
{
|
206 |
+
"name": "stdout",
|
207 |
+
"output_type": "stream",
|
208 |
+
"text": [
|
209 |
+
"Class = 0 500\n",
|
210 |
+
"Class = 1 268\n"
|
211 |
+
]
|
212 |
+
}
|
213 |
+
],
|
214 |
+
"source": [
|
215 |
+
"count_class_0, count_class_1 = data['class'].value_counts()\n",
|
216 |
+
"\n",
|
217 |
+
"# Divide by class\n",
|
218 |
+
"df_class_0 = data[data['class'] == 0] #majority class\n",
|
219 |
+
"df_class_1 = data[data['class'] == 1] #minority class\n",
|
220 |
+
"\n",
|
221 |
+
"print('Class = 0 ', df_class_0.shape[0])\n",
|
222 |
+
"print('Class = 1 ', df_class_1.shape[0])"
|
223 |
+
]
|
224 |
+
},
|
225 |
+
{
|
226 |
+
"cell_type": "code",
|
227 |
+
"execution_count": 16,
|
228 |
+
"metadata": {},
|
229 |
+
"outputs": [
|
230 |
+
{
|
231 |
+
"data": {
|
232 |
+
"text/plain": [
|
233 |
+
"(268, 9)"
|
234 |
+
]
|
235 |
+
},
|
236 |
+
"execution_count": 16,
|
237 |
+
"metadata": {},
|
238 |
+
"output_type": "execute_result"
|
239 |
+
}
|
240 |
+
],
|
241 |
+
"source": [
|
242 |
+
"# Sample the majority class (y=0) to have the same number of records as the minority class (y=1)\n",
|
243 |
+
"df_class_0_under = df_class_0.sample(count_class_1)\n",
|
244 |
+
"\n",
|
245 |
+
"df_class_0_under.shape"
|
246 |
+
]
|
247 |
+
},
|
248 |
+
{
|
249 |
+
"cell_type": "code",
|
250 |
+
"execution_count": 19,
|
251 |
+
"metadata": {},
|
252 |
+
"outputs": [
|
253 |
+
{
|
254 |
+
"name": "stdout",
|
255 |
+
"output_type": "stream",
|
256 |
+
"text": [
|
257 |
+
"Random under-sampling:\n",
|
258 |
+
"0 268\n",
|
259 |
+
"1 268\n",
|
260 |
+
"Name: class, dtype: int64\n",
|
261 |
+
"Num records = 536\n"
|
262 |
+
]
|
263 |
+
},
|
264 |
+
{
|
265 |
+
"data": {
|
266 |
+
"image/png": "iVBORw0KGgoAAAANSUhEUgAAAXcAAAEFCAYAAAAYKqc0AAAAOXRFWHRTb2Z0d2FyZQBNYXRwbG90bGliIHZlcnNpb24zLjUuMSwgaHR0cHM6Ly9tYXRwbG90bGliLm9yZy/YYfK9AAAACXBIWXMAAAsTAAALEwEAmpwYAAAPyklEQVR4nO3df6zddX3H8edrVHEKs2DvaimtZVq3QRarqYjxR1hMRFhMMdkY6LAal5qFJhp/bPgj2hlZ2DJ/RiWpkVAFEaag3WQ6bDRI/AGFQQUq0ihdWwu98lvZ0MJ7f5xv4fRyb+/ve+inz0dyc8/5fL/f831fuDx7+r3nXFJVSJLa8nuDHkCSNPOMuyQ1yLhLUoOMuyQ1yLhLUoOMuyQ1yLhLY0gylOSnSX5/0LOMJsnh3XxDg55FTz3GXQOV5I1JNif5dZLdSf4zySvn4LyV5AXj7HYucFFV/W93zPeS/O1szzaWkeevqkeAC+nNKe3HuGtgkrwL+CTwT8BCYCnwOWDVAMcCes+KgdXAxTP4mPNm6rH6fBlY3c0rPc64ayCSPBv4CHBOVV1RVb+pqt9V1b9X1Xu7fQ5P8skkv+w+PrkvYknekuTaEY/5+LPxJBcl+WySbyZ5KMmPkzy/23ZNd8jN3d8Y/nqUEV8G3F9VO7tjzgNeBXymO+Yz3fqnkuxI8mCSG5K8qm+edUm+muTiJA8Cb0lyXJJrupm+0814cd8xJyX5QZL7k9yc5OQDnb+b7z7gpKn/21CLjLsG5eXAM4ArD7DPB+hFawXwIuBE4IOTOMeZwD8CRwHbgPMAqurV3fYXVdURVXXZKMf+GXD7vjtV9QHg+8Da7pi13abru/mOpvcs+t+SPKPvcVYBXwXmA5d0+1wHPAdYB5y9b8cki4FvAh/tHu89wNeSDB3g/ABb6f3zkR5n3DUozwF+VVV7D7DPm4CPVNWeqhqmF+qzD7D/SFdW1XXdOS6hF+GJmg88NN5OVXVxVd1TVXur6mPA4cAf9+3yw6r6elU9BgwBLwU+VFW/raprgY19+/4NcFVVXVVVj1XV1cBm4LRxxniom1d6nHHXoNwDLBjnOvQxwPa++9u7tYm6q+/2w8ARkzj2PuDI8XZK8p4kW5M8kOR+4NnAgr5ddvTdPga4t6oeHmP784C/6i7J3N893iuBReOMcSRw/3iz6tBi3DUoPwQeAU4/wD6/pBe8fZZ2awC/AZ65b0OS587wfFuAF45Y2+9XqHbX1/8eOAM4qqrmAw8AGeOY3cDRSZ7Zt7ak7/YO4EtVNb/v41lVdf5o5+/zp8DNE/iadAgx7hqIqnoA+BDw2SSnJ3lmkqclOTXJv3S7XQp8sHu9+YJu/30/fLwZOCHJiu4a97pJjnA38EcH2H4dML+7Dj7WMUcCe4FhYF6SDwF/MNYDVtV2epdZ1iV5epKXA6/v2+Vi4PVJTklyWJJnJDk5ybFjzdzNdzTwowN8LToEGXcNTHeN+l30fkg6TO+Z61rg690uH6UXwy3AT4AbuzWq6mf0Xm3zHeAOYL9XzkzAOmBDd/njjFFm+y1wEb3r4Pt8CvjLJPcl+TTwbeBbwM/oXTL6P/a/zDKaN9H7YfI93ddyGb2/wVBVO+j9APb9PPHP47088d/pyPMDvBHY0L3mXXpc/J91SKPr3vn5feDF+97INAvnuAz4aVV9eArHHk7vbzCvrqo9Mz6cDmrGXZpDSV4K3Av8Angtvb+lvLyq/nuQc6k9s/GOOUljey5wBb2Xgu4E/s6wazb4zF2SGuQPVCWpQcZdkhr0lLjmvmDBglq2bNmgx5Ckg8oNN9zwq6oa9ff5PyXivmzZMjZv3jzoMSTpoJJk+1jbvCwjSQ0y7pLUIOMuSQ0y7pLUIOMuSQ0y7pLUIOMuSQ0y7pLUoKfEm5gOFsvO/eagR2jKnef/xaBHaIbfmzOrhe9Nn7lLUoOMuyQ1yLhLUoOMuyQ1yLhLUoOMuyQ1yLhLUoOMuy
Q1yLhLUoPGjXuSJUm+m+S2JLcmeUe3vi7JriQ3dR+n9R3zviTbktye5JTZ/AIkSU82kV8/sBd4d1XdmORI4IYkV3fbPlFV/9q/c5LjgTOBE4BjgO8keWFVPTqTg0uSxjbuM/eq2l1VN3a3HwK2AosPcMgq4CtV9UhV/QLYBpw4E8NKkiZmUtfckywDXgz8uFtam2RLkguTHNWtLQZ29B22kwP/YSBJmmETjnuSI4CvAe+sqgeBC4DnAyuA3cDHJnPiJGuSbE6yeXh4eDKHSpLGMaG4J3kavbBfUlVXAFTV3VX1aFU9BnyeJy697AKW9B1+bLe2n6paX1Urq2rl0NDQdL4GSdIIE3m1TIAvAFur6uN964v6dnsDcEt3eyNwZpLDkxwHLAeum7mRJUnjmcirZV4BnA38JMlN3dr7gbOSrAAKuBN4O0BV3ZrkcuA2eq+0OcdXykjS3Bo37lV1LZBRNl11gGPOA86bxlySpGnwHaqS1CDjLkkNMu6S1CDjLkkNMu6S1CDjLkkNMu6S1CDjLkkNMu6S1CDjLkkNMu6S1CDjLkkNMu6S1CDjLkkNMu6S1CDjLkkNMu6S1CDjLkkNMu6S1CDjLkkNMu6S1CDjLkkNMu6S1CDjLkkNMu6S1CDjLkkNMu6S1CDjLkkNMu6S1CDjLkkNGjfuSZYk+W6S25LcmuQd3frRSa5Ockf3+ahuPUk+nWRbki1JXjLbX4QkaX8Teea+F3h3VR0PnASck+R44FxgU1UtBzZ19wFOBZZ3H2uAC2Z8aknSAY0b96raXVU3drcfArYCi4FVwIZutw3A6d3tVcAXq+dHwPwki2Z6cEnS2CZ1zT3JMuDFwI+BhVW1u9t0F7Cwu70Y2NF32M5uTZI0RyYc9yRHAF8D3llVD/Zvq6oCajInTrImyeYkm4eHhydzqCRpHBOKe5Kn0Qv7JVV1Rbd8977LLd3nPd36LmBJ3+HHdmv7qar1VbWyqlYODQ1NdX5J0igm8mqZAF8AtlbVx/s2bQRWd7dXA9/oW39z96qZk4AH+i7fSJLmwLwJ7PMK4GzgJ0lu6tbeD5wPXJ7kbcB24Ixu21XAacA24GHgrTM5sCRpfOPGvaquBTLG5teMsn8B50xzLknSNPgOVUlqkHGXpAYZd0lqkHGXpAYZd0lqkHGXpAYZd0lqkHGXpAYZd0lqkHGXpAYZd0lqkHGXpAYZd0lqkHGXpAYZd0lqkHGXpAYZd0lqkHGXpAYZd0lqkHGXpAYZd0lqkHGXpAYZd0lqkHGXpAYZd0lqkHGXpAYZd0lqkHGXpAYZd0lqkHGXpAaNG/ckFybZk+SWvrV1SXYluan7OK1v2/uSbEtye5JTZmtwSdLYJvLM/SLgdaOsf6KqVnQfVwEkOR44EzihO+ZzSQ6bqWElSRMzbtyr6hrg3gk+3irgK1X1SFX9AtgGnDiN+SRJUzCda+5rk2zpLtsc1a0tBnb07bOzW5MkzaGpxv0C4PnACmA38LHJPkCSNUk2J9k8PDw8xTEkSaOZUtyr6u6qerSqHgM+zxOXXnYBS/p2PbZbG+0x1lfVyqpaOTQ0NJUxJEljmFLckyzqu/sGYN8raTYCZyY5PMlxwHLguumNKEmarHnj7ZDkUuBkYEGSncCHgZOTrAAKuBN4O0BV3ZrkcuA2YC9wTlU9OiuTS5LGNG7cq+qsUZa/cID9zwPOm85QkqTp8R2qktQg4y5JDTLuktQg4y5JDTLuktQg4y5JDTLuktQg4y5JDTLuktQg4y5JDTLuktQg4y5JDTLuktQg4y5JDTLuktQg4y5JDTLuktQg4y5JDTLuktQg4y5JDTLuktQg4y5JDTLuktQg4y5JDTLuktQg4y5JDTLuktQg4y5JDTLuktSgceOe5MIke5Lc0rd2dJKrk9zRfT6qW0+STyfZlmRLkpfM5vCSpNFN5Jn7RcDrRqydC2yqquXApu4+wKnA8u5jDXDBzIwpSZqMceNeVdcA945YXgVs6G5vAE
7vW/9i9fwImJ9k0QzNKkmaoKlec19YVbu723cBC7vbi4Edffvt7NYkSXNo2j9QraoCarLHJVmTZHOSzcPDw9MdQ5LUZ6pxv3vf5Zbu855ufRewpG+/Y7u1J6mq9VW1sqpWDg0NTXEMSdJophr3jcDq7vZq4Bt962/uXjVzEvBA3+UbSdIcmTfeDkkuBU4GFiTZCXwYOB+4PMnbgO3AGd3uVwGnAduAh4G3zsLMkqRxjBv3qjprjE2vGWXfAs6Z7lCSpOnxHaqS1CDjLkkNMu6S1CDjLkkNMu6S1CDjLkkNMu6S1CDjLkkNMu6S1CDjLkkNMu6S1CDjLkkNMu6S1CDjLkkNMu6S1CDjLkkNMu6S1CDjLkkNMu6S1CDjLkkNMu6S1CDjLkkNMu6S1CDjLkkNMu6S1CDjLkkNMu6S1CDjLkkNMu6S1CDjLkkNmjedg5PcCTwEPArsraqVSY4GLgOWAXcCZ1TVfdMbU5I0GTPxzP3Pq2pFVa3s7p8LbKqq5cCm7r4kaQ7NxmWZVcCG7vYG4PRZOIck6QCmG/cC/ivJDUnWdGsLq2p3d/suYOE0zyFJmqRpXXMHXllVu5L8IXB1kp/2b6yqSlKjHdj9YbAGYOnSpdMcQ5LUb1rP3KtqV/d5D3AlcCJwd5JFAN3nPWMcu76qVlbVyqGhoemMIUkaYcpxT/KsJEfuuw28FrgF2Ais7nZbDXxjukNKkiZnOpdlFgJXJtn3OF+uqm8luR64PMnbgO3AGdMfU5I0GVOOe1X9HHjRKOv3AK+ZzlCSpOnxHaqS1CDjLkkNMu6S1CDjLkkNMu6S1CDjLkkNMu6S1CDjLkkNMu6S1CDjLkkNMu6S1CDjLkkNMu6S1CDjLkkNMu6S1CDjLkkNMu6S1CDjLkkNMu6S1CDjLkkNMu6S1CDjLkkNMu6S1CDjLkkNMu6S1CDjLkkNMu6S1CDjLkkNMu6S1CDjLkkNmrW4J3ldktuTbEty7mydR5L0ZLMS9ySHAZ8FTgWOB85KcvxsnEuS9GSz9cz9RGBbVf28qn4LfAVYNUvnkiSNMG+WHncxsKPv/k7gZf07JFkDrOnu/jrJ7bM0y6FoAfCrQQ8xnvzzoCfQAPi9ObOeN9aG2Yr7uKpqPbB+UOdvWZLNVbVy0HNII/m9OXdm67LMLmBJ3/1juzVJ0hyYrbhfDyxPclySpwNnAhtn6VySpBFm5bJMVe1Nshb4NnAYcGFV3Tob59KovNylpyq/N+dIqmrQM0iSZpjvUJWkBhl3SWqQcZekBg3sde6aOUn+hN47gBd3S7uAjVW1dXBTSRokn7kf5JL8A71f7xDguu4jwKX+wjY9VSV566BnaJ2vljnIJfkZcEJV/W7E+tOBW6tq+WAmk8aW5H+qaumg52iZl2UOfo8BxwDbR6wv6rZJA5Fky1ibgIVzOcuhyLgf/N4JbEpyB0/8sralwAuAtYMaSqIX8FOA+0asB/jB3I9zaDHuB7mq+laSF9L7Ncv9P1C9vqoeHdxkEv8BHFFVN43ckOR7cz7NIcZr7pLUIF8tI0kNMu6S1CDjLkkNMu6S1CDjLkkN+n9FlP1ETWJfHAAAAABJRU5ErkJggg==\n",
|
267 |
+
"text/plain": [
|
268 |
+
"<Figure size 432x288 with 1 Axes>"
|
269 |
+
]
|
270 |
+
},
|
271 |
+
"metadata": {
|
272 |
+
"needs_background": "light"
|
273 |
+
},
|
274 |
+
"output_type": "display_data"
|
275 |
+
}
|
276 |
+
],
|
277 |
+
"source": [
|
278 |
+
"# join the dataframes containing y=1 and y=0\n",
|
279 |
+
"df_test_under = pd.concat([df_class_0_under, df_class_1])\n",
|
280 |
+
"\n",
|
281 |
+
"print('Random under-sampling:')\n",
|
282 |
+
"print(df_test_under['class'].value_counts())\n",
|
283 |
+
"print(\"Num records = \", df_test_under.shape[0])\n",
|
284 |
+
"\n",
|
285 |
+
"df_test_under['class'].value_counts().plot(kind='bar', title='Count (target)');"
|
286 |
+
]
|
287 |
+
},
|
288 |
+
{
|
289 |
+
"cell_type": "markdown",
|
290 |
+
"metadata": {},
|
291 |
+
"source": [
|
292 |
+
"#### Down Sampling - Majority Class - Using imblearn "
|
293 |
+
]
|
294 |
+
},
|
295 |
+
{
|
296 |
+
"cell_type": "code",
|
297 |
+
"execution_count": 23,
|
298 |
+
"metadata": {},
|
299 |
+
"outputs": [
|
300 |
+
{
|
301 |
+
"name": "stdout",
|
302 |
+
"output_type": "stream",
|
303 |
+
"text": [
|
304 |
+
"imblearn over-sampling:\n",
|
305 |
+
"0 268\n",
|
306 |
+
"1 268\n",
|
307 |
+
"Name: class, dtype: int64\n",
|
308 |
+
"Num records = 536\n"
|
309 |
+
]
|
310 |
+
},
|
311 |
+
{
|
312 |
+
"data": {
|
313 |
+
"image/png": "<base64-encoded PNG omitted: bar chart titled 'Count (target)'>",
|
314 |
+
"text/plain": [
|
315 |
+
"<Figure size 432x288 with 1 Axes>"
|
316 |
+
]
|
317 |
+
},
|
318 |
+
"metadata": {
|
319 |
+
"needs_background": "light"
|
320 |
+
},
|
321 |
+
"output_type": "display_data"
|
322 |
+
}
|
323 |
+
],
|
324 |
+
"source": [
|
325 |
+
"from imblearn.under_sampling import RandomUnderSampler\n",
|
326 |
+
"\n",
|
327 |
+
"# Separate the data into descriptive (X) and target (Y) attributes\n",
|
328 |
+
"X = data.drop('class', axis=1)\n",
|
329 |
+
"Y = data['class']\n",
|
330 |
+
"\n",
|
331 |
+
"rus = RandomUnderSampler(random_state=42, replacement=True)\n",
|
332 |
+
"X_rus, Y_rus = rus.fit_resample(X, Y)\n",
|
333 |
+
"\n",
|
334 |
+
"df_rus = pd.concat([pd.DataFrame(X_rus), pd.DataFrame(Y_rus, columns=['class'])], axis=1)\n",
|
335 |
+
"\n",
|
336 |
+
"print('imblearn under-sampling:')\n",
|
337 |
+
"print(df_rus['class'].value_counts())\n",
|
338 |
+
"print(\"Num records = \", df_rus.shape[0])\n",
|
339 |
+
"\n",
|
340 |
+
"df_rus['class'].value_counts().plot(kind='bar', title='Count (target)');"
|
341 |
+
]
|
342 |
+
},
|
343 |
+
{
|
344 |
+
"cell_type": "code",
|
345 |
+
"execution_count": 24,
|
346 |
+
"metadata": {},
|
347 |
+
"outputs": [],
|
348 |
+
"source": [
|
349 |
+
"# We should see the same (or very similar) class counts as before, although the specific records selected may differ."
|
350 |
+
]
|
351 |
+
},
|
352 |
+
{
|
353 |
+
"cell_type": "markdown",
|
354 |
+
"metadata": {},
|
355 |
+
"source": [
|
356 |
+
"#### Down/Under sampling the majority class y=0 using scikit-learn"
|
357 |
+
]
|
358 |
+
},
|
359 |
+
{
|
360 |
+
"cell_type": "code",
|
361 |
+
"execution_count": 25,
|
362 |
+
"metadata": {},
|
363 |
+
"outputs": [
|
364 |
+
{
|
365 |
+
"name": "stdout",
|
366 |
+
"output_type": "stream",
|
367 |
+
"text": [
|
368 |
+
"Original Data distribution\n",
|
369 |
+
"0 500\n",
|
370 |
+
"1 268\n",
|
371 |
+
"Name: class, dtype: int64\n",
|
372 |
+
"Sci-Kit Learn : resample : Down Sampled data set\n",
|
373 |
+
"0 268\n",
|
374 |
+
"1 268\n",
|
375 |
+
"Name: class, dtype: int64\n",
|
376 |
+
"Num records = 536\n"
|
377 |
+
]
|
378 |
+
},
|
379 |
+
{
|
380 |
+
"data": {
|
381 |
+
"image/png": "<base64-encoded PNG omitted: bar chart titled 'Count (target)'>",
|
382 |
+
"text/plain": [
|
383 |
+
"<Figure size 432x288 with 1 Axes>"
|
384 |
+
]
|
385 |
+
},
|
386 |
+
"metadata": {
|
387 |
+
"needs_background": "light"
|
388 |
+
},
|
389 |
+
"output_type": "display_data"
|
390 |
+
}
|
391 |
+
],
|
392 |
+
"source": [
|
393 |
+
"from sklearn.utils import resample\n",
|
394 |
+
"\n",
|
395 |
+
"print(\"Original Data distribution\")\n",
|
396 |
+
"print(data['class'].value_counts())\n",
|
397 |
+
"\n",
|
398 |
+
"# Down Sample Majority class\n",
|
399 |
+
"down_sample = resample(data[data['class']==0],\n",
|
400 |
+
" replace = True, # sample with replacement\n",
|
401 |
+
" n_samples = data[data['class']==1].shape[0], # to match minority class\n",
|
402 |
+
" random_state=42) # reproducible results\n",
|
403 |
+
"\n",
|
404 |
+
"# Combine the minority class with the down-sampled majority class\n",
|
405 |
+
"train_downsample = pd.concat([data[data['class']==1], down_sample])\n",
|
406 |
+
"\n",
|
407 |
+
"# Display new class counts\n",
|
408 |
+
"print('Sci-Kit Learn : resample : Down Sampled data set')\n",
|
409 |
+
"print(train_downsample['class'].value_counts())\n",
|
410 |
+
"print(\"Num records = \", train_downsample.shape[0])\n",
|
411 |
+
"train_downsample['class'].value_counts().plot(kind='bar', title='Count (target)');"
|
412 |
+
]
|
413 |
+
},
|
414 |
+
{
|
415 |
+
"cell_type": "markdown",
|
416 |
+
"metadata": {},
|
417 |
+
"source": [
|
418 |
+
"#### Over sampling the minority class y=1 (using random sampling)"
|
419 |
+
]
|
420 |
+
},
|
421 |
+
{
|
422 |
+
"cell_type": "code",
|
423 |
+
"execution_count": 28,
|
424 |
+
"metadata": {},
|
425 |
+
"outputs": [
|
426 |
+
{
|
427 |
+
"name": "stdout",
|
428 |
+
"output_type": "stream",
|
429 |
+
"text": [
|
430 |
+
"Random over-sampling:\n",
|
431 |
+
"0 500\n",
|
432 |
+
"1 500\n",
|
433 |
+
"Name: class, dtype: int64\n"
|
434 |
+
]
|
435 |
+
},
|
436 |
+
{
|
437 |
+
"data": {
|
438 |
+
"image/png": "<base64-encoded PNG omitted: bar chart titled 'Count (target)'>",
|
439 |
+
"text/plain": [
|
440 |
+
"<Figure size 432x288 with 1 Axes>"
|
441 |
+
]
|
442 |
+
},
|
443 |
+
"metadata": {
|
444 |
+
"needs_background": "light"
|
445 |
+
},
|
446 |
+
"output_type": "display_data"
|
447 |
+
}
|
448 |
+
],
|
449 |
+
"source": [
|
450 |
+
"df_class_1_over = df_class_1.sample(count_class_0, replace=True)\n",
|
451 |
+
"\n",
|
452 |
+
"df_test_over = pd.concat([df_class_0, df_class_1_over], axis=0)\n",
|
453 |
+
"\n",
|
454 |
+
"print('Random over-sampling:')\n",
|
455 |
+
"print(df_test_over['class'].value_counts())\n",
|
456 |
+
"\n",
|
457 |
+
"df_test_over['class'].value_counts().plot(kind='bar', title='Count (target)');"
|
458 |
+
]
|
459 |
+
},
|
460 |
+
{
|
461 |
+
"cell_type": "markdown",
|
462 |
+
"metadata": {},
|
463 |
+
"source": [
|
464 |
+
"#### Over sampling the minority class y=1 using SMOTE"
|
465 |
+
]
|
466 |
+
},
|
467 |
+
{
|
468 |
+
"cell_type": "code",
|
469 |
+
"execution_count": 31,
|
470 |
+
"metadata": {},
|
471 |
+
"outputs": [
|
472 |
+
{
|
473 |
+
"name": "stdout",
|
474 |
+
"output_type": "stream",
|
475 |
+
"text": [
|
476 |
+
"0 500\n",
|
477 |
+
"1 268\n",
|
478 |
+
"Name: class, dtype: int64\n",
|
479 |
+
"SMOTE over-sampling:\n",
|
480 |
+
"0 500\n",
|
481 |
+
"1 500\n",
|
482 |
+
"Name: class, dtype: int64\n"
|
483 |
+
]
|
484 |
+
},
|
485 |
+
{
|
486 |
+
"data": {
|
487 |
+
"image/png": "<base64-encoded PNG omitted: bar chart titled 'Count (target)'>",
|
488 |
+
"text/plain": [
|
489 |
+
"<Figure size 432x288 with 1 Axes>"
|
490 |
+
]
|
491 |
+
},
|
492 |
+
"metadata": {
|
493 |
+
"needs_background": "light"
|
494 |
+
},
|
495 |
+
"output_type": "display_data"
|
496 |
+
}
|
497 |
+
],
|
498 |
+
"source": [
|
499 |
+
"from imblearn.over_sampling import SMOTE\n",
|
500 |
+
"\n",
|
501 |
+
"print(data['class'].value_counts())\n",
|
502 |
+
"X = data.drop('class', axis=1)\n",
|
503 |
+
"Y = data['class']\n",
|
504 |
+
"\n",
|
505 |
+
"sm = SMOTE(random_state=42)\n",
|
506 |
+
"X_res, Y_res = sm.fit_resample(X, Y)\n",
|
507 |
+
"\n",
|
508 |
+
"df_smote_over = pd.concat([pd.DataFrame(X_res), pd.DataFrame(Y_res, columns=['class'])], axis=1)\n",
|
509 |
+
"\n",
|
510 |
+
"print('SMOTE over-sampling:')\n",
|
511 |
+
"print(df_smote_over['class'].value_counts())\n",
|
512 |
+
"\n",
|
513 |
+
"df_smote_over['class'].value_counts().plot(kind='bar', title='Count (target)');"
|
514 |
+
]
|
515 |
+
}
|
516 |
+
],
|
517 |
+
"metadata": {
|
518 |
+
"kernelspec": {
|
519 |
+
"display_name": "Python 3",
|
520 |
+
"language": "python",
|
521 |
+
"name": "python3"
|
522 |
+
},
|
523 |
+
"language_info": {
|
524 |
+
"codemirror_mode": {
|
525 |
+
"name": "ipython",
|
526 |
+
"version": 3
|
527 |
+
},
|
528 |
+
"file_extension": ".py",
|
529 |
+
"mimetype": "text/x-python",
|
530 |
+
"name": "python",
|
531 |
+
"nbconvert_exporter": "python",
|
532 |
+
"pygments_lexer": "ipython3",
|
533 |
+
"version": "3.7.3"
|
534 |
+
}
|
535 |
+
},
|
536 |
+
"nbformat": 4,
|
537 |
+
"nbformat_minor": 2
|
538 |
+
}
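The notebook above balances a 500 / 268 class split by random under-sampling and random over-sampling. As a rough illustration of what those two steps do, here is a minimal standard-library sketch. The helper names `random_undersample` and `random_oversample` and the toy `rows` data are hypothetical and do not appear in the notebook. Note one deliberate difference: this sketch under-samples without replacement, whereas the notebook cells pass `replacement=True` / `replace=True`.

```python
import random
from collections import Counter


def _group_by_class(rows, label_of):
    """Bucket rows by their class label (hypothetical helper)."""
    by_class = {}
    for row in rows:
        by_class.setdefault(label_of(row), []).append(row)
    return by_class


def random_undersample(rows, label_of, seed=42):
    """Shrink every class to the size of the smallest class (no replacement)."""
    rng = random.Random(seed)
    by_class = _group_by_class(rows, label_of)
    target = min(len(group) for group in by_class.values())
    balanced = []
    for group in by_class.values():
        balanced.extend(rng.sample(group, target))
    return balanced


def random_oversample(rows, label_of, seed=42):
    """Grow every class to the size of the largest class by re-drawing rows."""
    rng = random.Random(seed)
    by_class = _group_by_class(rows, label_of)
    target = max(len(group) for group in by_class.values())
    balanced = []
    for group in by_class.values():
        # Keep every original row, then top up with random duplicates.
        extra = [rng.choice(group) for _ in range(target - len(group))]
        balanced.extend(group + extra)
    return balanced


# Toy data: (record id, class) pairs with the same 500 / 268 split as above.
rows = [(i, 0) for i in range(500)] + [(i, 1) for i in range(500, 768)]
label = lambda row: row[1]

under = random_undersample(rows, label)
over = random_oversample(rows, label)

print(Counter(label(r) for r in under))  # both classes at 268
print(Counter(label(r) for r in over))   # both classes at 500
```

Under-sampling discards majority-class records, while over-sampling keeps every record and duplicates minority-class ones. SMOTE, used in the last notebook cell, differs in that it synthesises new minority-class points rather than copying existing rows.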
|
Data Analitics/Week 4/L3-Data-Exploration-Preparation.pptx
ADDED
@@ -0,0 +1,3 @@
|
1 |
+
version https://git-lfs.github.com/spec/v1
|
2 |
+
oid sha256:8ec208d6c7d7f4026f5f36286b088708e382557babc48fd23a4af2e952f56ffd
|
3 |
+
size 6927381
|
Data Analitics/Week 4/Lab3-Data-Preparation.pptx
ADDED
Binary file (56 kB). View file
|
|
Data Analitics/Week 4/TU257-Lab3-1-DataExploration.ipynb
ADDED
The diff for this file is too large to render.
See raw diff
|
|
Data Analitics/Week 4/TU257-Lab3-2-Data-Transformations.ipynb
ADDED
The diff for this file is too large to render.
See raw diff
|
|
Data Analitics/Week 4/TU257-Lab3-3-Scaling-Data.ipynb
ADDED
@@ -0,0 +1,1996 @@
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
1 |
+
{
|
2 |
+
"cells": [
|
3 |
+
{
|
4 |
+
"cell_type": "markdown",
|
5 |
+
"metadata": {},
|
6 |
+
"source": [
|
7 |
+
"### In this demo and lab exercise we will look at scaling numerical data\n",
|
8 |
+
"### The examples given below illustrate using a min-max scaler\n",
|
9 |
+
"\n",
|
10 |
+
"\n",
|
11 |
+
"#### Let's begin by importing the data set\n",
|
12 |
+
"#### Download this from the webpage"
|
13 |
+
]
|
14 |
+
},
|
15 |
+
{
|
16 |
+
"cell_type": "code",
|
17 |
+
"execution_count": 1,
|
18 |
+
"metadata": {},
|
19 |
+
"outputs": [],
|
20 |
+
"source": [
|
21 |
+
"import pandas as pd\n",
|
22 |
+
"import numpy as np \n",
|
23 |
+
"import seaborn as sns\n",
|
24 |
+
"import matplotlib.pyplot as plt\n",
|
25 |
+
"\n",
|
26 |
+
"from sklearn.preprocessing import MinMaxScaler\n"
|
27 |
+
]
|
28 |
+
},
|
29 |
+
{
|
30 |
+
"cell_type": "code",
|
31 |
+
"execution_count": 2,
|
32 |
+
"metadata": {},
|
33 |
+
"outputs": [],
|
34 |
+
"source": [
|
35 |
+
"#read in the data set\n",
|
36 |
+
"#NB. you will need to edit this command to change it to the directory you are using\n",
|
37 |
+
"data = pd.read_csv('/Users/brendan.tierney/Dropbox/4-Datasets/small_purchases.csv')"
|
38 |
+
]
|
39 |
+
},
|
40 |
+
{
|
41 |
+
"cell_type": "code",
|
42 |
+
"execution_count": 3,
|
43 |
+
"metadata": {},
|
44 |
+
"outputs": [
|
45 |
+
{
|
46 |
+
"name": "stdout",
|
47 |
+
"output_type": "stream",
|
48 |
+
"text": [
|
49 |
+
"<class 'pandas.core.frame.DataFrame'>\n",
|
50 |
+
"RangeIndex: 10 entries, 0 to 9\n",
|
51 |
+
"Data columns (total 4 columns):\n",
|
52 |
+
" # Column Non-Null Count Dtype \n",
|
53 |
+
"--- ------ -------------- ----- \n",
|
54 |
+
" 0 Country 10 non-null object \n",
|
55 |
+
" 1 Age 9 non-null float64\n",
|
56 |
+
" 2 Salary 9 non-null float64\n",
|
57 |
+
" 3 Purchased 10 non-null object \n",
|
58 |
+
"dtypes: float64(2), object(2)\n",
|
59 |
+
"memory usage: 448.0+ bytes\n"
|
60 |
+
]
|
61 |
+
}
|
62 |
+
],
|
63 |
+
"source": [
|
64 |
+
"#Display basic information about the Pandas dataframe\n",
|
65 |
+
"data.info()"
|
66 |
+
]
|
67 |
+
},
|
68 |
+
{
|
69 |
+
"cell_type": "code",
|
70 |
+
"execution_count": 4,
|
71 |
+
"metadata": {},
|
72 |
+
"outputs": [
|
73 |
+
{
|
74 |
+
"data": {
|
75 |
+
"text/plain": [
|
76 |
+
"(10, 4)"
|
77 |
+
]
|
78 |
+
},
|
79 |
+
"execution_count": 4,
|
80 |
+
"metadata": {},
|
81 |
+
"output_type": "execute_result"
|
82 |
+
}
|
83 |
+
],
|
84 |
+
"source": [
|
85 |
+
"#How many rows and columns does the dataframe have?\n",
|
86 |
+
"data.shape"
|
87 |
+
]
|
88 |
+
},
|
89 |
+
{
|
90 |
+
"cell_type": "code",
|
91 |
+
"execution_count": 5,
|
92 |
+
"metadata": {},
|
93 |
+
"outputs": [
|
94 |
+
{
|
95 |
+
"data": {
|
96 |
+
"text/html": [
|
97 |
+
"<div>\n",
|
98 |
+
"<style scoped>\n",
|
99 |
+
" .dataframe tbody tr th:only-of-type {\n",
|
100 |
+
" vertical-align: middle;\n",
|
101 |
+
" }\n",
|
102 |
+
"\n",
|
103 |
+
" .dataframe tbody tr th {\n",
|
104 |
+
" vertical-align: top;\n",
|
105 |
+
" }\n",
|
106 |
+
"\n",
|
107 |
+
" .dataframe thead th {\n",
|
108 |
+
" text-align: right;\n",
|
109 |
+
" }\n",
|
110 |
+
"</style>\n",
|
111 |
+
"<table border=\"1\" class=\"dataframe\">\n",
|
112 |
+
" <thead>\n",
|
113 |
+
" <tr style=\"text-align: right;\">\n",
|
114 |
+
" <th></th>\n",
|
115 |
+
" <th>Country</th>\n",
|
116 |
+
" <th>Age</th>\n",
|
117 |
+
" <th>Salary</th>\n",
|
118 |
+
" <th>Purchased</th>\n",
|
119 |
+
" </tr>\n",
|
120 |
+
" </thead>\n",
|
121 |
+
" <tbody>\n",
|
122 |
+
" <tr>\n",
|
123 |
+
" <th>0</th>\n",
|
124 |
+
" <td>France</td>\n",
|
125 |
+
" <td>44.0</td>\n",
|
126 |
+
" <td>27000.0</td>\n",
|
127 |
+
" <td>No</td>\n",
|
128 |
+
" </tr>\n",
|
129 |
+
" <tr>\n",
|
130 |
+
" <th>1</th>\n",
|
131 |
+
" <td>Spain</td>\n",
|
132 |
+
" <td>27.0</td>\n",
|
133 |
+
" <td>48000.0</td>\n",
|
134 |
+
" <td>Yes</td>\n",
|
135 |
+
" </tr>\n",
|
136 |
+
" <tr>\n",
|
137 |
+
" <th>2</th>\n",
|
138 |
+
" <td>Germany</td>\n",
|
139 |
+
" <td>30.0</td>\n",
|
140 |
+
" <td>54000.0</td>\n",
|
141 |
+
" <td>No</td>\n",
|
142 |
+
" </tr>\n",
|
143 |
+
" <tr>\n",
|
144 |
+
" <th>3</th>\n",
|
145 |
+
" <td>Spain</td>\n",
|
146 |
+
" <td>38.0</td>\n",
|
147 |
+
" <td>61000.0</td>\n",
|
148 |
+
" <td>No</td>\n",
|
149 |
+
" </tr>\n",
|
150 |
+
" <tr>\n",
|
151 |
+
" <th>4</th>\n",
|
152 |
+
" <td>Germany</td>\n",
|
153 |
+
" <td>40.0</td>\n",
|
154 |
+
" <td>NaN</td>\n",
|
155 |
+
" <td>Yes</td>\n",
|
156 |
+
" </tr>\n",
|
157 |
+
" <tr>\n",
|
158 |
+
" <th>5</th>\n",
|
159 |
+
" <td>France</td>\n",
|
160 |
+
" <td>35.0</td>\n",
|
161 |
+
" <td>58000.0</td>\n",
|
162 |
+
" <td>Yes</td>\n",
|
163 |
+
" </tr>\n",
|
164 |
+
" <tr>\n",
|
165 |
+
" <th>6</th>\n",
|
166 |
+
" <td>Spain</td>\n",
|
167 |
+
" <td>NaN</td>\n",
|
168 |
+
" <td>52000.0</td>\n",
|
169 |
+
" <td>No</td>\n",
|
170 |
+
" </tr>\n",
|
171 |
+
" <tr>\n",
|
172 |
+
" <th>7</th>\n",
|
173 |
+
" <td>France</td>\n",
|
174 |
+
" <td>48.0</td>\n",
|
175 |
+
" <td>79000.0</td>\n",
|
176 |
+
" <td>Yes</td>\n",
|
177 |
+
" </tr>\n",
|
178 |
+
" <tr>\n",
|
179 |
+
" <th>8</th>\n",
|
180 |
+
" <td>Germany</td>\n",
|
181 |
+
" <td>50.0</td>\n",
|
182 |
+
" <td>83000.0</td>\n",
|
183 |
+
" <td>No</td>\n",
|
184 |
+
" </tr>\n",
|
185 |
+
" <tr>\n",
|
186 |
+
" <th>9</th>\n",
|
187 |
+
" <td>France</td>\n",
|
188 |
+
" <td>37.0</td>\n",
|
189 |
+
" <td>67000.0</td>\n",
|
190 |
+
" <td>Yes</td>\n",
|
191 |
+
" </tr>\n",
|
192 |
+
" </tbody>\n",
|
193 |
+
"</table>\n",
|
194 |
+
"</div>"
|
195 |
+
],
|
196 |
+
"text/plain": [
|
197 |
+
" Country Age Salary Purchased\n",
|
198 |
+
"0 France 44.0 27000.0 No\n",
|
199 |
+
"1 Spain 27.0 48000.0 Yes\n",
|
200 |
+
"2 Germany 30.0 54000.0 No\n",
|
201 |
+
"3 Spain 38.0 61000.0 No\n",
|
202 |
+
"4 Germany 40.0 NaN Yes\n",
|
203 |
+
"5 France 35.0 58000.0 Yes\n",
|
204 |
+
"6 Spain NaN 52000.0 No\n",
|
205 |
+
"7 France 48.0 79000.0 Yes\n",
|
206 |
+
"8 Germany 50.0 83000.0 No\n",
|
207 |
+
"9 France 37.0 67000.0 Yes"
|
208 |
+
]
|
209 |
+
},
|
210 |
+
"execution_count": 5,
|
211 |
+
"metadata": {},
|
212 |
+
"output_type": "execute_result"
|
213 |
+
}
|
214 |
+
],
|
215 |
+
"source": [
|
216 |
+
"#Display the first 10 rows\n",
|
217 |
+
"#Question: How many rows does the dataframe contain?\n",
|
218 |
+
"#Question: Modify the code to display all the data\n",
|
219 |
+
"\n",
|
220 |
+
"data.head(10)"
|
221 |
+
]
|
222 |
+
},
|
223 |
+
{
|
224 |
+
"cell_type": "code",
|
225 |
+
"execution_count": 6,
|
226 |
+
"metadata": {},
|
227 |
+
"outputs": [
|
228 |
+
{
|
229 |
+
"data": {
|
230 |
+
"text/html": [
|
231 |
+
"<div>\n",
|
232 |
+
"<style scoped>\n",
|
233 |
+
" .dataframe tbody tr th:only-of-type {\n",
|
234 |
+
" vertical-align: middle;\n",
|
235 |
+
" }\n",
|
236 |
+
"\n",
|
237 |
+
" .dataframe tbody tr th {\n",
|
238 |
+
" vertical-align: top;\n",
|
239 |
+
" }\n",
|
240 |
+
"\n",
|
241 |
+
" .dataframe thead th {\n",
|
242 |
+
" text-align: right;\n",
|
243 |
+
" }\n",
|
244 |
+
"</style>\n",
|
245 |
+
"<table border=\"1\" class=\"dataframe\">\n",
|
246 |
+
" <thead>\n",
|
247 |
+
" <tr style=\"text-align: right;\">\n",
|
248 |
+
" <th></th>\n",
|
249 |
+
" <th>Age</th>\n",
|
250 |
+
" <th>Salary</th>\n",
|
251 |
+
" </tr>\n",
|
252 |
+
" </thead>\n",
|
253 |
+
" <tbody>\n",
|
254 |
+
" <tr>\n",
|
255 |
+
" <th>count</th>\n",
|
256 |
+
" <td>9.000000</td>\n",
|
257 |
+
" <td>9.000000</td>\n",
|
258 |
+
" </tr>\n",
|
259 |
+
" <tr>\n",
|
260 |
+
" <th>mean</th>\n",
|
261 |
+
" <td>38.777778</td>\n",
|
262 |
+
" <td>58777.777778</td>\n",
|
263 |
+
" </tr>\n",
|
264 |
+
" <tr>\n",
|
265 |
+
" <th>std</th>\n",
|
266 |
+
" <td>7.693793</td>\n",
|
267 |
+
" <td>16820.952543</td>\n",
|
268 |
+
" </tr>\n",
|
269 |
+
" <tr>\n",
|
270 |
+
" <th>min</th>\n",
|
271 |
+
" <td>27.000000</td>\n",
|
272 |
+
" <td>27000.000000</td>\n",
|
273 |
+
" </tr>\n",
|
274 |
+
" <tr>\n",
|
275 |
+
" <th>25%</th>\n",
|
276 |
+
" <td>35.000000</td>\n",
|
277 |
+
" <td>52000.000000</td>\n",
|
278 |
+
" </tr>\n",
|
279 |
+
" <tr>\n",
|
280 |
+
" <th>50%</th>\n",
|
281 |
+
" <td>38.000000</td>\n",
|
282 |
+
" <td>58000.000000</td>\n",
|
283 |
+
" </tr>\n",
|
284 |
+
" <tr>\n",
|
285 |
+
" <th>75%</th>\n",
|
286 |
+
" <td>44.000000</td>\n",
|
287 |
+
" <td>67000.000000</td>\n",
|
288 |
+
" </tr>\n",
|
289 |
+
" <tr>\n",
|
290 |
+
" <th>max</th>\n",
|
291 |
+
" <td>50.000000</td>\n",
|
292 |
+
" <td>83000.000000</td>\n",
|
293 |
+
" </tr>\n",
|
294 |
+
" </tbody>\n",
|
295 |
+
"</table>\n",
|
296 |
+
"</div>"
|
297 |
+
],
|
298 |
+
"text/plain": [
|
299 |
+
" Age Salary\n",
|
300 |
+
"count 9.000000 9.000000\n",
|
301 |
+
"mean 38.777778 58777.777778\n",
|
302 |
+
"std 7.693793 16820.952543\n",
|
303 |
+
"min 27.000000 27000.000000\n",
|
304 |
+
"25% 35.000000 52000.000000\n",
|
305 |
+
"50% 38.000000 58000.000000\n",
|
306 |
+
"75% 44.000000 67000.000000\n",
|
307 |
+
"max 50.000000 83000.000000"
|
308 |
+
]
|
309 |
+
},
|
310 |
+
"execution_count": 6,
|
311 |
+
"metadata": {},
|
312 |
+
"output_type": "execute_result"
|
313 |
+
}
|
314 |
+
],
|
315 |
+
"source": [
|
316 |
+
"#Create summary statistics about the data in the dataframe\n",
|
317 |
+
"#This only provides summary statistics for Numerical data\n",
|
318 |
+
"data.describe()"
|
319 |
+
]
|
320 |
+
},
|
321 |
+
{
|
322 |
+
"cell_type": "code",
|
323 |
+
"execution_count": 7,
|
324 |
+
"metadata": {},
|
325 |
+
"outputs": [
|
326 |
+
{
|
327 |
+
"data": {
|
328 |
+
"text/html": [
|
329 |
+
"<div>\n",
|
330 |
+
"<style scoped>\n",
|
331 |
+
" .dataframe tbody tr th:only-of-type {\n",
|
332 |
+
" vertical-align: middle;\n",
|
333 |
+
" }\n",
|
334 |
+
"\n",
|
335 |
+
" .dataframe tbody tr th {\n",
|
336 |
+
" vertical-align: top;\n",
|
337 |
+
" }\n",
|
338 |
+
"\n",
|
339 |
+
" .dataframe thead th {\n",
|
340 |
+
" text-align: right;\n",
|
341 |
+
" }\n",
|
342 |
+
"</style>\n",
|
343 |
+
"<table border=\"1\" class=\"dataframe\">\n",
|
344 |
+
" <thead>\n",
|
345 |
+
" <tr style=\"text-align: right;\">\n",
|
346 |
+
" <th></th>\n",
|
347 |
+
" <th>count</th>\n",
|
348 |
+
" <th>mean</th>\n",
|
349 |
+
" <th>std</th>\n",
|
350 |
+
" <th>min</th>\n",
|
351 |
+
" <th>25%</th>\n",
|
352 |
+
" <th>50%</th>\n",
|
353 |
+
" <th>75%</th>\n",
|
354 |
+
" <th>max</th>\n",
|
355 |
+
" </tr>\n",
|
356 |
+
" </thead>\n",
|
357 |
+
" <tbody>\n",
|
358 |
+
" <tr>\n",
|
359 |
+
" <th>Age</th>\n",
|
360 |
+
" <td>9.0</td>\n",
|
361 |
+
" <td>38.777778</td>\n",
|
362 |
+
" <td>7.693793</td>\n",
|
363 |
+
" <td>27.0</td>\n",
|
364 |
+
" <td>35.0</td>\n",
|
365 |
+
" <td>38.0</td>\n",
|
366 |
+
" <td>44.0</td>\n",
|
367 |
+
" <td>50.0</td>\n",
|
368 |
+
" </tr>\n",
|
369 |
+
" <tr>\n",
|
370 |
+
" <th>Salary</th>\n",
|
371 |
+
" <td>9.0</td>\n",
|
372 |
+
" <td>58777.777778</td>\n",
|
373 |
+
" <td>16820.952543</td>\n",
|
374 |
+
" <td>27000.0</td>\n",
|
375 |
+
" <td>52000.0</td>\n",
|
376 |
+
" <td>58000.0</td>\n",
|
377 |
+
" <td>67000.0</td>\n",
|
378 |
+
" <td>83000.0</td>\n",
|
379 |
+
" </tr>\n",
|
380 |
+
" </tbody>\n",
|
381 |
+
"</table>\n",
|
382 |
+
"</div>"
|
383 |
+
],
|
384 |
+
"text/plain": [
|
385 |
+
" count mean std min 25% 50% 75% \\\n",
|
386 |
+
"Age 9.0 38.777778 7.693793 27.0 35.0 38.0 44.0 \n",
|
387 |
+
"Salary 9.0 58777.777778 16820.952543 27000.0 52000.0 58000.0 67000.0 \n",
|
388 |
+
"\n",
|
389 |
+
" max \n",
|
390 |
+
"Age 50.0 \n",
|
391 |
+
"Salary 83000.0 "
|
392 |
+
]
|
393 |
+
},
|
394 |
+
"execution_count": 7,
|
395 |
+
"metadata": {},
|
396 |
+
"output_type": "execute_result"
|
397 |
+
}
|
398 |
+
],
|
399 |
+
"source": [
|
400 |
+
"#An alternative way of viewing the summary statistics\n",
|
401 |
+
"data.describe().transpose()"
|
402 |
+
]
|
403 |
+
},
|
404 |
+
{
|
405 |
+
"cell_type": "code",
|
406 |
+
"execution_count": 8,
|
407 |
+
"metadata": {},
|
408 |
+
"outputs": [
|
409 |
+
{
|
410 |
+
"data": {
|
411 |
+
"text/html": [
|
412 |
+
"<div>\n",
|
413 |
+
"<style scoped>\n",
|
414 |
+
" .dataframe tbody tr th:only-of-type {\n",
|
415 |
+
" vertical-align: middle;\n",
|
416 |
+
" }\n",
|
417 |
+
"\n",
|
418 |
+
" .dataframe tbody tr th {\n",
|
419 |
+
" vertical-align: top;\n",
|
420 |
+
" }\n",
|
421 |
+
"\n",
|
422 |
+
" .dataframe thead th {\n",
|
423 |
+
" text-align: right;\n",
|
424 |
+
" }\n",
|
425 |
+
"</style>\n",
|
426 |
+
"<table border=\"1\" class=\"dataframe\">\n",
|
427 |
+
" <thead>\n",
|
428 |
+
" <tr style=\"text-align: right;\">\n",
|
429 |
+
" <th></th>\n",
|
430 |
+
" <th>Country</th>\n",
|
431 |
+
" <th>Age</th>\n",
|
432 |
+
" <th>Salary</th>\n",
|
433 |
+
" <th>Purchased</th>\n",
|
434 |
+
" </tr>\n",
|
435 |
+
" </thead>\n",
|
436 |
+
" <tbody>\n",
|
437 |
+
" <tr>\n",
|
438 |
+
" <th>0</th>\n",
|
439 |
+
" <td>France</td>\n",
|
440 |
+
" <td>44.0</td>\n",
|
441 |
+
" <td>27000.0</td>\n",
|
442 |
+
" <td>No</td>\n",
|
443 |
+
" </tr>\n",
|
444 |
+
" <tr>\n",
|
445 |
+
" <th>1</th>\n",
|
446 |
+
" <td>Spain</td>\n",
|
447 |
+
" <td>27.0</td>\n",
|
448 |
+
" <td>48000.0</td>\n",
|
449 |
+
" <td>Yes</td>\n",
|
450 |
+
" </tr>\n",
|
451 |
+
" <tr>\n",
|
452 |
+
" <th>2</th>\n",
|
453 |
+
" <td>Germany</td>\n",
|
454 |
+
" <td>30.0</td>\n",
|
455 |
+
" <td>54000.0</td>\n",
|
456 |
+
" <td>No</td>\n",
|
457 |
+
" </tr>\n",
|
458 |
+
" <tr>\n",
|
459 |
+
" <th>3</th>\n",
|
460 |
+
" <td>Spain</td>\n",
|
461 |
+
" <td>38.0</td>\n",
|
462 |
+
" <td>61000.0</td>\n",
|
463 |
+
" <td>No</td>\n",
|
464 |
+
" </tr>\n",
|
465 |
+
" <tr>\n",
|
466 |
+
" <th>4</th>\n",
|
467 |
+
" <td>Germany</td>\n",
|
468 |
+
" <td>40.0</td>\n",
|
469 |
+
" <td>NaN</td>\n",
|
470 |
+
" <td>Yes</td>\n",
|
471 |
+
" </tr>\n",
|
472 |
+
" <tr>\n",
|
473 |
+
" <th>5</th>\n",
|
474 |
+
" <td>France</td>\n",
|
475 |
+
" <td>35.0</td>\n",
|
476 |
+
" <td>58000.0</td>\n",
|
477 |
+
" <td>Yes</td>\n",
|
478 |
+
" </tr>\n",
|
479 |
+
" <tr>\n",
|
480 |
+
" <th>6</th>\n",
|
481 |
+
" <td>Spain</td>\n",
|
482 |
+
" <td>NaN</td>\n",
|
483 |
+
" <td>52000.0</td>\n",
|
484 |
+
" <td>No</td>\n",
|
485 |
+
" </tr>\n",
|
486 |
+
" <tr>\n",
|
487 |
+
" <th>7</th>\n",
|
488 |
+
" <td>France</td>\n",
|
489 |
+
" <td>48.0</td>\n",
|
490 |
+
" <td>79000.0</td>\n",
|
491 |
+
" <td>Yes</td>\n",
|
492 |
+
" </tr>\n",
|
493 |
+
" <tr>\n",
|
494 |
+
" <th>8</th>\n",
|
495 |
+
" <td>Germany</td>\n",
|
496 |
+
" <td>50.0</td>\n",
|
497 |
+
" <td>83000.0</td>\n",
|
498 |
+
" <td>No</td>\n",
|
499 |
+
" </tr>\n",
|
500 |
+
" <tr>\n",
|
501 |
+
" <th>9</th>\n",
|
502 |
+
" <td>France</td>\n",
|
503 |
+
" <td>37.0</td>\n",
|
504 |
+
" <td>67000.0</td>\n",
|
505 |
+
" <td>Yes</td>\n",
|
506 |
+
" </tr>\n",
|
507 |
+
" </tbody>\n",
|
508 |
+
"</table>\n",
|
509 |
+
"</div>"
|
510 |
+
],
|
511 |
+
"text/plain": [
|
512 |
+
" Country Age Salary Purchased\n",
|
513 |
+
"0 France 44.0 27000.0 No\n",
|
514 |
+
"1 Spain 27.0 48000.0 Yes\n",
|
515 |
+
"2 Germany 30.0 54000.0 No\n",
|
516 |
+
"3 Spain 38.0 61000.0 No\n",
|
517 |
+
"4 Germany 40.0 NaN Yes\n",
|
518 |
+
"5 France 35.0 58000.0 Yes\n",
|
519 |
+
"6 Spain NaN 52000.0 No\n",
|
520 |
+
"7 France 48.0 79000.0 Yes\n",
|
521 |
+
"8 Germany 50.0 83000.0 No\n",
|
522 |
+
"9 France 37.0 67000.0 Yes"
|
523 |
+
]
|
524 |
+
},
|
525 |
+
"execution_count": 8,
|
526 |
+
"metadata": {},
|
527 |
+
"output_type": "execute_result"
|
528 |
+
}
|
529 |
+
],
|
530 |
+
"source": [
|
531 |
+
"#Display the data from the dataframe\n",
|
532 |
+
"#NB: notice that some values are missing (NaN) -> See question later in the notebook\n",
|
533 |
+
"data"
|
534 |
+
]
|
535 |
+
},
|
536 |
+
{
|
537 |
+
"cell_type": "code",
|
538 |
+
"execution_count": 9,
|
539 |
+
"metadata": {},
|
540 |
+
"outputs": [
|
541 |
+
{
|
542 |
+
"data": {
|
543 |
+
"text/html": [
|
544 |
+
"<div>\n",
|
545 |
+
"<style scoped>\n",
|
546 |
+
" .dataframe tbody tr th:only-of-type {\n",
|
547 |
+
" vertical-align: middle;\n",
|
548 |
+
" }\n",
|
549 |
+
"\n",
|
550 |
+
" .dataframe tbody tr th {\n",
|
551 |
+
" vertical-align: top;\n",
|
552 |
+
" }\n",
|
553 |
+
"\n",
|
554 |
+
" .dataframe thead th {\n",
|
555 |
+
" text-align: right;\n",
|
556 |
+
" }\n",
|
557 |
+
"</style>\n",
|
558 |
+
"<table border=\"1\" class=\"dataframe\">\n",
|
559 |
+
" <thead>\n",
|
560 |
+
" <tr style=\"text-align: right;\">\n",
|
561 |
+
" <th></th>\n",
|
562 |
+
" <th>Country</th>\n",
|
563 |
+
" <th>Age</th>\n",
|
564 |
+
" <th>Salary</th>\n",
|
565 |
+
" <th>Purchased</th>\n",
|
566 |
+
" </tr>\n",
|
567 |
+
" </thead>\n",
|
568 |
+
" <tbody>\n",
|
569 |
+
" <tr>\n",
|
570 |
+
" <th>0</th>\n",
|
571 |
+
" <td>France</td>\n",
|
572 |
+
" <td>0.739130</td>\n",
|
573 |
+
" <td>0.000000</td>\n",
|
574 |
+
" <td>No</td>\n",
|
575 |
+
" </tr>\n",
|
576 |
+
" <tr>\n",
|
577 |
+
" <th>1</th>\n",
|
578 |
+
" <td>Spain</td>\n",
|
579 |
+
" <td>0.000000</td>\n",
|
580 |
+
" <td>0.375000</td>\n",
|
581 |
+
" <td>Yes</td>\n",
|
582 |
+
" </tr>\n",
|
583 |
+
" <tr>\n",
|
584 |
+
" <th>2</th>\n",
|
585 |
+
" <td>Germany</td>\n",
|
586 |
+
" <td>0.130435</td>\n",
|
587 |
+
" <td>0.482143</td>\n",
|
588 |
+
" <td>No</td>\n",
|
589 |
+
" </tr>\n",
|
590 |
+
" <tr>\n",
|
591 |
+
" <th>3</th>\n",
|
592 |
+
" <td>Spain</td>\n",
|
593 |
+
" <td>0.478261</td>\n",
|
594 |
+
" <td>0.607143</td>\n",
|
595 |
+
" <td>No</td>\n",
|
596 |
+
" </tr>\n",
|
597 |
+
" <tr>\n",
|
598 |
+
" <th>4</th>\n",
|
599 |
+
" <td>Germany</td>\n",
|
600 |
+
" <td>0.565217</td>\n",
|
601 |
+
" <td>NaN</td>\n",
|
602 |
+
" <td>Yes</td>\n",
|
603 |
+
" </tr>\n",
|
604 |
+
" <tr>\n",
|
605 |
+
" <th>5</th>\n",
|
606 |
+
" <td>France</td>\n",
|
607 |
+
" <td>0.347826</td>\n",
|
608 |
+
" <td>0.553571</td>\n",
|
609 |
+
" <td>Yes</td>\n",
|
610 |
+
" </tr>\n",
|
611 |
+
" <tr>\n",
|
612 |
+
" <th>6</th>\n",
|
613 |
+
" <td>Spain</td>\n",
|
614 |
+
" <td>NaN</td>\n",
|
615 |
+
" <td>0.446429</td>\n",
|
616 |
+
" <td>No</td>\n",
|
617 |
+
" </tr>\n",
|
618 |
+
" <tr>\n",
|
619 |
+
" <th>7</th>\n",
|
620 |
+
" <td>France</td>\n",
|
621 |
+
" <td>0.913043</td>\n",
|
622 |
+
" <td>0.928571</td>\n",
|
623 |
+
" <td>Yes</td>\n",
|
624 |
+
" </tr>\n",
|
625 |
+
" <tr>\n",
|
626 |
+
" <th>8</th>\n",
|
627 |
+
" <td>Germany</td>\n",
|
628 |
+
" <td>1.000000</td>\n",
|
629 |
+
" <td>1.000000</td>\n",
|
630 |
+
" <td>No</td>\n",
|
631 |
+
" </tr>\n",
|
632 |
+
" <tr>\n",
|
633 |
+
" <th>9</th>\n",
|
634 |
+
" <td>France</td>\n",
|
635 |
+
" <td>0.434783</td>\n",
|
636 |
+
" <td>0.714286</td>\n",
|
637 |
+
" <td>Yes</td>\n",
|
638 |
+
" </tr>\n",
|
639 |
+
" </tbody>\n",
|
640 |
+
"</table>\n",
|
641 |
+
"</div>"
|
642 |
+
],
|
643 |
+
"text/plain": [
|
644 |
+
" Country Age Salary Purchased\n",
|
645 |
+
"0 France 0.739130 0.000000 No\n",
|
646 |
+
"1 Spain 0.000000 0.375000 Yes\n",
|
647 |
+
"2 Germany 0.130435 0.482143 No\n",
|
648 |
+
"3 Spain 0.478261 0.607143 No\n",
|
649 |
+
"4 Germany 0.565217 NaN Yes\n",
|
650 |
+
"5 France 0.347826 0.553571 Yes\n",
|
651 |
+
"6 Spain NaN 0.446429 No\n",
|
652 |
+
"7 France 0.913043 0.928571 Yes\n",
|
653 |
+
"8 Germany 1.000000 1.000000 No\n",
|
654 |
+
"9 France 0.434783 0.714286 Yes"
|
655 |
+
]
|
656 |
+
},
|
657 |
+
"execution_count": 9,
|
658 |
+
"metadata": {},
|
659 |
+
"output_type": "execute_result"
|
660 |
+
}
|
661 |
+
],
|
662 |
+
"source": [
|
663 |
+
"#Set up the MinMaxScaler\n",
|
664 |
+
"#This will only work on Numerical attributes/features\n",
|
665 |
+
"scaler = MinMaxScaler()\n",
|
666 |
+
"\n",
|
667 |
+
"#Apply the scaler to the numerical data\n",
|
668 |
+
"# and save the data back to the dataframe\n",
|
669 |
+
"# overwriting the original values\n",
|
670 |
+
"data[['Age', 'Salary']] = scaler.fit_transform(data[['Age', 'Salary']])\n",
|
671 |
+
"\n",
|
672 |
+
"#Display the dataframe\n",
|
673 |
+
"data"
|
674 |
+
]
|
675 |
+
},
|
676 |
+
{
|
677 |
+
"cell_type": "markdown",
|
678 |
+
"metadata": {},
|
679 |
+
"source": [
|
680 |
+
"### Copy and modify the above code to replace the NaN values"
|
681 |
+
]
|
682 |
+
},
|
683 |
+
{
|
684 |
+
"cell_type": "markdown",
|
685 |
+
"metadata": {},
|
686 |
+
"source": [
|
687 |
+
"#### Modify the data set to replace the empty data (NaN) with appropriate values\n",
|
688 |
+
"#### Rerun the scaler with this updated/modified dataframe\n"
|
689 |
+
]
|
690 |
+
},
|
691 |
+
{
|
692 |
+
"cell_type": "code",
|
693 |
+
"execution_count": null,
|
694 |
+
"metadata": {},
|
695 |
+
"outputs": [],
|
696 |
+
"source": []
|
697 |
+
},
|
698 |
+
{
|
699 |
+
"cell_type": "code",
|
700 |
+
"execution_count": null,
|
701 |
+
"metadata": {},
|
702 |
+
"outputs": [],
|
703 |
+
"source": []
|
704 |
+
},
|
705 |
+
{
|
706 |
+
"cell_type": "code",
|
707 |
+
"execution_count": null,
|
708 |
+
"metadata": {},
|
709 |
+
"outputs": [],
|
710 |
+
"source": []
|
711 |
+
},
|
712 |
+
{
|
713 |
+
"cell_type": "code",
|
714 |
+
"execution_count": null,
|
715 |
+
"metadata": {},
|
716 |
+
"outputs": [],
|
717 |
+
"source": []
|
718 |
+
},
|
719 |
+
{
|
720 |
+
"cell_type": "code",
|
721 |
+
"execution_count": null,
|
722 |
+
"metadata": {},
|
723 |
+
"outputs": [],
|
724 |
+
"source": []
|
725 |
+
},
|
726 |
+
{
|
727 |
+
"cell_type": "code",
|
728 |
+
"execution_count": null,
|
729 |
+
"metadata": {},
|
730 |
+
"outputs": [],
|
731 |
+
"source": [
|
732 |
+
"\n"
|
733 |
+
]
|
734 |
+
},
|
735 |
+
{
|
736 |
+
"cell_type": "code",
|
737 |
+
"execution_count": null,
|
738 |
+
"metadata": {},
|
739 |
+
"outputs": [],
|
740 |
+
"source": []
|
741 |
+
},
|
742 |
+
{
|
743 |
+
"cell_type": "markdown",
|
744 |
+
"metadata": {},
|
745 |
+
"source": [
|
746 |
+
"### Another Example and Exercise - Complete all the steps\n",
|
747 |
+
"\n",
|
748 |
+
"#### Data set = Pima Indians Diabetes dataset\n",
|
749 |
+
"#### https://www.kaggle.com/datasets/uciml/pima-indians-diabetes-database"
|
750 |
+
]
|
751 |
+
},
|
752 |
+
{
|
753 |
+
"cell_type": "code",
|
754 |
+
"execution_count": 10,
|
755 |
+
"metadata": {},
|
756 |
+
"outputs": [
|
757 |
+
{
|
758 |
+
"data": {
|
759 |
+
"text/html": [
|
760 |
+
"<div>\n",
|
761 |
+
"<style scoped>\n",
|
762 |
+
" .dataframe tbody tr th:only-of-type {\n",
|
763 |
+
" vertical-align: middle;\n",
|
764 |
+
" }\n",
|
765 |
+
"\n",
|
766 |
+
" .dataframe tbody tr th {\n",
|
767 |
+
" vertical-align: top;\n",
|
768 |
+
" }\n",
|
769 |
+
"\n",
|
770 |
+
" .dataframe thead th {\n",
|
771 |
+
" text-align: right;\n",
|
772 |
+
" }\n",
|
773 |
+
"</style>\n",
|
774 |
+
"<table border=\"1\" class=\"dataframe\">\n",
|
775 |
+
" <thead>\n",
|
776 |
+
" <tr style=\"text-align: right;\">\n",
|
777 |
+
" <th></th>\n",
|
778 |
+
" <th>preg</th>\n",
|
779 |
+
" <th>plas</th>\n",
|
780 |
+
" <th>pres</th>\n",
|
781 |
+
" <th>skin</th>\n",
|
782 |
+
" <th>test</th>\n",
|
783 |
+
" <th>mass</th>\n",
|
784 |
+
" <th>pedi</th>\n",
|
785 |
+
" <th>age</th>\n",
|
786 |
+
" <th>class</th>\n",
|
787 |
+
" </tr>\n",
|
788 |
+
" </thead>\n",
|
789 |
+
" <tbody>\n",
|
790 |
+
" <tr>\n",
|
791 |
+
" <th>0</th>\n",
|
792 |
+
" <td>6</td>\n",
|
793 |
+
" <td>148</td>\n",
|
794 |
+
" <td>72</td>\n",
|
795 |
+
" <td>35</td>\n",
|
796 |
+
" <td>0</td>\n",
|
797 |
+
" <td>33.6</td>\n",
|
798 |
+
" <td>0.627</td>\n",
|
799 |
+
" <td>50</td>\n",
|
800 |
+
" <td>1</td>\n",
|
801 |
+
" </tr>\n",
|
802 |
+
" <tr>\n",
|
803 |
+
" <th>1</th>\n",
|
804 |
+
" <td>1</td>\n",
|
805 |
+
" <td>85</td>\n",
|
806 |
+
" <td>66</td>\n",
|
807 |
+
" <td>29</td>\n",
|
808 |
+
" <td>0</td>\n",
|
809 |
+
" <td>26.6</td>\n",
|
810 |
+
" <td>0.351</td>\n",
|
811 |
+
" <td>31</td>\n",
|
812 |
+
" <td>0</td>\n",
|
813 |
+
" </tr>\n",
|
814 |
+
" <tr>\n",
|
815 |
+
" <th>2</th>\n",
|
816 |
+
" <td>8</td>\n",
|
817 |
+
" <td>183</td>\n",
|
818 |
+
" <td>64</td>\n",
|
819 |
+
" <td>0</td>\n",
|
820 |
+
" <td>0</td>\n",
|
821 |
+
" <td>23.3</td>\n",
|
822 |
+
" <td>0.672</td>\n",
|
823 |
+
" <td>32</td>\n",
|
824 |
+
" <td>1</td>\n",
|
825 |
+
" </tr>\n",
|
826 |
+
" <tr>\n",
|
827 |
+
" <th>3</th>\n",
|
828 |
+
" <td>1</td>\n",
|
829 |
+
" <td>89</td>\n",
|
830 |
+
" <td>66</td>\n",
|
831 |
+
" <td>23</td>\n",
|
832 |
+
" <td>94</td>\n",
|
833 |
+
" <td>28.1</td>\n",
|
834 |
+
" <td>0.167</td>\n",
|
835 |
+
" <td>21</td>\n",
|
836 |
+
" <td>0</td>\n",
|
837 |
+
" </tr>\n",
|
838 |
+
" <tr>\n",
|
839 |
+
" <th>4</th>\n",
|
840 |
+
" <td>0</td>\n",
|
841 |
+
" <td>137</td>\n",
|
842 |
+
" <td>40</td>\n",
|
843 |
+
" <td>35</td>\n",
|
844 |
+
" <td>168</td>\n",
|
845 |
+
" <td>43.1</td>\n",
|
846 |
+
" <td>2.288</td>\n",
|
847 |
+
" <td>33</td>\n",
|
848 |
+
" <td>1</td>\n",
|
849 |
+
" </tr>\n",
|
850 |
+
" </tbody>\n",
|
851 |
+
"</table>\n",
|
852 |
+
"</div>"
|
853 |
+
],
|
854 |
+
"text/plain": [
|
855 |
+
" preg plas pres skin test mass pedi age class\n",
|
856 |
+
"0 6 148 72 35 0 33.6 0.627 50 1\n",
|
857 |
+
"1 1 85 66 29 0 26.6 0.351 31 0\n",
|
858 |
+
"2 8 183 64 0 0 23.3 0.672 32 1\n",
|
859 |
+
"3 1 89 66 23 94 28.1 0.167 21 0\n",
|
860 |
+
"4 0 137 40 35 168 43.1 2.288 33 1"
|
861 |
+
]
|
862 |
+
},
|
863 |
+
"execution_count": 10,
|
864 |
+
"metadata": {},
|
865 |
+
"output_type": "execute_result"
|
866 |
+
}
|
867 |
+
],
|
868 |
+
"source": [
|
869 |
+
"#Import the data set\n",
|
870 |
+
"\n",
|
871 |
+
"import pandas as pd\n",
|
872 |
+
"columns = ['preg', 'plas', 'pres', 'skin', 'test', 'mass', 'pedi', 'age', 'class']\n",
|
873 |
+
"data = pd.read_csv('/Users/brendan.tierney/Dropbox/4-Datasets/pima-indians-diabetes.csv', names=columns)\n",
|
874 |
+
"data.head()"
|
875 |
+
]
|
876 |
+
},
|
877 |
+
{
|
878 |
+
"cell_type": "code",
|
879 |
+
"execution_count": 11,
|
880 |
+
"metadata": {},
|
881 |
+
"outputs": [
|
882 |
+
{
|
883 |
+
"data": {
|
884 |
+
"text/html": [
|
885 |
+
"<div>\n",
|
886 |
+
"<style scoped>\n",
|
887 |
+
" .dataframe tbody tr th:only-of-type {\n",
|
888 |
+
" vertical-align: middle;\n",
|
889 |
+
" }\n",
|
890 |
+
"\n",
|
891 |
+
" .dataframe tbody tr th {\n",
|
892 |
+
" vertical-align: top;\n",
|
893 |
+
" }\n",
|
894 |
+
"\n",
|
895 |
+
" .dataframe thead th {\n",
|
896 |
+
" text-align: right;\n",
|
897 |
+
" }\n",
|
898 |
+
"</style>\n",
|
899 |
+
"<table border=\"1\" class=\"dataframe\">\n",
|
900 |
+
" <thead>\n",
|
901 |
+
" <tr style=\"text-align: right;\">\n",
|
902 |
+
" <th></th>\n",
|
903 |
+
" <th>preg</th>\n",
|
904 |
+
" <th>plas</th>\n",
|
905 |
+
" <th>pres</th>\n",
|
906 |
+
" <th>skin</th>\n",
|
907 |
+
" <th>test</th>\n",
|
908 |
+
" <th>mass</th>\n",
|
909 |
+
" <th>pedi</th>\n",
|
910 |
+
" <th>age</th>\n",
|
911 |
+
" <th>class</th>\n",
|
912 |
+
" </tr>\n",
|
913 |
+
" </thead>\n",
|
914 |
+
" <tbody>\n",
|
915 |
+
" <tr>\n",
|
916 |
+
" <th>count</th>\n",
|
917 |
+
" <td>768.000000</td>\n",
|
918 |
+
" <td>768.000000</td>\n",
|
919 |
+
" <td>768.000000</td>\n",
|
920 |
+
" <td>768.000000</td>\n",
|
921 |
+
" <td>768.000000</td>\n",
|
922 |
+
" <td>768.000000</td>\n",
|
923 |
+
" <td>768.000000</td>\n",
|
924 |
+
" <td>768.000000</td>\n",
|
925 |
+
" <td>768.000000</td>\n",
|
926 |
+
" </tr>\n",
|
927 |
+
" <tr>\n",
|
928 |
+
" <th>mean</th>\n",
|
929 |
+
" <td>3.845052</td>\n",
|
930 |
+
" <td>120.894531</td>\n",
|
931 |
+
" <td>69.105469</td>\n",
|
932 |
+
" <td>20.536458</td>\n",
|
933 |
+
" <td>79.799479</td>\n",
|
934 |
+
" <td>31.992578</td>\n",
|
935 |
+
" <td>0.471876</td>\n",
|
936 |
+
" <td>33.240885</td>\n",
|
937 |
+
" <td>0.348958</td>\n",
|
938 |
+
" </tr>\n",
|
939 |
+
" <tr>\n",
|
940 |
+
" <th>std</th>\n",
|
941 |
+
" <td>3.369578</td>\n",
|
942 |
+
" <td>31.972618</td>\n",
|
943 |
+
" <td>19.355807</td>\n",
|
944 |
+
" <td>15.952218</td>\n",
|
945 |
+
" <td>115.244002</td>\n",
|
946 |
+
" <td>7.884160</td>\n",
|
947 |
+
" <td>0.331329</td>\n",
|
948 |
+
" <td>11.760232</td>\n",
|
949 |
+
" <td>0.476951</td>\n",
|
950 |
+
" </tr>\n",
|
951 |
+
" <tr>\n",
|
952 |
+
" <th>min</th>\n",
|
953 |
+
" <td>0.000000</td>\n",
|
954 |
+
" <td>0.000000</td>\n",
|
955 |
+
" <td>0.000000</td>\n",
|
956 |
+
" <td>0.000000</td>\n",
|
957 |
+
" <td>0.000000</td>\n",
|
958 |
+
" <td>0.000000</td>\n",
|
959 |
+
" <td>0.078000</td>\n",
|
960 |
+
" <td>21.000000</td>\n",
|
961 |
+
" <td>0.000000</td>\n",
|
962 |
+
" </tr>\n",
|
963 |
+
" <tr>\n",
|
964 |
+
" <th>25%</th>\n",
|
965 |
+
" <td>1.000000</td>\n",
|
966 |
+
" <td>99.000000</td>\n",
|
967 |
+
" <td>62.000000</td>\n",
|
968 |
+
" <td>0.000000</td>\n",
|
969 |
+
" <td>0.000000</td>\n",
|
970 |
+
" <td>27.300000</td>\n",
|
971 |
+
" <td>0.243750</td>\n",
|
972 |
+
" <td>24.000000</td>\n",
|
973 |
+
" <td>0.000000</td>\n",
|
974 |
+
" </tr>\n",
|
975 |
+
" <tr>\n",
|
976 |
+
" <th>50%</th>\n",
|
977 |
+
" <td>3.000000</td>\n",
|
978 |
+
" <td>117.000000</td>\n",
|
979 |
+
" <td>72.000000</td>\n",
|
980 |
+
" <td>23.000000</td>\n",
|
981 |
+
" <td>30.500000</td>\n",
|
982 |
+
" <td>32.000000</td>\n",
|
983 |
+
" <td>0.372500</td>\n",
|
984 |
+
" <td>29.000000</td>\n",
|
985 |
+
" <td>0.000000</td>\n",
|
986 |
+
" </tr>\n",
|
987 |
+
" <tr>\n",
|
988 |
+
" <th>75%</th>\n",
|
989 |
+
" <td>6.000000</td>\n",
|
990 |
+
" <td>140.250000</td>\n",
|
991 |
+
" <td>80.000000</td>\n",
|
992 |
+
" <td>32.000000</td>\n",
|
993 |
+
" <td>127.250000</td>\n",
|
994 |
+
" <td>36.600000</td>\n",
|
995 |
+
" <td>0.626250</td>\n",
|
996 |
+
" <td>41.000000</td>\n",
|
997 |
+
" <td>1.000000</td>\n",
|
998 |
+
" </tr>\n",
|
999 |
+
" <tr>\n",
|
1000 |
+
" <th>max</th>\n",
|
1001 |
+
" <td>17.000000</td>\n",
|
1002 |
+
" <td>199.000000</td>\n",
|
1003 |
+
" <td>122.000000</td>\n",
|
1004 |
+
" <td>99.000000</td>\n",
|
1005 |
+
" <td>846.000000</td>\n",
|
1006 |
+
" <td>67.100000</td>\n",
|
1007 |
+
" <td>2.420000</td>\n",
|
1008 |
+
" <td>81.000000</td>\n",
|
1009 |
+
" <td>1.000000</td>\n",
|
1010 |
+
" </tr>\n",
|
1011 |
+
" </tbody>\n",
|
1012 |
+
"</table>\n",
|
1013 |
+
"</div>"
|
1014 |
+
],
|
1015 |
+
"text/plain": [
|
1016 |
+
" preg plas pres skin test mass \\\n",
|
1017 |
+
"count 768.000000 768.000000 768.000000 768.000000 768.000000 768.000000 \n",
|
1018 |
+
"mean 3.845052 120.894531 69.105469 20.536458 79.799479 31.992578 \n",
|
1019 |
+
"std 3.369578 31.972618 19.355807 15.952218 115.244002 7.884160 \n",
|
1020 |
+
"min 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 \n",
|
1021 |
+
"25% 1.000000 99.000000 62.000000 0.000000 0.000000 27.300000 \n",
|
1022 |
+
"50% 3.000000 117.000000 72.000000 23.000000 30.500000 32.000000 \n",
|
1023 |
+
"75% 6.000000 140.250000 80.000000 32.000000 127.250000 36.600000 \n",
|
1024 |
+
"max 17.000000 199.000000 122.000000 99.000000 846.000000 67.100000 \n",
|
1025 |
+
"\n",
|
1026 |
+
" pedi age class \n",
|
1027 |
+
"count 768.000000 768.000000 768.000000 \n",
|
1028 |
+
"mean 0.471876 33.240885 0.348958 \n",
|
1029 |
+
"std 0.331329 11.760232 0.476951 \n",
|
1030 |
+
"min 0.078000 21.000000 0.000000 \n",
|
1031 |
+
"25% 0.243750 24.000000 0.000000 \n",
|
1032 |
+
"50% 0.372500 29.000000 0.000000 \n",
|
1033 |
+
"75% 0.626250 41.000000 1.000000 \n",
|
1034 |
+
"max 2.420000 81.000000 1.000000 "
|
1035 |
+
]
|
1036 |
+
},
|
1037 |
+
"execution_count": 11,
|
1038 |
+
"metadata": {},
|
1039 |
+
"output_type": "execute_result"
|
1040 |
+
}
|
1041 |
+
],
|
1042 |
+
"source": [
|
1043 |
+
"#Create the summary statistics\n",
|
1044 |
+
"data.describe()"
|
1045 |
+
]
|
1046 |
+
},
|
1047 |
+
{
|
1048 |
+
"cell_type": "code",
|
1049 |
+
"execution_count": 12,
|
1050 |
+
"metadata": {},
|
1051 |
+
"outputs": [
|
1052 |
+
{
|
1053 |
+
"data": {
|
1054 |
+
"text/html": [
|
1055 |
+
"<div>\n",
|
1056 |
+
"<style scoped>\n",
|
1057 |
+
" .dataframe tbody tr th:only-of-type {\n",
|
1058 |
+
" vertical-align: middle;\n",
|
1059 |
+
" }\n",
|
1060 |
+
"\n",
|
1061 |
+
" .dataframe tbody tr th {\n",
|
1062 |
+
" vertical-align: top;\n",
|
1063 |
+
" }\n",
|
1064 |
+
"\n",
|
1065 |
+
" .dataframe thead th {\n",
|
1066 |
+
" text-align: right;\n",
|
1067 |
+
" }\n",
|
1068 |
+
"</style>\n",
|
1069 |
+
"<table border=\"1\" class=\"dataframe\">\n",
|
1070 |
+
" <thead>\n",
|
1071 |
+
" <tr style=\"text-align: right;\">\n",
|
1072 |
+
" <th></th>\n",
|
1073 |
+
" <th>count</th>\n",
|
1074 |
+
" <th>mean</th>\n",
|
1075 |
+
" <th>std</th>\n",
|
1076 |
+
" <th>min</th>\n",
|
1077 |
+
" <th>25%</th>\n",
|
1078 |
+
" <th>50%</th>\n",
|
1079 |
+
" <th>75%</th>\n",
|
1080 |
+
" <th>max</th>\n",
|
1081 |
+
" </tr>\n",
|
1082 |
+
" </thead>\n",
|
1083 |
+
" <tbody>\n",
|
1084 |
+
" <tr>\n",
|
1085 |
+
" <th>preg</th>\n",
|
1086 |
+
" <td>768.0</td>\n",
|
1087 |
+
" <td>3.845052</td>\n",
|
1088 |
+
" <td>3.369578</td>\n",
|
1089 |
+
" <td>0.000</td>\n",
|
1090 |
+
" <td>1.00000</td>\n",
|
1091 |
+
" <td>3.0000</td>\n",
|
1092 |
+
" <td>6.00000</td>\n",
|
1093 |
+
" <td>17.00</td>\n",
|
1094 |
+
" </tr>\n",
|
1095 |
+
" <tr>\n",
|
1096 |
+
" <th>plas</th>\n",
|
1097 |
+
" <td>768.0</td>\n",
|
1098 |
+
" <td>120.894531</td>\n",
|
1099 |
+
" <td>31.972618</td>\n",
|
1100 |
+
" <td>0.000</td>\n",
|
1101 |
+
" <td>99.00000</td>\n",
|
1102 |
+
" <td>117.0000</td>\n",
|
1103 |
+
" <td>140.25000</td>\n",
|
1104 |
+
" <td>199.00</td>\n",
|
1105 |
+
" </tr>\n",
|
1106 |
+
" <tr>\n",
|
1107 |
+
" <th>pres</th>\n",
|
1108 |
+
" <td>768.0</td>\n",
|
1109 |
+
" <td>69.105469</td>\n",
|
1110 |
+
" <td>19.355807</td>\n",
|
1111 |
+
" <td>0.000</td>\n",
|
1112 |
+
" <td>62.00000</td>\n",
|
1113 |
+
" <td>72.0000</td>\n",
|
1114 |
+
" <td>80.00000</td>\n",
|
1115 |
+
" <td>122.00</td>\n",
|
1116 |
+
" </tr>\n",
|
1117 |
+
" <tr>\n",
|
1118 |
+
" <th>skin</th>\n",
|
1119 |
+
" <td>768.0</td>\n",
|
1120 |
+
" <td>20.536458</td>\n",
|
1121 |
+
" <td>15.952218</td>\n",
|
1122 |
+
" <td>0.000</td>\n",
|
1123 |
+
" <td>0.00000</td>\n",
|
1124 |
+
" <td>23.0000</td>\n",
|
1125 |
+
" <td>32.00000</td>\n",
|
1126 |
+
" <td>99.00</td>\n",
|
1127 |
+
" </tr>\n",
|
1128 |
+
" <tr>\n",
|
1129 |
+
" <th>test</th>\n",
|
1130 |
+
" <td>768.0</td>\n",
|
1131 |
+
" <td>79.799479</td>\n",
|
1132 |
+
" <td>115.244002</td>\n",
|
1133 |
+
" <td>0.000</td>\n",
|
1134 |
+
" <td>0.00000</td>\n",
|
1135 |
+
" <td>30.5000</td>\n",
|
1136 |
+
" <td>127.25000</td>\n",
|
1137 |
+
" <td>846.00</td>\n",
|
1138 |
+
" </tr>\n",
|
1139 |
+
" <tr>\n",
|
1140 |
+
" <th>mass</th>\n",
|
1141 |
+
" <td>768.0</td>\n",
|
1142 |
+
" <td>31.992578</td>\n",
|
1143 |
+
" <td>7.884160</td>\n",
|
1144 |
+
" <td>0.000</td>\n",
|
1145 |
+
" <td>27.30000</td>\n",
|
1146 |
+
" <td>32.0000</td>\n",
|
1147 |
+
" <td>36.60000</td>\n",
|
1148 |
+
" <td>67.10</td>\n",
|
1149 |
+
" </tr>\n",
|
1150 |
+
" <tr>\n",
|
1151 |
+
" <th>pedi</th>\n",
|
1152 |
+
" <td>768.0</td>\n",
|
1153 |
+
" <td>0.471876</td>\n",
|
1154 |
+
" <td>0.331329</td>\n",
|
1155 |
+
" <td>0.078</td>\n",
|
1156 |
+
" <td>0.24375</td>\n",
|
1157 |
+
" <td>0.3725</td>\n",
|
1158 |
+
" <td>0.62625</td>\n",
|
1159 |
+
" <td>2.42</td>\n",
|
1160 |
+
" </tr>\n",
|
1161 |
+
" <tr>\n",
|
1162 |
+
" <th>age</th>\n",
|
1163 |
+
" <td>768.0</td>\n",
|
1164 |
+
" <td>33.240885</td>\n",
|
1165 |
+
" <td>11.760232</td>\n",
|
1166 |
+
" <td>21.000</td>\n",
|
1167 |
+
" <td>24.00000</td>\n",
|
1168 |
+
" <td>29.0000</td>\n",
|
1169 |
+
" <td>41.00000</td>\n",
|
1170 |
+
" <td>81.00</td>\n",
|
1171 |
+
" </tr>\n",
|
1172 |
+
" <tr>\n",
|
1173 |
+
" <th>class</th>\n",
|
1174 |
+
" <td>768.0</td>\n",
|
1175 |
+
" <td>0.348958</td>\n",
|
1176 |
+
" <td>0.476951</td>\n",
|
1177 |
+
" <td>0.000</td>\n",
|
1178 |
+
" <td>0.00000</td>\n",
|
1179 |
+
" <td>0.0000</td>\n",
|
1180 |
+
" <td>1.00000</td>\n",
|
1181 |
+
" <td>1.00</td>\n",
|
1182 |
+
" </tr>\n",
|
1183 |
+
" </tbody>\n",
|
1184 |
+
"</table>\n",
|
1185 |
+
"</div>"
|
1186 |
+
],
|
1187 |
+
"text/plain": [
|
1188 |
+
" count mean std min 25% 50% 75% \\\n",
|
1189 |
+
"preg 768.0 3.845052 3.369578 0.000 1.00000 3.0000 6.00000 \n",
|
1190 |
+
"plas 768.0 120.894531 31.972618 0.000 99.00000 117.0000 140.25000 \n",
|
1191 |
+
"pres 768.0 69.105469 19.355807 0.000 62.00000 72.0000 80.00000 \n",
|
1192 |
+
"skin 768.0 20.536458 15.952218 0.000 0.00000 23.0000 32.00000 \n",
|
1193 |
+
"test 768.0 79.799479 115.244002 0.000 0.00000 30.5000 127.25000 \n",
|
1194 |
+
"mass 768.0 31.992578 7.884160 0.000 27.30000 32.0000 36.60000 \n",
|
1195 |
+
"pedi 768.0 0.471876 0.331329 0.078 0.24375 0.3725 0.62625 \n",
|
1196 |
+
"age 768.0 33.240885 11.760232 21.000 24.00000 29.0000 41.00000 \n",
|
1197 |
+
"class 768.0 0.348958 0.476951 0.000 0.00000 0.0000 1.00000 \n",
|
1198 |
+
"\n",
|
1199 |
+
" max \n",
|
1200 |
+
"preg 17.00 \n",
|
1201 |
+
"plas 199.00 \n",
|
1202 |
+
"pres 122.00 \n",
|
1203 |
+
"skin 99.00 \n",
|
1204 |
+
"test 846.00 \n",
|
1205 |
+
"mass 67.10 \n",
|
1206 |
+
"pedi 2.42 \n",
|
1207 |
+
"age 81.00 \n",
|
1208 |
+
"class 1.00 "
|
1209 |
+
]
|
1210 |
+
},
|
1211 |
+
"execution_count": 12,
|
1212 |
+
"metadata": {},
|
1213 |
+
"output_type": "execute_result"
|
1214 |
+
}
|
1215 |
+
],
|
1216 |
+
"source": [
|
1217 |
+
"data.describe().transpose()"
|
1218 |
+
]
|
1219 |
+
},
|
1220 |
+
{
|
1221 |
+
"cell_type": "code",
|
1222 |
+
"execution_count": 13,
|
1223 |
+
"metadata": {},
|
1224 |
+
"outputs": [
|
1225 |
+
{
|
1226 |
+
"data": {
|
1227 |
+
"image/png": "\n",
|
1228 |
+
"text/plain": [
|
1229 |
+
"<Figure size 864x2160 with 28 Axes>"
|
1230 |
+
]
|
1231 |
+
},
|
1232 |
+
"metadata": {
|
1233 |
+
"needs_background": "light"
|
1234 |
+
},
|
1235 |
+
"output_type": "display_data"
|
1236 |
+
}
|
1237 |
+
],
|
1238 |
+
"source": [
|
1239 |
+
"data[columns].hist(stacked=False, bins=100, figsize=(12,30), layout=(14,2));"
|
1240 |
+
]
|
1241 |
+
},
|
1242 |
+
{
|
1243 |
+
"cell_type": "code",
|
1244 |
+
"execution_count": 14,
|
1245 |
+
"metadata": {},
|
1246 |
+
"outputs": [
|
1247 |
+
{
|
1248 |
+
"data": {
|
1249 |
+
"text/html": [
|
1250 |
+
"<div>\n",
|
1251 |
+
"<style scoped>\n",
|
1252 |
+
" .dataframe tbody tr th:only-of-type {\n",
|
1253 |
+
" vertical-align: middle;\n",
|
1254 |
+
" }\n",
|
1255 |
+
"\n",
|
1256 |
+
" .dataframe tbody tr th {\n",
|
1257 |
+
" vertical-align: top;\n",
|
1258 |
+
" }\n",
|
1259 |
+
"\n",
|
1260 |
+
" .dataframe thead th {\n",
|
1261 |
+
" text-align: right;\n",
|
1262 |
+
" }\n",
|
1263 |
+
"</style>\n",
|
1264 |
+
"<table border=\"1\" class=\"dataframe\">\n",
|
1265 |
+
" <thead>\n",
|
1266 |
+
" <tr style=\"text-align: right;\">\n",
|
1267 |
+
" <th></th>\n",
|
1268 |
+
" <th>preg</th>\n",
|
1269 |
+
" <th>plas</th>\n",
|
1270 |
+
" <th>pres</th>\n",
|
1271 |
+
" <th>skin</th>\n",
|
1272 |
+
" <th>test</th>\n",
|
1273 |
+
" <th>mass</th>\n",
|
1274 |
+
" <th>pedi</th>\n",
|
1275 |
+
" <th>age</th>\n",
|
1276 |
+
" </tr>\n",
|
1277 |
+
" </thead>\n",
|
1278 |
+
" <tbody>\n",
|
1279 |
+
" <tr>\n",
|
1280 |
+
" <th>0</th>\n",
|
1281 |
+
" <td>6</td>\n",
|
1282 |
+
" <td>148</td>\n",
|
1283 |
+
" <td>72</td>\n",
|
1284 |
+
" <td>35</td>\n",
|
1285 |
+
" <td>0</td>\n",
|
1286 |
+
" <td>33.6</td>\n",
|
1287 |
+
" <td>0.627</td>\n",
|
1288 |
+
" <td>50</td>\n",
|
1289 |
+
" </tr>\n",
|
1290 |
+
" <tr>\n",
|
1291 |
+
" <th>1</th>\n",
|
1292 |
+
" <td>1</td>\n",
|
1293 |
+
" <td>85</td>\n",
|
1294 |
+
" <td>66</td>\n",
|
1295 |
+
" <td>29</td>\n",
|
1296 |
+
" <td>0</td>\n",
|
1297 |
+
" <td>26.6</td>\n",
|
1298 |
+
" <td>0.351</td>\n",
|
1299 |
+
" <td>31</td>\n",
|
1300 |
+
" </tr>\n",
|
1301 |
+
" <tr>\n",
|
1302 |
+
" <th>2</th>\n",
|
1303 |
+
" <td>8</td>\n",
|
1304 |
+
" <td>183</td>\n",
|
1305 |
+
" <td>64</td>\n",
|
1306 |
+
" <td>0</td>\n",
|
1307 |
+
" <td>0</td>\n",
|
1308 |
+
" <td>23.3</td>\n",
|
1309 |
+
" <td>0.672</td>\n",
|
1310 |
+
" <td>32</td>\n",
|
1311 |
+
" </tr>\n",
|
1312 |
+
" <tr>\n",
|
1313 |
+
" <th>3</th>\n",
|
1314 |
+
" <td>1</td>\n",
|
1315 |
+
" <td>89</td>\n",
|
1316 |
+
" <td>66</td>\n",
|
1317 |
+
" <td>23</td>\n",
|
1318 |
+
" <td>94</td>\n",
|
1319 |
+
" <td>28.1</td>\n",
|
1320 |
+
" <td>0.167</td>\n",
|
1321 |
+
" <td>21</td>\n",
|
1322 |
+
" </tr>\n",
|
1323 |
+
" <tr>\n",
|
1324 |
+
" <th>4</th>\n",
|
1325 |
+
" <td>0</td>\n",
|
1326 |
+
" <td>137</td>\n",
|
1327 |
+
" <td>40</td>\n",
|
1328 |
+
" <td>35</td>\n",
|
1329 |
+
" <td>168</td>\n",
|
1330 |
+
" <td>43.1</td>\n",
|
1331 |
+
" <td>2.288</td>\n",
|
1332 |
+
" <td>33</td>\n",
|
1333 |
+
" </tr>\n",
|
1334 |
+
" </tbody>\n",
|
1335 |
+
"</table>\n",
|
1336 |
+
"</div>"
|
1337 |
+
],
|
1338 |
+
"text/plain": [
|
1339 |
+
" preg plas pres skin test mass pedi age\n",
|
1340 |
+
"0 6 148 72 35 0 33.6 0.627 50\n",
|
1341 |
+
"1 1 85 66 29 0 26.6 0.351 31\n",
|
1342 |
+
"2 8 183 64 0 0 23.3 0.672 32\n",
|
1343 |
+
"3 1 89 66 23 94 28.1 0.167 21\n",
|
1344 |
+
"4 0 137 40 35 168 43.1 2.288 33"
|
1345 |
+
]
|
1346 |
+
},
|
1347 |
+
"execution_count": 14,
|
1348 |
+
"metadata": {},
|
1349 |
+
"output_type": "execute_result"
|
1350 |
+
}
|
1351 |
+
],
|
1352 |
+
"source": [
|
1353 |
+
"#The data set contains a Class attribute. \n",
|
1354 |
+
"#This is an indicator variable that is non-descriptive and only indicates if the \n",
|
1355 |
+
"# descriptive data indicates a particular event\n",
|
1356 |
+
"\n",
|
1357 |
+
"#Let's separate the data into 2 dataframes.\n",
|
1358 |
+
"# - The first will contain the descriptive attributes\n",
|
1359 |
+
"# - The second will contain the indicator attribute\n",
|
1360 |
+
"\n",
|
1361 |
+
"#Create a new dataframe (X) to contain the descriptive attributes, dropping the indicator attribute\n",
|
1362 |
+
"X = data.drop('class', axis=1)\n",
|
1363 |
+
"\n",
|
1364 |
+
"#Create a new Series (Y) to only contain the indicator attribute\n",
|
1365 |
+
"Y = data['class']\n",
|
1366 |
+
"X.head()"
|
1367 |
+
]
|
1368 |
+
},
|
1369 |
+
{
|
1370 |
+
"cell_type": "code",
|
1371 |
+
"execution_count": 15,
|
1372 |
+
"metadata": {},
|
1373 |
+
"outputs": [
|
1374 |
+
{
|
1375 |
+
"data": {
|
1376 |
+
"text/html": [
|
1377 |
+
"<div>\n",
|
1378 |
+
"<style scoped>\n",
|
1379 |
+
" .dataframe tbody tr th:only-of-type {\n",
|
1380 |
+
" vertical-align: middle;\n",
|
1381 |
+
" }\n",
|
1382 |
+
"\n",
|
1383 |
+
" .dataframe tbody tr th {\n",
|
1384 |
+
" vertical-align: top;\n",
|
1385 |
+
" }\n",
|
1386 |
+
"\n",
|
1387 |
+
" .dataframe thead th {\n",
|
1388 |
+
" text-align: right;\n",
|
1389 |
+
" }\n",
|
1390 |
+
"</style>\n",
|
1391 |
+
"<table border=\"1\" class=\"dataframe\">\n",
|
1392 |
+
" <thead>\n",
|
1393 |
+
" <tr style=\"text-align: right;\">\n",
|
1394 |
+
" <th></th>\n",
|
1395 |
+
" <th>preg</th>\n",
|
1396 |
+
" <th>plas</th>\n",
|
1397 |
+
" <th>pres</th>\n",
|
1398 |
+
" <th>skin</th>\n",
|
1399 |
+
" <th>test</th>\n",
|
1400 |
+
" <th>mass</th>\n",
|
1401 |
+
" <th>pedi</th>\n",
|
1402 |
+
" <th>age</th>\n",
|
1403 |
+
" </tr>\n",
|
1404 |
+
" </thead>\n",
|
1405 |
+
" <tbody>\n",
|
1406 |
+
" <tr>\n",
|
1407 |
+
" <th>0</th>\n",
|
1408 |
+
" <td>0.352941</td>\n",
|
1409 |
+
" <td>0.743719</td>\n",
|
1410 |
+
" <td>0.590164</td>\n",
|
1411 |
+
" <td>0.353535</td>\n",
|
1412 |
+
" <td>0.000000</td>\n",
|
1413 |
+
" <td>0.500745</td>\n",
|
1414 |
+
" <td>0.234415</td>\n",
|
1415 |
+
" <td>0.483333</td>\n",
|
1416 |
+
" </tr>\n",
|
1417 |
+
" <tr>\n",
|
1418 |
+
" <th>1</th>\n",
|
1419 |
+
" <td>0.058824</td>\n",
|
1420 |
+
" <td>0.427136</td>\n",
|
1421 |
+
" <td>0.540984</td>\n",
|
1422 |
+
" <td>0.292929</td>\n",
|
1423 |
+
" <td>0.000000</td>\n",
|
1424 |
+
" <td>0.396423</td>\n",
|
1425 |
+
" <td>0.116567</td>\n",
|
1426 |
+
" <td>0.166667</td>\n",
|
1427 |
+
" </tr>\n",
|
1428 |
+
" <tr>\n",
|
1429 |
+
" <th>2</th>\n",
|
1430 |
+
" <td>0.470588</td>\n",
|
1431 |
+
" <td>0.919598</td>\n",
|
1432 |
+
" <td>0.524590</td>\n",
|
1433 |
+
" <td>0.000000</td>\n",
|
1434 |
+
" <td>0.000000</td>\n",
|
1435 |
+
" <td>0.347243</td>\n",
|
1436 |
+
" <td>0.253629</td>\n",
|
1437 |
+
" <td>0.183333</td>\n",
|
1438 |
+
" </tr>\n",
|
1439 |
+
" <tr>\n",
|
1440 |
+
" <th>3</th>\n",
|
1441 |
+
" <td>0.058824</td>\n",
|
1442 |
+
" <td>0.447236</td>\n",
|
1443 |
+
" <td>0.540984</td>\n",
|
1444 |
+
" <td>0.232323</td>\n",
|
1445 |
+
" <td>0.111111</td>\n",
|
1446 |
+
" <td>0.418778</td>\n",
|
1447 |
+
" <td>0.038002</td>\n",
|
1448 |
+
" <td>0.000000</td>\n",
|
1449 |
+
" </tr>\n",
|
1450 |
+
" <tr>\n",
|
1451 |
+
" <th>4</th>\n",
|
1452 |
+
" <td>0.000000</td>\n",
|
1453 |
+
" <td>0.688442</td>\n",
|
1454 |
+
" <td>0.327869</td>\n",
|
1455 |
+
" <td>0.353535</td>\n",
|
1456 |
+
" <td>0.198582</td>\n",
|
1457 |
+
" <td>0.642325</td>\n",
|
1458 |
+
" <td>0.943638</td>\n",
|
1459 |
+
" <td>0.200000</td>\n",
|
1460 |
+
" </tr>\n",
|
1461 |
+
" </tbody>\n",
|
1462 |
+
"</table>\n",
|
1463 |
+
"</div>"
|
1464 |
+
],
|
1465 |
+
"text/plain": [
|
1466 |
+
" preg plas pres skin test mass pedi \\\n",
|
1467 |
+
"0 0.352941 0.743719 0.590164 0.353535 0.000000 0.500745 0.234415 \n",
|
1468 |
+
"1 0.058824 0.427136 0.540984 0.292929 0.000000 0.396423 0.116567 \n",
|
1469 |
+
"2 0.470588 0.919598 0.524590 0.000000 0.000000 0.347243 0.253629 \n",
|
1470 |
+
"3 0.058824 0.447236 0.540984 0.232323 0.111111 0.418778 0.038002 \n",
|
1471 |
+
"4 0.000000 0.688442 0.327869 0.353535 0.198582 0.642325 0.943638 \n",
|
1472 |
+
"\n",
|
1473 |
+
" age \n",
|
1474 |
+
"0 0.483333 \n",
|
1475 |
+
"1 0.166667 \n",
|
1476 |
+
"2 0.183333 \n",
|
1477 |
+
"3 0.000000 \n",
|
1478 |
+
"4 0.200000 "
|
1479 |
+
]
|
1480 |
+
},
|
1481 |
+
"execution_count": 15,
|
1482 |
+
"metadata": {},
|
1483 |
+
"output_type": "execute_result"
|
1484 |
+
}
|
1485 |
+
],
|
1486 |
+
"source": [
|
1487 |
+
"from sklearn.preprocessing import MinMaxScaler\n",
|
1488 |
+
"X_copy = X.copy() #We create a copy so we can still refer to the original dataframe later\n",
|
1489 |
+
"scaler = MinMaxScaler()\n",
|
1490 |
+
"#Create list of Columns to transform/scale\n",
|
1491 |
+
"X_columns = X.columns\n",
|
1492 |
+
"#Create a new dataframe\n",
|
1493 |
+
"X_scaled = pd.DataFrame(scaler.fit_transform(X_copy), columns=X_columns)\n",
|
1494 |
+
"X_scaled.head()"
|
1495 |
+
]
|
1496 |
+
},
|
1497 |
+
{
|
1498 |
+
"cell_type": "code",
|
1499 |
+
"execution_count": 16,
|
1500 |
+
"metadata": {},
|
1501 |
+
"outputs": [
|
1502 |
+
{
|
1503 |
+
"data": {
|
1504 |
+
"text/html": [
|
1505 |
+
"<div>\n",
|
1506 |
+
"<style scoped>\n",
|
1507 |
+
" .dataframe tbody tr th:only-of-type {\n",
|
1508 |
+
" vertical-align: middle;\n",
|
1509 |
+
" }\n",
|
1510 |
+
"\n",
|
1511 |
+
" .dataframe tbody tr th {\n",
|
1512 |
+
" vertical-align: top;\n",
|
1513 |
+
" }\n",
|
1514 |
+
"\n",
|
1515 |
+
" .dataframe thead th {\n",
|
1516 |
+
" text-align: right;\n",
|
1517 |
+
" }\n",
|
1518 |
+
"</style>\n",
|
1519 |
+
"<table border=\"1\" class=\"dataframe\">\n",
|
1520 |
+
" <thead>\n",
|
1521 |
+
" <tr style=\"text-align: right;\">\n",
|
1522 |
+
" <th></th>\n",
|
1523 |
+
" <th>preg</th>\n",
|
1524 |
+
" <th>plas</th>\n",
|
1525 |
+
" <th>pres</th>\n",
|
1526 |
+
" <th>skin</th>\n",
|
1527 |
+
" <th>test</th>\n",
|
1528 |
+
" <th>mass</th>\n",
|
1529 |
+
" <th>pedi</th>\n",
|
1530 |
+
" <th>age</th>\n",
|
1531 |
+
" <th>class</th>\n",
|
1532 |
+
" </tr>\n",
|
1533 |
+
" </thead>\n",
|
1534 |
+
" <tbody>\n",
|
1535 |
+
" <tr>\n",
|
1536 |
+
" <th>count</th>\n",
|
1537 |
+
" <td>768.000000</td>\n",
|
1538 |
+
" <td>768.000000</td>\n",
|
1539 |
+
" <td>768.000000</td>\n",
|
1540 |
+
" <td>768.000000</td>\n",
|
1541 |
+
" <td>768.000000</td>\n",
|
1542 |
+
" <td>768.000000</td>\n",
|
1543 |
+
" <td>768.000000</td>\n",
|
1544 |
+
" <td>768.000000</td>\n",
|
1545 |
+
" <td>768.000000</td>\n",
|
1546 |
+
" </tr>\n",
|
1547 |
+
" <tr>\n",
|
1548 |
+
" <th>mean</th>\n",
|
1549 |
+
" <td>3.845052</td>\n",
|
1550 |
+
" <td>120.894531</td>\n",
|
1551 |
+
" <td>69.105469</td>\n",
|
1552 |
+
" <td>20.536458</td>\n",
|
1553 |
+
" <td>79.799479</td>\n",
|
1554 |
+
" <td>31.992578</td>\n",
|
1555 |
+
" <td>0.471876</td>\n",
|
1556 |
+
" <td>33.240885</td>\n",
|
1557 |
+
" <td>0.348958</td>\n",
|
1558 |
+
" </tr>\n",
|
1559 |
+
" <tr>\n",
|
1560 |
+
" <th>std</th>\n",
|
1561 |
+
" <td>3.369578</td>\n",
|
1562 |
+
" <td>31.972618</td>\n",
|
1563 |
+
" <td>19.355807</td>\n",
|
1564 |
+
" <td>15.952218</td>\n",
|
1565 |
+
" <td>115.244002</td>\n",
|
1566 |
+
" <td>7.884160</td>\n",
|
1567 |
+
" <td>0.331329</td>\n",
|
1568 |
+
" <td>11.760232</td>\n",
|
1569 |
+
" <td>0.476951</td>\n",
|
1570 |
+
" </tr>\n",
|
1571 |
+
" <tr>\n",
|
1572 |
+
" <th>min</th>\n",
|
1573 |
+
" <td>0.000000</td>\n",
|
1574 |
+
" <td>0.000000</td>\n",
|
1575 |
+
" <td>0.000000</td>\n",
|
1576 |
+
" <td>0.000000</td>\n",
|
1577 |
+
" <td>0.000000</td>\n",
|
1578 |
+
" <td>0.000000</td>\n",
|
1579 |
+
" <td>0.078000</td>\n",
|
1580 |
+
" <td>21.000000</td>\n",
|
1581 |
+
" <td>0.000000</td>\n",
|
1582 |
+
" </tr>\n",
|
1583 |
+
" <tr>\n",
|
1584 |
+
" <th>25%</th>\n",
|
1585 |
+
" <td>1.000000</td>\n",
|
1586 |
+
" <td>99.000000</td>\n",
|
1587 |
+
" <td>62.000000</td>\n",
|
1588 |
+
" <td>0.000000</td>\n",
|
1589 |
+
" <td>0.000000</td>\n",
|
1590 |
+
" <td>27.300000</td>\n",
|
1591 |
+
" <td>0.243750</td>\n",
|
1592 |
+
" <td>24.000000</td>\n",
|
1593 |
+
" <td>0.000000</td>\n",
|
1594 |
+
" </tr>\n",
|
1595 |
+
" <tr>\n",
|
1596 |
+
" <th>50%</th>\n",
|
1597 |
+
" <td>3.000000</td>\n",
|
1598 |
+
" <td>117.000000</td>\n",
|
1599 |
+
" <td>72.000000</td>\n",
|
1600 |
+
" <td>23.000000</td>\n",
|
1601 |
+
" <td>30.500000</td>\n",
|
1602 |
+
" <td>32.000000</td>\n",
|
1603 |
+
" <td>0.372500</td>\n",
|
1604 |
+
" <td>29.000000</td>\n",
|
1605 |
+
" <td>0.000000</td>\n",
|
1606 |
+
" </tr>\n",
|
1607 |
+
" <tr>\n",
|
1608 |
+
" <th>75%</th>\n",
|
1609 |
+
" <td>6.000000</td>\n",
|
1610 |
+
" <td>140.250000</td>\n",
|
1611 |
+
" <td>80.000000</td>\n",
|
1612 |
+
" <td>32.000000</td>\n",
|
1613 |
+
" <td>127.250000</td>\n",
|
1614 |
+
" <td>36.600000</td>\n",
|
1615 |
+
" <td>0.626250</td>\n",
|
1616 |
+
" <td>41.000000</td>\n",
|
1617 |
+
" <td>1.000000</td>\n",
|
1618 |
+
" </tr>\n",
|
1619 |
+
" <tr>\n",
|
1620 |
+
" <th>max</th>\n",
|
1621 |
+
" <td>17.000000</td>\n",
|
1622 |
+
" <td>199.000000</td>\n",
|
1623 |
+
" <td>122.000000</td>\n",
|
1624 |
+
" <td>99.000000</td>\n",
|
1625 |
+
" <td>846.000000</td>\n",
|
1626 |
+
" <td>67.100000</td>\n",
|
1627 |
+
" <td>2.420000</td>\n",
|
1628 |
+
" <td>81.000000</td>\n",
|
1629 |
+
" <td>1.000000</td>\n",
|
1630 |
+
" </tr>\n",
|
1631 |
+
" </tbody>\n",
|
1632 |
+
"</table>\n",
|
1633 |
+
"</div>"
|
1634 |
+
],
|
1635 |
+
"text/plain": [
|
1636 |
+
" preg plas pres skin test mass \\\n",
|
1637 |
+
"count 768.000000 768.000000 768.000000 768.000000 768.000000 768.000000 \n",
|
1638 |
+
"mean 3.845052 120.894531 69.105469 20.536458 79.799479 31.992578 \n",
|
1639 |
+
"std 3.369578 31.972618 19.355807 15.952218 115.244002 7.884160 \n",
|
1640 |
+
"min 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 \n",
|
1641 |
+
"25% 1.000000 99.000000 62.000000 0.000000 0.000000 27.300000 \n",
|
1642 |
+
"50% 3.000000 117.000000 72.000000 23.000000 30.500000 32.000000 \n",
|
1643 |
+
"75% 6.000000 140.250000 80.000000 32.000000 127.250000 36.600000 \n",
|
1644 |
+
"max 17.000000 199.000000 122.000000 99.000000 846.000000 67.100000 \n",
|
1645 |
+
"\n",
|
1646 |
+
" pedi age class \n",
|
1647 |
+
"count 768.000000 768.000000 768.000000 \n",
|
1648 |
+
"mean 0.471876 33.240885 0.348958 \n",
|
1649 |
+
"std 0.331329 11.760232 0.476951 \n",
|
1650 |
+
"min 0.078000 21.000000 0.000000 \n",
|
1651 |
+
"25% 0.243750 24.000000 0.000000 \n",
|
1652 |
+
"50% 0.372500 29.000000 0.000000 \n",
|
1653 |
+
"75% 0.626250 41.000000 1.000000 \n",
|
1654 |
+
"max 2.420000 81.000000 1.000000 "
|
1655 |
+
]
|
1656 |
+
},
|
1657 |
+
"execution_count": 16,
|
1658 |
+
"metadata": {},
|
1659 |
+
"output_type": "execute_result"
|
1660 |
+
}
|
1661 |
+
],
|
1662 |
+
"source": [
|
1663 |
+
"data.describe()"
|
1664 |
+
]
|
1665 |
+
},
|
1666 |
+
{
|
1667 |
+
"cell_type": "code",
|
1668 |
+
"execution_count": 17,
|
1669 |
+
"metadata": {},
|
1670 |
+
"outputs": [
|
1671 |
+
{
|
1672 |
+
"data": {
|
1673 |
+
"text/html": [
|
1674 |
+
"<div>\n",
|
1675 |
+
"<style scoped>\n",
|
1676 |
+
" .dataframe tbody tr th:only-of-type {\n",
|
1677 |
+
" vertical-align: middle;\n",
|
1678 |
+
" }\n",
|
1679 |
+
"\n",
|
1680 |
+
" .dataframe tbody tr th {\n",
|
1681 |
+
" vertical-align: top;\n",
|
1682 |
+
" }\n",
|
1683 |
+
"\n",
|
1684 |
+
" .dataframe thead th {\n",
|
1685 |
+
" text-align: right;\n",
|
1686 |
+
" }\n",
|
1687 |
+
"</style>\n",
|
1688 |
+
"<table border=\"1\" class=\"dataframe\">\n",
|
1689 |
+
" <thead>\n",
|
1690 |
+
" <tr style=\"text-align: right;\">\n",
|
1691 |
+
" <th></th>\n",
|
1692 |
+
" <th>preg</th>\n",
|
1693 |
+
" <th>plas</th>\n",
|
1694 |
+
" <th>pres</th>\n",
|
1695 |
+
" <th>skin</th>\n",
|
1696 |
+
" <th>test</th>\n",
|
1697 |
+
" <th>mass</th>\n",
|
1698 |
+
" <th>pedi</th>\n",
|
1699 |
+
" <th>age</th>\n",
|
1700 |
+
" </tr>\n",
|
1701 |
+
" </thead>\n",
|
1702 |
+
" <tbody>\n",
|
1703 |
+
" <tr>\n",
|
1704 |
+
" <th>count</th>\n",
|
1705 |
+
" <td>768.000000</td>\n",
|
1706 |
+
" <td>768.000000</td>\n",
|
1707 |
+
" <td>768.000000</td>\n",
|
1708 |
+
" <td>768.000000</td>\n",
|
1709 |
+
" <td>768.000000</td>\n",
|
1710 |
+
" <td>768.000000</td>\n",
|
1711 |
+
" <td>768.000000</td>\n",
|
1712 |
+
" <td>768.000000</td>\n",
|
1713 |
+
" </tr>\n",
|
1714 |
+
" <tr>\n",
|
1715 |
+
" <th>mean</th>\n",
|
1716 |
+
" <td>0.226180</td>\n",
|
1717 |
+
" <td>0.607510</td>\n",
|
1718 |
+
" <td>0.566438</td>\n",
|
1719 |
+
" <td>0.207439</td>\n",
|
1720 |
+
" <td>0.094326</td>\n",
|
1721 |
+
" <td>0.476790</td>\n",
|
1722 |
+
" <td>0.168179</td>\n",
|
1723 |
+
" <td>0.204015</td>\n",
|
1724 |
+
" </tr>\n",
|
1725 |
+
" <tr>\n",
|
1726 |
+
" <th>std</th>\n",
|
1727 |
+
" <td>0.198210</td>\n",
|
1728 |
+
" <td>0.160666</td>\n",
|
1729 |
+
" <td>0.158654</td>\n",
|
1730 |
+
" <td>0.161134</td>\n",
|
1731 |
+
" <td>0.136222</td>\n",
|
1732 |
+
" <td>0.117499</td>\n",
|
1733 |
+
" <td>0.141473</td>\n",
|
1734 |
+
" <td>0.196004</td>\n",
|
1735 |
+
" </tr>\n",
|
1736 |
+
" <tr>\n",
|
1737 |
+
" <th>min</th>\n",
|
1738 |
+
" <td>0.000000</td>\n",
|
1739 |
+
" <td>0.000000</td>\n",
|
1740 |
+
" <td>0.000000</td>\n",
|
1741 |
+
" <td>0.000000</td>\n",
|
1742 |
+
" <td>0.000000</td>\n",
|
1743 |
+
" <td>0.000000</td>\n",
|
1744 |
+
" <td>0.000000</td>\n",
|
1745 |
+
" <td>0.000000</td>\n",
|
1746 |
+
" </tr>\n",
|
1747 |
+
" <tr>\n",
|
1748 |
+
" <th>25%</th>\n",
|
1749 |
+
" <td>0.058824</td>\n",
|
1750 |
+
" <td>0.497487</td>\n",
|
1751 |
+
" <td>0.508197</td>\n",
|
1752 |
+
" <td>0.000000</td>\n",
|
1753 |
+
" <td>0.000000</td>\n",
|
1754 |
+
" <td>0.406855</td>\n",
|
1755 |
+
" <td>0.070773</td>\n",
|
1756 |
+
" <td>0.050000</td>\n",
|
1757 |
+
" </tr>\n",
|
1758 |
+
" <tr>\n",
|
1759 |
+
" <th>50%</th>\n",
|
1760 |
+
" <td>0.176471</td>\n",
|
1761 |
+
" <td>0.587940</td>\n",
|
1762 |
+
" <td>0.590164</td>\n",
|
1763 |
+
" <td>0.232323</td>\n",
|
1764 |
+
" <td>0.036052</td>\n",
|
1765 |
+
" <td>0.476900</td>\n",
|
1766 |
+
" <td>0.125747</td>\n",
|
1767 |
+
" <td>0.133333</td>\n",
|
1768 |
+
" </tr>\n",
|
1769 |
+
" <tr>\n",
|
1770 |
+
" <th>75%</th>\n",
|
1771 |
+
" <td>0.352941</td>\n",
|
1772 |
+
" <td>0.704774</td>\n",
|
1773 |
+
" <td>0.655738</td>\n",
|
1774 |
+
" <td>0.323232</td>\n",
|
1775 |
+
" <td>0.150414</td>\n",
|
1776 |
+
" <td>0.545455</td>\n",
|
1777 |
+
" <td>0.234095</td>\n",
|
1778 |
+
" <td>0.333333</td>\n",
|
1779 |
+
" </tr>\n",
|
1780 |
+
" <tr>\n",
|
1781 |
+
" <th>max</th>\n",
|
1782 |
+
" <td>1.000000</td>\n",
|
1783 |
+
" <td>1.000000</td>\n",
|
1784 |
+
" <td>1.000000</td>\n",
|
1785 |
+
" <td>1.000000</td>\n",
|
1786 |
+
" <td>1.000000</td>\n",
|
1787 |
+
" <td>1.000000</td>\n",
|
1788 |
+
" <td>1.000000</td>\n",
|
1789 |
+
" <td>1.000000</td>\n",
|
1790 |
+
" </tr>\n",
|
1791 |
+
" </tbody>\n",
|
1792 |
+
"</table>\n",
|
1793 |
+
"</div>"
|
1794 |
+
],
|
1795 |
+
"text/plain": [
|
1796 |
+
" preg plas pres skin test mass \\\n",
|
1797 |
+
"count 768.000000 768.000000 768.000000 768.000000 768.000000 768.000000 \n",
|
1798 |
+
"mean 0.226180 0.607510 0.566438 0.207439 0.094326 0.476790 \n",
|
1799 |
+
"std 0.198210 0.160666 0.158654 0.161134 0.136222 0.117499 \n",
|
1800 |
+
"min 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 \n",
|
1801 |
+
"25% 0.058824 0.497487 0.508197 0.000000 0.000000 0.406855 \n",
|
1802 |
+
"50% 0.176471 0.587940 0.590164 0.232323 0.036052 0.476900 \n",
|
1803 |
+
"75% 0.352941 0.704774 0.655738 0.323232 0.150414 0.545455 \n",
|
1804 |
+
"max 1.000000 1.000000 1.000000 1.000000 1.000000 1.000000 \n",
|
1805 |
+
"\n",
|
1806 |
+
" pedi age \n",
|
1807 |
+
"count 768.000000 768.000000 \n",
|
1808 |
+
"mean 0.168179 0.204015 \n",
|
1809 |
+
"std 0.141473 0.196004 \n",
|
1810 |
+
"min 0.000000 0.000000 \n",
|
1811 |
+
"25% 0.070773 0.050000 \n",
|
1812 |
+
"50% 0.125747 0.133333 \n",
|
1813 |
+
"75% 0.234095 0.333333 \n",
|
1814 |
+
"max 1.000000 1.000000 "
|
1815 |
+
]
|
1816 |
+
},
|
1817 |
+
"execution_count": 17,
|
1818 |
+
"metadata": {},
|
1819 |
+
"output_type": "execute_result"
|
1820 |
+
}
|
1821 |
+
],
|
1822 |
+
"source": [
|
1823 |
+
"X_scaled.describe()"
|
1824 |
+
]
|
1825 |
+
},
|
1826 |
+
{
|
1827 |
+
"cell_type": "code",
|
1828 |
+
"execution_count": 18,
|
1829 |
+
"metadata": {},
|
1830 |
+
"outputs": [],
|
1831 |
+
"source": [
|
1832 |
+
"#Question: Add code (below) to create a new dataframe, where only the 'preg' and 'plas' attributes are transformed"
|
1833 |
+
]
|
1834 |
+
},
|
1835 |
+
{
|
1836 |
+
"cell_type": "code",
|
1837 |
+
"execution_count": 19,
|
1838 |
+
"metadata": {},
|
1839 |
+
"outputs": [
|
1840 |
+
{
|
1841 |
+
"data": {
|
1842 |
+
"text/html": [
|
1843 |
+
"<div>\n",
|
1844 |
+
"<style scoped>\n",
|
1845 |
+
" .dataframe tbody tr th:only-of-type {\n",
|
1846 |
+
" vertical-align: middle;\n",
|
1847 |
+
" }\n",
|
1848 |
+
"\n",
|
1849 |
+
" .dataframe tbody tr th {\n",
|
1850 |
+
" vertical-align: top;\n",
|
1851 |
+
" }\n",
|
1852 |
+
"\n",
|
1853 |
+
" .dataframe thead th {\n",
|
1854 |
+
" text-align: right;\n",
|
1855 |
+
" }\n",
|
1856 |
+
"</style>\n",
|
1857 |
+
"<table border=\"1\" class=\"dataframe\">\n",
|
1858 |
+
" <thead>\n",
|
1859 |
+
" <tr style=\"text-align: right;\">\n",
|
1860 |
+
" <th></th>\n",
|
1861 |
+
" <th>preg</th>\n",
|
1862 |
+
" <th>plas</th>\n",
|
1863 |
+
" <th>pres</th>\n",
|
1864 |
+
" <th>skin</th>\n",
|
1865 |
+
" <th>test</th>\n",
|
1866 |
+
" <th>mass</th>\n",
|
1867 |
+
" <th>pedi</th>\n",
|
1868 |
+
" <th>age</th>\n",
|
1869 |
+
" </tr>\n",
|
1870 |
+
" </thead>\n",
|
1871 |
+
" <tbody>\n",
|
1872 |
+
" <tr>\n",
|
1873 |
+
" <th>0</th>\n",
|
1874 |
+
" <td>0.352941</td>\n",
|
1875 |
+
" <td>0.743719</td>\n",
|
1876 |
+
" <td>72</td>\n",
|
1877 |
+
" <td>35</td>\n",
|
1878 |
+
" <td>0</td>\n",
|
1879 |
+
" <td>33.6</td>\n",
|
1880 |
+
" <td>0.627</td>\n",
|
1881 |
+
" <td>50</td>\n",
|
1882 |
+
" </tr>\n",
|
1883 |
+
" <tr>\n",
|
1884 |
+
" <th>1</th>\n",
|
1885 |
+
" <td>0.058824</td>\n",
|
1886 |
+
" <td>0.427136</td>\n",
|
1887 |
+
" <td>66</td>\n",
|
1888 |
+
" <td>29</td>\n",
|
1889 |
+
" <td>0</td>\n",
|
1890 |
+
" <td>26.6</td>\n",
|
1891 |
+
" <td>0.351</td>\n",
|
1892 |
+
" <td>31</td>\n",
|
1893 |
+
" </tr>\n",
|
1894 |
+
" <tr>\n",
|
1895 |
+
" <th>2</th>\n",
|
1896 |
+
" <td>0.470588</td>\n",
|
1897 |
+
" <td>0.919598</td>\n",
|
1898 |
+
" <td>64</td>\n",
|
1899 |
+
" <td>0</td>\n",
|
1900 |
+
" <td>0</td>\n",
|
1901 |
+
" <td>23.3</td>\n",
|
1902 |
+
" <td>0.672</td>\n",
|
1903 |
+
" <td>32</td>\n",
|
1904 |
+
" </tr>\n",
|
1905 |
+
" <tr>\n",
|
1906 |
+
" <th>3</th>\n",
|
1907 |
+
" <td>0.058824</td>\n",
|
1908 |
+
" <td>0.447236</td>\n",
|
1909 |
+
" <td>66</td>\n",
|
1910 |
+
" <td>23</td>\n",
|
1911 |
+
" <td>94</td>\n",
|
1912 |
+
" <td>28.1</td>\n",
|
1913 |
+
" <td>0.167</td>\n",
|
1914 |
+
" <td>21</td>\n",
|
1915 |
+
" </tr>\n",
|
1916 |
+
" <tr>\n",
|
1917 |
+
" <th>4</th>\n",
|
1918 |
+
" <td>0.000000</td>\n",
|
1919 |
+
" <td>0.688442</td>\n",
|
1920 |
+
" <td>40</td>\n",
|
1921 |
+
" <td>35</td>\n",
|
1922 |
+
" <td>168</td>\n",
|
1923 |
+
" <td>43.1</td>\n",
|
1924 |
+
" <td>2.288</td>\n",
|
1925 |
+
" <td>33</td>\n",
|
1926 |
+
" </tr>\n",
|
1927 |
+
" </tbody>\n",
|
1928 |
+
"</table>\n",
|
1929 |
+
"</div>"
|
1930 |
+
],
|
1931 |
+
"text/plain": [
|
1932 |
+
" preg plas pres skin test mass pedi age\n",
|
1933 |
+
"0 0.352941 0.743719 72 35 0 33.6 0.627 50\n",
|
1934 |
+
"1 0.058824 0.427136 66 29 0 26.6 0.351 31\n",
|
1935 |
+
"2 0.470588 0.919598 64 0 0 23.3 0.672 32\n",
|
1936 |
+
"3 0.058824 0.447236 66 23 94 28.1 0.167 21\n",
|
1937 |
+
"4 0.000000 0.688442 40 35 168 43.1 2.288 33"
|
1938 |
+
]
|
1939 |
+
},
|
1940 |
+
"execution_count": 19,
|
1941 |
+
"metadata": {},
|
1942 |
+
"output_type": "execute_result"
|
1943 |
+
}
|
1944 |
+
],
|
1945 |
+
"source": [
|
1946 |
+
"from sklearn.preprocessing import MinMaxScaler\n",
|
1947 |
+
"X_copy = X.copy()\n",
|
1948 |
+
"scaler = MinMaxScaler()\n",
|
1949 |
+
"X_copy[['preg', 'plas']] = scaler.fit_transform(X_copy[['preg', 'plas']])\n",
|
1950 |
+
"X_copy.head()"
|
1951 |
+
]
|
1952 |
+
},
|
1953 |
+
{
|
1954 |
+
"cell_type": "code",
|
1955 |
+
"execution_count": null,
|
1956 |
+
"metadata": {},
|
1957 |
+
"outputs": [],
|
1958 |
+
"source": []
|
1959 |
+
},
|
1960 |
+
{
|
1961 |
+
"cell_type": "code",
|
1962 |
+
"execution_count": null,
|
1963 |
+
"metadata": {},
|
1964 |
+
"outputs": [],
|
1965 |
+
"source": []
|
1966 |
+
},
|
1967 |
+
{
|
1968 |
+
"cell_type": "code",
|
1969 |
+
"execution_count": null,
|
1970 |
+
"metadata": {},
|
1971 |
+
"outputs": [],
|
1972 |
+
"source": []
|
1973 |
+
}
|
1974 |
+
],
|
1975 |
+
"metadata": {
|
1976 |
+
"kernelspec": {
|
1977 |
+
"display_name": "Python 3 (ipykernel)",
|
1978 |
+
"language": "python",
|
1979 |
+
"name": "python3"
|
1980 |
+
},
|
1981 |
+
"language_info": {
|
1982 |
+
"codemirror_mode": {
|
1983 |
+
"name": "ipython",
|
1984 |
+
"version": 3
|
1985 |
+
},
|
1986 |
+
"file_extension": ".py",
|
1987 |
+
"mimetype": "text/x-python",
|
1988 |
+
"name": "python",
|
1989 |
+
"nbconvert_exporter": "python",
|
1990 |
+
"pygments_lexer": "ipython3",
|
1991 |
+
"version": "3.12.9"
|
1992 |
+
}
|
1993 |
+
},
|
1994 |
+
"nbformat": 4,
|
1995 |
+
"nbformat_minor": 4
|
1996 |
+
}
|
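The scaled `preg` and `plas` values in the last executed cell above can be checked by hand. This is a minimal sketch of the min-max arithmetic in plain Python, not the notebook's own code. The column ranges (`preg` 0 to 17, `plas` 0 to 199) are assumptions inferred from the printed output (for example 6/17 ≈ 0.352941), and the helper name `min_max_scale` is hypothetical. The raw values are the first five rows of this dataset as shown by `data.head()`.

```python
def min_max_scale(value, col_min, col_max):
    """Rescale value to the [0, 1] range using (value - min) / (max - min)."""
    if col_max == col_min:
        # A constant column has no spread; return 0.0 instead of dividing by zero.
        return 0.0
    return (value - col_min) / (col_max - col_min)


# Assumed column ranges, inferred from the scaled output (not read from the CSV).
PREG_RANGE = (0, 17)
PLAS_RANGE = (0, 199)

# Raw (preg, plas) pairs for the first five records.
rows = [(6, 148), (1, 85), (8, 183), (1, 89), (0, 137)]

# Scale only these two attributes; every other column would stay untouched.
scaled = [
    (
        round(min_max_scale(preg, *PREG_RANGE), 6),
        round(min_max_scale(plas, *PLAS_RANGE), 6),
    )
    for preg, plas in rows
]

for pair in scaled:
    print(pair)
```

Under these assumed ranges, the printed pairs should line up with the `preg` and `plas` columns of `X_copy.head()`, while `pres`, `skin`, `test`, `mass`, `pedi` and `age` keep their original units because they were never passed to the scaler.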
Data Analitics/Week 4/TU257-Lab3-4-Correlation.ipynb
ADDED
The diff for this file is too large to render.
|
|
Data Analitics/Week 4/TU257-Lab3-5-Sampling-and-Unbalanced.ipynb
ADDED
@@ -0,0 +1,538 @@
|
1 |
+
{
|
2 |
+
"cells": [
|
3 |
+
{
|
4 |
+
"cell_type": "code",
|
5 |
+
"execution_count": 1,
|
6 |
+
"metadata": {},
|
7 |
+
"outputs": [
|
8 |
+
{
|
9 |
+
"data": {
|
10 |
+
"text/html": [
|
11 |
+
"<div>\n",
|
12 |
+
"<style scoped>\n",
|
13 |
+
" .dataframe tbody tr th:only-of-type {\n",
|
14 |
+
" vertical-align: middle;\n",
|
15 |
+
" }\n",
|
16 |
+
"\n",
|
17 |
+
" .dataframe tbody tr th {\n",
|
18 |
+
" vertical-align: top;\n",
|
19 |
+
" }\n",
|
20 |
+
"\n",
|
21 |
+
" .dataframe thead th {\n",
|
22 |
+
" text-align: right;\n",
|
23 |
+
" }\n",
|
24 |
+
"</style>\n",
|
25 |
+
"<table border=\"1\" class=\"dataframe\">\n",
|
26 |
+
" <thead>\n",
|
27 |
+
" <tr style=\"text-align: right;\">\n",
|
28 |
+
" <th></th>\n",
|
29 |
+
" <th>preg</th>\n",
|
30 |
+
" <th>plas</th>\n",
|
31 |
+
" <th>pres</th>\n",
|
32 |
+
" <th>skin</th>\n",
|
33 |
+
" <th>test</th>\n",
|
34 |
+
" <th>mass</th>\n",
|
35 |
+
" <th>pedi</th>\n",
|
36 |
+
" <th>age</th>\n",
|
37 |
+
" <th>class</th>\n",
|
38 |
+
" </tr>\n",
|
39 |
+
" </thead>\n",
|
40 |
+
" <tbody>\n",
|
41 |
+
" <tr>\n",
|
42 |
+
" <th>0</th>\n",
|
43 |
+
" <td>6</td>\n",
|
44 |
+
" <td>148</td>\n",
|
45 |
+
" <td>72</td>\n",
|
46 |
+
" <td>35</td>\n",
|
47 |
+
" <td>0</td>\n",
|
48 |
+
" <td>33.6</td>\n",
|
49 |
+
" <td>0.627</td>\n",
|
50 |
+
" <td>50</td>\n",
|
51 |
+
" <td>1</td>\n",
|
52 |
+
" </tr>\n",
|
53 |
+
" <tr>\n",
|
54 |
+
" <th>1</th>\n",
|
55 |
+
" <td>1</td>\n",
|
56 |
+
" <td>85</td>\n",
|
57 |
+
" <td>66</td>\n",
|
58 |
+
" <td>29</td>\n",
|
59 |
+
" <td>0</td>\n",
|
60 |
+
" <td>26.6</td>\n",
|
61 |
+
" <td>0.351</td>\n",
|
62 |
+
" <td>31</td>\n",
|
63 |
+
" <td>0</td>\n",
|
64 |
+
" </tr>\n",
|
65 |
+
" <tr>\n",
|
66 |
+
" <th>2</th>\n",
|
67 |
+
" <td>8</td>\n",
|
68 |
+
" <td>183</td>\n",
|
69 |
+
" <td>64</td>\n",
|
70 |
+
" <td>0</td>\n",
|
71 |
+
" <td>0</td>\n",
|
72 |
+
" <td>23.3</td>\n",
|
73 |
+
" <td>0.672</td>\n",
|
74 |
+
" <td>32</td>\n",
|
75 |
+
" <td>1</td>\n",
|
76 |
+
" </tr>\n",
|
77 |
+
" <tr>\n",
|
78 |
+
" <th>3</th>\n",
|
79 |
+
" <td>1</td>\n",
|
80 |
+
" <td>89</td>\n",
|
81 |
+
" <td>66</td>\n",
|
82 |
+
" <td>23</td>\n",
|
83 |
+
" <td>94</td>\n",
|
84 |
+
" <td>28.1</td>\n",
|
85 |
+
" <td>0.167</td>\n",
|
86 |
+
" <td>21</td>\n",
|
87 |
+
" <td>0</td>\n",
|
88 |
+
" </tr>\n",
|
89 |
+
" <tr>\n",
|
90 |
+
" <th>4</th>\n",
|
91 |
+
" <td>0</td>\n",
|
92 |
+
" <td>137</td>\n",
|
93 |
+
" <td>40</td>\n",
|
94 |
+
" <td>35</td>\n",
|
95 |
+
" <td>168</td>\n",
|
96 |
+
" <td>43.1</td>\n",
|
97 |
+
" <td>2.288</td>\n",
|
98 |
+
" <td>33</td>\n",
|
99 |
+
" <td>1</td>\n",
|
100 |
+
" </tr>\n",
|
101 |
+
" </tbody>\n",
|
102 |
+
"</table>\n",
|
103 |
+
"</div>"
|
104 |
+
],
|
105 |
+
"text/plain": [
|
106 |
+
" preg plas pres skin test mass pedi age class\n",
|
107 |
+
"0 6 148 72 35 0 33.6 0.627 50 1\n",
|
108 |
+
"1 1 85 66 29 0 26.6 0.351 31 0\n",
|
109 |
+
"2 8 183 64 0 0 23.3 0.672 32 1\n",
|
110 |
+
"3 1 89 66 23 94 28.1 0.167 21 0\n",
|
111 |
+
"4 0 137 40 35 168 43.1 2.288 33 1"
|
112 |
+
]
|
113 |
+
},
|
114 |
+
"execution_count": 1,
|
115 |
+
"metadata": {},
|
116 |
+
"output_type": "execute_result"
|
117 |
+
}
|
118 |
+
],
|
119 |
+
"source": [
|
120 |
+
"#Import the data set\n",
|
121 |
+
"\n",
|
122 |
+
"import pandas as pd\n",
|
123 |
+
"columns = ['preg', 'plas', 'pres', 'skin', 'test', 'mass', 'pedi', 'age', 'class']\n",
|
124 |
+
"data = pd.read_csv('/Users/brendan.tierney/Dropbox/4-Datasets/pima-indians-diabetes.csv', names=columns)\n",
|
125 |
+
"data.head()"
|
126 |
+
]
|
127 |
+
},
|
128 |
+
{
|
129 |
+
"cell_type": "code",
|
130 |
+
"execution_count": 2,
|
131 |
+
"metadata": {},
|
132 |
+
"outputs": [
|
133 |
+
{
|
134 |
+
"data": {
|
135 |
+
"text/plain": [
|
136 |
+
"(768, 9)"
|
137 |
+
]
|
138 |
+
},
|
139 |
+
"execution_count": 2,
|
140 |
+
"metadata": {},
|
141 |
+
"output_type": "execute_result"
|
142 |
+
}
|
143 |
+
],
|
144 |
+
"source": [
|
145 |
+
"data.shape"
|
146 |
+
]
|
147 |
+
},
|
148 |
+
{
|
149 |
+
"cell_type": "code",
|
150 |
+
"execution_count": 4,
|
151 |
+
"metadata": {},
|
152 |
+
"outputs": [
|
153 |
+
{
|
154 |
+
"data": {
|
155 |
+
"text/plain": [
|
156 |
+
"0 500\n",
|
157 |
+
"1 268\n",
|
158 |
+
"Name: class, dtype: int64"
|
159 |
+
]
|
160 |
+
},
|
161 |
+
"execution_count": 4,
|
162 |
+
"metadata": {},
|
163 |
+
"output_type": "execute_result"
|
164 |
+
}
|
165 |
+
],
|
166 |
+
"source": [
|
167 |
+
"data['class'].value_counts()"
|
168 |
+
]
|
169 |
+
},
|
170 |
+
{
|
171 |
+
"cell_type": "code",
|
172 |
+
"execution_count": 9,
|
173 |
+
"metadata": {},
|
174 |
+
"outputs": [
|
175 |
+
{
|
176 |
+
"data": {
|
177 |
+
"image/png": "(base64-encoded PNG omitted: bar chart titled 'Count (target)')",
|
178 |
+
"text/plain": [
|
179 |
+
"<Figure size 432x288 with 1 Axes>"
|
180 |
+
]
|
181 |
+
},
|
182 |
+
"metadata": {
|
183 |
+
"needs_background": "light"
|
184 |
+
},
|
185 |
+
"output_type": "display_data"
|
186 |
+
}
|
187 |
+
],
|
188 |
+
"source": [
|
189 |
+
"#print bar chart\n",
|
190 |
+
"data['class'].value_counts().plot(kind='bar', title='Count (target)');"
|
191 |
+
]
|
192 |
+
},
|
193 |
+
{
|
194 |
+
"cell_type": "markdown",
|
195 |
+
"metadata": {},
|
196 |
+
"source": [
|
197 |
+
"#### Down Sampling - Majority Class - Using Random Sampling"
|
198 |
+
]
|
199 |
+
},
|
200 |
+
{
|
201 |
+
"cell_type": "code",
|
202 |
+
"execution_count": 14,
|
203 |
+
"metadata": {},
|
204 |
+
"outputs": [
|
205 |
+
{
|
206 |
+
"name": "stdout",
|
207 |
+
"output_type": "stream",
|
208 |
+
"text": [
|
209 |
+
"Class = 0 500\n",
|
210 |
+
"Class = 1 268\n"
|
211 |
+
]
|
212 |
+
}
|
213 |
+
],
|
214 |
+
"source": [
|
215 |
+
"count_class_0, count_class_1 = data['class'].value_counts()\n",
|
216 |
+
"\n",
|
217 |
+
"# Divide by class\n",
|
218 |
+
"df_class_0 = data[data['class'] == 0] #majority class\n",
|
219 |
+
"df_class_1 = data[data['class'] == 1] #minority class\n",
|
220 |
+
"\n",
|
221 |
+
"print('Class = 0 ', df_class_0.shape[0])\n",
|
222 |
+
"print('Class = 1 ', df_class_1.shape[0])"
|
223 |
+
]
|
224 |
+
},
|
225 |
+
{
|
226 |
+
"cell_type": "code",
|
227 |
+
"execution_count": 16,
|
228 |
+
"metadata": {},
|
229 |
+
"outputs": [
|
230 |
+
{
|
231 |
+
"data": {
|
232 |
+
"text/plain": [
|
233 |
+
"(268, 9)"
|
234 |
+
]
|
235 |
+
},
|
236 |
+
"execution_count": 16,
|
237 |
+
"metadata": {},
|
238 |
+
"output_type": "execute_result"
|
239 |
+
}
|
240 |
+
],
|
241 |
+
"source": [
|
242 |
+
"# Sample the majority class (y=0) to have the same number of records as the minority class (y=1)\n",
|
243 |
+
"df_class_0_under = df_class_0.sample(count_class_1)\n",
|
244 |
+
"\n",
|
245 |
+
"df_class_0_under.shape"
|
246 |
+
]
|
247 |
+
},
|
248 |
+
{
|
249 |
+
"cell_type": "code",
|
250 |
+
"execution_count": 19,
|
251 |
+
"metadata": {},
|
252 |
+
"outputs": [
|
253 |
+
{
|
254 |
+
"name": "stdout",
|
255 |
+
"output_type": "stream",
|
256 |
+
"text": [
|
257 |
+
"Random under-sampling:\n",
|
258 |
+
"0 268\n",
|
259 |
+
"1 268\n",
|
260 |
+
"Name: class, dtype: int64\n",
|
261 |
+
"Num records = 536\n"
|
262 |
+
]
|
263 |
+
},
|
264 |
+
{
|
265 |
+
"data": {
|
266 |
+
"image/png": "(base64-encoded PNG omitted: bar chart titled 'Count (target)')",
|
267 |
+
"text/plain": [
|
268 |
+
"<Figure size 432x288 with 1 Axes>"
|
269 |
+
]
|
270 |
+
},
|
271 |
+
"metadata": {
|
272 |
+
"needs_background": "light"
|
273 |
+
},
|
274 |
+
"output_type": "display_data"
|
275 |
+
}
|
276 |
+
],
|
277 |
+
"source": [
|
278 |
+
"# join the dataframes containing y=1 and y=0\n",
|
279 |
+
"df_test_under = pd.concat([df_class_0_under, df_class_1])\n",
|
280 |
+
"\n",
|
281 |
+
"print('Random under-sampling:')\n",
|
282 |
+
"print(df_test_under['class'].value_counts())\n",
|
283 |
+
"print(\"Num records = \", df_test_under.shape[0])\n",
|
284 |
+
"\n",
|
285 |
+
"df_test_under['class'].value_counts().plot(kind='bar', title='Count (target)');"
|
286 |
+
]
|
287 |
+
},
|
288 |
+
{
|
289 |
+
"cell_type": "markdown",
|
290 |
+
"metadata": {},
|
291 |
+
"source": [
|
292 |
+
"#### Down Sampling - Majority Class - Using imblearn "
|
293 |
+
]
|
294 |
+
},
|
295 |
+
{
|
296 |
+
"cell_type": "code",
|
297 |
+
"execution_count": 23,
|
298 |
+
"metadata": {},
|
299 |
+
"outputs": [
|
300 |
+
{
|
301 |
+
"name": "stdout",
|
302 |
+
"output_type": "stream",
|
303 |
+
"text": [
|
304 |
+
"imblearn under-sampling:\n",
|
305 |
+
"0 268\n",
|
306 |
+
"1 268\n",
|
307 |
+
"Name: class, dtype: int64\n",
|
308 |
+
"Num records = 536\n"
|
309 |
+
]
|
310 |
+
},
|
311 |
+
{
|
312 |
+
"data": {
|
313 |
+
"image/png": "(base64-encoded PNG omitted: bar chart titled 'Count (target)')",
|
314 |
+
"text/plain": [
|
315 |
+
"<Figure size 432x288 with 1 Axes>"
|
316 |
+
]
|
317 |
+
},
|
318 |
+
"metadata": {
|
319 |
+
"needs_background": "light"
|
320 |
+
},
|
321 |
+
"output_type": "display_data"
|
322 |
+
}
|
323 |
+
],
|
324 |
+
"source": [
|
325 |
+
"from imblearn.under_sampling import RandomUnderSampler\n",
|
326 |
+
"\n",
|
327 |
+
"#separate the data in descriptive and target attributes\n",
|
328 |
+
"X = data.drop('class', axis=1)\n",
|
329 |
+
"Y = data['class']\n",
|
330 |
+
"\n",
|
331 |
+
"rus = RandomUnderSampler(random_state=42, replacement=True)\n",
|
332 |
+
"X_rus, Y_rus = rus.fit_resample(X, Y)\n",
|
333 |
+
"\n",
|
334 |
+
"df_rus = pd.concat([pd.DataFrame(X_rus), pd.DataFrame(Y_rus, columns=['class'])], axis=1)\n",
|
335 |
+
"\n",
|
336 |
+
"print('imblearn under-sampling:')\n",
|
337 |
+
"print(df_rus['class'].value_counts())\n",
|
338 |
+
"print(\"Num records = \", df_rus.shape[0])\n",
|
339 |
+
"\n",
|
340 |
+
"df_rus['class'].value_counts().plot(kind='bar', title='Count (target)');"
|
341 |
+
]
|
342 |
+
},
|
343 |
+
{
|
344 |
+
"cell_type": "code",
|
345 |
+
"execution_count": 24,
|
346 |
+
"metadata": {},
|
347 |
+
"outputs": [],
|
348 |
+
"source": [
|
349 |
+
"# We should get the same or similar class counts as before, although the specific records selected may differ."
|
350 |
+
]
|
351 |
+
},
|
352 |
+
{
|
353 |
+
"cell_type": "markdown",
|
354 |
+
"metadata": {},
|
355 |
+
"source": [
|
356 |
+
"#### Down/Under sampling the majority class y=0 using Sci-Kit Learn"
|
357 |
+
]
|
358 |
+
},
|
359 |
+
{
|
360 |
+
"cell_type": "code",
|
361 |
+
"execution_count": 25,
|
362 |
+
"metadata": {},
|
363 |
+
"outputs": [
|
364 |
+
{
|
365 |
+
"name": "stdout",
|
366 |
+
"output_type": "stream",
|
367 |
+
"text": [
|
368 |
+
"Original Data distribution\n",
|
369 |
+
"0 500\n",
|
370 |
+
"1 268\n",
|
371 |
+
"Name: class, dtype: int64\n",
|
372 |
+
"Sci-Kit Learn : resample : Down Sampled data set\n",
|
373 |
+
"0 268\n",
|
374 |
+
"1 268\n",
|
375 |
+
"Name: class, dtype: int64\n",
|
376 |
+
"Num records = 536\n"
|
377 |
+
]
|
378 |
+
},
|
379 |
+
{
|
380 |
+
"data": {
|
381 |
+
"image/png": "iVBORw0KGgoAAAANSUhEUgAAAXcAAAEFCAYAAAAYKqc0AAAAOXRFWHRTb2Z0d2FyZQBNYXRwbG90bGliIHZlcnNpb24zLjUuMSwgaHR0cHM6Ly9tYXRwbG90bGliLm9yZy/YYfK9AAAACXBIWXMAAAsTAAALEwEAmpwYAAAPyklEQVR4nO3df6zddX3H8edrVHEKs2DvaimtZVq3QRarqYjxR1hMRFhMMdkY6LAal5qFJhp/bPgj2hlZ2DJ/RiWpkVAFEaag3WQ6bDRI/AGFQQUq0ihdWwu98lvZ0MJ7f5xv4fRyb+/ve+inz0dyc8/5fL/f831fuDx7+r3nXFJVSJLa8nuDHkCSNPOMuyQ1yLhLUoOMuyQ1yLhLUoOMuyQ1yLhLY0gylOSnSX5/0LOMJsnh3XxDg55FTz3GXQOV5I1JNif5dZLdSf4zySvn4LyV5AXj7HYucFFV/W93zPeS/O1szzaWkeevqkeAC+nNKe3HuGtgkrwL+CTwT8BCYCnwOWDVAMcCes+KgdXAxTP4mPNm6rH6fBlY3c0rPc64ayCSPBv4CHBOVV1RVb+pqt9V1b9X1Xu7fQ5P8skkv+w+PrkvYknekuTaEY/5+LPxJBcl+WySbyZ5KMmPkzy/23ZNd8jN3d8Y/nqUEV8G3F9VO7tjzgNeBXymO+Yz3fqnkuxI8mCSG5K8qm+edUm+muTiJA8Cb0lyXJJrupm+0814cd8xJyX5QZL7k9yc5OQDnb+b7z7gpKn/21CLjLsG5eXAM4ArD7DPB+hFawXwIuBE4IOTOMeZwD8CRwHbgPMAqurV3fYXVdURVXXZKMf+GXD7vjtV9QHg+8Da7pi13abru/mOpvcs+t+SPKPvcVYBXwXmA5d0+1wHPAdYB5y9b8cki4FvAh/tHu89wNeSDB3g/ABb6f3zkR5n3DUozwF+VVV7D7DPm4CPVNWeqhqmF+qzD7D/SFdW1XXdOS6hF+GJmg88NN5OVXVxVd1TVXur6mPA4cAf9+3yw6r6elU9BgwBLwU+VFW/raprgY19+/4NcFVVXVVVj1XV1cBm4LRxxniom1d6nHHXoNwDLBjnOvQxwPa++9u7tYm6q+/2w8ARkzj2PuDI8XZK8p4kW5M8kOR+4NnAgr5ddvTdPga4t6oeHmP784C/6i7J3N893iuBReOMcSRw/3iz6tBi3DUoPwQeAU4/wD6/pBe8fZZ2awC/AZ65b0OS587wfFuAF45Y2+9XqHbX1/8eOAM4qqrmAw8AGeOY3cDRSZ7Zt7ak7/YO4EtVNb/v41lVdf5o5+/zp8DNE/iadAgx7hqIqnoA+BDw2SSnJ3lmkqclOTXJv3S7XQp8sHu9+YJu/30/fLwZOCHJiu4a97pJjnA38EcH2H4dML+7Dj7WMUcCe4FhYF6SDwF/MNYDVtV2epdZ1iV5epKXA6/v2+Vi4PVJTklyWJJnJDk5ybFjzdzNdzTwowN8LToEGXcNTHeN+l30fkg6TO+Z61rg690uH6UXwy3AT4AbuzWq6mf0Xm3zHeAOYL9XzkzAOmBDd/njjFFm+y1wEb3r4Pt8CvjLJPcl+TTwbeBbwM/oXTL6P/a/zDKaN9H7YfI93ddyGb2/wVBVO+j9APb9PPHP47088d/pyPMDvBHY0L3mXXpc/J91SKPr3vn5feDF+97INAvnuAz4aVV9eArHHk7vbzCvrqo9Mz6cDmrGXZpDSV4K3Av8Angtvb+lvLyq/nuQc6k9s/GOOUljey5wBb2Xgu4E/s6wazb4zF2SGuQPVCWpQcZdkhr0lLjmvmDBglq2bNmgx5Ckg8oNN9zwq6oa9ff5PyXivmzZMjZv3jzoMSTpoJJk+1jbvCwjSQ0y7pLUIOMuSQ0y7pLUIOMuSQ0y7pLUIOMuSQ0y7pLUoKfEm5gOFsvO/eagR2jKnef/xaBHaIbfmzOrhe9Nn7lLUoOMuyQ1yLhLUoOMuyQ1yLhLUoOMuyQ1yLhLUoOMuy
Q1yLhLUoPGjXuSJUm+m+S2JLcmeUe3vi7JriQ3dR+n9R3zviTbktye5JTZ/AIkSU82kV8/sBd4d1XdmORI4IYkV3fbPlFV/9q/c5LjgTOBE4BjgO8keWFVPTqTg0uSxjbuM/eq2l1VN3a3HwK2AosPcMgq4CtV9UhV/QLYBpw4E8NKkiZmUtfckywDXgz8uFtam2RLkguTHNWtLQZ29B22kwP/YSBJmmETjnuSI4CvAe+sqgeBC4DnAyuA3cDHJnPiJGuSbE6yeXh4eDKHSpLGMaG4J3kavbBfUlVXAFTV3VX1aFU9BnyeJy697AKW9B1+bLe2n6paX1Urq2rl0NDQdL4GSdIIE3m1TIAvAFur6uN964v6dnsDcEt3eyNwZpLDkxwHLAeum7mRJUnjmcirZV4BnA38JMlN3dr7gbOSrAAKuBN4O0BV3ZrkcuA2eq+0OcdXykjS3Bo37lV1LZBRNl11gGPOA86bxlySpGnwHaqS1CDjLkkNMu6S1CDjLkkNMu6S1CDjLkkNMu6S1CDjLkkNMu6S1CDjLkkNMu6S1CDjLkkNMu6S1CDjLkkNMu6S1CDjLkkNMu6S1CDjLkkNMu6S1CDjLkkNMu6S1CDjLkkNMu6S1CDjLkkNMu6S1CDjLkkNMu6S1CDjLkkNMu6S1CDjLkkNGjfuSZYk+W6S25LcmuQd3frRSa5Ockf3+ahuPUk+nWRbki1JXjLbX4QkaX8Teea+F3h3VR0PnASck+R44FxgU1UtBzZ19wFOBZZ3H2uAC2Z8aknSAY0b96raXVU3drcfArYCi4FVwIZutw3A6d3tVcAXq+dHwPwki2Z6cEnS2CZ1zT3JMuDFwI+BhVW1u9t0F7Cwu70Y2NF32M5uTZI0RyYc9yRHAF8D3llVD/Zvq6oCajInTrImyeYkm4eHhydzqCRpHBOKe5Kn0Qv7JVV1Rbd8977LLd3nPd36LmBJ3+HHdmv7qar1VbWyqlYODQ1NdX5J0igm8mqZAF8AtlbVx/s2bQRWd7dXA9/oW39z96qZk4AH+i7fSJLmwLwJ7PMK4GzgJ0lu6tbeD5wPXJ7kbcB24Ixu21XAacA24GHgrTM5sCRpfOPGvaquBTLG5teMsn8B50xzLknSNPgOVUlqkHGXpAYZd0lqkHGXpAYZd0lqkHGXpAYZd0lqkHGXpAYZd0lqkHGXpAYZd0lqkHGXpAYZd0lqkHGXpAYZd0lqkHGXpAYZd0lqkHGXpAYZd0lqkHGXpAYZd0lqkHGXpAYZd0lqkHGXpAYZd0lqkHGXpAYZd0lqkHGXpAYZd0lqkHGXpAaNG/ckFybZk+SWvrV1SXYluan7OK1v2/uSbEtye5JTZmtwSdLYJvLM/SLgdaOsf6KqVnQfVwEkOR44EzihO+ZzSQ6bqWElSRMzbtyr6hrg3gk+3irgK1X1SFX9AtgGnDiN+SRJUzCda+5rk2zpLtsc1a0tBnb07bOzW5MkzaGpxv0C4PnACmA38LHJPkCSNUk2J9k8PDw8xTEkSaOZUtyr6u6qerSqHgM+zxOXXnYBS/p2PbZbG+0x1lfVyqpaOTQ0NJUxJEljmFLckyzqu/sGYN8raTYCZyY5PMlxwHLguumNKEmarHnj7ZDkUuBkYEGSncCHgZOTrAAKuBN4O0BV3ZrkcuA2YC9wTlU9OiuTS5LGNG7cq+qsUZa/cID9zwPOm85QkqTp8R2qktQg4y5JDTLuktQg4y5JDTLuktQg4y5JDTLuktQg4y5JDTLuktQg4y5JDTLuktQg4y5JDTLuktQg4y5JDTLuktQg4y5JDTLuktQg4y5JDTLuktQg4y5JDTLuktQg4y5JDTLuktQg4y5JDTLuktQg4y5JDTLuktQg4y5JDTLuktSgceOe5MIke5Lc0rd2dJKrk9zRfT6qW0+STyfZlmRLkpfM5vCSpNFN5Jn7RcDrRqydC2yqquXApu4+wKnA8u5jDXDBzIwpSZqMceNeVdcA945YXgVs6G5vAE
7vW/9i9fwImJ9k0QzNKkmaoKlec19YVbu723cBC7vbi4Edffvt7NYkSXNo2j9QraoCarLHJVmTZHOSzcPDw9MdQ5LUZ6pxv3vf5Zbu855ufRewpG+/Y7u1J6mq9VW1sqpWDg0NTXEMSdJophr3jcDq7vZq4Bt962/uXjVzEvBA3+UbSdIcmTfeDkkuBU4GFiTZCXwYOB+4PMnbgO3AGd3uVwGnAduAh4G3zsLMkqRxjBv3qjprjE2vGWXfAs6Z7lCSpOnxHaqS1CDjLkkNMu6S1CDjLkkNMu6S1CDjLkkNMu6S1CDjLkkNMu6S1CDjLkkNMu6S1CDjLkkNMu6S1CDjLkkNMu6S1CDjLkkNMu6S1CDjLkkNMu6S1CDjLkkNMu6S1CDjLkkNMu6S1CDjLkkNMu6S1CDjLkkNMu6S1CDjLkkNMu6S1CDjLkkNmjedg5PcCTwEPArsraqVSY4GLgOWAXcCZ1TVfdMbU5I0GTPxzP3Pq2pFVa3s7p8LbKqq5cCm7r4kaQ7NxmWZVcCG7vYG4PRZOIck6QCmG/cC/ivJDUnWdGsLq2p3d/suYOE0zyFJmqRpXXMHXllVu5L8IXB1kp/2b6yqSlKjHdj9YbAGYOnSpdMcQ5LUb1rP3KtqV/d5D3AlcCJwd5JFAN3nPWMcu76qVlbVyqGhoemMIUkaYcpxT/KsJEfuuw28FrgF2Ais7nZbDXxjukNKkiZnOpdlFgJXJtn3OF+uqm8luR64PMnbgO3AGdMfU5I0GVOOe1X9HHjRKOv3AK+ZzlCSpOnxHaqS1CDjLkkNMu6S1CDjLkkNMu6S1CDjLkkNMu6S1CDjLkkNMu6S1CDjLkkNMu6S1CDjLkkNMu6S1CDjLkkNMu6S1CDjLkkNMu6S1CDjLkkNMu6S1CDjLkkNMu6S1CDjLkkNMu6S1CDjLkkNMu6S1CDjLkkNMu6S1CDjLkkNMu6S1CDjLkkNmrW4J3ldktuTbEty7mydR5L0ZLMS9ySHAZ8FTgWOB85KcvxsnEuS9GSz9cz9RGBbVf28qn4LfAVYNUvnkiSNMG+WHncxsKPv/k7gZf07JFkDrOnu/jrJ7bM0y6FoAfCrQQ8xnvzzoCfQAPi9ObOeN9aG2Yr7uKpqPbB+UOdvWZLNVbVy0HNII/m9OXdm67LMLmBJ3/1juzVJ0hyYrbhfDyxPclySpwNnAhtn6VySpBFm5bJMVe1Nshb4NnAYcGFV3Tob59KovNylpyq/N+dIqmrQM0iSZpjvUJWkBhl3SWqQcZekBg3sde6aOUn+hN47gBd3S7uAjVW1dXBTSRokn7kf5JL8A71f7xDguu4jwKX+wjY9VSV566BnaJ2vljnIJfkZcEJV/W7E+tOBW6tq+WAmk8aW5H+qaumg52iZl2UOfo8BxwDbR6wv6rZJA5Fky1ibgIVzOcuhyLgf/N4JbEpyB0/8sralwAuAtYMaSqIX8FOA+0asB/jB3I9zaDHuB7mq+laSF9L7Ncv9P1C9vqoeHdxkEv8BHFFVN43ckOR7cz7NIcZr7pLUIF8tI0kNMu6S1CDjLkkNMu6S1CDjLkkN+n9FlP1ETWJfHAAAAABJRU5ErkJggg==\n",
|
382 |
+
"text/plain": [
|
383 |
+
"<Figure size 432x288 with 1 Axes>"
|
384 |
+
]
|
385 |
+
},
|
386 |
+
"metadata": {
|
387 |
+
"needs_background": "light"
|
388 |
+
},
|
389 |
+
"output_type": "display_data"
|
390 |
+
}
|
391 |
+
],
|
392 |
+
"source": [
|
393 |
+
"from sklearn.utils import resample\n",
|
394 |
+
"\n",
|
395 |
+
"print(\"Original Data distribution\")\n",
|
396 |
+
"print(data['class'].value_counts())\n",
|
397 |
+
"\n",
|
398 |
+
"# Down Sample Majority class\n",
|
399 |
+
"down_sample = resample(data[data['class']==0],\n",
|
400 |
+
" replace = True, # sample with replacement\n",
|
401 |
+
" n_samples = data[data['class']==1].shape[0], # to match minority class\n",
|
402 |
+
" random_state=42) # reproducible results\n",
|
403 |
+
"\n",
|
404 |
+
"# Combine minority class with down-sampled majority class\n",
|
405 |
+
"train_downsample = pd.concat([data[data['class']==1], down_sample])\n",
|
406 |
+
"\n",
|
407 |
+
"# Display new class counts\n",
|
408 |
+
"print('Sci-Kit Learn : resample : Down Sampled data set')\n",
|
409 |
+
"print(train_downsample['class'].value_counts())\n",
|
410 |
+
"print(\"Num records = \", train_downsample.shape[0])\n",
|
411 |
+
"train_downsample['class'].value_counts().plot(kind='bar', title='Count (target)');"
|
412 |
+
]
|
413 |
+
},
|
414 |
+
{
|
415 |
+
"cell_type": "markdown",
|
416 |
+
"metadata": {},
|
417 |
+
"source": [
|
418 |
+
"#### Over sampling the minority class y=1 (using random sampling)"
|
419 |
+
]
|
420 |
+
},
|
421 |
+
{
|
422 |
+
"cell_type": "code",
|
423 |
+
"execution_count": 28,
|
424 |
+
"metadata": {},
|
425 |
+
"outputs": [
|
426 |
+
{
|
427 |
+
"name": "stdout",
|
428 |
+
"output_type": "stream",
|
429 |
+
"text": [
|
430 |
+
"Random over-sampling:\n",
|
431 |
+
"0 500\n",
|
432 |
+
"1 500\n",
|
433 |
+
"Name: class, dtype: int64\n"
|
434 |
+
]
|
435 |
+
},
|
436 |
+
{
|
437 |
+
"data": {
|
438 |
+
"image/png": "iVBORw0KGgoAAAANSUhEUgAAAXcAAAEFCAYAAAAYKqc0AAAAOXRFWHRTb2Z0d2FyZQBNYXRwbG90bGliIHZlcnNpb24zLjUuMSwgaHR0cHM6Ly9tYXRwbG90bGliLm9yZy/YYfK9AAAACXBIWXMAAAsTAAALEwEAmpwYAAAQKklEQVR4nO3df6zddX3H8edrVECFUaB3FVuwbNQ5zAKaihB/xEk2BefKH8pQp5WwNFkg0TB/MDWKRhdcsoFmzqwZhir+gKFIp0yHKFGj/CgKKKLSMVlbgVZoK8r8gbz3x/kUTq/39t62995DP30+kpPz+X4+n+/5vk97++r3fs73nJOqQpLUl98ZdQGSpJlnuEtShwx3SeqQ4S5JHTLcJalDhrskdchwlyaRZCzJ95M8cdS1TCTJAa2+sVHXoscfw10jleTVSdYm+VmSe5L8Z5Lnz8FxK8kxU0w7D7ikqv6v7XNdkr+e7domM/74VfVL4CMM6pR2YLhrZJKcC1wE/D2wEDgK+Bdg+QjLAgZnxcAK4NIZfMx5M/VYQz4BrGj1So8y3DUSSQ4B3gOcXVWfqaqfV9Wvq+o/qurNbc4BSS5K8uN2u2h7iCV5fZKvj3vMR8/Gk1yS5ENJPp/kwSQ3JPmDNvbVtsut7TeGv5ygxOcCW6tqQ9vnfcALgH9u+/xz6/9AkvVJfprk5iQvGKrn/CRXJLk0yU+B1yc5OslXW01fajVeOrTPiUm+kWRrkluTvGhnx2/1bQFO3P2/DfXIcNeonAQcCFy5kzlvZxBaxwPHAScA79iFY5wBvBs4FFgHvA+gql7Yxo+rqoOq6rIJ9v1j4AfbN6rq7cDXgHPaPue0oZtafYcxOIv+9yQHDj3OcuAKYD7w8TbnRuBw4HzgtdsnJlkEfB54b3u8NwGfTjK2k+MD3MHgz0d6lOGuUTkc+ElVPbyTOa8B3lNVm6pqM4Ogfu1O5o93ZVXd2I7xcQYhPF3zgQenmlRVl1bV/VX1cFX9I3AA8IdDU75ZVZ+tqkeAMeA5wDur6ldV9XVgzdDcvwKurqqrq+qRqroGWAucOkUZD7Z6pUcZ7hqV+4EFU6xDPxW4e2j77tY3XfcOtR8CDtqFfbcAB081KcmbktyRZFuSrcAhwIKhKeuH2k8FHqiqhyYZfxrwyrYks7U93vOBI6Yo42Bg61S1at9iuGtUvgn8EjhtJ3N+zCDwtjuq9QH8HHjS9oEkT5nh+m4Dnj6ub4ePUG3r628BTgcOrar5wDYgk+xzD3BYkicN9R051F4PfKyq5g/dnlxVF0x0/CF/BNw6jeekfYjhrpGoqm3AO4EPJTktyZOSPCHJKUn+oU37JPCOdr35gjZ/+4uPtwLPTHJ8W+M+fxdLuA/4/Z2M3wjMb+vgk+1zMPAwsBmYl+SdwO9O9oBVdTeDZZbzk+yf5CTg5UNTLgVenuQlSfZLcmCSFyVZPFnNrb7DgOt38ly0DzLcNTJtjfpcBi+SbmZw5noO8Nk25b0MwvA24DvAt1ofVfVDBlfbfAm4E9jhyplpOB9Y3ZY/Tp+gtl8BlzBYB9/uA8ArkmxJ8kHgi8AXgB8yWDL6BTsus0zkNQxeTL6/PZfLGPwGQ1WtZ/AC7Nt47M/jzTz273T88QFeDaxu17xLj4pf1iFNrL3z82vAs7a/kWkWjnEZ8P2qetdu7HsAg99gXlhVm2a8OO3VDHdpDiV5DvAA8D/AnzH4LeWkqvr2KOtSf2bjHXOSJvcU4DMMLgXdAPyNwa7Z4Jm7JHXIF1QlqUOGuyR16HGx5r5gwYJasmTJqMuQpL3KzTff/JOqmvDz/B8X4b5kyRLWrl076jIkaa+S5O7JxlyWkaQOGe6S1CHDXZI6ZLhLUocMd0nq0LTCPcmPknwnyS1J1ra+w5Jck+TOdn9o60+SDyZZl+S2JM+ezScgSfptu3Lm/idVdXxVLWvb5wHXVtVS4Nq2DXAKsLTdVgIfnqliJUnTsyfLMs
uB1a29mse+UWc58NEauJ7BFx5M9TVhkqQZNN03MRXwX0kK+NeqWgUsrKp72vi9wMLWXsSOX1iwofXdM9RHkpUMzuw56qijdq/6ObbkvM+PuoSu/OiCl426hG74szmzevjZnG64P7+qNib5PeCaJN8fHqyqasE/be0/iFUAy5Yt86MpJWkGTWtZpqo2tvtNwJXACcB925db2v32b4LZyI5f+ru49UmS5siU4Z7kyUkO3t5m8O0x3wXWACvatBXAVa29Bnhdu2rmRGDb0PKNJGkOTGdZZiFwZZLt8z9RVV9IchNweZKzGHw58PYvGb4aOBVYBzwEnDnjVUuSdmrKcK+qu4DjJui/Hzh5gv4Czp6R6iRJu8V3qEpShwx3SeqQ4S5JHTLcJalDhrskdchwl6QOGe6S1CHDXZI6ZLhLUocMd0nqkOEuSR0y3CWpQ4a7JHXIcJekDhnuktQhw12SOmS4S1KHDHdJ6pDhLkkdMtwlqUOGuyR1yHCXpA4Z7pLUIcNdkjpkuEtShwx3SeqQ4S5JHTLcJalDhrskdchwl6QOGe6S1CHDXZI6NO1wT7Jfkm8n+VzbPjrJDUnWJbksyf6t/4C2va6NL5ml2iVJk9iVM/c3AHcMbb8fuLCqjgG2AGe1/rOALa3/wjZPkjSHphXuSRYDLwP+rW0HeDFwRZuyGjittZe3bdr4yW2+JGmOTPfM/SLgLcAjbftwYGtVPdy2NwCLWnsRsB6gjW9r8yVJc2TKcE/y58Cmqrp5Jg+cZGWStUnWbt68eSYfWpL2edM5c38e8BdJfgR8isFyzAeA+UnmtTmLgY2tvRE4EqCNHwLcP/5Bq2pVVS2rqmVjY2N79CQkSTuaMtyr6u+qanFVLQHOAL5cVa8BvgK8ok1bAVzV2mvaNm38y1VVM1q1JGmn9uQ697cC5yZZx2BN/eLWfzFweOs/Fzhvz0qUJO2qeVNPeUxVXQdc19p3ASdMMOcXwCtnoDZJ0m7yHaqS1CHDXZI6ZLhLUocMd0nqkOEuSR0y3CWpQ4a7JHXIcJekDhnuktQhw12SOmS4S1KHDHdJ6pDhLkkdMtwlqUOGuyR1yHCXpA4Z7pLUIcNdkjpkuEtShwx3SeqQ4S5JHTLcJalDhrskdchwl6QOGe6S1CHDXZI6ZLhLUocMd0nqkOEuSR0y3CWpQ4a7JHXIcJekDk0Z7kkOTHJjkluT3J7k3a3/6CQ3JFmX5LIk+7f+A9r2uja+ZJafgyRpnOmcuf8SeHFVHQccD7w0yYnA+4ELq+oYYAtwVpt/FrCl9V/Y5kmS5tCU4V4DP2ubT2i3Al4MXNH6VwOntfbytk0bPzlJZqpgSdLUprXmnmS/JLcAm4BrgP8GtlbVw23KBmBRay8C1gO08W3A4TNYsyRpCtMK96r6TVUdDywGTgCesacHTrIyydokazdv3rynDydJGrJLV8tU1VbgK8BJwPwk89rQYmBja28EjgRo44cA90/wWKuqallVLRsbG9u96iVJE5rO1TJjSea39hOBPwXuYBDyr2jTVgBXtfaatk0b/3JV1QzWLEmawrypp3AEsDrJfgz+M7i8qj6X5HvAp5K8F/g2cHGbfzHwsSTrgAeAM2ahbknSTkwZ7lV1G/CsCfrvYrD+Pr7/F8ArZ6Q6SdJu8R2qktQhw12SOmS4S1KHDHdJ6pDhLkkdMtwlqUOGuyR1yHCXpA4Z7pLUIcNdkjpkuEtShwx3SeqQ4S5JHTLcJalDhrskdchwl6QOGe6S1CHDXZI6ZLhLUocMd0nqkOEuSR0y3CWpQ4a7JHXIcJekDhnuktQhw12SOmS4S1KHDHdJ6pDhLkkdMtwlqUOGuyR1yHCXpA4Z7pLUoSnDPcmRSb6S5HtJbk/yhtZ/WJJrktzZ7g9t/UnywSTrktyW5Nmz/SQkSTuazpn7w8DfVtWxwInA2UmOBc4Drq2qpcC1bRvgFGBpu60EPjzjVUuSdmrKcK+qe6rqW639IHAHsAhYDqxu01
YDp7X2cuCjNXA9MD/JETNduCRpcru05p5kCfAs4AZgYVXd04buBRa29iJg/dBuG1qfJGmOTDvckxwEfBp4Y1X9dHisqgqoXTlwkpVJ1iZZu3nz5l3ZVZI0hWmFe5InMAj2j1fVZ1r3fduXW9r9pta/EThyaPfFrW8HVbWqqpZV1bKxsbHdrV+SNIHpXC0T4GLgjqr6p6GhNcCK1l4BXDXU/7p21cyJwLah5RtJ0hyYN405zwNeC3wnyS2t723ABcDlSc4C7gZOb2NXA6cC64CHgDNnsmBJ0tSmDPeq+jqQSYZPnmB+AWfvYV2SpD3gO1QlqUOGuyR1yHCXpA4Z7pLUIcNdkjpkuEtShwx3SeqQ4S5JHTLcJalDhrskdchwl6QOGe6S1CHDXZI6ZLhLUocMd0nqkOEuSR0y3CWpQ4a7JHXIcJekDhnuktQhw12SOmS4S1KHDHdJ6pDhLkkdMtwlqUOGuyR1yHCXpA4Z7pLUIcNdkjpkuEtShwx3SeqQ4S5JHZoy3JN8JMmmJN8d6jssyTVJ7mz3h7b+JPlgknVJbkvy7NksXpI0semcuV8CvHRc33nAtVW1FLi2bQOcAixtt5XAh2emTEnSrpgy3Kvqq8AD47qXA6tbezVw2lD/R2vgemB+kiNmqFZJ0jTt7pr7wqq6p7XvBRa29iJg/dC8Da1PkjSH9vgF1aoqoHZ1vyQrk6xNsnbz5s17WoYkacjuhvt925db2v2m1r8ROHJo3uLW91uqalVVLauqZWNjY7tZhiRpIrsb7muAFa29ArhqqP917aqZE4FtQ8s3kqQ5Mm+qCUk+CbwIWJBkA/Au4ALg8iRnAXcDp7fpVwOnAuuAh4AzZ6FmSdIUpgz3qnrVJEMnTzC3gLP3tChJ0p7xHaqS1CHDXZI6ZLhLUocMd0nqkOEuSR0y3CWpQ4a7JHXIcJekDhnuktQhw12SOmS4S1KHDHdJ6pDhLkkdMtwlqUOGuyR1yHCXpA4Z7pLUIcNdkjpkuEtShwx3SeqQ4S5JHTLcJalDhrskdchwl6QOGe6S1CHDXZI6ZLhLUocMd0nqkOEuSR0y3CWpQ4a7JHXIcJekDhnuktShWQn3JC9N8oMk65KcNxvHkCRNbsbDPcl+wIeAU4BjgVclOXamjyNJmtxsnLmfAKyrqruq6lfAp4Dls3AcSdIk5s3CYy4C1g9tbwCeO35SkpXAyrb5syQ/mIVa9lULgJ+Muoip5P2jrkAj4M/mzHraZAOzEe7TUlWrgFWjOn7PkqytqmWjrkMaz5/NuTMbyzIbgSOHthe3PknSHJmNcL8JWJrk6CT7A2cAa2bhOJKkScz4skxVPZzkHOCLwH7AR6rq9pk+jnbK5S49XvmzOUdSVaOuQZI0w3yHqiR1yHCXpA4Z7pLUoZFd566ZkeQZDN4BvKh1bQTWVNUdo6tK0qh55r4XS/JWBh/vEODGdgvwST+wTY9nSc4cdQ2982qZvViSHwLPrKpfj+vfH7i9qpaOpjJp55L8b1UdNeo6euayzN7tEeCpwN3j+o9oY9LIJLltsiFg4VzWsi8y3PdubwSuTXInj31Y21HAMcA5oypKahYCLwG2jOsP8I25L2ffYrjvxarqC0mezuBjlodfUL2pqn4zusokAD4HHFRVt4wfSHLdnFezj3HNXZI65NUyktQhw12SOmS4S1KHDHdJ6pDhLkkd+n+rPQ6LBFTagQAAAABJRU5ErkJggg==\n",
|
439 |
+
"text/plain": [
|
440 |
+
"<Figure size 432x288 with 1 Axes>"
|
441 |
+
]
|
442 |
+
},
|
443 |
+
"metadata": {
|
444 |
+
"needs_background": "light"
|
445 |
+
},
|
446 |
+
"output_type": "display_data"
|
447 |
+
}
|
448 |
+
],
|
449 |
+
"source": [
|
450 |
+
"df_class_1_over = df_class_1.sample(count_class_0, replace=True)\n",
|
451 |
+
"\n",
|
452 |
+
"df_test_over = pd.concat([df_class_0, df_class_1_over], axis=0)\n",
|
453 |
+
"\n",
|
454 |
+
"print('Random over-sampling:')\n",
|
455 |
+
"print(df_test_over['class'].value_counts())\n",
|
456 |
+
"\n",
|
457 |
+
"df_test_over['class'].value_counts().plot(kind='bar', title='Count (target)');"
|
458 |
+
]
|
459 |
+
},
|
460 |
+
{
|
461 |
+
"cell_type": "markdown",
|
462 |
+
"metadata": {},
|
463 |
+
"source": [
|
464 |
+
"#### Over sampling the minority class y=1 using SMOTE"
|
465 |
+
]
|
466 |
+
},
|
467 |
+
{
|
468 |
+
"cell_type": "code",
|
469 |
+
"execution_count": 31,
|
470 |
+
"metadata": {},
|
471 |
+
"outputs": [
|
472 |
+
{
|
473 |
+
"name": "stdout",
|
474 |
+
"output_type": "stream",
|
475 |
+
"text": [
|
476 |
+
"0 500\n",
|
477 |
+
"1 268\n",
|
478 |
+
"Name: class, dtype: int64\n",
|
479 |
+
"SMOTE over-sampling:\n",
|
480 |
+
"0 500\n",
|
481 |
+
"1 500\n",
|
482 |
+
"Name: class, dtype: int64\n"
|
483 |
+
]
|
484 |
+
},
|
485 |
+
{
|
486 |
+
"data": {
|
487 |
+
"image/png": "iVBORw0KGgoAAAANSUhEUgAAAXcAAAEFCAYAAAAYKqc0AAAAOXRFWHRTb2Z0d2FyZQBNYXRwbG90bGliIHZlcnNpb24zLjUuMSwgaHR0cHM6Ly9tYXRwbG90bGliLm9yZy/YYfK9AAAACXBIWXMAAAsTAAALEwEAmpwYAAAQKklEQVR4nO3df6zddX3H8edrVECFUaB3FVuwbNQ5zAKaihB/xEk2BefKH8pQp5WwNFkg0TB/MDWKRhdcsoFmzqwZhir+gKFIp0yHKFGj/CgKKKLSMVlbgVZoK8r8gbz3x/kUTq/39t62995DP30+kpPz+X4+n+/5vk97++r3fs73nJOqQpLUl98ZdQGSpJlnuEtShwx3SeqQ4S5JHTLcJalDhrskdchwlyaRZCzJ95M8cdS1TCTJAa2+sVHXoscfw10jleTVSdYm+VmSe5L8Z5Lnz8FxK8kxU0w7D7ikqv6v7XNdkr+e7domM/74VfVL4CMM6pR2YLhrZJKcC1wE/D2wEDgK+Bdg+QjLAgZnxcAK4NIZfMx5M/VYQz4BrGj1So8y3DUSSQ4B3gOcXVWfqaqfV9Wvq+o/qurNbc4BSS5K8uN2u2h7iCV5fZKvj3vMR8/Gk1yS5ENJPp/kwSQ3JPmDNvbVtsut7TeGv5ygxOcCW6tqQ9vnfcALgH9u+/xz6/9AkvVJfprk5iQvGKrn/CRXJLk0yU+B1yc5OslXW01fajVeOrTPiUm+kWRrkluTvGhnx2/1bQFO3P2/DfXIcNeonAQcCFy5kzlvZxBaxwPHAScA79iFY5wBvBs4FFgHvA+gql7Yxo+rqoOq6rIJ9v1j4AfbN6rq7cDXgHPaPue0oZtafYcxOIv+9yQHDj3OcuAKYD7w8TbnRuBw4HzgtdsnJlkEfB54b3u8NwGfTjK2k+MD3MHgz0d6lOGuUTkc+ElVPbyTOa8B3lNVm6pqM4Ogfu1O5o93ZVXd2I7xcQYhPF3zgQenmlRVl1bV/VX1cFX9I3AA8IdDU75ZVZ+tqkeAMeA5wDur6ldV9XVgzdDcvwKurqqrq+qRqroGWAucOkUZD7Z6pUcZ7hqV+4EFU6xDPxW4e2j77tY3XfcOtR8CDtqFfbcAB081KcmbktyRZFuSrcAhwIKhKeuH2k8FHqiqhyYZfxrwyrYks7U93vOBI6Yo42Bg61S1at9iuGtUvgn8EjhtJ3N+zCDwtjuq9QH8HHjS9oEkT5nh+m4Dnj6ub4ePUG3r628BTgcOrar5wDYgk+xzD3BYkicN9R051F4PfKyq5g/dnlxVF0x0/CF/BNw6jeekfYjhrpGoqm3AO4EPJTktyZOSPCHJKUn+oU37JPCOdr35gjZ/+4uPtwLPTHJ8W+M+fxdLuA/4/Z2M3wjMb+vgk+1zMPAwsBmYl+SdwO9O9oBVdTeDZZbzk+yf5CTg5UNTLgVenuQlSfZLcmCSFyVZPFnNrb7DgOt38ly0DzLcNTJtjfpcBi+SbmZw5noO8Nk25b0MwvA24DvAt1ofVfVDBlfbfAm4E9jhyplpOB9Y3ZY/Tp+gtl8BlzBYB9/uA8ArkmxJ8kHgi8AXgB8yWDL6BTsus0zkNQxeTL6/PZfLGPwGQ1WtZ/AC7Nt47M/jzTz273T88QFeDaxu17xLj4pf1iFNrL3z82vAs7a/kWkWjnEZ8P2qetdu7HsAg99gXlhVm2a8OO3VDHdpDiV5DvAA8D/AnzH4LeWkqvr2KOtSf2bjHXOSJvcU4DMMLgXdAPyNwa7Z4Jm7JHXIF1QlqUOGuyR16HGx5r5gwYJasmTJqMuQpL3KzTff/JOqmvDz/B8X4b5kyRLWrl076jIkaa+S5O7JxlyWkaQOGe6S1CHDXZI6ZLhLUocMd0nq0LTCPcmPknwnyS1J1ra+w5Jck+TOdn9o60+SDyZZl+S2JM+ezScgSfptu3Lm/idVdXxVLWvb5wHXVtVS4Nq2DXAKsLTdVgIfnqliJUnTsyfLMs
uB1a29mse+UWc58NEauJ7BFx5M9TVhkqQZNN03MRXwX0kK+NeqWgUsrKp72vi9wMLWXsSOX1iwofXdM9RHkpUMzuw56qijdq/6ObbkvM+PuoSu/OiCl426hG74szmzevjZnG64P7+qNib5PeCaJN8fHqyqasE/be0/iFUAy5Yt86MpJWkGTWtZpqo2tvtNwJXACcB925db2v32b4LZyI5f+ru49UmS5siU4Z7kyUkO3t5m8O0x3wXWACvatBXAVa29Bnhdu2rmRGDb0PKNJGkOTGdZZiFwZZLt8z9RVV9IchNweZKzGHw58PYvGb4aOBVYBzwEnDnjVUuSdmrKcK+qu4DjJui/Hzh5gv4Czp6R6iRJu8V3qEpShwx3SeqQ4S5JHTLcJalDhrskdchwl6QOGe6S1CHDXZI6ZLhLUocMd0nqkOEuSR0y3CWpQ4a7JHXIcJekDhnuktQhw12SOmS4S1KHDHdJ6pDhLkkdMtwlqUOGuyR1yHCXpA4Z7pLUIcNdkjpkuEtShwx3SeqQ4S5JHTLcJalDhrskdchwl6QOGe6S1CHDXZI6NO1wT7Jfkm8n+VzbPjrJDUnWJbksyf6t/4C2va6NL5ml2iVJk9iVM/c3AHcMbb8fuLCqjgG2AGe1/rOALa3/wjZPkjSHphXuSRYDLwP+rW0HeDFwRZuyGjittZe3bdr4yW2+JGmOTPfM/SLgLcAjbftwYGtVPdy2NwCLWnsRsB6gjW9r8yVJc2TKcE/y58Cmqrp5Jg+cZGWStUnWbt68eSYfWpL2edM5c38e8BdJfgR8isFyzAeA+UnmtTmLgY2tvRE4EqCNHwLcP/5Bq2pVVS2rqmVjY2N79CQkSTuaMtyr6u+qanFVLQHOAL5cVa8BvgK8ok1bAVzV2mvaNm38y1VVM1q1JGmn9uQ697cC5yZZx2BN/eLWfzFweOs/Fzhvz0qUJO2qeVNPeUxVXQdc19p3ASdMMOcXwCtnoDZJ0m7yHaqS1CHDXZI6ZLhLUocMd0nqkOEuSR0y3CWpQ4a7JHXIcJekDhnuktQhw12SOmS4S1KHDHdJ6pDhLkkdMtwlqUOGuyR1yHCXpA4Z7pLUIcNdkjpkuEtShwx3SeqQ4S5JHTLcJalDhrskdchwl6QOGe6S1CHDXZI6ZLhLUocMd0nqkOEuSR0y3CWpQ4a7JHXIcJekDk0Z7kkOTHJjkluT3J7k3a3/6CQ3JFmX5LIk+7f+A9r2uja+ZJafgyRpnOmcuf8SeHFVHQccD7w0yYnA+4ELq+oYYAtwVpt/FrCl9V/Y5kmS5tCU4V4DP2ubT2i3Al4MXNH6VwOntfbytk0bPzlJZqpgSdLUprXmnmS/JLcAm4BrgP8GtlbVw23KBmBRay8C1gO08W3A4TNYsyRpCtMK96r6TVUdDywGTgCesacHTrIyydokazdv3rynDydJGrJLV8tU1VbgK8BJwPwk89rQYmBja28EjgRo44cA90/wWKuqallVLRsbG9u96iVJE5rO1TJjSea39hOBPwXuYBDyr2jTVgBXtfaatk0b/3JV1QzWLEmawrypp3AEsDrJfgz+M7i8qj6X5HvAp5K8F/g2cHGbfzHwsSTrgAeAM2ahbknSTkwZ7lV1G/CsCfrvYrD+Pr7/F8ArZ6Q6SdJu8R2qktQhw12SOmS4S1KHDHdJ6pDhLkkdMtwlqUOGuyR1yHCXpA4Z7pLUIcNdkjpkuEtShwx3SeqQ4S5JHTLcJalDhrskdchwl6QOGe6S1CHDXZI6ZLhLUocMd0nqkOEuSR0y3CWpQ4a7JHXIcJekDhnuktQhw12SOmS4S1KHDHdJ6pDhLkkdMtwlqUOGuyR1yHCXpA4Z7pLUoSnDPcmRSb6S5HtJbk/yhtZ/WJJrktzZ7g9t/UnywSTrktyW5Nmz/SQkSTuazpn7w8DfVtWxwInA2UmOBc4Drq2qpcC1bRvgFGBpu60EPjzjVUuSdmrKcK+qe6rqW639IHAHsAhYDqxu01
YDp7X2cuCjNXA9MD/JETNduCRpcru05p5kCfAs4AZgYVXd04buBRa29iJg/dBuG1qfJGmOTDvckxwEfBp4Y1X9dHisqgqoXTlwkpVJ1iZZu3nz5l3ZVZI0hWmFe5InMAj2j1fVZ1r3fduXW9r9pta/EThyaPfFrW8HVbWqqpZV1bKxsbHdrV+SNIHpXC0T4GLgjqr6p6GhNcCK1l4BXDXU/7p21cyJwLah5RtJ0hyYN405zwNeC3wnyS2t723ABcDlSc4C7gZOb2NXA6cC64CHgDNnsmBJ0tSmDPeq+jqQSYZPnmB+AWfvYV2SpD3gO1QlqUOGuyR1yHCXpA4Z7pLUIcNdkjpkuEtShwx3SeqQ4S5JHTLcJalDhrskdchwl6QOGe6S1CHDXZI6ZLhLUocMd0nqkOEuSR0y3CWpQ4a7JHXIcJekDhnuktQhw12SOmS4S1KHDHdJ6pDhLkkdMtwlqUOGuyR1yHCXpA4Z7pLUIcNdkjpkuEtShwx3SeqQ4S5JHZoy3JN8JMmmJN8d6jssyTVJ7mz3h7b+JPlgknVJbkvy7NksXpI0semcuV8CvHRc33nAtVW1FLi2bQOcAixtt5XAh2emTEnSrpgy3Kvqq8AD47qXA6tbezVw2lD/R2vgemB+kiNmqFZJ0jTt7pr7wqq6p7XvBRa29iJg/dC8Da1PkjSH9vgF1aoqoHZ1vyQrk6xNsnbz5s17WoYkacjuhvt925db2v2m1r8ROHJo3uLW91uqalVVLauqZWNjY7tZhiRpIrsb7muAFa29ArhqqP917aqZE4FtQ8s3kqQ5Mm+qCUk+CbwIWJBkA/Au4ALg8iRnAXcDp7fpVwOnAuuAh4AzZ6FmSdIUpgz3qnrVJEMnTzC3gLP3tChJ0p7xHaqS1CHDXZI6ZLhLUocMd0nqkOEuSR0y3CWpQ4a7JHXIcJekDhnuktQhw12SOmS4S1KHDHdJ6pDhLkkdMtwlqUOGuyR1yHCXpA4Z7pLUIcNdkjpkuEtShwx3SeqQ4S5JHTLcJalDhrskdchwl6QOGe6S1CHDXZI6ZLhLUocMd0nqkOEuSR0y3CWpQ4a7JHXIcJekDhnuktShWQn3JC9N8oMk65KcNxvHkCRNbsbDPcl+wIeAU4BjgVclOXamjyNJmtxsnLmfAKyrqruq6lfAp4Dls3AcSdIk5s3CYy4C1g9tbwCeO35SkpXAyrb5syQ/mIVa9lULgJ+Muoip5P2jrkAj4M/mzHraZAOzEe7TUlWrgFWjOn7PkqytqmWjrkMaz5/NuTMbyzIbgSOHthe3PknSHJmNcL8JWJrk6CT7A2cAa2bhOJKkScz4skxVPZzkHOCLwH7AR6rq9pk+jnbK5S49XvmzOUdSVaOuQZI0w3yHqiR1yHCXpA4Z7pLUoZFd566ZkeQZDN4BvKh1bQTWVNUdo6tK0qh55r4XS/JWBh/vEODGdgvwST+wTY9nSc4cdQ2982qZvViSHwLPrKpfj+vfH7i9qpaOpjJp55L8b1UdNeo6euayzN7tEeCpwN3j+o9oY9LIJLltsiFg4VzWsi8y3PdubwSuTXInj31Y21HAMcA5oypKahYCLwG2jOsP8I25L2ffYrjvxarqC0mezuBjlodfUL2pqn4zusokAD4HHFRVt4wfSHLdnFezj3HNXZI65NUyktQhw12SOmS4S1KHDHdJ6pDhLkkd+n+rPQ6LBFTagQAAAABJRU5ErkJggg==\n",
|
488 |
+
"text/plain": [
|
489 |
+
"<Figure size 432x288 with 1 Axes>"
|
490 |
+
]
|
491 |
+
},
|
492 |
+
"metadata": {
|
493 |
+
"needs_background": "light"
|
494 |
+
},
|
495 |
+
"output_type": "display_data"
|
496 |
+
}
|
497 |
+
],
|
498 |
+
"source": [
|
499 |
+
"from imblearn.over_sampling import SMOTE\n",
|
500 |
+
"\n",
|
501 |
+
"print(data['class'].value_counts())\n",
|
502 |
+
"X = data.drop('class', axis=1)\n",
|
503 |
+
"Y = data['class']\n",
|
504 |
+
"\n",
|
505 |
+
"sm = SMOTE(random_state=42)\n",
|
506 |
+
"X_res, Y_res = sm.fit_resample(X, Y)\n",
|
507 |
+
"\n",
|
508 |
+
"df_smote_over = pd.concat([pd.DataFrame(X_res), pd.DataFrame(Y_res, columns=['class'])], axis=1)\n",
|
509 |
+
"\n",
|
510 |
+
"print('SMOTE over-sampling:')\n",
|
511 |
+
"print(df_smote_over['class'].value_counts())\n",
|
512 |
+
"\n",
|
513 |
+
"df_smote_over['class'].value_counts().plot(kind='bar', title='Count (target)');"
|
514 |
+
]
|
515 |
+
}
|
516 |
+
],
|
517 |
+
"metadata": {
|
518 |
+
"kernelspec": {
|
519 |
+
"display_name": "Python 3 (ipykernel)",
|
520 |
+
"language": "python",
|
521 |
+
"name": "python3"
|
522 |
+
},
|
523 |
+
"language_info": {
|
524 |
+
"codemirror_mode": {
|
525 |
+
"name": "ipython",
|
526 |
+
"version": 3
|
527 |
+
},
|
528 |
+
"file_extension": ".py",
|
529 |
+
"mimetype": "text/x-python",
|
530 |
+
"name": "python",
|
531 |
+
"nbconvert_exporter": "python",
|
532 |
+
"pygments_lexer": "ipython3",
|
533 |
+
"version": "3.12.9"
|
534 |
+
}
|
535 |
+
},
|
536 |
+
"nbformat": 4,
|
537 |
+
"nbformat_minor": 4
|
538 |
+
}
|
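The resampling cells in the notebook above lean on imblearn and scikit-learn. As a rough illustration of the two ideas behind them, the sketch below uses only the Python standard library: random under-sampling drops majority rows until the classes match, and a SMOTE-style over-sampler adds synthetic minority rows by interpolating between a minority row and one of its nearest minority neighbours. The function names, the toy two-feature data and the choice of k=5 are assumptions made for illustration; this is not the imblearn implementation, only the same basic idea.

```python
import random
from collections import Counter


def random_undersample(rows, labels, seed=42):
    """Randomly keep rows from each class so every class matches the smallest one."""
    rng = random.Random(seed)
    by_class = {}
    for row, label in zip(rows, labels):
        by_class.setdefault(label, []).append(row)
    target = min(len(group) for group in by_class.values())
    out_rows, out_labels = [], []
    for label, group in by_class.items():
        # rng.sample draws without replacement
        for row in rng.sample(group, target):
            out_rows.append(row)
            out_labels.append(label)
    return out_rows, out_labels


def smote_like_oversample(rows, labels, k=5, seed=42):
    """Grow smaller classes to the majority size with interpolated synthetic rows."""
    rng = random.Random(seed)
    counts = Counter(labels)
    majority_size = max(counts.values())
    out_rows, out_labels = list(rows), list(labels)
    for label, count in counts.items():
        group = [r for r, l in zip(rows, labels) if l == label]
        if len(group) < 2:
            continue  # no neighbour to interpolate towards
        for _ in range(majority_size - count):
            base = rng.choice(group)
            # k nearest same-class neighbours by squared Euclidean distance
            neighbours = sorted(
                (r for r in group if r is not base),
                key=lambda r: sum((a - b) ** 2 for a, b in zip(base, r)),
            )[:k]
            other = rng.choice(neighbours)
            gap = rng.random()  # position along the segment base -> other
            out_rows.append(tuple(a + gap * (b - a) for a, b in zip(base, other)))
            out_labels.append(label)
    return out_rows, out_labels


# Toy data with the same class counts as the notebook (500 vs 268).
data_rng = random.Random(0)
rows = [(data_rng.gauss(0, 1), data_rng.gauss(0, 1)) for _ in range(500)]
rows += [(data_rng.gauss(2, 1), data_rng.gauss(2, 1)) for _ in range(268)]
labels = [0] * 500 + [1] * 268

_, y_under = random_undersample(rows, labels)
_, y_over = smote_like_oversample(rows, labels)
print("under-sampled:", Counter(y_under))
print("SMOTE-style over-sampled:", Counter(y_over))
```

With this toy data the under-sampled set ends up with 268 rows per class and the over-sampled set with 500 per class, matching the counts the notebook reports. Note that this sketch under-samples without replacement, whereas the notebook cells pass `replacement=True` / `replace=True`.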
Data Analitics/Week 4/Video_Games_Sales_as_at_22_Dec_2016.csv
ADDED
The diff for this file is too large to render.
See raw diff
|
|
Data Analitics/Week 4/Week 4 Complementary Material.txt
ADDED
@@ -0,0 +1,23 @@
|
1 |
+
### Week 4 Complementary Material:
|
2 |
+
|
3 |
+
https://oralytics.com/2019/04/18/data-sets-for-analytics/
|
4 |
+
|
5 |
+
https://machinelearningmastery.com/decoding-data-descriptive-statistics/
|
6 |
+
|
8 |
+
|
9 |
+
https://medium.com/@pca_plus/3-epic-data-quality-blunders-1a1c024c20af
|
10 |
+
|
11 |
+
https://www.datasciencecentral.com/one-page-r-a-survival-guide-to-data-science-with-r/
|
12 |
+
|
13 |
+
https://medium.com/towards-data-science/reducing-dimensionality-from-dimensionality-reduction-techniques-f658aec24dfe
|
14 |
+
|
15 |
+
https://tylervigen.com/
|
16 |
+
|
17 |
+
https://learn.g2.com/data-visualization
|
18 |
+
|
19 |
+
https://www.kdnuggets.com/2015/05/7-methods-data-dimensionality-reduction.html
|
20 |
+
|
21 |
+
https://towardsdatascience.com/3-key-encoding-techniques-for-machine-learning-a-beginner-friendly-guide-aff8a01a7b6a/
|
22 |
+
|
23 |
+
https://www.scientificamerican.com/article/how-the-guinness-brewery-invented-the-most-important-statistical-method-in/?utm_source=pocket-newtab-en-gb
|