{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# TU257 - Assignment 1\n", "\n", "#### Student 1: Guilherme\n", "#### Student 2: Lohana Azevedo Rodrigues - D24126847 - (TU257)\n", "#### Student 3: Rafael Teixeira dos Santos Rodrigues\n", "\n", "#### Group Num= 3\n", "#### Problem Set= 1 \n", "\n", "#### **Portuguese Banking Marketing Campaign -** The data set is related to a direct marketing campaign for a Portuguese banking institution. The bank conducts marketing campaigns and uses its call center to contact customers by phone.\n", "#### ***GOAL***: The purpose of this project is to identify customers who are most likely to subscribe to a term deposit account based on previous marketing campaigns.\n", "#### https://archive.ics.uci.edu/ml/datasets/Bank+Marketing" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "________________________________________________________" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Modules" ] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [], "source": [ "import pandas as pd\n", "import numpy as np\n", "import matplotlib.pyplot as plt\n", "import sweetviz as sv\n", "from sklearn.preprocessing import LabelEncoder\n", "\n", "%matplotlib inline" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Step 1 - Importing the Data Set" ] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [], "source": [ "# For this project we used the bank-additional-full data set (41,188 examples).\n", "mkt = pd.read_csv('Data/bank-additional-full.csv', sep=';')\n", "mkt_raw = mkt.copy()  # independent copy, so the original data stays untouched for comparisons at the end" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Step 2 - Data Exploration" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "##### I chose to use Sweetviz to explore the data. It is a good way to explore an unfamiliar data set. 
The disadvantage is that the time needed to generate the report grows considerably as the data set gets larger." ] }, { "cell_type": "code", "execution_count": 4, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "First 5 values\n", "\n" ] }, { "data": { "text/html": [ "
| | age | job | marital | education | default | housing | loan | contact | month | day_of_week | ... | campaign | pdays | previous | poutcome | emp.var.rate | cons.price.idx | cons.conf.idx | euribor3m | nr.employed | y |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 56 | housemaid | married | basic.4y | no | no | no | telephone | may | mon | ... | 1 | 999 | 0 | nonexistent | 1.1 | 93.994 | -36.4 | 4.857 | 5191.0 | no |
| 1 | 57 | services | married | high.school | unknown | no | no | telephone | may | mon | ... | 1 | 999 | 0 | nonexistent | 1.1 | 93.994 | -36.4 | 4.857 | 5191.0 | no |
| 2 | 37 | services | married | high.school | no | yes | no | telephone | may | mon | ... | 1 | 999 | 0 | nonexistent | 1.1 | 93.994 | -36.4 | 4.857 | 5191.0 | no |
| 3 | 40 | admin. | married | basic.6y | no | no | no | telephone | may | mon | ... | 1 | 999 | 0 | nonexistent | 1.1 | 93.994 | -36.4 | 4.857 | 5191.0 | no |
| 4 | 56 | services | married | high.school | no | no | yes | telephone | may | mon | ... | 1 | 999 | 0 | nonexistent | 1.1 | 93.994 | -36.4 | 4.857 | 5191.0 | no |

5 rows × 21 columns

| | age | job | marital | education | default | housing | loan | contact | month | day_of_week | ... | campaign | pdays | previous | poutcome | emp.var.rate | cons.price.idx | cons.conf.idx | euribor3m | nr.employed | y |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 56 | 3 | 1 | 0 | 0 | 0 | 0 | 1 | 6 | 1 | ... | 1 | 999 | 0 | 1 | 1.1 | 93.994 | -36.4 | 4.857 | 5191.0 | 0 |
| 1 | 57 | 7 | 1 | 3 | 1 | 0 | 0 | 1 | 6 | 1 | ... | 1 | 999 | 0 | 1 | 1.1 | 93.994 | -36.4 | 4.857 | 5191.0 | 0 |
| 2 | 37 | 7 | 1 | 3 | 0 | 2 | 0 | 1 | 6 | 1 | ... | 1 | 999 | 0 | 1 | 1.1 | 93.994 | -36.4 | 4.857 | 5191.0 | 0 |
| 3 | 40 | 0 | 1 | 1 | 0 | 0 | 0 | 1 | 6 | 1 | ... | 1 | 999 | 0 | 1 | 1.1 | 93.994 | -36.4 | 4.857 | 5191.0 | 0 |
| 4 | 56 | 7 | 1 | 3 | 0 | 0 | 2 | 1 | 6 | 1 | ... | 1 | 999 | 0 | 1 | 1.1 | 93.994 | -36.4 | 4.857 | 5191.0 | 0 |

5 rows × 21 columns
\n", "