Viraj2307 committed on
Commit 27083da · 1 parent: 7730e90

Added product recommendation section
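
The notebook's `report(df)` helper loops over the columns to collect each column's dtype, a sample of unique values, the unique count and missing-value statistics. Below is a sketch of a vectorized variant. `column_report` is a hypothetical name, and the toy frame is illustrative, not taken from the dataset. The value the notebook labels "mean of missing" is the fraction of rows that are missing, so it can be computed directly as `isna().mean()`.

```python
import numpy as np
import pandas as pd


def column_report(df: pd.DataFrame, n_samples: int = 5) -> pd.DataFrame:
    """Per-column summary mirroring the notebook's report(df), using column-wise pandas reductions."""
    return pd.DataFrame({
        "Column": df.columns,
        "dtype": df.dtypes.values,
        # First few distinct values (missing values included), like df[col].unique()[:5]
        "unique sample": [df[c].unique()[:n_samples] for c in df.columns],
        # nunique() does not count missing values
        "n uniques": df.nunique().values,
        "num of missing": df.isna().sum().values,
        # Fraction of rows that are missing (labelled 'mean of missing' in the notebook)
        "mean of missing": df.isna().mean().values,
    })


# Hypothetical sample: one missing Description, two missing CustomerIDs
sample = pd.DataFrame({
    "Description": ["WHITE METAL LANTERN", None, "POPCORN HOLDER", "WHITE METAL LANTERN"],
    "CustomerID": [17850.0, np.nan, np.nan, 13047.0],
})
print(column_report(sample))
```

In this sketch only the sample of unique values still uses a comprehension. That is because `unique()` returns arrays of different lengths per column.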

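The notebook's cleaning and ranking steps run in sequence: it drops rows with missing values, keeps only positive quantities, then sums `Quantity` per `StockCode`/`Description` and sorts the totals. Below is a minimal end-to-end sketch of those steps on a tiny frame. The rows are made up for illustration and do not come from `retail_sales.csv`. The "C"-prefixed invoice with a negative quantity is an assumed stand-in for a cancelled order.

```python
import pandas as pd

# Hypothetical rows using the notebook's column names; values are illustrative only.
df = pd.DataFrame({
    "InvoiceNo": ["536365", "536365", "536366", "C536379", "536380", "536381"],
    "StockCode": ["85123A", "71053", "85123A", "71053", "22197", "84879"],
    "Description": [
        "WHITE HANGING HEART T-LIGHT HOLDER",
        "WHITE METAL LANTERN",
        "WHITE HANGING HEART T-LIGHT HOLDER",
        "WHITE METAL LANTERN",
        None,                               # missing description -> dropped by dropna()
        "ASSORTED COLOUR BIRD ORNAMENT",
    ],
    "Quantity": [6, 12, 10, -2, 4, 32],
    "UnitPrice": [2.55, 3.39, 2.55, 3.39, 0.85, 1.69],
    "CustomerID": [17850.0, 17850.0, 13047.0, 13047.0, 12583.0, None],  # last row -> dropped
})

# 1. Drop rows with any missing value (the notebook uses df.dropna(inplace=True)).
df = df.dropna()

# 2. Keep strictly positive quantities (removes the negative-quantity row).
df = df[df["Quantity"] > 0]

# 3. Total quantity per (StockCode, Description), largest first.
top_products = (
    df.pivot_table(index=["StockCode", "Description"], values="Quantity", aggfunc="sum")
    .sort_values(by="Quantity", ascending=False)
)
print(top_products.head(10))
```

After both filters, three rows remain. The heart T-light holder ranks first with 6 + 10 = 16 units, followed by the lantern with 12.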
Notebooks/Product Recommendation System.ipynb ADDED
@@ -0,0 +1,2912 @@
1
+ {
2
+ "cells": [
3
+ {
4
+ "cell_type": "code",
5
+ "execution_count": 2,
6
+ "metadata": {},
7
+ "outputs": [
8
+ {
9
+ "name": "stdout",
10
+ "output_type": "stream",
11
+ "text": [
12
+ "Requirement already satisfied: tqdm in c:\\python3.11.1\\lib\\site-packages (4.66.2)\n",
13
+ "Requirement already satisfied: colorama in c:\\python3.11.1\\lib\\site-packages (from tqdm) (0.4.6)\n",
14
+ "Note: you may need to restart the kernel to use updated packages.\n"
15
+ ]
16
+ },
17
+ {
18
+ "name": "stderr",
19
+ "output_type": "stream",
20
+ "text": [
21
+ "\n",
22
+ "[notice] A new release of pip is available: 24.1.2 -> 24.3.1\n",
23
+ "[notice] To update, run: python.exe -m pip install --upgrade pip\n"
24
+ ]
25
+ }
26
+ ],
27
+ "source": [
28
+ "%pip install tqdm"
29
+ ]
30
+ },
31
+ {
32
+ "cell_type": "code",
33
+ "execution_count": 3,
34
+ "metadata": {},
35
+ "outputs": [],
36
+ "source": [
37
+ "import pandas as pd\n",
38
+ "import numpy as np\n",
39
+ "import random\n",
40
+ "from tqdm import tqdm\n",
41
+ "import plotly.express as px\n",
42
+ "import matplotlib.pyplot as plt"
43
+ ]
44
+ },
45
+ {
46
+ "cell_type": "code",
47
+ "execution_count": 4,
48
+ "metadata": {},
49
+ "outputs": [
50
+ {
51
+ "data": {
52
+ "text/html": [
53
+ "<div>\n",
54
+ "<style scoped>\n",
55
+ " .dataframe tbody tr th:only-of-type {\n",
56
+ " vertical-align: middle;\n",
57
+ " }\n",
58
+ "\n",
59
+ " .dataframe tbody tr th {\n",
60
+ " vertical-align: top;\n",
61
+ " }\n",
62
+ "\n",
63
+ " .dataframe thead th {\n",
64
+ " text-align: right;\n",
65
+ " }\n",
66
+ "</style>\n",
67
+ "<table border=\"1\" class=\"dataframe\">\n",
68
+ " <thead>\n",
69
+ " <tr style=\"text-align: right;\">\n",
70
+ " <th></th>\n",
71
+ " <th>InvoiceNo</th>\n",
72
+ " <th>StockCode</th>\n",
73
+ " <th>Description</th>\n",
74
+ " <th>Quantity</th>\n",
75
+ " <th>InvoiceDate</th>\n",
76
+ " <th>UnitPrice</th>\n",
77
+ " <th>CustomerID</th>\n",
78
+ " <th>Country</th>\n",
79
+ " </tr>\n",
80
+ " </thead>\n",
81
+ " <tbody>\n",
82
+ " <tr>\n",
83
+ " <th>0</th>\n",
84
+ " <td>536365</td>\n",
85
+ " <td>85123A</td>\n",
86
+ " <td>WHITE HANGING HEART T-LIGHT HOLDER</td>\n",
87
+ " <td>6</td>\n",
88
+ " <td>2010-12-01 08:26:00</td>\n",
89
+ " <td>2.55</td>\n",
90
+ " <td>17850.0</td>\n",
91
+ " <td>United Kingdom</td>\n",
92
+ " </tr>\n",
93
+ " <tr>\n",
94
+ " <th>1</th>\n",
95
+ " <td>536365</td>\n",
96
+ " <td>71053</td>\n",
97
+ " <td>WHITE METAL LANTERN</td>\n",
98
+ " <td>6</td>\n",
99
+ " <td>2010-12-01 08:26:00</td>\n",
100
+ " <td>3.39</td>\n",
101
+ " <td>17850.0</td>\n",
102
+ " <td>United Kingdom</td>\n",
103
+ " </tr>\n",
104
+ " <tr>\n",
105
+ " <th>2</th>\n",
106
+ " <td>536365</td>\n",
107
+ " <td>84406B</td>\n",
108
+ " <td>CREAM CUPID HEARTS COAT HANGER</td>\n",
109
+ " <td>8</td>\n",
110
+ " <td>2010-12-01 08:26:00</td>\n",
111
+ " <td>2.75</td>\n",
112
+ " <td>17850.0</td>\n",
113
+ " <td>United Kingdom</td>\n",
114
+ " </tr>\n",
115
+ " <tr>\n",
116
+ " <th>3</th>\n",
117
+ " <td>536365</td>\n",
118
+ " <td>84029G</td>\n",
119
+ " <td>KNITTED UNION FLAG HOT WATER BOTTLE</td>\n",
120
+ " <td>6</td>\n",
121
+ " <td>2010-12-01 08:26:00</td>\n",
122
+ " <td>3.39</td>\n",
123
+ " <td>17850.0</td>\n",
124
+ " <td>United Kingdom</td>\n",
125
+ " </tr>\n",
126
+ " <tr>\n",
127
+ " <th>4</th>\n",
128
+ " <td>536365</td>\n",
129
+ " <td>84029E</td>\n",
130
+ " <td>RED WOOLLY HOTTIE WHITE HEART.</td>\n",
131
+ " <td>6</td>\n",
132
+ " <td>2010-12-01 08:26:00</td>\n",
133
+ " <td>3.39</td>\n",
134
+ " <td>17850.0</td>\n",
135
+ " <td>United Kingdom</td>\n",
136
+ " </tr>\n",
137
+ " </tbody>\n",
138
+ "</table>\n",
139
+ "</div>"
140
+ ],
141
+ "text/plain": [
142
+ " InvoiceNo StockCode Description Quantity \\\n",
143
+ "0 536365 85123A WHITE HANGING HEART T-LIGHT HOLDER 6 \n",
144
+ "1 536365 71053 WHITE METAL LANTERN 6 \n",
145
+ "2 536365 84406B CREAM CUPID HEARTS COAT HANGER 8 \n",
146
+ "3 536365 84029G KNITTED UNION FLAG HOT WATER BOTTLE 6 \n",
147
+ "4 536365 84029E RED WOOLLY HOTTIE WHITE HEART. 6 \n",
148
+ "\n",
149
+ " InvoiceDate UnitPrice CustomerID Country \n",
150
+ "0 2010-12-01 08:26:00 2.55 17850.0 United Kingdom \n",
151
+ "1 2010-12-01 08:26:00 3.39 17850.0 United Kingdom \n",
152
+ "2 2010-12-01 08:26:00 2.75 17850.0 United Kingdom \n",
153
+ "3 2010-12-01 08:26:00 3.39 17850.0 United Kingdom \n",
154
+ "4 2010-12-01 08:26:00 3.39 17850.0 United Kingdom "
155
+ ]
156
+ },
157
+ "execution_count": 4,
158
+ "metadata": {},
159
+ "output_type": "execute_result"
160
+ }
161
+ ],
162
+ "source": [
163
+ "df = pd.read_csv(r\"D:\\Customer Segmentation\\retail_sales.csv\")\n",
164
+ "df.head()"
165
+ ]
166
+ },
167
+ {
168
+ "cell_type": "markdown",
169
+ "metadata": {},
170
+ "source": [
171
+ "Data Cleaning"
172
+ ]
173
+ },
174
+ {
175
+ "cell_type": "code",
176
+ "execution_count": 5,
177
+ "metadata": {},
178
+ "outputs": [
179
+ {
180
+ "data": {
181
+ "text/html": [
182
+ "<div>\n",
183
+ "<style scoped>\n",
184
+ " .dataframe tbody tr th:only-of-type {\n",
185
+ " vertical-align: middle;\n",
186
+ " }\n",
187
+ "\n",
188
+ " .dataframe tbody tr th {\n",
189
+ " vertical-align: top;\n",
190
+ " }\n",
191
+ "\n",
192
+ " .dataframe thead th {\n",
193
+ " text-align: right;\n",
194
+ " }\n",
195
+ "</style>\n",
196
+ "<table border=\"1\" class=\"dataframe\">\n",
197
+ " <thead>\n",
198
+ " <tr style=\"text-align: right;\">\n",
199
+ " <th></th>\n",
200
+ " <th>Column</th>\n",
201
+ " <th>dtype</th>\n",
202
+ " <th>unique sample</th>\n",
203
+ " <th>n uniques</th>\n",
204
+ " <th>num of missing</th>\n",
205
+ " <th>mean of missing</th>\n",
206
+ " </tr>\n",
207
+ " </thead>\n",
208
+ " <tbody>\n",
209
+ " <tr>\n",
210
+ " <th>0</th>\n",
211
+ " <td>InvoiceNo</td>\n",
212
+ " <td>object</td>\n",
213
+ " <td>[536365, 536366, 536367, 536368, 536369]</td>\n",
214
+ " <td>25900</td>\n",
215
+ " <td>0</td>\n",
216
+ " <td>0.000000</td>\n",
217
+ " </tr>\n",
218
+ " <tr>\n",
219
+ " <th>1</th>\n",
220
+ " <td>StockCode</td>\n",
221
+ " <td>object</td>\n",
222
+ " <td>[85123A, 71053, 84406B, 84029G, 84029E]</td>\n",
223
+ " <td>4070</td>\n",
224
+ " <td>0</td>\n",
225
+ " <td>0.000000</td>\n",
226
+ " </tr>\n",
227
+ " <tr>\n",
228
+ " <th>2</th>\n",
229
+ " <td>Description</td>\n",
230
+ " <td>object</td>\n",
231
+ " <td>[WHITE HANGING HEART T-LIGHT HOLDER, WHITE MET...</td>\n",
232
+ " <td>4223</td>\n",
233
+ " <td>1454</td>\n",
234
+ " <td>0.002683</td>\n",
235
+ " </tr>\n",
236
+ " <tr>\n",
237
+ " <th>3</th>\n",
238
+ " <td>Quantity</td>\n",
239
+ " <td>int64</td>\n",
240
+ " <td>[6, 8, 2, 32, 3]</td>\n",
241
+ " <td>722</td>\n",
242
+ " <td>0</td>\n",
243
+ " <td>0.000000</td>\n",
244
+ " </tr>\n",
245
+ " <tr>\n",
246
+ " <th>4</th>\n",
247
+ " <td>InvoiceDate</td>\n",
248
+ " <td>object</td>\n",
249
+ " <td>[2010-12-01 08:26:00, 2010-12-01 08:28:00, 201...</td>\n",
250
+ " <td>23260</td>\n",
251
+ " <td>0</td>\n",
252
+ " <td>0.000000</td>\n",
253
+ " </tr>\n",
254
+ " <tr>\n",
255
+ " <th>5</th>\n",
256
+ " <td>UnitPrice</td>\n",
257
+ " <td>float64</td>\n",
258
+ " <td>[2.55, 3.39, 2.75, 7.65, 4.25]</td>\n",
259
+ " <td>1630</td>\n",
260
+ " <td>0</td>\n",
261
+ " <td>0.000000</td>\n",
262
+ " </tr>\n",
263
+ " <tr>\n",
264
+ " <th>6</th>\n",
265
+ " <td>CustomerID</td>\n",
266
+ " <td>float64</td>\n",
267
+ " <td>[17850.0, 13047.0, 12583.0, 13748.0, 15100.0]</td>\n",
268
+ " <td>4372</td>\n",
269
+ " <td>135080</td>\n",
270
+ " <td>0.249267</td>\n",
271
+ " </tr>\n",
272
+ " <tr>\n",
273
+ " <th>7</th>\n",
274
+ " <td>Country</td>\n",
275
+ " <td>object</td>\n",
276
+ " <td>[United Kingdom, France, Australia, Netherland...</td>\n",
277
+ " <td>38</td>\n",
278
+ " <td>0</td>\n",
279
+ " <td>0.000000</td>\n",
280
+ " </tr>\n",
281
+ " </tbody>\n",
282
+ "</table>\n",
283
+ "</div>"
284
+ ],
285
+ "text/plain": [
286
+ " Column dtype unique sample \\\n",
287
+ "0 InvoiceNo object [536365, 536366, 536367, 536368, 536369] \n",
288
+ "1 StockCode object [85123A, 71053, 84406B, 84029G, 84029E] \n",
289
+ "2 Description object [WHITE HANGING HEART T-LIGHT HOLDER, WHITE MET... \n",
290
+ "3 Quantity int64 [6, 8, 2, 32, 3] \n",
291
+ "4 InvoiceDate object [2010-12-01 08:26:00, 2010-12-01 08:28:00, 201... \n",
292
+ "5 UnitPrice float64 [2.55, 3.39, 2.75, 7.65, 4.25] \n",
293
+ "6 CustomerID float64 [17850.0, 13047.0, 12583.0, 13748.0, 15100.0] \n",
294
+ "7 Country object [United Kingdom, France, Australia, Netherland... \n",
295
+ "\n",
296
+ " n uniques num of missing mean of missing \n",
297
+ "0 25900 0 0.000000 \n",
298
+ "1 4070 0 0.000000 \n",
299
+ "2 4223 1454 0.002683 \n",
300
+ "3 722 0 0.000000 \n",
301
+ "4 23260 0 0.000000 \n",
302
+ "5 1630 0 0.000000 \n",
303
+ "6 4372 135080 0.249267 \n",
304
+ "7 38 0 0.000000 "
305
+ ]
306
+ },
307
+ "execution_count": 5,
308
+ "metadata": {},
309
+ "output_type": "execute_result"
310
+ }
311
+ ],
312
+ "source": [
313
+ "def report(df):\n",
314
+ " col = []\n",
315
+ " d_type = []\n",
316
+ " uniques = []\n",
317
+ " n_uniques = []\n",
318
+ " missing_values = []\n",
319
+ " mean_of_missing = []\n",
320
+ " \n",
321
+ " for i in df.columns:\n",
322
+ " col.append(i)\n",
323
+ " d_type.append(df[i].dtypes)\n",
324
+ " uniques.append(df[i].unique()[:5])\n",
325
+ " n_uniques.append(df[i].nunique())\n",
326
+ " missing_values.append(df[i].isna().sum())\n",
327
+ " mean_of_missing.append(df[i].isna().sum()/len(df))\n",
328
+ " \n",
329
+ " return pd.DataFrame({'Column': col, 'dtype': d_type, 'unique sample': uniques, 'n uniques': n_uniques, 'num of missing': missing_values, 'mean of missing': mean_of_missing })\n",
330
+ "\n",
331
+ "\n",
332
+ "report(df)"
333
+ ]
334
+ },
335
+ {
336
+ "cell_type": "code",
337
+ "execution_count": 6,
338
+ "metadata": {},
339
+ "outputs": [],
340
+ "source": [
341
+ "df.dropna(inplace=True)"
342
+ ]
343
+ },
344
+ {
345
+ "cell_type": "code",
346
+ "execution_count": 7,
347
+ "metadata": {},
348
+ "outputs": [
349
+ {
350
+ "data": {
351
+ "text/html": [
352
+ "<div>\n",
353
+ "<style scoped>\n",
354
+ " .dataframe tbody tr th:only-of-type {\n",
355
+ " vertical-align: middle;\n",
356
+ " }\n",
357
+ "\n",
358
+ " .dataframe tbody tr th {\n",
359
+ " vertical-align: top;\n",
360
+ " }\n",
361
+ "\n",
362
+ " .dataframe thead th {\n",
363
+ " text-align: right;\n",
364
+ " }\n",
365
+ "</style>\n",
366
+ "<table border=\"1\" class=\"dataframe\">\n",
367
+ " <thead>\n",
368
+ " <tr style=\"text-align: right;\">\n",
369
+ " <th></th>\n",
370
+ " <th>Quantity</th>\n",
371
+ " <th>UnitPrice</th>\n",
372
+ " <th>CustomerID</th>\n",
373
+ " </tr>\n",
374
+ " </thead>\n",
375
+ " <tbody>\n",
376
+ " <tr>\n",
377
+ " <th>count</th>\n",
378
+ " <td>406829.000000</td>\n",
379
+ " <td>406829.000000</td>\n",
380
+ " <td>406829.000000</td>\n",
381
+ " </tr>\n",
382
+ " <tr>\n",
383
+ " <th>mean</th>\n",
384
+ " <td>12.061303</td>\n",
385
+ " <td>3.460471</td>\n",
386
+ " <td>15287.690570</td>\n",
387
+ " </tr>\n",
388
+ " <tr>\n",
389
+ " <th>std</th>\n",
390
+ " <td>248.693370</td>\n",
391
+ " <td>69.315162</td>\n",
392
+ " <td>1713.600303</td>\n",
393
+ " </tr>\n",
394
+ " <tr>\n",
395
+ " <th>min</th>\n",
396
+ " <td>-80995.000000</td>\n",
397
+ " <td>0.000000</td>\n",
398
+ " <td>12346.000000</td>\n",
399
+ " </tr>\n",
400
+ " <tr>\n",
401
+ " <th>25%</th>\n",
402
+ " <td>2.000000</td>\n",
403
+ " <td>1.250000</td>\n",
404
+ " <td>13953.000000</td>\n",
405
+ " </tr>\n",
406
+ " <tr>\n",
407
+ " <th>50%</th>\n",
408
+ " <td>5.000000</td>\n",
409
+ " <td>1.950000</td>\n",
410
+ " <td>15152.000000</td>\n",
411
+ " </tr>\n",
412
+ " <tr>\n",
413
+ " <th>75%</th>\n",
414
+ " <td>12.000000</td>\n",
415
+ " <td>3.750000</td>\n",
416
+ " <td>16791.000000</td>\n",
417
+ " </tr>\n",
418
+ " <tr>\n",
419
+ " <th>max</th>\n",
420
+ " <td>80995.000000</td>\n",
421
+ " <td>38970.000000</td>\n",
422
+ " <td>18287.000000</td>\n",
423
+ " </tr>\n",
424
+ " </tbody>\n",
425
+ "</table>\n",
426
+ "</div>"
427
+ ],
428
+ "text/plain": [
429
+ " Quantity UnitPrice CustomerID\n",
430
+ "count 406829.000000 406829.000000 406829.000000\n",
431
+ "mean 12.061303 3.460471 15287.690570\n",
432
+ "std 248.693370 69.315162 1713.600303\n",
433
+ "min -80995.000000 0.000000 12346.000000\n",
434
+ "25% 2.000000 1.250000 13953.000000\n",
435
+ "50% 5.000000 1.950000 15152.000000\n",
436
+ "75% 12.000000 3.750000 16791.000000\n",
437
+ "max 80995.000000 38970.000000 18287.000000"
438
+ ]
439
+ },
440
+ "execution_count": 7,
441
+ "metadata": {},
442
+ "output_type": "execute_result"
443
+ }
444
+ ],
445
+ "source": [
446
+ "df.describe()"
447
+ ]
448
+ },
449
+ {
450
+ "cell_type": "markdown",
451
+ "metadata": {},
452
+ "source": [
453
+ "Remove Negative Quantities"
454
+ ]
455
+ },
456
+ {
457
+ "cell_type": "code",
458
+ "execution_count": 8,
459
+ "metadata": {},
460
+ "outputs": [],
461
+ "source": [
462
+ "df = df[df['Quantity'] > 0]"
463
+ ]
464
+ },
465
+ {
466
+ "cell_type": "code",
467
+ "execution_count": 9,
468
+ "metadata": {},
469
+ "outputs": [
470
+ {
471
+ "data": {
472
+ "text/plain": [
473
+ "(397924, 8)"
474
+ ]
475
+ },
476
+ "execution_count": 9,
477
+ "metadata": {},
478
+ "output_type": "execute_result"
479
+ }
480
+ ],
481
+ "source": [
482
+ "df.shape"
483
+ ]
484
+ },
485
+ {
486
+ "cell_type": "markdown",
487
+ "metadata": {},
488
+ "source": [
489
+ "Exploratory Data Analysis (EDA)"
490
+ ]
491
+ },
492
+ {
493
+ "cell_type": "markdown",
494
+ "metadata": {},
495
+ "source": [
496
+ "Top Products by Quantity Sold"
497
+ ]
498
+ },
499
+ {
500
+ "cell_type": "code",
501
+ "execution_count": 11,
502
+ "metadata": {},
503
+ "outputs": [
504
+ {
505
+ "data": {
506
+ "text/html": [
507
+ "<div>\n",
508
+ "<style scoped>\n",
509
+ " .dataframe tbody tr th:only-of-type {\n",
510
+ " vertical-align: middle;\n",
511
+ " }\n",
512
+ "\n",
513
+ " .dataframe tbody tr th {\n",
514
+ " vertical-align: top;\n",
515
+ " }\n",
516
+ "\n",
517
+ " .dataframe thead th {\n",
518
+ " text-align: right;\n",
519
+ " }\n",
520
+ "</style>\n",
521
+ "<table border=\"1\" class=\"dataframe\">\n",
522
+ " <thead>\n",
523
+ " <tr style=\"text-align: right;\">\n",
524
+ " <th></th>\n",
525
+ " <th></th>\n",
526
+ " <th>Quantity</th>\n",
527
+ " </tr>\n",
528
+ " <tr>\n",
529
+ " <th>StockCode</th>\n",
530
+ " <th>Description</th>\n",
531
+ " <th></th>\n",
532
+ " </tr>\n",
533
+ " </thead>\n",
534
+ " <tbody>\n",
535
+ " <tr>\n",
536
+ " <th>23843</th>\n",
537
+ " <th>PAPER CRAFT , LITTLE BIRDIE</th>\n",
538
+ " <td>80995</td>\n",
539
+ " </tr>\n",
540
+ " <tr>\n",
541
+ " <th>23166</th>\n",
542
+ " <th>MEDIUM CERAMIC TOP STORAGE JAR</th>\n",
543
+ " <td>77916</td>\n",
544
+ " </tr>\n",
545
+ " <tr>\n",
546
+ " <th>84077</th>\n",
547
+ " <th>WORLD WAR 2 GLIDERS ASSTD DESIGNS</th>\n",
548
+ " <td>54415</td>\n",
549
+ " </tr>\n",
550
+ " <tr>\n",
551
+ " <th>85099B</th>\n",
552
+ " <th>JUMBO BAG RED RETROSPOT</th>\n",
553
+ " <td>46181</td>\n",
554
+ " </tr>\n",
555
+ " <tr>\n",
556
+ " <th>85123A</th>\n",
557
+ " <th>WHITE HANGING HEART T-LIGHT HOLDER</th>\n",
558
+ " <td>36725</td>\n",
559
+ " </tr>\n",
560
+ " <tr>\n",
561
+ " <th>84879</th>\n",
562
+ " <th>ASSORTED COLOUR BIRD ORNAMENT</th>\n",
563
+ " <td>35362</td>\n",
564
+ " </tr>\n",
565
+ " <tr>\n",
566
+ " <th>21212</th>\n",
567
+ " <th>PACK OF 72 RETROSPOT CAKE CASES</th>\n",
568
+ " <td>33693</td>\n",
569
+ " </tr>\n",
570
+ " <tr>\n",
571
+ " <th>22197</th>\n",
572
+ " <th>POPCORN HOLDER</th>\n",
573
+ " <td>30931</td>\n",
574
+ " </tr>\n",
575
+ " <tr>\n",
576
+ " <th>23084</th>\n",
577
+ " <th>RABBIT NIGHT LIGHT</th>\n",
578
+ " <td>27202</td>\n",
579
+ " </tr>\n",
580
+ " <tr>\n",
581
+ " <th>22492</th>\n",
582
+ " <th>MINI PAINT SET VINTAGE</th>\n",
583
+ " <td>26076</td>\n",
584
+ " </tr>\n",
585
+ " </tbody>\n",
586
+ "</table>\n",
587
+ "</div>"
588
+ ],
589
+ "text/plain": [
590
+ " Quantity\n",
591
+ "StockCode Description \n",
592
+ "23843 PAPER CRAFT , LITTLE BIRDIE 80995\n",
593
+ "23166 MEDIUM CERAMIC TOP STORAGE JAR 77916\n",
594
+ "84077 WORLD WAR 2 GLIDERS ASSTD DESIGNS 54415\n",
595
+ "85099B JUMBO BAG RED RETROSPOT 46181\n",
596
+ "85123A WHITE HANGING HEART T-LIGHT HOLDER 36725\n",
597
+ "84879 ASSORTED COLOUR BIRD ORNAMENT 35362\n",
598
+ "21212 PACK OF 72 RETROSPOT CAKE CASES 33693\n",
599
+ "22197 POPCORN HOLDER 30931\n",
600
+ "23084 RABBIT NIGHT LIGHT 27202\n",
601
+ "22492 MINI PAINT SET VINTAGE 26076"
602
+ ]
603
+ },
604
+ "execution_count": 11,
605
+ "metadata": {},
606
+ "output_type": "execute_result"
607
+ }
608
+ ],
609
+ "source": [
610
+ "TopProducts = df.pivot_table(\n",
611
+ " index=['StockCode','Description'],\n",
612
+ " values='Quantity',\n",
613
+ " aggfunc='sum').sort_values(\n",
614
+ " by='Quantity', ascending=False)\n",
615
+ "\n",
616
+ "TopProducts.head(10)"
617
+ ]
618
+ },
619
+ {
620
+ "cell_type": "code",
621
+ "execution_count": 12,
622
+ "metadata": {},
623
+ "outputs": [
624
+ {
625
+ "data": {
626
+ "application/vnd.plotly.v1+json": {
627
+ "config": {
628
+ "plotlyServerURL": "https://plot.ly"
629
+ },
630
+ "data": [
631
+ {
632
+ "alignmentgroup": "True",
633
+ "hovertemplate": "Quantity=%{x}<br>Description=%{y}<extra></extra>",
634
+ "legendgroup": "",
635
+ "marker": {
636
+ "color": "#636efa",
637
+ "pattern": {
638
+ "shape": ""
639
+ }
640
+ },
641
+ "name": "",
642
+ "offsetgroup": "",
643
+ "orientation": "h",
644
+ "showlegend": false,
645
+ "textposition": "auto",
646
+ "type": "bar",
647
+ "x": [
648
+ 80995,
649
+ 77916,
650
+ 54415,
651
+ 46181,
652
+ 36725,
653
+ 35362,
654
+ 33693,
655
+ 30931,
656
+ 27202,
657
+ 26076
658
+ ],
659
+ "xaxis": "x",
660
+ "y": [
661
+ "PAPER CRAFT , LITTLE BIRDIE",
662
+ "MEDIUM CERAMIC TOP STORAGE JAR",
663
+ "WORLD WAR 2 GLIDERS ASSTD DESIGNS",
664
+ "JUMBO BAG RED RETROSPOT",
665
+ "WHITE HANGING HEART T-LIGHT HOLDER",
666
+ "ASSORTED COLOUR BIRD ORNAMENT",
667
+ "PACK OF 72 RETROSPOT CAKE CASES",
668
+ "POPCORN HOLDER",
669
+ "RABBIT NIGHT LIGHT",
670
+ "MINI PAINT SET VINTAGE "
671
+ ],
672
+ "yaxis": "y"
673
+ }
674
+ ],
675
+ "layout": {
676
+ "barmode": "relative",
677
+ "legend": {
678
+ "tracegroupgap": 0
679
+ },
680
+ "template": {
1227
+ "layout": {
1228
+ "annotationdefaults": {
1229
+ "arrowcolor": "#2a3f5f",
1230
+ "arrowhead": 0,
1231
+ "arrowwidth": 1
1232
+ },
1233
+ "autotypenumbers": "strict",
1234
+ "coloraxis": {
1235
+ "colorbar": {
1236
+ "outlinewidth": 0,
1237
+ "ticks": ""
1238
+ }
1239
+ },
1240
+ "colorscale": {
1241
+ "diverging": [
1242
+ [
1243
+ 0,
1244
+ "#8e0152"
1245
+ ],
1246
+ [
1247
+ 0.1,
1248
+ "#c51b7d"
1249
+ ],
1250
+ [
1251
+ 0.2,
1252
+ "#de77ae"
1253
+ ],
1254
+ [
1255
+ 0.3,
1256
+ "#f1b6da"
1257
+ ],
1258
+ [
1259
+ 0.4,
1260
+ "#fde0ef"
1261
+ ],
1262
+ [
1263
+ 0.5,
1264
+ "#f7f7f7"
1265
+ ],
1266
+ [
1267
+ 0.6,
1268
+ "#e6f5d0"
1269
+ ],
1270
+ [
1271
+ 0.7,
1272
+ "#b8e186"
1273
+ ],
1274
+ [
1275
+ 0.8,
1276
+ "#7fbc41"
1277
+ ],
1278
+ [
1279
+ 0.9,
1280
+ "#4d9221"
1281
+ ],
1282
+ [
1283
+ 1,
1284
+ "#276419"
1285
+ ]
1286
+ ],
1287
+ "sequential": [
1288
+ [
1289
+ 0,
1290
+ "#0d0887"
1291
+ ],
1292
+ [
1293
+ 0.1111111111111111,
1294
+ "#46039f"
1295
+ ],
1296
+ [
1297
+ 0.2222222222222222,
1298
+ "#7201a8"
1299
+ ],
1300
+ [
1301
+ 0.3333333333333333,
1302
+ "#9c179e"
1303
+ ],
1304
+ [
1305
+ 0.4444444444444444,
1306
+ "#bd3786"
1307
+ ],
1308
+ [
1309
+ 0.5555555555555556,
1310
+ "#d8576b"
1311
+ ],
1312
+ [
1313
+ 0.6666666666666666,
1314
+ "#ed7953"
1315
+ ],
1316
+ [
1317
+ 0.7777777777777778,
1318
+ "#fb9f3a"
1319
+ ],
1320
+ [
1321
+ 0.8888888888888888,
1322
+ "#fdca26"
1323
+ ],
1324
+ [
1325
+ 1,
1326
+ "#f0f921"
1327
+ ]
1328
+ ],
1329
+ "sequentialminus": [
1330
+ [
1331
+ 0,
1332
+ "#0d0887"
1333
+ ],
1334
+ [
1335
+ 0.1111111111111111,
1336
+ "#46039f"
1337
+ ],
1338
+ [
1339
+ 0.2222222222222222,
1340
+ "#7201a8"
1341
+ ],
1342
+ [
1343
+ 0.3333333333333333,
1344
+ "#9c179e"
1345
+ ],
1346
+ [
1347
+ 0.4444444444444444,
1348
+ "#bd3786"
1349
+ ],
1350
+ [
1351
+ 0.5555555555555556,
1352
+ "#d8576b"
1353
+ ],
1354
+ [
1355
+ 0.6666666666666666,
1356
+ "#ed7953"
1357
+ ],
1358
+ [
1359
+ 0.7777777777777778,
1360
+ "#fb9f3a"
1361
+ ],
1362
+ [
1363
+ 0.8888888888888888,
1364
+ "#fdca26"
1365
+ ],
1366
+ [
1367
+ 1,
1368
+ "#f0f921"
1369
+ ]
1370
+ ]
1371
+ },
1372
+ "colorway": [
1373
+ "#636efa",
1374
+ "#EF553B",
1375
+ "#00cc96",
1376
+ "#ab63fa",
1377
+ "#FFA15A",
1378
+ "#19d3f3",
1379
+ "#FF6692",
1380
+ "#B6E880",
1381
+ "#FF97FF",
1382
+ "#FECB52"
1383
+ ],
1384
+ "font": {
1385
+ "color": "#2a3f5f"
1386
+ },
1387
+ "geo": {
1388
+ "bgcolor": "white",
1389
+ "lakecolor": "white",
1390
+ "landcolor": "#E5ECF6",
1391
+ "showlakes": true,
1392
+ "showland": true,
1393
+ "subunitcolor": "white"
1394
+ },
1395
+ "hoverlabel": {
1396
+ "align": "left"
1397
+ },
1398
+ "hovermode": "closest",
1399
+ "mapbox": {
1400
+ "style": "light"
1401
+ },
1402
+ "paper_bgcolor": "white",
1403
+ "plot_bgcolor": "#E5ECF6",
1404
+ "polar": {
1405
+ "angularaxis": {
1406
+ "gridcolor": "white",
1407
+ "linecolor": "white",
1408
+ "ticks": ""
1409
+ },
1410
+ "bgcolor": "#E5ECF6",
1411
+ "radialaxis": {
1412
+ "gridcolor": "white",
1413
+ "linecolor": "white",
1414
+ "ticks": ""
1415
+ }
1416
+ },
1417
+ "scene": {
1418
+ "xaxis": {
1419
+ "backgroundcolor": "#E5ECF6",
1420
+ "gridcolor": "white",
1421
+ "gridwidth": 2,
1422
+ "linecolor": "white",
1423
+ "showbackground": true,
1424
+ "ticks": "",
1425
+ "zerolinecolor": "white"
1426
+ },
1427
+ "yaxis": {
1428
+ "backgroundcolor": "#E5ECF6",
1429
+ "gridcolor": "white",
1430
+ "gridwidth": 2,
1431
+ "linecolor": "white",
1432
+ "showbackground": true,
1433
+ "ticks": "",
1434
+ "zerolinecolor": "white"
1435
+ },
1436
+ "zaxis": {
1437
+ "backgroundcolor": "#E5ECF6",
1438
+ "gridcolor": "white",
1439
+ "gridwidth": 2,
1440
+ "linecolor": "white",
1441
+ "showbackground": true,
1442
+ "ticks": "",
1443
+ "zerolinecolor": "white"
1444
+ }
1445
+ },
1446
+ "shapedefaults": {
1447
+ "line": {
1448
+ "color": "#2a3f5f"
1449
+ }
1450
+ },
1451
+ "ternary": {
1452
+ "aaxis": {
1453
+ "gridcolor": "white",
1454
+ "linecolor": "white",
1455
+ "ticks": ""
1456
+ },
1457
+ "baxis": {
1458
+ "gridcolor": "white",
1459
+ "linecolor": "white",
1460
+ "ticks": ""
1461
+ },
1462
+ "bgcolor": "#E5ECF6",
1463
+ "caxis": {
1464
+ "gridcolor": "white",
1465
+ "linecolor": "white",
1466
+ "ticks": ""
1467
+ }
1468
+ },
1469
+ "title": {
1470
+ "x": 0.05
1471
+ },
1472
+ "xaxis": {
1473
+ "automargin": true,
1474
+ "gridcolor": "white",
1475
+ "linecolor": "white",
1476
+ "ticks": "",
1477
+ "title": {
1478
+ "standoff": 15
1479
+ },
1480
+ "zerolinecolor": "white",
1481
+ "zerolinewidth": 2
1482
+ },
1483
+ "yaxis": {
1484
+ "automargin": true,
1485
+ "gridcolor": "white",
1486
+ "linecolor": "white",
1487
+ "ticks": "",
1488
+ "title": {
1489
+ "standoff": 15
1490
+ },
1491
+ "zerolinecolor": "white",
1492
+ "zerolinewidth": 2
1493
+ }
1494
+ }
1495
+ },
1496
+ "title": {
1497
+ "text": "Top 10 Products by Quantity Sold"
1498
+ },
1499
+ "xaxis": {
1500
+ "anchor": "y",
1501
+ "domain": [
1502
+ 0,
1503
+ 1
1504
+ ],
1505
+ "title": {
1506
+ "text": "Quantity"
1507
+ }
1508
+ },
1509
+ "yaxis": {
1510
+ "anchor": "x",
1511
+ "domain": [
1512
+ 0,
1513
+ 1
1514
+ ],
1515
+ "title": {
1516
+ "text": "Description"
1517
+ }
1518
+ }
1519
+ }
1520
+ }
1521
+ },
+ "metadata": {},
+ "output_type": "display_data"
+ }
+ ],
+ "source": [
+ "TopProducts.reset_index(inplace=True)\n",
+ "\n",
+ "px.bar(TopProducts.head(10), y='Description', x='Quantity',\n",
+ " orientation='h',\n",
+ " title='Top 10 Products by Quantity Sold')"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "The product with the highest quantity sold is \"PAPER CRAFT, LITTLE BIRDIE\", with approximately 80,000 units sold."
+ ]
+ },
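The cell that builds `TopProducts` is not shown here. As a hedged sketch, assuming it holds total `Quantity` per product sorted in descending order, the same ranking can be reproduced with `groupby`. The rows and the product codes `"A"`, `"B"`, `"C"` are invented placeholders, not values from the dataset.

```python
import pandas as pd

# Invented toy transactions; "A", "B", "C" are placeholder product codes
df = pd.DataFrame({
    "StockCode": ["A", "B", "A", "C", "B"],
    "Description": ["ITEM A", "ITEM B", "ITEM A", "ITEM C", "ITEM B"],
    "Quantity": [500, 12, 300, 6, 4],
})

# Total units sold per product, largest first (assumed to mirror TopProducts)
top_products = (
    df.groupby(["StockCode", "Description"], as_index=False)["Quantity"]
    .sum()
    .sort_values("Quantity", ascending=False)
)

print(top_products.head(10))
```

Because `as_index=False` keeps the grouping keys as ordinary columns, a frame built this way could be passed to `px.bar` without a separate `reset_index` step.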
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "Let’s check out the number of unique customers:"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 13,
+ "metadata": {},
+ "outputs": [
+ {
+ "data": {
+ "text/plain": [
+ "4339"
+ ]
+ },
+ "execution_count": 13,
+ "metadata": {},
+ "output_type": "execute_result"
+ }
+ ],
+ "source": [
+ "customers = df[\"CustomerID\"].unique().tolist()\n",
+ "len(customers)"
+ ]
+ },
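One caveat with `len(df["CustomerID"].unique())`: if any `NaN` IDs remain, `unique()` keeps `NaN` as a distinct value and the count includes it, whereas `Series.nunique()` drops `NaN` by default. A minimal check on an invented ID column:

```python
import numpy as np
import pandas as pd

# Invented CustomerID column containing one missing value
ids = pd.Series([17850.0, 13047.0, 17850.0, np.nan])

print(len(ids.unique()))  # 3: unique() keeps NaN as its own value
print(ids.nunique())      # 2: nunique() skips NaN by default
```

If rows with missing IDs have already been dropped, both approaches return the same count.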
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "Top Products by Number of Customers"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 14,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "# count distinct customers per product, most widely bought first\n",
+ "CustomersBoughts = df.pivot_table(index=['StockCode','Description'],\n",
+ " values='CustomerID',\n",
+ " aggfunc=lambda x: len(x.unique())).sort_values(by='CustomerID', ascending=False)"
+ ]
+ },
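The `pivot_table` call with `lambda x: len(x.unique())` counts distinct buyers per product. A sketch of the same computation using the built-in `nunique` aggregation, on invented rows and assuming there are no missing `CustomerID` values:

```python
import pandas as pd

# Invented transactions: customer 1 buys product A twice
df = pd.DataFrame({
    "StockCode":   ["A", "A", "A", "B", "B"],
    "Description": ["ITEM A", "ITEM A", "ITEM A", "ITEM B", "ITEM B"],
    "CustomerID":  [1, 1, 2, 3, 3],
})

# The notebook's approach: distinct customers per product via pivot_table
via_pivot = df.pivot_table(index=["StockCode", "Description"],
                           values="CustomerID",
                           aggfunc=lambda x: len(x.unique()))

# Same count with groupby + nunique, most widely bought first
via_groupby = (df.groupby(["StockCode", "Description"])["CustomerID"]
               .nunique()
               .sort_values(ascending=False))

print(via_groupby)
```

Product A is bought by customers 1 and 2, so it counts 2 buyers even though it appears in three rows.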
+ {
+ "cell_type": "code",
+ "execution_count": 15,
+ "metadata": {},
+ "outputs": [
+ {
+ "data": {
+ "text/plain": [
+ " CustomerID\n",
+ "StockCode Description \n",
+ "22423 REGENCY CAKESTAND 3 TIER 881\n",
+ "85123A WHITE HANGING HEART T-LIGHT HOLDER 856\n",
+ "47566 PARTY BUNTING 708\n",
+ "84879 ASSORTED COLOUR BIRD ORNAMENT 678\n",
+ "22720 SET OF 3 CAKE TINS PANTRY DESIGN 640\n",
+ "21212 PACK OF 72 RETROSPOT CAKE CASES 635\n",
+ "85099B JUMBO BAG RED RETROSPOT 635\n",
+ "22086 PAPER CHAIN KIT 50'S CHRISTMAS 613\n",
+ "22457 NATURAL SLATE HEART CHALKBOARD 587\n",
+ "22138 BAKING SET 9 PIECE RETROSPOT 581"
+ ]
+ },
+ "execution_count": 15,
+ "metadata": {},
+ "output_type": "execute_result"
+ }
+ ],
+ "source": [
+ "CustomersBoughts.head(10)\n"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "Top 10 products by number of customers"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 16,
+ "metadata": {},
+ "outputs": [
+ {
+ "data": {
+ "application/vnd.plotly.v1+json": {
+ "config": {
+ "plotlyServerURL": "https://plot.ly"
+ },
+ "data": [
+ {
+ "alignmentgroup": "True",
+ "hovertemplate": "CustomerID=%{x}<br>Description=%{y}<extra></extra>",
+ "legendgroup": "",
+ "marker": {
+ "color": "#636efa",
+ "pattern": {
+ "shape": ""
+ }
+ },
+ "name": "",
+ "offsetgroup": "",
+ "orientation": "h",
+ "showlegend": false,
+ "textposition": "auto",
+ "type": "bar",
+ "x": [
+ 881,
+ 856,
+ 708,
+ 678,
+ 640,
+ 635,
+ 635,
+ 613,
+ 587,
+ 581
+ ],
+ "xaxis": "x",
+ "y": [
+ "REGENCY CAKESTAND 3 TIER",
+ "WHITE HANGING HEART T-LIGHT HOLDER",
+ "PARTY BUNTING",
+ "ASSORTED COLOUR BIRD ORNAMENT",
+ "SET OF 3 CAKE TINS PANTRY DESIGN ",
+ "PACK OF 72 RETROSPOT CAKE CASES",
+ "JUMBO BAG RED RETROSPOT",
+ "PAPER CHAIN KIT 50'S CHRISTMAS ",
+ "NATURAL SLATE HEART CHALKBOARD ",
+ "BAKING SET 9 PIECE RETROSPOT "
+ ],
+ "yaxis": "y"
+ }
+ ],
+ "layout": {
+ "barmode": "relative",
+ "legend": {
+ "tracegroupgap": 0
+ },
+ "title": {
+ "text": "Top 10 Products by Number of Customers"
+ },
+ "xaxis": {
+ "anchor": "y",
+ "domain": [
+ 0,
+ 1
+ ],
+ "title": {
+ "text": "CustomerID"
+ }
+ },
+ "yaxis": {
+ "anchor": "x",
+ "domain": [
+ 0,
+ 1
+ ],
+ "title": {
+ "text": "Description"
+ }
+ }
+ }
+ }
+ },
+ "metadata": {},
+ "output_type": "display_data"
+ }
+ ],
+ "source": [
+ "CustomersBoughts.reset_index(inplace=True)\n",
+ "\n",
+ "px.bar(CustomersBoughts.head(10), y='Description', x='CustomerID',\n",
+ " orientation='h',\n",
+ " title='Top 10 Products by Number of Customers')"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "Prepare Data for Modelling"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "Splitting the data:\n",
+ "We will use the purchases of 90% of the customers as the training set for the word2vec embeddings and hold out the remaining 10% of customers as a validation set."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 17,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "random.shuffle(customers)\n",
+ "\n",
+ "# take the first 90% of the shuffled customer IDs\n",
+ "customers_train = customers[:round(0.9 * len(customers))]\n",
+ "\n",
+ "# split the data into training and validation sets by customer\n",
+ "train_df = df[df['CustomerID'].isin(customers_train)]\n",
+ "validation_df = df[~df['CustomerID'].isin(customers_train)]"
+ ]
+ },
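Because `random.shuffle` is called without a seed, the split (and therefore the embeddings) will differ between runs. A sketch of a reproducible variant, assuming a fixed seed is acceptable; `split_customers` and the seed value 42 are hypothetical. With 4339 customers it yields the same 3905/434 sizes as the progress bars below.

```python
import random

def split_customers(customer_ids, train_frac=0.9, seed=42):
    """Shuffle a copy of the IDs with a seeded RNG and split by customer."""
    ids = list(customer_ids)
    random.Random(seed).shuffle(ids)  # local RNG, global random state untouched
    cut = round(train_frac * len(ids))
    return ids[:cut], ids[cut:]

train_ids, val_ids = split_customers(range(4339))
print(len(train_ids), len(val_ids))  # 3905 434

# Splitting by customer means no customer appears in both sets
assert not set(train_ids) & set(val_ids)
```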
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "Creating purchase sequences for the training and validation sets:"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 18,
+ "metadata": {},
+ "outputs": [
+ {
+ "name": "stderr",
+ "output_type": "stream",
+ "text": [
+ "100%|██████████| 3905/3905 [00:01<00:00, 1954.11it/s]\n"
+ ]
+ }
+ ],
+ "source": [
+ "# one purchase sequence (list of StockCodes, in row order) per training customer\n",
+ "purchases_train = []\n",
+ "\n",
+ "for i in tqdm(customers_train):\n",
+ " temp = train_df[train_df[\"CustomerID\"] == i][\"StockCode\"].tolist()\n",
+ " purchases_train.append(temp)"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 19,
+ "metadata": {},
+ "outputs": [
+ {
+ "name": "stderr",
+ "output_type": "stream",
+ "text": [
+ "100%|██████████| 434/434 [00:00<00:00, 2451.86it/s]\n"
+ ]
+ }
+ ],
+ "source": [
+ "# the same for every validation customer\n",
+ "purchases_val = []\n",
+ "\n",
+ "for i in tqdm(validation_df['CustomerID'].unique()):\n",
+ " temp = validation_df[validation_df[\"CustomerID\"] == i][\"StockCode\"].tolist()\n",
+ " purchases_val.append(temp)"
+ ]
+ },
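Both loops filter the whole DataFrame once per customer. A sketch of an equivalent single pass with `groupby`, on invented rows. It assumes the rows are already in the order the sequences should follow (within each group, `groupby` keeps the original row order). The resulting list is ordered by customer ID rather than by the shuffled `customers_train` order, which should make little difference for word2vec training.

```python
import pandas as pd

# Invented training rows
train_df = pd.DataFrame({
    "CustomerID": [2, 1, 2, 1, 3],
    "StockCode":  ["B", "A", "C", "D", "A"],
})

# One list of StockCodes per customer, row order preserved inside each group
sequences = train_df.groupby("CustomerID")["StockCode"].apply(list)
purchases_train = sequences.tolist()

print(purchases_train)  # [['A', 'D'], ['B', 'C'], ['A']]
```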
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "Building a Recommendation System"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "Building word2vec Embeddings for products"
+ ]
+ },
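Here each customer's purchase sequence plays the role of a "sentence" and each `StockCode` the role of a "word". A minimal pure-Python sketch of the (target, context) pairs the skip-gram variant of word2vec learns from, using a fixed window. `skipgram_pairs` is a hypothetical helper for illustration only; gensim's `Word2Vec` additionally shrinks the window at random and trains with negative sampling or hierarchical softmax, so this shows the idea rather than gensim's exact procedure.

```python
def skipgram_pairs(sequence, window=2):
    """Return (target, context) pairs for items within `window` positions."""
    pairs = []
    for i, target in enumerate(sequence):
        lo = max(0, i - window)
        hi = min(len(sequence), i + window + 1)
        for j in range(lo, hi):
            if j != i:
                pairs.append((target, sequence[j]))
    return pairs

# One purchase sequence built from stock codes shown in the tables above
basket = ["22423", "85123A", "47566"]
print(skipgram_pairs(basket, window=1))
# [('22423', '85123A'), ('85123A', '22423'), ('85123A', '47566'), ('47566', '85123A')]
```

Products that often appear close together in customers' purchase histories generate many shared pairs, which is what pulls their embeddings together.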
2717
+ {
2718
+ "cell_type": "code",
2719
+ "execution_count": 20,
2720
+ "metadata": {},
2721
+ "outputs": [
2722
+ {
2723
+ "name": "stdout",
2724
+ "output_type": "stream",
2725
+ "text": [
2726
+ "Collecting gensim\n",
2727
+ " Downloading gensim-4.3.3-cp311-cp311-win_amd64.whl.metadata (8.2 kB)\n",
2728
+ "Requirement already satisfied: numpy<2.0,>=1.18.5 in c:\\python3.11.1\\lib\\site-packages (from gensim) (1.26.4)\n",
2729
+ "Requirement already satisfied: scipy<1.14.0,>=1.7.0 in c:\\python3.11.1\\lib\\site-packages (from gensim) (1.12.0)\n",
2730
+ "Requirement already satisfied: smart-open>=1.8.1 in c:\\python3.11.1\\lib\\site-packages (from gensim) (7.0.4)\n",
2731
+ "Requirement already satisfied: wrapt in c:\\python3.11.1\\lib\\site-packages (from smart-open>=1.8.1->gensim) (1.16.0)\n",
+ "Downloading gensim-4.3.3-cp311-cp311-win_amd64.whl (24.0 MB)\n",
+ " ---------------------------------------- 24.0/24.0 MB 4.8 MB/s eta 0:00:00\n",
+ "Installing collected packages: gensim\n",
+ "Successfully installed gensim-4.3.3\n",
+ "Note: you may need to restart the kernel to use updated packages.\n"
+ ]
+ },
+ {
+ "name": "stderr",
+ "output_type": "stream",
+ "text": [
+ "\n",
+ "[notice] A new release of pip is available: 24.1.2 -> 24.3.1\n",
+ "[notice] To update, run: python.exe -m pip install --upgrade pip\n"
+ ]
+ }
+ ],
+ "source": [
+ "pip install gensim"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 21,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "from gensim.models import Word2Vec"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "The parameters I will use:\n",
+ "\n",
+ "- `window = 15`: Defines the maximum distance between the current and predicted word within a sentence.\n",
+ "- `sg = 1`: The model uses the Skip-gram approach.\n",
+ "- `hs = 0`: Hierarchical softmax is not used because the vocabulary is not large; with `negative` > 0, negative sampling is used instead.\n",
+ "- `negative = 10`: Sets the number of negative samples to 10.\n",
+ "- `alpha = 0.03`: Sets the initial learning rate to 0.03.\n",
+ "- `min_alpha = 0.0007`: Sets the minimum learning rate to 0.0007; the learning rate decays linearly from `alpha` to this value during training."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": []
+ }
+ ],
+ "metadata": {
+ "kernelspec": {
+ "display_name": "Python 3",
+ "language": "python",
+ "name": "python3"
+ },
+ "language_info": {
+ "codemirror_mode": {
+ "name": "ipython",
+ "version": 3
+ },
+ "file_extension": ".py",
+ "mimetype": "text/x-python",
+ "name": "python",
+ "nbconvert_exporter": "python",
+ "pygments_lexer": "ipython3",
+ "version": "3.11.4"
+ }
+ },
+ "nbformat": 4,
+ "nbformat_minor": 2
+ }
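The notebook ends with an empty code cell, so the training call that uses these parameters is not part of this commit. Below is a minimal sketch of one way to wire them into gensim's `Word2Vec`, assuming each customer's purchase history is treated as one "sentence" of product tokens (an item2vec-style setup that the notebook does not confirm). The helper names `purchase_sequences` and `train_item_embeddings`, the `(customer_id, product)` input pairs, `vector_size=100`, `min_count=1`, `epochs=5`, and the fixed seed are all assumptions. gensim is imported only inside the training helper, so the grouping step runs without it.

```python
from collections import defaultdict

# Word2Vec hyperparameters as listed in the notebook's markdown cell.
W2V_PARAMS = {
    "window": 15,         # max distance between the current and predicted item
    "sg": 1,              # 1 = skip-gram (0 would be CBOW)
    "hs": 0,              # no hierarchical softmax, so negative sampling is used
    "negative": 10,       # noise samples drawn per positive example
    "alpha": 0.03,        # initial learning rate
    "min_alpha": 0.0007,  # learning rate decays linearly towards this value
}


def purchase_sequences(rows):
    """Group (customer_id, product) pairs into one token list per customer.

    Hypothetical helper. Products keep the order in which rows arrive, so
    sort the rows (e.g. by invoice date) beforehand if order matters.
    """
    sequences = defaultdict(list)
    for customer_id, product in rows:
        sequences[customer_id].append(str(product))
    return list(sequences.values())


def train_item_embeddings(sentences, vector_size=100, epochs=5):
    """Hypothetical helper: train item embeddings with gensim 4.x."""
    from gensim.models import Word2Vec  # lazy import: only needed for training

    return Word2Vec(
        sentences=sentences,
        vector_size=vector_size,
        min_count=1,  # assumption: keep products bought only once (gensim default is 5)
        epochs=epochs,
        workers=1,    # single worker + fixed seed; full reproducibility also needs PYTHONHASHSEED
        seed=42,
        **W2V_PARAMS,
    )
```

After `model = train_item_embeddings(purchase_sequences(rows))`, `model.wv.most_similar(product, topn=5)` lists the products whose vectors lie closest to `product`, which is a common way to turn such embeddings into "similar item" suggestions. With `min_count=1`, products seen only once still get vectors, but those vectors will be noisy.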
app.py CHANGED
@@ -6,15 +6,17 @@ from sklearn.cluster import KMeans
 import matplotlib.pyplot as plt
 import seaborn as sns
 import plotly.express as px
+from mlxtend.frequent_patterns import apriori, association_rules
 
 # Set the page configuration
-st.set_page_config(page_title="Customer Segmentation", layout="wide")
+st.set_page_config(page_title="Customer Segmentation and Product Recommendation", layout="wide")
 
 # Title and Description
-st.title("🛒 Advanced Customer Segmentation App")
+st.title("🛒Customer Segmentation & Product Recommendation App")
 st.markdown("""
-This application allows you to perform **Customer Segmentation** using RFM analysis and clustering.
-Upload your dataset, analyze the metrics, and visualize customer behaviors interactively.
+This application performs **Customer Segmentation** using RFM analysis and clustering,
+and provides **Product Recommendations** based on purchase patterns.
+Upload your dataset, analyze customer behavior, and visualize results interactively.
 """)
 
 # Sidebar for uploading data
@@ -128,6 +130,42 @@ fig_cluster = px.scatter_3d(
 )
 st.plotly_chart(fig_cluster)
 
+# Product Recommendation
+st.header("🛍️ Product Recommendation")
+st.sidebar.subheader("Recommendation Parameters")
+cluster_to_recommend = st.sidebar.selectbox("Select Cluster", rfm["Cluster"].unique())
+
+# Filter data by cluster
+customers_in_cluster = rfm[rfm["Cluster"] == cluster_to_recommend]["CustomerID"]
+df_cluster = df[df["CustomerID"].isin(customers_in_cluster)]
+
+# Association Rule Mining for Recommendations
+basket = (
+    df_cluster.groupby(["InvoiceNo", "Description"])["Quantity"]
+    .sum()
+    .unstack()
+    .fillna(0)
+    .applymap(lambda x: 1 if x > 0 else 0)
+)
+
+frequent_itemsets = apriori(basket, min_support=0.05, use_colnames=True)
+if not frequent_itemsets.empty:
+    rules = association_rules(frequent_itemsets, metric="lift", min_threshold=1)
+
+    # Display top recommendations
+    st.write(f"### Recommendations for Cluster {cluster_to_recommend}")
+    top_recommendations = rules.sort_values(by="confidence", ascending=False).head(10)
+    st.write(top_recommendations[["antecedents", "consequents", "support", "confidence", "lift"]])
+else:
+    st.write("No significant patterns found for this cluster.")
+
+st.write(f"### Recommendations for Cluster {cluster_to_recommend}")
+if not rules.empty:
+    top_recommendations = rules.sort_values(by="confidence", ascending=False).head(10)
+    st.write(top_recommendations[["antecedents", "consequents", "support", "confidence", "lift"]])
+else:
+    st.write("No significant patterns found for this cluster.")
+
 # Export Data
 st.header("📤 Export Processed Data")
 if st.button("Export RFM Data"):
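As committed, the recommendation section renders its result twice. The second block reads `rules`, which is only assigned inside `if not frequent_itemsets.empty:`, so choosing a cluster where `apriori` finds nothing raises a `NameError` right after the "No significant patterns" message. Separately, `DataFrame.applymap` is deprecated as of pandas 2.1 in favor of `DataFrame.map`, and a plain `qty > 0` comparison yields the same basket as a boolean frame. What follows is a hedged alternative, not the app's method: it swaps mlxtend's Apriori for a pure-pandas computation restricted to single-item -> single-item rules, uses the standard definitions of support, confidence, and lift, and computes the rules once so there is a single display path. The column names `InvoiceNo`, `Description`, and `Quantity` come from the committed code; the function names and `RULE_COLUMNS` are hypothetical.

```python
import pandas as pd

RULE_COLUMNS = ["antecedents", "consequents", "support", "confidence", "lift"]


def build_basket(df: pd.DataFrame) -> pd.DataFrame:
    """One row per invoice, one boolean column per product (True = bought)."""
    qty = (
        df.groupby(["InvoiceNo", "Description"])["Quantity"]
        .sum()
        .unstack(fill_value=0)
    )
    return qty > 0  # net returns (total quantity <= 0) count as "not bought"


def pairwise_rules(basket: pd.DataFrame, min_support=0.05, min_lift=1.0) -> pd.DataFrame:
    """Single-item -> single-item rules with support, confidence and lift."""
    n = len(basket)
    item_support = basket.sum() / n
    frequent = item_support[item_support >= min_support].index
    onehot = basket[frequent].astype(int)
    co_counts = onehot.T @ onehot  # co_counts.loc[a, c] = invoices containing both

    rows = []
    for a in frequent:
        for c in frequent:
            if a == c:
                continue
            support = co_counts.loc[a, c] / n
            if support < min_support:
                continue
            confidence = support / item_support[a]  # P(c | a)
            lift = confidence / item_support[c]     # confidence relative to P(c)
            if lift >= min_lift:
                rows.append((a, c, support, confidence, lift))
    return pd.DataFrame(rows, columns=RULE_COLUMNS)


def top_recommendations(rules: pd.DataFrame, k=10) -> pd.DataFrame:
    """Highest-confidence rules first, matching the app's display order."""
    return rules.sort_values(by="confidence", ascending=False).head(k)
```

In the app, something like `rules = pairwise_rules(build_basket(df_cluster))` followed by one `if rules.empty: ... else: ...` branch that writes `top_recommendations(rules)` would replace both committed display blocks. If mlxtend stays, the minimal fix is to delete the duplicated second display block.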
requirements.txt CHANGED
@@ -4,4 +4,6 @@ pandas
 seaborn
 streamlit
 scikit-learn
-plotly
+plotly
+tqdm
+mlxtend