nnaned committed on
Commit acaba38 · 1 parent: 0380da1
Files changed (5)
  1. README.md +63 -12
  2. fighters_20_03_2021.csv +0 -0
  3. logs.log +0 -0
  4. ressources/print.jpg +0 -0
  5. ufc_predictor.py +220 -0
README.md CHANGED
@@ -1,12 +1,63 @@
- ---
- title: Ufcpredict
- emoji: 🏃
- colorFrom: green
- colorTo: indigo
- sdk: streamlit
- sdk_version: 1.10.0
- app_file: app.py
- pinned: false
- ---
-
- Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference
+
+
+ ![](./ressources/print.jpg)
+ ## Acknowledgements
+
+ Thanks to [@WarrierRajeev](https://github.com/WarrierRajeev/UFC-Predictions) for uploading the original dataset scraped from the ufcstats website, along with a very insightful EDA and preprocessing!
+
+ ## Fighter Dataset Content
+
+ The dataset used to predict the winner is derived from the original dataset, preprocessed to extract each fighter's statistics averaged over their fights, using the most recent statistics available for each fighter.
+
+ ## Original Dataset Content
+
+ Each row compiles the stats of both fighters. Fighters are represented by 'red' and 'blue' (for the red and blue corners). For instance, the red fighter's columns hold the compiled average stats of all of that fighter's fights except the current one. The stats include the damage done by the red fighter on the opponent and the damage done by the opponent on the fighter (represented by 'opp' in the column names) in all the fights this red fighter has had, except this one, since it has not occurred yet (in the data). The same information exists for the blue fighter. The target variable is 'Winner', which is the only column that tells you what happened.
+
+ ## Column definitions
+
+ - `R_` and `B_` prefixes denote red- and blue-corner fighter stats, respectively
+ - Columns containing `_opp_` hold the average damage done by the opponent on the fighter
+ - `KD` is the number of knockdowns
+ - `SIG_STR` is the number of significant strikes (landed out of attempted)
+ - `SIG_STR_pct` is the significant strike percentage
+ - `TOTAL_STR` is the number of total strikes (landed out of attempted)
+ - `TD` is the number of takedowns
+ - `TD_pct` is the takedown percentage
+ - `SUB_ATT` is the number of submission attempts
+ - `PASS` is the number of times the guard was passed
+ - `REV` is the number of reversals
+ - `HEAD` is the number of significant strikes to the head (landed out of attempted)
+ - `BODY` is the number of significant strikes to the body (landed out of attempted)
+ - `CLINCH` is the number of significant strikes in the clinch (landed out of attempted)
+ - `GROUND` is the number of significant strikes on the ground (landed out of attempted)
+ - `win_by` is the method of victory
+ - `last_round` is the last round of the fight (e.g. if it was a KO in the 1st round, this will be 1)
+ - `last_round_time` is when the fight ended in the last round
+ - `Format` is the format of the fight (3 rounds, 5 rounds, etc.)
+ - `Referee` is the name of the referee
+ - `date` is the date of the fight
+ - `location` is where the event took place
+ - `Fight_type` is the weight class and whether it is a title bout
+ - `Winner` is the winner of the fight
+ - `Stance` is the fighter's stance (orthodox, southpaw, etc.)
+ - `Height_cms` is the fighter's height in centimeters
+ - `Reach_cms` is the fighter's reach (arm span) in centimeters
+ - `Weight_lbs` is the fighter's weight in pounds (lbs)
+ - `age` is the fighter's age
+ - `title_bout` is a Boolean indicating whether it is a title fight
+ - `weight_class` is the weight class of the fight (Bantamweight, Heavyweight, Women's Flyweight, etc.)
+ - `no_of_rounds` is the number of rounds the fight was scheduled for
+ - `current_lose_streak` is the fighter's current number of consecutive losses
+ - `current_win_streak` is the fighter's current number of consecutive wins
+ - `draw` is the number of draws in the fighter's UFC career
+ - `wins` is the number of wins in the fighter's UFC career
+ - `losses` is the number of losses in the fighter's UFC career
+ - `total_rounds_fought` is the average of total rounds fought by the fighter
+ - `total_time_fought(seconds)` is the total time spent fighting, in seconds
+ - `total_title_bouts` is the total number of title bouts the fighter has taken part in
+ - `win_by_Decision_Majority` is the number of wins by majority judges' decision in the fighter's UFC career
+ - `win_by_Decision_Split` is the number of wins by split judges' decision in the fighter's UFC career
+ - `win_by_Decision_Unanimous` is the number of wins by unanimous judges' decision in the fighter's UFC career
+ - `win_by_KO/TKO` is the number of wins by KO/TKO in the fighter's UFC career
+ - `win_by_Submission` is the number of wins by submission in the fighter's UFC career
+ - `win_by_TKO_Doctor_Stoppage` is the number of wins by doctor stoppage in the fighter's UFC career
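The "Original Dataset Content" section says each red/blue fighter's columns hold averages over all of that fighter's fights except the current one. A minimal pandas sketch of that idea, assuming a hypothetical long-format table (`fighter`, `date`, `KD`) with one row per fighter per fight; it illustrates the "previous fights only" averaging and is not the preprocessing actually used to build the original dataset:

```python
import pandas as pd

# Hypothetical long-format table: one row per fighter per fight (not the real dataset's layout)
fights = pd.DataFrame({
    "fighter": ["A", "A", "A", "B", "B"],
    "date": pd.to_datetime(["2019-01-01", "2019-06-01", "2020-01-01", "2019-03-01", "2019-09-01"]),
    "KD": [1, 0, 2, 0, 3],
})


def prior_fight_average(df: pd.DataFrame, stat: str) -> pd.Series:
    """Average of `stat` over each fighter's earlier fights, excluding the current one."""
    ordered = df.sort_values(["fighter", "date"])
    return ordered.groupby("fighter")[stat].transform(
        # shift(1) drops the current fight, so only fights that already happened are averaged
        lambda s: s.shift(1).expanding().mean()
    )


# Index alignment puts each average back on its original row
fights["avg_KD_before_fight"] = prior_fight_average(fights, "KD")
print(fights)
```

For fighter A this yields NaN, 1.0 and 0.5: a fighter's first fight has no history, so its average is missing rather than leaking the result of the fight being predicted.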
fighters_20_03_2021.csv ADDED
The diff for this file is too large to render.
 
logs.log ADDED
The diff for this file is too large to render.
 
ressources/print.jpg ADDED
ufc_predictor.py ADDED
@@ -0,0 +1,220 @@
+
+ import requests
+ from bs4 import BeautifulSoup
+ import pandas as pd
+ from PIL import Image
+ from io import BytesIO
+ import streamlit as st
+ import numpy as np
+ from pycaret.classification import load_model,predict_model,blend_models
+ import shap
+ import streamlit.components.v1 as components
+ from sklearn.ensemble import VotingClassifier
+
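+ @st.cache
+ def load_data():
+     # Per-fighter averaged statistics (see "Fighter Dataset Content" in the README)
+     return pd.read_csv("fighters_20_03_2021.csv")
+
+ def st_shap(plot, height=None):
+     # Render a SHAP JavaScript plot inside Streamlit by wrapping it in a small HTML page
+     shap_html = f"<head>{shap.getjs()}</head><body>{plot.html()}</body>"
+     components.html(shap_html, height=height)
+
+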
+ def preprocess(dataframe):
+     # Derive the engineered features that the saved model pipeline expects
+     data = dataframe.copy()
+     data['Men_or_women'] = data.weight_class.str.lower().str.contains('women').astype(int)
+     data['R_total_time_fought(mins)']=data['R_total_time_fought(seconds)']/60
+     data['B_total_time_fought(mins)']=data['B_total_time_fought(seconds)']/60
+     data['R_total_fights'] = data['R_wins']+data['R_draw']+data['R_losses']
+     data['B_total_fights'] = data['B_wins']+data['B_draw']+data['B_losses']
+     def home_definer(a,b):
+         # "yes" if the fighter's country is the event's country, "dunno" if either is unknown
+         if a=="unknown" or b=="unknown":
+             return "dunno"
+         if a==b:
+             return "yes"
+         return "no"
+
+     data['R_fighter_home'] = data.apply(lambda x:home_definer(x['R_fighter_country'],x['country_location']),axis=1)
+     data['B_fighter_home'] = data.apply(lambda x:home_definer(x['B_fighter_country'],x['country_location']),axis=1)
+
+     data['fighter_taller_but_not_rangier'] = data.apply(lambda x:(x['R_Height_cms']>x['B_Height_cms'] and x['R_Reach_cms']<=x['B_Reach_cms']),axis=1)
+     # time (the +1 avoids division by zero for fighters with no recorded fights)
+     data['B_avg_time_fought(mins)'] = data['B_total_time_fought(mins)']/(data['B_total_fights']+1)
+     data['R_avg_time_fought(mins)'] = data['R_total_time_fought(mins)']/(data['R_total_fights']+1)
+     # over fights (win and losses); the _exp variants are scaled by exp(total_fights/4), which grows with experience
+     data['R_ratio_win_over_fights_exp']=(data['R_wins']/(data['R_total_fights']+1))*np.exp(data['R_total_fights']/4)
+     data['B_ratio_win_over_fights_exp']=(data['B_wins']/(data['B_total_fights']+1))*np.exp(data['B_total_fights']/4)
+     data['R_ratio_win']=(data['R_wins']/(data['R_total_fights']+1))
+     data['B_ratio_win']=(data['B_wins']/(data['B_total_fights']+1))
+     data['R_ratio_losses']=data['R_losses']/(data['R_total_fights']+1)*np.exp(data['R_total_fights']/4)
+     data['B_ratio_losses']=data['B_losses']/(data['B_total_fights']+1)*np.exp(data['B_total_fights']/4)
+     data['Underdog'] = ((data['R_current_win_streak']>=2) & ~(data['B_current_win_streak']>=2)).astype(int)
+     #data['Underdog_lose'] = (data['R_current_lose_streak']<=data['B_current_lose_streak']).astype(int)
+     numerical_columns = list(data.select_dtypes(include=['int64','float64']).columns.values)
+     print(numerical_columns)
+     win_columns = [col[2:] for col in numerical_columns if ('win' in col.lower() or 'lose' in col.lower() )and col.startswith('B_') and 'ratio' not in col]
+
+     numerical_columns_fighter = [col[2:] for col in numerical_columns if col.startswith('B_')]
+     for col in set(numerical_columns_fighter)-set(win_columns)-{'age'}:
+         data[col+'_diff'] = (data['R_'+col]/(data['R_total_fights']+1))-(data['B_'+col]/(data['B_total_fights']+1))
+         #data[col+'_ratio'] = (data['R_'+col]*data['R_total_fights'])/(data['B_'+col]*data['B_total_fights']+1)
+         numerical_columns.extend([col+'_diff'])#,col+'_ratio'])
+
+     for f in ['R_','B_']:
+         for col in win_columns:
+             data[f+col+'_over_fights'] = data[f+col]/(data[f+'total_fights']+1)
+             numerical_columns.append(f+col+'_over_fights')
+     data['Weight_lbs_diff2'] = data['B_Weight_lbs']-data['R_Weight_lbs']
+
+     data['Weight_lbs_diff2_ratio'] = data['Weight_lbs_diff2']/data[['R_Weight_lbs','B_Weight_lbs']].max(axis=1)
+     diff = np.log(data['R_age']-17)-np.log(data['B_age']-17) # offset by 17 so that log(age - 17) is 0 at age 18
+     data['age_diff2']=diff#*(np.abs(diff)>np.abs(np.log(32/24)))
+     data['age_diff_my_ratio']=(data['age_diff2'])/data[['B_age','R_age']].max(axis=1)
+
+     numerical_columns.remove('R_age')
+     numerical_columns.remove('B_age')
+
+     return data.drop(columns=['R_age','B_age']),numerical_columns
+
+ st.title('UFC FIGHTERS MACHINE LEARNING PREDICTION')
+ st.subheader("Junior N.")
+ all_athletes = ''
+
+ athlete = 'deiveson-figueiredo'
+
+ fighters = []
+
+
+
+ col1, col2 = st.columns(2)
+
+ # Scrape the athlete names and profile links listed on ufc.com
+ content = requests.get("https://www.ufc.com/athletes").content
+ soup = BeautifulSoup(content , features="lxml")
+ #print(soup)
+ athletes = soup.find_all(class_='ath-n__name ath-lf-fl')
+ liste_athletes = [(a.find('a').text.strip(),a.find('a').get('href')) for a in athletes ]
+ liste_athletes = dict(liste_athletes)
+
+ athlete1 = col1.selectbox(
+     'Choose the Red fighter',
+     tuple(liste_athletes.keys()))
+ #col1.write('Fighter 1:', athlete1)
+
+ athlete2 = col2.selectbox(
+     'Choose the Blue fighter',
+     tuple(liste_athletes.keys()))
+ #col2.write('Fighter 2:', athlete2)
+
+ selected_ = [(col1,athlete1,'Red'),(col2,athlete2,'Blue')]
+ for col,athlete,color in selected_:
+     # Fetch each fighter's ufc.com profile page and show the profile image under a colored name
+     #input()
+     content = requests.get(f"https://www.ufc.com{liste_athletes[athlete]}").content
+     soup = BeautifulSoup(content,features="lxml")
+     #print(soup)
+     img = soup.find(class_='hero-profile__image')
+     #print(img)
+     img_url = img.get('src')
+
+     response = requests.get(img_url)
+     img = Image.open(BytesIO(response.content))
+     #fighters.append(img)
+     name = " ".join([e.capitalize() for e in athlete.split(" ")])
+     new_title = f"<p style=\"font-family:sans-serif; color:{color}; font-size: 30px;\">{name}</p>"
+     col.markdown(new_title,unsafe_allow_html=True)
+     col.image(img, width=None, use_column_width=None, clamp=False, channels="RGB", output_format="auto")
+
+
+ fighters_dataset = load_data()
+
+ fighters_dataset = fighters_dataset.set_index('fighter')
+ fighters_dataset.index = [i.lower() for i in fighters_dataset.index]
+
+ # Show the stored stats of both selected fighters
+ st.dataframe(fighters_dataset.loc[[athlete1.lower(),athlete2.lower()]])
+
+ # Build one model input row: athlete1 in the red corner (R_ columns), athlete2 in the blue corner (B_ columns)
+ fighter1_stats = fighters_dataset.loc[[athlete1.lower()]]
+ fighter1_stats.columns = [ 'R_'+col for col in fighter1_stats.columns]
+ fighter2_stats = fighters_dataset.loc[[athlete2.lower()]]
+ fighter2_stats.columns = [ 'B_'+col for col in fighter2_stats.columns]
+ st.text(f'Red fighter : {athlete1}, Blue fighter :{athlete2}')
+
+ # Reset both indexes first; otherwise concat aligns on the fighter names and yields two half-empty rows
+ merged_stats = pd.concat([fighter1_stats.reset_index(drop=True),fighter2_stats.reset_index(drop=True)],axis=1)
+ # The fight context is not chosen in the UI, so fixed placeholder values are used
+ merged_stats['title_bout'] = False
+ merged_stats['weight_class'] = 'Heavyweight'
+ merged_stats['country_location'] = 'usa'
+
+ merged_stats = merged_stats.reset_index(drop=True).loc[[0]]
+
+ #numerical_columns = list(merged_stats.select_dtypes(include=['int64','float64']).columns.values)
+
+
+ data1,numerical_columns = preprocess(merged_stats)
+
+ mylgbm = load_model('mylgbm_normal')
+ #mylgbm2 = load_model('mylgbm_inverse')
+
+ #blender = blend_models([mylgbm2, mylgbm])
+
+ #combinedlgbm = VotingClassifier(estimators=[
+ #    ('normal', mylgbm.named_steps["trained_model"]), ('inverse', mylgbm2.named_steps["trained_model"])], voting='soft')
+
+
+ # Apply every pipeline step except the final model to get the features the model actually sees
+ test_transformed = mylgbm[:-1].transform(data1)
+ #test_transformed2 = mylgbm2[:-1].transform(data)
+
+
+ # The non-finalized model is explained on purpose: the finalized model is also trained on the validation set
+ explainer = shap.TreeExplainer(mylgbm.named_steps["trained_model"])
+ shap_values = explainer.shap_values(test_transformed)
+ # Predicted label and score with the fighters in their selected corners
+ prediction = predict_model(mylgbm,data1)[['Score','Label']]#+descr_columns+['R_total_fights','B_total_fights']]
+ #comparison = pd.concat([valid_['Winner'],prediction],axis=1)
+
+ st.dataframe(prediction)
+ print("expected",explainer.expected_value[1])
+
+ fight_idx = 0
+
+ # Keep the SHAP values of the original corner order before they are recomputed for the swapped order
+ shap_values1 = shap_values
+
+ #st_shap(shap.force_plot(explainer.expected_value[1], shap_values[1][fight_idx,:], test_transformed.loc[fight_idx,:],link='logit'))
+
+
+ # Repeat the prediction with the corners swapped: athlete1 in blue, athlete2 in red
+ fighter1_stats = fighters_dataset.loc[[athlete1.lower()]]
+ fighter1_stats.columns = [ 'B_'+col for col in fighter1_stats.columns]
+ fighter2_stats = fighters_dataset.loc[[athlete2.lower()]]
+ fighter2_stats.columns = [ 'R_'+col for col in fighter2_stats.columns]
+ st.text(f'Red fighter : {athlete2}, Blue fighter :{athlete1}')
+
+ merged_stats = pd.concat([fighter1_stats.reset_index(drop=True),fighter2_stats.reset_index(drop=True)],axis=1)
+ merged_stats['title_bout'] = False
+ merged_stats['weight_class'] = 'Heavyweight'
+ merged_stats['country_location'] = 'usa'
+
+ merged_stats = merged_stats.reset_index(drop=True).loc[[0]]
+
+ data2,numerical_columns = preprocess(merged_stats)
+
+ test_transformed2 = mylgbm[:-1].transform(data2)
+
+
+ shap_values = explainer.shap_values(test_transformed2)
+ # Predicted label and score with the corners swapped
+ prediction = predict_model(mylgbm,data2)[['Score','Label']]#+descr_columns+['R_total_fights','B_total_fights']]
+ #comparison = pd.concat([valid_['Winner'],prediction],axis=1)
+
+ st.dataframe(prediction)
+
+ fight_idx = 0
+ print("expected",explainer.expected_value[1])
+ # Average the class-1 SHAP values of both corner orders and plot them against the original-order features
+ st_shap(shap.force_plot(explainer.expected_value[1], (shap_values[1][fight_idx,:]+shap_values1[1][fight_idx,:])/2, test_transformed.loc[fight_idx,:],link='logit'))
+
+ #shap_values.values=shap_values.values[:,:,1]
+ #shap_values.base_values=shap_values.base_values[:,1]
+ # print(type(shap_values[1]))
+ # print(type(shap_values[1][fight_idx,:]))
+
+ # st_shap(shap.waterfall_plot(shap_values[0]))
+
+ #st_shap(shap.plots._waterfall.waterfall_legacy(explainer.expected_value[1], shap_values[1][fight_idx,:],test_transformed.loc[fight_idx,:]))
+
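The script scores each matchup twice, once with the fighters in their chosen corners and once swapped, presumably because a model trained on red/blue rows can be sensitive to corner order. A minimal pandas sketch of that idea, under stated assumptions: `build_matchup_row`, `symmetric_red_win_probability` and `toy_predict` are hypothetical names, `toy_predict` stands in for the PyCaret pipeline, and averaging *probabilities* is an illustrative variant (the script itself averages SHAP values for the plot):

```python
from typing import Callable

import pandas as pd


def build_matchup_row(fighters: pd.DataFrame, red: str, blue: str) -> pd.DataFrame:
    """One-row frame with R_/B_ prefixed stats; `fighters` is indexed by lowercase name."""
    red_stats = fighters.loc[[red.lower()]].add_prefix("R_").reset_index(drop=True)
    blue_stats = fighters.loc[[blue.lower()]].add_prefix("B_").reset_index(drop=True)
    # reset_index(drop=True) gives both rows index 0, so concat places them side by side
    return pd.concat([red_stats, blue_stats], axis=1)


def symmetric_red_win_probability(
    predict_red_win_proba: Callable[[pd.DataFrame], float],
    fighters: pd.DataFrame,
    red: str,
    blue: str,
) -> float:
    """Average P(red wins) over both corner orders (hypothetical helper, not in ufc_predictor.py)."""
    p_as_given = predict_red_win_proba(build_matchup_row(fighters, red, blue))
    p_swapped = predict_red_win_proba(build_matchup_row(fighters, blue, red))
    # In the swapped order, the original red fighter wins when the blue corner wins
    return (p_as_given + (1.0 - p_swapped)) / 2.0


# Toy data and a toy scoring function standing in for the trained model
fighters = pd.DataFrame({"wins": [10, 4], "losses": [2, 4]}, index=["fighter a", "fighter b"])


def toy_predict(row: pd.DataFrame) -> float:
    # Made-up score: red corner's share of the combined wins, plus a 0.05 red-corner bias
    r, b = row.loc[0, "R_wins"], row.loc[0, "B_wins"]
    return min(1.0, r / (r + b) + 0.05)


print(symmetric_red_win_probability(toy_predict, fighters, "Fighter A", "Fighter B"))
```

With these toy numbers the 0.05 red-corner bias cancels out and the result is 5/7 (about 0.714), which is the point of evaluating both corner orders.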