Commit acaba38, committed by nnaned
1 Parent(s): 0380da1
initial
- README.md +63 -12
- fighters_20_03_2021.csv +0 -0
- logs.log +0 -0
- ressources/print.jpg +0 -0
- ufc_predictor.py +220 -0
README.md
CHANGED
@@ -1,12 +1,63 @@
## Acknowledgements

Thanks to [@WarrierRajeev](https://github.com/WarrierRajeev/UFC-Predictions) for uploading the original dataset, scraped from the ufcstats website, along with very insightful EDA and preprocessing!

## Fighter Dataset Content

The dataset used to predict the winner is obtained by preprocessing the original dataset: for each fighter, it keeps the statistics averaged over their fights, taken from their most recent entry.
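
The extraction described above could look roughly like the following. This is a minimal sketch, not the preprocessing actually used for this Space: the column names `R_fighter`, `B_fighter`, and `date`, the helper `latest_fighter_stats`, and the toy rows are all assumptions made for illustration.

```python
import pandas as pd


def latest_fighter_stats(fights: pd.DataFrame) -> pd.DataFrame:
    """Return one row per fighter with the stats from their most recent fight.

    Hypothetical helper: assumes a `date` column plus `R_`/`B_` prefixed
    columns, including `R_fighter` and `B_fighter` for the names.
    """
    frames = []
    for prefix in ("R_", "B_"):
        cols = [c for c in fights.columns if c.startswith(prefix)]
        side = fights[["date"] + cols].copy()
        # Strip the corner prefix so red and blue rows share column names.
        side.columns = ["date"] + [c[len(prefix):] for c in cols]
        frames.append(side)
    stacked = pd.concat(frames, ignore_index=True)
    # Keep only the most recent row for each fighter.
    latest = stacked.sort_values("date").groupby("fighter").tail(1)
    return latest.drop(columns="date").set_index("fighter").sort_index()


# Toy data, invented for the example.
fights = pd.DataFrame({
    "date": pd.to_datetime(["2020-01-01", "2021-03-20"]),
    "R_fighter": ["Alice", "Cara"],
    "B_fighter": ["Cara", "Alice"],
    "R_wins": [3, 5],
    "B_wins": [4, 4],
})
fighter_table = latest_fighter_stats(fights)
```

Stacking both corners before grouping means a fighter's latest stats are found whether they last fought in the red or the blue corner.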

## Original Dataset Content

Each row is a compilation of both fighters' stats. Fighters are represented by 'red' and 'blue' (for the red and blue corners). For instance, the red fighter has the compiled average stats of all of their fights except the current one. The stats include the damage done by the red fighter to the opponent and the damage done by the opponent to the fighter (represented by 'opp' in the column names) in all the fights this particular red fighter has had, except this one, as it has not occurred yet (in the data). The same information exists for the blue fighter. The target variable is 'Winner', which is the only column that tells you what happened.
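
The "all the fights except the current one" rule can be illustrated with a running average that is read before the current fight is added to it. This is a sketch under assumed inputs (a chronologically ordered list of `(fighter, value)` pairs), and `prefight_averages` is a hypothetical name, not a function from the original dataset code.

```python
from collections import defaultdict


def prefight_averages(fights):
    """For each fight, return the fighter's average over earlier fights only.

    `fights` is assumed to be a chronologically ordered list of
    (fighter, value) pairs, e.g. significant strikes landed per fight.
    """
    totals = defaultdict(float)
    counts = defaultdict(int)
    rows = []
    for fighter, value in fights:
        n = counts[fighter]
        # Read the average BEFORE adding the current fight, so the
        # feature never includes information from the fight itself.
        average = totals[fighter] / n if n else None  # None on a debut
        rows.append((fighter, average))
        totals[fighter] += value
        counts[fighter] += 1
    return rows


history = [("Alice", 40), ("Alice", 60), ("Alice", 20)]
rows = prefight_averages(history)
# rows == [("Alice", None), ("Alice", 40.0), ("Alice", 50.0)]
```

Computing the feature before updating the totals is what keeps the outcome of the current fight out of its own input row.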

## Column definitions

- `R_` and `B_` prefixes signify red and blue corner fighter stats respectively
- Columns containing `_opp_` hold the average damage done by the opponent on the fighter
- `KD` is the number of knockdowns
- `SIG_STR` is the number of significant strikes 'landed of attempted'
- `SIG_STR_pct` is the significant strikes percentage
- `TOTAL_STR` is the total strikes 'landed of attempted'
- `TD` is the number of takedowns
- `TD_pct` is the takedown percentage
- `SUB_ATT` is the number of submission attempts
- `PASS` is the number of times the guard was passed
- `REV` is the number of reversals
- `HEAD` is the number of significant strikes to the head 'landed of attempted'
- `BODY` is the number of significant strikes to the body 'landed of attempted'
- `CLINCH` is the number of significant strikes in the clinch 'landed of attempted'
- `GROUND` is the number of significant strikes on the ground 'landed of attempted'
- `win_by` is the method of win
- `last_round` is the last round of the fight (e.g. if it was a KO in the 1st, then this will be 1)
- `last_round_time` is when the fight ended in the last round
- `Format` is the format of the fight (3 rounds, 5 rounds, etc.)
- `Referee` is the name of the referee
- `date` is the date of the fight
- `location` is the location in which the event took place
- `Fight_type` is the weight class and whether it's a title bout or not
- `Winner` is the winner of the fight
- `Stance` is the stance of the fighter (orthodox, southpaw, etc.)
- `Height_cms` is the height in centimeters
- `Reach_cms` is the reach of the fighter (arm span) in centimeters
- `Weight_lbs` is the weight of the fighter in pounds (lbs)
- `age` is the age of the fighter
- `title_bout` is a Boolean value indicating whether it is a title fight or not
- `weight_class` is the weight class the fight is in (Bantamweight, Heavyweight, Women's Flyweight, etc.)
- `no_of_rounds` is the number of rounds the fight was scheduled for
- `current_lose_streak` is the count of the fighter's current consecutive losses
- `current_win_streak` is the count of the fighter's current consecutive wins
- `draw` is the number of draws in the fighter's UFC career
- `wins` is the number of wins in the fighter's UFC career
- `losses` is the number of losses in the fighter's UFC career
- `total_rounds_fought` is the average of total rounds fought by the fighter
- `total_time_fought(seconds)` is the count of total time spent fighting in seconds
- `total_title_bouts` is the total number of title bouts taken part in by the fighter
- `win_by_Decision_Majority` is the number of wins by majority judges' decision in the fighter's UFC career
- `win_by_Decision_Split` is the number of wins by split judges' decision in the fighter's UFC career
- `win_by_Decision_Unanimous` is the number of wins by unanimous judges' decision in the fighter's UFC career
- `win_by_KO/TKO` is the number of wins by knockout or technical knockout in the fighter's UFC career
- `win_by_Submission` is the number of wins by submission in the fighter's UFC career
- `win_by_TKO_Doctor_Stoppage` is the number of wins by doctor stoppage in the fighter's UFC career
fighters_20_03_2021.csv
ADDED
The diff for this file is too large to render.
logs.log
ADDED
The diff for this file is too large to render.
ressources/print.jpg
ADDED
ufc_predictor.py
ADDED
@@ -0,0 +1,220 @@
import requests
from bs4 import BeautifulSoup
import pandas as pd
from PIL import Image
from io import BytesIO
import streamlit as st
import numpy as np
from pycaret.classification import load_model,predict_model,blend_models
import shap
import streamlit.components.v1 as components
from sklearn.ensemble import VotingClassifier

@st.cache
def load_data():
    # Per-fighter statistics table shipped with this commit.
    return pd.read_csv("fighters_20_03_2021.csv")

def st_shap(plot, height=None):
    # Embed a SHAP JavaScript plot in the Streamlit page.
    shap_html = f"<head>{shap.getjs()}</head><body>{plot.html()}</body>"
    components.html(shap_html, height=height)


def preprocess(dataframe):
    # Build the engineered features expected by the model from one merged red/blue row.
    data = dataframe.copy()
    data['Men_or_women'] = data.weight_class.str.lower().str.contains('women').astype(int)
    data['R_total_time_fought(mins)']=data['R_total_time_fought(seconds)']/60
    data['B_total_time_fought(mins)']=data['B_total_time_fought(seconds)']/60
    data['R_total_fights'] = data['R_wins']+data['R_draw']+data['R_losses']
    data['B_total_fights'] = data['B_wins']+data['B_draw']+data['B_losses']
    def home_definer(a,b):
        # "yes" if the fighter's country matches the event country, "dunno" if either is unknown.
        if a=="unknown" or b=="unknown":
            return "dunno"
        if a==b:
            return "yes"
        return "no"

    data['R_fighter_home'] = data.apply(lambda x:home_definer(x['R_fighter_country'],x['country_location']),axis=1)
    data['B_fighter_home'] = data.apply(lambda x:home_definer(x['B_fighter_country'],x['country_location']),axis=1)

    data['fighter_taller_but_not_rangier'] = data.apply(lambda x:(x['R_Height_cms']>x['B_Height_cms'] and x['R_Reach_cms']<=x['B_Reach_cms']),axis=1)
    # time
    data['B_avg_time_fought(mins)'] = data['B_total_time_fought(mins)']/(data['B_total_fights']+1)
    data['R_avg_time_fought(mins)'] = data['R_total_time_fought(mins)']/(data['R_total_fights']+1)
    # over fights (win and losses)
    data['R_ratio_win_over_fights_exp']=(data['R_wins']/(data['R_total_fights']+1))*np.exp(data['R_total_fights']/4)
    data['B_ratio_win_over_fights_exp']=(data['B_wins']/(data['B_total_fights']+1))*np.exp(data['B_total_fights']/4)
    data['R_ratio_win']=(data['R_wins']/(data['R_total_fights']+1))
    data['B_ratio_win']=(data['B_wins']/(data['B_total_fights']+1))
    data['R_ratio_losses']=data['R_losses']/(data['R_total_fights']+1)*np.exp(data['R_total_fights']/4)
    data['B_ratio_losses']=data['B_losses']/(data['B_total_fights']+1)*np.exp(data['B_total_fights']/4)
    data['Underdog'] = ((data['R_current_win_streak']>=2) & ~(data['B_current_win_streak']>=2)).astype(int)
    #data['Underdog_lose'] = (data['R_current_lose_streak']<=data['B_current_lose_streak']).astype(int)
    numerical_columns = list(data.select_dtypes(include=['int64','float64']).columns.values)
    print(numerical_columns)
    win_columns = [col[2:] for col in numerical_columns if ('win' in col.lower() or 'lose' in col.lower() )and col.startswith('B_') and 'ratio' not in col]

    numerical_columns_fighter = [col[2:] for col in numerical_columns if col.startswith('B_')]
    # Red-minus-blue differences of per-fight averages for every non-win/lose stat.
    for col in set(numerical_columns_fighter)-set(win_columns)-{'age'}:
        data[col+'_diff'] = (data['R_'+col]/(data['R_total_fights']+1))-(data['B_'+col]/(data['B_total_fights']+1))
        #data[col+'_ratio'] = (data['R_'+col]*data['R_total_fights'])/(data['B_'+col]*data['B_total_fights']+1)
        numerical_columns.extend([col+'_diff'])#,col+'_ratio'])

    # Win/lose counters normalised by the number of fights, for each corner.
    for f in ['R_','B_']:
        for col in win_columns:
            data[f+col+'_over_fights'] = data[f+col]/(data[f+'total_fights']+1)
            numerical_columns.append(f+col+'_over_fights')
    data['Weight_lbs_diff2'] = data['B_Weight_lbs']-data['R_Weight_lbs']

    data['Weight_lbs_diff2_ratio'] = data['Weight_lbs_diff2']/data[['R_Weight_lbs','B_Weight_lbs']].max(axis=1)
    diff = np.log(data['R_age']-17)-np.log(data['B_age']-17) #17 because at 18 years old it will be 0
    data['age_diff2']=diff#*(np.abs(diff)>np.abs(np.log(32/24)))
    data['age_diff_my_ratio']=(data['age_diff2'])/data[['B_age','R_age']].max(axis=1)

    numerical_columns.remove('R_age')
    numerical_columns.remove('B_age')

    return data.drop(columns=['R_age','B_age']),numerical_columns

st.title('UFC FIGHTERS MACHINE LEARNING PREDICTION')
st.subheader("Junior N.")
all_athletes = ''

athlete = 'deiveson-figueiredo'

fighters = []



col1, col2 = st.columns(2)

# Scrape the athlete list (name -> profile path) from ufc.com.
content = requests.get("https://www.ufc.com/athletes").content
soup = BeautifulSoup(content , features="lxml")
#print(soup)
athletes = soup.find_all(class_='ath-n__name ath-lf-fl')
liste_athletes = [(a.find('a').text.strip(),a.find('a').get('href')) for a in athletes ]
liste_athletes = dict(liste_athletes)

athlete1 = col1.selectbox(
    'Choose Red fighter?',
    tuple(liste_athletes.keys()))
#col1.write('Fighter 1:', athlete1)

athlete2 = col2.selectbox(
    'Choose Blue fighter?',
    tuple(liste_athletes.keys()))
#col2.write('Fighter 2:', athlete2)

# Show each selected fighter's name and profile picture in its column.
selected_ = [(col1,athlete1,'Red'),(col2,athlete2,'Blue')]
for col,athlete,color in selected_:
    #input()
    content = requests.get(f"https://www.ufc.com{liste_athletes[athlete]}").content
    soup = BeautifulSoup(content,features="lxml")
    #print(soup)
    img = soup.find(class_='hero-profile__image')
    #print(img)
    img_url = img.get('src')

    response = requests.get(img_url)
    img = Image.open(BytesIO(response.content))
    #fighters.append(img)
    name = " ".join([e.capitalize() for e in athlete.split(" ")])
    new_title = f"<p style=\"font-family:sans-serif; color:{color}; font-size: 30px;\">{name}</p>"
    col.markdown(new_title,unsafe_allow_html=True)
    col.image(img, width=None, use_column_width=None, clamp=False, channels="RGB", output_format="auto")


fighters_dataset = load_data()

fighters_dataset = fighters_dataset.set_index('fighter')
fighters_dataset.index = [i.lower() for i in fighters_dataset.index]

#rer
st.dataframe(fighters_dataset.loc[[athlete1.lower(),athlete2.lower()]])

# First pass: athlete1 in the red corner, athlete2 in the blue corner.
fighter1_stats = fighters_dataset.loc[[athlete1.lower()]]
fighter1_stats.columns = [ 'R_'+col for col in fighter1_stats.columns]
fighter2_stats = fighters_dataset.loc[[athlete2.lower()]]
fighter2_stats.columns = [ 'B_'+col for col in fighter2_stats.columns]
st.text(f'Red fighter : {athlete1}, Blue fighter :{athlete2}')

merged_stats = pd.concat([fighter1_stats,fighter2_stats],axis=1)
merged_stats['title_bout'] = False
merged_stats['weight_class'] = 'Heavyweight'
merged_stats['country_location'] = 'usa'

merged_stats = merged_stats.reset_index(drop=True).loc[[0]]

#numerical_columns = list(merged_stats.select_dtypes(include=['int64','float64']).columns.values)


data1,numerical_columns = preprocess(merged_stats)

mylgbm = load_model('mylgbm_normal')
#mylgbm2 = load_model('mylgbm_inverse')

#blender = blend_models([mylgbm2, mylgbm])

#combinedlgbm = VotingClassifier(estimators=[
#        ('normal', mylgbm.named_steps["trained_model"]), ('inverse', mylgbm2.named_steps["trained_model"])], voting='soft')


# Apply every pipeline step except the final estimator.
test_transformed = mylgbm[:-1].transform(data1)
#test_transformed2 = mylgbm2[:-1].transform(data)


explainer = shap.TreeExplainer(mylgbm.named_steps["trained_model"]) #mylgbm.named_steps["trained_model"] not used yet because we don't want finalized model (aka trained on validation)
shap_values = explainer.shap_values(test_transformed)
# Prediction for the selected matchup
prediction = predict_model(mylgbm,data1)[['Score','Label']]#+descr_columns+['R_total_fights','B_total_fights']]
#comparison = pd.concat([valid_['Winner'],prediction],axis=1)

st.dataframe(prediction)
print("expected",explainer.expected_value[1])

fight_idx = 0

shap_values1 = shap_values

#st_shap(shap.force_plot(explainer.expected_value[1], shap_values[1][fight_idx,:], test_transformed.loc[fight_idx,:],link='logit'))


# Second pass: same matchup with the corners swapped (athlete1 blue, athlete2 red).
fighter1_stats = fighters_dataset.loc[[athlete1.lower()]]
fighter1_stats.columns = [ 'B_'+col for col in fighter1_stats.columns]
fighter2_stats = fighters_dataset.loc[[athlete2.lower()]]
fighter2_stats.columns = [ 'R_'+col for col in fighter2_stats.columns]
# The label reflects the swapped corners used for this pass.
st.text(f'Red fighter : {athlete2}, Blue fighter :{athlete1}')

merged_stats = pd.concat([fighter1_stats,fighter2_stats],axis=1)
merged_stats['title_bout'] = False
merged_stats['weight_class'] = 'Heavyweight'
merged_stats['country_location'] = 'usa'

merged_stats = merged_stats.reset_index(drop=True).loc[[0]]

data2,numerical_columns = preprocess(merged_stats)

test_transformed2 = mylgbm[:-1].transform(data2)


shap_values = explainer.shap_values(test_transformed2)
# Prediction for the swapped-corner matchup
prediction = predict_model(mylgbm,data2)[['Score','Label']]#+descr_columns+['R_total_fights','B_total_fights']]
#comparison = pd.concat([valid_['Winner'],prediction],axis=1)

st.dataframe(prediction)

fight_idx = 0
print("expected",explainer.expected_value[1])
# Force plot of the SHAP values averaged over both corner assignments.
st_shap(shap.force_plot(explainer.expected_value[1], (shap_values[1][fight_idx,:]+shap_values1[1][fight_idx,:])/2, test_transformed.loc[fight_idx,:],link='logit'))

#shap_values.values=shap_values.values[:,:,1]
#shap_values.base_values=shap_values.base_values[:,1]
# print(type(shap_values[1]))
# print(type(shap_values[1][fight_idx,:]))

# st_shap(shap.waterfall_plot(shap_values[0]))

#st_shap(shap.plots._waterfall.waterfall_legacy(explainer.expected_value[1], shap_values[1][fight_idx,:],test_transformed.loc[fight_idx,:]))