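Because of the COVID-19 pandemic in 2020, UEFA Euro 2020 was postponed to this year (2021): 24 teams will play 51 matches over 31 days. Held every four years and often ranked alongside the Olympic Games and the World Cup as one of the world's three biggest sporting events, the European Championship draws the attention of football fans around the globe.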
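To mark the 60th anniversary of the European Championship, UEFA decided to stage this edition without a single host nation. The tournament was spread across 13 cities in 12 countries: Copenhagen (Denmark), Brussels (Belgium), Budapest (Hungary), Amsterdam (Netherlands), Dublin (Ireland), Bucharest (Romania), Glasgow (Scotland), Bilbao (Spain), Baku (Azerbaijan), Munich (Germany), Rome (Italy), Saint Petersburg (Russia) and London (England). This was the originally announced line-up. Brussels was dropped in 2017, and Dublin and Bilbao were dropped in April 2021, with Seville added. The semi-finals and the final will all be played at Wembley Stadium in London.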
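The 24 teams are split into six groups of four. The top two teams in each group and the four best third-placed teams advance to the round of 16. Knockout matches then decide the champion.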
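As a quick check on the format, the numbers of qualifiers and matches follow from simple arithmetic. The sketch below makes three assumptions. Each group is a single round robin, where every team plays the other three once. Every knockout round is a single match. There is no third-place play-off. The function name `tournament_counts` is hypothetical.

```python
from math import comb


def tournament_counts(groups: int, teams_per_group: int, best_thirds: int):
    """Return (knockout qualifiers, total matches) for a group stage plus a single-elimination bracket."""
    # Top two from every group plus the best third-placed teams go through
    qualifiers = groups * 2 + best_thirds
    # Single round robin inside each group: C(n, 2) matches per group
    group_matches = groups * comb(teams_per_group, 2)
    # A single-elimination bracket with k teams needs k - 1 matches to find a winner
    knockout_matches = qualifiers - 1
    return qualifiers, group_matches + knockout_matches


print(tournament_counts(6, 4, 4))  # (16, 51)
```

The group stage has 6 × 6 = 36 matches and the bracket of 16 adds 15 more, giving the 51 matches quoted at the start.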
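The groups are as follows:
- Group A: Turkey, Italy, Wales, Switzerland.
- Group B: Denmark, Finland, Belgium, Russia.
- Group C: Netherlands, Ukraine, Austria, North Macedonia.
- Group D: England, Croatia, Czech Republic, Scotland.
- Group E: Spain, Sweden, Poland, Slovakia.
- Group F: Germany, France, Portugal, Hungary.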
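With the tournament about to start, we can use ModelArts to analyze the strength of each team. Combining data mining and machine-learning techniques, we can then make a first prediction of the favorites to win this European Championship.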
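In this prediction, the outcome of a match is judged from four main factors: the head-to-head record, FIFA ranking points, squad market value (from Transfermarkt) and home advantage. First, download the four data files from OBS: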
```python
from modelarts.session import Session

sess = Session()
# Download the four datasets from OBS into the notebook's working directory
for name in ["results.csv", "FIFA RANKINGS.csv", "Market Values.csv", "Schedule.csv"]:
    sess.download_data(bucket_path="modelarts-labs-bj4/2021EurCup/" + name, path="./" + name)
```

```
Successfully download file modelarts-labs-bj4/2021EurCup/results.csv from OBS to local ./results.csv
Successfully download file modelarts-labs-bj4/2021EurCup/FIFA RANKINGS.csv from OBS to local ./FIFA RANKINGS.csv
Successfully download file modelarts-labs-bj4/2021EurCup/Market Values.csv from OBS to local ./Market Values.csv
Successfully download file modelarts-labs-bj4/2021EurCup/Schedule.csv from OBS to local ./Schedule.csv
```
1. Head-to-head record
Load the historical match results and add a column recording the winner of each match.
```python
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline
plt.style.use('ggplot')

df = pd.read_csv('results.csv')

# Add a column with the winning team ('Draw' when the scores are level)
winners = []
for i, row in df.iterrows():
    if row['home_score'] > row['away_score']:
        winners.append(row['home_team'])
    elif row['home_score'] < row['away_score']:
        winners.append(row['away_team'])
    else:
        winners.append('Draw')
df['winner'] = winners
df.tail()
```
| | date | home_team | away_team | home_score | away_score | tournament | city | country | neutral | winner |
|---|---|---|---|---|---|---|---|---|---|---|
| 42100 | 2021-05-25 | Indonesia | Afghanistan | 2 | 3 | Friendly | Dubai | United Arab Emirates | True | Afghanistan |
| 42101 | 2021-05-25 | Oman | Thailand | 1 | 0 | Friendly | Dubai | United Arab Emirates | True | Oman |
| 42102 | 2021-05-27 | Turkey | Azerbaijan | 2 | 1 | Friendly | Alanya | Turkey | False | Turkey |
| 42103 | 2021-05-28 | Bahrain | Malaysia | 2 | 0 | Friendly | Riffa | Bahrain | False | Bahrain |
| 42104 | 2021-05-28 | Italy | San Marino | 7 | 0 | Friendly | Cagliari | Italy | False | Italy |
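The row-by-row loop above is easy to read, but `iterrows` is slow on a table of about 42,000 rows. A vectorized alternative uses `numpy.select` and should produce the same `winner` column. It is sketched here on a small made-up DataFrame that only reuses the column names of `results.csv`. The team names and scores are placeholders, not real results.

```python
import numpy as np
import pandas as pd

# Toy data with the same column names as results.csv (values are made up)
df = pd.DataFrame({
    'home_team': ['A', 'C', 'E'],
    'away_team': ['B', 'D', 'F'],
    'home_score': [2, 0, 1],
    'away_score': [1, 3, 1],
})

# Conditions are checked in order; rows matching neither are draws
conditions = [df['home_score'] > df['away_score'],
              df['home_score'] < df['away_score']]
choices = [df['home_team'].to_numpy(), df['away_team'].to_numpy()]
df['winner'] = np.select(conditions, choices, default='Draw')
print(df['winner'].tolist())  # ['A', 'D', 'Draw']
```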
From the full dataset, keep only the matches played between two of the 24 teams taking part in this tournament.
```python
teams = ["Turkey", "Italy", "Wales", "Switzerland",
         "Denmark", "Belgium", "Finland", "Russia",
         "Netherlands", "Ukraine", "Austria", "North Macedonia",
         "England", "Scotland", "Czech Republic", "Croatia",
         "Spain", "Sweden", "Poland", "Slovakia",
         "France", "Germany", "Hungary", "Portugal"]
df_all = df[(df['home_team'].isin(teams)) & (df['away_team'].isin(teams))]
df_all.tail()
```
| | date | home_team | away_team | home_score | away_score | tournament | city | country | neutral | winner |
|---|---|---|---|---|---|---|---|---|---|---|
| 42067 | 2021-03-30 | Slovakia | Russia | 2 | 1 | FIFA World Cup qualification | Trnava | Slovakia | False | Slovakia |
| 42081 | 2021-03-31 | Switzerland | Finland | 3 | 2 | Friendly | St Gallen | Switzerland | False | Switzerland |
| 42089 | 2021-03-31 | Austria | Denmark | 0 | 4 | FIFA World Cup qualification | Vienna | Austria | False | Denmark |
| 42091 | 2021-03-31 | England | Poland | 2 | 1 | FIFA World Cup qualification | London | England | False | England |
| 42095 | 2021-03-31 | Germany | North Macedonia | 1 | 2 | FIFA World Cup qualification | Duisburg | Germany | False | North Macedonia |
Define a function `vs_AB` that returns the head-to-head record between two teams.
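```python
def vs_AB(teamA, teamB):
    # All past matches between teamA and teamB, whichever side was at home
    df_AB = df_all[(df_all['home_team'].isin([teamA, teamB])) & (df_all['away_team'].isin([teamA, teamB]))]
    result_vs = df_AB.groupby('winner')['winner'].count()
    result_vs.sort_values(ascending=False, inplace=True)
    dict_AB = dict(result_vs)
    # A team that never beat the other gets an explicit 0
    if teamA not in dict_AB:
        dict_AB[teamA] = 0
    if teamB not in dict_AB:
        dict_AB[teamB] = 0
    print('Number of previous meetings:')
    print(df_AB.shape[0])
    print('Results:')
    for key, value in dict_AB.items():
        print(str(key) + ':' + str(value))
    print('\n')
    # Avoid dividing by zero when the two teams have never met
    if df_AB.shape[0] == 0:
        return 0.0, 0.0
    # Share of all meetings won by each team (draws count for neither)
    vs_A = dict_AB[teamA] / df_AB.shape[0]
    vs_B = dict_AB[teamB] / df_AB.shape[0]
    return vs_A, vs_B
```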
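The head-to-head factor is each team's share of past meetings won, and draws count for neither side. The sketch below restates that calculation as a standalone function on a toy list of results. When the two teams have never met, it returns `(0.0, 0.0)` instead of dividing by zero. The name `head_to_head_share` and the data are hypothetical.

```python
import pandas as pd


def head_to_head_share(matches: pd.DataFrame, team_a: str, team_b: str):
    """Fraction of past A-vs-B meetings won by each team; draws count for neither."""
    pair = [team_a, team_b]
    meetings = matches[matches['home_team'].isin(pair) & matches['away_team'].isin(pair)]
    total = len(meetings)
    if total == 0:
        return 0.0, 0.0  # no history: the factor favors neither side
    wins = meetings['winner'].value_counts()
    return float(wins.get(team_a, 0)) / total, float(wins.get(team_b, 0)) / total


# Hypothetical results: A and B met three times (one win each, one draw)
toy = pd.DataFrame({
    'home_team': ['A', 'B', 'A', 'C'],
    'away_team': ['B', 'A', 'B', 'A'],
    'winner':    ['A', 'Draw', 'B', 'C'],
})
print(head_to_head_share(toy, 'A', 'B'))
print(head_to_head_share(toy, 'B', 'C'))  # (0.0, 0.0)
```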
2. FIFA points
Read `FIFA RANKINGS.csv` to get the FIFA ranking and points of all 24 teams.
```python
df_FIFA = pd.read_csv('FIFA RANKINGS.csv')
print(df_FIFA)
```

```
    Rank             Team   Points
0      1          Belgium  1783.38
1      2           France  1757.30
2      4          England  1686.78
3      5         Portugal  1666.12
4      6            Spain  1648.13
5      7            Italy  1642.06
6     10          Denmark  1631.55
7     12          Germany  1609.12
8     13      Switzerland  1606.21
9     14          Croatia  1605.75
10    16      Netherlands  1598.04
11    17            Wales  1570.36
12    18           Sweden  1569.81
13    21           Poland  1549.87
14    23          Austria  1523.42
15    24          Ukraine  1514.64
16    29           Turkey  1505.05
17    36         Slovakia  1475.24
18    37          Hungary  1468.75
19    38           Russia  1462.65
20    40   Czech Republic  1458.81
21    44         Scotland  1441.43
22    54          Finland  1410.82
23    62  North Macedonia  1374.73
```
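Define a function `FIFA_AB` that returns the FIFA points of two teams.
```python
def FIFA_AB(teamA, teamB):
    # Look up each team's FIFA points (.iloc[0] takes the single matching value)
    points_A = float(df_FIFA.loc[df_FIFA['Team'] == teamA, 'Points'].iloc[0])
    points_B = float(df_FIFA.loc[df_FIFA['Team'] == teamB, 'Points'].iloc[0])
    print(teamA + ' FIFA points: ' + str(points_A))
    print(teamB + ' FIFA points: ' + str(points_B))
    print('\n')
    # Each team's share of the two teams' combined points
    FIFA_A = points_A / (points_A + points_B)
    FIFA_B = points_B / (points_A + points_B)
    return FIFA_A, FIFA_B
```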
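Dividing by the combined total turns the two point totals into shares that add up to 1, so the factor only reflects the relative gap between the teams. The small check below uses the Wales and Switzerland points from the table above. The helper name `share` is hypothetical.

```python
def share(a: float, b: float):
    """Split a pair of positive values into shares that sum to 1."""
    total = a + b
    return a / total, b / total


wales, switzerland = share(1570.36, 1606.21)
print(round(wales, 4), round(switzerland, 4))  # 0.4944 0.5056
```

A 36-point gap in FIFA points therefore moves each side only about half a percentage point away from 0.5.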
3. Squad market value (Transfermarkt)
Read `Market Values.csv` to get the market-value ranking of all 24 teams.
```python
df_value = pd.read_csv('Market Values.csv')
print(df_value)
```

```
    Rank             Team  Value_billion_pounds
0      1          England                12.700
1      2           France                10.300
2      3          Germany                 9.365
3      4            Spain                 9.150
4      5         Portugal                 8.725
5      6            Italy                 7.710
6      7          Belgium                 6.694
7      8      Netherlands                 6.070
8      9          Croatia                 3.758
9     10           Turkey                 3.250
10    11          Austria                 3.201
11    12          Denmark                 3.107
12    13      Switzerland                 2.835
13    14         Scotland                 2.699
14    15           Poland                 2.548
15    16           Sweden                 2.151
16    17          Ukraine                 1.972
17    18           Russia                 1.908
18    19   Czech Republic                 1.900
19    20            Wales                 1.768
20    21         Slovakia                 1.311
21    22          Hungary                 0.745
22    23  North Macedonia                 0.618
23    24          Finland                 0.446
```
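Define a function `value_AB` that returns the market values of two teams. The column is named `Value_billion_pounds`, but the figures do not look like billions of pounds: that would put England's squad at £12.7 billion. Because the function only uses each team's share of the combined value, the unit does not affect the result.
```python
def value_AB(teamA, teamB):
    # Look up each team's total market value (.iloc[0] takes the single matching value)
    values_A = float(df_value.loc[df_value['Team'] == teamA, 'Value_billion_pounds'].iloc[0])
    values_B = float(df_value.loc[df_value['Team'] == teamB, 'Value_billion_pounds'].iloc[0])
    print(teamA + ' total squad value: ' + str(values_A) + ' Billion Pounds')
    print(teamB + ' total squad value: ' + str(values_B) + ' Billion Pounds')
    print('\n')
    # Each team's share of the two teams' combined value
    value_A = values_A / (values_A + values_B)
    value_B = values_B / (values_A + values_B)
    return value_A, value_B
```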
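`value_AB` only uses each team's share of the combined value. Multiplying both values by the same factor, for example when converting units, therefore leaves the result unchanged. The check below uses the Wales and Switzerland figures from the table above. `share` is a hypothetical helper that repeats the normalization in `value_AB`, and the factor `1e8` is an arbitrary example of a unit change.

```python
import math


def share(a: float, b: float):
    total = a + b
    return a / total, b / total


as_listed = share(1.768, 2.835)
rescaled = share(1.768 * 1e8, 2.835 * 1e8)  # same figures expressed in a different unit
print(as_listed)
print(all(math.isclose(x, y) for x, y in zip(as_listed, rescaled)))  # True
```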
4. Home advantage
Read `Schedule.csv` to get the full match schedule.
```python
df_schedule = pd.read_csv('Schedule.csv')
df_schedule.head()
```

| | Match Number | Round Number | Date | Location | Home Field | Team1 | Team2 | Group |
|---|---|---|---|---|---|---|---|---|
| 0 | 1 | 1 | 11/06/2021 20:00 | Olimpico in Rome | Italy | Turkey | Italy | Group A |
| 1 | 2 | 1 | 12/06/2021 14:00 | Baki Olimpiya Stadionu | The Republic of Azerbaijan | Wales | Switzerland | Group A |
| 2 | 3 | 1 | 12/06/2021 17:00 | Parken | Denmark | Denmark | Finland | Group B |
| 3 | 4 | 1 | 12/06/2021 20:00 | Saint Petersburg Stadium | Russia | Belgium | Russia | Group B |
| 4 | 5 | 1 | 13/06/2021 14:00 | Wembley Stadium | England | England | Croatia | Group D |
Define a function `home_field_AB` that determines whether either team has home advantage.
```python
def home_field_AB(teamA, teamB):
    # Find the group-stage fixture between the two teams (in either order)
    df_schedule_AB = df_schedule[(df_schedule['Team1'].isin([teamA, teamB])) & (df_schedule['Team2'].isin([teamA, teamB]))]
    if len(df_schedule_AB) == 0:
        print('This match is not in the group-stage schedule.\n')
        return None
    # The country whose venue hosts the match
    home_field = df_schedule_AB['Home Field'].iloc[0]
    if teamA == home_field:
        home_field_A, home_field_B = 1, 0
        print(teamA + ' has home advantage.')
    elif teamB == home_field:
        home_field_A, home_field_B = 0, 1
        print(teamB + ' has home advantage.')
    else:
        home_field_A, home_field_B = 0, 0
        print('Neither team has home advantage.')
    print('\n')
    return home_field_A, home_field_B
```
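`home_field_AB` matches both teams against `Team1` and `Team2` with `isin`, so the lookup works whichever order the teams are passed in. The sketch below shows the same idea on a two-row toy schedule copied from the first rows of the table above. The helper name `home_advantage` is hypothetical.

```python
import pandas as pd

schedule = pd.DataFrame({
    'Home Field': ['Italy', 'The Republic of Azerbaijan'],
    'Team1': ['Turkey', 'Wales'],
    'Team2': ['Italy', 'Switzerland'],
})


def home_advantage(schedule: pd.DataFrame, team_a: str, team_b: str):
    """Return (1, 0), (0, 1) or (0, 0) for the fixture, or None if it is not scheduled."""
    pair = [team_a, team_b]
    fixture = schedule[schedule['Team1'].isin(pair) & schedule['Team2'].isin(pair)]
    if fixture.empty:
        return None
    host = fixture['Home Field'].iloc[0]
    return int(host == team_a), int(host == team_b)


print(home_advantage(schedule, 'Italy', 'Turkey'))       # (1, 0)
print(home_advantage(schedule, 'Wales', 'Switzerland'))  # (0, 0)
print(home_advantage(schedule, 'Spain', 'Sweden'))       # None
```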
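The four factors (head-to-head record, FIFA points, Transfermarkt market value and home advantage) are now implemented as four functions: `vs_AB`, `FIFA_AB`, `value_AB` and `home_field_AB`.
5. Define the match prediction model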
```python
def score(teamA, teamB, weight_vs, weight_FIFA, weight_value, weight_home_field):
    vs_A, vs_B = vs_AB(teamA, teamB)
    FIFA_A, FIFA_B = FIFA_AB(teamA, teamB)
    value_A, value_B = value_AB(teamA, teamB)
    home_field_ = home_field_AB(teamA, teamB)
    # home_field_AB returns None for matches that are not in the group-stage schedule
    if home_field_ is None:
        print('If these two teams were to meet:\n')
        home_field_A, home_field_B = 0, 0
    else:
        home_field_A, home_field_B = home_field_
    # Weighted sum of the four factors for each team
    score_A = vs_A * weight_vs + FIFA_A * weight_FIFA + value_A * weight_value + home_field_A * weight_home_field
    score_B = vs_B * weight_vs + FIFA_B * weight_FIFA + value_B * weight_value + home_field_B * weight_home_field
    if score_A > score_B:
        print(teamA + ' is more likely to win.')
    elif score_A < score_B:
        print(teamB + ' is more likely to win.')
    else:
        print(teamA + ' and ' + teamB + ' will most likely draw.')
    return
```
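The model judges a match from the head-to-head record, FIFA points, Transfermarkt market value and home advantage. The weights can be adjusted to suit the match in question, which may make the prediction more accurate.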
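Written out, the score that `score` computes for team A is the weighted sum below; team B's score is symmetric. Here $h_A$ is the number of head-to-head meetings A has won, $n$ is the total number of meetings, $p$ is FIFA points, $v$ is market value, and $d_A \in \{0, 1\}$ is the home-advantage flag. The weights $w_{\text{vs}}, w_{\text{FIFA}}, w_{\text{value}}, w_{\text{home}}$ correspond to `weight_vs`, `weight_FIFA`, `weight_value` and `weight_home_field`.

$$
S_A = w_{\text{vs}}\,\frac{h_A}{n} + w_{\text{FIFA}}\,\frac{p_A}{p_A + p_B} + w_{\text{value}}\,\frac{v_A}{v_A + v_B} + w_{\text{home}}\,d_A
$$

The team with the larger $S$ is reported as more likely to win, and equal scores are read as a likely draw.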
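```python
# The weights should add up to 1. If home advantage is considered decisive for this match,
# increase the fourth argument, which may make the prediction more accurate.
score('Wales', 'Switzerland', 0.3, 0.3, 0.3, 0.1)
```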
```
Number of previous meetings:
7
Results:
Switzerland:5
Wales:2
Wales FIFA points: 1570.36
Switzerland FIFA points: 1606.21
Wales total squad value: 1.768 Billion Pounds
Switzerland total squad value: 2.835 Billion Pounds
Neither team has home advantage.
Switzerland is more likely to win.
```
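To see how far apart the two teams actually are, the sketch below recomputes both scores by hand from the figures printed above, using the same weights (0.3, 0.3, 0.3, 0.1). It only repeats the arithmetic of `score`. The helper name `weighted` is introduced here for illustration.

```python
weights = (0.3, 0.3, 0.3, 0.1)  # head-to-head, FIFA, market value, home advantage


def weighted(factors):
    """Weighted sum of the four factor values, in the same order as `weights`."""
    return sum(w * f for w, f in zip(weights, factors))


# Factors for Wales and Switzerland, taken from the printed output above
wales = (2 / 7, 1570.36 / (1570.36 + 1606.21), 1.768 / (1.768 + 2.835), 0)
switzerland = (5 / 7, 1606.21 / (1570.36 + 1606.21), 2.835 / (1.768 + 2.835), 0)

score_wales, score_switzerland = weighted(wales), weighted(switzerland)
print(round(score_wales, 3), round(score_switzerland, 3))  # 0.349 0.551
```

Neither team has home advantage, so the two scores add up to 0.9. Most of Switzerland's margin comes from the head-to-head record (5 wins out of 7).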
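6. Final prediction based on the model's analysis: Switzerland will not lose!
- Wales were the surprise package of the last European Championship, where they reached the semi-finals, so no team will take them lightly this time. The performances of stars such as Bale and Ramsey will be key.
- Switzerland have long been regarded as a side with fighting spirit. They switch quickly between attack and defense and keep a layered defensive structure even while attacking. Their coach also requires players to track back, so Switzerland rarely suffer heavy defeats. Taken together with the analysis above, we expect Switzerland not to lose.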
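We will continue to use AI algorithms and data analysis to predict the results of Euro 2020 matches, for example by:
- Adding more data, such as individual player market values, injuries, weather and FIFA ranking data.
- Using machine-learning algorithms such as linear regression, logistic regression, decision trees and random forests.
- Doing feature engineering: weighting the different data sources and feeding them into the training of an AI model.
- Readers are welcome to experiment freely with any AI method, analyze the data and predict each match result and the eventual European champion.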
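As a starting point for the logistic-regression idea above, the sketch below learns the factor weights from data instead of setting them by hand. It uses plain NumPy batch gradient descent on the mean log-loss. In practice each row would hold per-match factor differences, such as team A's FIFA share minus team B's, with the label 1 when team A won. Here the features and labels are randomly generated stand-ins, not real match results, and the function names are hypothetical.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def fit_logistic(X, y, lr=0.5, steps=2000):
    """Fit weights and bias by batch gradient descent on the mean log-loss."""
    w = np.zeros(X.shape[1])
    b = 0.0
    for _ in range(steps):
        p = sigmoid(X @ w + b)           # predicted P(team A wins)
        grad_w = X.T @ (p - y) / len(y)  # gradient of the mean log-loss with respect to w
        grad_b = np.mean(p - y)
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b


# Synthetic stand-in for per-match factor differences (head-to-head, FIFA, value)
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
y = (X @ np.array([1.5, 1.0, 2.0]) > 0).astype(float)  # 1 = team A wins

w, b = fit_logistic(X, y)
probs = sigmoid(X @ w + b)
accuracy = np.mean((probs > 0.5) == (y == 1))
print('learned weights:', np.round(w, 2), 'training accuracy:', accuracy)
```

Unlike the hand-set weights in `score`, the learned weights output probabilities directly. With real data, they would need to be checked on matches held out from training.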