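Because of the 2020 pandemic, the European Championship originally scheduled for 2020 was postponed to this year, 2021. Over 31 days, 24 teams will play 51 matches. As one of the world's three top sporting events alongside the Olympic Games and the World Cup, the European Championship, held every four years, draws the attention of football fans worldwide.
To mark the 60th anniversary of the European Championship, UEFA is staging this edition as a touring tournament with no single host nation, spread over 13 cities in 12 countries: Copenhagen (Denmark), Brussels (Belgium), Budapest (Hungary), Amsterdam (Netherlands), Dublin (Ireland), Bucharest (Romania), Glasgow (Scotland), Bilbao (Spain), Baku (Azerbaijan), Munich (Germany), Rome (Italy), Saint Petersburg (Russia), and London (England). The semi-finals and the final will all be played at Wembley Stadium in London.
The 24 participating teams are divided into 6 groups. The top two teams of each group and the 4 best third-placed teams advance to the round of 16, followed by knockout rounds until the champion is decided.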
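The groups are as follows:
- Group A: Turkey, Italy, Wales, Switzerland.
- Group B: Denmark, Finland, Belgium, Russia.
- Group C: Netherlands, Ukraine, Austria, North Macedonia.
- Group D: England, Croatia, Czech Republic, Scotland.
- Group E: Spain, Sweden, Poland, Slovakia.
- Group F: Germany, France, Portugal, Hungary.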
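The format described above fixes the size of the tournament, and a few lines of arithmetic reproduce the 51-match total. This is an illustrative sketch only: the `groups` dictionary and the variable names are not part of the original notebook, and it assumes each group is played as a single round robin.

```python
from math import comb

# Groups as listed above (team names follow the spelling used in the datasets).
groups = {
    "A": ["Turkey", "Italy", "Wales", "Switzerland"],
    "B": ["Denmark", "Finland", "Belgium", "Russia"],
    "C": ["Netherlands", "Ukraine", "Austria", "North Macedonia"],
    "D": ["England", "Croatia", "Czech Republic", "Scotland"],
    "E": ["Spain", "Sweden", "Poland", "Slovakia"],
    "F": ["Germany", "France", "Portugal", "Hungary"],
}

teams_total = sum(len(members) for members in groups.values())

# Single round robin: every pair of teams within a group meets once.
group_matches = sum(comb(len(members), 2) for members in groups.values())

# Top two of each group plus the four best third-placed teams advance.
qualifiers = 2 * len(groups) + 4

# Single elimination: each match removes one team until one champion remains.
knockout_matches = qualifiers - 1

total_matches = group_matches + knockout_matches
print(teams_total, group_matches, qualifiers, knockout_matches, total_matches)
```

The 36 group matches plus 15 knockout matches give the 51 matches mentioned in the introduction.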
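As the tournament kicks off, we can try using ModelArts to analyze the strength of each participating team and, with data mining and machine learning techniques, make a first prediction of the favorites to win this European Championship.
In this prediction we combine four main factors to judge the outcome of a match: the head-to-head record, FIFA ranking points, Transfermarkt squad market value, and home advantage.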
from modelarts.session import Session
sess = Session()
bucket_path="modelarts-labs-bj4/2021EurCup/results.csv"
sess.download_data(bucket_path=bucket_path, path="./results.csv")
bucket_path="modelarts-labs-bj4/2021EurCup/FIFA RANKINGS.csv"
sess.download_data(bucket_path=bucket_path, path="./FIFA RANKINGS.csv")
bucket_path="modelarts-labs-bj4/2021EurCup/Market Values.csv"
sess.download_data(bucket_path=bucket_path, path="./Market Values.csv")
bucket_path="modelarts-labs-bj4/2021EurCup/Schedule.csv"
sess.download_data(bucket_path=bucket_path, path="./Schedule.csv")
Successfully download file modelarts-labs-bj4/2021EurCup/results.csv from OBS to local ./results.csv
Successfully download file modelarts-labs-bj4/2021EurCup/FIFA RANKINGS.csv from OBS to local ./FIFA RANKINGS.csv
Successfully download file modelarts-labs-bj4/2021EurCup/Market Values.csv from OBS to local ./Market Values.csv
Successfully download file modelarts-labs-bj4/2021EurCup/Schedule.csv from OBS to local ./Schedule.csv
1. Head-to-head record
Load the historical match data and add a column with the winner of each match.
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline
plt.style.use('ggplot')
df = pd.read_csv('results.csv')
# Add a column holding the winning team ('Draw' when the scores are level)
winners = []
for i, row in df.iterrows():
    if row['home_score'] > row['away_score']:
        winners.append(row['home_team'])
    elif row['home_score'] < row['away_score']:
        winners.append(row['away_team'])
    else:
        winners.append('Draw')
df['winner'] = winners
df.tail()
| | date | home_team | away_team | home_score | away_score | tournament | city | country | neutral | winner |
|---|---|---|---|---|---|---|---|---|---|---|
| 42100 | 2021-05-25 | Indonesia | Afghanistan | 2 | 3 | Friendly | Dubai | United Arab Emirates | True | Afghanistan |
| 42101 | 2021-05-25 | Oman | Thailand | 1 | 0 | Friendly | Dubai | United Arab Emirates | True | Oman |
| 42102 | 2021-05-27 | Turkey | Azerbaijan | 2 | 1 | Friendly | Alanya | Turkey | False | Turkey |
| 42103 | 2021-05-28 | Bahrain | Malaysia | 2 | 0 | Friendly | Riffa | Bahrain | False | Bahrain |
| 42104 | 2021-05-28 | Italy | San Marino | 7 | 0 | Friendly | Cagliari | Italy | False | Italy |
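With more than 42,000 rows, the row-by-row loop can be slow. One possible alternative is to compute the same column in a vectorized way with `numpy.select`. The helper name `add_winner` and the small sample frame are assumptions for illustration; the first two rows are taken from the table above and the third is a made-up draw.

```python
import numpy as np
import pandas as pd

def add_winner(matches: pd.DataFrame) -> pd.DataFrame:
    """Return a copy of `matches` with a 'winner' column ('Draw' on level scores)."""
    conditions = [
        matches["home_score"] > matches["away_score"],
        matches["home_score"] < matches["away_score"],
    ]
    choices = [matches["home_team"], matches["away_team"]]
    out = matches.copy()
    # np.select picks, per row, the choice of the first condition that is True.
    out["winner"] = np.select(conditions, choices, default="Draw")
    return out

sample = pd.DataFrame({
    "home_team": ["Indonesia", "Turkey", "Team X"],   # "Team X"/"Team Y" are hypothetical
    "away_team": ["Afghanistan", "Azerbaijan", "Team Y"],
    "home_score": [2, 2, 1],
    "away_score": [3, 1, 1],
})
sample_with_winner = add_winner(sample)
print(sample_with_winner["winner"].tolist())
```

The result should match the loop above row for row, so either version can feed the later steps.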
From the full dataset, extract the matches played between the 24 teams taking part in this tournament.
teams = ["Turkey", "Italy", "Wales", "Switzerland",
"Denmark", "Belgium", "Finland", "Russia",
"Netherlands", "Ukraine", "Austria", "North Macedonia",
"England", "Scotland", "Czech Republic", "Croatia",
"Spain", "Sweden", "Poland", "Slovakia",
"France", "Germany", "Hungary", "Portugal"]
df_all = df[(df['home_team'].isin(teams))&(df['away_team'].isin(teams))]
df_all.tail()
| | date | home_team | away_team | home_score | away_score | tournament | city | country | neutral | winner |
|---|---|---|---|---|---|---|---|---|---|---|
| 42067 | 2021-03-30 | Slovakia | Russia | 2 | 1 | FIFA World Cup qualification | Trnava | Slovakia | False | Slovakia |
| 42081 | 2021-03-31 | Switzerland | Finland | 3 | 2 | Friendly | St Gallen | Switzerland | False | Switzerland |
| 42089 | 2021-03-31 | Austria | Denmark | 0 | 4 | FIFA World Cup qualification | Vienna | Austria | False | Denmark |
| 42091 | 2021-03-31 | England | Poland | 2 | 1 | FIFA World Cup qualification | London | England | False | England |
| 42095 | 2021-03-31 | Germany | North Macedonia | 1 | 2 | FIFA World Cup qualification | Duisburg | Germany | False | North Macedonia |
Define the function vs_AB, which returns the historical head-to-head record of two teams.
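def vs_AB(teamA, teamB):
    # All past matches between the two teams, regardless of who was at home.
    df_AB = df_all[(df_all['home_team'].isin([teamA,teamB]))&(df_all['away_team'].isin([teamA,teamB]))]
    total = df_AB.shape[0]
    # Count results per winner (either team, or 'Draw'), most frequent first.
    result_vs = df_AB.groupby('winner')['winner'].count()
    result_vs.sort_values(ascending=False, inplace=True)
    dict_AB = dict(result_vs)
    if teamA not in dict_AB:
        dict_AB[teamA] = 0
    if teamB not in dict_AB:
        dict_AB[teamB] = 0
    print('Total head-to-head meetings:')
    print(total)
    print('Results:')
    for key,value in dict_AB.items():
        print(str(key)+': '+str(value))
    print('\n')
    if total == 0:
        # The teams have never met: avoid dividing by zero, no signal either way.
        return 0, 0
    # Share of all meetings won by each team (draws count toward neither).
    vs_A = dict_AB[teamA] / total
    vs_B = dict_AB[teamB] / total
    return vs_A, vs_B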
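To make the head-to-head factor easier to check by hand, here is a pandas-free sketch of the same calculation. The function name `head_to_head_shares` is hypothetical, and the input is simply a list of winners; the counts used are the Denmark vs Finland figures reported in the sample run later in this article.

```python
from collections import Counter

def head_to_head_shares(winners, team_a, team_b):
    """Share of all meetings won by each team; draws stay in the denominator."""
    total = len(winners)
    if total == 0:
        # No previous meetings: return a neutral signal instead of dividing by zero.
        return 0.0, 0.0
    counts = Counter(winners)
    return counts[team_a] / total, counts[team_b] / total

# 57 meetings: 38 Denmark wins, 10 Finland wins, 9 draws.
winners = ["Denmark"] * 38 + ["Finland"] * 10 + ["Draw"] * 9
share_denmark, share_finland = head_to_head_shares(winners, "Denmark", "Finland")
print(round(share_denmark, 3), round(share_finland, 3))
```

Because draws count toward neither team, the two shares sum to less than 1 whenever the teams have drawn at least once.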
2. FIFA ranking points
Read FIFA RANKINGS.csv to get the FIFA rank and points of all 24 participating teams.
df_FIFA = pd.read_csv('FIFA RANKINGS.csv')
print(df_FIFA)
Rank Team Points
0 1 Belgium 1783.38
1 2 France 1757.30
2 4 England 1686.78
3 5 Portugal 1666.12
4 6 Spain 1648.13
5 7 Italy 1642.06
6 10 Denmark 1631.55
7 12 Germany 1609.12
8 13 Switzerland 1606.21
9 14 Croatia 1605.75
10 16 Netherlands 1598.04
11 17 Wales 1570.36
12 18 Sweden 1569.81
13 21 Poland 1549.87
14 23 Austria 1523.42
15 24 Ukraine 1514.64
16 29 Turkey 1505.05
17 36 Slovakia 1475.24
18 37 Hungary 1468.75
19 38 Russia 1462.65
20 40 Czech Republic 1458.81
21 44 Scotland 1441.43
22 54 Finland 1410.82
23 62 North Macedonia 1374.73
Define the function FIFA_AB, which returns each team's share of the two teams' combined FIFA points.
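def FIFA_AB(teamA, teamB):
    # Look up each team's FIFA points; .iloc[0] extracts the single match as a scalar.
    points_A = float(df_FIFA.loc[df_FIFA['Team']==teamA, 'Points'].iloc[0])
    points_B = float(df_FIFA.loc[df_FIFA['Team']==teamB, 'Points'].iloc[0])
    print(teamA + "'s FIFA points: " + str(points_A))
    print(teamB + "'s FIFA points: " + str(points_B))
    print('\n')
    # Normalize so that the two values sum to 1.
    FIFA_A = points_A / (points_A + points_B)
    FIFA_B = points_B / (points_A + points_B)
    return FIFA_A, FIFA_B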
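The normalization used here is a plain share of the combined total. The sketch below applies it to figures from the ranking table above; the helper name `share` is an assumption for illustration.

```python
def share(a, b):
    """Split 1.0 between two non-negative quantities in proportion to their size."""
    total = a + b
    if total == 0:
        raise ValueError("both quantities are zero, the share is undefined")
    return a / total, b / total

# Points taken from the FIFA ranking table above.
denmark, finland = share(1631.55, 1410.82)
belgium, north_macedonia = share(1783.38, 1374.73)  # highest vs lowest ranked entrant

print(round(denmark, 4), round(finland, 4))
print(round(belgium, 4), round(north_macedonia, 4))
```

Denmark's share against Finland comes out at roughly 0.54, and even the top-ranked team against the lowest-ranked one stays below 0.57. All 24 teams sit in a narrow band of points, so this factor moves the final score only slightly compared with factors that range over the whole interval from 0 to 1.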
3. Transfermarkt squad market value
Read Market Values.csv to get the squad value ranking of all 24 participating teams.
df_value = pd.read_csv('Market Values.csv')
print(df_value)
Rank Team Value_billion_pounds
0 1 England 12.700
1 2 France 10.300
2 3 Germany 9.365
3 4 Spain 9.150
4 5 Portugal 8.725
5 6 Italy 7.710
6 7 Belgium 6.694
7 8 Netherlands 6.070
8 9 Croatia 3.758
9 10 Turkey 3.250
10 11 Austria 3.201
11 12 Denmark 3.107
12 13 Switzerland 2.835
13 14 Scotland 2.699
14 15 Poland 2.548
15 16 Sweden 2.151
16 17 Ukraine 1.972
17 18 Russia 1.908
18 19 Czech Republic 1.900
19 20 Wales 1.768
20 21 Slovakia 1.311
21 22 Hungary 0.745
22 23 North Macedonia 0.618
23 24 Finland 0.446
Define the function value_AB, which returns each team's share of the two teams' combined squad value.
def value_AB(teamA, teamB):
    # Look up each squad's market value; .iloc[0] extracts the single match as a scalar.
    values_A = float(df_value.loc[df_value['Team']==teamA, 'Value_billion_pounds'].iloc[0])
    values_B = float(df_value.loc[df_value['Team']==teamB, 'Value_billion_pounds'].iloc[0])
    print(teamA + ' squad value: ' + str(values_A) + ' Billion Pounds')
    print(teamB + ' squad value: ' + str(values_B) + ' Billion Pounds')
    print('\n')
    # Normalize so that the two values sum to 1.
    value_A = values_A / (values_A + values_B)
    value_B = values_B / (values_A + values_B)
    return value_A, value_B
4. Home advantage
Read Schedule.csv to get the full match schedule.
df_schedule = pd.read_csv('Schedule.csv')
df_schedule.head()
| | Match Number | Round Number | Date | Location | Home Field | Team1 | Team2 | Group |
|---|---|---|---|---|---|---|---|---|
| 0 | 1 | 1 | 11/06/2021 20:00 | Olimpico in Rome | Italy | Turkey | Italy | Group A |
| 1 | 2 | 1 | 12/06/2021 14:00 | Baki Olimpiya Stadionu | The Republic of Azerbaijan | Wales | Switzerland | Group A |
| 2 | 3 | 1 | 12/06/2021 17:00 | Parken | Denmark | Denmark | Finland | Group B |
| 3 | 4 | 1 | 12/06/2021 20:00 | Saint Petersburg Stadium | Russia | Belgium | Russia | Group B |
| 4 | 5 | 1 | 13/06/2021 14:00 | Wembley Stadium | England | England | Croatia | Group D |
Define the function home_field_AB, which reports whether either team has home advantage.
def home_field_AB(teamA, teamB):
    # Find the scheduled match between the two teams, in either order.
    df_schedule_AB = df_schedule[(df_schedule['Team1'].isin([teamA,teamB]))&(df_schedule['Team2'].isin([teamA,teamB]))]
    if len(df_schedule_AB) == 0:
        print('This match is not in the group-stage schedule.\n')
        return
    # Country hosting the match; it may be neither of the two teams.
    home_field = df_schedule_AB['Home Field'].tolist()[0]
    if teamA==home_field:
        home_field_A, home_field_B = 1, 0
        print(teamA + ' has home advantage.')
    elif teamB==home_field:
        home_field_A, home_field_B = 0, 1
        print(teamB + ' has home advantage.')
    else:
        home_field_A, home_field_B = 0, 0
        print('Neither team has home advantage.')
    print('\n')
    return home_field_A, home_field_B
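The function has three distinct outcomes: one team hosts, neither team hosts, or the pairing is not scheduled at all. The pandas-free sketch below restates that contract so it can be checked against the schedule rows shown above. The name `home_advantage` and the list-of-dicts input are assumptions for illustration, not part of the original notebook.

```python
def home_advantage(schedule, team_a, team_b):
    """Return (flag_a, flag_b), or None when the pairing is not in the schedule."""
    for row in schedule:
        if {row["Team1"], row["Team2"]} == {team_a, team_b}:
            host = row["Home Field"]
            # 1 for the team whose country hosts the match, otherwise 0.
            return int(host == team_a), int(host == team_b)
    return None

# First three rows of the schedule table above, reduced to the relevant columns.
schedule = [
    {"Home Field": "Italy", "Team1": "Turkey", "Team2": "Italy"},
    {"Home Field": "The Republic of Azerbaijan", "Team1": "Wales", "Team2": "Switzerland"},
    {"Home Field": "Denmark", "Team1": "Denmark", "Team2": "Finland"},
]

print(home_advantage(schedule, "Italy", "Turkey"))       # Italy hosts
print(home_advantage(schedule, "Wales", "Switzerland"))  # neutral venue in Baku
print(home_advantage(schedule, "Denmark", "Spain"))      # not scheduled
```

A neutral venue yields `(0, 0)` while an unscheduled pairing yields `None`, so callers should compare against `None` explicitly rather than rely on truthiness.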
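The four factors (head-to-head record, FIFA ranking points, squad market value, and home advantage) are now each wrapped in a function: vs_AB, FIFA_AB, value_AB, and home_field_AB.
5. Define the match outcome prediction model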
def score(teamA, teamB, weight_vs, weight_FIFA, weight_value, weight_home_field):
    # Collect the four factors for both teams.
    vs_A, vs_B = vs_AB(teamA, teamB)
    FIFA_A, FIFA_B = FIFA_AB(teamA, teamB)
    value_A, value_B = value_AB(teamA, teamB)
    home_field_ = home_field_AB(teamA, teamB)
    if home_field_ is None:
        # Pairing is not in the group-stage schedule: treat the venue as neutral.
        print('If the match were played:\n')
        home_field_A, home_field_B = 0, 0
    else:
        home_field_A, home_field_B = home_field_
    # Weighted sum of the four factors for each team.
    score_A = vs_A * weight_vs + FIFA_A * weight_FIFA + value_A * weight_value + home_field_A * weight_home_field
    score_B = vs_B * weight_vs + FIFA_B * weight_FIFA + value_B * weight_value + home_field_B * weight_home_field
    if score_A > score_B:
        print(teamA + ' is more likely to win.')
    elif score_A < score_B:
        print(teamB + ' is more likely to win.')
    else:
        print(teamA + ' and ' + teamB + ' are likely to draw.')
    return
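The model judges the outcome from the head-to-head record, FIFA ranking points, squad market value, and home advantage. The weights are supplied by the caller, and adjusting them to the circumstances of a match may improve the prediction.
# The weights should sum to 1. If home advantage seems decisive for a match, raise the 4th argument (weight_home_field) so that it counts for more.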
score('Denmark','Finland', 0.2, 0.2, 0.2, 0.4)
Total head-to-head meetings:
57
Results:
Denmark: 38
Finland: 10
Draw: 9
Denmark's FIFA points: 1631.55
Finland's FIFA points: 1410.82
Denmark squad value: 3.1069999999999998 Billion Pounds
Finland squad value: 0.446 Billion Pounds
Denmark has home advantage.
Denmark is more likely to win.
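The run above prints only the verdict, not the two scores behind it. The sketch below recomputes them from the printed figures with the same weights (0.2, 0.2, 0.2, 0.4). The helper `weighted_score` and the tuple layout are assumptions for illustration.

```python
def weighted_score(features, weights):
    """Weighted sum of the four factors for one team."""
    return sum(f * w for f, w in zip(features, weights))

# Order: head-to-head, FIFA points, squad value, home advantage.
weights = (0.2, 0.2, 0.2, 0.4)
assert abs(sum(weights) - 1.0) < 1e-9  # the weights are meant to sum to 1

# Factor values derived from the figures printed in the run above.
denmark = (
    38 / 57,                         # head-to-head win share
    1631.55 / (1631.55 + 1410.82),   # share of FIFA points
    3.107 / (3.107 + 0.446),         # share of squad value
    1,                               # Denmark hosts the match
)
finland = (
    10 / 57,
    1410.82 / (1631.55 + 1410.82),
    0.446 / (3.107 + 0.446),
    0,
)

score_denmark = weighted_score(denmark, weights)
score_finland = weighted_score(finland, weights)
print(round(score_denmark, 3), round(score_finland, 3))
```

Denmark scores about 0.82 against about 0.15 for Finland. With a home weight of 0.4, the home term alone already exceeds Finland's entire score, so this weight setting lets home advantage decide the match on its own.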
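6. Final prediction based on the model's analysis: Denmark wins!
- Denmark is a strong side. Goalkeeper Schmeichel has a chance to repeat his father's glory; the back line has Christensen and Kjær; the midfield has Eriksen and Højbjerg; up front are Braithwaite and Poulsen. All three lines feature players from Europe's top five leagues. On current form and with home advantage, Denmark should have little trouble beating Finland.
- Most Finland players play in their domestic league. Pukki, the striker at Norwich City in the English Championship, is the team's biggest star and top scorer: he scored 10 of Finland's 16 goals in qualifying. Overall this Finland side has little star power, and most of its players are little known. But for a team reaching the European Championship finals for the first time, Finland has already made history, and every match, every point, and every goal at the tournament adds a new chapter.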
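We will continue to use AI algorithms and data analysis to predict the results of the 2021 European Championship, for example:
- Add more data, such as player market values, injuries, weather conditions, and FIFA ranking data.
- Use machine learning algorithms such as linear regression, logistic regression, decision trees, and random forests.
- Do feature engineering, assign weights to the different data, and feed them into the training of an AI algorithm.
- Everyone is welcome to experiment freely, analyze with any AI method, and predict the result of every match as well as the champion.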
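As a pointer for the machine learning direction listed above, the weights could be learned from past results instead of being typed in by hand. Below is a minimal sketch of logistic regression trained by gradient descent in plain Python. Everything in it is an assumption for illustration: the feature layout (team A's value minus team B's value for each of the four factors), the function names, and above all the six training rows, which are made-up toy numbers and not real match data.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def predict_proba(weights, x):
    """Estimated probability that team A wins, given the feature differences x."""
    return sigmoid(sum(w * xi for w, xi in zip(weights, x)))

def train_logistic(samples, labels, lr=0.5, epochs=500):
    """Fit the weights by batch gradient descent on the log loss (no bias term)."""
    weights = [0.0] * len(samples[0])
    for _ in range(epochs):
        grad = [0.0] * len(weights)
        for x, y in zip(samples, labels):
            error = predict_proba(weights, x) - y
            for j, xi in enumerate(x):
                grad[j] += error * xi
        weights = [w - lr * g / len(samples) for w, g in zip(weights, grad)]
    return weights

# Toy rows: (head-to-head diff, FIFA share diff, value share diff, home diff).
# Label 1 means team A won, 0 means team A did not win. Synthetic data only.
samples = [
    (0.4, 0.05, 0.5, 1), (0.2, 0.02, 0.3, 0), (0.1, 0.03, 0.1, 1),
    (-0.3, -0.04, -0.4, -1), (-0.1, -0.01, -0.2, 0), (-0.2, -0.02, -0.1, -1),
]
labels = [1, 1, 1, 0, 0, 0]

learned = train_logistic(samples, labels)
favourite = predict_proba(learned, (0.3, 0.03, 0.4, 1))
underdog = predict_proba(learned, (-0.3, -0.03, -0.4, -1))
print([round(w, 2) for w in learned], round(favourite, 2), round(underdog, 2))
```

With real data, each row would come from a past match in results.csv, and a held-out set of matches would be needed to judge whether the learned weights actually predict better than the hand-picked ones.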