Soccer Data Analysis

Sep 15, 2021

Introduction

I selected the soccer database from Kaggle. It contains more than 25,000 matches and more than 10,000 players from several European countries, covering 2008 to 2016. I explore it using Exploratory Data Analysis (EDA). Although we won’t get into the details of them in this example, the dataset also has attributes on weekly game updates, team line-ups, and detailed match events. The goal of this notebook is to walk you through an end-to-end process of analyzing a dataset. Our simple analytical process includes steps for exploring and cleaning the data.

About the dataset:

+25,000 matches
+10,000 players
11 European Countries with their lead championship
Seasons 2008 to 2016
Player and team attributes sourced from EA Sports’ FIFA video game series, including the weekly updates
Team line up with squad formation (X, Y coordinates)
Betting odds from up to 10 providers
Detailed match events (goal types, possession, corner, cross, fouls, cards etc…) for +10,000 matches

Libraries

We will start by importing the Python libraries used in this analysis:

sqlite3 for interacting with a local relational database
pandas and numpy for data ingestion and manipulation
matplotlib and seaborn for data visualization
specific methods from sklearn for machine learning
customplot, which contains custom functions we have written for this notebook
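As a minimal sketch of the loading step, the snippet below opens a SQLite connection and reads a table into a pandas DataFrame. To keep it runnable on its own, it builds a tiny in-memory database; the `Country` table layout and the sample rows are assumptions for illustration, and with the real dataset you would connect to the downloaded database file instead.

```python
import sqlite3

import pandas as pd

# In-memory stand-in for the real database file (assumption: the real data
# would be opened with sqlite3.connect(<path to the downloaded .sqlite file>)).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Country (id INTEGER PRIMARY KEY, name TEXT);
    INSERT INTO Country VALUES (1, 'Belgium'), (2, 'England');
""")

# List the tables in the database, then load one into a DataFrame.
tables = pd.read_sql_query(
    "SELECT name FROM sqlite_master WHERE type = 'table'", conn
)
countries = pd.read_sql_query("SELECT * FROM Country", conn)
conn.close()

print(tables["name"].tolist())
print(countries.shape)
```

Listing the tables first is a cheap way to see what there is to explore before loading anything large.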

Research Question 1

After merging all the tables, will some cells contain nulls?

Yes, because the tables have different shapes: a row in one table does not always have a matching row in another.
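The effect is easy to reproduce on a toy example. The sketch below uses two hypothetical miniature tables (the column names `player_api_id`, `player_name`, and `overall_rating` are assumptions, and the values are made up) and counts the nulls a left join leaves behind.

```python
import pandas as pd

# Hypothetical miniature tables; the real ones are much larger.
player = pd.DataFrame(
    {"player_api_id": [1, 2, 3], "player_name": ["A", "B", "C"]}
)
attributes = pd.DataFrame(
    {"player_api_id": [1, 2], "overall_rating": [70.0, 82.0]}
)

# A left join keeps every player; a player with no attribute row gets NaN.
merged = player.merge(attributes, on="player_api_id", how="left")

# Count the nulls per column to see where the merge left gaps.
null_counts = merged.isnull().sum()
print(null_counts)
```

Checking `isnull().sum()` right after each merge makes it clear which join introduced the missing values.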

Research Question 2

Once the tables are merged, will these nulls affect prediction?

I think so, and we should select an algorithm that works well with this dataset.

Research Question 3

Is there a correlation between features?

There is a positive correlation among the attack features and a negative correlation between the attack and defense features.
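One way to check this kind of claim is a correlation matrix. The sketch below runs on made-up ratings for five players; the column names `finishing`, `shot_power`, and `standing_tackle` are assumptions standing in for attack and defense features, so the numbers it prints say nothing about the real data.

```python
import pandas as pd

# Hypothetical ratings for five players; the values are made up.
ratings = pd.DataFrame({
    "finishing": [80, 75, 60, 40, 30],
    "shot_power": [82, 70, 65, 45, 35],
    "standing_tackle": [30, 40, 55, 75, 85],
})

# Pearson correlation between every pair of numeric columns.
corr = ratings.corr()
print(corr.round(2))
```

With seaborn available, `seaborn.heatmap(corr)` would turn the same matrix into a picture.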


Research Question 4

Which distribution do the height and weight features follow?
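A quick numeric check, alongside a histogram, is to look at skewness and excess kurtosis, which are both close to 0 for a normal distribution. The sketch below is only an illustration on synthetic data: the `height` and `weight` values are randomly generated, not taken from the player table.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for the player table; the values are made up.
rng = np.random.default_rng(0)
players = pd.DataFrame({
    "height": rng.normal(loc=182, scale=6, size=2000),
    "weight": rng.normal(loc=168, scale=15, size=2000),
})

# Skewness and excess kurtosis are both near 0 for normally distributed data.
summary = pd.DataFrame({
    "skew": players.skew(),
    "excess_kurtosis": players.kurt(),
})
print(summary.round(2))
```

On the real columns, values far from 0 would suggest the feature is not well described by a normal distribution.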

Research Question 5 & 6

Plot a figure showing how many players prefer the right foot and how many prefer the left.

Plot the distribution of attacking work rate and defensive work rate.

How many countries are in the dataset, and what are their names?
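All three questions come down to counting categories. The sketch below computes those counts on hypothetical rows; the column names `preferred_foot`, `attacking_work_rate`, `defensive_work_rate`, and `name` are assumptions, and the sample values are made up.

```python
import pandas as pd

# Hypothetical rows; the real tables are much larger.
attrs = pd.DataFrame({
    "preferred_foot": ["right", "right", "left", "right"],
    "attacking_work_rate": ["medium", "high", "medium", "low"],
    "defensive_work_rate": ["medium", "medium", "low", "high"],
})
country = pd.DataFrame({"name": ["Belgium", "England", "France"]})

# Counts per category; these are the bar heights of the requested plots.
foot_counts = attrs["preferred_foot"].value_counts()
attack_counts = attrs["attacking_work_rate"].value_counts()
defense_counts = attrs["defensive_work_rate"].value_counts()

# Number of distinct countries and their names.
n_countries = country["name"].nunique()
country_names = sorted(country["name"].unique())
print(foot_counts.to_dict(), n_countries, country_names)
```

With matplotlib installed, calling `foot_counts.plot(kind="bar")` draws the bar chart for any of these count series.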

Conclusions

Analyzing the league, country, player, player attributes, team, team attributes, and match tables gave a better understanding of the data. Once the features are merged, they can be passed to a machine learning algorithm to predict the winner of a future match.

We can use the player attributes table to group (cluster) the players based on skills such as “passing” and “long pass”, and identify which group each player belongs to.
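As one possible way to do this grouping, the sketch below applies k-means from sklearn to two skill columns. The choice of k-means, the number of clusters, and the ratings themselves are all assumptions made for illustration.

```python
import numpy as np
from sklearn.cluster import KMeans

# Hypothetical skill ratings (short passing, long passing) for six players.
skills = np.array([
    [85, 80], [88, 83], [82, 79],   # strong passers
    [45, 40], [50, 38], [42, 44],   # weak passers
])

# k-means with two clusters; the number of clusters is an arbitrary choice.
model = KMeans(n_clusters=2, n_init=10, random_state=0)
labels = model.fit_predict(skills)
print(labels)
```

Each entry in `labels` is the group assigned to the player in the same row, which answers the question of which group a given player belongs to.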

Any of the tables can serve as the basis for a prediction; I selected the player attributes table for mine.