Modeling Premier League Success with Extreme Gradient Boosting

Name: Dillon Wheeler
Majors: Statistical and Data Sciences
Minors: Economics
Advisors: Christina Horr, Moses Luri

This study collects and analyzes data from seven seasons of the Premier League, the highest division of soccer in England. Multiple statistical methods are applied to data on Premier League teams, not only to predict future match outcomes but also to identify in-game statistics that are significantly associated with winning matches. The study begins by constructing a multinomial logistic regression model to give an initial impression of potentially significant game statistics, and then applies a machine learning algorithm known as XGBoost to increase the predictive accuracy of match outcomes in the league (wins, losses, and ties). Several models are built to adjust to the data, with the final model being an XGBoost model that classifies the binary variable, result, with outcomes of win or not win. The final model predicted match outcomes correctly about 75 percent of the time. This study discusses in detail how these algorithms work, the adjustments needed to maximize their effectiveness on the given data set, and how the predictive power of machine learning models can give us insights into large and complex data. Later, I discuss which Premier League players are most influential in making their team win based upon the final model, as well as the issues the model had with its true positive rate and with misclassifying matches that were ties.
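To make the evaluation terms in the abstract concrete, here is a minimal sketch of how a three-way result can be collapsed into win or not win and then scored for accuracy and true positive rate. It is not the study's code: the function names `to_binary` and `evaluate` are hypothetical, the match results and predictions are made up for illustration, and no XGBoost model is fit here.

```python
from collections import Counter


def to_binary(result):
    """Collapse a three-way match result into 'win' / 'not win'."""
    return "win" if result == "win" else "not win"


def evaluate(actual, predicted):
    """Return (accuracy, true positive rate) treating 'win' as the positive class."""
    pairs = Counter(zip(actual, predicted))
    tp = pairs[("win", "win")]          # wins predicted as wins
    fn = pairs[("win", "not win")]      # wins the classifier missed
    fp = pairs[("not win", "win")]      # ties or losses predicted as wins
    tn = pairs[("not win", "not win")]  # ties or losses predicted as not win
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    tpr = tp / (tp + fn) if (tp + fn) else float("nan")
    return accuracy, tpr


# Made-up results for illustration only; not data from the study.
three_way = ["win", "tie", "loss", "win", "tie", "loss", "win", "loss"]
actual = [to_binary(r) for r in three_way]

# Hypothetical predictions from some binary classifier.
predicted = ["win", "win", "not win", "not win",
             "not win", "not win", "win", "not win"]

accuracy, tpr = evaluate(actual, predicted)
print(f"accuracy={accuracy:.2f}, true positive rate={tpr:.2f}")
```

In this toy data the second match is a tie predicted as a win, and the fourth is a win predicted as not win. Overall accuracy can look reasonable even while the true positive rate lags behind it, which is why the two are reported separately.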

Posted in Comments Enabled, Independent Study, Symposium 2022 on April 26, 2022.

2 responses to “Modeling Premier League Success with Extreme Gradient Boosting”

  1. Brendan Bittner says:

    Hi Dillon, This is a really interesting look at statistical modeling for soccer. I’m wondering, beyond clubs using this data in their team-building, what supporters and fans of the Premier League could glean from this data. Perhaps there are inefficiencies in the ways fans wager on games, etc. I’d also be interested to know what piece of data came up in your research that you found most surprising. Excellent work and congratulations!

  2. Dillon Wheeler says:

    Thanks for the questions, Brendan! Oftentimes supporters of different teams will react positively or negatively to certain players being signed to the club they support based upon public perception rather than the players' actual skill. Maybe this model could help close the gap between perceived and actual player performance.

    To answer your second question, I would say that I was surprised at how irrelevant successful dribbles were in predicting match results. Before I made the model, I would have assumed that dribbling past a defender successfully more often would have a positive association with winning games.
