Modeling Premier League Success with Extreme Gradient Boosting

Name: Dillon Wheeler
Majors: Statistical and Data Sciences
Minors: Economics
Advisors: Christina Horr, Moses Luri

This study collects and analyzes data from seven seasons of the Premier League, the highest division of soccer in England. Multiple statistical methods are applied to data on Premier League teams, not only to predict future match outcomes but also to discern which in-game statistics are significantly associated with winning matches. The study begins by constructing a multinomial logistic regression model to give an initial impression of potentially significant game statistics, and then applies a machine learning algorithm known as XGBoost to increase the predictive accuracy of match outcomes in the league (wins, losses, and ties). Several models are built to adjust to the data; the final model is an XGBoost model that classifies the binary variable, result, with outcomes of win or not win. The final model predicted match outcomes correctly about 75 percent of the time. This study discusses in detail how these algorithms work, the adjustments needed to maximize their effectiveness on the given data set, and how the predictive power of machine learning models can give us insights into large and complex data. Later, I discuss which Premier League players are most influential in making their team win according to the final model, as well as the issues the model had with its true positive rate and with misclassifying matches that ended in ties.
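As a rough illustration of the final step described above, the sketch below collapses a three-way match result into the win / not win variable and computes overall accuracy alongside the true positive rate. It is not the study's code or data: the function names (`to_binary`, `binary_metrics`) and the eight toy matches are invented for this example, and the predictions are hard-coded rather than produced by an XGBoost model.

```python
from collections import Counter


def to_binary(outcome: str) -> int:
    """Collapse a three-way result (win/loss/tie) into win (1) vs. not win (0)."""
    return 1 if outcome == "win" else 0


def binary_metrics(actual, predicted):
    """Return accuracy and true positive rate for binary labels (1 = win)."""
    counts = Counter(zip(actual, predicted))
    tp = counts[(1, 1)]  # wins predicted as wins
    fn = counts[(1, 0)]  # wins predicted as not win
    fp = counts[(0, 1)]  # losses or ties predicted as wins
    tn = counts[(0, 0)]  # losses or ties predicted as not win
    total = tp + fn + fp + tn
    return {
        "accuracy": (tp + tn) / total,
        "true_positive_rate": tp / (tp + fn) if (tp + fn) else float("nan"),
    }


# Hypothetical toy data, invented purely for illustration.
outcomes = ["win", "tie", "loss", "win", "loss", "tie", "win", "loss"]
predicted = [1, 0, 0, 0, 0, 1, 1, 0]  # stand-in for a classifier's output

actual = [to_binary(o) for o in outcomes]
metrics = binary_metrics(actual, predicted)
print(metrics)
```

In this toy case the two numbers differ, which shows why overall accuracy and the true positive rate are worth reporting separately: a model can label most matches correctly while still missing a noticeable share of actual wins.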

Posted in Comments Enabled, Independent Study, Symposium 2022 on April 26, 2022.

2 responses to “Modeling Premier League Success with Extreme Gradient Boosting”

Brendan Bittner says:

April 28, 2022 at 8:58 pm

Hi Dillon, This is a really interesting look at statistical modeling for soccer. I’m wondering, beyond clubs using this data in their team-building, what supporters and fans of the Premier League could glean from this data. Perhaps there are inefficiencies in the ways fans wager on games, etc. I’d also be interested to know what piece of data came up in your research that you found most surprising. Excellent work and congratulations!
Dillon Wheeler says:

April 29, 2022 at 5:21 pm

Thanks for the questions, Brendan! Oftentimes, supporters of different teams will react positively or negatively to a player being signed to the club they support based on public perception rather than the player's actual skill. Maybe this model could help narrow the gap between perceived and actual player performance.

To answer your second question, I would say that I was surprised at how irrelevant successful dribbles were in predicting match results. Before I made the model, I would have assumed that dribbling past a defender successfully more often would have a positive association with winning games.
