The All-Math Team: A comparative study to predict end of the year award voting in baseball for the North Coast Athletic Conference using an empirical analysis
Name: Tyler Chumita
Majors: Mathematics, Education
Advisors: Christina Horr and Drew Pasteur
Over the last two decades, sabermetrics has taken over baseball by means of advanced statistics. Throughout the game, one can see how much data analytics affects decision-making both on and off the field. This study focuses on statistics at the collegiate level, specifically the North Coast Athletic Conference (NCAC), as it contains data from the College of Wooster. The aim of this Independent Study is to compare three commonly used baseball statistics: the Slash Line, the Three True Outcomes, and Wins Above Replacement (WAR). The Slash Line is one of baseball’s more traditional statistics and looks only at offensive categories. Its components are batting average, on-base percentage (OBP), and slugging percentage, written in that order (Batting Average/OBP/Slugging Percentage) with a slash between each, hence the name. The Three True Outcomes is another offensive statistic that looks at a player’s home runs, strikeouts, and walks. These are the outcomes in which no fielder contributes to the play. Finally, WAR looks at a player’s total ability in all facets of the game and how valuable they are to their team. This statistic includes hitting, fielding, and base running. WAR produces a single number that estimates how many wins a player contributes to their team compared to a “replacement level” player (also known as a back-up player). Each statistic was calculated separately using the programming language R. Players were then ranked by their statistics in each respective category, such as WAR value, home runs hit, and OBP, to name a few. The empirical analysis used three methods: linear regression, K-means clustering, and random forests. Nine models were created by applying each of the three methods to each of the three sabermetrics, producing lists of the best players in terms of their quantified value. This led to a prediction of who should be voted All-Conference at the end of the year at each position.
The results were compared with the actual award voting from the 2020-2021 season, and K-means clustering applied to WAR was the most accurate. This study concluded that WAR is the most accurate sabermetric regardless of the methodology used, but K-means clustering gave the best predictor for future award voting.
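As a rough illustration of the two simpler statistics described in this abstract, the Slash Line and a Three True Outcomes rate can be computed from a player's counting stats. The study itself used R; the Python sketch below is not the author's code. The class and function names are hypothetical, the sample player is made up, and the formulas are the conventional definitions. Plate appearances are approximated as at-bats plus walks, hit-by-pitches, and sacrifice flies, which is an assumption that ignores sacrifice bunts.

```python
from dataclasses import dataclass


@dataclass
class BattingLine:
    """Counting stats for one hitter (hypothetical container, not from the study)."""
    at_bats: int
    hits: int
    doubles: int
    triples: int
    home_runs: int
    walks: int
    hit_by_pitch: int
    sacrifice_flies: int
    strikeouts: int


def slash_line(b: BattingLine) -> tuple[float, float, float]:
    """Return (batting average, OBP, slugging) using the conventional formulas."""
    if b.at_bats == 0:
        raise ValueError("at_bats must be positive")
    singles = b.hits - b.doubles - b.triples - b.home_runs
    total_bases = singles + 2 * b.doubles + 3 * b.triples + 4 * b.home_runs
    avg = b.hits / b.at_bats
    obp = (b.hits + b.walks + b.hit_by_pitch) / (
        b.at_bats + b.walks + b.hit_by_pitch + b.sacrifice_flies
    )
    slg = total_bases / b.at_bats
    return avg, obp, slg


def three_true_outcomes_rate(b: BattingLine) -> float:
    """Share of plate appearances ending in a home run, walk, or strikeout.

    Assumption: plate appearances = AB + BB + HBP + SF (sacrifice bunts ignored).
    """
    plate_appearances = b.at_bats + b.walks + b.hit_by_pitch + b.sacrifice_flies
    if plate_appearances == 0:
        raise ValueError("player has no plate appearances")
    return (b.home_runs + b.walks + b.strikeouts) / plate_appearances


def format_slash_line(avg: float, obp: float, slg: float) -> str:
    """Format as AVG/OBP/SLG with the leading zero dropped, e.g. .300/.372/.490."""
    return "/".join(f"{x:.3f}".lstrip("0") for x in (avg, obp, slg))


# Made-up player for illustration only; not data from the NCAC or the study.
example = BattingLine(
    at_bats=100, hits=30, doubles=5, triples=1, home_runs=4,
    walks=10, hit_by_pitch=2, sacrifice_flies=1, strikeouts=20,
)
print(format_slash_line(*slash_line(example)))
print(round(three_true_outcomes_rate(example), 3))
```

How the study combined these components into a ranking is not stated in the abstract, so the sketch stops at the per-player values.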