The All-Math Team: A comparative study to predict end of the year award voting in baseball for the North Coast Athletic Conference using an empirical analysis

Tyler Chumita

Name: Tyler Chumita
Majors: Mathematics, Education
Advisors: Christina Horr and Drew Pasteur

Over the last two decades, sabermetrics have taken over baseball by means of advanced statistics. Data analytics now affects decision-making throughout the game, both on and off the field. This study focuses on statistics at the collegiate level, specifically the North Coast Athletic Conference (NCAC), because its data include the College of Wooster. The aim of this Independent Study is to compare three commonly used baseball statistics: the Slash Line, the Three True Outcomes, and Wins Above Replacement (WAR). The Slash Line is one of baseball's more traditional statistics and looks only at offensive categories: batting average, on-base percentage (OBP), and slugging percentage. The three components are written in that order (Batting Average/OBP/Slugging Percentage) with a slash between each, hence the name. The Three True Outcomes is another offensive statistic that looks at a player's home runs, strikeouts, and walks. These are the outcomes of a plate appearance in which no fielder contributes to the play. Finally, WAR looks at a player's total ability in all facets of the game, including hitting, fielding, and base running, and at how valuable they are to their team. WAR produces a single number that estimates how many wins a player contributes to their team compared to a "replacement level" player (also known as a back-up player). Each statistic was calculated separately using the programming language R. Players were then ranked by their statistics in each respective category, such as WAR value, home runs hit, and OBP. The empirical analysis used three methods: linear regression, K-means clustering, and random forests. Nine models were created by applying each of the three methods to each of the three sabermetrics, producing lists of the best players in terms of their quantified value. This led to a prediction of who should be voted All-Conference at the end of the year at each position.
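As a rough illustration of the two offensive statistics described above, the sketch below computes a Slash Line and a Three True Outcomes rate from a season batting line using the standard formulas. It is written in Python for illustration only (the study itself used R), and the `BattingLine` class, its field names, and the sample numbers are hypothetical. Plate appearances are approximated here as at-bats plus walks, hit-by-pitches, and sacrifice flies, which ignores sacrifice bunts.

```python
from dataclasses import dataclass


@dataclass
class BattingLine:
    """Hypothetical container for one player's season batting totals."""
    at_bats: int
    hits: int
    doubles: int
    triples: int
    home_runs: int
    walks: int
    hit_by_pitch: int
    sac_flies: int
    strikeouts: int


def slash_line(b: BattingLine) -> tuple[float, float, float]:
    """Return (batting average, on-base percentage, slugging percentage)."""
    singles = b.hits - b.doubles - b.triples - b.home_runs
    total_bases = singles + 2 * b.doubles + 3 * b.triples + 4 * b.home_runs
    avg = b.hits / b.at_bats
    obp = (b.hits + b.walks + b.hit_by_pitch) / (
        b.at_bats + b.walks + b.hit_by_pitch + b.sac_flies
    )
    slg = total_bases / b.at_bats
    return avg, obp, slg


def three_true_outcomes_rate(b: BattingLine) -> float:
    """Share of plate appearances ending in a home run, walk, or strikeout."""
    # Assumption: plate appearances approximated without sacrifice bunts.
    plate_appearances = b.at_bats + b.walks + b.hit_by_pitch + b.sac_flies
    return (b.home_runs + b.walks + b.strikeouts) / plate_appearances


# Made-up season line, not data from the study.
player = BattingLine(at_bats=100, hits=30, doubles=5, triples=1, home_runs=4,
                     walks=10, hit_by_pitch=2, sac_flies=1, strikeouts=20)
avg, obp, slg = slash_line(player)
tto = three_true_outcomes_rate(player)
print(f"{avg:.3f}/{obp:.3f}/{slg:.3f}  TTO rate: {tto:.3f}")
```

WAR is left out of this sketch because it combines hitting, fielding, and base running against a replacement-level baseline, and the source does not say which formulation was used.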
The results were compared with the actual award voting from the 2020-2021 season, and K-means clustering applied to WAR was the most accurate. This study concluded that WAR is the most accurate sabermetric regardless of the method used, and that K-means clustering gave the best predictor of future award voting.
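To make the idea of clustering players by WAR and checking the result against award voting more concrete, here is a minimal Python sketch (the study used R, and its exact procedure is not described in this abstract). It runs a one-dimensional K-means on made-up WAR values, treats the cluster with the highest center as the predicted All-Conference group, and scores the prediction by its overlap with a made-up list of award winners. The player labels, WAR values, number of clusters, and the overlap measure are all assumptions for illustration.

```python
def kmeans_1d(values, k, iterations=100):
    """Lloyd's algorithm on a list of numbers; returns (centers, labels).

    Requires k >= 2. Initial centers are spread across the sorted values
    so the result is deterministic.
    """
    ordered = sorted(values)
    centers = [ordered[(i * (len(ordered) - 1)) // (k - 1)] for i in range(k)]
    labels = []
    for _ in range(iterations):
        # Assign each value to its nearest center.
        labels = [min(range(k), key=lambda c: abs(v - centers[c])) for v in values]
        # Move each center to the mean of its members (keep it if empty).
        new_centers = []
        for c in range(k):
            members = [v for v, lab in zip(values, labels) if lab == c]
            new_centers.append(sum(members) / len(members) if members else centers[c])
        if new_centers == centers:
            break
        centers = new_centers
    return centers, labels


def overlap_accuracy(predicted, actual):
    """Fraction of actual award winners that the prediction also named."""
    return len(set(predicted) & set(actual)) / len(set(actual))


# Hypothetical WAR values for six players at one position.
war = {"A": 2.1, "B": 1.9, "C": 0.8, "D": 0.7, "E": -0.1, "F": 0.0}
names = list(war)
centers, labels = kmeans_1d([war[n] for n in names], k=3)

# The cluster with the highest center is taken as the predicted award tier.
top_cluster = max(range(len(centers)), key=lambda c: centers[c])
predicted = [n for n, lab in zip(names, labels) if lab == top_cluster]

# Hypothetical award winners, not the real voting results.
actual = ["A", "C"]
accuracy = overlap_accuracy(predicted, actual)
print(predicted, accuracy)
```

The same overlap score could be computed for each of the nine models to compare them, though the abstract does not state how accuracy was actually measured.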


Posted in Comments Enabled, Independent Study, Symposium 2022.
