14 Feb
What is a Classification Algorithm, and How Can You Use it to Project NFL Draft Outcomes?
The 2021 NFL season wound to a close Sunday with a Los Angeles Rams victory over the Cincinnati Bengals in the Super Bowl, but for NFL organizations and diehard fans, that only means that the 2022 NFL Draft Season has begun. To help you understand how teams and fans alike can leverage analytic tools to evaluate draft prospects, this article will walk you through how classification algorithms can be used to project the future outcomes of the upcoming crop of NFL Draft prospects, and show you how NuMantra’s platform makes this process easy for the non-data scientist.
What is a Classification Algorithm?
A classification algorithm is a machine learning approach that sorts objects into predefined categories. A basic example you will be familiar with is your spam filter; if an email matches enough of the algorithm’s criteria to cross its threshold, it is marked as spam and does not hit your inbox.
The classification algorithm in our example will sort draft prospects into “hits” and “misses”. We will layer in complexity throughout the process, but the end goal will be to sort draft prospects into these categories as accurately as possible.
It is important to note that classification algorithms are a type of supervised learning, which means that the training set data must be labeled with an outcome. We will address this requirement as we develop our examples.
How Can We Apply this to the NFL Draft?
For our example, we will build two types of models to highlight a couple of the options available in NuMantra’s Machine Learning platform.
- Linear Binary Classifier: This algorithm sorts outcomes into two classes. For our example, we will consider a linear binary classifier that projects whether first-round picks from the 2022 NFL Draft will be “hits” or “misses”.
- Linear Multiclass Classifier: This algorithm sorts outcomes into multiple classes. For our example, we will consider a linear multiclass classifier that projects whether a player drafted in any round of the NFL Draft will be a Pro Bowler, Starter, Backup, or Miss.
Both models require two types of variables: predictor variables and outcome variables. As the name indicates, the predictor variables inform the model as it tries to predict the outcome variable, which we will define in our model-building process.
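To make the idea of a linear binary classifier concrete, here is a minimal sketch in Python of one common variant, logistic regression trained by gradient descent. It is an illustration only and not NuMantra’s implementation; the two scaled predictors, the function names, and every number in the toy data are assumptions made up for this example.

```python
import math

def sigmoid(z):
    """Squash a linear score into a probability between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-z))

def train(rows, labels, learning_rate=0.5, epochs=5000):
    """Fit logistic-regression weights and a bias by batch gradient descent."""
    weights = [0.0] * len(rows[0])
    bias = 0.0
    n = len(rows)
    for _ in range(epochs):
        grad_w = [0.0] * len(weights)
        grad_b = 0.0
        for x, y in zip(rows, labels):
            # error = predicted probability of a hit minus the labeled outcome
            error = sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias) - y
            for j, xi in enumerate(x):
                grad_w[j] += error * xi
            grad_b += error
        weights = [w - learning_rate * g / n for w, g in zip(weights, grad_w)]
        bias -= learning_rate * grad_b / n
    return weights, bias

def predict(weights, bias, x):
    """Classify by which side of the linear decision boundary x falls on."""
    score = sum(w * xi for w, xi in zip(weights, x)) + bias
    return "Hit" if score >= 0 else "Miss"

# Made-up training data: [scaled analyst grade, scaled speed score].
# The labels are the outcome variable: 1 = Hit, 0 = Miss.
training_rows = [[0.9, 0.8], [0.8, 0.6], [0.7, 0.9],
                 [0.3, 0.2], [0.2, 0.4], [0.4, 0.1]]
training_labels = [1, 1, 1, 0, 0, 0]

weights, bias = train(training_rows, training_labels)
print(predict(weights, bias, [0.85, 0.70]))  # a hypothetical new prospect
```

The predictor variables are the numbers in each row, and the outcome variable is the label the model learns to reproduce. A multiclass classifier follows the same pattern with more than two possible labels.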
Where do we get our data?
Before we can get started, we need data. Fortunately, the internet has a wealth of information on NFL Draft prospects; some of this is from paid resources such as Pro Football Focus, while other sites use a hybrid of free and paid information. For this example, we will focus on information that is freely available at Pro Football Reference and ESPN.
- Pro Football Reference (Draft, By Year): This dataset lists all of the players selected in the NFL Draft by year, along with a selection of their NFL statistics and basic college information.
- Pro Football Reference (Draft, Draft Combine): This dataset contains the testing numbers of players who attended the NFL Combine each year. A variety of tests are administered at the combine; this dataset focuses on testing numbers from strength and speed drills.
- ESPN (NFL, Draft, Player Rankings): This dataset has draft analyst Todd McShay’s player grades and player rankings for prospects in draft classes dating back to 2009.
Pro Football Reference offers an export tool, but it requires a bit more work to get the data off ESPN. Fortunately, NuMantra has you covered there; our data extraction and upload tools make pulling numbers more like downloading a PDF, so if you aren’t a fan of McShay’s work, you can use the numbers from your favorite draft analyst.
Our data will be broken into two sets.
- Training Set: Data from past draft classes used to inform the model
- Test Set: 2022 draft class data
Because our first data source lists the order in which players were selected, it is not available until after the draft takes place, so the 2022 test set is created from the second and third data sources.
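As a rough sketch of how the three sources could be combined and split, the Python below joins combine and ranking records on player and draft year, then attaches an outcome wherever a post-draft record exists. The field names (`player`, `year`, `forty`, `grade`, `approx_value`) and the placeholder prospects are hypothetical; they are not the actual column names from Pro Football Reference or ESPN.

```python
def index_by_player(records):
    """Key each record by (player, draft year) so sources can be matched."""
    return {(r["player"], r["year"]): r for r in records}

def join_sources(combine, rankings, draft_results):
    """Join the two predictor sources; attach outcomes where they exist."""
    rankings_by_key = index_by_player(rankings)
    results_by_key = index_by_player(draft_results)
    rows = []
    for c in combine:
        key = (c["player"], c["year"])
        ranking = rankings_by_key.get(key)
        if ranking is None:
            continue  # keep only prospects found in both predictor sources
        row = {**c, "grade": ranking["grade"]}
        result = results_by_key.get(key)
        if result is not None:
            row["approx_value"] = result["approx_value"]
        rows.append(row)
    return rows

# Placeholder records, invented for illustration only.
combine = [
    {"player": "Prospect A", "year": 2019, "forty": 4.45},
    {"player": "Prospect B", "year": 2019, "forty": 4.80},
    {"player": "Prospect C", "year": 2022, "forty": 4.52},
]
rankings = [
    {"player": "Prospect A", "year": 2019, "grade": 92},
    {"player": "Prospect B", "year": 2019, "grade": 78},
    {"player": "Prospect C", "year": 2022, "grade": 88},
]
draft_results = [
    {"player": "Prospect A", "year": 2019, "approx_value": 31},
    {"player": "Prospect B", "year": 2019, "approx_value": 6},
]

rows = join_sources(combine, rankings, draft_results)
training_set = [r for r in rows if "approx_value" in r]  # past classes with outcomes
test_set = [r for r in rows if r["year"] == 2022]        # predictors only
```

Matching on name and year is a simplification; real data would likely need a more careful key, since names can be spelled differently across sites.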
What types of insights do our algorithms provide?
Let’s start with a more basic question: NuMantra’s platform can make the modeling process easier for the non-data scientist, but what does a data scientist do?
Data Scientists work in myriad industries and leverage a wide variety of technologies, but like any other scientist, they are creating hypotheses and devising experiments to test these hypotheses. Their tests involve data rather than volatile chemicals, but the goal is to learn more about a problem through observation.
That means our algorithm can answer a variety of questions about the relationships between the elements included in the model, and those questions can take many forms.
- We could ask a basic question by building a Linear Binary Classifier that projects whether a first-round pick will be a hit or a miss based on the player’s college.
- We could build a Linear Multiclass Classifier that projects whether a draft pick will be a Pro Bowler, Starter, Backup, or Miss based on his height, weight, and position.
The models from these simple examples are not robust enough to drive the draft selection process, but they can put actual numbers on commentary such as “I can’t remember the last time a quarterback came out of (fill in the blank) school,” or lend insight into how a player’s size at a given position impacts his odds of success. Robust models that account for a player’s off-field background, on-field evaluations, and testing numbers could have enough predictive value to drive personnel decisions, but even as models become more complex, it is helpful to come back to the question of what the model tells you about how the predictor and outcome variables are related.
Building a More Advanced Model
For our example, we will use Pro Football Reference’s Approximate Value metric, which attempts to apply a uniform measure across all NFL positions to assess player value. This number is in the Draft by Year dataset on Pro Football Reference.
We will use this metric in two different ways.
- In our Linear Binary Classifier, we will assume that NFL fans are upset when their first-round pick is a bust, whether because of games missed to injury or a lack of ability, so we will use the raw Approximate Value as the basis for our outcome variable.
- In our Linear Multiclass Classifier, we will look at the value a player provided while on the field to determine if he is a Pro Bowler, Starter, Backup, or Miss, and will therefore divide the player’s Approximate Value by his games played to get a per-game metric.
Before we build our model, we need to define thresholds for our prospect categories. This can be done using a formal or informal approach, but cut-off points for the Approximate Values for hit/miss and Pro Bowler/Starter/Backup/Miss need to be defined.
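The labeling step might look something like the following Python sketch. Every cut-off here is a placeholder chosen for illustration, not a recommended value; you would set your own thresholds, formally or informally, before training.

```python
def label_hit_or_miss(approx_value, hit_threshold=30):
    """Binary outcome from raw Approximate Value (threshold is a placeholder)."""
    return "Hit" if approx_value >= hit_threshold else "Miss"

def value_per_game(approx_value, games_played):
    """Per-game Approximate Value; a player with no games gets 0.0."""
    if games_played == 0:
        return 0.0
    return approx_value / games_played

# Hypothetical cut-offs, checked from the highest tier down.
TIER_CUTOFFS = [(0.60, "Pro Bowler"), (0.35, "Starter"), (0.15, "Backup")]

def label_tier(av_per_game, cutoffs=TIER_CUTOFFS):
    """Multiclass outcome from per-game Approximate Value."""
    for minimum, tier in cutoffs:
        if av_per_game >= minimum:
            return tier
    return "Miss"

print(label_hit_or_miss(45))
print(label_tier(value_per_game(40, 64)))
```

Keeping the thresholds as parameters makes it easy to rerun the labeling with different cut-offs and see how sensitive the model is to those choices.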
Once we establish the outcome variables, the models need access to predictor variables. The potential predictor variables are those in our three datasets; remember that any data used to inform the model must be available before the draft, meaning it comes from the player’s collegiate career and pre-draft evaluations, to remain predictive.
- Pro Football Reference (Draft, By Year): Position, College
- Pro Football Reference (Draft, Draft Combine): Position, College, Height, Weight, 40-yard dash, Vertical Jump, Bench Press Reps, Broad Jump, 3 Cone, Shuttle
- ESPN (NFL, Draft, Player Rankings): Player Grade, Player Ranking
NuMantra’s platform will allow you to mix and match these various metrics and test which model performs best. Here are a few examples.
- Linear Binary Classifier (Hit/Miss based on Approximate Value)
  1. (Outcome Variable) Hit/Miss
  2. (Predictor Variables) ESPN Player Grade + 40-yard dash + College + Height + Weight + Bench Press Reps
- Linear Multiclass Classifier (Pro Bowler/Starter/Backup/Miss based on Approximate Value per Game)
  1. (Outcome Variable) Pro Bowler/Starter/Backup/Miss
  2. (Predictor Variables) ESPN Player Grade + Broad Jump + 3 Cone + Bench Press Reps
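If you were doing this by hand, each combination of predictors would be turned into a numeric feature vector before training. The Python sketch below shows one way to do that, assuming hypothetical field names and placeholder colleges; a categorical predictor such as College is one-hot encoded so a linear model can use it.

```python
def one_hot(value, categories):
    """Encode a categorical value as 0/1 flags, one per known category."""
    return [1.0 if value == category else 0.0 for category in categories]

def build_feature_vector(prospect, numeric_fields, colleges):
    """Concatenate the chosen numeric predictors with the encoded college."""
    vector = [float(prospect[field]) for field in numeric_fields]
    vector.extend(one_hot(prospect["college"], colleges))
    return vector

# Hypothetical field names and values, for illustration only.
numeric_fields = ["grade", "forty", "height_in", "weight_lb", "bench_reps"]
colleges = ["College X", "College Y", "College Z"]
prospect = {
    "grade": 90, "forty": 4.5, "height_in": 74,
    "weight_lb": 215, "bench_reps": 18, "college": "College Y",
}

vector = build_feature_vector(prospect, numeric_fields, colleges)
print(vector)
```

Swapping predictors in or out is then just a matter of changing `numeric_fields`. Note that in this sketch a college missing from the `colleges` list is encoded as all zeros.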
As these examples demonstrate, the only limits on the questions you can ask are imagination and available data. For most data scientists, devising questions and experiments is the best part of the job; NuMantra allows the fully-trained data scientist to spend less time writing code and more time developing experiments while also providing a way for the non-data scientist to ask questions.
How NuMantra Makes It Easy
There was a period when coding was considered “the” skill to have for the future, but much as you don’t need to understand the markup behind Microsoft Word to produce a document, or the coding behind Squarespace to produce a website, NuMantra’s platform allows non-data scientists to produce models without the code.
As the screenshots below demonstrate, NuMantra makes every step of the process easy. The data management tool makes it easy to do anything from uploading a .csv file to connecting to a database, and if there are issues with your data, the NuMantra platform has built-in options to clean up the data without the need to code. You can perform operations such as removing null/empty values or replacing empty values with the mean via a simple dropdown menu.
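For readers curious about what those two clean-up operations amount to, here is a minimal Python sketch. It illustrates the general technique only and makes no claim about how the NuMantra platform implements it; the field name `bench_reps` and the sample rows are made up.

```python
from statistics import mean

def drop_rows_with_missing(rows, field):
    """Remove every row where the given field is empty (None)."""
    return [r for r in rows if r.get(field) is not None]

def fill_missing_with_mean(rows, field):
    """Replace empty values with the mean of the observed values."""
    observed = [r[field] for r in rows if r.get(field) is not None]
    if not observed:
        return [dict(r) for r in rows]  # nothing to average; return copies
    fill_value = mean(observed)
    return [
        {**r, field: fill_value if r.get(field) is None else r[field]}
        for r in rows
    ]

# Made-up rows: Prospect B skipped the bench press at the combine.
rows = [
    {"player": "Prospect A", "bench_reps": 20},
    {"player": "Prospect B", "bench_reps": None},
    {"player": "Prospect C", "bench_reps": 30},
]

print(drop_rows_with_missing(rows, "bench_reps"))
print(fill_missing_with_mean(rows, "bench_reps"))
```

Dropping rows shrinks the training set, while filling with the mean keeps every prospect at the cost of inventing a typical value, so the right choice depends on how much data is missing.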
Once your dataset is ready, NuMantra allows you to select your model type from a dropdown menu. Don’t be fooled by the simplicity; as easy as NuMantra’s platform makes this process, you are leveraging powerful tools similar to what data scientists around the world put into use through code.
NuMantra’s platform allows the user to spend more time creating novel models and less time on code issues, which creates more opportunity to discover useful insights. In this case, we can spend more time studying combinations of player metrics and whether a particular group of predictor variables can create a more accurate model. We covered some key considerations in that process, but the data scientist has to weigh other factors, such as the number of past draft classes to include in the training sample, that will affect the model’s performance. By simplifying the process, NuMantra allows the investigator to study more of these variations and discover the optimal approach.
Projecting outcomes for NFL Draft prospects is serious business for NFL teams, but a casual fan building models for fun is likely to see how the same general problem-solving process could be applied to their own business. For instance, this example translates neatly to the college admissions process, where a model that accounts for academic performance, extracurricular activities, and other relevant criteria could gauge whether an applicant is likely to thrive at a particular institution.
Even with the most robust data, certain factors are difficult to capture in a model, particularly one that models human behavior, but we can learn an immense amount if we approach the modeling process with the understanding that the results are constrained by the training inputs. With NuMantra, non-data scientists can ask the types of questions that will help their organization connect with the right audiences, bring in the right contributors, and optimize the efficiency of their operation.
Interested in enhancing your organizational decision-making with AI? Connect with a NuMantra expert to schedule a demo and see how your organization can boost efficiency.