Why NFL Correlation Is So Hard
Correlation in NFL games is a surprisingly complex topic. It may seem simple to understand: when a quarterback throws a touchdown, the receiver’s yards or touchdowns also increase. Other offensive players are also correlated. But it’s more complicated than that.
Correlation is an important factor in same game parlays, in which people select multiple players from the same game or same team. Operators need to adjust their lines and odds to account for the increased or decreased chance of correlated events.
Pricing based on correlation is important in every sport. Without it, the payout on a parlay of multiple players will not accurately reflect the parlay’s underlying risk.
Overall, there’s a significant amount of correlation in National Football League games, but it’s not evenly distributed. HotStreak's data science team has pored over every possible combination of players. The result: many combinations of players have no correlation, but some have a significant amount.
It helps to think of a football game as two different games in one. Team A’s offense vs. Team B’s defense is one game, and Team B’s offense vs. Team A’s defense is the other. The relationships within the same offense or within the same defense are stronger than the relationships between an offense and a defense.
Why does correlation matter?
HotStreak’s ability to measure correlation, using technologies such as artificial intelligence and machine learning, allows us to price all combinations of markets correctly. That means we can offer every combination of player props, and users can stack as many as they like (see the example below of HotStreak vs. competitors). As a result, users get a payout that reflects the underlying probability of their entry. Users can either play a “safer” strategy by placing entries in the same direction as positively correlated markets (e.g., Patrick Mahomes “over passing” and Travis Kelce “over receiving”), or take a “riskier” strategy by placing entries in opposite directions (e.g., Patrick Mahomes “under passing” and Travis Kelce “over receiving”).
Some may believe that HotStreak’s multiples are sometimes lower than other companies’ because of its correlation calculations. What they may not realize is that while HotStreak recognizes positive correlation, which can lower the payout on an entry, HotStreak will always have another entry with a higher payout on the related negative correlation. For example:
Positive Correlation:
- Competitor: QB Pass Yards/WR Receiving Yards: Over/Over: Multiple of 3x
- HotStreak: QB Pass Yards/WR Receiving Yards: Over/Over: Multiple of 2.7x
Negative Correlation:
- Competitor: QB Pass Yards/WR Receiving Yards: Over/Under: Multiple of 3x
- HotStreak: QB Pass Yards/WR Receiving Yards: Over/Under: Multiple of 4.4x
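To see why the two directions should pay differently, here is a minimal sketch of pricing a two-leg entry from a correlation value. The leg probabilities and the correlation of 0.2 are hypothetical inputs chosen for illustration, not HotStreak's figures, and the multiples are "fair" values with no operator margin, so they will not match the numbers listed above. The formula treats each leg as a yes/no outcome and uses the standard relationship between the correlation of two binary events and their joint probability.

```python
from math import sqrt


def joint_probability(p_a: float, p_b: float, rho: float) -> float:
    """P(A and B) for two yes/no outcomes whose correlation is rho."""
    return p_a * p_b + rho * sqrt(p_a * (1 - p_a) * p_b * (1 - p_b))


def fair_multiple(probability: float) -> float:
    """Payout multiple at which the entry breaks even (no margin applied)."""
    return 1.0 / probability


# Hypothetical inputs: each leg is a coin flip on its own, and the
# QB "over" and WR "over" outcomes have a correlation of 0.2.
p_qb_over = 0.5
p_wr_over = 0.5
rho = 0.2

# Over/Over: both legs point in the same direction as the correlation.
p_over_over = joint_probability(p_qb_over, p_wr_over, rho)

# Over/Under: the WR "under" is the complement of the WR "over",
# so its correlation with the QB "over" has the opposite sign.
p_over_under = joint_probability(p_qb_over, 1 - p_wr_over, -rho)

# What an operator would quote if it treated the legs as independent.
naive_multiple = fair_multiple(p_qb_over * p_wr_over)

print(f"Ignoring correlation: {naive_multiple:.2f}x for either entry")
print(f"Over/Over:  {fair_multiple(p_over_over):.2f}x")
print(f"Over/Under: {fair_multiple(p_over_under):.2f}x")
```

Under these assumptions the same-direction entry is more likely than the independent estimate suggests and the opposite-direction entry is less likely, so a single flat multiple overpays one and underpays the other.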
Moreover, even if other companies can offer slightly higher multiples for positive correlation, this isn’t sustainable for a business. Over time, companies that can’t account for correlation will take major losses and will either have to shut down or stop offering these correlated markets in order to slow the losses.
Finally, this technology lets HotStreak manage correlated risk: a single entry can prompt adjustments to the lines and odds of multiple related markets.
Few other operators can do this because they don’t measure correlation. As a result, they take the losses, restrict the combinations of props they cannot price correctly, or reduce payouts below market value. Since HotStreak can calculate correlation, we can offer our users higher multipliers and more combinations of markets, because we know the true price.
In terms of risk management, measuring correlation lets us assess much more accurately where our risks lie, which leaves us far less vulnerable to a black swan event of the kind we’ve seen elsewhere in the industry.
Challenges
While it may seem obvious that certain relationships, like the one between a quarterback and a wide receiver, are correlated, the challenge is pricing. How do you calculate the probability for two players in a parlay based on their correlation? Now add a third prop and the problem becomes exponentially more difficult. Now do that five more times. Even if you can do these calculations, you cannot offer the entry under a fixed payout schema, because the fair payout changes with every combination.
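One common way to approximate the probability of a multi-leg entry is simulation. The sketch below is an illustration under simplifying assumptions, not a description of HotStreak's model: every leg shares a single hypothetical "game factor", so all pairs of legs get the same latent correlation `rho`. The function name, the parameters, and the value 0.3 are all made up for the example. Note that `rho` here is the correlation of the underlying scores, which is not the same number as the correlation between the over/under outcomes themselves.

```python
import random
from math import sqrt
from statistics import NormalDist


def simulate_all_hit(leg_probs, rho, trials=100_000, seed=7):
    """Estimate P(every leg hits) under a one-factor Gaussian model.

    Leg i hits when its latent score falls below a threshold chosen so that
    the leg hits with probability leg_probs[i] on its own. Each score mixes
    a shared game factor with leg-specific noise, giving every pair of
    scores a correlation of rho (0 <= rho < 1).
    """
    rng = random.Random(seed)
    thresholds = [NormalDist().inv_cdf(p) for p in leg_probs]
    shared_weight = sqrt(rho)
    own_weight = sqrt(1.0 - rho)
    hits = 0
    for _ in range(trials):
        game_factor = rng.gauss(0.0, 1.0)
        if all(
            shared_weight * game_factor + own_weight * rng.gauss(0.0, 1.0) < t
            for t in thresholds
        ):
            hits += 1
    return hits / trials


if __name__ == "__main__":
    # Compare the independent estimate with the simulated one as legs are added.
    for legs in (2, 3, 8):
        independent = 0.5 ** legs
        correlated = simulate_all_hit([0.5] * legs, rho=0.3)
        print(f"{legs} legs: independent {independent:.4f}, simulated {correlated:.4f}")
```

A real slate is harder than this sketch suggests: an eight-leg entry has 28 pairwise relationships, each potentially different, and the fair multiple depends on every one of them.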
A number of factors make calculating correlation difficult. First, the NFL season is relatively short in terms of the number of games compared to other leagues such as the NBA or MLB. That means there is less data each year and less opportunity to adjust models.
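The effect of a short season can be illustrated with a standard statistical tool, the Fisher z-transform, which gives an approximate confidence interval for a measured correlation. The sketch below assumes one independent observation per regular-season game (17 for an NFL team, 82 in the NBA, 162 in MLB) and a hypothetical measured correlation of 0.3. Real player data is messier, so treat this as a rough guide to the scale of the problem rather than a precise result.

```python
from math import atanh, sqrt, tanh


def correlation_interval(r, n, z_crit=1.96):
    """Approximate 95% confidence interval for a correlation r measured on n samples."""
    z = atanh(r)                      # Fisher z-transform of the correlation
    half_width = z_crit / sqrt(n - 3)  # standard error of z is 1 / sqrt(n - 3)
    return tanh(z - half_width), tanh(z + half_width)


observed_r = 0.3  # hypothetical measured correlation
for league, games in (("NFL", 17), ("NBA", 82), ("MLB", 162)):
    low, high = correlation_interval(observed_r, games)
    print(f"{league}: {games} games -> true correlation plausibly {low:.2f} to {high:.2f}")
```

Under these assumptions, a correlation of 0.3 measured over a single 17-game season has an interval that still includes zero, while the same value measured over 82 games does not.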
Rookies are another challenge to traders and artificial intelligence alike. A small number of rookies start the season playing significant roles. But without any NFL history, it’s difficult to estimate how those players will play and how their teammates will play with them.
Another challenge is “unconventional” players, who break the mold of their position. For example, quarterback Lamar Jackson is known for rushing much more than other quarterbacks. Christian McCaffrey is a running back with exceptional receiving skills, making him a dual threat in the backfield. George Kittle is a tight end known for extraordinary blocking and receiving skills. These types of players make calculating correlation difficult due to the unique skills that set them apart from other players in the same position.
Correlation occurs across all sports; we’ve previously discussed the NBA and esports. However, each sport has its own nuances. In the NFL, some relationships between players are intuitive, such as passing yards and receiving yards. The NFL may not have as many correlated relationships as some other sports, such as esports, where markets are heavily correlated. But casual users know much more about the relationships that do exist.
Surprises
There are some surprising cases of correlation. For example, a kicker and a quarterback have a relationship. Our data science team has found that extra points and passing touchdowns are strongly correlated. This makes sense since earning an extra point requires a touchdown to happen. Meanwhile, field goals and passing touchdowns are negatively correlated. When a team fails to pass into the end zone, it will often bring out the kicker to get 3 points.
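The direction of both relationships can be reproduced with a toy simulation. Everything in the sketch below is a made-up assumption for illustration: each team gets ten drives, each drive ends in a passing touchdown, a field goal, or nothing with fixed probabilities, and rushing touchdowns are ignored. It is not fitted to NFL data and does not represent HotStreak's model; it only shows why the signs come out the way the paragraph above describes.

```python
import random
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def simulate_games(games=5000, drives=10, p_pass_td=0.25,
                   p_field_goal=0.20, p_extra_point=0.94, seed=11):
    """Simulate per-game passing TDs, extra points, and field goals (toy model)."""
    rng = random.Random(seed)
    pass_tds, extra_points, field_goals = [], [], []
    for _ in range(games):
        td = xp = fg = 0
        for _ in range(drives):
            roll = rng.random()
            if roll < p_pass_td:
                td += 1
                # An extra point can only be attempted after a touchdown.
                if rng.random() < p_extra_point:
                    xp += 1
            elif roll < p_pass_td + p_field_goal:
                # A drive that ends in a field goal did not end in a touchdown.
                fg += 1
        pass_tds.append(td)
        extra_points.append(xp)
        field_goals.append(fg)
    return pass_tds, extra_points, field_goals


pass_tds, extra_points, field_goals = simulate_games()
print(f"passing TDs vs extra points: {pearson(pass_tds, extra_points):+.2f}")
print(f"passing TDs vs field goals:  {pearson(pass_tds, field_goals):+.2f}")
```

Because every drive has only one outcome, a touchdown and a field goal compete for the same drives, which is what produces the negative relationship in this model.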
So Total Kicking Points is an interesting market with respect to the quarterback. An average quarterback may actually lower Total Kicking Points since the kicker’s points will mostly come from extra points. But a bad quarterback may actually boost a kicker’s total points since the kicker’s primary source of points will be field goals. Meanwhile, a great quarterback could do either of these, or both. So if you can build a better narrative around a quarterback's performance, you can also make a better estimate about how the kicker will do.
This is just one example of the nuances involved in addressing correlation in the NFL. It's a challenging problem that requires a range of technology, including AI/ML, to solve. But it's one that HotStreak is excited to keep working on.