Quarterbacks ruled the 2019 NFL season, with Patrick Mahomes bringing the Lombardi Trophy to Kansas City and Lamar Jackson emerging as the league MVP. Quarterbacks were in control of the FiveThirtyEight prediction model, too, as a key factor in the new version of our Elo rating system, which adjusted for the performance of every starting QB. Now that the season is over, in the spirit of checking our work, we wanted to look back at the 2019 season and see how well the new system did — and whether it improved on our old, simple Elo system from years past.
One simple way to judge prediction accuracy is to look at how close the predicted point spread came to the actual score differential of each game (squaring the errors to give a larger penalty to bad misses). And in that department, new Elo beat old Elo this season, albeit by a smaller margin than we might have expected based on the preceding five seasons.
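To make the spread-based measure concrete, here is a minimal sketch of that calculation. The spreads and final margins below are invented for illustration; they are not actual model output:

```python
# Mean squared error between predicted point spreads and actual margins.
# Squaring the errors gives a larger penalty to bad misses, as described above.

def spread_mse(predicted_spreads, actual_margins):
    """Average squared difference between predicted and actual margins."""
    errors = [(p - a) ** 2 for p, a in zip(predicted_spreads, actual_margins)]
    return sum(errors) / len(errors)

# Three hypothetical games: the model favored the home team by 3, 7 and -2 points,
# and the home team actually won (or lost) by 10, 3 and -4.
predicted = [3.0, 7.0, -2.0]
actual = [10.0, 3.0, -4.0]

print(spread_mse(predicted, actual))  # → (49 + 16 + 4) / 3 = 23.0
```

Comparing this number between the two systems over the same slate of games is how one version of Elo "beats" the other on point spreads.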
But our preferred way to judge the accuracy of a forecast is using Brier scores, which are essentially the average squared error between a probabilistic forecast and what actually happened.1 (Lower Brier scores are better because they mean your prediction was closer to being correct.) And by that standard, our new Elo ratings basically performed as expected. It was a bit of an unpredictable NFL season according to either system, particularly during the playoffs, but the improvement in Brier score from the old version of Elo (0.224) to the new Elo (0.219) by the end of the 2019 season ended up being almost exactly what it had been when it was backtested over the previous five seasons, on average:
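The Brier score itself is simple to compute: for each game, take the forecast probability assigned to one team, score the outcome as 1 if that team won and 0 if it lost, square the difference, and average over all games. A short sketch, with made-up probabilities rather than actual Elo forecasts:

```python
# Brier score: mean squared error between forecast probabilities and outcomes.
# The probabilities below are invented for illustration, not real Elo forecasts.

def brier_score(probs, outcomes):
    """probs: forecast win probability for one team per game;
    outcomes: 1 if that team won, 0 if it lost."""
    return sum((p - o) ** 2 for p, o in zip(probs, outcomes)) / len(probs)

# A confident correct pick, a toss-up, and a confident miss.
probs = [0.80, 0.50, 0.75]
outcomes = [1, 1, 0]

print(brier_score(probs, outcomes))  # → (0.04 + 0.25 + 0.5625) / 3 ≈ 0.284
```

A forecast that always said 50-50 would score 0.250 on every game, which is why season-long averages like 0.224 and 0.219 represent a real, if modest, edge over ignorance.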
Using Brier scores, let’s look at how the model’s accuracy evolved over time. Very early in the season, new Elo had an edge, perhaps because it was accounting for the many quarterback injuries that beset teams during the first few weeks. Then things in the league got weird, and the old system — which didn’t adjust for QBs, travel distance or rest days — actually handled the weirdness better for most of the first half of the year. The new model didn’t pull ahead for good in terms of season-long Brier score until Week 11, after which it maintained and even expanded its lead as injuries mounted and teams rested starters in the closing weeks of the schedule.
The playoffs were a bit rough for the new model, primarily because of two games: Seattle at Philadelphia in the wild-card round (where new Elo’s Brier was 0.480, compared with 0.380 for the old model) and Tennessee at Baltimore in the divisional round (new Elo’s Brier was 0.755 — really bad! — compared with 0.582 for the old system). Our backtesting suggested that there are real predictive effects to late-season QB hot and cold streaks, and that favorites tend to play better in the postseason, but both of those factors ended up haunting the new model in that pair of upsets. Overall in the playoffs, new Elo had a worse Brier score (0.272) than the old model did (0.261) — although, as we mentioned earlier, that didn’t really cause it to do worse than expected for the entire season overall. And, of course, it also helped that the new system did much better in the conference championships and the Super Bowl.
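Single-game Brier scores like these can be translated back into what each model thought before kickoff. If the eventual winner was given probability p, that game's Brier is (1 − p)², so p = 1 − √Brier. A quick sketch using the Tennessee-at-Baltimore figures quoted above:

```python
import math

# For a single game scored 1 for the winning team, a Brier score of b
# implies the forecast gave the eventual winner a probability of 1 - sqrt(b).
def implied_winner_prob(single_game_brier):
    return 1 - math.sqrt(single_game_brier)

# Tennessee's upset at Baltimore: new Elo's Brier was 0.755, old Elo's 0.582.
print(round(implied_winner_prob(0.755), 3))  # new Elo gave the Titans ~0.131
print(round(implied_winner_prob(0.582), 3))  # old Elo gave them ~0.237
```

In other words, new Elo was considerably more confident in the Ravens than old Elo was, which is exactly why the upset stung it more.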
Finally, just for fun, let’s look at the games in which the new model had its best and worst picks of the season, relative to the old system:
[Tables: New Elo’s biggest hits and misses relative to old Elo, listing each game’s winner, loser and the winner’s pregame win probability under each version of Elo]
Unsurprisingly, most of these examples revolved around backup quarterbacks, for good or bad — either because the regular starter was knocked out (which old Elo didn’t know about) or because he was returning after a long absence. Sometimes adjusting for this resulted in an overcorrection, such as when Pittsburgh was down to third-string QB Devlin Hodges in Week 6 yet somehow managed to still win. But more often it helped, such as when Mahomes went down and Kansas City lost with Matt Moore at the helm in Week 8.
So overall, we think new Elo had a solid rookie season, and the changes helped the model’s predictions. Although there are a few areas to potentially investigate for improvement over the offseason, it was encouraging that the new system outperformed the old system by almost precisely the margin we expected based on our backtesting. It was also a good sign that the model consistently outpredicted the average reader in our forecast game, “winning” all but two weeks of the season and continuing the old system’s pattern of dominance over the field from previous seasons:
[Table: Week-by-week results of the model against the average reader, listing the number of games and the model’s average net points for each week]
Speaking of which, congrats to Jordan Sweeney, who led all readers in the postseason with 275 points, and to Griffin Colaizzi, who used the Super Bowl to pull ahead and win the full-season contest with 1,126.2 points. And a big thanks to everyone who played all season! We can’t wait to fire up the model again in about six months and try to get that Brier score even lower next year.