Before Thursday night’s game between the Cincinnati Bengals and Cleveland Browns, our Elo ratings predicted a 77 percent probability of a Bengals win. Cincinnati had the higher Elo rating before the game, 1568 to 1420, and it was playing at home, both of which fed into a pregame point spread of 8.5 points in the Bengals’ favor.

Instead, the Browns dominated, winning 24-3 (and forcing Bengals quarterback Andy Dalton into one of the worst passing games ever). How unexpected is it for a team favored by 8.5 points to lose by 21? And what does it say about the confidence we should have in these kinds of point-margin predictions, whether generated by Elo or otherwise?

Nearly three decades ago, statistician Hal Stern found that, for NFL games, “the margin of victory over the pointspread (number of points scored by the favorite minus the number of points scored by the underdog minus the pointspread) is not significantly different from the normal distribution” with a mean of zero and a standard deviation of 13.86 points. In other words, the likelihood of the actual margin in any given game can be described by a bell-shaped probability distribution centered on the pregame spread (in Stern’s case, the Vegas line).

Stern’s original research had only tested results from the 1981, 1983 and 1984 NFL seasons. But using Vegas spread data from 1978 to 2012, I replicated Stern’s work and confirmed his findings — the final margin of victory in an NFL game can be approximated by a normal random variable with a mean of the Vegas line and a standard deviation somewhere between 13 and 14 (for the entire 35-season sample, that standard deviation was 13.45).
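Stern’s model translates directly into win probabilities: if the final margin (favorite minus underdog) is roughly normal with mean equal to the spread and a standard deviation of about 13.45 points, then the favorite’s chance of winning is just the normal CDF evaluated at the spread over that standard deviation. A minimal sketch (the function names here are mine, not from any of the sources above):

```python
import math

# Stern-style model: final margin (favorite minus underdog) is approximately
# normal with mean equal to the point spread and sd ~13.45 (the 1978-2012 figure).
SIGMA = 13.45

def normal_cdf(z):
    """Standard normal CDF, computed via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def favorite_win_prob(spread):
    """P(margin > 0) when margin ~ Normal(spread, SIGMA)."""
    return normal_cdf(spread / SIGMA)

# Under this model, a 7-point favorite wins roughly 70 percent of the time,
# and a pick'em game is a coin flip.
p7 = favorite_win_prob(7)
p0 = favorite_win_prob(0)
```

One nice property of this formulation: the spread alone pins down the whole distribution of outcomes, so the same two lines of math answer both "who wins?" and "by how much?".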

We can also see this effect if we plot a histogram of the prediction errors between the actual scoring margins of games and those predicted by the pregame Elo ratings:

In the case of Elo, the normal distribution predicting a game’s final margin of victory is centered on the difference between the two teams’ Elo ratings divided by 25 (plus 2.6 points if the team is at home, minus 2.6 points if it’s on the road), with a standard deviation of 13.65 points. That means the likelihood of last night’s score was about 1.5 percent, based on the pregame ratings and the location of the game.
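To make that 1.5 percent concrete, here is a rough sketch of the arithmetic for Thursday night’s game, using the Elo-to-spread conversion just described (helper names are mine):

```python
import math

SIGMA = 13.65  # standard deviation of the Elo margin model

def normal_cdf(z):
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def elo_spread(elo_home, elo_away, home_field=2.6):
    """Expected home-team margin: Elo difference divided by 25, plus home field."""
    return (elo_home - elo_away) / 25.0 + home_field

# Bengals (home, Elo 1568) vs. Browns (Elo 1420)
spread = elo_spread(1568, 1420)          # comes out to about 8.5 points

# The Bengals lost 24-3, a margin of -21 from their perspective.
# Probability of losing by 21 or more under Normal(spread, SIGMA):
p_upset = normal_cdf((-21 - spread) / SIGMA)
```

Running this gives a spread of roughly 8.5 and an upset probability of about 1.5 percent, matching the figures in the text.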

But Thursday night’s outcome wasn’t the most unexpected of the season thus far. When the Atlanta Falcons faced the Tampa Bay Buccaneers on Sept. 18, they were favored by 5 points; there was only a 0.3 percent chance they would rout the Bucs by 42 points, which is what ended up happening. Likewise, there was only a 0.3 percent probability that the Miami Dolphins would win by 37 as one-point underdogs against the San Diego Chargers last week.
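Those 0.3 percent figures come from the same tail calculation, just from the other side of the distribution. A sketch, assuming the 13.65-point Elo standard deviation applies to both games (the function name is mine):

```python
import math

SIGMA = 13.65  # assumed: the Elo model's standard deviation

def normal_cdf(z):
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def blowout_prob(spread, actual_margin):
    """P(team's margin >= actual_margin) when its expected margin is `spread`."""
    return 1.0 - normal_cdf((actual_margin - spread) / SIGMA)

# Falcons favored by 5, won by 42
p_atl = blowout_prob(5, 42)
# Dolphins were 1-point underdogs (expected margin -1), won by 37
p_mia = blowout_prob(-1, 37)
```

Both probabilities land a little above a quarter of a percent, i.e., they round to the 0.3 percent cited above.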

Here are the most unexpected results of the 2014 season to date:

It bears mentioning that the normal distribution model is an approximation, and approximations can break down at the extremes. If we sort all NFL games since 1978 into buckets according to the model’s predicted probability of exceeding the actual point margin, and then compare each bucket’s share of games to its expected frequency, we can see where the model is over- or underestimating the likelihood of a given outcome:

If the model is properly calibrated, each bucket should contain exactly 5 percent of all games. And that’s basically the case, but there are small deviations. For example, more games than expected fall into the top and bottom 5 percent buckets, fewer than expected into the 10-to-25 percent and 75-to-90 percent zones, and more than expected into the buckets between 40 and 60 percent.
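The logic of the calibration check is worth spelling out: if margins really follow Normal(spread, σ), then the model’s percentile for each actual outcome should be uniformly distributed, so every 5 percent bucket collects about 5 percent of games. Here is a sketch using simulated games in place of the real 1978-onward data (the data, seed, and spread range are illustrative assumptions):

```python
import math
import random

SIGMA = 13.45  # Stern-style sd for the Vegas-line sample

def normal_cdf(z):
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

random.seed(0)
n = 200_000
counts = [0] * 20  # twenty 5-percent buckets

for _ in range(n):
    spread = random.uniform(-10.0, 10.0)        # hypothetical pregame spread
    margin = random.gauss(spread, SIGMA)        # simulated final margin
    # The model's percentile for this outcome; uniform if the model is right.
    p = normal_cdf((margin - spread) / SIGMA)
    counts[min(int(p * 20), 19)] += 1

shares = [c / n for c in counts]  # each should sit near 0.05
```

Because the simulated margins are drawn from the model itself, the buckets here come out flat by construction; running the same tally on real games is what exposes the small deviations described above.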