In 2007, the sports media couldn’t stop talking about “Spygate,” in which the New England Patriots were caught illegally taping sideline defensive signals from New York Jets coaches during the teams’ opening week matchup. It soon became known that Patriots coach Bill Belichick had been engaging in such activity since 2000, and although the league had only expressly forbidden the practice in a September 2006 memo, the perception still lingers that New England gained an unfair advantage during the first seven seasons of Belichick’s tenure with the team. That period happened to include three Super Bowl championships and five division titles.
Such sentiment bubbled to the surface again this week, when Philadelphia Eagles cornerback Cary Williams ripped the Patriots as “cheaters” ahead of the two teams’ mid-August joint practice.
Williams’ comments are nothing new. In the years since Spygate, many players and coaches have alluded to the incident as a way of questioning the legitimacy of the Patriots’ championships. But what does the evidence say about the actual effect of the Patriots’ taping?
We can count the rings: three Super Bowl wins while taping, zero after. But given that the best team in the NFL wins the Super Bowl only about 24 percent of the time, it’s possible such a split could occur due to chance alone.
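To put a rough number on that, here’s a back-of-the-envelope sketch in Python. It assumes seven seasons in each era and treats every season as an independent trial in which the league’s best team wins the title with 24 percent probability; both simplifications are mine, not the article’s data.

```python
from math import comb

P_TITLE = 0.24  # article's estimate: the NFL's best team wins ~24% of the time
SEASONS = 7     # assumption: seven seasons in each era

def binom_pmf(k: int, n: int, p: float) -> float:
    """Probability of exactly k successes in n independent trials."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

# Chance a perennial 24% favorite wins three or more titles in an era (pre-Spygate)
p_three_plus = sum(binom_pmf(k, SEASONS, P_TITLE) for k in range(3, SEASONS + 1))

# Chance the same favorite wins zero titles in an era (post-Spygate)
p_zero = binom_pmf(0, SEASONS, P_TITLE)

print(f"P(3+ titles): {p_three_plus:.3f}")  # roughly 0.22
print(f"P(0 titles):  {p_zero:.3f}")        # roughly 0.15
```

Under those assumptions, neither half of the ring count is particularly surprising on its own, which is the sense in which the three-then-zero split could plausibly be chance.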
A more legitimate way to examine the question is to compare the Patriots’ total record under Belichick before Spygate broke (including the offending Sept. 9, 2007, game against the Jets) to their record after. While the Patriots were taping opposing signals, they won 69.3 percent of their 127 games (including the playoffs), but since they ostensibly quit the practice, their winning percentage has been even higher — 75.6 percent — in 123 games. At a glance, it seems rule-bending didn’t add much of note to the Patriots’ chances of winning.
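As a quick check on that comparison, a two-proportion z-test can ask whether the gap between the two winning percentages is even statistically meaningful. The win totals below (88 of 127, 93 of 123) are back-calculated from the percentages above; the test itself is my addition.

```python
from math import erfc, sqrt

# Win totals back-calculated from the article's percentages:
# 69.3% of 127 games while taping, 75.6% of 123 games after.
wins_pre, games_pre = 88, 127
wins_post, games_post = 93, 123

p_pre = wins_pre / games_pre
p_post = wins_post / games_post
p_pool = (wins_pre + wins_post) / (games_pre + games_post)

# Two-proportion z-test for the difference in winning percentage
se = sqrt(p_pool * (1 - p_pool) * (1 / games_pre + 1 / games_post))
z = (p_post - p_pre) / se
p_value = erfc(abs(z) / sqrt(2))  # two-sided p-value under a normal approximation

print(f"z = {z:.2f}, p = {p_value:.2f}")
```

Under these assumptions the difference between eras is nowhere near statistical significance, which squares with the read that rule-bending added little to the team-wide bottom line.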
Of course, that’s also a simplistic approach; it just uses binary team-wide outcomes and ignores complicating factors like differences in talent between the two eras of New England football. A more rigorous study would track the Patriots’ offensive performance only (aside from whispers about additional tapes of offensive signals, the Spygate scandal focused mainly on the Patriots’ theft of defensive play calls) and control for the team’s fluctuating ability level.
To that end, I gathered data on the Las Vegas point spread and over/under point total for each New England game back to 2000. Because the taping was of defensive signals, I focused on the Patriots’ points scored relative to what Vegas predicted (we can compute “predicted points” in any game by subtracting the spread from the over/under and dividing the result by two). And because whatever advantage the tapes yielded could only be gleaned upon postgame review, for use when the Patriots faced that opponent again, I limited my sample to regular season and playoff games in which New England was playing an opponent it had already faced earlier in the season.
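The predicted-points arithmetic is easy to sketch. The line below is hypothetical, not from an actual Patriots game:

```python
def predicted_points(spread: float, total: float) -> tuple[float, float]:
    """Implied scores from a Vegas line.

    `spread` is from the team's perspective (negative means favored);
    `total` is the over/under. Subtracting the spread from the total and
    halving gives the team's implied points; adding gives the opponent's.
    """
    team = (total - spread) / 2
    opponent = (total + spread) / 2
    return team, opponent

# Hypothetical line: a team favored by 7 with an over/under of 45
team_pts, opp_pts = predicted_points(spread=-7.0, total=45.0)
print(team_pts, opp_pts)  # 26.0 19.0 -- implied margin 7, implied total 45
```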
Those filters produced 61 games — 31 while taping, and 30 after the practice ceased:
Relative to Vegas’s expectations, the Patriots scored 2.4 more points per game than they “should” have during the pre-Spygate era. That might lend credence to the idea that taping defensive signals gave them an advantage, if it weren’t for the fact that they outscored Vegas’s expectations by exactly 2.4 points per game in the post-Spygate era as well. That suggests New England’s offensive overachievement owed more to great coaching and quarterback play, which persisted across both eras, than to any illicit edge.
But before we close the book on Spygate, there is the not-so-small matter of the playoffs. In the postseason, New England’s pre-Spygate record was 12-2; after, it’s fallen to 6-6 (that’s a difference just on the edge of what could occur due to random variation in a small sample). Things get more complicated if we look at New England’s scoring relative to Vegas in the postseason. In eight playoff games against repeat opponents before Spygate, the Patriots exceeded offensive expectations by 4.0 points per game, beating the market forecast five times. In nine tries since, they’ve fallen short of expectations by an average of 6.6 points per game, failing to meet the forecast seven times.
Using conventional testing techniques, this difference is, again, right on the edge of statistical significance. With a two-tailed t-test (which allows for a chance difference in either a positive or a negative direction), the probability of observing such a split randomly is slightly greater than 5 percent (p=0.061). But under a one-tailed t-test (which considers only the possibility of a change in one direction), the probability falls below the conventional 5 percent significance threshold (p=0.030).
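To make the one-tailed/two-tailed distinction concrete, here’s a sketch of a pooled two-sample t-test using the article’s means and sample sizes. The per-game standard deviations are hypothetical stand-ins (the article doesn’t report them), so the p-values are illustrative rather than a reproduction of the figures above.

```python
from math import sqrt
from scipy.stats import t as t_dist

# Means and sample sizes from the article; the standard deviation is a
# hypothetical stand-in (not reported in the article).
mean_pre, n_pre = 4.0, 8      # pre-Spygate: +4.0 ppg vs. Vegas in 8 playoff games
mean_post, n_post = -6.6, 9   # post-Spygate: -6.6 ppg vs. Vegas in 9 playoff games
sd = 11.0                     # assumed per-game scatter, purely illustrative

# Pooled two-sample t statistic
df = n_pre + n_post - 2
pooled_var = ((n_pre - 1) * sd**2 + (n_post - 1) * sd**2) / df
t_stat = (mean_pre - mean_post) / sqrt(pooled_var * (1 / n_pre + 1 / n_post))

p_one_tailed = t_dist.sf(t_stat, df)  # H1: the pre-Spygate mean is higher
p_two_tailed = 2 * p_one_tailed       # the t distribution is symmetric

print(f"t = {t_stat:.2f}")
print(f"one-tailed p = {p_one_tailed:.3f}, two-tailed p = {p_two_tailed:.3f}")
```

The key relationship is structural: for a symmetric test statistic, the one-tailed p-value is exactly half the two-tailed one, which is how a split can clear the 5 percent bar under one framing and miss it under the other.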
Because we would only expect taping to improve the Patriots’ performance, a one-tailed t-test is probably the appropriate choice, which, in turn, suggests there’s something real to the Patriots’ before-and-after Spygate split in the playoffs. But there’s one more consideration: the Wyatt Earp Effect, which we’ve covered several times at FiveThirtyEight. In short, it’s a phenomenon that causes conventional significance testing to understate the probability of an event occurring due to chance. Because we pre-selected the Patriots as our test subject on the basis of Spygate — and then singled out their playoff games from that sample — our p-values may not be answering the right question, which is: what are the odds that any NFL team would show a similar split in offensive performance?
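The selection problem can be illustrated with a quick Monte Carlo sketch: if all 32 teams’ scoring relative to Vegas were pure noise, how often would at least one of them show a pre/post gap as large as New England’s? Every number below (the per-game scatter, the per-team sample sizes, the 10.6-point gap) is either assumed or back-calculated from the article; this is an illustration of the Wyatt Earp logic, not the article’s actual analysis.

```python
import random

random.seed(0)

N_SIMS = 5_000
N_TEAMS = 32
N_PRE, N_POST = 8, 9  # playoff games vs. repeat opponents in each era
SD = 11.0             # assumed per-game scatter vs. the Vegas line
GAP = 10.6            # the Patriots' observed swing: 4.0 - (-6.6)

leagues_with_a_patriot = 0
for _ in range(N_SIMS):
    for _team in range(N_TEAMS):
        pre = sum(random.gauss(0, SD) for _ in range(N_PRE)) / N_PRE
        post = sum(random.gauss(0, SD) for _ in range(N_POST)) / N_POST
        if pre - post >= GAP:
            leagues_with_a_patriot += 1
            break  # one such team per simulated league is enough

share = leagues_with_a_patriot / N_SIMS
print(f"Share of noise-only leagues with a 'Patriots-like' split: {share:.2f}")
```

Under these assumptions, a gap that looks rare for one pre-chosen team turns up somewhere in the league in roughly half of the simulated runs, which is the sense in which the naive p-value overstates the evidence.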
As is usually the case when dealing with real-world data, the answer isn’t totally conclusive. From a holistic viewpoint, using all repeat-opponent games in the regular season and playoffs, there hasn’t been any difference — significant or otherwise — in the Patriots’ offensive performance since the league mandated the team stop taping opposing play calls. Looking at the playoffs alone yields a more nebulous picture but also introduces methodological questions about the aptness of conventional significance testing. And we must always keep in mind that splits happen if we look for them hard enough.
All of this means that, even almost a decade later, the controversy over Spygate isn’t likely to go away anytime soon.