The Uncanny Accuracy of Polling Averages*, Part II: What the Numbers Say

This is the second article in a three-part series in which I’m addressing a question from a reader whom I’m calling Skeptical Sam.

Sam wondered why our election forecasts seem so confident. For instance, in the Senate race in Pennsylvania, the Democrat, Joe Sestak, trails the Republican Pat Toomey by around 7 points in most polls. Sure, Mr. Toomey is the favorite, Skeptical Sam concedes. But is he really a 92 percent favorite, which is where we have him? It’s early October, and the race is fairly close: shouldn’t we be hedging a little bit more?

I agree that some of our forecasts seem assertive. Expert forecasters, including The New York Times’s political desk, describe the Senate race in Pennsylvania either as still a tossup or as merely “leaning” toward Mr. Toomey.

But in Part I, I explained why our intuitions can mislead us when it comes to something like evaluating Mr. Sestak’s chances. The data from recent elections, by contrast, suggest that even relatively small leads in the polls can be difficult to overcome.

I have a database containing almost all polls conducted in all Senate and governors’ races since 1998. I say almost all because it excludes internal polls released by campaigns and other explicitly partisan groups, and it excludes Internet polls conducted by Zogby Interactive, which in my view are not scientific. My database also does not include polls for irregularly scheduled special elections, like the one in Massachusetts this year — only contests in November.

Mr. Toomey’s lead is around 7 points in the polls. How have Senate and governor candidates with a 7-point lead in the polling average — with about a month to go in the campaign — fared in the past? Let’s construct about the simplest possible study around this:

Step 1. Take all polls conducted 30 to 60 days from the election.

Step 2. Average them together.

That’s it. We’re not doing any of the fancy stuff that we do in our actual Senate model, like weighting the polls based on sample size or the quality of the pollster. We’re just taking a simple average.

There is one “trick,” though: we’re only looking at races in which at least two different polling firms published a survey in the 30-to-60-day window. If you have just one company polling a race, you don’t really have much of an average, properly speaking. Our actual forecasting model handles such cases differently: rather than excluding them, it assumes much greater uncertainty where the polling data is sparse.
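For readers who want to see the mechanics, here is a minimal sketch of the two steps plus the two-firm filter. It is an illustration, not the code behind our database: the function name, the dictionary keys (race, pollster, end_date, margin) and the sample polls are all hypothetical, and I am assuming each poll is dated by its final day in the field.

```python
from collections import defaultdict
from datetime import date
from statistics import mean


def simple_polling_averages(polls, election_day, min_firms=2):
    """Average each race's polls taken 30 to 60 days before the election.

    `polls` is an iterable of dicts with hypothetical keys:
    race, pollster, end_date (a date) and margin (leader's lead in points).
    Races polled by fewer than `min_firms` distinct firms are dropped.
    """
    by_race = defaultdict(list)
    for poll in polls:
        days_out = (election_day - poll["end_date"]).days
        if 30 <= days_out <= 60:  # Step 1: keep only polls in the window
            by_race[poll["race"]].append(poll)

    averages = {}
    for race, race_polls in by_race.items():
        firms = {poll["pollster"] for poll in race_polls}
        if len(firms) >= min_firms:  # the "trick": at least two firms
            # Step 2: unweighted mean of the margins
            averages[race] = mean(poll["margin"] for poll in race_polls)
    return averages


# Made-up polls, for illustration only.
election_day = date(2010, 11, 2)
sample_polls = [
    {"race": "A", "pollster": "Firm X", "end_date": date(2010, 9, 20), "margin": 6.0},
    {"race": "A", "pollster": "Firm Y", "end_date": date(2010, 9, 28), "margin": 8.0},
    {"race": "A", "pollster": "Firm X", "end_date": date(2010, 10, 20), "margin": 2.0},
    {"race": "B", "pollster": "Firm X", "end_date": date(2010, 9, 15), "margin": 9.0},
    {"race": "B", "pollster": "Firm X", "end_date": date(2010, 9, 25), "margin": 5.0},
]
result = simple_polling_averages(sample_polls, election_day)
print(result)
```

In this made-up example, race A keeps its two in-window polls from different firms and averages 7 points. Its third poll falls outside the window, and race B is dropped because only one firm surveyed it.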

We can then see whether the candidate with a lead in the polling average ultimately won his election; those results are below.

Senate candidates who have a lead of 6 to 9 points in the simple polling average, with 30 days to go until the election — about where Mr. Toomey’s lead stands now — are undefeated since 1998. This isn’t quite as impressive as it sounds, since there are only seven such candidates in the database. But if we expand the scope of our study just a bit, it proves to be the norm rather than the exception. Senate candidates with a slightly larger lead in the polling average — 9 to 12 points — are also undefeated. Candidates with a slightly smaller lead in the polling average — 3 to 6 points — have a pretty good track record, with nine wins against three defeats.

Indeed, no Senate candidate with a lead of more than 5.5 points in the polling average, with 30 days to go in the race, has lost since 1998: these candidates are 68-0. (Martha Coakley in Massachusetts would have been an exception, but special elections, where the polling can be much more erratic because of lower turnout, are outside the scope of our study.)

Candidates for governor with a lead of 6 to 9 points in the polling average, meanwhile, have a 9-to-2 record. If we combine their numbers with those of the Senate candidates, we find that candidates with a lead comparable to Mr. Toomey’s (6 to 9 points, with 30 days to go) have 16 wins against two defeats, which corresponds to an 89 percent winning percentage.
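The arithmetic behind that combined figure is easy to check. The sketch below uses only the win-loss records quoted above; the variable names are mine and purely illustrative.

```python
# Records for candidates leading by 6 to 9 points with 30 days to go,
# as quoted in the text: (wins, losses).
senate_record = (7, 0)
governor_record = (9, 2)

wins = senate_record[0] + governor_record[0]
losses = senate_record[1] + governor_record[1]
win_rate = wins / (wins + losses)

print(f"{wins} wins, {losses} defeats: {win_rate:.0%}")
```

With only 18 races in the bucket, a single additional upset would move the percentage by several points, which is one reason to read the figure as a rough guide rather than a precise probability.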

Our actual Senate and governor models use a somewhat different (and more complex) process to calculate the probability of a candidate winning his race. But they are derived from the same data, and they usually come up with similar numbers.

Mr. Toomey, for instance, is regarded as a 92 percent favorite by our model, which corresponds quite nicely to the 89 percent winning percentage that I described above. His winning percentage is a tiny bit higher than it might be for another candidate with a similar lead in the polls because of some of the other factors we account for in our model. For instance, there is an especially large number of polls in Pennsylvania, and they are all quite consistent with one another, which suggests that his lead is slightly more robust than usual. In other cases, such as when the polling is sparse or inconsistent or when an unusually large number of undecided voters remain in the race, the model will increase the uncertainty it attaches to a forecast.

The above data should be used cautiously: it does not apply, for instance, to House races, which are considerably harder to poll, and it does not apply to primaries, which are much harder to poll. It applies to the polling average, and not to individual polls.

But the bottom line is that, over the course of the past half-dozen election cycles, constructing a simple polling average has provided a reliable indication of which candidates will win the general election in Senate and governors’ races. Polling deficits in the high single digits, with 30 days to go in the campaign, have only rarely been overcome. Candidates who trail by double digits in the polling average have almost never defied the odds. Even relatively small leads of 3 to 4 points can be surprisingly meaningful.

That is why I don’t consider Pennsylvania a “tossup.” Now, some may still argue that the Pennsylvania race is particularly unusual: that even though a lead of 6 or 7 points in the polls is ordinarily quite solid, there are special circumstances in this race. Others might argue that the polls in all races are much less reliable than they have been in the recent past.

The first argument was addressed to some extent in yesterday’s article: it’s tempting to think of each Senate race as its own little unique snowflake. But the polling has provided a reliable guide in the vast majority of races. It is not enough for a race to be unique: it has to be unique in a way that renders the polling much less accurate than it ordinarily would be. If you think you’ve encountered such a case, you should be prepared to make a strenuous argument for it.

The second — that polling as a whole is deteriorating — is the one I have more sympathy toward, and also the one that would be the most consequential for our forecasts. We’ll address some of these concerns in Part III of the series.

FiveThirtyEight
