A Warning on the Accuracy of Primary Polls

After another wild polling ride in Michigan, it is time for a reflection on just how accurate primary and caucus polls have been — both in an absolute sense and as compared with past years. This discussion, of course, also has implications for the FiveThirtyEight forecast model, which is based upon the polls.

The short version: the polls have been reasonably good in the last few days before the election. Not perfect by any means — worse than general election polling typically is, for example. But no worse, and probably somewhat better, than in past primaries.

In densely polled states — that term, importantly, would disqualify Colorado — there haven’t been any huge surprises on Election Day itself. If you think it counts as a surprise that Mitt Romney won Michigan by three points when polls showed a rough tie, or that Rick Santorum narrowly won Iowa when he was a couple of points back, you don’t have a realistic conception of how reliable primary and caucus polling is.

On the other hand, the polls have been pretty awful at most points prior to about three days before the election, seeing surges and momentum shifts that often dissipated.

The chart below tracks the error in the polls and compares it to the number of days in advance of the election that they were conducted. The error is measured by looking at how much the polls missed the final margin between the top two candidates. For example, if Newt Gingrich beat Mitt Romney by 12 points in South Carolina, and a poll called for Mr. Gingrich to win by 5 points instead, that would count as a 7-point error. And if the poll had forecast Mr. Romney to win the state by 5 points instead, it would represent a 17-point error.

Only the candidates who actually finished in the top two are considered. If an Iowa poll had Mr. Romney in first, Ron Paul in second and Rick Santorum in third, this method looks only at the difference it showed between Mr. Romney and Mr. Santorum, ignoring the value it had for Mr. Paul. (This is the same technique that I use to calculate my pollster ratings.)
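The error measure described above can be sketched in a few lines of Python. This is only an illustration of the arithmetic, not the code behind the chart or the pollster ratings; the function names and all of the vote shares below are made up for the example.

```python
def top_two(results):
    """Return the names of the actual first- and second-place finishers."""
    ranked = sorted(results, key=results.get, reverse=True)
    return ranked[0], ranked[1]


def margin_error(poll, results):
    """Absolute miss on the margin between the two actual top finishers.

    Both arguments map candidate name -> percent of the vote.
    Candidates outside the actual top two are ignored, even if the
    poll had one of them in first or second place.
    """
    winner, runner_up = top_two(results)
    actual_margin = results[winner] - results[runner_up]
    polled_margin = poll.get(winner, 0) - poll.get(runner_up, 0)
    return abs(actual_margin - polled_margin)


# Hypothetical South Carolina numbers: Gingrich wins by 12.
sc_result = {"Gingrich": 40, "Romney": 28, "Santorum": 17}
poll_a = {"Gingrich": 37, "Romney": 32, "Santorum": 16}  # Gingrich +5
poll_b = {"Gingrich": 31, "Romney": 36, "Santorum": 16}  # Romney +5

print(margin_error(poll_a, sc_result))  # 7
print(margin_error(poll_b, sc_result))  # 17

# Hypothetical Iowa numbers: the poll has Paul second, but only the
# Santorum-Romney margin counts because they finished in the top two.
iowa_result = {"Santorum": 25, "Romney": 24, "Paul": 21}
iowa_poll = {"Romney": 24, "Paul": 22, "Santorum": 18}
print(margin_error(iowa_poll, iowa_result))  # 7
```

Note that the sign of the polled margin matters: a poll that picks the wrong winner is penalized for the full distance between its margin and the actual one.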

On average, a poll conducted on the day just before the election has missed the final margin between the candidates by about 4 percentage points. That is reasonably good; the comparable statistic for state polls in presidential general elections is something like 2 or 3 points, and primaries and caucuses are much more challenging to poll.

However, the errors have increased significantly the further you go out. Polls conducted just three days before the primary have missed by an average of about 7 points, and those conducted a week out have missed by about 10.

And the whole period from about one week to two weeks before the primary has been a disaster, with an average miss of about 12 points. That’s just the average, not even the worst of it; quite a few polls, especially in Florida and South Carolina, missed by 20 or more points.

Oddly, things get a bit better when you go further back than that. Polls conducted a month before the primary have missed by an average of about 9 points, somewhat better than those conducted only a week or so in advance.

This could just be a fluke — this looks like a ton of data, but almost all of it is from about six states, some of which voted at the same time as one another and were subject to the same currents of momentum.

With that said, if you see a sudden shift in the momentum in a state, it’s at least worth considering what the polls had said about the state beforehand. The momentum shifts — at least as measured by the polls — have been very significant in this race, and unlike anything we have seen routinely in the past. The problem is that sometimes that momentum has been a false alarm, with the polls soon reverting back to form. The exception has been momentum swings in the final few days of the campaign; those usually have held up and have been reflected in the actual results.

The FiveThirtyEight forecast model, as you might expect, has been affected by these quirks. Unlike most of our other forecasting products, which tend to blend polls with various types of economic or demographic data, our primary forecasts look at polls and polls alone. In fact, they double down on them: the program is designed to place a heavy emphasis on the most recent polls, and it tries to infer what momentum exists in the race and extrapolate that forward.

If you look at how the FiveThirtyEight forecasts have performed on Election Day itself, they’ve done pretty well. On average, they’ve missed the final margin between the top two candidates by 2.8 points so far.

(Note: I exclude Nevada from the calculation, although the forecast there was pretty good, because we issued that prediction only a day or two before the state voted. We did not issue forecasts, thankfully, for Minnesota, Colorado or Maine, since the polling there was thin to nonexistent.)

The 2.8-point miss is a fair bit better than how individual polls have done: it is useful to take an average of different surveys on the chance that their errors will cancel out. In addition to taking a simple average, however, the FiveThirtyEight model also does some more complicated stuff. It weights the polls differently based on their past accuracy and their sample size, for instance, although in practice this makes very little difference. What does distinguish the FiveThirtyEight model is that it is very aggressive about trying to determine the momentum or trend in the race.
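To make the idea of a weighted poll average concrete, here is a minimal sketch. It is not the FiveThirtyEight model: the weighting formula (square root of sample size, with a three-day half-life for recency) is an assumption chosen for illustration, the pollster-accuracy weight and the momentum extrapolation are left out entirely, and the poll numbers are invented.

```python
import math


def poll_weight(sample_size, days_old, half_life_days=3.0):
    """Hypothetical weight: bigger and fresher polls count for more.

    The square-root and half-life choices are assumptions for this
    sketch, not the weights any real model uses.
    """
    return math.sqrt(sample_size) * 0.5 ** (days_old / half_life_days)


def weighted_margin(polls):
    """Weighted average of the polled margin between two candidates."""
    if not polls:
        raise ValueError("need at least one poll")
    weights = [poll_weight(p["n"], p["days_old"]) for p in polls]
    total = sum(weights)
    return sum(w * p["margin"] for w, p in zip(weights, polls)) / total


# Invented polls: an older one showing a 10-point lead, a fresh one
# showing a tie. Equal sample sizes, so only recency differs.
polls = [
    {"margin": 10, "n": 400, "days_old": 6},
    {"margin": 0, "n": 400, "days_old": 0},
]
print(weighted_margin(polls))  # 2.0
```

A simple average of these two polls would give 5 points. The recency weighting pulls the estimate toward the newer poll, which helps when a late swing is real and hurts when it is a false start.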

This has served the model well on Election Day. By comparison, the Real Clear Politics forecasts — which use a perfectly sensible but simpler and more conservative approach — have missed by an average of 4.4 points. Most of the difference comes from Iowa and South Carolina, states where there was a very late momentum swing that the FiveThirtyEight model captured more fully.

However, this aggressive approach has decidedly not paid dividends at earlier periods in these contests, when the model made big bets on what turned out to be false starts. The forecasts we published one week before each election missed the final margin by an average of 13.8 points.

Most of this is just because the polling itself has been inaccurate, but the simpler Real Clear Politics average has done slightly better, missing by an average of 12.9 points instead.

In addition to comparing the FiveThirtyEight model with its competition, however, it is also worth looking at the standards it sets for itself. It does not claim to be all that accurate — but is it accurate about how inaccurate it is? (Although this might sound ridiculous, it is precisely the kind of thing that forecasters in fields ranging from economics to climate change need to spend more time thinking about.)

Our current forecast in Ohio is that Mr. Romney will get 31 percent of the vote there. But the confidence interval attached to the forecast (which represents 90 percent of the possible outcomes) is wide: it runs from 17 percent to 42 percent. These intervals are so wide because they are built from historical data, and this isn’t the first year that polls in primaries and caucuses have missed the mark.

What’s been unusual, however, is the way in which these errors have been related to the timing of the election. In the past, polls have gotten somewhat more accurate as we’ve approached Election Day, but the improvement has been gradual. This year, the polls have gone from quite bad to quite good almost literally overnight — typically about three days before the election.

The next chart provides a clear demonstration of this. It compares the actual error in the FiveThirtyEight model at points in time ranging up to 25 days before the election against what the model thinks the error should be based on the historical data. Less technically, it compares the error in primary polls this year with that of past election cycles.

For instance, the model is supposed to miss the final margin by about 6 points when it issues a forecast on Election Day. The actual error — 2.8 points — has been much less than that.

That’s a good thing, right? Well, maybe. It probably makes my life easier. In the short term, however, it is probably just good luck. And in the long term, it could imply that the forecasts are underconfident — that the confidence intervals we publish are too wide and reflect more uncertainty about the outcome than there really is.

You would have just the opposite discussion, of course, if you looked at the forecasts we have issued 7 to 14 days in advance. Not only have they been inaccurate, but they have been even less accurate than claimed — and we don’t claim they are very accurate to begin with!

But this, too, arguably reflects a certain amount of “luck” (bad luck in this case). Over the whole 25-day period, the forecasts have been pretty well-calibrated: it’s just that the “hits” have tended to be concentrated on election night itself, and the misses have peaked a week or two beforehand.
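One rough way to check whether a forecast is "accurate about how inaccurate it is" is to count how often the actual result lands inside the published interval. The sketch below assumes each forecast carries a point estimate plus the low and high ends of a 90 percent interval; the field names and every number in it are hypothetical, not FiveThirtyEight's data.

```python
def interval_coverage(forecasts):
    """Share of forecasts whose actual result fell inside [low, high]."""
    if not forecasts:
        raise ValueError("need at least one forecast")
    hits = sum(1 for f in forecasts if f["low"] <= f["actual"] <= f["high"])
    return hits / len(forecasts)


def mean_abs_error(forecasts):
    """Average absolute gap between the point forecast and the result."""
    if not forecasts:
        raise ValueError("need at least one forecast")
    return sum(abs(f["actual"] - f["predicted"]) for f in forecasts) / len(forecasts)


# Invented vote-share forecasts with 90 percent intervals.
forecasts = [
    {"predicted": 30, "low": 18, "high": 42, "actual": 35},
    {"predicted": 25, "low": 14, "high": 36, "actual": 40},  # outside
    {"predicted": 40, "low": 28, "high": 52, "actual": 41},
    {"predicted": 20, "low": 10, "high": 30, "actual": 12},
]
print(interval_coverage(forecasts))  # 0.75
print(mean_abs_error(forecasts))     # 7.25
```

Over a large number of forecasts, coverage well above 0.9 would suggest the intervals are too wide (underconfident), and coverage well below 0.9 would suggest they are too narrow. With only a handful of contests, as in this race, either result could easily be luck.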

This need not be so abstract a discussion, however. Can we come up with a better explanation than “luck” to describe what we are seeing in the polls?

If I had to use a word to describe the behavior of Republican voters so far in this race, it would be this one: indifferent. Their preferences between the various candidates have been very weak, shifting after seemingly every debate, primary and news cycle. Many Republican voters like two or more candidates equally well — or at least see two or more of them as being less bad than the others.

To be sure, it may also be that the polls are exaggerating this tendency. Most polls now, and especially automated polls, have very low response rates, only getting a small fraction of the people they call to stay on the phone. This may bias the polls in favor of candidates whose voters are most excited about the race on that particular evening. If you called a Rick Santorum supporter the day after Mr. Santorum won Minnesota and Colorado, he would probably be giddy and pleased to talk to you — whereas a Mitt Romney supporter would have been despondent and hung up the phone.

It is, of course, a good thing for a candidate to have enthusiastic supporters. But their voters need to be excited at the right time: on Election Day. Thus, this tendency to pick up on the most excited supporters may be a boon to pollsters immediately before an election, but can make the polls buggy beforehand.

I doubt this accounts for all or even most of the momentum shifts we are seeing, many of which are probably real, but I suspect it accounts for some of them. As I’ve mentioned previously, I’d be especially cautious about polls conducted in the 24 or 48 hours after a candidate achieves a big “win” in a primary or in a debate, especially if the poll is an automated survey.

And if you’re a Republican voter who does have a favorite candidate, I’d be even more cautious. If I were a Mitt Romney supporter, for instance, I’m not sure I’d be “rooting” for polls that showed Mr. Romney ahead in Ohio right now — something we have not seen yet but may see soon. That could set up exaggerated expectations for Mr. Romney in a state that is ultimately pretty tough for him and where his polls have usually lagged behind his national averages. Then, if Mr. Romney goes on to lose the state — and we have seen a number of states revert to the mean implied by their demographics as the election draws near — he could be viewed as “blowing” the lead when he never had a firm grasp on it to begin with.

Of course, Mr. Romney benefited from this very pattern in Michigan, a state where he had a dominant position in the polls until the final two weeks, and an underwhelming win looked like a triumph.

But if the pattern continues to repeat itself — polls alter and exaggerate expectations for a candidate, leading to inevitable disappointment when he does not live up to them — we may see the see-saw patterns of momentum continue in the race.

We’re going to continue publishing our forecasts — they are as useful a way to summarize the polls as anything else — but please regard them with caution, and remember that those wide confidence intervals are there for a reason.
