A First Iowa Forecast: Race Is Still Wide Open

We’re a few days away from releasing the official version of our polling-based forecasts for the Iowa caucuses and New Hampshire primary. But the statistical work behind the model is done, so I can give you a preview of what the forecasts will look like. Conveniently, this will also serve as an overview of the current state of play in Iowa.

The most important ground rule is that the forecasts are based on one type of information only: state-by-state polls. You are welcome to consider other objective and subjective factors in addition to the state polls, and we will continue to discuss those factors on this blog. We think these forecasts provide a valuable perspective, but that doesn’t mean you should ignore everything else. However, building a well-calibrated forecast from the polls is challenging enough given the uncertainties inherent in primary polling, so that’s what we’ve decided to focus on rather than anything fancier.

Overall, the model is much simpler than something like our presidential or Senate race forecasts. It will help if we look at some real data: here is the model’s current Iowa forecast, which incorporates the new polls from Public Policy Polling and Insider Advantage that were released Tuesday:

The first step in the forecast is the Weighted Polling Average. As is the case with our other forecasting products, polls are weighted based on sample size, how recently the poll was conducted, and a pollster rating that accounts for the past accuracy of each polling firm as well as its methodological standards.

What’s a little different about these forecasts is that one of these factors — how recently the poll was conducted — really dominates everything else. We’ve analyzed literally thousands of primary and caucus polls dating to 1972, and what we’ve found is that you optimize forecast accuracy by being extremely aggressive about trying to identify the current trend. In the late stages of a primary or caucus race, a week is an eternity and even a couple of days can be meaningful.
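To make the weighting idea concrete, here is a minimal sketch in Python. The functional form is an assumption for illustration, not the model's actual formula: the square-root treatment of sample size, the exponential decay, the `half_life_days` value and the two polls are all made up.

```python
import math

def poll_weight(sample_size, days_old, pollster_rating, half_life_days=3.0):
    """Weight for a single poll. The formula is an illustrative assumption."""
    size_term = math.sqrt(sample_size)                  # bigger samples count more, with diminishing returns
    recency_term = 0.5 ** (days_old / half_life_days)   # weight halves every half_life_days
    return size_term * recency_term * pollster_rating

def weighted_average(polls, candidate):
    """Weighted mean of one candidate's share across all polls."""
    weights = [poll_weight(p["n"], p["days_old"], p["rating"]) for p in polls]
    weighted_sum = sum(w * p["shares"][candidate] for w, p in zip(weights, polls))
    return weighted_sum / sum(weights)

# Two made-up polls of equal size and quality, one fresh and one a week old.
polls = [
    {"n": 600, "days_old": 1, "rating": 1.0, "shares": {"A": 22.0, "B": 21.0}},
    {"n": 600, "days_old": 8, "rating": 1.0, "shares": {"A": 30.0, "B": 15.0}},
]
avg_a = weighted_average(polls, "A")
print(round(avg_a, 1))
```

A simple average of the two polls would put candidate A at 26 percent. With a short half-life, the result sits much closer to the newer poll's 22 percent, which is the behavior described above.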

This characteristic is especially relevant in this instance since there seems to have been a shift in the Iowa polls over the last week or so. In particular, the candidates who have been leading nationally, Newt Gingrich and Mitt Romney, have lost a bit of ground while other candidates are gaining. Our current polling average puts Mr. Gingrich at 25 percent and Mr. Romney at 15 percent of the vote, compared with 29 and 17 percent in the Real Clear Politics average (which is less aggressive about weighting recent polls). Conversely, our numbers are a little higher for the other five candidates, including Ron Paul.

The model next calculates the Reallocated Average, which takes undecided voters and allocates them among the viable candidates. The allocation is a compromise between dividing these votes evenly among the candidates and dividing them proportionately. A small portion of the vote is reserved for minor candidates, write-in candidates, uncommitted delegate slates and so forth, so the total among the viable candidates will be close to 100 percent but usually not exactly 100 percent.
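One way to picture that compromise is sketched below. The `blend` and `reserve` parameters and the polling numbers are hypothetical; the text above says only that the allocation falls somewhere between an even split and a proportional one.

```python
def reallocate(averages, blend=0.5, reserve=2.0):
    """Spread undecided voters across the viable candidates.

    blend=0 splits undecideds evenly; blend=1 splits them in proportion
    to current support. `reserve` is the share held back for minor
    candidates. Both default values are assumptions for illustration.
    """
    decided = sum(averages.values())
    undecided = max(100.0 - decided - reserve, 0.0)
    n = len(averages)
    result = {}
    for name, share in averages.items():
        even_part = undecided / n
        proportional_part = undecided * share / decided
        result[name] = share + (1 - blend) * even_part + blend * proportional_part
    return result

averages = {"A": 40.0, "B": 30.0, "C": 10.0}  # made-up polling averages
reallocated = reallocate(averages)
print(reallocated)
```

In this toy case 18 points of undecided vote are distributed, and the three candidates end up summing to 98 percent rather than 100 because of the reserve.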

The next step in the process is what we call Momentum. It looks at the near-term trend in a candidate’s polling and assigns a small bonus or penalty based upon it. In this case, for instance, Mr. Gingrich gets a penalty of about 2 points because his polls have sagged slightly, while Mr. Paul gets a 1 point bonus.

This momentum factor is something that requires a lengthier explanation, but consider the following: Suppose that a candidate polled at 15 percent last week and then polled at 20 percent this week. What is the best guess for what his or her polls will look like next week?

There are three answers to this question, each of which is entirely sensible:

(i) You might say that you’d expect the candidate’s polls next week to be below 20 percent but above 15 percent on the theory that they will revert to the mean.

(ii) Or, you could say that 20 percent exactly is the best guess, which hypothesizes that polls follow a random walk.

(iii) Finally, you might guess that the polls will be somewhat higher than 20 percent on the theory that there is momentum or positive serial correlation in the polls.

In general elections, the “right” answer to this question is probably somewhere between (i) and (ii). General election polling data is quite stable — voters have a lot of information about the candidates and perhaps 80 percent of them are going to vote based on party identification alone. There is often reason to be suspicious of near-term trends in general election polling data, especially after events like conventions, which frequently produce short-lived bounces.

In primaries, on the other hand, we find that the right answer is probably somewhere in between (ii) and (iii): that is, there is some slight evidence for momentum. I would urge you not to overdo this: if you chase down every trend, sometimes you’ll get burned by a small sample size or a poorly conducted poll. But if there are several different polls that are telling you the same thing, it can be proper to speak of “momentum.” The trend will fairly often dissipate or prove to be transient or even reverse itself. But slightly more often, it will continue in the same direction.
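The three answers can be written as one formula with a single knob. In the sketch below, the coefficient `k` is a hypothetical parameter, and the values passed to it are chosen only to illustrate each case: a negative `k` gives mean reversion, zero gives a random walk, and a positive `k` gives momentum.

```python
def next_week_guess(last_week, this_week, k):
    """Best guess for next week's poll, given the most recent change.

    k < 0: mean reversion (part of the change is given back)
    k = 0: random walk (the latest number is the best guess)
    k > 0: momentum (part of the change is expected to repeat)
    """
    return this_week + k * (this_week - last_week)

reverting = next_week_guess(15.0, 20.0, k=-0.5)    # answer (i)
random_walk = next_week_guess(15.0, 20.0, k=0.0)   # answer (ii)
momentum = next_week_guess(15.0, 20.0, k=0.2)      # answer (iii)
print(reverting, random_walk, momentum)
```

For the candidate who moved from 15 to 20 percent, these settings give guesses of 17.5, 20 and 21 percent. The claim for primaries is that the best value of `k` is small but positive.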

Our momentum factor is based not just on the magnitude of the trend but also how robust the polling evidence is for it. In this case, a few different polls showed a decline in Mr. Gingrich’s numbers, although they are of mixed quality; the model assigns him a penalty, but not a gigantic one.

The Vote Projection is the sum of the Reallocated Average and the Momentum factor and represents the single best guess of how a candidate will perform on Election Day. Right now, our best guess is that Mr. Gingrich will get 25 percent of the vote in Iowa, Mr. Paul about 21 percent, and Mr. Romney 16 percent, with two other candidates in the double digits.
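Here is a rough sketch of how a momentum adjustment might depend on both the size of the trend and the strength of the evidence, and how it feeds into the projection. The `robustness` formula, the `max_credit` cap and the input numbers are all assumptions made up for this example; they are not the model's actual inputs.

```python
def momentum_adjustment(trend, n_polls, max_credit=0.5):
    """Bonus or penalty, in points, from a recent polling trend.

    `trend` is the recent change in a candidate's polls, in points.
    `n_polls` is how many polls show that change; more agreement means
    more of the trend is carried forward. The formula is hypothetical.
    """
    robustness = n_polls / (n_polls + 2)  # 0 with no polls, approaches 1 with many
    return max_credit * robustness * trend

def vote_projection(reallocated_average, trend, n_polls):
    """Vote Projection = Reallocated Average + Momentum."""
    return reallocated_average + momentum_adjustment(trend, n_polls)

# Made-up inputs: a 6-point decline that shows up in four polls.
penalty = momentum_adjustment(trend=-6.0, n_polls=4)
projection = vote_projection(reallocated_average=27.0, trend=-6.0, n_polls=4)
print(round(penalty, 1), round(projection, 1))
```

With these inputs the candidate takes a 2-point penalty and is projected at 25 percent. The same 6-point decline seen in only one poll would cost just 1 point, which is the sense in which the size of the adjustment depends on how robust the evidence is.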

But how precise are these projections? By design, they are not very precise, because primary polling is not very precise. It is foolish to pretend otherwise, especially in the early stages of a race when voters are still trying to get a handle on the candidates and pollsters are still trying to get a handle on the voters.

However, we can take some steps toward measuring and quantifying this uncertainty, as you’ll see in the next chart.

For each candidate, in addition to the projected vote, we’ve also provided what we call the Low Range and the High Range. These represent the 5th and 95th percentiles, respectively, of the probability distribution (in other words, the 90 percent confidence interval). A candidate’s actual vote share should fall within this range 9 times in 10 and outside it 1 time in 10.

Note that the confidence intervals are extremely wide. In Mr. Gingrich’s case, for instance, the 90 percent confidence interval runs from 9 percent of the vote to about 40 percent of the vote. This is simply a reflection of how inaccurate primary polls have been in the past, as well as the wide range of developments that are possible over the final three weeks of the campaign. (If I were to show you the confidence intervals for a state like South Carolina or Florida, where the results will be affected by what happens in Iowa and New Hampshire, the confidence intervals would need to be even wider.)

You might also note that the confidence intervals are somewhat asymmetric, especially for the candidates lower in the polls. For instance, Jon M. Huntsman Jr. is projected to get 6 percent of the vote, but his confidence interval runs from 0 to 15 percent. Again, this is a reflection of how these polls have behaved historically.
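To see why a range like this comes out lopsided, consider the sketch below. It draws simulated vote shares from a beta distribution, which cannot go below zero, and reads off the 5th and 95th percentiles. The choice of a beta distribution and the `concentration` value are assumptions for illustration; the model's ranges are derived from how polls have behaved historically.

```python
import random

def percentile(sorted_values, pct):
    """Nearest-rank percentile of an already sorted list."""
    idx = int(round(pct / 100 * (len(sorted_values) - 1)))
    return sorted_values[idx]

def simulate_range(projected_share, concentration=20.0, draws=20000, seed=1):
    """Return (low, high): the 5th and 95th percentiles of simulated shares.

    Shares are drawn from a beta distribution whose mean equals the
    projection. A lower `concentration` means a wider spread (hypothetical value).
    """
    rng = random.Random(seed)
    mean = projected_share / 100.0
    alpha = mean * concentration
    beta = (1 - mean) * concentration
    sims = sorted(100 * rng.betavariate(alpha, beta) for _ in range(draws))
    return percentile(sims, 5), percentile(sims, 95)

low, high = simulate_range(6.0)  # a candidate projected at 6 percent
print(round(low, 1), round(high, 1))
```

Because a vote share cannot fall below zero, a candidate projected at 6 percent has far more room to surprise on the upside than on the downside, so the high end of the range sits much farther from the projection than the low end does.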

The final step in the calculation is Win Probability, which combines the vote projections and the uncertainty to estimate how likely each candidate is to win the caucuses. Right now, Mr. Gingrich’s win probability is just under 50 percent, but Mr. Paul has gained ground and now has a 28 percent chance of winning. Mr. Romney, Gov. Rick Perry of Texas and Michele Bachmann are all between about 5 and 10 percent, while Rick Santorum has about a 2 percent chance of winning based on the current surveys.
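A simple way to turn projections and uncertainty into win probabilities is to simulate the caucuses many times and count how often each candidate finishes first. The sketch below does that for the three projections quoted earlier. The normal distribution, the common standard deviation `sd` and the independence of the draws are simplifying assumptions, and the other candidates are left out, so the output will not match the figures above.

```python
import random

def win_probabilities(projections, sd=9.0, sims=20000, seed=1):
    """Share of simulations in which each candidate has the highest vote.

    Each candidate's vote is drawn independently from a normal
    distribution around the projection and floored at zero.
    The value of `sd` is hypothetical.
    """
    rng = random.Random(seed)
    wins = {name: 0 for name in projections}
    for _ in range(sims):
        draws = {name: max(rng.gauss(mu, sd), 0.0) for name, mu in projections.items()}
        winner = max(draws, key=draws.get)
        wins[winner] += 1
    return {name: count / sims for name, count in wins.items()}

# Vote projections quoted above; the remaining candidates are omitted.
projections = {"Gingrich": 25.0, "Paul": 21.0, "Romney": 16.0}
probs = win_probabilities(projections)
print(probs)
```

Even in this stripped-down version, a 4-point lead with this much uncertainty leaves the leader well short of a sure thing.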

Thus, although Mr. Gingrich still has the lead, Iowa looks to be fairly wide open with as many as five plausible winners. It should be a fun few weeks.

FiveThirtyEight
