This is the finale of a four-part series (Part I, Part II, Part III) evaluating the utility of early presidential primary polls as forecasting instruments. My contention is that these polls have enough predictive power to be a worthwhile starting point for handicapping a field of candidates. In this article, we’ll see what they have to say about the Republican contenders for 2012.
Here is a chart summarizing the 28 scientific polls that have been conducted on the Republican field since the start of the year, covering a total of 23 different candidates or prospective candidates. (For the ground rules used to assemble this data, see Part III.)
Name recognition figures are mainly taken from Gallup, and reflect an average of all of Gallup’s surveys since the start of the year. The exceptions are a handful of relatively obscure candidates whom Gallup has not yet polled on — in those cases the name recognition figures are estimates, and are indicated in red in the table. (Some of the polls were conducted in multiple versions with varying lists of candidates; that’s why the table shows, for example, that Mike Huckabee was included in 26.2 polls out of 28.)
Our first model for translating this polling data into probabilities works as follows.
I’m calling this the Classical Model, since it’s a little bit more elegant than an alternative method that we’ll examine later on. Divide a candidate’s polling average by name recognition, and you have a pretty decent benchmark for the candidate’s upside.
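As a rough illustration of that division, here is a minimal sketch in Python. The function name is hypothetical, and the inputs are the approximate figures quoted in this article for Tim Pawlenty (about 4 percent in the polls, 49 percent name recognition in Gallup's latest reading); the averaged figures in the table may differ. The sketch covers only the benchmark itself, not the further step of turning it into a probability of winning.

```python
def recognition_adjusted_support(poll_avg_pct, name_recognition_pct):
    """Polling average divided by name recognition, expressed as a percentage.

    Hypothetical helper: it reproduces only the division described in the
    text, not the full Classical Model.
    """
    if not 0 < name_recognition_pct <= 100:
        raise ValueError("name recognition must be in (0, 100]")
    return 100.0 * poll_avg_pct / name_recognition_pct


# Approximate figures quoted in the article for Tim Pawlenty.
pawlenty = recognition_adjusted_support(4.0, 49.0)
print(round(pawlenty, 1))  # about 8.2 percent among voters who know him
```

Read this way, a candidate polling at 4 percent with half the electorate unaware of him is doing about as well, among those who know him, as a universally known candidate polling at around 8 percent.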
One thing that stands out is that this method gives the leading candidate, Mitt Romney, only about a one-in-four chance of winning (more precisely, a 27 percent chance).
How unusual is that? Have there been other races in the modern (post-1972) primary era that were more wide open? Here’s how this method would have designated a favorite in past election cycles:
The current Republican race is, by some margin, the most wide-open in the modern era on the G.O.P. side, but there are a couple of comparable examples if you look at the Democrats. The model would have had Scoop Jackson as the nominal favorite to win the Democratic nomination in 1976 — but still would have given him only a 20 percent chance. Michael Dukakis in 1988 (26 percent chance of winning) and John Kerry in 2004 (29 percent) were in the same range as Mr. Romney is now, though for different reasons — their polling wasn’t quite as strong as Mr. Romney’s, but they were doing it with considerably lower name recognition.
That brings me to the second point. What makes the 2012 Republican race unusual is not that there isn’t much of a frontrunner at this point — that’s happened before — but rather that both the high-recognition and low-recognition names are underwhelming.
On the one hand, while Mr. Romney’s numbers and Mike Huckabee’s are considerably better than Sarah Palin’s or Newt Gingrich’s, they both fail to crack 20 percent in the polling average despite very wide name recognition. Both are also polling lower now than at the end of the 2008 campaign, in which Mr. Romney ultimately wound up with 22 percent of the Republican primary vote and Mr. Huckabee 21 percent.
On the other hand, there’s no sign yet of a breakout candidate from the low-recognition group. Tim Pawlenty’s name recognition has improved more than that of any other Republican candidate since the start of the year (it has increased to 49 percent from 39 percent, according to Gallup), but that hasn’t translated into any additional support in the horse race polling, where his numbers have been stuck at about 4 percent all year. The same holds for Mitch Daniels, and with Mr. Daniels there’s the added complication that he might not run at all.
This method is also not very enamored of Donald Trump, although that is partly because he was not included in many of the polls at the start of the year, and the model scores those as zeroes.
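To see how much scoring omitted polls as zeroes can drag down an average, consider this small sketch. The numbers are entirely hypothetical (a candidate included in 7 of 28 polls at 14 percent each), and the assumption that the average is taken over every poll in the period is mine, not a documented detail of the model.

```python
def average_counting_omissions_as_zero(results_pct, total_polls):
    """Average over all polls, treating polls that omitted the candidate as 0."""
    if total_polls < len(results_pct):
        raise ValueError("total_polls cannot be smaller than the number of results")
    return sum(results_pct) / total_polls


def average_when_included(results_pct):
    """Average over only the polls that listed the candidate."""
    return sum(results_pct) / len(results_pct)


# Hypothetical data: included in 7 of 28 polls, at 14 percent each time.
results = [14.0] * 7
with_zeroes = average_counting_omissions_as_zero(results, total_polls=28)
included_only = average_when_included(results)
print(with_zeroes, included_only)  # 3.5 versus 14.0
```

Under these made-up figures, a candidate who looks competitive whenever his name is offered still averages in the low single digits once the omissions are counted.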
That effect becomes clear if we use the same methodology but exclude the polls conducted before April 1:
That pushes Mr. Trump up considerably. Then again, there were reasons why pollsters did not include Mr. Trump in surveys early in the year: it was not clear whether he would run, or take the campaign seriously if he did. And now, indeed, Mr. Trump’s rise in the polls seems to be reversing.
There’s another method of evaluating the race that is even more dismissive of Mr. Trump’s chances. In this version, I break a candidate’s polling average into two factors:
This model treats name recognition as a separate variable, rather than meshing it together with a candidate’s polling average. So it fits a three-variable regression model.
It turns out that one of the more potent predictors of success in past primary races was simply how frequently a candidate’s name was included in the early polls. Although there have been winning candidates in the modern era, like Bill Clinton, who waited until quite late in the process to officially declare that they were running, there haven’t been any who were not laying the groundwork for a run quite early on, to the point that they were routinely included in the polls. It’s not so easy to make up for lost time if you’ve dawdled rather than hire staff, cultivate elite support, brush up your media skills and so forth. Being included in a poll in the early going is an indication that you are in fact doing those things.
Under this method, which treats inclusion in polls from the start of the year as something close to a prerequisite for winning the nomination, candidates like Mr. Pawlenty and Mr. Daniels do considerably better, while Mr. Trump’s chances look considerably worse:
I call this the Aggressive Model because it can deviate quite a bit more from the horse race numbers — although it’s more in line with how political scientists like Jonathan Bernstein and Brendan Nyhan, who place more emphasis on factors like elite support, think about the race.
Here, then, is the optimistic case for Tim Pawlenty — what the Aggressive Model would say if it spoke in English rather than statistics.
1. Mr. Pawlenty is definitely running, and has been preparing to do so for a long time now — which is true of surprisingly few candidates.
2. His lack of popular support certainly is problematic — and is only partially excused by his relative lack of name recognition. But all of the candidates have their problems, so he looks pretty decent by comparison.
One of the reasons I was skeptical of Mr. Pawlenty early on is that there seemed to be a lot of potential candidates who might fill the same niche, as a “safe” consensus choice acceptable to both moderates and conservatives. But John Thune isn’t running; Mike Pence isn’t running; Haley Barbour isn’t running. There’s no sign of Jeb Bush, Rick Perry, or Chris Christie. Mitch Daniels might run, but he doesn’t have any more popular support than Mr. Pawlenty, and he is several months, at the very least, behind Mr. Pawlenty in his preparations. Jon Huntsman might run, but he’s got a variety of positions that are going to make him unpopular with conservatives, whereas Mr. Pawlenty is positioned pretty close to the center of the Republican primary electorate.
However, while the Aggressive Model does have some theoretical appeal — and while it fits the historical data a tiny bit better than the Classical Model — it presents some potential issues. It really goes all-in on the assumption that a candidate cannot win unless he or she starts making preparations very early on, to the point of being considered viable enough by pollsters to be included in their surveys.
While it is true that no winning candidate in modern times has violated that paradigm, the data is not all that robust — just 15 nominally competitive primary races since 1972, of which only a handful have been as competitive as this one. That probably isn’t enough to rule out the possibility that a late entrant could run away with things, and the Aggressive Model may be a bit overfit, meaning that it describes the historical data well but could be sub-par at making predictions.
So I think these two models work best when viewed in tandem.
For that matter, just as we did with the Classical Model, we can also run a version of the Aggressive Model based solely on polling data from April 1 onward:
Let’s summarize these models and compare their results with the current betting lines at Intrade, a political futures market that captures the bettors’ view of the candidates’ current chances.
We can see some differences between our polling-based models and Intrade on several candidates:
The value of an approach like this is not that these models are infallible. Instead, they’re a pretty rough cut, as revealed by the fact that relatively small changes in methodology can produce large shifts in the chances attributed to candidates like Mr. Trump or Mr. Pawlenty.
My contention, though, is that we’ll both do a better job of handicapping and will have more productive conversations about the primaries if we start with the assumption that the polls tell us something rather than nothing.
(Stated far more technically, the polls are useful enough to serve as good Bayesian priors.)
You want to argue that Jon Huntsman is a more likely Republican nominee than Mike Huckabee? That’s fine. But know that, in the past, candidates who have polling numbers like Mr. Huckabee’s have had a pretty good shot at their nominations, while those with Mr. Huntsman’s profile have faced much longer odds: not just a little bit longer, but a lot longer. Maybe you can still win the argument, but it raises your burden of proof.