A ‘Radical Centrist’ View on Election Forecasting

Ron Klain, the former chief of staff to Vice President Joseph R. Biden Jr., has a long critique up at Bloomberg about the piece that I published in The New York Times Magazine recently about President Obama’s re-election prospects. His article takes the view that campaign strategy and other intangibles are quite important in presidential campaigns and that statistical forecasts do a poor job of accounting for these.

Unfortunately, Mr. Klain’s article attributes to me a number of views that I am ambivalent about or actively disagree with, so it deserves a fairly long reply. I will also use this opportunity to respond to some criticisms that I have been receiving from political scientists. The irony is that I agree with Mr. Klain more than he realizes.

But let’s start with Mr. Klain’s central question: how much difference does campaign strategy make in determining the outcome of presidential elections?

Do all the ads, speeches, mailings, debates, online activity and rallies really change minds? Or is the outcome of the election the product of underlying fundamentals that are scarcely affected by such efforts?

This is obviously something of a false dichotomy. It is extremely unlikely that campaigns don’t matter at all. Now and then, you’ll see a political scientist come fairly close to expressing this viewpoint, but that is certainly not the majority opinion within the discipline. The question, instead, is how much campaigns matter, and that is a difficult question to answer.

I strongly agree with Mr. Klain that political scientists as a group badly overestimate how accurately they can forecast elections from economic variables alone. I have written up lengthy critiques of several of these models in the past, which suffer from fundamental problems regardless of which variables they choose.

One of the things it took me a long time to learn about forecasting is that there’s a difference between fitting data to past results and actually making a prediction. A regression model built from historical data is really just a description of statistical relationships that existed in the past. The forecaster hopes or assumes that the relationships will also apply in the future, but there is often a significant deterioration in performance.
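That deterioration is easy to demonstrate. Here is a toy simulation (every number in it is random by construction, not from any real election data): fit a regression with a handful of predictors to a sample about the size of the postwar election record, and the in-sample fit looks respectable even though there is literally nothing to find; score the same fitted model on fresh data and the apparent skill evaporates.

```python
import numpy as np

rng = np.random.default_rng(0)

def r_squared(y, y_hat):
    """Fraction of variance explained; can go negative out of sample."""
    ss_res = np.sum((y - y_hat) ** 2)
    ss_tot = np.sum((y - y.mean()) ** 2)
    return 1.0 - ss_res / ss_tot

n, p, sims = 16, 6, 200   # ~16 past elections, 6 candidate predictors
in_sample, out_sample = [], []
for _ in range(sims):
    X = np.column_stack([np.ones(n), rng.normal(size=(n, p))])
    y = rng.normal(size=n)            # outcomes unrelated to the predictors
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    in_sample.append(r_squared(y, X @ beta))
    # Score the fitted model on fresh data from the same pure-noise process.
    X_new = np.column_stack([np.ones(n), rng.normal(size=(n, p))])
    y_new = rng.normal(size=n)
    out_sample.append(r_squared(y_new, X_new @ beta))

print(f"mean in-sample R^2:     {np.mean(in_sample):+.2f}")  # healthy-looking
print(f"mean out-of-sample R^2: {np.mean(out_sample):+.2f}")  # worthless
```

Nothing here is a model of elections; it is simply the arithmetic of fitting several free parameters to a small sample, which is the situation election forecasters are in.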

I’m not just talking about obvious examples of spurious correlation, like the fact that the winner of the Super Bowl was once a highly “statistically significant” predictor of the direction of the stock market. (In recent years, this indicator has performed badly.) The problems run a lot deeper than that, affecting many or perhaps even most of the statistical relationships documented in the peer-reviewed literature in some fields.

John P.A. Ioannidis, for instance, has described how most published research findings in medical journals cannot be replicated independently. Scott Armstrong of the Wharton School, who has devoted most of his life to studying prediction, has found analogous problems in the social sciences. My research into the Survey of Professional Forecasters suggests that actual economic data fall outside the 90 percent confidence intervals claimed by economists somewhere between one-third and one-half of the time, meaning that they are extremely overconfident about the reliability of their forecasts.
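To see what that kind of overconfidence looks like mechanically, here is a toy calculation (the numbers are made up for illustration, not drawn from the actual survey): suppose forecast errors are really twice as large as a forecaster assumes when stating a 90 percent interval. The stated intervals then miss far more often than 10 percent of the time.

```python
import numpy as np

rng = np.random.default_rng(1)

# The forecaster states 90% intervals assuming errors with std 1.0, but the
# true errors have std 2.0 -- the intervals are half as wide as they should be.
true_sd, assumed_sd, n = 2.0, 1.0, 100_000
z90 = 1.645                                  # two-sided 90% normal quantile
errors = rng.normal(scale=true_sd, size=n)
covered = np.mean(np.abs(errors) <= z90 * assumed_sd)
print(f"claimed coverage: 90%, actual coverage: {covered:.0%}")
```

Under these assumptions the intervals cover the truth only about 59 percent of the time, i.e., they miss roughly 41 percent of the time, which is squarely in the one-third-to-one-half range described above.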

Presidential forecasting models that rely on economic data are likely to be especially susceptible to these problems. Most of them are fit to data from a small sample of 10 to 20 past elections but have a choice of literally hundreds of defensible economic or political variables to sort through. Forecasters who are not conscientious about their methodology will wind up with models that make overconfident forecasts and that impute meaning to statistical noise.
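A sketch of how that variable-sorting goes wrong, with everything random by construction: screen a few hundred noise variables against 16 noise “election results,” and the best of them will look strongly correlated, even though no variable carries any information at all.

```python
import numpy as np

rng = np.random.default_rng(2)

n_elections, n_candidates = 16, 300   # small sample, many defensible variables
y = rng.normal(size=n_elections)      # "election results": pure noise here
X = rng.normal(size=(n_candidates, n_elections))  # candidate predictors: noise

# Screen every candidate variable and keep the one best correlated with y.
corrs = np.array([np.corrcoef(x, y)[0, 1] for x in X])
best = int(np.argmax(np.abs(corrs)))
print(f"best |correlation| among {n_candidates} noise variables: "
      f"{abs(corrs[best]):.2f}")
```

Tested in isolation, a correlation that size over 16 observations would look highly “significant” — which is exactly why significance tests mean little after hundreds of variables have been screened.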

I would not paint all the forecasters with the same brush. Two political scientists whom I know to have a very sophisticated understanding of these problems are Larry Bartels at Princeton and Robert Erikson at Columbia. Others, like Hans Noel, will publish models but provide very explicit disclaimers about their limitations. But there are others who tweak as many knobs as they can, and there are bloggers and reporters who take all of the results at face value and don’t distinguish the responsible forecasts from the junk science. The problem is made worse when a game show is made out of forecasting and everyone competes to see who can get the most overfit model published in a peer-reviewed journal.

A more tangible question is how well economic statistics alone can really predict elections. I have written previously that a good assumption is that they can explain perhaps 50 percent of the results. But based on some further research that I will soon publish, I suspect that estimate was too high, and that the answer is more like 30 or 40 percent when the models are applied to make real, out-of-sample forecasts. Economic variables that perform better than that over a small subset of elections tend to revert to the mean or even perform quite poorly over larger samples.

So say that 60 percent of the variation in election results cannot be explained by economic variables. Should all of the remaining credit go to campaigns?

No, of course not. First, the fact that widely available published economic statistics cannot explain more than about 40 percent of election results does not mean that the actual living and breathing economy cannot. The American economy is a very hard thing to measure. Gross domestic product was originally estimated to have declined at a rate of about 3.5 percent in the fourth quarter of 2008; revised data puts the decline at almost 9 percent instead. The government first reported that the economy had grown by 4.2 percent in the fourth quarter of 1977, but that figure was later revised to negative 0.1 percent.

Using revised data can reduce the error to some extent, but there is quite a lot of intrinsic measurement uncertainty. Some of the debates about why variable X is superior to variables P, D and Q are no more productive than debating the number of angels that can dance on the head of a pin; the measurement error swamps any marginal gain that might be made from the choice of one reasonable variable over another.

Moreover, there are differences between what the statistics say about the economy and how Americans actually experience it. Some of these differences can be exploited by campaigns, but others fall into the category of “known unknowns”: things that are manifestly important but that we don’t have a good way to measure. Beyond the economy, likewise, there are other sorts of factors that campaigns may have little control over. Wars. Terrorist attacks. Earthquakes. Hurricanes. Sex scandals. Most of the attempts to translate these events into statistical variables have been quite silly. But that doesn’t mean the uncertainty they introduce into forecasting should be mistaken for the skill of a campaign. It’s not to Michael Dukakis’s credit that Gary Hart was dumb enough to get caught on a yacht with a swimsuit model.

Next, we have to make a distinction between candidates and campaigns. Sometimes a very appealing candidate runs a terrible campaign — Hillary Rodham Clinton comes to mind — or vice versa. Variables related to the candidates themselves are potentially easier to quantify than those related to campaign strategy.

One of those variables is the left-right ideology of the candidate, which I do include in my model and which political scientists have sometimes included in the past. Measuring ideology is not easy — although in practice it is probably no harder than accurately measuring the economy — but there are a few well-regarded methods for doing so. I have evaluated a couple, and they perform quite well according to statistical tests, even when taken in conjunction with factors like the president’s approval rating on Election Day or robust measures of economic performance.

To be sure, statistical tests may miscalibrate the impact of ideology just as they exaggerate the impact of particular economic variables. But there are strong theoretical reasons to believe that ideology matters, and there is moderately strong evidence from other contexts, like Congressional elections and elections in parliamentary systems, that it does. It does not seem plausible, meanwhile, as some political scientists’ models imply, that the difference between nominating Representative Michele Bachmann and nominating Mitt Romney would amount to only 1 or 2 points at the polls.

Another difficulty is that candidate ideology is correlated with other variables, like the length of the time that a party has been out of office, and those variables in turn have been correlated with election results. My view is that there are strong reasons to believe that ideology is in fact the causal factor — models that make the opposite assumption come up with some highly implausible results — but it is hard to know for sure when you’re dealing with highly correlated variables over small samples. We will be publishing more about this topic in the coming weeks.

Nevertheless, Michele Bachmann’s campaign would have to work with Mrs. Bachmann, while Mitt Romney’s would have to work with him; how much difference can their strategies make at the margin?

One of the more tangible examples of campaign strategy mattering came in 2008, when by a variety of measures Mr. Obama’s campaign (which Mr. Klain was a part of) overperformed by a net of about 3 or 4 points in swing states. In this case, Mr. Obama’s sound strategy was superfluous, since he was likely to have won the election either way, but had the race been closer, it might have made a difference. Our models in 2008 generally found that Mr. Obama was about 5 percent more likely to win the Electoral College but lose the popular vote than the other way around.
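A toy Monte Carlo shows how a swing-state edge produces that kind of asymmetry. Every number here is hypothetical, chosen only for illustration, and is not from our 2008 model: the national margin is uncertain, and the “tipping-point” state runs a point better for the candidate than the country as a whole.

```python
import numpy as np

rng = np.random.default_rng(3)

sims = 200_000
# National popular-vote margin for the candidate, in points (hypothetical).
national = rng.normal(loc=2.0, scale=6.0, size=sims)
# Margin in the "tipping-point" state: the national margin plus a 1-point
# edge (e.g., from a superior swing-state operation) plus state-level noise.
tipping = national + 1.0 + rng.normal(scale=2.0, size=sims)

ec_win = tipping > 0                    # wins the Electoral College
pv_win = national > 0                   # wins the popular vote
split_for = np.mean(ec_win & ~pv_win)   # EC win despite a popular-vote loss
split_against = np.mean(~ec_win & pv_win)
print(f"EC win / PV loss: {split_for:.1%}; PV win / EC loss: {split_against:.1%}")
```

Even a small, persistent edge in the decisive states makes the favorable split meaningfully more likely than the unfavorable one, which is the sense in which a strong swing-state operation buys insurance against a close national result.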

My guess — and it’s just a guess — is that this may be as good an estimate as any of the effect that a well-run campaign might have. Perhaps a very well run campaign can improve a president’s chances of winning re-election by 5 or 10 percent. But who knows? We have already had an extremely wide array of outcomes in the various special and interim elections that have taken place around the country so far this year, and we’ve had a very wild Republican primary, suggesting that voter preferences may be more malleable than normal.

In fact, I discussed some of these effects in my article. I wrote, for instance, about a scenario in which the economic numbers might be relatively good, but Mr. Obama nevertheless would lose to Mr. Romney:

Romney is much different stylistically from Bush’s opponent, Bill Clinton, but both are skilled at driving an economic message. Romney would bring out his PowerPoint and seek to persuade voters that the growth had been too little and too late. After all, if killing bin Laden couldn’t lift Obama’s approval rating much above 50 percent, who knows whether one year of good-but-not-great growth would?

My article was full of these sorts of devil’s-advocate cases — reminders that it’s hard to forecast an election a year in advance, and that even when you get closer to it, things might not go according to the formula. Among other years, 1948, 1952, 1960, 1968, 1976, 1992 and 2000 were problematic for at least some of the model-based forecasts. More recently, a lot of “fundamentals” models badly underestimated Republican gains in the 2010 midterm elections.

These models are not that good, so my view is that if you’re going to build one, it ought to have a nice wide confidence interval that is designed to apply in the real world and not just in the software package. I also hold the view that one should switch to polling-based metrics sooner rather than later. These models are easier to calibrate, are less prone to overfitting (they have essentially just one variable: the polling average) and are far less presumptuous about why the electorate votes the way it does. They make the very reasonable assumption that voters will do a better job of explaining why they vote the way they do than can be inferred from a series of quasi-random economic inputs.
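A minimal sketch of what such a one-variable, polling-based forecast looks like. The 6-point error standard deviation and the normal-error assumption are placeholders for illustration, not calibrated estimates:

```python
import math

def forecast(poll_margin_pts, error_sd_pts=6.0):
    """Win probability and a 90% interval from a polling average alone."""
    z = poll_margin_pts / error_sd_pts
    win_prob = 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))  # normal CDF
    half_width = 1.645 * error_sd_pts                      # 90% interval
    return win_prob, (poll_margin_pts - half_width, poll_margin_pts + half_width)

prob, (lo, hi) = forecast(2.0)   # candidate ahead by 2 points in the polls
print(f"win probability: {prob:.0%}; 90% interval: {lo:+.1f} to {hi:+.1f} points")
```

Note the shape of the output: a 2-point polling lead a year out translates into only a modest probability edge, and the 90 percent interval spans both a comfortable win and a clear loss — which is exactly the kind of wide, honest interval these models should report.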

If polling-based models do a much better job of prediction, they sacrifice something in explaining elections, leaving some of Mr. Klain’s questions unresolved. At the same time, it is important to be aware of the elections in which no campaign would have changed the result. One of these is an example that Mr. Klain cites: 1984.

But, to use just one example, if cold, hard economic data were decisive in elections, then President Ronald Reagan, seeking re-election in 1984 when the economy was beset by a 7.5 percent unemployment rate, wouldn’t have won 49 states. After all, his successor, President George H.W. Bush lost by more than 200 electoral votes when he ran for re-election in 1992, with the jobless rate at 7.4 percent.

The unemployment rate may have been 7.5 percent when voters went to the polls to pick between Ronald Reagan and Walter Mondale. But it had declined from as high as 10.8 percent earlier in Mr. Reagan’s term. Moreover, economic growth was exceptional in both 1983 and 1984, with G.D.P. advancing at almost 6 percent in the election year.

One thing the statistical evidence is quite clear about is that voters are reasonably forward-looking and weigh the rate of change much more heavily than how the economy is performing in an absolute sense. (Historically, the raw unemployment rate has been among the very worst predictors of election outcomes, while the change in job growth during the election year has been among the very best.) Mr. Reagan’s “Morning in America” campaign seemed brilliant when the unemployment rate had fallen to 7 percent from 11 percent. The same message would have been ridiculous had the unemployment rate risen to 7 percent from 3 percent instead.

I apologize if some of this seems prickly. I lived through the Moneyball wars in baseball and then saw how much progress the sport made once everyone learned how much they had in common.

Baseball games, however, are played 162 times a year, so the learning process is accelerated. But presidential elections are held only once every four years, and we make the same mistakes over and over again. The outcome of the election isn’t especially predictable right now, but here are four predictions you can take to the bank:

1. Next year, the strategists of the winning campaign will be praised as brilliant.
2. Next year, the strategists of the losing campaign will be blamed for a long series of mistakes.
3. Next year, some of the political science models will hit the outcome right on the nose.
4. Next year, some of the political science models will miss wildly in one direction or another.

Maybe one of the campaigns really will have made the difference; the forecasting models can tell us something about that, by the way. But it’s just as likely that a campaign that deserves praise for keeping the election to within 2 points, when its candidate “should” have lost by 6, will be blamed rather than credited. Meanwhile, I’ll be rooting for the models that apply more responsible forecasting practices, but most of how they perform over the next few elections will be determined by luck. Better if we acknowledge their limitations in advance.

Nate Silver founded and was the editor in chief of FiveThirtyEight.