The 1988 presidential election was a victory for political science. Michael Dukakis had led George Bush in the polls for much of the spring and summer. But Mr. Bush had some fundamental advantages: the economy was sound, and he was the vice president to a popular incumbent, Ronald Reagan. As the fall came, Mr. Dukakis’s numbers wilted, and Mr. Bush captured a decisive victory, winning 426 electoral votes and taking the popular vote by about eight percentage points.
The election came at a time when political scientists and economists were interested in evaluating the relationship between economic performance — sometimes along with other “fundamental” (if harder-to-quantify) factors like war — and the fate of incumbent presidents.
This had long been something of a controversial subject. The elections of 1948 through 1968 had been a quirky lot. Foreign policy — the aftermath of World War II and the wars in Korea and Vietnam — had played an unusually important role. There were sometimes wide disparities in the strength of the candidates, with parties nominating figures as compelling as Dwight D. Eisenhower or as far from the center of the electorate as Barry M. Goldwater. There were other quirky circumstances: Lyndon B. Johnson and Harry S. Truman had succeeded deceased presidents in midterm and won a full term of their own, but refused to run for a third term even though they were eligible for one. The modern primary system was gradually developing, and television was becoming more widespread.
Whatever it was about this period, the relationship between elections and the economy seemed to be very weak — and, in fact, often seemed to run in the opposite direction of what we might expect. The incumbent party had won in 1948 and 1956 despite a middling economy but lost in 1952 and 1968 despite a pretty good one.
Nevertheless, some economists and political scientists asserted, there was a relationship between the economy and elections in the long term. In 1978, the Yale economist Ray C. Fair published a well-known paper that looked to data as far back as the 1890s to make the case. The relationship between the economy and elections wasn’t particularly strong — it could be overridden by war, scandals and other factors — and the quality of the data was mixed. But there seemed to be something there if one looked carefully enough.
Other political scientists looked toward data from other countries to expand the sample size. This produced interesting results; a lot of factors had to be controlled for to identify the relationship, but the ruling parties seemed to be having an easier time keeping hold of power when the economy was good.
Still, the notion that economic performance helped to predict elections remained controversial in some circles. Some papers claimed that there wasn’t any relationship at all and poked fun at the news media for inflating it.
The American elections of 1972 through 1988 went much better for the theory of economic voting. The incumbent party had won big victories with a sound economy in 1972, 1984 and 1988, while Jimmy Carter had been trounced by Mr. Reagan in 1980 with a bad one. The year 1976 was something of a mixed bag — polls fluctuated wildly, the economic numbers were all over the map, and Gerald R. Ford had to deal with the aftermath of Watergate. But the others had seemed pretty darned predictable, and they were better predicted by the fundamentals than by the polls — Mr. Carter and Mr. Reagan had been fairly close in surveys until late in the 1980 campaign, for instance, whereas economic factors suggested all along that Mr. Carter would lose.
During this period, there had been a couple of cases in which scholars like Dr. Fair published predictions about the outcome well in advance of the election. They had gone well enough — Dr. Fair’s forecasts were fairly close to the mark in 1980, 1984 and 1988, for instance.
Suddenly, the science of election forecasting became a hot topic and the number of models proliferated. Instead of a model here and there, six of them were published in advance of 1992 and then 12 in 1996.
By now, the doubt has pretty much erased itself — probably to an unhealthy extent. It is often asserted that elections are easy to predict and that the economy decides most of them. The publisher’s description for Lynn Vavreck’s excellent 2009 book, “The Message Matters,” for instance, made the following claim:
The economy is so powerful in determining the results of U.S. presidential elections that political scientists can predict winners and losers with amazing accuracy long before the campaigns start.
To be clear, that is the publisher’s copy and not Ms. Vavreck’s. However, statements like these have become fairly common, especially among a savvy group of bloggers and writers who sit at the intersection of political science and the mainstream media (a space that this blog, of course, occupies).
But is it true? Can political scientists “predict winners and losers with amazing accuracy long before the campaigns start”?
The answer to this question, at least since 1992, has been emphatically not. Some of their forecasts have been better than others, but their track record as a whole is very poor.
And the models that claim to be able to predict elections based solely on the fundamentals — that is, without looking to horse-race factors like polls or approval ratings — have done especially badly. Many of these models claim to explain as much as 90 percent of the variance in election outcomes without looking at a single poll. In practice, they have had almost literally no predictive power, whether looked at individually or averaged together.
I have made theoretically grounded critiques of these models before — but I had never gone back and looked at how well they actually did. Nor, to my knowledge, has anybody else done so in a comprehensive way. There have usually been postmortems after every election cycle, but we now have quite a lot of data — nearly 60 forecasting models published by political scientists or economists in advance of the 1992 through 2008 elections. We should at least be able to get some basic sense for whether these models are as accurate in practice as they claim to be in theory.
I will be going through these models one election cycle at a time, but I need to articulate a few ground rules first. If you don’t care about this detail, scroll down past the bullet points.
First, the key is that these models were published in advance of the election. It’s not a prediction if you already know what happened. These modelers have a lot of choices to work with — literally millions of plausible combinations of economic variables, alongside other factors like polls, variables to indicate wartime and peacetime, incumbency, and so forth. It’s easy to fit these past data well just by testing out one specification after another until you come across a lucky one, like the robber who depresses every buzzer in a 15-story apartment complex until someone lets him in. It’s much harder actually to make good predictions. Everything I’m looking at here was an actual prediction — not a “retrodiction” that was published after the fact.
Second, all of the forecasts were published by academic experts — specifically, by economists or political scientists. Most were presented at the annual meeting of the American Political Science Association, which takes place around Labor Day. A few others were published in a journal but not presented at the meeting, while some were published at the academic’s personal or professional Web site — the idea is to be comprehensive. Most of the forecasts were published based on data available as of about August of the election year. A few were published sooner than that, and a couple were published later — one or two in very early October. But July, August and September were the peak period. If a forecaster regularly updated his forecast over the course of the election (which is a good habit), I use his forecast from this July-to-September period to keep everyone on a level playing field.
Third, I evaluate all published versions of a forecast. It’s fairly common for a forecaster to publish multiple models, and they don’t always produce similar results. There’s nothing inherently wrong with publishing different versions of a forecast. But one “trick” that some of the forecasters use is to highlight the version of the forecast that seems to match the polls or the consensus view, while burying the others in the fine print. One forecaster in 2008, for instance, published four versions of his model, two of which showed a clear advantage for John McCain and the other two for Barack Obama. Since there was abundant evidence by the late summer that Mr. Obama was the favorite, he recommended that people look at the pro-Obama versions. However, he was presumably doing so because he was aware of factors like polls that he hadn’t originally deemed useful to include in his model. I treat all these versions equally: if it was published and has your name on it, it counts. But I do not go back and retrofit models that were discontinued and ceased to publish new forecasts.
Fourth, there were some cases in which the models included variables that were not yet known at the time — for instance, the economic growth rate in the fourth quarter of the election year. In each case, however, the forecaster pointed to what he believed to be the most likely value for this unknown variable and published a prediction accordingly. This was the figure I used — what the forecaster published at the time — and not the one based on data that became available after the fact. You should be aware, however, that economic data can be revised significantly. The forecasters tend to skirt this issue in their presentation of the models, which may be problematic since some economic series are revised in predictable ways and since it is often not clear whether their models are meant to be applied to original or revised data.
Finally, I’ve placed the models into two broad groups. The ones I call “fundamentals” models do not look at any horse-race data — meaning, no approval or favorability ratings, no head-to-head polls, and no primary results — and claim to make a good forecast despite this. The models that I call “horse-race” models do include polls or (much less commonly) primary results alongside economic variables and other factors. The “fundamentals” models are highlighted in yellow in the charts that you will see below.
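The warning in the first ground rule, about testing one specification after another until a lucky one turns up, can be illustrated with a small simulation. This is only a sketch under stated assumptions: the "outcomes" and the candidate "specifications" are pure random noise invented for the illustration (none of it is real election or economic data), and the sample of 10 elections and 1,000 candidate specifications are arbitrary choices.

```python
import random

def r_squared(xs, ys):
    """Squared Pearson correlation: share of variance in ys 'explained' by xs."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return (sxy * sxy) / (sxx * syy)

rng = random.Random(0)   # fixed seed so the sketch is repeatable
n_elections = 10         # assumption: a small sample of past elections
n_specs = 1000           # assumption: number of specifications tried

# Hypothetical past outcomes: pure noise, not real vote shares.
outcomes = [rng.gauss(0, 1) for _ in range(n_elections)]

# Each "specification" is also pure noise, so none has any real predictive value.
specs = [[rng.gauss(0, 1) for _ in range(n_elections)] for _ in range(n_specs)]

fits = [r_squared(spec, outcomes) for spec in specs]
best_fit = max(fits)                      # the "lucky" specification
typical_fit = sorted(fits)[n_specs // 2]  # a median specification

print(f"typical fit: {typical_fit:.2f}, best fit after searching: {best_fit:.2f}")
```

The best of many meaningless specifications fits the past far better than a typical one does, even though by construction it knows nothing about the future. That is why only forecasts published in advance are counted here.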
Let’s begin our review in 1992, when Bill Clinton beat the incumbent, George Bush.
In the chart, I’ve listed what looks to be a number of different columns for each model, but really they’re all getting at the same thing: How close was the model to the actual result?
The industry standard is to publish a forecast of something called the “incumbent party two-way vote share,” which is the percentage of votes the incumbent party’s presidential candidate gets with third-party candidates excluded. In 1992, for instance, Mr. Bush got 46.6 percent of the two-way vote, while Mr. Clinton got 53.4 percent, ignoring Ross Perot’s vote.
I’ve always found this choice to be counterintuitive; most of us might know that Mr. Clinton defeated Mr. Bush by a margin in the mid-single digits, but people rarely talk about an incumbent’s two-way vote share. So I’ve translated each forecast into a projected margin of victory for the incumbent candidate, treating the third-party vote as a known variable. These two versions are mathematically identical for all intents and purposes. Note, however, that the error will be expressed to be about twice as large by the second version. If you miss high on the Democrat’s vote share by four percentage points — which doesn’t sound that bad — most of that will have gone to the Republican instead, so you will have missed the margin between the candidates by about eight points (the difference between a toss-up race and a borderline landslide).
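The translation from a two-way vote share to a margin of victory can be sketched as follows. The function name and its parameters are hypothetical, chosen for this illustration. The third-party share of roughly 19 percent used for 1992 is an approximate figure assumed here, not one taken from the forecasts themselves.

```python
def margin_from_two_way_share(two_way_share, third_party_share=0.0):
    """Incumbent's margin of victory, in points of the total vote.

    two_way_share: incumbent's percentage of the two-party vote.
    third_party_share: percentage of the total vote won by everyone else,
        treated as a known quantity.
    """
    major_party_total = 100.0 - third_party_share
    incumbent = two_way_share * major_party_total / 100.0
    challenger = major_party_total - incumbent
    return incumbent - challenger

# With no third-party vote, missing the vote share by 4 points
# means missing the margin by 8 points.
actual_margin = margin_from_two_way_share(50.0)
forecast_margin = margin_from_two_way_share(54.0)
margin_error = forecast_margin - actual_margin

# 1992: Mr. Bush's 46.6 percent two-way share, with an assumed
# third-party vote of about 19 percent (approximate, for illustration).
bush_1992_margin = margin_from_two_way_share(46.6, third_party_share=19.0)

print(margin_error, round(bush_1992_margin, 1))
```

A negative result means the incumbent lost. The third-party vote shrinks the margin a little, which is why the doubling is approximate rather than exact.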
You’ll also see a calculation called the R.M.S.E., or root-mean-square error, sometimes also called the standard error. This is the type of error that these models are trying to minimize, so it is the most appropriate measuring stick. The standard error is slightly larger than the average error. However, it is only about half as much as the margin of error. The six models in 1992 had a standard error of 7.5 points in predicting the incumbent party’s margin, versus an average error of 5.6 points and a margin of error of 14.6 points.
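The three error measures can be sketched as follows. The errors in the list are made up for illustration and are not the actual 1992 forecasts. The multiplier of 1.96 for the margin of error is an assumption (the usual 95 percent figure for a normal distribution), though it is roughly consistent with the 7.5-point and 14.6-point figures above.

```python
import math

def rmse(errors):
    """Root-mean-square error: square each miss, average, take the square root."""
    return math.sqrt(sum(e * e for e in errors) / len(errors))

def mean_abs_error(errors):
    """Simple average error: the mean size of the misses, ignoring direction."""
    return sum(abs(e) for e in errors) / len(errors)

def margin_of_error(errors, z=1.96):
    """Assumed 95 percent margin of error: z times the RMSE."""
    return z * rmse(errors)

# Hypothetical misses in points of vote margin (forecast minus actual).
hypothetical_errors = [3.0, -4.0, 12.0, -1.0]

standard_error = rmse(hypothetical_errors)
average_error = mean_abs_error(hypothetical_errors)
moe = margin_of_error(hypothetical_errors)

print(f"RMSE {standard_error:.1f}, average {average_error:.1f}, margin of error {moe:.1f}")
```

Because squaring weights large misses more heavily, the R.M.S.E. can never be smaller than the simple average error, and one bad miss pulls it up sharply.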
Those numbers might sound fairly high, and they are. A couple of models came quite close to the result in 1992, but others missed badly; Dr. Fair’s, for instance, projected a nine-point win for Mr. Bush, when Mr. Clinton won by six points instead, an error of 15 points.
The error figures for Dr. Fair’s model and one other are highlighted in blue because they fell outside the 95 percent confidence interval as claimed by the forecast, something that should happen only one time in 20. In 1992, however, two of the six models failed by this test.
The other thing to notice is that these models weren’t in very much agreement with one another. Despite relying on much of the same data (and having “retrodicted” nearly identical and nearly perfect results for past elections), they projected everything from a nine-point win for Mr. Bush to a six-point win for Mr. Clinton.
This is not a very good start: little consensus among the models, a high standard error, and much less accuracy than claimed. The 1992 election was quirky in some ways, however, because of the presence of Mr. Perot and because the economic data was mixed; G.D.P. grew, but jobs did not.
Few such excuses could be made for the next election, in 1996, an easy win for Mr. Clinton in what was perhaps the least eventful election cycle of our lifetimes. Unfortunately, the models had many of the same problems.
The spread in the models this year was even wider: everything from a 13-point win for Mr. Clinton to an 8-point win for Bob Dole was forecast. Nine of the 12 models correctly predicted a Clinton win, but with that kind of spread, their overall performance wasn’t much better: a standard error of 6.6 points.
Since the misses were in opposite directions, taking an average of all 12 models would have given you a pretty good answer. But that’s mostly because the horse-race models had a good year and were doing the hard work; three of the four “fundamentals” models missed badly (and called for a Dole win) despite what seemed like favorable circumstances for Mr. Clinton, given the peacetime and a good economy.
It was 2000, however, when the models had their worst year:
This was the only election of the five in which the models all picked the same winner — all had Al Gore defeating George W. Bush. And Mr. Gore did win the popular vote by half a percentage point. The problem was that the margin wasn’t large enough to allow him to prevail in the Electoral College. That wasn’t supposed to happen, according to the models; instead, most had predicted a clear win for him, with two calling for him to win by almost 20 percentage points. Although one or two models, like Dr. Fair’s, had a decent result, these cases were the exceptions, and the standard error for the 10 models that year was 11.3 points.
The big miss in 2000 sparked a lively debate in the literature (I would recommend the papers by Larry M. Bartels and Morris Fiorina on the subject). Nevertheless, the forecasters continued undeterred; in fact, a couple of new forecasts joined the ranks in 2004.
That year was a little better than 2000 — only one of the 15 models called for a John Kerry win, with the rest predicting Mr. Bush’s re-election. The problem is that these models judge themselves by how close they come to the incumbent candidate’s vote share or margin of victory, and a number of them envisioned a much clearer win for Mr. Bush than he actually received.
In fact, the model that called for a Kerry win (by two-tenths of a percentage point) was one of the better ones given how close the vote was; five others expected Mr. Bush to triumph by double digits. The “fundamentals” models were especially likely to miss high, since they ignored polling that showed a close race all year. The standard error for the 15 models was 7.1 points, similar to 1992 or 1996.
There were even more models in 2008 — 16 of them, counting different versions by the same forecaster. But this didn’t increase the degree of consensus. Instead, the divide in the models was especially wide, with different “fundamentals” models predicting everything from a 7-point win for John McCain to a 16-point win for Barack Obama. Six of the 16 models produced results that fell outside of their 95 percent confidence intervals, mostly to the high side of Mr. McCain’s vote. Their standard error was 7.8 points.
That does not seem to me like amazing accuracy. In every election, some models did well but others did badly, with each cycle featuring at least one model that missed the vote margin by 13 or more points.
In total, 18 of the 58 models — more than 30 percent — missed by a margin outside their 95 percent confidence interval, something that is supposed to happen only one time in 20 (or about three times out of 58).
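To get a rough sense of how unlikely that is, here is a sketch of the arithmetic. It assumes that each model independently has a 5 percent chance of missing outside its interval. That independence is a simplifying assumption made only for this illustration, and it is generous to the models: several of them are forecasting the same elections, so their misses are not really independent.

```python
from math import comb

def binomial_tail(n, k, p):
    """Probability of k or more 'successes' in n independent trials."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))

n_models = 58
observed_misses = 18
claimed_miss_rate = 0.05   # what a 95 percent confidence interval implies

expected_misses = n_models * claimed_miss_rate
observed_rate = observed_misses / n_models

# Chance of 18 or more misses if the intervals were as good as claimed
# (under the independence assumption described above).
tail_probability = binomial_tail(n_models, observed_misses, claimed_miss_rate)

print(f"expected {expected_misses:.1f} misses, observed rate {observed_rate:.0%}")
print(f"chance of {observed_misses}+ misses: {tail_probability:.1e}")
```

Even allowing for the fact that correlated models tend to miss together, a miss rate this far above the claimed 5 percent suggests the confidence intervals were much too narrow.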
Across all 58 models, the standard error was 8 points of vote margin or 4.6 points of incumbent vote share. That was much larger than the error that the models claimed they would have — about twice as large, in fact.
The good news is that one group of models was considerably better than another, so at least there is something to learn from this.
What does not seem to work well is making forecasts based solely on “fundamentals” like economic data without also including polls. The models that ignored polls had a standard error of 9.8 points in predicting the vote margin (or a simple average error of 8.1 points and a margin of error of nearly 20 points). The “horse-race” models were about 40 percent more accurate, with a standard error of 6.9 points and a simple average error of 5.6 points.
The “fundamentals” models, in fact, have had almost no predictive power at all. Over this 16-year period, there has been no relationship between the vote they forecast for the incumbent candidate and how well he actually did — even though some of them claimed to explain as much as 90 percent of voting results.
Averaging the fundamentals models together reduces error slightly but not really enough to make them useful; it improves their explanatory power only to 7 percent from 4 percent. (We will explore this topic of model averaging more in the next installment of this series.)
The horse-race models haven’t been especially accurate, but they have at least gotten us somewhere. The typical horse-race model has explained about 50 percent of voting results during this period (and closer to 70 percent when you take a consensus of them).
There is still some more work to do here — comparing these models against how you would have done by looking at polls alone, looking at simple and unadorned economic variables like G.D.P., or even comparing the models against naïve forecasting techniques like expecting the vote to be split 50-50. The results are not going to be flattering, especially for the fundamentals models. That will be coming in the follow-up article.
However, I do want to pre-empt one potentially incorrect interpretation of this evidence.
I do not mean to suggest that the economy does not matter to elections, or that there is no predictive content in looking at economic variables. As this experiment should show you, the economy assuredly does not account for 90 percent of voting results. But it may well account for half of them. That doesn’t mean these effects are easy to quantify, but you can probably get somewhere — perhaps explaining about 40 percent of election results — by using more sensible techniques. That’s still enough to give a huge tailwind to an incumbent running in a good economy, and represents a big problem for a candidate running into a recession.
To be sure, 40 percent is also much different than 90 percent. (And even that 40 percent will be hard to achieve given the uneven quality of economic data and economic forecasts — it’s more a ceiling than a floor.) I have certainly seen cases of political scientists going too far in playing down the effects of things like campaigns and candidate quality or ideology, but there is a lot of rich research in this area, and the forecasting models described here reflect just a small portion of it.
Why have the fundamentals models had zero predictive power instead of even 40 percent? To be blunt, a lot of them are badly designed. They come up with illogical combinations of variables that fit the noise in the data rather than the signal and that are not robust to the difficulties in measuring the economy. Often, these forecasters are trying to maximize the fit in their software package in ways that directly trade off with predictive accuracy in the real world and that risk undermining their hypothesis.
Some of the models are more sensible. The forecasts made by Robert S. Erikson and Christopher Wlezien, for instance, are done very well and do a good job of accounting for pertinent information without resorting to data-dredging.
The broader point is that we can get into trouble when we exaggerate how much we know about the future. Although election forecasting is a relatively obscure topic, you’ll see the same mistakes in fields like finance and earthquake prediction in which the stakes are much higher.
The book I’ve been working on is all about the mistakes made by forecasters and how to improve upon them. Examples like these of predictions failing badly in the real world are very common, and forecasts that seem too good to be true usually are.
This year looks to be one of those complicated elections, and already the models are producing wildly diverging results — everything from a Republican landslide to a lock-solid Obama win. Be careful when you see these forecasts; the most confident-sounding predictions are often the most likely to fail.