Presidential elections are high-stakes affairs. So perhaps it is no surprise that when supporters of one candidate do not like the message they are hearing from the polls they tend to blame the messenger.
In 2004, Democratic Web sites were convinced that the polls were biased toward George W. Bush, asserting that they showed an implausible gain in the number of voters identifying as Republicans. But in fact, the polls were very near the actual result. Mr. Bush defeated John Kerry by 2.5 percentage points, close to (in fact just slightly better than) the 1- or 2-point lead that he had on average in the final polls. Exit polls that year found an equal number of voters describing themselves as Democrats and Republicans, also close to what the polls had predicted.
Since President Obama gained ground in the polls after the Democrats’ convention, it has been the Republicans’ turn to make the same accusations. Some have said that the polls are “oversampling” Democrats and producing results that are biased in Mr. Obama’s favor. One Web site, unskewedpolls.com, contends that even Fox News is part of the racket in what it says is a “trend of skewed polls that oversample Democratic voters to produce results favorable for the president.”
The criticisms are largely unsound, especially when couched in terms like “oversampling,” which implies that pollsters are deliberately rigging their samples.
But pollsters, at least if they are following the industry’s standard guidelines, do not choose how many Democrats, Republicans or independent voters to put into their samples — any more than they choose the number of voters for Mr. Obama or Mitt Romney. Instead, this is determined by the responses of the voters they reach after dialing randomly generated telephone numbers or calling numbers drawn from registered voter lists.
Pollsters will re-weight their numbers if the demographics of their sample diverge from Census Bureau data. For instance, it is typically more challenging to get younger voters on the phone, so most pollsters weight their samples by age to remedy this problem.
But party identification is not a hard-and-fast demographic characteristic like race, age or gender. Instead, it can change in reaction to news and political events from the party conventions to the Sept. 11 attacks. Since changes in public opinion are precisely what polls are trying to measure, it would defeat the purpose of conducting a survey if pollsters insisted that they knew what it was ahead of time.
Although the focus on “oversampling” and party identification is misplaced, FiveThirtyEight does encourage a healthy skepticism toward polling. Polling is difficult, after all, in an era in which even the best pollsters struggle to get 10 percent of households to return their calls — and then have to hope that the people who do answer the surveys are representative of those who do not.
So perhaps we should ask a more fundamental question: Do the polls have a history of being biased toward one party or the other?
The polls have no such history of partisan bias, at least not on a consistent basis. There have been years, like 1980 and 1994, when the polls did underestimate the standing of Republicans. But there have been others, like 2000 and 2006, when they underestimated the standing of Democrats.
We have an extensive database of thousands of polls of presidential and United States Senate elections. For the presidency, I will be using all polls since 1972, which is the point at which state-by-state surveys became more common and our database coverage becomes more comprehensive. For the Senate, I will be using all polls since 1990.
The analysis that follows is straightforward: I’ll take a simple average of the polls conducted in the final 21 days of each campaign and compare it against the actual results. There are just two restrictions.
First, I will be looking only at polls of likely voters. Polls of registered voters, or of all adults, typically will overstate the standing of Democratic candidates, since demographic groups like Hispanics that lean Democratic also tend to be less likely to turn out in most elections. (The FiveThirtyEight forecast model shifts polls of registered voters by 2.5 percentage points toward Mr. Romney for this reason.)
Second, the averages are based on a maximum of one poll per polling firm in each election. Specifically, I use the last poll that each conducted before the election. (Essentially, this replicates the methodology of the Real Clear Politics polling average.)
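The two rules above — likely-voter polls only, and one poll (the most recent) per firm — can be sketched in a few lines of Python. The poll records and field names here are hypothetical illustrations, not FiveThirtyEight’s actual data or schema:

```python
from datetime import date

# Hypothetical poll records; "lv" = likely voters, "rv" = registered voters.
polls = [
    {"firm": "A", "end_date": date(2004, 10, 20), "population": "lv", "dem": 47.0, "rep": 49.0},
    {"firm": "A", "end_date": date(2004, 10, 30), "population": "lv", "dem": 48.0, "rep": 50.0},
    {"firm": "B", "end_date": date(2004, 10, 28), "population": "rv", "dem": 49.0, "rep": 48.0},
    {"firm": "C", "end_date": date(2004, 10, 25), "population": "lv", "dem": 47.5, "rep": 49.5},
]

def final_average(polls, election_day, window_days=21):
    """Average the Dem-minus-Rep margin over the final window, keeping only
    likely-voter polls and one poll (the last) per polling firm."""
    cutoff = election_day.toordinal() - window_days
    eligible = [p for p in polls
                if p["population"] == "lv" and p["end_date"].toordinal() >= cutoff]
    last_by_firm = {}
    for p in sorted(eligible, key=lambda p: p["end_date"]):
        last_by_firm[p["firm"]] = p  # later polls overwrite earlier ones from the same firm
    margins = [p["dem"] - p["rep"] for p in last_by_firm.values()]
    return sum(margins) / len(margins)

margin = final_average(polls, election_day=date(2004, 11, 2))
```

In this toy example, firm A’s earlier poll and firm B’s registered-voter poll are both discarded, so the average is taken over just two surveys.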
Let’s begin by looking at the results of national polls for the presidential race.
In the 10 presidential elections since 1972, there have been five years (1976, 1980, 1992, 1996 and 2004) in which the national presidential polls overestimated the standing of the Democratic candidate. However, there were also four years (1972, 1984, 1988 and 2000) in which they overestimated the standing of the Republican. Finally, there was 2008, when the average of likely voter polls showed Mr. Obama winning by 7.3 percentage points, his exact margin of victory over John McCain, to the decimal place.
In all but three years, the partisan bias in the polls was small, with the polling average coming within 1.5 percentage points of the actual result. (I use the term “bias” in a statistical sense, meaning simply that the results tended to miss toward one direction.)
The first major exception was 1980, when late polls showed Ronald Reagan leading Jimmy Carter by only two or three percentage points on average — but Mr. Reagan won by almost 10 points. There were some complicating factors that year: the first and only debate between Mr. Carter and Mr. Reagan was held very late in the election cycle, perhaps too late to be captured by the polls. In addition, that race featured an independent candidate, John Anderson, and independent and third-party candidates contribute significantly to polling volatility. And some private polls of the campaign showed Mr. Reagan with a much wider advantage.
Still, it is hard to make too many excuses for the polls: 1980 was probably the worst year for them since 1948, when the Gallup poll favored the Republican candidate, Gov. Thomas E. Dewey of New York, but the Democratic incumbent, Harry S. Truman, won instead.
In 1980, the miss was in Mr. Reagan’s favor, meaning that the polls had a Democratic bias. But you do not have to go back to 1948 to find a year when they had a Republican bias instead. In 2000, national polls showed George W. Bush winning the popular vote by about three percentage points — but Al Gore narrowly won the popular vote.
The other year in which the polls were reasonably poor was 1996, when most of the national polls projected Bill Clinton to win re-election by double digits, but he defeated Bob Dole by 8.5 percentage points. The results received little attention since Mr. Clinton’s victory was not in any real doubt before or after the election. But the polls had a Democratic bias that year, as they had in 1980.
Over the long term, however, the polls have been about as likely to miss in either direction. Since 1972, they have overestimated the Democratic candidate’s margin by an average of 0.9 percentage points, and by a median of 0.3 percentage points. These errors are so modest that they cannot really be distinguished from statistical noise.
We can also look for signs of bias in the state-by-state presidential polls. Since 1972, there have been 146 instances in which a state had at least one poll conducted in the final three weeks of the campaign.
I took the average of late polls in each state, using the same rules as for the national polls (one poll per firm, and only likely voter polls). Then I took the average of these state polling averages, comparing them against the actual results in states where there was at least some late polling.
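This two-step procedure — first averaging the late polls within each state, then averaging the polling errors across states — can be sketched as follows. The state names, poll margins and results are made-up illustrations, not the actual historical data:

```python
# Illustrative per-state poll margins (Dem minus Rep) and actual results,
# for states with at least some late polling. Not real 1980 numbers.
state_polls = {
    "StateX": [-2.0, -4.0],         # two late polls in this state
    "StateY": [-3.0],
    "StateZ": [-1.0, -3.0, -2.0],
}
state_results = {"StateX": -12.0, "StateY": -12.0, "StateZ": -11.0}

def mean(xs):
    return sum(xs) / len(xs)

# Step 1: one polling average per state.
state_avgs = {s: mean(margins) for s, margins in state_polls.items()}

# Step 2: the bias is the average, across states, of (poll average - actual result).
# A positive number means the polls overstated the Democratic candidate.
bias = mean([state_avgs[s] - state_results[s] for s in state_avgs])
```

With these invented numbers, every state average overstates the Democrat by 9 points, mimicking (in exaggerated form) the pattern of the 1980 state polls, which understated Mr. Reagan’s margin.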
The state polls do not eliminate the problem for 1980. In the six states where there were late polls, Mr. Reagan led by an average of 3 percentage points — but he won by a much wider margin, 12 percentage points, on average in these states.
In 1996, however, the state polls did not show any bias toward Mr. Clinton, even though the national polls did. This is one reason why we say that state polls can be informative about the national campaign. Sometimes, a “bottom-up” strategy of adding the results from individual states will produce a better estimate of the national popular vote than national polls do themselves.
Similarly, in 2000, the state polls were less biased than the national polls. They underestimated Mr. Gore’s standing by about one percentage point on average, better than the three-point Republican bias in the national surveys. (Ironically, the speculation before the 2000 election was that Mr. Gore might win the Electoral College despite losing the popular vote — exactly the opposite of what happened.)
Over all, the state polls have had little bias. Since 1972, they have overestimated the standing of the Democratic candidate by an average of half of a percentage point.
We can also evaluate whether there was bias in the polls of Senate races. In some ways, this is a much richer data set, since there are different candidates and different conditions in each of the 33 or 34 states that hold Senate contests every two years. If there is a persistent Democratic or Republican bias in the polls that transcends fluke circumstances, we might expect it to show up in the Senate data.
As in the case of presidential polls, there have been years in which most of the Senate polls missed in the same direction. Senate polls had a Democratic bias in 1992 and 1994 but a Republican bias in 1998, 2000 and 2006.
(A very modest Republican bias also shows up in 2010. The two Senate races that the FiveThirtyEight forecasts “called” wrong that year were in Colorado and Nevada, where the polls favored the Republican candidates but the Democrats won instead.)
But as in the case of the presidential polls, the years in which the Senate polls missed in either direction have tended to cancel one another out. On average across 240 Senate races since 1990, the polls have had a Republican bias of just 0.4 percentage points, a trivial number that is of little meaning statistically.
On the whole, it is reasonably impressive how unbiased the polls have been. In both presidential and Senate races, the bias has been less than a full percentage point over the long run, and it has run in different directions in different years.
That does not mean the pollsters will necessarily get this particular election right. Years like 1980 suggest that there are sometimes errors in the polls that are much larger than can be explained through sampling error alone. The probability estimates you see attached to the FiveThirtyEight forecasts are based on how the polls have performed historically in practice, and not how well they claim to do in theory.
But if there is such an error, the historical evidence suggests that it is about equally likely to run in either direction.
Nor is there any suggestion that polls have become more biased toward Democratic candidates over time. Out of the past seven election cycles, the polls had a very slight Republican bias in 2010, and a more noticeable Republican bias in 1998, 2000 and 2006.
They had a Democratic bias only in 2004, and it was very modest.
Still, as 2004 demonstrated, accusations of skewed polling are often rooted in wishful thinking.