# The Polls Are All Right

## Election polls sometimes get the answer wrong — but they’re about as accurate as they’ve always been.

With the 2018 midterm elections approaching, we’ve updated FiveThirtyEight’s pollster ratings for the first time since the 2016 presidential primaries. Based on how the media portrayed the polls after President Trump’s victory over Hillary Clinton later that year, you might expect pollsters to get a pretty disastrous report card.

But here’s a stubborn and surprising fact — and one to keep in mind as midterm polls really start rolling in: Over the past two years — meaning in the 2016 general election and then in the various gubernatorial elections and special elections that have taken place in 2017 and 2018 — the accuracy of polls has been pretty much average by historical standards.

You read that right. Polls of the November 2016 presidential election were about as accurate as polls of presidential elections have been on average since 1972. And polls of gubernatorial and congressional elections in 2016 were about as accurate, on average, as polls of those races since 1998. Furthermore, polls of elections *since* 2016 — meaning, the 2017 gubernatorial elections and the various special elections to Congress this year and last year — have been slightly *more* accurate than average. This isn’t just a U.S. phenomenon: Despite often inaccurate and innumerate criticism over how polling fared in events like Brexit, a recent, comprehensive study of polling accuracy by Professor Will Jennings of the University of Southampton and Professor Christopher Wlezien of the University of Texas at Austin found polling accuracy has been fairly consistent over the past several decades in a variety of democratic countries in Europe, Asia and the Americas.

The media narrative that polling accuracy has taken a nosedive is mostly bullshit, in other words. Polls were never as good as the media assumed they were before 2016 — and they aren’t nearly as bad as the media seems to assume they are now. In reality, not that much has changed.

That’s not to say there aren’t reasons for concern. National polls were pretty good in the 2016 presidential election, but state-level polling was fairly poor (although still within the “normal” range of accuracy). Polls of the 2016 presidential primaries were sometimes way off the mark. And in many recent elections, the polls were statistically biased in one direction or another — there was a statistical bias toward Democrats in 2016, for instance.

There’s also reason to worry about what’s going *into* the polls as response rates to polls decline and as newsrooms cut their budgets for traditional, high-quality surveys. Internet-based polling may eventually be a part of the solution, but for the most part, it was quite inaccurate in 2016 (we’ll go into more detail on this point in another article later this week). Let’s dig further into the evidence:

FiveThirtyEight’s pollster ratings database, which is available for download, includes all polls in the final 21 days of gubernatorial and congressional elections since 1998 and presidential primaries and general elections since 2000. It also includes polling of special elections for these offices, such as the race in Pennsylvania’s 18th Congressional District in March. We’ve also done a bit of cleanup on the pre-2016 polls in our database (see the footnotes for details). In total, the database contains more than 8,500 polls.

Our preferred way to evaluate poll accuracy is simply to compare the margin in the poll against the actual result. For instance, if a poll showed the Democrat winning by 2 percentage points in a race that the Republican ended up winning by 3 points, that would be a 5-point error. It would *also* be a 5-point error if the Democrat won by 7 points. We consider these errors to be equally bad even though the pollster “called” the winner correctly in one case and failed to do so in the other.
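To make the arithmetic concrete, here is a minimal sketch of that error measure. The function name and the sign convention (positive margins mean the Democrat leads, negative margins mean the Republican leads) are assumptions made for illustration, not part of the pollster ratings code.

```python
def poll_error(poll_margin: float, actual_margin: float) -> float:
    """Absolute error on the margin, in percentage points.

    Assumed convention: margins are Democrat minus Republican,
    so +2.0 means "Democrat by 2" and -3.0 means "Republican by 3".
    """
    return abs(poll_margin - actual_margin)


# Poll: Democrat +2. Result: Republican +3. The "call" was wrong.
print(poll_error(2.0, -3.0))  # 5.0

# Poll: Democrat +2. Result: Democrat +7. The "call" was right.
print(poll_error(2.0, 7.0))   # 5.0
```

Both polls score the same 5-point miss, which is the point: the measure ignores who was “called” the winner and looks only at distance from the actual margin.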

In the table below is the average error in different types of races and different election cycles since 1998. A few methodological notes as you browse through it: Special elections and off-year elections are grouped with the next even-numbered year; for instance, the 2009 Virginia gubernatorial race is included as part of the 2009-10 political cycle. Pollsters that are banned by FiveThirtyEight because we know or suspect that they faked their data or we are otherwise not confident in the legitimacy of their polling operation are not included in the averages. And some polling firms are considerably more prolific than others — for instance, Gravis Marketing conducted 33 polls of the 2016 presidential general election that count in our database, while the Iowa-based Selzer & Co. conducted just three. To partly counteract this, the averages are weighted in such a way that the highly prolific firms don’t dominate too much.
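The footnotes describe the weighting as sqrt(n)/n per poll, where n is the number of polls a firm conducted in a given category. A rough sketch of how such a weighted average could be computed follows; the function names and the input layout (a list of firm and error pairs) are hypothetical, and the sample errors in the test data are invented purely to show the mechanics.

```python
import math
from collections import Counter


def per_poll_weight(n: int) -> float:
    """Weight given to each poll from a firm with n polls in a category."""
    return math.sqrt(n) / n


def weighted_average_error(polls: list[tuple[str, float]]) -> float:
    """Average error where each poll is weighted by sqrt(n)/n for its firm.

    polls: (firm_name, absolute_error) pairs for one category of race.
    """
    counts = Counter(firm for firm, _ in polls)
    weights = [per_poll_weight(counts[firm]) for firm, _ in polls]
    weighted_sum = sum(w * err for w, (_, err) in zip(weights, polls))
    return weighted_sum / sum(weights)


# The two firms mentioned above: 33 polls versus 3 polls.
print(round(per_poll_weight(33), 2))  # 0.17
print(round(per_poll_weight(3), 2))   # 0.58

# Aggregate weight per firm is n * sqrt(n)/n = sqrt(n).
print(round(math.sqrt(33) / math.sqrt(3), 1))  # 3.3
```

Under this scheme a firm with 11 times as many polls ends up with only about three times the total influence, rather than 11 times.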

##### How accurate have U.S. polls been?

Weighted-average error in polls in final 21 days of the campaign

There’s quite a lot of info to digest in that table. But it’s worth starting with the number in the bottom right corner: The average error in *all* polls conducted in the late stage of campaigns since 1998 is about 6 percentage points. If the *average* error is 6 points, that means the true, empirically derived *margin of error* (or 95 percent confidence interval) is closer to 14 or 15 percentage points! That’s much more than you’d infer from the margins of error that pollsters traditionally list, which consider only sampling error and not other potential sources of error and which pertain only to one candidate’s vote share and not the margin between the candidates.
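One way to see where a figure like 14 or 15 points can come from is sketched below. It assumes, as a simplification that the article does not state, that polling errors on the margin are roughly normally distributed and centered on zero. Under that assumption, the average absolute error equals the standard deviation times sqrt(2/π), so the standard deviation can be backed out and multiplied by 1.96 to get a 95 percent interval.

```python
import math


def empirical_margin_of_error(average_abs_error: float, z: float = 1.96) -> float:
    """Approximate 95% margin of error implied by an average absolute error.

    Assumption (for illustration only): errors are normal with mean zero,
    so mean |error| = sigma * sqrt(2 / pi).
    """
    sigma = average_abs_error * math.sqrt(math.pi / 2)
    return z * sigma


print(round(empirical_margin_of_error(6.0), 1))  # 14.7
```

Real polling errors have fatter tails than a normal distribution in some types of races, so this should be read as a back-of-the-envelope check rather than the exact calculation behind the published figure.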

This means that you shouldn’t be surprised when a candidate who had been trailing in the polls by only a few points wins a race. And in some cases, even a poll showing a 10-, 12- or 14-point lead isn’t enough to make a candidate’s lead “safe.”

In other cases, you can expect a bit more accuracy. For instance, you may have more than one poll, and polling averages are more accurate than individual polls. The average isn’t foolproof — it doesn’t help when all the polls miss in the same direction — but you’re usually better off taking your chances with it than with individual surveys. And polls are slightly more accurate in the final few *days* of the campaign than in the final few weeks, although the accuracy gains are more modest than you might assume.
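A small sketch can show why averaging helps but isn’t foolproof. It assumes a simple two-part error model that is not taken from our database: every poll in a race shares one common error (the part that makes all polls miss in the same direction), and each poll also has its own independent error. The numbers used are hypothetical.

```python
import math


def average_error_sd(shared_sd: float, independent_sd: float, n_polls: int) -> float:
    """Standard deviation of the error of an average of n_polls polls.

    Assumed model: error = shared component + independent component.
    Averaging shrinks only the independent part.
    """
    return math.sqrt(shared_sd**2 + independent_sd**2 / n_polls)


# Hypothetical inputs: 3 points of shared error, 4 points of per-poll noise.
print(round(average_error_sd(3.0, 4.0, 1), 2))   # 5.0
print(round(average_error_sd(3.0, 4.0, 16), 2))  # 3.16
```

In this toy setup, going from one poll to 16 cuts the typical error from 5 points to a little over 3, but no number of polls can push it below the 3 points of shared error.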

Polling error also varies based on the type of election. The general rules of thumb are that polling error in the primaries is *much* greater than in the general election and that polling error increases the further down the ballot you go. Thus, polls for U.S. House races are more error-prone than gubernatorial or U.S. Senate polls, which in turn are more error-prone than presidential election polls. This is important to remember once you start seeing polls of House races later this year. While some will be spot-on, many others will be off by 5 or 10 points or even more — and this will be perfectly “normal.”

This is a lot of words to spend without addressing the question of how the polls performed in 2016. In the case of House, Senate and gubernatorial polls — as the table shows — the answer is a pretty straightforward “about average” (and in the case of House polls, maybe even slightly better than average).

It’s also relatively easy to address the case of presidential primary polls: They were pretty darn bad in 2016, with an average error of 10.1 percentage points. Polling the primaries is hard — the average polling error in all presidential primaries since 2000 is 8.7 percentage points. But primary polls aren’t usually as bad as they were in 2016. Because voting in general elections operates along increasingly predictable demographic lines, pollsters can use demographic weighting to make up for other problems in their samples. They don’t always have that luxury in the primaries, where demographic coalitions are more fluid and turnout is more difficult to model. Polling in the 2020 primaries could be a pretty wild ride.

Polling of the 2016 presidential general election is the trickiest case to evaluate. The average error was 4.8 percentage points — slightly higher than in 2000 (4.4 points) and considerably higher than in 2004 (3.2 points), 2008 (3.6 points) or 2012 (3.6 points).

However, the error was about average as compared to the long-term accuracy of presidential polls. Our 2016 presidential election model gave Trump a much better chance than other forecasts did, in part because it derived its probabilities based on polls from elections dating back to 1972 (not just since 2000). Our data from those earlier election cycles isn’t quite as comprehensive or well-curated as the stuff in our official pollster ratings database. But it’s certainly good enough for a comparison on an aggregate basis. Below are the error calculations for polls of presidential elections dating back to 1972. I’ve listed the error for state polls and national polls separately and combined.

##### 2016’s presidential polls were about as accurate as average

Weighted-average error in polls in final 21 days of the campaign

On average since 1972, polls in the final 21 days of presidential elections have missed the actual margins in those races by 4.6 percentage points, almost exactly matching the 4.8-point error we saw in 2016. As we tried to emphasize before the election, it didn’t take any sort of extraordinary, unprecedented polling error for Trump to defeat Clinton. An ordinary, average polling error would do — one where Trump beat his polls by just a few points in just a couple of states — and that’s the polling error we got.

That error was concentrated much more in state polls, which missed by an average of 5.2 percentage points, than in national polls, which missed by just 3.1 percentage points. This is somewhat typical, as national polls have been more accurate than state polls over the long run. The gap in 2016 was larger than usual, however. Polls underestimated Trump’s margin in states with large numbers of white voters without college degrees, but they also underestimated *Clinton* in states with large non-white or college-educated populations such as California. At the national level, these errors somewhat canceled each other out, but not so much at the state level.

But even the state polling errors were well within the normal range. Their 5.2-point average error isn’t far from the 4.8-point error that state polls have had on average since 1972. It wasn’t a year like 1980, when both state polls and national polls were off by almost 9 points, incorrectly showing a near dead heat between Jimmy Carter and Ronald Reagan (Reagan won the Electoral College 489-49).

Two other factors undoubtedly contributed to the widespread criticism about how polls performed in 2016.

One is that people had gotten spoiled by recent presidential elections. When looked at in historical context, what stands out isn’t that polling in 2016 was unusually poor, but that polling of the 2004, 2008 and 2012 presidential races was uncannily good — in a way that may have given people false expectations about how accurate polling has been all along.

The other factor is that the error was more *consequential* in 2016 than it was in past years, since Trump narrowly won a lot of states where Clinton was narrowly ahead in the polls. By contrast, in 2012, the polls somewhat underestimated Barack Obama’s numbers in several swing states as well as in the national popular vote. (National polls were actually a bit *more* accurate in 2016 than in 2012.) But it didn’t usually change the winner in these contests — Obama just won them by a clearer margin instead of a narrower one.

In the table below, you can see what percentage of races were “called” correctly in different election years. (A “call” is correct if the candidate who is leading in the poll wins the race.) Over the long run, polls get the winner right about 80 percent of the time. That accuracy rate was just 71 percent in the 2016 presidential election, however. Down-ballot polls — House, Senate and gubernatorial — also had a bad year.
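For readers who want to see the scoring rule spelled out, here is a simplified sketch. It assumes a two-candidate race with signed margins (positive means the first candidate leads), applies the half-credit rule for tied polls described in the footnotes, and takes a plain unweighted share rather than the weighted average used in the table. The function names and sample races are hypothetical.

```python
def call_score(poll_margin: float, actual_margin: float) -> float:
    """1.0 if the poll leader won, 0.5 if the poll showed a tie, else 0.0.

    Assumes a two-candidate race whose actual result is not a tie.
    """
    if poll_margin == 0:
        return 0.5
    poll_leader_won = (poll_margin > 0) == (actual_margin > 0)
    return 1.0 if poll_leader_won else 0.0


def share_called_correctly(polls: list[tuple[float, float]]) -> float:
    """Unweighted share of (poll_margin, actual_margin) pairs called right."""
    return sum(call_score(p, a) for p, a in polls) / len(polls)


# Four hypothetical polls: one wrong call, two right calls, one tie.
sample = [(2.0, -3.0), (2.0, 7.0), (0.0, 4.0), (-5.0, -1.0)]
print(share_called_correctly(sample))  # 0.625
```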

##### How often do polls “call” races correctly?

Weighted-average share of polls that correctly identify the winner in final 21 days of the campaign

But this is definitely *not* how FiveThirtyEight recommends evaluating pollster accuracy. In a true toss-up race, where public opinion is split evenly, a poll is going to be “wrong” approximately 50 percent of the time no matter what it says. There have been a lot of close elections recently — and polls get too much criticism when they correctly point toward a close race but the outcome goes against the media’s expectations. Polls often get too little criticism, however, when they “call” the winner correctly but are way off on the margin in landslide elections, such as in the French presidential election last year.

A more serious concern is that polls are sometimes statistically biased in one direction or another. We measure statistical bias by accounting for the direction of the polling error — for instance, if a poll shows the Democrat winning by 9 percentage points and she actually wins by 4 points, that poll is biased in the Democrat’s favor by 5 percentage points.
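The difference between error and bias is just the difference between an absolute value and a signed one. The sketch below assumes margins are expressed as Democrat minus Republican, so a positive bias means the poll overstated the Democrat; the function names and the second sample poll are hypothetical.

```python
def statistical_bias(poll_margin: float, actual_margin: float) -> float:
    """Signed error in points: positive means the poll favored the Democrat.

    Assumed convention: margins are Democrat minus Republican.
    """
    return poll_margin - actual_margin


def mean(values: list[float]) -> float:
    return sum(values) / len(values)


# The example above: poll says Democrat +9, she wins by 4.
print(statistical_bias(9.0, 4.0))  # 5.0

# Two hypothetical polls of the same race that miss in opposite directions.
polls = [(9.0, 4.0), (-1.0, 4.0)]
biases = [statistical_bias(p, a) for p, a in polls]
print(mean(biases))                    # 0.0
print(mean([abs(b) for b in biases]))  # 5.0
```

The last two lines show why both measures matter: a set of polls can have a large average error and still show no net bias if the misses cancel out.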

In the 2016 general election, polls had a pro-Democratic bias of about 3 percentage points. This was fairly consistent across presidential, gubernatorial and congressional races; Trump outperformed his polls, but Republican candidates for Congress and governor did so by just as much. Polls also had a pro-Democratic bias in 2014.

##### Polling bias shifts from election to election

Weighted-average statistical bias in polls in final 21 days of the campaign

But the bias tends to shift unpredictably from election to election. Polls had a pro-Republican bias in 2012, for example. They’ve also had a pro-Republican bias in elections so far in 2017 and 2018. We strongly encourage readers to remember that polling error can occur in both directions and that it’s almost impossible to predict which direction in advance. There have been cases, such as in last year’s U.K. general election, in which pollsters overcompensated for past errors and introduced new ones that caused polls to miss in the opposite direction. Over the long run, U.S. election polls have had very little overall bias toward either Democrats or Republicans.

I recognize that to some readers, this will have seemed like an overly sanguine view of the state of the polling industry. But it’s not that we don’t have concerns; in fact, we’ve been concerned about problems like declining response rates for a long time.

Nonetheless, those concerns are not particularly larger or smaller than they were a few years ago because *polling performance has been about average for the past few years*. While there have been some genuine trouble spots, like polling in the 2016 presidential primaries, overall there simply hasn’t been a clear trend toward polls becoming either more accurate or less accurate over time. Polling continues to present new challenges, but pollsters also continue to learn from their mistakes and make improvements to their methods.

Media organizations need to do a better job of informing their readers about the uncertainties associated with polling and of distinguishing cases in which polls were within a few percentage points of the correct result, like in the Brexit vote, from true polling failures, like the 2016 Michigan Democratic primary. But they also need to recognize that polls pick the winner correctly most of the time — about 80 percent of the time — and that the media’s attempts to outguess the polls have a really bad track record. Since 2016, we’ve already seen examples in which the media overcompensated for its past failures by mischaracterizing polls or refusing to draw any conclusions from them at all, badly misinforming its readers.

Finally, it’s worth re-emphasizing that while the House will probably be “in play” this November for the first time since 2010, polling of individual House races is historically some of the *least* accurate polling. House polls only identify the right winner about 70 percent of the time, as compared to 80 percent of the time in the other types of elections we track. Given a long list of potential Democratic pickups, it’s likely that the outcomes of dozens of House races will be at least somewhat uncertain heading into Election Day. Be wary of news accounts and of statistical models that claim to be able to forecast the number of Democratic pickups within just a few seats; that sort of precision isn’t realistic. Instead, this will be another election in which it’s important to think probabilistically about a fairly broad range of outcomes.

There were some exceptions such as YouGov.

As measured by the median field date of the poll. In some cases in the primaries, a cutoff of fewer than 21 days is used; see our methodology for more detail.

That is, races for both the U.S. Senate and U.S. House, including generic ballot polls. For generic ballot polls, we compare the margin in the poll against the aggregate popular vote for the U.S. House.

The largest change is that we’ve removed a series of U.S. House polling that YouGov conducted of congressional districts in 2014 from our database. This is because these were not polls in a traditional sense but instead polling data blended with regression-model based forecasts. FiveThirtyEight considers techniques like these to be models rather than “polls,” and we do not include them in our own forecasts or polling averages. In addition, we’ve removed roughly two dozen duplicates from our polling database, fixed a few typos, changed the attribution of a few polls and updated the names of a few polling firms to reflect how they currently describe themselves.

In races with multiple candidates, we evaluate the margin between the top two finishers.

Specifically, the weights are based on the *square root* of the number of polls in a particular category that each firm conducted. Each poll receives a weight of sqrt(n)/n, where “n” is that firm’s number of polls in that category. Each Gravis Marketing poll, for example, therefore has a weight of approximately 0.17, while each Selzer & Co. poll has a weight of approximately 0.58. However, because Gravis Marketing conducted much more polling, its polls still have more weight in the aggregate (about three times more in this example).

Polls in the final seven days have an average error of 5.5 percentage points, compared with the 5.9 points for polls in the final 21 days.

It’s worth remembering that most polls correctly predicted that Clinton would win the popular vote.

Pollsters get half-credit if they show a tie for the lead and one of the leading candidates wins.

Statistical bias doesn’t necessarily have anything to do with partisan bias; some media outlets that are accused of having a pro-Republican bias in their coverage have had a pro-Democratic statistical bias in their polls, for example.

This is perhaps evidence against the theory of the “shy” Trump voter.

If anything, the error is often in the opposite direction of what most people expect.

This past week, for example, The Guardian characterized Ireland’s vote on its abortion referendum as “close” even though polls this month showed the “yes” side leading by margins ranging from 11 to 29 points. (A “yes” vote would authorize the Irish parliament to legalize abortion.) In fact, “yes” won the referendum by 33 percentage points.