Two years ago, in advance of the 2014 midterms and in conjunction with the release of FiveThirtyEight’s pollster ratings, I wrote an article headlined “Is The Polling Industry In Stasis Or In Crisis?” It pointed out a seeming contradiction: Although there were lots of reasons to be worried about the reliability of polling — in particular, the ever-declining response rates for telephone surveys — there wasn’t much evidence of a crisis in the results pollsters were obtaining. Instead, the 2008, 2010 and 2012 election cycles had all featured fairly accurate polling.

Has the reckoning come for the polls since then? The evidence is somewhat mixed. The 2014 midterm elections were not a banner year for pollsters, with most polls showing a statistical bias toward Democrats (reversing their statistical bias toward Republicans in 2012). As a result, there were a handful of major upsets by Republican candidates, along with a few near misses. Still, the error in the polls was reasonably in line with historical norms. It wasn’t the disaster that pollsters have experienced in other countries, such as the United Kingdom and Greece, or even in previous U.S. midterm elections, such as 1994 and 1998.

**Pollster Ratings:**
We’ve analyzed the historical accuracy of more than 350 polling agencies and rated them according to performance and methodology.

If the 2014 midterm polls were a little better than reputed, however, the reverse might be true of the 2016 presidential primary polls. Importantly, the polls (and even more so, the polling averages) had a good track record of calling winners, with the polling front-runner winning the vast majority of the time. Furthermore, the polls caught wind of Donald Trump’s popularity among Republicans early in the cycle, even as a lot of journalists (including, uhh, yours truly) were deeply skeptical about his chances. But the margins were often pretty far off, especially in the Democratic race, with Hillary Clinton often blowing away her polling numbers in the South and Bernie Sanders often doing so elsewhere in the country. And although there weren’t many upsets, at least one of them, Sanders’s win in Michigan, was historically epic.

Don’t take our word for it, though: We’d encourage you to explore the data for yourself. We’ve just released a new set of pollster ratings, based on data up through and including the Oregon presidential primary on May 17. We’ve also published the raw data behind these ratings: more than 7,900 polls conducted in the final three weeks before presidential primaries and caucuses and general elections for president, governor, and the U.S. Senate and House since 1998.

The methodology we use to calculate the pollster ratings is highly similar to the procedures we followed to generate our 2014 ratings, with a handful of exceptions that I describe in the footnotes.

As before, the ratings are based both on a pollster’s past accuracy and on two easily measurable methodological standards:

- The first standard is whether the firm participates in the American Association for Public Opinion Research Transparency Initiative, is a member of the National Council on Public Polls or contributes its data to the Roper Center for Public Opinion Research archive. Polling firms that do one or more of these things generally abide by industry-standard practices for disclosure, transparency and methodology and have historically had more accurate results.
- The second standard is whether the firm usually conducts its polls by placing telephone calls with live interviewers and calls cellphones as well as landlines. Automated polls (“robopolls”), which are legally prohibited from calling cellphones, do not meet this standard even if they use hybrid or mixed-mode methodologies (for example, robocalling landlines and then supplementing with cellphone calls placed by live interviewers). It’s increasingly essential to call cellphones given that about half of American households no longer have a home landline. Although internet polls show promise as a potential alternative, they do not yet have a long enough or consistent enough track record to be placed on the same pedestal as high-quality, live-interview telephone polls, based on our view of the evidence.

But enough about methodology; let’s return to the question of how the polls fared in the 2014 midterms and 2016 presidential primaries.

First, here’s a calculation we call Simple Average Error. It measures the difference between the percentage of the vote separating the top two finishers in the election and the margin shown by the polls. For instance, if a poll had projected Trump to beat Ted Cruz by 2 percentage points in an election and Trump won by 10 points, that would count as an 8 percentage point error. Likewise, it would count as an 8-point error if Trump had been projected to beat Cruz by 2 points but lost to him by 6 instead.
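As a minimal Python sketch (not FiveThirtyEight’s production code), Simple Average Error can be computed from pairs of signed margins. The function names are hypothetical; margins are signed from one candidate’s perspective, so a Trump lead of 2 points is `2` and a 6-point Cruz win is `-6`.

```python
from __future__ import annotations


def poll_error(poll_margin: float, actual_margin: float) -> float:
    """Absolute error on the margin between the top two finishers.

    Both margins are signed from the same candidate's perspective
    (positive means that candidate leads), in percentage points.
    """
    return abs(poll_margin - actual_margin)


def simple_average_error(pairs: list[tuple[float, float]]) -> float:
    """Mean absolute margin error over (poll_margin, actual_margin) pairs."""
    if not pairs:
        raise ValueError("need at least one poll")
    return sum(poll_error(p, a) for p, a in pairs) / len(pairs)


# The two examples from the text, with margins signed as Trump minus Cruz:
print(poll_error(2, 10))   # Trump +2 in the poll, won by 10 -> 8
print(poll_error(2, -6))   # Trump +2 in the poll, lost by 6 -> 8
print(simple_average_error([(2, 10), (2, -6)]))  # -> 8.0
```

Because the error is an absolute value, misses in either direction count the same, which is why a separate bias measure is needed to detect polls that consistently miss one way.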

In 2014, the average gubernatorial poll had an error of 4.5 percentage points as defined in this way, and the average Senate poll had an error of 5.4 percentage points. The gubernatorial polls were a bit more accurate than usual and the Senate polls a bit less, but both figures are reasonably in line with historical norms. A bigger problem is that most of these polls missed in the same direction, underestimating the Republican candidate. (We’ll take that point up in a moment.)

House polls are typically less accurate than Senate or gubernatorial polls — the further down the ballot you go, the larger the polling error tends to be — and 2014 was no exception, with the average House poll missing by 7.9 percentage points. That’s not a good result by any stretch of the imagination, although that number is inflated somewhat by a single polling firm, YouGov, which ambitiously released polls of all 435 congressional districts. Most of those polls had small sample sizes of fewer than 200 respondents, and most of them were in noncompetitive districts, which can be difficult to poll. (Our Advanced Plus-Minus and Predictive Plus-Minus calculations adjust for these factors, but Simple Average Error does not.) Excluding YouGov, the average error for House polls was 6.6 percentage points, still a mediocre performance, although similar to past years such as 1998 and 2010.
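To see why samples of fewer than 200 respondents are so noisy, here is a sketch of the textbook sampling error for the margin between two candidates. It assumes a simple random sample and ignores weighting and design effects, so real-world error is larger. Because both shares come from the same sample, the variance of the margin p1 - p2 is (p1 + p2 - (p1 - p2)^2) / n. The function names are illustrative, not part of any published methodology.

```python
import math


def margin_standard_error(p1: float, p2: float, n: int) -> float:
    """Standard error, in percentage points, of the margin p1 - p2.

    p1 and p2 are the two candidates' shares (as fractions) measured in
    the same simple random sample of n respondents (multinomial model),
    so their sampling errors are negatively correlated.
    """
    if n <= 0:
        raise ValueError("sample size must be positive")
    variance = (p1 + p2 - (p1 - p2) ** 2) / n
    return 100 * math.sqrt(variance)


def margin_moe_95(p1: float, p2: float, n: int) -> float:
    """Approximate 95 percent margin of error on the margin (1.96 SEs)."""
    return 1.96 * margin_standard_error(p1, p2, n)


for n in (200, 600, 1000):
    print(n, round(margin_moe_95(0.5, 0.5, n), 1))
```

For an evenly split race this gives a margin of error of roughly 14 points on the margin at n = 200, versus about 6 points at n = 1,000.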

If House races can be tricky to poll, presidential primaries are even harder, for several reasons. Turnout is relatively low, people are sometimes choosing among several similar candidates, and voters often wait until the last minute to make their decisions. So primary polling is almost always a fairly wild ride. Even by that standard, though, the average error in primary polls this year was a whopping 9.4 percentage points, worse than the average of 8.1 percentage points in all presidential primaries since 2000. The problems were worse on the Democratic side, with an average error of 10.6 percentage points, compared with 8.3 percentage points in the Republican race.

And yet, the primary polls have done a pretty good job of picking winners. In 85 percent of polls this year, the candidate leading the poll went on to win the election. That’s much better than in 2012, when the candidate leading in the polls won only 61 percent of the time.
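The "picking winners" rate is simply the share of polls whose leader matched the eventual winner. A minimal sketch, using made-up candidates A, B and C rather than real data:

```python
def leader_hit_rate(polls):
    """Share of polls whose leading candidate went on to win.

    `polls` is a list of (poll_leader, election_winner) pairs. Ties in a
    poll are not handled here; a real analysis would need a rule for them.
    """
    if not polls:
        raise ValueError("need at least one poll")
    hits = sum(1 for leader, winner in polls if leader == winner)
    return hits / len(polls)


# Hypothetical polls, for illustration only.
sample = [("A", "A"), ("A", "B"), ("C", "C"), ("C", "C")]
print(leader_hit_rate(sample))  # -> 0.75
```

Note that this metric ignores the size of the miss entirely: a poll showing A up by 1 and one showing A up by 40 both score a hit if A wins.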

What gives? A lot of primaries and caucuses were lopsided this year, with strongly regional voting patterns; Clinton dominated in the South, and Trump cleaned up in the Northeast, for example. (By contrast, a lot of Republican primaries came down to the wire in 2012.) It’s easier to call the winner correctly, of course, when an election isn’t competitive. On the other hand, it can be harder for pollsters to nail down the margin in such races. A seemingly noncompetitive race can discourage turnout, either allowing the leading candidate to run up the score or, occasionally, the trailing candidate to do much better than expected because the leading candidate’s voters become complacent. As a technical matter, the handling of undecided voters also matters more in noncompetitive races. In a slight change this year, our Advanced Plus-Minus and Predictive Plus-Minus ratings account for the fact that less-competitive races are associated with a larger error, on average. Still, this doesn’t entirely excuse the polls.
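Why undecideds matter more in lopsided races can be sketched with a race where one candidate leads 60-20 with 20 percent undecided. This sketch assumes a two-candidate race (everyone not backing either candidate is undecided); the pollster ratings themselves split undecideds evenly, and the blend weight below is a hypothetical parameter, not a published value.

```python
def margin_even_split(lead: float, trail: float) -> float:
    """Margin after splitting undecideds evenly (the gap is unchanged)."""
    undecided = 100 - lead - trail
    return (lead + undecided / 2) - (trail + undecided / 2)


def margin_proportional_split(lead: float, trail: float) -> float:
    """Margin after splitting undecideds in proportion to current support."""
    decided = lead + trail
    return 100 * (lead - trail) / decided


def margin_blend(lead: float, trail: float, weight_even: float = 0.5) -> float:
    """Weighted blend of the two rules; weight_even is a hypothetical choice."""
    return (weight_even * margin_even_split(lead, trail)
            + (1 - weight_even) * margin_proportional_split(lead, trail))


print(margin_even_split(60, 20))          # 70-30 -> 40.0
print(margin_proportional_split(60, 20))  # 75-25 -> 50.0
print(margin_blend(60, 20))               # -> 45.0
```

In a close race, say 46-44, the two rules land within a fraction of a point of each other, so the choice barely matters.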

Another useful measure of polling performance is statistical bias, which indicates whether polls consistently miss in the same direction. If over a large number of races, for example, your polling firm projects Democratic candidates to win by an average of 5 percentage points, and they win by 2 percentage points instead, that means those polls had a pro-Democratic (and anti-Republican) statistical bias of 3 percentage points. Statistical bias isn’t necessarily an indication of partisan bias; some media outlets that are accused of having a pro-Republican bias in their coverage have a pro-Democratic statistical bias in their polls, and vice versa. But it has nevertheless been a problem in recent years:
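Statistical bias is the same comparison as Simple Average Error but without the absolute value, so misses in opposite directions cancel. A minimal sketch, assuming margins are signed Democrat minus Republican:

```python
def statistical_bias(races):
    """Average signed error, in points, from the Democrat's perspective.

    Each race is (poll_margin, actual_margin), both signed Democrat minus
    Republican. A positive result means the polls overrated Democrats
    (pro-Democratic bias); a negative result means they overrated Republicans.
    """
    if not races:
        raise ValueError("need at least one race")
    return sum(poll - actual for poll, actual in races) / len(races)


# The example from the text: polls say D+5, Democrats win by 2.
print(statistical_bias([(5, 2)]))           # -> 3.0, pro-Democratic

# Unlike absolute error, signed errors in opposite directions cancel.
print(statistical_bias([(5, 2), (-1, 2)]))  # -> 0.0
```

The second example has an average absolute error of 3 points but zero bias, which is why the two measures are reported separately.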

In 2012, the polls had a 2 or 3 percentage point pro-Republican bias, meaning that they underrated how well President Obama would do, along with Democratic candidates in gubernatorial and congressional races. In 2014, by contrast, they had roughly a 3 percentage point pro-Democratic bias. So rather than the merely good Republican year implied by the polls, Republicans had a near-landslide in the midterms.

The good news is that, over the long run, the polls haven’t had much of an overall bias, having underrated Republicans in some elections and Democrats in others. But the bias has shifted around somewhat unpredictably from election to election. You should be wary of claims that the polls are bound to be biased in the same direction that they were two years ago or four years ago.

You should also recognize the potential for statistical bias even (or perhaps especially) if the polls all seem to agree with one another. An uncannily strong consensus among the polls may indicate herding, which means polling firms are suppressing outlier results that appear out of line with the consensus. Good election forecasting models can account for the possibility of herding and statistical bias by assuming that the error in polls is correlated from state to state, but doing so makes the model harder to build.
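One common way to build in correlated error, sketched here under simple assumptions, is to draw a single shared national error in each simulation and add independent state-level noise on top. The standard deviations are made-up illustrative values and the normal distribution is an assumption; this is not FiveThirtyEight’s model. With a shared standard deviation of 3 points and a state-level one of 2, the correlation between two states’ errors should be about 9 / (9 + 4), or roughly 0.69.

```python
import math
import random


def simulate_state_errors(n_states, n_sims, national_sd, state_sd, seed=0):
    """Simulate polling errors as a shared national error plus state noise.

    Returns a list of simulations, each a list of per-state errors (points).
    """
    rng = random.Random(seed)
    sims = []
    for _ in range(n_sims):
        shared = rng.gauss(0, national_sd)  # the same miss hits every state
        sims.append([shared + rng.gauss(0, state_sd) for _ in range(n_states)])
    return sims


def correlation(xs, ys):
    """Pearson correlation of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(vx * vy)


sims = simulate_state_errors(n_states=2, n_sims=20000, national_sd=3.0, state_sd=2.0)
state_a = [s[0] for s in sims]
state_b = [s[1] for s in sims]
print(round(correlation(state_a, state_b), 2))  # should land near 0.69
```

Setting `national_sd` to zero makes the states independent, which is the assumption that lets a model badly overstate its confidence when every poll misses the same way.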

What about bias in the presidential primary polls? Those numbers aren’t shown in the table above because the bias calculations we list in our pollster ratings pertain only to general elections. But we can calculate them by the same method. Indeed, statistical bias has been a problem in both party primaries this year.

In Republican primaries and caucuses, the polls generally had a pro-Trump and anti-Cruz bias. In races where Trump and Cruz were the top two finishers in some order, the bias was 5.5 percentage points in Trump’s favor. The bias dissipated as the race went along, and there wasn’t as much of a bias when another candidate — John Kasich or Marco Rubio — was Trump’s main competitor in a state. Still, the primary results ought to raise doubts about the theory that a “silent majority” of Trump supporters is being overlooked by the polls. In the primaries, Trump was somewhat overrated by the polls.

In the Democratic race, the polls had a 1.8 percentage point bias toward Clinton (and against Sanders) overall. However, it varied significantly based on the demographic makeup of the state, with Clinton outperforming her polls in diverse states and Sanders beating his in whiter ones. Specifically, in states where at least 25 percent of the population is black or Hispanic, the polls had a pro-Sanders, anti-Clinton bias of 5.7 percentage points. But they had an 8.2 percentage point bias toward Clinton, and against Sanders, in states where less than 25 percent of the population is black or Hispanic.
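The subgroup figures come from computing the same signed bias separately for states above and below the 25 percent threshold. A minimal sketch of that split, using hypothetical races (the shares and margins below are invented, not real results), with margins signed Clinton minus Sanders:

```python
def bias_by_group(races, threshold=25.0):
    """Average signed error (poll minus result) by a state's minority share.

    Each race is (black_or_hispanic_pct, poll_margin, actual_margin), with
    margins signed Clinton minus Sanders. States at or above the threshold
    go in "diverse", the rest in "whiter". Positive means the polls
    overrated Clinton; a group with no races gets None.
    """
    groups = {"diverse": [], "whiter": []}
    for share, poll, actual in races:
        key = "diverse" if share >= threshold else "whiter"
        groups[key].append(poll - actual)
    return {k: (sum(v) / len(v) if v else None) for k, v in groups.items()}


# Hypothetical races, for illustration only.
races = [(40, 20, 30), (30, 10, 16), (10, 4, -4), (15, 2, -6)]
print(bias_by_group(races))  # -> {'diverse': -8.0, 'whiter': 8.0}
```

Pooled together, these four races show zero bias, which illustrates how an overall figure like 1.8 points can hide much larger offsetting errors within subgroups.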

As I mentioned before, the polls mostly identified the right winners, and some of the bias reflected the candidates’ running up the score in demographically favorable terrain. (Clinton, for instance, won South Carolina by 47 percentage points instead of the 27 points projected by the polls.) Still, the results are troubling given that the Sanders and Clinton coalitions each contain hard-to-poll groups. In Clinton’s case, that means black and Hispanic voters, who are usually harder for polls to reach than white voters. For Sanders, that means young and first-time voters, who are also hard to reach and who are sometimes screened out incorrectly by likely voter models. Pollsters should think carefully about their strategies for reaching these groups in the general election.

Finally, let’s take a look at how some of the more prolific pollsters have performed recently. The following table contains our Advanced Plus-Minus scores for polling firms that released at least 20 total polls between the 2014 midterms and 2016 presidential primaries. Advanced Plus-Minus measures a poll’s error as compared against others that surveyed the same race, controlling for factors such as the number of days between the poll and the election.
*Negative scores are good* and mean the pollster performed better than other polling firms under similar conditions.
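The core idea of judging a poll against others in the same race can be shown with a stripped-down relative score: each poll’s absolute error minus the mean error of the other polls of that race. This is a hypothetical simplification, not Advanced Plus-Minus itself, which also controls for sample size, timing, the number of competing pollsters and the race’s competitiveness.

```python
from collections import defaultdict


def relative_error_scores(polls):
    """Each poll's error minus the mean error of other polls of the same race.

    `polls` is a list of dicts with keys 'pollster', 'race', 'poll_margin'
    and 'actual_margin'. Negative scores mean the poll beat its competitors.
    Polls with no competitor in their race are skipped.
    """
    errors_by_race = defaultdict(list)
    for p in polls:
        errors_by_race[p["race"]].append(abs(p["poll_margin"] - p["actual_margin"]))

    scores = []
    for p in polls:
        errors = errors_by_race[p["race"]]
        if len(errors) < 2:
            continue  # nothing to compare against
        own = abs(p["poll_margin"] - p["actual_margin"])
        others_mean = (sum(errors) - own) / (len(errors) - 1)
        scores.append((p["pollster"], own - others_mean))
    return scores


# Hypothetical race X, decided by 10 points.
polls = [
    {"pollster": "A", "race": "X", "poll_margin": 8, "actual_margin": 10},
    {"pollster": "B", "race": "X", "poll_margin": 4, "actual_margin": 10},
    {"pollster": "C", "race": "X", "poll_margin": 14, "actual_margin": 10},
]
print(relative_error_scores(polls))  # A: -3.0, B: 3.0, C: 0.0
```

Comparing against the same race neutralizes much of the difficulty of a given election: a 6-point miss in a race where everyone missed by 8 earns a good score.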

The best-performing polls recently have been those from Monmouth University and those from Marist College. Both apply “gold standard” methodologies, using live telephone interviews and placing calls to cellphones as well as landlines, and both participate in the AAPOR Transparency Initiative. Quinnipiac University, another “gold standard” pollster, has also performed fairly well of late. So has Fox News, which switched to new polling partners in 2011 and has gotten good results since then.

Automated polling firms have gotten mediocre results in recent years, especially SurveyUSA and Rasmussen Reports, although note that SurveyUSA has a long history of accurate polling and so retains a high grade overall. Public Policy Polling has gotten fairly good results, by contrast, although that may be because the pollster engages in a high degree of herding.

YouGov, which fared poorly by calculations such as Simple Average Error, gets about an average rating according to Advanced Plus-Minus, which accounts for the fact that it was polling under difficult circumstances (for instance, polling obscure House races that nobody else tried to survey). Still, we’re awaiting more evidence about the reliability of online polls. SurveyMonkey, which has sometimes partnered with FiveThirtyEight, released a set of polls of the Democratic and Republican primaries before Super Tuesday and got mediocre results, but that isn’t enough data to draw conclusions about long-term accuracy. Other online polling firms, such as Morning Consult and Ipsos, have focused on national polls instead of issuing state polls ahead of key primaries and general elections. In my view, the online pollsters have been too gun-shy as a group (with YouGov an important exception) to issue polls of state and local elections. These firms employ a lot of smart people, and my working assumption is that online polls are already more accurate than automated telephone polls (if not necessarily traditional telephone polls, at least not yet). But that’s nothing more than an educated guess until we get more data on how they perform.

Participation in the AAPOR Transparency Initiative, NCPP or the Roper Center archive continues to be a strong predictor of polling accuracy. Polling firms that get our AAPOR/NCPP/Roper check mark have had an Advanced Plus-Minus score of -0.4 since 2014, compared with a score of +0.8 for those that don’t have it.
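The group averages are weighted by the square root of the number of polls each firm conducted. A minimal sketch of that weighting, with hypothetical firm scores:

```python
import math


def sqrt_weighted_average(firms):
    """Average of firm-level scores, weighted by sqrt(number of polls).

    `firms` is a list of (score, n_polls) pairs.
    """
    weights = [math.sqrt(n) for _, n in firms]
    total = sum(weights)
    if total == 0:
        raise ValueError("need at least one poll")
    return sum(w * score for w, (score, _) in zip(weights, firms)) / total


# Hypothetical firms: one prolific (100 polls), one occasional (25 polls).
print(round(sqrt_weighted_average([(-1.0, 100), (1.0, 25)]), 3))  # -> -0.333
```

The square root is a compromise: an unweighted mean of these two firms would be 0.0, while weighting by raw poll count would give -0.6, letting the prolific firm dominate.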

All told, the evidence is ambiguous enough to be consistent with almost any case you’d like to make: either that the polls are about as accurate as they’ve always been, which isn’t to say that they’re perfect, or that there are real warning signs of trouble ahead, which isn’t to say the polls are useless. The answer may also depend on which polls you’re looking at. As we’ve found in the past, polls that employ more expensive methodologies, and abide by higher levels of disclosure and transparency, tend to be more accurate than those that don’t. It may be that the best polls are roughly as accurate as ever but that the worst polls are increasingly far off the mark.

Methodological changes from the 2014 pollster ratings are as follows:

- Previously, we tracked which polling firms called cellphones in addition to landlines, and those firms that failed to call cellphones received a methodological penalty to their Predictive Plus-Minus scores. Now, in order for a pollster to avoid the methodological penalty, we require both that it calls cellphones and that all of its calls are placed by live interviewers instead of by automated script.
- Previously, we had separate listings for a handful of firms, such as Zogby and Ipsos, that conducted some polls over the telephone and others online. Now, we combine all their polls under one listing. Generally speaking, these firms are conducting all their polls online now.
- Previously, in calculating our Simple Plus-Minus and Advanced Plus-Minus metrics, which both adjust for the sample size of a poll in accounting for its accuracy, we had derived the coefficient for sample size empirically through a regression analysis. This turned out to be problematic because sample size is correlated with other characteristics of a poll. In particular, polls of presidential primaries and U.S. House races, which are hard to poll accurately, tend to have smaller sample sizes than polls of presidential, gubernatorial and U.S. Senate general elections. The regression analysis was conflating these variables and was therefore too forgiving to polls with small sample sizes (and too punitive to polls with large sample sizes). Thus, the effect of sample size is now fixed based on the margin of error formula. We’ve also switched from a linear regression to a nonlinear regression in calculating Simple Plus-Minus and Advanced Plus-Minus.
- Advanced Plus-Minus now accounts for the competitiveness of an election, as measured by the absolute value of the difference separating the top two candidates averaged over all polls of that race. Less-competitive races are associated with higher errors, so Advanced Plus-Minus now adjusts for this when judging the accuracy of a poll.
- Previously, Advanced Plus-Minus adjusted for the number of days between the poll and the election. It continues to do so but now uses different coefficients for primary and general election polls. Primary polls are more volatile, so the timing of the poll matters more.
- We calculate Predictive Plus-Minus by taking Advanced Plus-Minus and reverting it to the mean, where the mean is based on the methodological characteristics of the poll. In calculating Predictive Plus-Minus, we now round any Advanced Plus-Minus scores that are lower than -2 up to -2, since -2 represents the theoretical limit on how accurate polls can be over the long run (negative scores are good). This affects pollsters that got extremely accurate results but polled only a handful of elections; Predictive Plus-Minus treats them a bit more skeptically now.
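The floor and the reversion to the mean in the last item can be sketched as follows. The text does not say how strongly scores are reverted or how the methodology-based mean is computed, so the shrinkage constant `k` and the `prior_mean` input here are hypothetical; only the -2 floor comes from the description above.

```python
def predictive_score(advanced, n_polls, prior_mean, k=30.0, floor=-2.0):
    """Hypothetical sketch: floor Advanced Plus-Minus, then shrink to a prior.

    advanced   -- the pollster's Advanced Plus-Minus (negative is good)
    n_polls    -- number of polls the score is based on
    prior_mean -- expected score given the firm's methodology (assumed input)
    k          -- hypothetical shrinkage constant; larger k reverts more
    floor      -- scores below this are raised to it before shrinking
    """
    clamped = max(advanced, floor)
    weight = n_polls / (n_polls + k)  # more polls -> trust the record more
    return weight * clamped + (1 - weight) * prior_mean


# A hypothetical firm with a spectacular -3.5 over only 10 polls...
print(predictive_score(-3.5, 10, 0.0))  # -> -0.5
# ...versus the same record sustained over 90 polls.
print(predictive_score(-3.5, 90, 0.0))  # -> -1.5
```

The floor matters most for firms with short track records: without it, a lucky -3.5 would pull the prediction further below the theoretical limit.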

This is a slight change from 2014, when such hybrid polls were deemed to meet the cellphone standard. Polls using mixed modes have not been very accurate in recent years.

The question is whether to split the undecided voters evenly or proportionately. In a lopsided House race where the Republican leads the Democrat 60-20 with 20 percent undecided, for example, splitting them proportionately would result in a 50-point lead (75-25) for the Republican, and splitting them evenly would produce a 40-point lead (70-30) instead. This matters less in a close race because there are typically fewer undecideds and because splitting them evenly or proportionately will produce approximately the same result. For the sake of simplicity, our pollster ratings assume that undecided voters should be split evenly; for instance, they treat a poll showing the Republican ahead of the Democrat 60 percent to 20 percent (with 20 percent undecided) as projecting a 40-point win for the Republican. In practice, however, a blend of an even split and a proportional split will produce slightly more accurate results.

According to data from the 2010 Census, the states in this group that have held primaries or caucuses so far are Alabama, Arizona, Colorado, Delaware, Florida, Georgia, Illinois, Louisiana, Maryland, Mississippi, Nevada, New York, North Carolina, South Carolina, Texas and Virginia.

Specifically, Advanced Plus-Minus controls for the poll’s sample size, the number of days between the poll and the election, the number of other polling firms surveying the same election (polls perform better when there are more of them, perhaps because of herding), and how lopsided the race was (less competitive races are more difficult to poll).

Pre-2011 Fox News polls, which were conducted by Opinion Dynamics Corp., were less accurate and are classified separately in our database.

This average is weighted by the square root of the number of polls conducted by each firm.