Much maligned for their performance in the 2016 general election — and somewhat unfairly so, since the overall accuracy of the polls was only slightly below average that year by historical standards — American election polls have been quite accurate since then. Their performance was very strong in the 2018 midterms, despite the challenge of having to poll dozens of diverse congressional districts around the country, many of which had not had a competitive election in years. Polls have also generally been accurate in the various special elections and off-year gubernatorial elections that have occurred since 2016, even though those are also often difficult races to poll.
Does that mean everything is looking up in the industry? Well, no. We’ll introduce some complications in a moment. But I do want to re-emphasize that opening takeaway, since the media is just flatly wrong when it asserts that the polls can’t be trusted. In fact, American election polls are about as accurate as they’ve always been. That doesn’t mean polls will always identify the right winner, especially in close elections. (As a simple rule of thumb, we’ve found polls “call” the right winner 80 percent of the time, meaning they fail to do so the other 20 percent of the time — although upsets are more likely to occur in some circumstances than others.) But the rate of upsets hasn’t changed much over time.
Before we go any further, I want to direct you to the latest version of FiveThirtyEight’s pollster ratings, which we’ve updated for the first time since May 2018. They include all polls in the three weeks leading up to every U.S. House, U.S. Senate and gubernatorial general election since then, including special elections, plus a handful of polls from past years that were missing from previous versions of our database. You can find much more detail on the pollster ratings here, including all the polls used in the ratings calculation. Our presidential approval ratings, generic congressional ballot and impeachment trackers have also been updated to reflect these new ratings, although they make little difference to the topline numbers.
Now then, for those complications: The main one is simply that response rates to traditional telephone polls continue to decline. In large part because of caller-ID and call-blocking technologies, it’s simply harder than it used to be to get people to answer phone calls from people they don’t know. In addition to potentially making polls less accurate, that also makes them more expensive, since a pollster has to spend more time making calls for every completed response that it gets. As a result, the overall number of polls has begun to decline slightly. Our pollster ratings database, which covers polls in the 21 days before an election, contains 532 polls associated with elections on Nov. 6, 2018, down from 558 polls for Election Day 2014 and 692 polls for Election Day 2010.
So why not turn to online polls or other new technologies? Well, the problem is that in recent elections, polls that use live interviewers to call both landlines and cellphones continue to outperform other methods, such as online and automated (IVR) polls. Moreover, online and IVR polls are generally more prone toward herding — that is, making methodological choices, or picking and choosing which results they publish, in ways that make their polls match other, more traditional polls. So not only are online and automated polls somewhat less accurate than live-caller polls, but they’d probably suffer a further decline in accuracy if they didn’t have live polls to herd toward.
Still, online polling is undoubtedly a large part of polling’s future — and some online polling firms are more accurate than others. Among the most prolific online pollsters, for example, YouGov stands out for being more accurate than others such as Zogby, SurveyMonkey, and Harris Insights & Analytics. And many former IVR pollsters are now migrating to hybrid methods that combine automated phone polling with internet panels. In the 2018 elections, this produced better results in some cases (e.g., SurveyUSA) than in others (e.g., Rasmussen Reports).
Polls have been quite accurate — and unbiased — in post-2016 elections
Each time we update our pollster ratings, we publish a few charts that depict the overall health of the industry — so let’s go ahead and run the numbers again. The first chart is the one we consider to be the most important: the average error of polls broken down by the type of election. A few quick methodological notes:
- By average error, I mean the difference between the margin projected by the poll and the actual election result. For instance, if the poll shows the Democrat up by 1 percentage point and the Republican wins by 2 points, that would be a 3-point error.
- To avoid giving any one polling firm too much influence, the values in the chart are weighted based on the number of polls a particular pollster conducted for that particular type of election in that particular cycle.
- Polls that are banned by FiveThirtyEight because we know or suspect that they faked data are excluded from the analysis.
- Note that I’ve included the handful of elections that have occurred so far in 2019 with the 2017-18 election cycle, even though we’ll classify them later as part of the 2019-20 cycle instead.
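To make the first definition concrete, here is a minimal Python sketch of the error calculation. The function names and the signed-margin convention (positive for a Democratic lead, negative for a Republican lead) are assumptions made for illustration, and the simple unweighted mean ignores the per-pollster weighting described above.

```python
from statistics import mean


def poll_error(poll_margin: float, actual_margin: float) -> float:
    """Absolute gap, in points, between the polled margin and the result.

    Margins are signed (an assumed convention): positive means the
    Democrat leads, negative means the Republican leads.
    """
    return abs(poll_margin - actual_margin)


def average_error(polls: list[tuple[float, float]]) -> float:
    """Unweighted mean error over (poll_margin, actual_margin) pairs."""
    return mean(poll_error(p, a) for p, a in polls)


# The example from the text: poll shows Democrat +1, Republican wins by 2.
example = poll_error(1.0, -2.0)  # a 3-point error
```

Because the error is measured on the margin, a poll can miss by several points and still name the right winner, or miss by very little and name the wrong one.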
OK, here’s the data:
|Cycle||Governor||U.S. Senate||U.S. House||General||Primary||Combined|
As I said, the 2017-19 cycle was one of the most accurate on record for polling. The average error of 5.0 points in polls of U.S. House elections is the second-best in our database, trailing only 1999-2000. The 4.3-point error associated with U.S. Senate elections is also the second-best, slightly trailing 2005-06. And gubernatorial polls had an average error of 5.3 points, which is about average by historical standards.
Combining all different types of elections together, we find that polls from 2017 onward have been associated with an average error of 5.0 points, which is considerably better than the 6.7-point average for 2015-16, and the best in any election cycle since 2003-04.
But note that there’s just not much of an overall trajectory — upward or downward — in polling accuracy. Relatively strong cycles for the polls can be followed by relatively weak ones, and vice versa.
One more key reminder now that the Iowa caucuses are only three months away: Some types of elections are associated with considerably larger polling errors than others. In particular, presidential primaries feature polling that is often volatile at best, and downright inaccurate at worst. Overall, presidential primary polls in our database mispredict the final margin between the top two candidates by an average of 8.7 points. And the error was even worse, 10.1 points, in the 2016 primary cycle. Leads of 10 points, 15 points or sometimes more are not necessarily safe in the primaries.
We can also look at polling accuracy by simply counting up how often the candidate leading in the poll wins his or her race. This isn’t our preferred method, as it’s a bit simplistic — if a poll had the Republican ahead by 1 point and the Democrat won by 1 point, that’s a much more accurate result than if the Republican had won by 20, even though it would have incorrectly identified the winner. But across all polls in our database, the winner was “called” 79 percent of the time.
|Cycle||Governor||U.S. Senate||U.S. House||General||Primary||Combined|
In recent elections, the winning percentage has been slightly below the long-term average — it was 76 percent in 2017-19. But this reflects the recent uptick in close elections and the fact that resource-constrained pollsters tend to poll these close elections more heavily.
As basic as this analysis is, it’s essential to remember that polls are much more likely to misidentify the winner when they show a close race. Polls in our database that showed a lead of 3 percentage points or less identified the winner only 58 percent of the time — a bit better than random chance, but not much better. But polls showing a 3- to 6-point lead were right 72 percent of the time, and those with a 6- to 10-point lead were right 86 percent of the time. (Errors in races showing double-digit leads are quite rare in general elections, although they occur with some frequency in primaries. And errors in races where one candidate leads by 20 or more points are once-in-a-blue-moon types of events, regardless of the type of election.)
|Leading candidate’s margin||Share of polls correctly identifying winner|
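The bucketed tally described above can be sketched in a few lines of Python. This is an illustration only: the bucket edges, the handling of exact boundaries (a 3-point lead falls in the lowest bucket), and the choice to skip tied polls are all assumptions, and the sample data is invented.

```python
from collections import defaultdict

# Assumed bucket edges in points; each bucket is (low, high], so a lead of
# exactly 3 points counts as "3 or less".
BUCKETS = [(0, 3), (3, 6), (6, 10), (10, 20), (20, float("inf"))]


def bucket_for(lead: float) -> tuple:
    for low, high in BUCKETS:
        if low < lead <= high:
            return (low, high)
    raise ValueError("lead must be positive")


def hit_rate_by_lead(polls):
    """polls: iterable of (poll_margin, actual_margin), both signed.

    Returns {bucket: share of polls whose leader went on to win}.
    """
    hits, totals = defaultdict(int), defaultdict(int)
    for poll_margin, actual_margin in polls:
        if poll_margin == 0:
            continue  # a tied poll has no leader to "call"
        bucket = bucket_for(abs(poll_margin))
        totals[bucket] += 1
        # Same sign means the poll's leader won; an exact tie counts as a miss.
        if poll_margin * actual_margin > 0:
            hits[bucket] += 1
    return {b: hits[b] / totals[b] for b in totals}


# Invented sample: two close polls (one miss), one mid-range, one larger lead.
sample = [(2, -1), (1, 3), (5, 4), (-8, -12)]
rates = hit_rate_by_lead(sample)
```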
Another essential measure of polling accuracy is statistical bias — that is, whether the polls tend to miss in the same direction. We’re particularly interested in understanding whether polls systematically favor Democrats or Republicans. Take the polls in 2016, for instance. Although they weren’t that bad from an accuracy standpoint, the majority underestimated President Trump and Republicans running for Congress and governor, leading them to underestimate how well Trump would do in the Electoral College. Overall in the 2015-16 cycle, polls had a Democratic bias (meaning they overestimated Democrats and underestimated Republicans) of 3.0 percentage points. And that came after a 2013-14 cycle when polls also had a Democratic bias (of 2.7 percentage points).
|Cycle||Governor||U.S. Senate||U.S. House||Pres. General||Combined|
In 2017-19, however, polls had essentially no partisan bias, and to the extent there was one, it was a very slight bias toward Republicans (0.3 percentage points). And that’s been the long-term pattern: Whatever bias there is in one batch of election polls doesn’t tend to persist from one cycle to the next. The Republican bias in the polls in 2011-12, for instance, which tended to underestimate then-President Obama’s re-election margins, was followed by two cycles of Democratic bias in 2013-14 and 2015-16, as previously mentioned. There is simply not much point in trying to guess the direction of poll bias ahead of time; if anything, it often seems to go against what the conventional wisdom expects. Instead, you should always be prepared for the possibility of systematic polling errors of several percentage points in either direction.
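The difference between average error and statistical bias is easy to see in code. The sketch below assumes margins are expressed as Democrat minus Republican and uses invented numbers; the function names and the "D+2.7" label format are illustrative choices, not the site's actual code.

```python
from statistics import mean


def partisan_bias(polls):
    """Mean signed error over (poll_margin, actual_margin) pairs.

    Margins are Democrat minus Republican (an assumed convention), so a
    positive result means the polls overstated Democrats.
    """
    return mean(p - a for p, a in polls)


def average_error(polls):
    """Mean absolute error over the same pairs."""
    return mean(abs(p - a) for p, a in polls)


def format_bias(bias: float) -> str:
    """Render a signed bias as a label such as 'D+2.7' or 'R+0.3'."""
    if bias == 0:
        return "0.0"
    return f"{'D' if bias > 0 else 'R'}+{abs(bias):.1f}"


# Invented polls that all miss toward the Democrat: bias and error are both large.
one_sided = [(4.0, 1.0), (2.0, 0.0), (-1.0, -4.0)]
# Invented polls that miss in opposite directions: errors cancel in the bias.
offsetting = [(3.0, 0.0), (-3.0, 0.0)]
```

In the second sample the average error is 3 points but the bias is zero, which is why the two measures are reported separately.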
Which pollsters have been most accurate in recent elections?
Although it can be dangerous to put too much stock in the performance of a pollster in a single election cycle — it takes dozens of polls to reliably assess a pollster’s accuracy — it’s nonetheless worth briefly remarking on the recent performance of some of the more prolific ones. Below, you’ll find the average error, statistical bias and a calculation we call Advanced Plus-Minus (basically, how the pollster’s average error compares to other pollsters’ in the same election), for pollsters with at least five polls in our database for the 2017-19 cycle. Note that negative Advanced Plus-Minus scores are good; they indicate that a firm’s polls were more accurate than others in the same races.
|Pollster||Methodology||No. of Polls||Avg. Error||Bias||Adv. Plus-Minus|
|ABC News/Washington Post||Live||5||1.7||R+0.9||-4.1|
|Mason-Dixon Polling & Research Inc.||Live||7||2.8||R+1.0||-3.0|
|Mitchell Research & Communications||IVR/Online||6||2.5||R+0.9||-2.0|
|Siena College/New York Times Upshot||Live||47||3.6||R+1.3||-1.7|
|Harris Insights & Analytics||Online||34||3.7||R+0.2||-0.2|
|Vox Populi Polling||IVR/Online||7||4.5||D+3.6||+0.0|
|St. Pete Polls||IVR||10||2.3||D+1.7||+0.0|
|Fox News/Anderson Robbins Research/Shaw & Co. Research||Live||10||4.7||D+2.7||+0.0|
|Remington Research Group||IVR/Live||5||4.1||D+3.1||+0.3|
|JMC Analytics/Bold Blue Campaigns||Live||5||6.7||R+5.5||+0.9|
|Strategic Research Associates||Live||5||5.0||D+1.9||+1.0|
|Susquehanna Polling & Research Inc.||IVR/Live||6||8.6||D+8.0||+1.4|
|Rasmussen Reports/Pulse Opinion Research||IVR/Online||5||6.1||R+5.8||+3.2|
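The core idea behind the plus-minus column, comparing a pollster's error with the errors of other firms polling the same race, can be sketched as follows. This is a deliberately simplified stand-in: the real Advanced Plus-Minus calculation involves further adjustments that are not described in this article, and the function name, data layout and numbers here are all hypothetical.

```python
from collections import defaultdict
from statistics import mean


def simple_plus_minus(polls):
    """polls: list of (pollster, race, abs_error) tuples.

    For each poll, subtract the mean error of OTHER pollsters' polls in the
    same race, then average those differences per pollster. Negative is good.
    """
    by_race = defaultdict(list)
    for pollster, race, err in polls:
        by_race[race].append((pollster, err))

    diffs = defaultdict(list)
    for pollster, race, err in polls:
        others = [e for p, e in by_race[race] if p != pollster]
        if others:  # skip races where nobody else polled
            diffs[pollster].append(err - mean(others))
    return {p: mean(d) for p, d in diffs.items()}


# Invented data: firm A beats its peers in both races.
sample = [
    ("A", "X", 2.0), ("B", "X", 6.0),
    ("A", "Y", 3.0), ("B", "Y", 5.0), ("C", "Y", 7.0),
]
scores = simple_plus_minus(sample)
```

Comparing within the same race is what lets a firm that polls hard-to-poll contests score well despite a high raw error.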
Four of the top five and six of the top 10 pollsters according to this metric exclusively conducted live-caller telephone polls. In exciting news for fans of innovative polling, the list includes polls from our friends at The New York Times’s Upshot, who launched an extremely successful and accurate polling collaboration with Siena College in 2016. (It also includes ABC News, FiveThirtyEight’s corporate parent, which usually conducts its polls jointly with The Washington Post.)
Conversely, five of the six worst-performing pollsters — including firms such as Carroll Strategies, Dixie Strategies, and Rasmussen Reports/Pulse Opinion Research — were IVR pollsters (sometimes in conjunction with other methods), several of which had strong Republican leans in 2017-19. Some IVR pollsters did perform reasonably well in 2015-16, a cycle where most pollsters underestimated Republicans. In retrospect, though, that may have been a case of two wrongs making a right; IVR polls tend to be Republican-leaning, so they’ll look good in years where Republicans beat their polls, but they’ll often be among the worst polls otherwise.
Indeed, aggregating the pollsters by methodology confirms that live caller polls continue to be the most accurate. Below are the aggregate scores for the three major categories of polls — live caller, online, and IVR — by our Advanced Plus-Minus metric, average error and statistical bias.
|Methodology||No. of Polls||Avg. Error||Bias||Adv. Plus-Minus|
|Live caller w/cell||356||4.9||R+0.5||-0.3|
|Live caller w/cell only||210||4.4||R+0.2||-0.8|
|Live caller w/cell hybrid||146||5.5||R+0.9||+0.4|
|Online or text||358||5.0||R+0.4||+0.2|
|Online or text only||154||5.0||D+0.4||+0.5|
|Online or text hybrid||204||5.0||R+0.8||+0.1|
The differences are clearest when looking at pollsters that exclusively used one method. Polls that exclusively used live callers (including calling cellphones) had an average error of 4.4 percentage points in the 2017-19 cycle, as compared to 5.0 points for polls exclusively conducted online or via text message, and 6.9 points for polls that exclusively used IVR. (Pure IVR polls, however, are now quite rare. Polls that used a hybrid of IVR and other methods did better, with an average error of 5.0 percentage points.)
Polling firms that are members of professional polling organizations that push for transparency and other best practices also continue to outperform those that aren’t. In particular, our pollster ratings give credit to firms that support the American Association for Public Opinion Research (AAPOR) Transparency Initiative, belong to the National Council on Public Polls (NCPP), or contribute data to the Roper Center archive. Pollsters that are part of one or more of these initiatives had an average error of 4.3 percentage points in the 2017-19 cycle, as compared to 5.4 percentage points for those that aren’t.
Another way to detect herding
Our pollster ratings have also long included an adjustment to account for the fact that online and automated polls tend to perform better when there are high-quality polls in the field. We’ve confirmed that this still applies. For instance, polls that are conducted online or via IVR are about 0.4 percentage points more accurate based on our Advanced Plus-Minus metric when their polls are preceded by “gold standard” polls in the same race. (“Gold standard” is the term we use for pollsters that are exclusively live caller with cellphones and are also AAPOR/NCPP/Roper members.) Live-caller polls do not exhibit the same pattern, however; their Advanced Plus-Minus score is unaffected by the existence of an earlier “gold standard” poll in the field. This is probably the result of herding; some of the lower-quality pollsters may be doing the equivalent of peeking at their more studious classmate’s answers on a math test. In fact, these differences are especially strong in recent elections, suggesting that herding has become more of a problem.
There is also a second, more direct method to detect herding, which we’re now applying in our pollster ratings. Namely — as described in this story — there is a minimum distance that a poll should be from the average of previous polls based on sampling error alone. For instance, even if you knew that a candidate was ahead 48-41 in a particular race — a 7-point lead — you’d miss that margin by an average of about 5 percentage points in a 600-person poll because sampling only 600 people rather than the entire population introduces sampling error. That is, because of sampling error, some polls would inevitably show a 12-point lead and some would show a 2-point lead instead of all the polls being bunched together at a 6- or 7- or 8-point lead exactly. If the polls are very tightly bunched together, this is not a good thing — you should be suspicious of herding, which can sometimes yield embarrassing outcomes where every poll gets the answer wrong.
Of course, there are other complications in the real world. There’s no guarantee that the race will have been static since other pollsters surveyed the race; one candidate may be losing or gaining ground. And pollsters have healthy methodological disagreements with one another, so the same race may look different depending on what assumptions they make about turnout and so forth. But these should tend to increase the degree to which polls differ from each other, and not produce herding.
But our herding penalty only applies if pollsters show too little variation from the average of previous polls of the race based on sampling error alone. If a pollster is publishing all its data without being influenced by other pollsters — including its supposed outliers — it should be fairly easy to avoid this penalty over the long run.
Many polls are closer to the average of previous polls than they “should” be, however. Unlike the previous type of herding I described, which is concentrated among lower-quality pollsters who are essentially trying to draft off their neighbors to get better results, this tendency appears among some higher-quality pollsters as well. In some cases, we suspect, this is because, late in the race, a pollster doesn’t want to deal with the media firestorm that would inevitably ensue if it published a poll that appears to be an outlier. In other cases, frankly, we suspect that pollsters rather explicitly look at the FiveThirtyEight or RealClearPolitics polling average and attempt to match it.
In any event, our formula now detects this type of herding, and it results in a lower pollster rating when we catch it. Our pollster ratings spreadsheet now calculates each pollster’s Average Distance from Polling Average, or ADPA, which is how much the pollster’s average poll differs from the average of previous polls of that race. Among pollsters with at least 15 polls, the largest herding penalties are as follows:
|Angus Reid Global||0.82|
|NBC News/Wall Street Journal||0.53|
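As a rough illustration of the ADPA idea, the Python sketch below averages each pollster's absolute distance from the mean of earlier polls in the same race. Several details are assumptions made for the example and are not taken from the ratings methodology: earlier polls are averaged with equal weight, a pollster's own earlier polls are excluded, polls are ordered by an ISO date string, and the data is invented.

```python
from collections import defaultdict
from statistics import mean


def adpa(polls):
    """polls: list of (date, race, pollster, margin); date is an ISO string.

    Returns {pollster: mean absolute distance between each of its polls and
    the average margin of strictly earlier polls of that race by other firms}.
    """
    by_race = defaultdict(list)
    for date, race, pollster, margin in sorted(polls):
        by_race[race].append((date, pollster, margin))

    distances = defaultdict(list)
    for race_polls in by_race.values():
        for date, pollster, margin in race_polls:
            earlier = [m for d, p, m in race_polls if d < date and p != pollster]
            if earlier:  # the first poll of a race has nothing to compare with
                distances[pollster].append(abs(margin - mean(earlier)))
    return {p: mean(d) for p, d in distances.items()}


# Invented polls of a single hypothetical race "X".
sample = [
    ("2018-10-20", "X", "A", 6.0),
    ("2018-10-25", "X", "B", 8.0),
    ("2018-10-30", "X", "C", 7.0),
    ("2018-11-01", "X", "B", 2.0),
]
result = adpa(sample)
```

A distance that is consistently smaller than sampling error alone would produce is the signal of herding; firm "C" above lands exactly on the prior average, which would be unremarkable once but suspicious as a long-run habit.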
Other methodological changes
Unless you’re really into details — or you’re a pollster! — you probably aren’t going to care about these … but there are a few other methodological changes we’ve made to our pollster ratings this year.
- Previously, pollsters got a bonus if they exclusively conducted their polls via live callers with cellphones, since these have been the most accurate polls over time. But this year, if a pollster uses live-caller-with-cellphone polls in combination with other methodologies, we now give them partial credit for the live-caller bonus. Even though these hybrid polls did not have a particularly good performance in 2017-19, they’ve been reasonably strong in the long run; also, we’re bowing to the reality that many formerly live pollsters are increasingly incorporating online or other methods into their repertoire.
- In determining whether a poll’s result fell into or outside the margin of error, a calculation that’s available in our spreadsheet, we now use a more sophisticated margin of error formula that accounts for the percentages of the top two candidates and not just the distance between them. The margin of error is smaller in lopsided races, e.g., when one candidate leads 70-20.
- Our Predictive Plus-Minus scores and pollster letter grades are based on a combination of a pollster’s empirical performance (how accurate it has been in the past) and its methodological characteristics. The more polls a firm has conducted, the more the formula weights its performance rather than its methodological prior. In assigning the weights, our formula now considers how recent a particular firm’s polls were. In other words, if a pollster has conducted a lot of surveys recently, its empirical accuracy will be more heavily weighted. But if most of its polling is in the distant past, its pollster rating will gradually revert toward the mean based on its methodology.
- For pollsters with a relatively small sample of polling, we now show a provisional rating rather than a precise letter grade. (An “A/B” provisional rating means that the pollster has shown strong initial results, a “B/C” rating means it has average initial results, and a “C/D” rating means below-average initial results.) It now takes roughly 20 recent polls (or a larger number of older polls) for a pollster to get a precise pollster rating.
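The margin-of-error point in the list above can be checked with the textbook formula for the sampling error of a lead under simple random sampling, where the variance of the difference between two candidates' shares is (p1 + p2 - (p1 - p2)^2) / n. The sketch below uses that standard result; whether the ratings spreadsheet uses exactly this formula is an assumption, and the function names and sample size are illustrative.

```python
import math


def margin_standard_error(p1: float, p2: float, n: int) -> float:
    """Standard error of the lead (p1 - p2), with shares as proportions.

    Assumes simple random sampling of n respondents with no design effect.
    """
    return math.sqrt((p1 + p2 - (p1 - p2) ** 2) / n)


def margin_moe(p1: float, p2: float, n: int, z: float = 1.96) -> float:
    """Approximate 95 percent margin of error on the lead (z = 1.96)."""
    return z * margin_standard_error(p1, p2, n)


# Same sample size, two different races (proportions, not percentages).
close_race = margin_moe(0.47, 0.45, 1000)
lopsided_race = margin_moe(0.70, 0.20, 1000)
```

With 1,000 respondents, the lopsided 70-20 race has a smaller margin of error on the lead than the 47-45 race, consistent with the bullet above. Multiply the results by 100 to express them in percentage points.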
That’s all for now! Once again, you can find an interactive version of the pollster ratings here, and a link with further detail on them here. And if you have questions about the pollster ratings, you can always reach us here. Good luck to pollsters on having a strong performance in the primaries.