We Analyzed 40 Years Of Primary Polls. Even Early On, They’re Fairly Predictive.

Over the past few weeks, FiveThirtyEight has explored who led in early primary polls of presidential cycles from 1972 to 2016 and who went on to win the nomination. And what we’ve seen is that national surveys conducted in the year before a presidential primary are relatively good indicators of which candidates will advance to the general election, especially when polling averages are adjusted to reflect how well known each candidate was. Now, in the third and final part of our series, we are going to analyze 40-plus years of polls to better understand their predictive power.

There are a number of ways to tackle this question, but one relatively easy way to see how predictive early polls are is to compare a candidate’s polling average¹ to their eventual share of the national primary vote. And we found that as a candidate’s polling average increased, their vote share in the primaries also tended to increase. In the chart below, for the calendar year before the primaries began, we averaged each candidate’s polls in the first half of the year (January through June) and in the second half of the year (July through December), and then plotted those two averages against the share of votes each person won in the next year’s primaries, for every competitive nomination process from 1972 to 2016. The correlation is pretty strong for both halves of the year,² though polls from the second half of the year matched the outcomes a little better, which is not surprising — after all, those polls were conducted closer to the start of primary season.
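As a rough illustration of that comparison, here is a minimal Python sketch. The poll numbers and vote shares are made up, and the helper names `polling_average` and `pearson` are hypothetical rather than anything from our actual analysis. It counts a candidate as having zero percent in any poll that didn't include them (as described in the first footnote) and then computes a plain Pearson correlation between polling averages and primary vote shares.

```python
import math

def polling_average(polls, candidate):
    """Average a candidate's support across polls, counting any poll
    that did not include the candidate as zero percent."""
    return sum(poll.get(candidate, 0.0) for poll in polls) / len(polls)

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

# Hypothetical first-half polls: each dict maps candidate -> percent support.
first_half_polls = [
    {"A": 40, "B": 20, "C": 5},
    {"A": 36, "B": 24},          # C was not asked about, so C counts as 0 here
    {"A": 38, "B": 22, "C": 7},
]
# Hypothetical shares of the national primary vote, in percent.
primary_vote = {"A": 55, "B": 30, "C": 3}

candidates = sorted(primary_vote)
averages = [polling_average(first_half_polls, c) for c in candidates]
votes = [primary_vote[c] for c in candidates]
r = pearson(averages, votes)
print(averages, round(r, 2))
```

With real data, the same calculation would be run once on first-half averages and once on second-half averages to compare the two correlations.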

But it’s easier to see trends if we group some candidates together rather than looking at them all individually, so let’s sort candidates into six big buckets based on their polling average. That clearly shows us that candidates with higher polling averages were also more likely to win higher shares of the primary vote and, therefore, the nomination. Those polling at 35 percent or higher rarely lost the nomination, regardless of whether they attained those heights in the first or second half of the year. They also, on average, won more than half the national primary vote. But those polling below 20 percent in either the first half or second half of the year had at best a 1-in-10 chance of clinching the nomination, and they rarely won a sizable chunk of the popular vote.

High polling averages foreshadowed lots of primary votes

Candidates’ share of the national primary vote by average polling level in the first half of the year before the presidential primaries and polling average in the second half of that year, 1972-2016

	First half		Second half
Poll Avg.	Share who became nominee	Avg. Primary Vote share	Share who became nominee	Avg. Primary Vote share
35%+	75%	57%	83%	57%
20%-35%	36	27	25	25
10%-20%	9	8	9	12
5%-10%	3	7	10	10
2%-5%	5	5	0	4
Under 2%	1	2	1	1
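
The bucketing behind a table like this one can be sketched in a few lines of Python. This is only an illustration: the candidate records are invented, the function names are hypothetical, and treating each bucket's lower bound as inclusive is an assumption, since the table doesn't say which bucket a candidate polling at exactly 20 percent falls into.

```python
# Bucket lower bounds in percent, highest first.
# Assumption: a lower bound is inclusive (exactly 20.0 lands in "20%-35%").
BUCKETS = [
    (35.0, "35%+"),
    (20.0, "20%-35%"),
    (10.0, "10%-20%"),
    (5.0, "5%-10%"),
    (2.0, "2%-5%"),
    (0.0, "Under 2%"),
]

def bucket_for(poll_avg):
    """Return the label of the first bucket whose lower bound poll_avg meets."""
    for lower, label in BUCKETS:
        if poll_avg >= lower:
            return label
    raise ValueError("polling average cannot be negative")

def summarize(candidates):
    """Group (poll_avg, vote_share, won_nomination) records by bucket.

    Returns {label: (share_who_became_nominee, avg_vote_share)}, in percent.
    """
    groups = {}
    for poll_avg, vote_share, won in candidates:
        groups.setdefault(bucket_for(poll_avg), []).append((vote_share, won))
    summary = {}
    for label, rows in groups.items():
        nominee_share = 100.0 * sum(1 for _, won in rows if won) / len(rows)
        avg_vote = sum(vote for vote, _ in rows) / len(rows)
        summary[label] = (nominee_share, avg_vote)
    return summary

# Hypothetical candidates: (polling average %, primary vote share %, nominee?)
sample = [
    (42.0, 60.0, True),
    (37.0, 40.0, False),
    (22.0, 25.0, False),
    (8.0, 10.0, False),
    (1.0, 0.0, False),
]
summary = summarize(sample)
print(summary["35%+"])  # (50.0, 50.0)
```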

We can also take these polling averages and estimate the probability of a candidate winning a party’s nomination using a logistic regression. And as you can see, candidates polling above 20 percent — whether it’s in the first half of the year (the orange line) or the second half (purple line) — have a higher probability of winning the nomination. In fact, the results for the first and second half of the year are nearly identical — in the second half of the year, candidates with the same polling average had a slightly lower win probability, but we’re talking about a maximum difference of less than 4 percentage points.³ There are certainly more sophisticated ways one could look at this data, but even these simple methods can show that polls conducted this far out in the primary season still have a reasonable amount of predictive power.
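
For readers who want to see what a regression like this involves, here is a minimal sketch in plain Python. It is not our model: the data points are invented, the fitting routine (simple gradient ascent on the log-likelihood) is a stand-in for whatever a statistics package would do, and the names `fit_logistic` and `win_probability` are hypothetical. It fits a one-variable logistic curve that maps a polling average to a probability of winning the nomination.

```python
import math

def sigmoid(z):
    """Logistic function, mapping any real number to a value between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-z))

def fit_logistic(xs, ys, lr=0.1, steps=5000):
    """Fit p(win) = sigmoid(b0 + b1 * x) by gradient ascent on the
    average log-likelihood. Returns the intercept b0 and slope b1."""
    b0 = b1 = 0.0
    n = len(xs)
    for _ in range(steps):
        grad0 = grad1 = 0.0
        for x, y in zip(xs, ys):
            err = y - sigmoid(b0 + b1 * x)
            grad0 += err
            grad1 += err * x
        b0 += lr * grad0 / n
        b1 += lr * grad1 / n
    return b0, b1

# Hypothetical candidates: polling average (percent) and whether they won (1/0).
poll_avgs = [1, 2, 3, 4, 6, 8, 12, 15, 18, 22, 25, 30, 38, 45, 55]
won = [0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 1, 0, 1, 1, 1]

# Rescale to units of 10 percentage points so the fixed step size stays stable.
xs = [p / 10.0 for p in poll_avgs]
b0, b1 = fit_logistic(xs, won)

def win_probability(poll_avg):
    """Estimated chance of winning the nomination at a given polling average."""
    return sigmoid(b0 + b1 * poll_avg / 10.0)

for level in (5, 20, 35, 50):
    print(level, round(win_probability(level), 2))
```

Fitting one curve to first-half averages and another to second-half averages would give two lines that can be compared at each polling level.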

We can go a step further and improve our analysis by accounting for a candidate’s level of name recognition.⁴ In previous installments of this series, we rated candidates’ fame on a five-tier scale,⁵ and this time we’re using those previous rankings to split up our polling data into two roughly equal groups — candidates with high name recognition⁶ and those with low name recognition.⁷ This gives us a broader understanding of whether being well known influenced a candidate’s chances of winning the nomination. (We also limited this part of our analysis to just the first half of the year to see what role name recognition played very early in the cycle.)

And as you can see, well-known candidates who polled in the double digits tended to win a higher share of the primary vote. But candidates who had high name recognition while only polling in the single digits were generally in trouble. Of the 84 highly recognized candidates who polled below 10 percent in surveys from the first half of the year before the primaries, only President Trump went on to win his party’s nomination. And Trump was an unusual case — Republicans started out with strongly negative views of him but quickly changed their tune even though they were already familiar with him. Meanwhile, candidates with lower name recognition in the first half of the year only occasionally advanced to the general election, and in each case, it was on the Democratic side — George McGovern in 1972, Jimmy Carter in 1976, Michael Dukakis in 1988 and Bill Clinton in 1992.

Name recognition makes a big difference

Candidates’ share of the national primary vote by average polling level in the first half of the year before the presidential primaries and whether they had high or low name recognition, 1972-2016

	High name recognition		Low name recognition
Poll Avg.	Share who became nominee	Avg. Primary Vote share	Share who became nominee	Avg. Primary Vote share
35%+	75%	57%	—	—
20%-35%	36	27	—	—
10%-20%	9	8	—	—
5%-10%	0	4	14%	19%
2%-5%	5	3	5	6
Under 2%	0	0	2	2

In fact, we can use a logistic regression to estimate the chances that high- and low-name-recognition candidates had of winning the nomination based on their polling average (much like we did above, except that this time we sorted candidates by name recognition). And as you can see in the chart below, a low-name-recognition candidate didn’t stand much of a chance of winning unless they were able to climb past 10 percent in the polls in the first half of the year before the primaries. If they were able to hit that mark, then their odds of winning were slightly less than 1 in 4, which put them ahead of a high-name-recognition candidate polling at the same level.

Intuitively, this makes sense — relatively few unknown candidates could poll as high as 10 percent this far out in the election cycle. But for those who could get that much support even though only a small share of people knew about them, their polling numbers signaled a great deal of potential. Take Dukakis in the 1988 cycle: His polling average was about 8 percent in the first half of 1987, and we estimated that his average name recognition was somewhere around 20 percent. Not a bad polling average when you consider that most respondents didn’t know who he was.

In other words, a candidate’s adjusted polling average — polling average divided by name recognition, which we delved into at length in the first two parts of this series — is a decent proxy for teasing out the strength of a candidate, especially early in the election cycle. By accounting for how well known a candidate is, we can get a better read on the field in front of us, including here in the 2020 election cycle. As primary season draws nearer, we’ll be keeping an eye on any candidates with low name recognition who still manage to win a significant chunk of support in the polls.
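
The arithmetic behind the adjusted polling average is simple enough to show directly. The sketch below is illustrative only: the function names are hypothetical, and the tier split follows the footnotes (60, 80 or 100 percent counts as high name recognition, 20 or 40 percent as low). Plugging in the approximate Dukakis figures from above, an 8 percent polling average at 20 percent name recognition works out to an adjusted average of roughly 40 percent.

```python
# Name-recognition tiers from the five-tier scale described in the footnotes.
HIGH_RECOGNITION_TIERS = {60, 80, 100}
LOW_RECOGNITION_TIERS = {20, 40}

def recognition_group(tier):
    """Classify a name-recognition tier (in percent) as 'high' or 'low'."""
    if tier in HIGH_RECOGNITION_TIERS:
        return "high"
    if tier in LOW_RECOGNITION_TIERS:
        return "low"
    raise ValueError("tier must be one of 20, 40, 60, 80 or 100")

def adjusted_polling_average(poll_avg, name_recognition):
    """Polling average divided by name recognition.

    Both inputs are in percent; the result is also expressed in percent.
    """
    if name_recognition <= 0:
        raise ValueError("name recognition must be positive")
    return 100.0 * poll_avg / name_recognition

# Approximate figures for Dukakis in the first half of 1987, as cited above.
dukakis = adjusted_polling_average(8, 20)
print(recognition_group(20), dukakis)  # low 40.0
```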

FiveThirtyEight’s 2020 draft: Episode 2

Footnotes

¹ Candidates were counted as having zero percent support in any poll they did not appear in. This lowers a candidate’s polling average, but it’s a way of factoring in the uncertainty about who will actually run. This analysis looks at everyone mentioned in these early polls, regardless of whether they went on to officially enter the race, so penalizing those who were asked about infrequently reflects the fact that they were probably viewed as less likely to run.
² The correlation was 0.70 for the first half of the year and 0.80 for the second. Anyone who didn’t get votes in the primary was listed as a zero, even if they dropped out before the primaries or never officially entered the race. Removing people who got no votes from the analysis produced a similar result, with a correlation of 0.67 for the first half and 0.77 for the second.
³ Some of this difference stems from the eventual nominees polling higher in the second half, when they’d started gaining momentum, than they did in the first half. Additionally, for the 1992 election cycle, there are no Republican polls in the first half of 1991 because it looked unlikely at first that President George H.W. Bush would face a primary challenger. By the time pollsters began asking about the GOP race, Bush was already polling above 50 percent.
⁴ We used two types of polls to estimate this: polls that asked if respondents had heard of a candidate and polls that asked if respondents had a favorable or unfavorable opinion about a candidate. In favorability polls, the number of people who had any opinion was used as a proxy for the number who recognized the candidate’s name. These estimates are inherently rough, in part because pollsters aren’t consistent when they ask about how well known a candidate is.
⁵ Candidates were classed as having 20, 40, 60, 80 or 100 percent name recognition based on polling averages and some subjective adjustments.
⁶ Candidates rated at 60, 80 or 100 percent on our five-tier scale.
⁷ Candidates rated at 20 or 40 percent on our five-tier scale.
