We’re not big fans of national polls here at FiveThirtyEight. In the general election, they provide less information than state polls do, especially given that the presidency is determined by the Electoral College. But at least in the general election, everybody votes on the same day.1 Not so in the primaries, where the states vote sequentially. Furthermore, the rules vary substantially from state to state; in particular, some hold primaries and others have caucuses, which generally have much lower turnout. That makes it difficult to determine what a “likely voter” is in the context of a national poll.
So we’re not quite sure how much to read into national polls that show the Democratic race having tightened substantially. Our own national polling average has Hillary Clinton ahead of Bernie Sanders by 7.6 percentage points, essentially2 her narrowest margin of the campaign. That lead is down from 18.5 percentage points on Jan. 31, the day before Iowa voted.
Other polling averages show an even tighter race, with Clinton up by just a couple of percentage points. We could spend some time debating the “right” way to calculate a national polling average — given that there’s no national primary, our method is designed to be deliberate rather than rush to place a lot of weight on new polls. But no matter whose numbers you’re looking at, Sanders has gained on Clinton.
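To make "deliberate rather than rush" concrete, here is a minimal sketch of how the weight placed on new polls changes a running average. It swaps in a simple exponentially weighted average purely for illustration; FiveThirtyEight's actual method weights polls differently. The function name `smoothed_average`, the `alpha` parameter and the toy margins in the usage note are all hypothetical.

```python
from typing import Iterable, List, Optional


def smoothed_average(margins: Iterable[float], alpha: float) -> List[float]:
    """Exponentially weighted running average of poll margins (hypothetical helper).

    alpha is the weight given to each new poll: a small alpha gives a deliberate,
    slow-moving average, and a large alpha gives a reactive one.
    """
    if not 0.0 < alpha <= 1.0:
        raise ValueError("alpha must be in (0, 1]")
    out: List[float] = []
    current: Optional[float] = None
    for m in margins:
        # The first poll seeds the average; later polls are blended in.
        current = m if current is None else alpha * m + (1 - alpha) * current
        out.append(current)
    return out
```

With a toy series that drops abruptly from an 18-point lead to a 4-point lead, `smoothed_average([18, 18, 18, 4, 4, 4], 0.2)` moves only to about 11.2 after three new polls, while `alpha=0.8` moves almost all the way to 4. Neither choice is "right"; the trade-off is between responsiveness and stability.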
If Sanders has gained on Clinton, however, shouldn’t we also see evidence of that from the states that have voted so far? Other things held equal, we’d expect him to perform better in states voting in April than in those voting in March, and better in March than in February.
Other things held equal is the tricky part. The Clinton-Sanders margin has varied massively from state to state, depending, among other things, on how many black voters a state has and whether it holds a primary or caucus. Sanders reeled off a string of wins in late March, for instance, but they were mostly in extremely white caucus states where we expected him to do well all along. (This is not just hindsight bias: “It’s possible that Bernie Sanders will win every state caucus from here on out,” I wrote on Feb. 20, after Sanders had just lost Nevada.)
We can attempt to account for this by means of regression analysis. I ran a regression to “retrodict” the results of each primary or caucus so far, where the inputs were the share of black voters, the share of Hispanic voters, how liberal or conservative the Democratic electorate was, and whether the state held a primary or caucus.3 This is essentially the same process I used to construct demographic benchmarks for each state back in February, although this time I’m using results rather than polls. I excluded Vermont and Arkansas from the analysis because the results could potentially have been affected by Sanders’s and Clinton’s ties to those states.
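A minimal sketch of such a retrodiction, assuming ordinary least squares on the Sanders-minus-Clinton margin with an intercept and the four inputs named above. The article doesn't specify the estimator or how each variable was measured. The helper names `fit_ols` and `predict`, and the row layout `(black_share, hispanic_share, liberal_score, is_caucus)`, are hypothetical. States to be excluded (such as Vermont and Arkansas) are simply left out of the rows.

```python
from typing import List, Sequence


def _solve(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):  # back substitution
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def fit_ols(rows: Sequence[Sequence[float]], y: Sequence[float]) -> List[float]:
    """OLS via the normal equations (X'X) beta = X'y; an intercept is prepended.

    Each row is a hypothetical (black_share, hispanic_share, liberal_score, is_caucus);
    y is the Sanders-minus-Clinton margin in percentage points.
    """
    X = [[1.0, *row] for row in rows]
    k = len(X[0])
    xtx = [[sum(r[i] * r[j] for r in X) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(X, y)) for i in range(k)]
    return _solve(xtx, xty)


def predict(beta: Sequence[float], row: Sequence[float]) -> float:
    """Retrodicted margin for one state from fitted coefficients."""
    return beta[0] + sum(b * x for b, x in zip(beta[1:], row))
```

A state's entry in the last column of the table below is then its actual margin minus `predict(beta, row)`, with the sign indicating which candidate beat the retrodiction.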
The thing to watch for is whether there’s any time trend in where Sanders tends to fall above or below his retrodiction. If he tends to beat his projections in the second half of the calendar (while Clinton beats hers in the first half), that would be a sign Sanders is gaining ground. If there’s no time trend after controlling for the other factors, that means we may just be fooled by the order in which the states happen to vote. Here’s what we get:
| DATE | STATE | RETRODICTION BASED ON DEMOGRAPHICS AND TYPE OF ELECTION | RESULT | DIFFERENCE, ACTUAL VS. RETRODICTION |
|---|---|---|---|---|
| 2/1 | Iowa | Sanders +30 | Clinton +0 | Clinton +30 |
| 2/9 | New Hampshire | Sanders +27 | Sanders +22 | Clinton +5 |
| 2/20 | Nevada | Sanders +6 | Clinton +5 | Clinton +12 |
| 2/27 | South Carolina | Clinton +44 | Clinton +47 | Clinton +4 |
| 3/1 | Alabama | Clinton +56 | Clinton +59 | Clinton +3 |
| | Colorado | Sanders +20 | Sanders +19 | Clinton +2 |
| | Georgia | Clinton +50 | Clinton +43 | Sanders +7 |
| | Massachusetts | Sanders +11 | Clinton +1 | Clinton +12 |
| | Minnesota | Sanders +39 | Sanders +23 | Clinton +15 |
| | Oklahoma | Clinton +7 | Sanders +10 | Sanders +17 |
| | Tennessee | Clinton +8 | Clinton +34 | Clinton +26 |
| | Texas | Clinton +38 | Clinton +32 | Sanders +6 |
| | Virginia | Clinton +27 | Clinton +29 | Clinton +3 |
| 3/5 | Kansas | Sanders +29 | Sanders +36 | Sanders +6 |
| | Louisiana | Clinton +51 | Clinton +48 | Sanders +3 |
| | Nebraska | Sanders +29 | Sanders +14 | Clinton +14 |
| 3/6 | Maine | Sanders +40 | Sanders +29 | Clinton +11 |
| 3/8 | Michigan | Clinton +5 | Sanders +1 | Sanders +6 |
| | Mississippi | Clinton +61 | Clinton +66 | Clinton +5 |
| 3/15 | Florida | Clinton +22 | Clinton +31 | Clinton +9 |
| | Illinois | Clinton +21 | Clinton +2 | Sanders +19 |
| | Missouri | Clinton +11 | Clinton +0 | Sanders +10 |
| | North Carolina | Clinton +25 | Clinton +14 | Sanders +12 |
| | Ohio | Clinton +8 | Clinton +14 | Clinton +5 |
| 3/22 | Arizona | Clinton +6 | Clinton +15 | Clinton +9 |
| | Idaho | Sanders +33 | Sanders +57 | Sanders +24 |
| | Utah | Sanders +33 | Sanders +59 | Sanders +26 |
| 3/26 | Alaska | Sanders +31 | Sanders +59 | Sanders +28 |
| | Hawaii | Sanders +31 | Sanders +40 | Sanders +9 |
| | Washington | Sanders +37 | Sanders +46 | Sanders +9 |
| 4/5 | Wisconsin | Sanders +6 | Sanders +13 | Sanders +7 |
| 4/9 | Wyoming | Sanders +29 | Sanders +11 | Clinton +17 |
There does seem to be a time trend in Sanders’s favor. In retrospect, given how poorly she’s fared in caucus states elsewhere, Clinton’s narrow wins in the Nevada and (especially) the Iowa caucuses look impressive. However, those wins came early in the calendar, and there’s room to question whether she’d have gotten the same results if those elections were held today. Clinton also beats her retrodiction in most Super Tuesday states, although Oklahoma is a major exception.
Since his upset win in Michigan (and his loss in Mississippi) on March 8, however, Sanders has beaten his retrodiction in nine of 13 states. His gigantic margins in states such as Washington and Idaho are impressive by the model’s standards, even accounting for the fact that we’d have expected those states to be strong for him to begin with. The model is also impressed by his narrow losses in Illinois and Missouri, which were closer than their demographics would have predicted. He also won Wisconsin by a slightly larger-than-expected margin.
Not all the news has been good for Sanders, however. He doesn’t have a great excuse for Clinton’s large margins of victory in Ohio, Florida and Arizona. And in the most recent Democratic contest, the Wyoming caucuses on April 9, Sanders beat Clinton by only 11 percentage points when the model expected a blowout.
Overall, the time trend is statistically significant,4 although some caution is in order. The states haven’t voted in random order, but instead in clusters — for instance, there was a cluster of demographically similar Southern states to vote on March 1, and a cluster of Western caucus states to vote March 22 and March 26. Under conditions like these, statistical significance tests can exaggerate the certainty of an effect because we don’t have as many independent observations as we think; as Alabama goes, so goes Georgia, probably, especially if they both vote on the same day. (See Slate Star Codex for more about this.)
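A minimal sketch of both the trend and the clustering caveat, assuming the footnoted test resembles a simple least-squares regression of the table's last column on the voting date. This is a two-step approximation; the original test may instead have added the date directly to the demographic regression. The data are the table's rounded differences, signed so that positive means Sanders beat his retrodiction. Collapsing states to one average per voting date is one crude way to reflect that same-day states are not independent observations; cluster-robust standard errors would be a more standard alternative.

```python
import math
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple

# (days after Feb. 1, 2016, a leap year; actual minus retrodicted Sanders margin).
RESIDUALS: List[Tuple[int, int]] = [
    (0, -30), (8, -5), (19, -12), (26, -4),                   # Feb. 1 to Feb. 27
    (29, -3), (29, -2), (29, 7), (29, -12), (29, -15),        # March 1
    (29, 17), (29, -26), (29, 6), (29, -3),
    (33, 6), (33, 3), (33, -14), (34, -11), (36, 6), (36, -5),
    (43, -9), (43, 19), (43, 10), (43, 12), (43, -5),         # March 15
    (50, -9), (50, 24), (50, 26), (54, 28), (54, 9), (54, 9),
    (64, 7), (68, -17),                                       # Wisconsin, Wyoming
]


def slope_and_se(xs: Sequence[float], ys: Sequence[float]) -> Tuple[float, float]:
    """Least-squares slope of ys on xs and its conventional standard error."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    ssr = sum((y - intercept - slope * x) ** 2 for x, y in zip(xs, ys))
    return slope, math.sqrt(ssr / (n - 2) / sxx)


def collapse_by_date(pairs: Sequence[Tuple[int, int]]) -> Tuple[List[int], List[float]]:
    """Average the residuals of states that voted on the same day."""
    groups: Dict[int, List[int]] = defaultdict(list)
    for day, resid in pairs:
        groups[day].append(resid)
    days = sorted(groups)
    return days, [sum(groups[d]) / len(groups[d]) for d in days]


b_state, se_state = slope_and_se([d for d, _ in RESIDUALS], [r for _, r in RESIDUALS])
cluster_days, cluster_resid = collapse_by_date(RESIDUALS)
b_date, se_date = slope_and_se(cluster_days, cluster_resid)

print(f"{len(RESIDUALS)} states:  {b_state:+.2f} pts/day, t = {b_state / se_state:.1f}")
print(f"{len(cluster_days)} dates:   {b_date:+.2f} pts/day, t = {b_date / se_date:.1f}")
```

Both fits slope toward Sanders, but the date-level version has only 13 observations instead of 32, which is the sense in which state-level significance tests can overstate certainty.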
The combination of this regression analysis with the national polls, however, provides a reasonably persuasive case that Sanders has gained ground on Clinton. The magnitude of the effect is about the same in both cases. According to the regression, Sanders underperformed his retrodictions by an average of 6 percentage points in states voting through March 1, but has beaten them by an average of 5 percentage points since. That would represent an 11-point net swing to Sanders, closely matching his gain in our national polling average.
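The arithmetic can be checked directly against the table. This sketch assumes an unweighted average in which each state counts equally (the article doesn't say how states were weighted) and uses the table's rounded differences, signed so that positive means Sanders beat his retrodiction.

```python
from typing import Sequence

# Last column of the table, in calendar order (positive = Sanders beat his retrodiction).
THROUGH_MARCH_1 = [-30, -5, -12, -4, -3, -2, 7, -12, -15, 17, -26, 6, -3]
SINCE_MARCH_1 = [6, 3, -14, -11, 6, -5, -9, 19, 10, 12, -5, -9, 24, 26, 28, 9, 9, 7, -17]


def mean(xs: Sequence[float]) -> float:
    return sum(xs) / len(xs)


early = mean(THROUGH_MARCH_1)    # -82 / 13, about -6.3
late = mean(SINCE_MARCH_1)       # 89 / 19, about +4.7
net_swing = late - early         # about 11.0 points toward Sanders
poll_gain = 18.5 - 7.6           # national polling average: 10.9 points

print(f"through March 1: {early:+.1f}, since: {late:+.1f}, "
      f"net swing: {net_swing:.1f}, national polls: {poll_gain:.1f}")
```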
That Sanders has made gains so far doesn’t necessarily mean he’ll continue to do so, however. This deserves a longer discussion, but to a first approximation a well-calibrated polling average is a random walk. That means Sanders is about equally likely to give back ground to Clinton in national polls as to continue gaining on her.
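A random walk has independent increments, so a candidate's past gains say nothing about the direction of the next move. The quick simulation below checks this, assuming Gaussian steps of equal size. That is an illustrative choice, not FiveThirtyEight's model, and the function name and parameters are hypothetical.

```python
import random


def prob_continued_gain(n_sims: int = 20000, past_steps: int = 10,
                        future_steps: int = 10, step_sd: float = 1.0,
                        seed: int = 538) -> float:
    """Estimate P(margin keeps moving toward a candidate | it already has).

    Each simulated polling average is a symmetric Gaussian random walk
    (a hypothetical stand-in for a well-calibrated average).
    """
    rng = random.Random(seed)
    gained_before = gained_again = 0
    for _ in range(n_sims):
        past = sum(rng.gauss(0.0, step_sd) for _ in range(past_steps))
        future = sum(rng.gauss(0.0, step_sd) for _ in range(future_steps))
        if past > 0:  # condition on the candidate having gained so far
            gained_before += 1
            if future > 0:
                gained_again += 1
    return gained_again / gained_before


print(f"P(further gain | past gain) is roughly {prob_continued_gain():.2f}")
```

The estimate lands close to 0.5: under a random walk, momentum alone is not evidence of further gains.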
That matters, because unless Sanders continues gaining ground, it will wind up being too little, too late for him. The model — without accounting for a time trend — expects that Sanders would lose a demographically average primary state by 12 percentage points to Clinton. Caucuses are another matter: The model would have Sanders winning the same state by 7 percentage points if it held a caucus instead of a primary, but there are almost no caucuses left.
To be more specific, the model has Sanders losing New York by 7 percentage points — actually a bit better than his polling average there, although New York’s closed primary and strict voter registration requirements could hurt him.5 There are five states voting next week, and it has Sanders winning Rhode Island by 6 percentage points but losing Connecticut and Pennsylvania by 4 percentage points each, Delaware by 13 percentage points, and Maryland by 22 percentage points. And it has him losing California, which votes June 7, by 11 percentage points. Even if he beats those projections by several points, it might not be enough because Sanders needs to be winning almost every state from here on out to catch Clinton in pledged delegates.
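To see why beating the projections "by several points" may not suffice, this sketch adds a hypothetical uniform bonus to the margins quoted above. The 5-point bonus is borrowed from his average overperformance since March 1, purely for illustration. Because Democratic delegates are allocated proportionally, state wins and losses are only a rough proxy for the delegate math.

```python
from typing import Dict, List

# Model projections quoted above, in percentage points (positive = Sanders ahead).
PROJECTIONS: Dict[str, int] = {
    "New York": -7, "Rhode Island": 6, "Connecticut": -4, "Pennsylvania": -4,
    "Delaware": -13, "Maryland": -22, "California": -11,
}


def states_flipped(projections: Dict[str, int], bonus: float) -> List[str]:
    """States Sanders is projected to lose but would win with a uniform bonus."""
    return sorted(s for s, m in projections.items() if m < 0 and m + bonus > 0)


def still_trailing(projections: Dict[str, int], bonus: float) -> List[str]:
    """States Sanders would still lose even after the bonus."""
    return sorted(s for s, m in projections.items() if m + bonus < 0)


bonus = 5  # hypothetical: roughly his average overperformance since March 1
print("flipped:", states_flipped(PROJECTIONS, bonus))
print("still trailing:", still_trailing(PROJECTIONS, bonus))
```

Even with that bonus, only Connecticut and Pennsylvania flip, while New York, Delaware, Maryland and California stay in Clinton's column.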