As I wrote last week, Hillary Clinton is probably going to become the next president. But there’s an awful lot of room to debate what “probably” means.
FiveThirtyEight’s polls-only model puts Clinton’s chances at 85 percent, while our polls-plus model has her at 83 percent. Those odds have been pretty steady over the past week or two, although if you squint you can see the race tightening just the slightest bit, with Clinton’s popular vote lead at 6.2 percentage points as compared to 7.1 points a week earlier. Still, she wouldn’t seem to have a lot to complain about.
Other statistical models are yet more confident in Clinton, however, variously putting her chances at 92 percent to 99 percent. Maybe that doesn’t seem like a big difference, since people (wrongly) tend to perceive odds above 80 percent as sure things. But flip those numbers around, and instead of Clinton’s chances, consider Donald Trump’s. The New York Times’s Upshot model gives Trump an 8 percent chance of winning the election. Our models consider a Trump presidency about twice as likely as The Upshot does, putting his chances at 15 percent (polls-only) and 17 percent (polls-plus). And our models think Trump is about four times as likely to win the presidency as the Huffington Post Pollster model, which puts his chances at 4 percent.
So let me explain why our forecast is a bit more conservative than some of the others you might be seeing — and why you shouldn’t give up if you’re a Trump supporter, or assume you have it in the bag if you’re voting for Clinton. We’ve touched on each of these points before, but it’s nice to have them in one place. I’ll also show you what probability our model would give to Trump and Clinton if we changed some of these assumptions.
Assumption No. 1: The high number of undecided and third-party voters indicates greater uncertainty.
Historically, there’s been a strong correlation between the number of undecided and third-party voters and the volatility of the polls. It also makes sense intuitively. You can think of an election as having two constraints: Candidates keep campaigning until they run out of time (Election Day), or until they run out of voters to persuade (undecideds). While the candidates are almost out of time this year, the number of undecideds is still fairly high (although it’s decreasing). In national polls, Clinton and Trump together have approximately 85 percent of the vote, while Mitt Romney and Barack Obama had about 95 percent of the vote at this time four years ago.
What if we changed this assumption? If we tweaked our model so that it only considered the number of days left until the election when calculating uncertainty, Trump’s chances would decline to 10 percent, while Clinton’s would rise to 90 percent. Most other models do not consider the number of undecided voters, so this factor explains some of the differences between FiveThirtyEight’s model and those that have Clinton’s win probability in the 90s.
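To make the mechanism concrete, here is a minimal Python sketch of a toy forecast in which the spread of possible outcomes grows with both the days remaining and the share of voters not committed to either major candidate. The `win_probability` function, its coefficients and the 12-days-left figure are all hypothetical, chosen only for illustration; this is not FiveThirtyEight’s formula, and it works on the national margin rather than the Electoral College. The 6.2-point lead and the 15 percent and 5 percent uncommitted shares come from the figures above.

```python
from statistics import NormalDist


def win_probability(lead, days_left, undecided_share, use_undecided=True):
    """Chance the leader's final margin stays above zero in a toy normal model.

    The spread formula is a made-up illustration, not FiveThirtyEight's method.
    """
    # Hypothetical coefficients: uncertainty grows with the time remaining...
    sd = 2.0 + 0.25 * days_left ** 0.5
    # ...and, optionally, with the share of undecided and third-party voters.
    if use_undecided:
        sd += 0.3 * undecided_share
    return NormalDist().cdf(lead / sd)


LEAD = 6.2       # Clinton's national lead cited above, in points
DAYS_LEFT = 12   # hypothetical, for illustration only

p_now = win_probability(LEAD, DAYS_LEFT, undecided_share=15)        # ~85% two-party share
p_2012_like = win_probability(LEAD, DAYS_LEFT, undecided_share=5)   # ~95% two-party share
p_days_only = win_probability(LEAD, DAYS_LEFT, undecided_share=15, use_undecided=False)

print(f"15% uncommitted:       {p_now:.1%}")
print(f"5% uncommitted:        {p_2012_like:.1%}")
print(f"days-only uncertainty: {p_days_only:.1%}")
```

Dropping the undecided term, as in the counterfactual above, shrinks the spread and pushes the leader’s probability up. The toy numbers won’t match the forecast’s; only the direction of the effect carries over.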
Assumption No. 2: The FiveThirtyEight model is calibrated based on general elections since 1972.
Why use 1972 as the starting point? It happens to make for a logical breakpoint because 1972 marked the start of the modern primary era, when nominees were chosen in a series of caucuses and primaries instead of by party elders.
But that’s not why we start at 1972. Instead, the reason is much simpler: That’s when we begin to see a significant number of state polls crop up in our database. Since our model is based on a combination of state and national polls, we can’t get a lot of utility out of years before that. On the flip side, since elections suffer from inherently small sample sizes (this is just the 12th election since 1972), we think it’s probably a mistake to throw any of the older data out.
What if we changed this assumption? If we calibrated the model based on presidential elections since 2000 only — which have featured largely accurate polling — Clinton’s chances would rise to 95 percent, and Trump’s would fall to 5 percent.
But we think that would probably be a mistake. It’s becoming more challenging to conduct polls as response rates decline. The polls’ performance in the most recent U.S. elections — the 2014 midterms and the 2016 presidential primaries — was middling. There have also been recent, significant polling errors in democracies elsewhere around the world, such as Israel and the United Kingdom. It may be naive to expect the pinpoint precision we saw in polls of presidential elections from 2000 through 2012 — a sample of just four elections — to represent the “new normal.” Going back to 1972 takes advantage of all the data we have, and includes years such as 1980 when there were significant late polling errors.
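One rough way to see why the calibration window matters so much: if you pretend the whole forecast boils down to a single normally distributed error on Clinton’s 6.2-point national lead, you can back out the size of error each win probability implies. This is a back-of-the-envelope simplification (the `implied_error_sd` function and the single-error model are assumptions for illustration, not how the real forecast works, which simulates states), but it shows how a calibration window with smaller historical misses translates directly into a higher win probability.

```python
from statistics import NormalDist


def implied_error_sd(lead, win_prob):
    """Solve Phi(lead / sd) = win_prob for sd, assuming one normal error term.

    A toy simplification: the actual forecasts simulate the Electoral College
    state by state, so this is not either model's real parameter.
    """
    z = NormalDist().inv_cdf(win_prob)
    return lead / z


LEAD = 6.2  # Clinton's national lead cited in the article, in points

sd_since_1972 = implied_error_sd(LEAD, 0.85)  # polls-only, calibrated since 1972
sd_since_2000 = implied_error_sd(LEAD, 0.95)  # counterfactual: since 2000 only

print(f"85% implies an error SD of about {sd_since_1972:.1f} points")
print(f"95% implies an error SD of about {sd_since_2000:.1f} points")
```

In this toy model, going from 85 percent to 95 percent corresponds to shrinking the assumed error from about 6 points to under 4, which is roughly what leaving out years such as 1980 would do.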
Assumption No. 3: The FiveThirtyEight model uses a t-distribution with “fat tails,” which gives a greater likelihood of rare events.
I’ll go through this one quickly, because it makes relatively little difference. Instead of a normal distribution (what you might know as a bell curve), our model uses a Student’s t-distribution. The t-distribution has fatter tails (think of them as big lips flaring out on the bell).
This mostly makes a difference for very low-probability events. For example, for an event that a normal distribution regarded as a 1-in-1,000 chance, our t-distribution would assign odds of 1-in-180 instead, making it about six times as likely. A t-distribution is appropriate in cases like presidential elections where you have small sample sizes.
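The comparison above can be reproduced in spirit with a short Python sketch that measures the same cutoff under both curves. The article doesn’t say how many degrees of freedom its t-distribution uses, so `DF = 10` here is a hypothetical choice, and the exact ratio depends heavily on it. The t tail is computed with Simpson’s rule (using the distribution’s symmetry) so the example needs only the standard library.

```python
import math
from statistics import NormalDist


def t_pdf(x, df):
    """Density of Student's t-distribution with `df` degrees of freedom."""
    log_coef = math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
    coef = math.exp(log_coef) / math.sqrt(df * math.pi)
    return coef * (1 + x * x / df) ** (-(df + 1) / 2)


def t_upper_tail(x, df, steps=2000):
    """P(T > x) for x >= 0: 0.5 minus the area from 0 to x (Simpson's rule)."""
    if steps % 2:
        steps += 1  # Simpson's rule needs an even number of intervals
    h = x / steps
    total = t_pdf(0.0, df) + t_pdf(x, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 0.5 - total * h / 3


normal = NormalDist()
z = normal.inv_cdf(1 - 1 / 1000)  # cutoff a bell curve treats as a 1-in-1,000 event
p_normal = 1 - normal.cdf(z)

DF = 10  # hypothetical degrees of freedom, for illustration only
p_t = t_upper_tail(z, DF)

print(f"normal:        1 in {1 / p_normal:,.0f}")
print(f"t (df={DF}):     1 in {1 / p_t:,.0f}")
print(f"t / normal:    {p_t / p_normal:.1f}x as likely")
```

Fewer degrees of freedom mean fatter tails and a larger ratio; as the degrees of freedom grow, the t-distribution converges to the normal and the ratio falls toward 1.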
What if we changed this assumption? It wouldn’t matter that much. If we used normal distributions instead of t-distributions, Clinton’s chances in polls-only would rise to 87 percent from 85 percent, while Trump’s would fall to 13 percent. This assumption will matter more if Clinton gains further ground; it would make it harder and harder for her to keep gaining in the forecast once she gets up into the mid- to high 90s.
Assumption No. 4: State outcomes are highly correlated with one another, so polling errors in one state are likely to be replicated in other, similar states.
In 2012, Obama beat his polling by 2 or 3 percentage points in almost every swing state. The same was true in 1980, when Ronald Reagan won in a landslide rather than by the modest margin polls had shown a few days before the election, and claimed 489 electoral votes by winning almost every competitive state. You also frequently see this in midterms: Republicans beat their polling in almost every key Senate and gubernatorial race in 2014, for example.
Basically, this means that you shouldn’t count on states to behave independently of one another, especially if they’re demographically similar. If Clinton loses Pennsylvania despite having a big lead in the polls there, for instance, she might also have problems in Michigan, North Carolina and other swing states. What seems like an impregnable firewall in the Electoral College may begin to collapse.
What if we changed this assumption? If we assumed that states had the same overall error as in the FiveThirtyEight polls-only model but that the error in each state was independent, Clinton’s chances would be … 99.8 percent, and Trump’s chances just 0.2 percent. So assumptions about the correlation between states make a huge difference. Most other models also assume that state-by-state outcomes are correlated to some degree, but based on their probability distributions, FiveThirtyEight’s models seem to be more emphatic about this assumption, accounting for both the possibility of a significant national polling error and other types of correlations, such as between states in different regions.
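The effect of correlation can be illustrated with a small Monte Carlo simulation. In this Python sketch, the 15 equally weighted states, the uniform 2-point lead and the 4-point national and 3-point state error sizes are all hypothetical; the point is only that holding each state’s total error fixed while switching off the shared national component makes the leader look far safer.

```python
import random


def leader_win_prob(lead, n_states, national_sd, state_sd, trials=20_000, seed=2016):
    """Monte Carlo estimate of how often the polling leader carries a majority
    of `n_states` equally weighted, hypothetical states.

    Each state's error = one shared national error (identical across states
    within a trial) + an independent state-level error.
    """
    rng = random.Random(seed)
    needed = n_states // 2 + 1
    wins = 0
    for _ in range(trials):
        shared = rng.gauss(0.0, national_sd)  # one draw, reused for every state
        carried = sum(
            1
            for _ in range(n_states)
            if lead + shared + rng.gauss(0.0, state_sd) > 0
        )
        if carried >= needed:
            wins += 1
    return wins / trials


# Same total per-state error in both scenarios: sqrt(4**2 + 3**2) == 5 points.
correlated = leader_win_prob(lead=2.0, n_states=15, national_sd=4.0, state_sd=3.0)
independent = leader_win_prob(lead=2.0, n_states=15, national_sd=0.0, state_sd=5.0)

print(f"correlated errors:  {correlated:.1%}")
print(f"independent errors: {independent:.1%}")
```

With independent errors, a polling miss in one state tends to be offset by a miss in the other direction elsewhere, so the leader rarely loses a majority of states. The shared term makes states miss together, which is the scenario in which a seemingly solid firewall collapses.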
As we say frequently, the greater uncertainty in the FiveThirtyEight forecast cuts both ways. So while we show a greater likelihood of a Trump win than most other models, we’d also assign a higher probability to a Clinton landslide, in which she wins the election by double digits. But while the campaign is almost over, the suspense isn’t quite done.