Our Election Forecast Didn’t Say What I Thought It Would

My editors are forever asking me to take the long Twitter threads I write and turn them into articles here at FiveThirtyEight. So I’m actually going to give that a try!

What follows are some follow-up thoughts on our election model, which were originally composed in the form of a V E R Y L O N G tweetstorm that I never published. (See if you can guess where the 280-character breaks would have been.)

In this thread … err, article … I’ll try to walk you through my thought process on a few elements of our model and respond to a few thoughtful critiques I’ve seen elsewhere. Before you dive in, it may help to read our summary of the state of the race, or at least skim our very detailed methodology guide.

But the basic starting point for a probabilistic, poll-driven model ought to be this: Is polling in August a highly reliable way to predict the outcome in November?

The short answer is “no.”

Polling in August is somewhat predictive. You’d much rather be ahead than behind. But there can still be some very wild swings.

You can see that in the daily threads that Nathaniel Rakich, one of our elections analysts, puts together. Here is what a national polling average would have looked like in elections dating back to 1976:

The @FiveThirtyEight nat'l polling average with 84 days until E-Day:

2020: Biden+8.3
2016: Clinton+6.6
2012: Obama+0.5
2008: Obama+2.6
2004: Kerry+2.5
2000: Bush+10.0
1996: Clinton+11.3
1992: Clinton+20.1
1988: Dukakis+5.6
1984: Reagan+16.0
1980: Reagan+22.1
1976: Carter+26.6

— Nathaniel Rakich (@baseballot) August 11, 2020

OK, I cheated a bit. I’m using a version that Nathaniel published last week, partly because this was the exact moment in the campaign when Michael Dukakis, the 1988 Democratic nominee, started to blow his large lead, which he never regained. Still, there’s some wild stuff there! John Kerry led at this point in 2004. George W. Bush had a 10-point lead at this point in the 2000 race, but, as we know, he didn’t win the popular vote that year. In other cases, the leading candidate won, but the margin was off by more than 20 points (Jimmy Carter in 1976).
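If you want to check the size of those misses yourself, here is a rough sketch. The polling leads are the ones in the tweet above. The final popular-vote margins are not from this article: they are approximate figures filled in from public results and rounded to a tenth of a point, so treat them as an assumption. The sign convention (positive for a Democratic lead) and the function name are also just choices made for this example.

```python
# Polling leads with 84 days to go, copied from the tweet above.
# Positive = Democratic lead, negative = Republican lead.
poll_margin = {
    2016: 6.6, 2012: 0.5, 2008: 2.6, 2004: 2.5, 2000: -10.0, 1996: 11.3,
    1992: 20.1, 1988: 5.6, 1984: -16.0, 1980: -22.1, 1976: 26.6,
}

# ASSUMPTION: approximate final national popular-vote margins, same sign
# convention, taken from public results rather than from this article.
final_margin = {
    2016: 2.1, 2012: 3.9, 2008: 7.2, 2004: -2.4, 2000: 0.5, 1996: 8.5,
    1992: 5.6, 1988: -7.7, 1984: -18.2, 1980: -9.7, 1976: 2.1,
}


def polling_errors(polls, finals):
    """Return {year: final margin minus poll margin} for years in both dicts."""
    return {y: round(finals[y] - polls[y], 1) for y in polls if y in finals}


errors = polling_errors(poll_margin, final_margin)
mean_abs_error = sum(abs(e) for e in errors.values()) / len(errors)

# Years where the sign flipped, i.e. the August leader lost the popular vote.
wrong_leader = sorted(y for y in errors if poll_margin[y] * final_margin[y] < 0)

for year in sorted(errors, reverse=True):
    print(f"{year}: poll {poll_margin[year]:+.1f}, "
          f"final {final_margin[year]:+.1f}, error {errors[year]:+.1f}")
print(f"Mean absolute error: {mean_abs_error:.1f} points")
print(f"August leader lost the popular vote in: {wrong_leader}")
```

This deliberately skips any correction for convention bounces, so it overstates the error you would get from a more careful average.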

Now, as I wrote last week, there are some caveats here. Several of these polling averages were taken while one or both candidates were experiencing convention bounces. There are ways to correct for those, but every time you correct for something so your model fits the past data better, you raise the possibility that you’re overfitting the data and that your model won’t be as accurate as claimed when applied to situations where you don’t already know the outcome.

There are also decent arguments that polling averages have become more stable in recent years. In that case, the wild fluctuations in the polls from, say, 1976 or 1988 might not be as relevant.

Our model actually agrees with these theories, up to a point! The fact that voters are more polarized now (more polarization means fewer swing voters, which means less volatility) is encoded into our model as part of our “uncertainty index,” for instance.

But we think it’s pretty dangerous to go all in on these theories and assume that poll volatility is necessarily much lower than it was before. For one thing, the theory is not based on a ton of data. Take the five most recent elections, for instance. The 2004¹ and 2012 elections featured highly stable polling — 2012 especially so. But 2000 and 2016 (!) did not, and 2008 election polling was not especially stable, either. Small sample sizes are already an issue in election forecasting, so it seems risky to come to too many firm conclusions about polling volatility based on what amounts to two or three examples.

Meanwhile, other people have pointed out that the most recent two presidents, Trump and Barack Obama, have had highly stable approval ratings. But the president just before them, George W. Bush, did not. His approval rating went through some of the wildest fluctuations ever, in fact, even though polarization was also fairly high from 2000 to 2008.

That said, polls have been stable so far this year. Indeed, that’s another factor that our uncertainty index accounts for. But don’t get too carried away extrapolating from this stability. Case in point: Polls were extremely stable throughout most of the Democratic primaries … but when the voting started, we saw huge swings from the Iowa caucuses through Super Tuesday. Poll volatility tends to predict future volatility, but only up to a point.

Remember, too, that voters haven’t yet been exposed to the traditional set pieces of the campaign, namely the conventions and the debates, which are often associated with higher volatility.

Now, suppose that despite all the weirdness to come in the general election campaign, Biden just plows through, leads by 6 to 9 points the whole way … and then wins by that amount on Nov. 3? If that happens, then we’ve got more evidence for the hypothesis that elections have become more stable, even when voters are confronted with a lot of surprising news.

But, crucially, we don’t have that evidence yet. So some of the models that are more confident in Biden’s chances seem to be begging the question, presuming that polls will remain stable when I’m not sure we can say that yet.

Then there’s the issue of COVID-19. Sometimes — though people may not say this outright — you’ll get a sense that critics think it’s sort of cheating for a model to account for COVID-19 because it’s never happened before, so it’s too ad hoc to adjust for it now.

I don’t really agree. Models should reflect the real world, and COVID-19 is a big part of the real world in 2020. Given the choice between mild ad-hockery and ignoring COVID-19 entirely, I think mild ad-hockery is better.

However, I also think there are good ways to account for COVID-19 without being particularly ad hoc about it. If you’re designing a model, whenever you encounter an outlier or an edge case or a new complication, the question you ask yourself should be, “What lessons can I draw from this that generalize well?” That is: Are there things you can do to handle the edge case well that will also make your model more robust overall?

As an aside, when testing models on historical data I think people should pay a lot of attention to edge cases and outliers. For instance, I pay a lot of attention to how our model is handling Washington, D.C. Why Washington? Well, if you take certain shortcuts — don’t account for the fact that vote shares are constrained between 0 and 100 percent of the vote — you might wind up with impossible results, like Biden winning 105 percent of the vote there. Or when designing an NBA model, I may pay a lot of attention to a player like Russell Westbrook, who has long caused issues for statistical systems. I don’t like taking shortcuts in models; I think they come back to bite you later in ways you don’t necessarily anticipate. But if you can handle the outliers well, you’ve probably built a mathematically elegant model that works well under ordinary circumstances, too.
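To make the Washington, D.C., example concrete, here is a minimal sketch of one common way to keep vote shares between 0 and 100 percent: apply swings on the logit scale instead of adding them to the raw share. This is an illustration of the general idea, not a description of how our model does it. The 93 percent baseline and the swing sizes are hypothetical numbers picked to trigger the problem.

```python
import math


def logit(p):
    """Map a vote share in (0, 1) onto the whole real line."""
    return math.log(p / (1.0 - p))


def inv_logit(x):
    """Map any real number back to a share strictly between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-x))


def shift_naive(share, swing):
    """Add a swing directly to the vote share; the result can exceed 1.0."""
    return share + swing


def shift_bounded(share, swing_logit):
    """Apply the swing on the logit scale, so the result stays in (0, 1)."""
    return inv_logit(logit(share) + swing_logit)


# Hypothetical inputs: a very lopsided baseline and a large favorable swing.
baseline = 0.93
naive = shift_naive(baseline, 0.12)      # lands above 1.0, an impossible share
bounded = shift_bounded(baseline, 2.0)   # a big swing, but still below 1.0

print(f"naive: {naive:.2f}, bounded: {bounded:.4f}")
```

Note that the two swings are in different units (share points versus log-odds), so they are not directly comparable. The point is only that the bounded version cannot produce an impossible result no matter how large the swing is.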

But back to COVID-19: What this pandemic encouraged us to do was to think even more deeply about the sources of uncertainty in our forecast. That led to the development of the aforementioned uncertainty index, which has eight components (described in more depth in our methodology post):

1. The number of undecided voters in national polls. More undecided voters means more uncertainty.
2. The number of undecided plus third-party voters in national polls. More third-party voters means more uncertainty.
3. Polarization, as measured elsewhere in the model, which is based on how far apart the parties are in roll call votes cast in the U.S. House. More polarization means less uncertainty since there are fewer swing voters.
4. The volatility of the national polling average. Volatility tends to predict itself, so a stable polling average tends to remain stable.
5. The overall volume of national polling. More polling means less uncertainty.
6. The magnitude of the difference between the polling-based national snapshot and the fundamentals forecast. A wider gap means more uncertainty.
7. The standard deviation of the component variables used in the FiveThirtyEight economic index. More economic volatility means more overall uncertainty in the forecast.
8. The volume of major news, as measured by the number of full-width New York Times headlines in the past 500 days, with more recent days weighted more heavily. More news means more uncertainty.

Previous versions of our model had basically just accounted for factors 1 and 2 (undecided and third-party voters), so there are quite a few new factors here. And indeed, factors 7 and 8 are very high thanks to COVID-19 and, therefore, boost our uncertainty measure. However, we’re also considering several factors for the first time (like polarization and poll volatility) that reduce uncertainty.
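One way to picture how components that push in opposite directions can net out is a simple composite: standardize each component against its own history, flip the sign of the ones that reduce uncertainty, and average. To be clear, this is a sketch under my own assumptions. The equal weights, the z-scoring, the component names and the toy numbers are all hypothetical, and the methodology post is the place to look for what the model really does.

```python
from statistics import mean, pstdev

# Direction of each component, following the list above:
# +1 = more of it means more uncertainty, -1 = more of it means less.
DIRECTION = {
    "undecided": +1,
    "undecided_plus_third_party": +1,
    "polarization": -1,
    "poll_volatility": +1,
    "poll_volume": -1,
    "polls_vs_fundamentals_gap": +1,
    "economic_volatility": +1,
    "news_volume": +1,
}


def z_scores(values):
    """Standardize historical values to mean 0 and standard deviation 1."""
    mu, sigma = mean(values), pstdev(values)
    if sigma == 0:
        return [0.0 for _ in values]
    return [(v - mu) / sigma for v in values]


def uncertainty_index(history):
    """history maps component name -> list of values, one per election year.

    Returns one number per year: the equal-weight average of signed z-scores.
    ASSUMPTION: equal weights and z-scoring are illustrative choices only.
    """
    signed = [
        [DIRECTION[name] * z for z in z_scores(values)]
        for name, values in history.items()
    ]
    return [mean(year_values) for year_values in zip(*signed)]


# Made-up values for three hypothetical years, using two components only.
toy = {
    "undecided": [5.0, 10.0, 15.0],      # rising share of undecided voters
    "polarization": [0.9, 0.6, 0.3],     # falling polarization
}
index = uncertainty_index(toy)
print([round(v, 2) for v in index])
```

In the toy data both components point the same way, so the index rises from year to year. Swap the polarization values around and the two effects offset each other instead.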

In the end, though, our model isn’t even saying that the uncertainty is especially high this year. The uncertainty index would have been considerably higher in 1980, for instance. Rather, this year’s uncertainty is about average, which means that the historical accuracy of polls in past campaigns is a reasonably good guide to how accurate they are this year. That seems to me like a pretty good gut check.

It might seem counterintuitive that uncertainty would be about average in such a weird year, but accounting for multiple types of uncertainty means that some can work to balance each other out. We don’t have a large sample of elections to begin with; depending on how you count, somewhere between 10 and 15 past presidential races had reasonably frequent polling. So your default position might be that you should use all of that data to calibrate your estimates of uncertainty, rather than to try to predict under which conditions polls might be more or less reliable. If you are going to try to fine-tune your margin of error, though, then we think you need to be pretty exhaustive about thinking through sources of uncertainty. Accounting for greater polarization but not the additional disruptions brought about by the pandemic would be a mistake, we think; likewise, so would be considering the pandemic but not accounting for polarization.

I’ve also seen some objections to the particular variables we’ve included in the uncertainty index. For instance, not everybody likes that our way of specifying “the volume of major news” is based on New York Times headlines. I agree that this isn’t ideal. The New York Times takes its headlining choices very seriously, but as we learned from thumbing through years of its headlines, it also makes some idiosyncratic choices.
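For readers wondering what "more recent days weighted more heavily" could look like in practice, here is a minimal sketch. The 500-day window comes from the description of the index above. The exponential decay and the 100-day half-life are assumptions made purely for illustration, as are the function name and the example inputs.

```python
def weighted_news_volume(headline_ages, window=500, half_life=100.0):
    """Sum recency weights for a list of headline ages in days (0 = today).

    Headlines at or beyond `window` days old are ignored.
    ASSUMPTION: weights decay exponentially with a hypothetical half-life;
    the actual weighting scheme is not specified in this article.
    """
    total = 0.0
    for age in headline_ages:
        if 0 <= age < window:
            total += 0.5 ** (age / half_life)
    return total


# Hypothetical inputs: three headlines in each case, differing only in age.
recent_burst = weighted_news_volume([1, 5, 10])
old_burst = weighted_news_volume([400, 450, 480])
print(f"recent: {recent_burst:.2f}, old: {old_burst:.2f}")
```

The same number of headlines counts for far more when they are recent, which is the behavior you want if news from last month is more likely to keep moving the polls than news from last year.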

However, I don’t think anybody would say there hasn’t been a ton of important news this year, much of which could continue to reverberate later in the race. Nor should people doubt that poll volatility is often news-driven. Polls generally don’t move on their own, but rather in response to major political events (such as debates) and news events (such as wars starting or ending). Even before COVID-19, we were trying to incorporate some of this logic into our polling averages by, for instance, having them move more aggressively after debates.

Other people have suggested that we ought to have accounted for incumbency in the uncertainty index, on the theory that when incumbents are running for reelection, they are known commodities, which should reduce volatility. That’s a smart suggestion, and something I wish I’d thought to look at, although after taking a very brief glance at it now, I’m not sure how much it would have mattered. The 1980 and 1992 elections, which featured incumbents, were notably volatile, for instance.

So if it’s too soon to be all that confident that Biden will win based on the polls (not that a 71 percent chance of winning the Electoral College and an 82 percent chance of winning the popular vote are anything to sneeze at), is there anything else that might justify that confidence?

In our view, not really.

I’ll be briefer on these points, since we covered them at length in our introductory feature. But forecasts based on economic “fundamentals” — which have never been as accurate as claimed — are a mess this year. Depending on which variables you look at (gross domestic product or disposable income?) and over what time period (third quarter or second quarter?) you could predict anything from the most epic Biden landslide in the history of elections to a big Trump win.

Furthermore, FiveThirtyEight’s version of a fundamentals model actually shows the race as a tie — it expects the race to tighten given the high polarization and projected economic improvement between now and November. So although we don’t weigh the fundamentals all that much, they aren’t exactly a reason to be more confident in Biden.

What about Trump’s approval rating? It’s been poor for a long time, obviously. And some other models do use it as part of their fundamentals calculation. But I have trouble with that for two reasons. First, the idea behind the fundamentals is that they’re … well, fundamental, meaning they’re the underlying factors (like economic conditions and political polarization) that drive political outcomes. An approval rating, on the other hand, should really be the result of those conditions.

Second, especially against a well-known opponent like Biden, approval ratings are largely redundant with the polls. That is to say, if Trump’s net approval rating (approval rating minus disapproval rating) is -12 or -13 in polls of registered and likely voters, then his being down 8 or 9 points in head-to-head polls against Biden is pretty much exactly what you’d expect. (Empirically, though, the spreads in approval ratings are a bit wider than the spreads in head-to-head polls. A candidate with a -20 net approval rating, like Carter had at the end of the 1980 campaign, wouldn’t expect to lose the election by 20 points.)

Also, models that include a lot of highly correlated variables can have serious problems, and approval ratings and head-to-head polls are very highly correlated. I’m not saying you couldn’t work your way around these issues, but unless you were very careful, they could lead to underestimates of out-of-sample errors and other problems.
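A quick way to see why correlated inputs are a problem is the variance inflation factor, which measures how much the uncertainty around a regression coefficient grows because a predictor overlaps with the others. The sketch below uses synthetic data, not real polling: the 0.7 slope, the noise level and the variable names are made up so that the two series track each other closely.

```python
import random
from statistics import mean


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


def vif_two_predictors(xs, ys):
    """Variance inflation factor for a model with exactly two predictors.

    With only two predictors this reduces to 1 / (1 - r^2). A value near 1
    means little overlap; large values mean unstable coefficient estimates.
    """
    r = pearson_r(xs, ys)
    return 1.0 / (1.0 - r * r)


rng = random.Random(0)  # fixed seed so the example is reproducible

# SYNTHETIC data: head-to-head margin closely tracks net approval.
net_approval = [rng.uniform(-20, 20) for _ in range(200)]
head_to_head = [0.7 * a + rng.gauss(0, 1.5) for a in net_approval]
unrelated = [rng.uniform(-20, 20) for _ in range(200)]

vif_high = vif_two_predictors(net_approval, head_to_head)
vif_low = vif_two_predictors(net_approval, unrelated)
print(f"approval vs. head-to-head VIF: {vif_high:.1f}")
print(f"approval vs. unrelated series VIF: {vif_low:.2f}")
```

A common rule of thumb treats a variance inflation factor above 10 as a warning sign. With inputs this tightly linked, a model has very little information to tell the two apart, which is one route to the understated out-of-sample errors mentioned above.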

One last topic: the role of intuition when building an election model. To the largest extent possible, when I build election models, I try to do it “blindfolded,” by which I mean I make as many decisions as possible about the structure of the model before seeing what the model would say about the current year’s election. That’s not to say we don’t kick the tires on a few things at the end, but it’s pretty minimal, and it’s mostly to look at bugs and edge cases rather than to change our underlying assumptions. The process is designed to limit the role my priors play when building a model.

Sometimes, though, when we do our first real model run, the results come close to my intuition anyway. But this year they didn’t. I was pretty sure we’d have Biden with at least a 75 percent chance of winning and perhaps as high as a 90 percent chance. Instead, our initial tests had Biden with about a 70 percent chance, and he stayed there until we launched the model.

Why was my intuition wrong? I suspect because it was conditioned on recent elections where polls were fairly stable — and where the races were also mostly close, making Biden’s 8-point lead look humongous by comparison. If I had vividly remembered Dukakis blowing his big lead in 1988, when I was 10 years old, maybe my priors would have been different.

But as I said earlier, I’m not necessarily sure we can expect the polls to be quite so stable this time around. And when you actually check how accurate summer polling has been historically, it yields some pretty wide margins of error.

Footnotes

¹ Despite Kerry leading in the polls for brief moments.
