Pretty much every time that we issue a House forecast — like last night, for instance, when we projected that Republicans’ gains were most likely to be on the order of 52 seats — we point out that the forecast has a lot of uncertainty in it.
I realize that these reminders can seem pedantic and abstract — or that we can seem to be simply covering our butts in case something goes wrong. So in this post I want to put a little flesh on the caveats.
Here is the basic way our models work. First, we have a formula that estimates what we think is the most likely outcome in each individual race. For instance, in Washington state, we think Jaime Herrera, the Republican, will beat Denny Heck, the Democrat, by 7.5 points in the Third Congressional District.
Second, we have another formula that explicitly estimates how much error there could be in that forecast. This amount differs from state to state and from district to district: we’re very precise about estimating how imprecise we are! In that same Third District in Washington, for instance, the margin of error on the forecast is ±8.7 points. The general principle is that the more data we have about a race, the more consistent that data is, and the closer we are to Election Day, the smaller the margin of error will be.
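To make that concrete, here is a simplified sketch of how a projected margin and a margin of error could be turned into a win probability. It assumes the error is roughly normal and treats the ±8.7-point figure as a 95 percent interval; those are simplifying assumptions for illustration, not necessarily the conventions our actual model uses.

```python
# A minimal sketch: convert a projected margin and a margin of error into a
# win probability. The race numbers (R +7.5, MOE of 8.7 points) come from the
# post; the normal-error assumption and the 95 percent reading of the MOE are
# illustrative simplifications.
from statistics import NormalDist

projected_margin = 7.5   # projected Republican margin in WA-3, in points
margin_of_error = 8.7    # stated margin of error, in points

# If the MOE is a 95 percent interval, it spans about 1.96 standard deviations.
sigma = margin_of_error / 1.96

# Probability that the actual margin stays above zero, i.e. the Republican wins.
win_prob = 1 - NormalDist(mu=projected_margin, sigma=sigma).cdf(0)
print(f"Implied Republican win probability: {win_prob:.0%}")  # roughly 95%
```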
Next, we break that overall error down into two components, which we call the national component and the local component.
In each simulation that our model runs, the national component takes the form of a random number applied uniformly to every race in the country. For instance, in one simulation, this number might be “Republican +3”; that means that we give the Republican a 3-point bonus in every House race in that simulation.
The local component will be another random number that we draw individually for each race. Sometimes, the local error will run in the same direction as the national error, and sometimes it won’t. For instance, in one simulation of the Colorado Senate race, our random number generator came up with a national error of “Republican +3,” but a local error of “Democrat +4.” Add the two together, and the simulation moves the forecast toward the Democrat by 1 point overall. (Just as in the real world, sometimes local factors make more difference than national trends.)
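Put in code, a single simulation step might look something like the sketch below. The districts, margins, and error sizes are made-up placeholders for illustration, not the model’s actual estimates, which are built race by race.

```python
# A rough sketch of how a national error component and a local error component
# might combine in one simulation. All numbers here are illustrative
# placeholders, not the model's actual parameters.
import random

# Projected margins (positive = Republican ahead), purely illustrative.
races = {"District A": 7.5, "District B": -2.0}

NATIONAL_SD = 3.0  # assumed spread of the shared nationwide swing, in points
LOCAL_SD = 5.0     # assumed spread of race-specific error, in points

def simulate_once(races):
    """Run one simulation: one national swing plus a fresh local error per race."""
    national_error = random.gauss(0, NATIONAL_SD)   # same number for every race
    results = {}
    for name, margin in races.items():
        local_error = random.gauss(0, LOCAL_SD)     # drawn separately for each race
        results[name] = margin + national_error + local_error
    return results

random.seed(0)
print(simulate_once(races))
```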
The attractive part of this approach is that it allows us to assume that the bias in the forecasts will be correlated to some extent from race to race. If the polling is off in one particular direction in one state (say, it exaggerates the performance of the Democrat in California), it is also more likely to be so in other states (it will probably also overrate the Democrat in Washington).
Put another way, we do not assume that the races behave completely independently of one another. Errors (whether in the polling itself or in our modeling) that are made in one race will tend to replicate themselves in others.
This turns out to make quite a lot of difference.
The illustration below presents two different versions of our House forecast. The red line reflects the actual FiveThirtyEight forecast as we ran it last night. This assumes that the error from race to race is partially correlated: not wholly or even mostly, but enough that there is some chance the Republicans will overperform in quite a number of races at once, and likewise for the Democrats.
The blue line is a dumbed-down version of the forecast. It uses exactly the same inputs as our actual forecast, and has exactly the same overall projection (a Republican gain of 52 seats). Moreover, the forecast in each congressional district is also exactly the same as it was in the original. In the dumbed-down forecast, the Republicans still have a 4 percent chance of winning the North Carolina 4th district, for instance, and an 84 percent chance of winning in the Arizona 1st district — none of these numbers change.
The only difference is that the dumbed-down forecast assumes these Congressional districts are behaving completely independently of one another, and that the errors in each one are completely uncorrelated.
Well, what a difference one little assumption makes. In the dumbed-down forecast, the margin of error is plus or minus about 8 seats (that is, 52 ± 8), so Republicans can almost always be counted upon to win between 44 and 60 seats. This implies that the Democrats have almost no chance of losing fewer than 39 seats. Asking them to hold the House under these assumptions would be like asking them to come up with heads 35 times in 40 coin flips: possible, but not very likely.
The real FiveThirtyEight forecast, however, has a margin of error of plus-or-minus 29 seats. This means that the Democrats do have a substantial chance of retaining control of the House — about 20 percent. But it also means that Republican gains in excess of 60 seats are quite possible (better than a 30 percent chance).
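If you want to see the effect for yourself, here is a toy version of the comparison. The district margins and error sizes are invented, and the toy is far cruder than our actual model, but it shows how a shared national error widens the seat-count distribution even when the total per-race error, and therefore the race-by-race forecast, is held the same.

```python
# Toy comparison of the two approaches: errors that are partly correlated
# across races (a shared national swing plus local noise) versus errors that
# are fully independent. All parameters are invented for illustration; the
# point is only that the correlated version produces a much wider spread of
# seat totals around the same central forecast.
import random
import statistics

random.seed(0)
N_RACES = 435
N_SIMS = 5000
NATIONAL_SD = 2.5   # assumed size of the shared nationwide error, in points
LOCAL_SD = 5.0      # assumed size of race-specific error, in points

# Fake projected Republican margins, centered near zero so many races are close.
margins = [random.gauss(0, 10) for _ in range(N_RACES)]

def seats_won(correlated):
    """Simulate seat totals, with or without a shared national error."""
    # Keep the total per-race error variance identical in both versions.
    indep_sd = (NATIONAL_SD**2 + LOCAL_SD**2) ** 0.5
    totals = []
    for _ in range(N_SIMS):
        national = random.gauss(0, NATIONAL_SD) if correlated else 0.0
        per_race_sd = LOCAL_SD if correlated else indep_sd
        seats = sum(1 for m in margins if m + national + random.gauss(0, per_race_sd) > 0)
        totals.append(seats)
    return totals

for label, corr in [("correlated", True), ("independent", False)]:
    totals = seats_won(corr)
    print(f"{label}: mean {statistics.mean(totals):.0f} seats, "
          f"stdev {statistics.pstdev(totals):.1f} seats")
```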
The blue line, the dumbed-down forecast, certainly looks a lot tidier and makes for a better headline. “The Republicans will win between 44 and 60 seats” sounds punchy and confident. “The Republicans will win between 23 and 81 seats” sounds like you don’t know what you’re talking about.
The problem is that the blue line, however attractive and well-behaved, does not reflect the situation in the real world. And it has a good chance of being wrong.
In fact, according to our more sophisticated model, a forecast of between 44 and 60 seats is more likely than not to be wrong: the chances are a little over 50 percent that Republicans’ gains will fall outside this range.
Intuitively, this may still seem like a very dissatisfying answer. So let’s approach it from another direction.
Our projection says that Republicans are favorites in 231 House races, which would reflect a net gain of 52 seats.
But suppose that our forecast is biased against the Democrats by one point across the country as a whole, perhaps because pollsters are overestimating the enthusiasm gap very slightly. Just one point. Well, there are 6 seats in which we have the Republican candidate projected to win by less than 1 full point (it might be a very long election night, by the way). If Democrats hold those 6 seats, the projected Republican gains would be down to 46.
Now suppose that the forecast understates Democratic support by 2 points. There are 8 seats in which we project the Republican candidate to win by a margin of between 1 and 2 points; now these would also be wiped off the board. Now the Republican gains would be reduced to just 38 seats — and the Democrats would hold the House, 218-217!
Read that again: it means that if our forecasts turn out to be biased against Democrats by just 2 points overall, the party becomes about an even-money bet to hold the House.
Now move in the other direction. Say that we’ve underestimated Republicans’ margin by 1 point across the board. There are 8 seats that we’re currently projecting Democrats to hold by less than 1 point. Give those 8 seats to Republicans, and the gains tally grows to 60.
And if the forecast is biased against Republicans by 2 points? Another 5 seats, bringing their total to 65.
We can extend this analysis as much as we want: if the forecasts lowball Republicans by 5 points overall, for instance, we’d expect them to win about 75 seats; a 5-point bias against Democrats, on the other hand, trims their losses to just 22.
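The whole exercise amounts to the little sensitivity check sketched below: shift every projected margin by a uniform amount and recount the seats. The margins in the sketch are invented; the specific counts above (6 seats within 1 point, 8 more within 2 points, and so on) come from the actual district-by-district forecasts.

```python
# A minimal sketch of the sensitivity check described above: add a uniform
# bias to every projected margin and count how many seats change hands. The
# margins here are fabricated for illustration only.
import random

random.seed(1)
# Fake projected Republican margins for 435 districts, in points.
margins = [random.gauss(1.0, 12) for _ in range(435)]

def republican_seats(margins, bias_toward_gop=0.0):
    """Count seats the Republican wins after adding a uniform national bias."""
    return sum(1 for m in margins if m + bias_toward_gop > 0)

baseline = republican_seats(margins)
for bias in (-5, -2, -1, 0, 1, 2, 5):
    seats = republican_seats(margins, bias)
    print(f"bias {bias:+} pts toward GOP: {seats} seats ({seats - baseline:+} vs. baseline)")
```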
Any House forecast will be extremely sensitive to small biases this year, because of the unusually large share of seats that are in play. Basically, when we say that there’s a 20 percent chance of the Democrats retaining the House, it means we think there’s a 20 percent chance that the polls are understating Democratic strength by at least 2 points on average.
We also think there’s a 20 percent chance that the polls are understating Republican strength by at least 2 points on average, which is what it would take to allow them to win 65 seats or so.
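For what it’s worth, those 20 percent figures are roughly what you would get if the overall polling bias were normally distributed with a standard deviation of about 2.4 points. That normality assumption is a back-of-the-envelope simplification, not the model’s actual error distribution.

```python
# Back-of-the-envelope check of the 20 percent figures, under an assumed
# normal distribution for the nationwide polling bias (a simplification).
from statistics import NormalDist

bias_sd = 2.4  # assumed standard deviation of the nationwide polling bias, in points
tail = 1 - NormalDist(mu=0, sigma=bias_sd).cdf(2)
print(f"P(bias exceeds 2 points in a given direction): {tail:.0%}")  # about 20%
```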
Does that sound unreasonable — a 20 percent chance of the polls being biased against one party by 2 points, and a 20 percent chance of a comparable bias against the other? I certainly hope it doesn’t.
Consider, for instance, that the spread of recent generic ballot polls runs between a 14-point Republican advantage (Gallup’s ‘traditional’ likely-voter model) and a 3-point Democratic one (Newsweek). A mere 2-point error looks quite tame compared with that range.
Consider that one study finds that including or excluding cellphones can make a difference of 4 points in polls.
Consider that the pollsters have vastly different ideas about the magnitude of the “enthusiasm gap,” ranging from being worth just a point or two for Republicans to a double-digit advantage.
Consider that there was one recent midterm year, 1998, when the polls overestimated the standing of one party (the Republicans) by about 5 points overall, and that party underperformed their polls in virtually every individual race.
So a 2-point hedge seems pretty reasonable to me. And since any bias in the forecasts is likely to get turbocharged this year because of the large number of seats in play, it necessitates a wide forecast range.
This is why we say the Democrats have a decent shot at holding the House. Not because the enthusiasm gap is going to close substantially at the last minute (although it might be tightening very slightly), or because the Democrats will suddenly come up with some winning message, or anything else that makes for a particularly sexy story. Rather, it is simply because there’s at least some uncertainty about where the polling in this election stands in the first place (as there is every year), and it wouldn’t take a very large polling error to produce a large error in the seat count.