FiveThirtyEight

We’ve been publishing election models for more than 10 years now, and FiveThirtyEight’s 2018 House model is probably the most work we’ve ever put into one of them. That’s mostly because it just uses a lot of data. We collected data for all 435 congressional districts in every House race since 1998, and we’ve left few stones unturned, researching everything from how changes in district boundary lines could affect incumbents in Pennsylvania to how ranked-choice voting could change outcomes in Maine.

Not all of that detail is apparent upon launch. You can see the topline national numbers, as well as a forecast of the outcome in each district. But we’ll be adding a lot more features within the next few weeks, including detailed pages for each district. You may want to clip and save this methodology guide for then. In the meantime, here’s a fairly detailed glimpse at how the model works.

Overview

The principles behind the House forecast should be familiar to FiveThirtyEight readers. It takes lots of polls, performs various types of adjustments to them, and then blends them with other kinds of empirically useful indicators (what we sometimes call “the fundamentals”) to forecast each race. Then it accounts for the uncertainty in the forecast and simulates the election thousands of times. Our models are probabilistic in nature; we do a lot of thinking about these probabilities, and the goal is to develop probabilistic estimates that hold up well under real-world conditions. For instance, Democrats’ chances of winning the House are between 7 in 10 and 3 in 4 in the various versions of the model upon launch — right about what Hillary Clinton’s chances were on election night two years ago! — so don’t mistake those probabilities for a sure thing.

Nonetheless, if you’re used to the taste of our presidential forecasts, the House model will have a different flavor to it in two important respects:

Three versions of the model: Lite, Classic, Deluxe

In 2016, we published what we described as two different election models: “polls-only” and “polls-plus.” This year, we’re running what we think of as three different versions of the same model, which we call Lite, Classic and Deluxe. I realize that’s a subtle distinction — different models versus different versions of the same model.

But the Lite, Classic and Deluxe versions of the House model somewhat literally build on top of one another, like different layers of toppings on an increasingly fancy burger. I’ll describe these methods in more detail in the sections below. First, a high-level overview of what the different versions account for.

The layers in FiveThirtyEight’s House forecast

Lite is as close as you get to a “polls-only” version of the forecast — except that a lot of congressional districts have little or no polling. So Lite uses a system we created called CANTOR to fill in the blanks: it takes polls from districts that do have polling, along with national generic ballot polls, and infers what the polls would say in districts that don’t have any.

The Classic version also uses local polls but layers a bunch of non-polling factors on top of them, the most important of which are incumbency, past voting history in the district, fundraising and the generic ballot. These are the “fundamentals.” The more polling in a district, the more heavily Classic relies on the polls as opposed to the fundamentals. Although Lite isn’t quite as simple as it sounds, the Classic model is definitely toward the complex side of the spectrum. That added complexity should theoretically buy accuracy, though: in the training data, Classic miscalled 3.3 percent of races, compared with 3.8 percent for Lite. You should think of Classic as the preferred or default version of FiveThirtyEight’s forecast unless we otherwise specify.
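As a toy sketch of how a polls-versus-fundamentals blend like this can work — the weighting function and the half-saturation constant here are invented for illustration, not the model’s actual parameters:

```python
def classic_estimate(poll_margin, fundamentals_margin, effective_poll_count):
    """Blend a district's poll average with its 'fundamentals' projection.

    The more (weighted) polling a district has, the more the blend leans
    on the polls. The half-saturation constant of 2.0 below is a made-up
    illustration, not FiveThirtyEight's actual parameter.
    """
    poll_weight = effective_poll_count / (effective_poll_count + 2.0)
    return poll_weight * poll_margin + (1 - poll_weight) * fundamentals_margin

# A heavily polled district tracks its polls closely...
print(classic_estimate(poll_margin=5.0, fundamentals_margin=-1.0,
                       effective_poll_count=10))  # ≈ 4.0
# ...while an unpolled district falls back entirely on the fundamentals.
print(classic_estimate(poll_margin=0.0, fundamentals_margin=-1.0,
                       effective_poll_count=0))   # -1.0
```

The key design idea is just that the polls’ share of the blend rises smoothly from zero (no polling) toward one (abundant polling).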

Finally, there’s the Deluxe flavor of the model, which takes everything in Classic and sprinkles in one more key ingredient: expert ratings. Specifically, Deluxe uses the race ratings from the Cook Political Report, Nathan Gonzales’s Inside Elections and Sabato’s Crystal Ball, all of which have published forecasts for many years and have an impressive track record of accuracy.

Within-sample accuracy of forecasting methods

Share of races called correctly based on House elections from 1998 to 2016


* Based on the average ratings from Cook Political Report, Inside Elections/The Rothenberg Political Report, Sabato’s Crystal Ball and CQ Politics. Where the expert rating averages out to an exact toss-up, the experts are given credit for half a win.

So if we expect the Deluxe forecast to be (slightly) more accurate, why do we consider Classic to be our preferred version, as I described above? Basically, because we think it’s kind of cheating to borrow other people’s forecasts and make them part of our own. Some of the fun of doing this is in seeing how our rigid but rigorous algorithm stacks up against more open-ended but subjective ways of forecasting the races. If our lives depended on calling the maximum number of races correctly, however, we’d go with Deluxe.

Collecting, weighting and adjusting polls

Our House forecasts use almost all the polls we can find, including partisan polls put out by campaigns or other interested parties. (We have not traditionally used partisan polls in our Senate or presidential forecasts, but they are a necessary evil for the House.) However, as polling has gotten more complicated — including attempts to create fake polls — there are an increasing number of exceptions.

These cases are rare — so if you don’t see a poll on our “latest polls” page, there’s a good chance that we’ve simply missed it. (House polls can be a lot harder to track down than presidential ones.) Please drop us a line if there’s a poll you think we’ve missed.

Polls are weighted based on their sample size, their recency and their pollster rating (which in turn is based on the past accuracy of the pollster, as well as its methodology). These weights are determined by algorithm; we aren’t sticking our fingers in the wind and rating polls on a case-by-case basis. In a slight change this year, the algorithm emphasizes the diversity of polls more than it has in the past; in any particular race, it will insist on constructing an average of polls from at least two or three distinct polling firms even if some of the polls are less recent.
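A rough sketch of that weighting scheme follows. The functional forms and constants (the 600-respondent reference size, the two-week half-life, the 0-to-1 pollster rating, the 14-day recency window) are all illustrative guesses, not FiveThirtyEight’s actual formulas:

```python
import math

def poll_weight(sample_size, days_old, pollster_rating):
    """Toy weight: bigger samples, fresher polls and better-rated pollsters
    count for more. All constants here are assumed for illustration."""
    size_w = math.sqrt(sample_size / 600.0)   # diminishing returns to sample size
    recency_w = 0.5 ** (days_old / 14.0)      # half-weight every two weeks (assumed)
    return size_w * recency_w * pollster_rating

def build_average(polls, min_firms=2):
    """Weighted average that insists on polls from at least `min_firms`
    distinct pollsters, reaching back to older polls if necessary."""
    polls = sorted(polls, key=lambda p: p["days_old"])
    chosen, firms = [], set()
    for p in polls:
        # Recent polls always count; older ones only to satisfy diversity.
        if p["days_old"] <= 14 or (len(firms) < min_firms and p["firm"] not in firms):
            chosen.append(p)
            firms.add(p["firm"])
    total = sum(poll_weight(p["n"], p["days_old"], p["rating"]) for p in chosen)
    return sum(poll_weight(p["n"], p["days_old"], p["rating"]) * p["margin"]
               for p in chosen) / total
```

Note how an older poll from a second firm gets pulled into the average even when fresher polls exist — that’s the “diversity of polls” idea in miniature.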

There are also three types of adjustments to each poll.

The House model does use partisan and campaign polls, which typically make up something like half of the overall sample of congressional district polling. Partisanship is determined by who sponsors the poll, rather than who conducts it. Polls are considered partisan if they’re conducted on behalf of a candidate, party, campaign committee, or PAC, super PAC, 501(c)(4), 501(c)(5) or 501(c)(6) organization that conducts a large majority of its political activity on behalf of one political party.

Partisan polls are subject to somewhat different treatment than nonpartisan polls in the model. They receive a lower weight, as partisan-sponsored polls are historically less accurate. And the house effects adjustment starts out with a prior that assumes these polls are biased by about 4 percentage points toward their preferred candidate or party. If a pollster publishing ostensibly partisan polls consistently has results that are similar to nonpartisan polls of the same districts, the prior will eventually be overridden.
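That prior-overriding behavior is a standard bit of Bayesian shrinkage. Here’s a minimal sketch, assuming the prior counts for the equivalent of a few “pseudo-polls” (the `prior_strength` constant is invented; the 4-point prior is from the text):

```python
def house_effect(observed_gaps, prior=4.0, prior_strength=4.0):
    """Estimate a partisan pollster's house effect as a precision-weighted
    blend of a prior (~4 points toward its sponsor's side) and the gaps
    actually observed between its polls and nonpartisan polls of the same
    districts. `prior_strength`, in pseudo-polls, is an assumed constant."""
    n = len(observed_gaps)
    if n == 0:
        return prior  # no data: the 4-point prior stands
    return (prior_strength * prior + sum(observed_gaps)) / (prior_strength + n)
```

With no data, the assumed 4-point bias applies in full; a pollster whose results repeatedly match nonpartisan polls (gaps near zero) sees the estimated bias shrink toward zero as evidence accumulates.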

CANTOR: Analysis of polls in similar districts

CANTOR is essentially PECOTA or CARMELO (the baseball and basketball player forecasting systems we designed) for congressional districts. It uses a k-nearest neighbors algorithm to identify similar congressional districts based on a variety of demographic, geographic and political factors. For instance, the district where I was born, Michigan 8, is most comparable to other upper-middle-income Midwestern districts such as Ohio 12, Indiana 5 and Minnesota 2 that similarly contain a sprawling mix of suburbs, exurbs and small towns.

The goal of CANTOR is to impute what polling would say in unpolled or lightly polled districts, given what it says in similar districts. It attempts to accomplish this goal in two stages. First, it comes up with an initial guesstimate of what the polls would say based solely on FiveThirtyEight’s partisan lean metric (FiveThirtyEight’s version of a partisan voting index, which is compiled based on voting for president and state legislature) and incumbency. For instance, if Republican incumbents are polling poorly in the districts where we have polling, it will assume that Republican incumbents in unpolled districts are vulnerable as well. Then, it adjusts the initial estimate based on the district-by-district similarity scores. For instance, the fact that Republican incumbent Carlos Curbelo is polling surprisingly well in Florida’s 26th Congressional District will also help Republicans in similar congressional districts.
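In code, that two-stage idea might look something like this. The feature vectors, Euclidean distance and the 0.5 damping factor are simplified stand-ins for the real system’s inputs, which we don’t reproduce here:

```python
import math

def knn_poll_adjustment(target, polled_districts, k=3):
    """CANTOR-style sketch: start from a baseline (partisan lean plus
    incumbency) and shift it toward the poll 'surprises' observed in the
    k most similar polled districts. Everything here is illustrative."""
    def dist(a, b):
        return math.sqrt(sum((x - y) ** 2
                             for x, y in zip(a["features"], b["features"])))

    neighbors = sorted(polled_districts, key=lambda d: dist(target, d))[:k]
    # A district's 'surprise' is how much its polls beat its baseline.
    surprise = sum(d["poll_margin"] - d["baseline"] for d in neighbors) / k
    return target["baseline"] + 0.5 * surprise  # damping factor is assumed
```

So if the nearest polled neighbors are all outperforming their fundamentals by 4 points, an unpolled lookalike district gets nudged a couple of points in the same direction.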

All of this sounds pretty cool, but there’s one big drawback. Namely, there’s a lot of selection bias in which districts are polled. A House district usually gets surveyed only if one of the campaigns or a media organization has reason to think the race is close — so unpolled districts are less competitive than you’d infer from demographically similar districts that do have polls. CANTOR projections are adjusted to account for this.

Overall, CANTOR is an interesting method that heavily leans into district polling and gets as close as possible to a “polls-only” view of the race. However, in terms of accuracy, it is generally inferior to using …

The fundamentals

The data-rich environment in House elections — 435 individual races every other year, compared with just one race every four years for the presidency — is most beneficial when it comes to identifying reliable non-polling factors for forecasting races. There’s enough data, in fact, that rather than using all districts to determine which factors were most predictive, I instead focused the analysis on competitive races (using a fairly broad definition of “competitive”). In competitive districts with incumbents, the following factors have historically best predicted election results, in roughly declining order of importance.

In addition, in Pennsylvania, which underwent redistricting in 2018, the model accounts for the degree of population overlap between the incumbent’s old and new district. And in California and Washington state, it accounts for the results of those states’ top-two primaries.

In open-seat races, the model uses the factors from the list above that aren’t dependent on incumbency, namely the generic ballot, fundraising, FiveThirtyEight partisan lean, scandals, experience and (where applicable) top-two primary results. It also uses the results of the previous congressional election in the district, but this is a much less reliable indicator than when an incumbent is running for re-election.

But wait — there’s more! In addition to combining polls and fundamentals, the Classic model compares its current estimate of the national political climate to a prior based on the results of congressional elections since 1946, accounting for historical swings in midterm years and presidential approval ratings. This prior has little effect on the projections this year, however, as it implies that Democrats should be ahead by about 8 points in the popular vote — similar to what the generic ballot and other indicators show. To put it another way, the results we’re seeing in the data so far are consistent with what’s usually happened in midterms under unpopular presidents.

Incorporating expert ratings

Compared with the other steps, incorporating expert ratings and creating the Deluxe version of the model is fairly straightforward. We have a comprehensive database of ratings from Cook and other groups since 1998, so we can look up how a given rating corresponded, on average, with a margin of victory. For instance, candidates who were deemed to be “likely” winners in their races won by an average of about 12 points:

What do ratings like “lean Republican” really mean?

Based on House races since 1998.

But, of course, there are complications. One is that there’s an awful lot of territory covered by the “solid” and “safe” categories: everything from races that could almost be considered competitive to others where the incumbent wins every year by a 70-point margin. Therefore, the Deluxe forecast doesn’t adjust its projections much when it encounters “solid” or “safe” ratings from the experts, except in cases where the rating comes as a surprise because other factors indicate that the race should be competitive.
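One way to sketch that logic: blend the Classic margin with the margin implied by the expert rating, but sharply discount “solid”/“safe” ratings when the data already agrees the race isn’t close. The blend weights and the 10-point threshold are invented for illustration; only the idea of down-weighting uninformative “solid” ratings comes from the text:

```python
def deluxe_margin(classic_margin, expert_implied_margin, rating, weight=0.4):
    """Blend the Classic projection with the margin historically implied
    by an expert rating (e.g., ~12 points for 'likely'). 'Solid'/'safe'
    ratings move the forecast little when the data already says the seat
    is safe. All constants here are illustrative assumptions."""
    if rating in ("solid", "safe") and abs(classic_margin) > 10:
        weight = 0.1  # the rating adds little information in clearly safe seats
    return (1 - weight) * classic_margin + weight * expert_implied_margin
```

A surprising “solid” rating on a race the data thinks is close would keep the full blend weight, which is the exception the paragraph above describes.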

Also, although the expert raters are really quite outstanding at identifying “micro” conditions on the ground, including factors that might otherwise be hard to measure, they tend to be lagging indicators of the macro political environment. Several of the expert raters shifted their projections sharply toward the Democrats throughout 2018, for instance, even though the generic ballot was fairly steady over that period. Thus, the Deluxe forecast tries to blend the relative order of races implied by the expert ratings with the Classic model’s data-driven estimate of national political conditions. As a result, Deluxe and Classic will usually produce relatively similar forecasts of the overall number of seats gained or lost by a party, even though they may have sharp disagreements on individual races.

Simulating the election and accounting for uncertainty

Sometimes what seem like incredibly pedantic questions can turn out to be important. For years, we’ve tried to design models that account for the complicated, correlated structure of error and uncertainty in election forecasting. Specifically, if a candidate or a party overperforms the polls in one swing state, they’re likely to do so in other states as well, especially demographically similar ones. Understanding this principle was key to understanding why Clinton’s lead wasn’t nearly as safe as it seemed in 2016.

Fortunately, this is less of a problem in constructing a House forecast; there are different candidates on the ballot in every district, instead of just one presidential race, and the model relies on a variety of inputs, instead of depending so heavily on polls. Nonetheless, the model accounts for four potential types of error in an attempt to self-diagnose the various ways in which it could go off the rails.

Error becomes smaller as Election Day approaches. In particular, there’s less possibility of a sharp national swing as you get nearer to the election because there’s less time for news events to intervene.

Nonetheless, you shouldn’t expect pinpoint precision in a House forecast, and models that purport to provide it are either fooling you or fooling themselves. Even if you knew exactly what national conditions were, there would still be a lot of uncertainty based on how individual races play out.
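The correlated-error idea behind the simulations can be sketched in a few lines: every simulated election draws one shared national error that hits all 435 districts at once, plus an independent error for each district. The error magnitudes below are invented for illustration, not the model’s actual values:

```python
import random

def simulate_house(district_margins, n_sims=10_000,
                   national_sd=3.0, local_sd=6.0, seed=0):
    """Monte Carlo sketch: each simulation applies one shared national
    error to every district (this is what correlates district outcomes)
    plus an independent district-level error. Error sizes are assumed."""
    rng = random.Random(seed)
    seat_counts = []
    for _ in range(n_sims):
        national_error = rng.gauss(0, national_sd)  # hits every district
        seats = sum(
            1 for margin in district_margins
            if margin + national_error + rng.gauss(0, local_sd) > 0
        )
        seat_counts.append(seats)
    return seat_counts

# The headline probability is then just the share of simulations in which
# a party clears 218 seats.
```

Because the national error is shared, bad nights come in bunches — a party doesn’t lose a few toss-ups at random; it tends to lose many of them together, which fattens the tails of the seat-count distribution relative to treating districts as independent.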

Odds and ends

OK, that’s almost everything. Just a few final notes:

