How FiveThirtyEight’s House Model Works

We’ve been publishing election models for more than 10 years now, and FiveThirtyEight’s 2018 House model is probably the most work we’ve ever put into one of them. That’s mostly because it just uses a lot of data. We collected data for all 435 congressional districts in every House race since 1998, and we’ve left few stones unturned, researching everything from how changes in district boundary lines could affect incumbents in Pennsylvania to how ranked-choice voting could change outcomes in Maine.

Not all of that detail is apparent upon launch. You can see the topline national numbers, as well as a forecast of the outcome in each district. But we’ll be adding a lot more features within the next few weeks, including detailed pages for each district. You may want to clip and save this methodology guide for then. In the meantime, here’s a fairly detailed glimpse at how the model works.

Overview

The principles behind the House forecast should be familiar to FiveThirtyEight readers. It takes lots of polls, performs various types of adjustments to them, and then blends them with other kinds of empirically useful indicators (what we sometimes call “the fundamentals”) to forecast each race. Then it accounts for the uncertainty in the forecast and simulates the election thousands of times. Our models are probabilistic in nature; we do a lot of thinking about these probabilities, and the goal is to develop probabilistic estimates that hold up well under real-world conditions. For instance, Democrats’ chances of winning the House are between 7 in 10 and 3 in 4 in the various versions of the model upon launch — right about what Hillary Clinton’s chances were on election night two years ago! — so ignore those probabilities at your peril.

Nonetheless, if you’re used to the taste of our presidential forecasts, the House model will have a different flavor to it in two important respects:

As compared with the presidential model, the House model is less polling-centric. Instead, it uses a broader consensus of indicators. That’s partly out of necessity: House districts don’t get much polling, and the polling they do get often isn’t much good. It’s also partly out of opportunity: With 435 separate races every other year, it’s possible to make fairly robust empirical assessments of which factors really predict House races well and which don’t.
House races are far more localized than presidential races, and this is reflected in the design of the model. In presidential elections, outcomes are extremely correlated from state to state. It wasn’t a surprise that President Trump won Michigan given that he also won demographically similar states such as Wisconsin and Pennsylvania, for instance. Sometimes that sort of thing happens in congressional elections too; although Democrats are favored in our initial forecast, even a relatively minor polling error could tilt the race back toward Republicans. Nonetheless, about three-quarters of the uncertainty in the House forecasts comes from local, district-by-district factors. If the presidential model is laser-focused on how the polls are changing from day to day and what they say about the Electoral College, the House model’s approach is more diffuse, with the goal being to shine some light into the darker corners of the electoral landscape.

Three versions of the model: Lite, Classic, Deluxe

In 2016, we published what we described as two different election models: “polls-only” and “polls-plus.”¹ This year, we’re running what we think of as three different versions of the same model, which we call Lite, Classic and Deluxe. I realize that’s a subtle distinction — different models versus different versions of the same model.

But the Lite, Classic and Deluxe versions of the House model somewhat literally build on top of one another, like different layers of toppings on an increasingly fancy burger. I’ll describe these methods in more detail in the sections below. First, a high-level overview of what the different versions account for.

The layers in FiveThirtyEight’s House forecast

			Which versions use it?
	Layer	Description	Lite	Classic	Deluxe
1a	Polling	District-by-district polling, adjusted for house effects and other factors.	✓	✓	✓
1b	CANTOR	A system which infers results for districts with little or no polling from comparable districts that do have polling.	✓	✓	✓
2	Fundamentals	Non-polling factors such as fundraising and past election results that historically help in predicting congressional races.		✓	✓
3	Expert forecasts	Ratings of each race published by the Cook Political Report, Inside Elections and Sabato’s Crystal Ball.			✓

Lite is as close as you get to a “polls-only” version of the forecast — except, the problem is that a lot of congressional districts have little or no polling. So it uses a system we created called CANTOR² to fill in the blanks. It uses polls from districts that have polling, as well as national generic ballot polls, to infer what the polls would say in districts that don’t have polling.

The Classic version also uses local polls³ but layers a bunch of non-polling factors on top of them, the most important of which are incumbency, past voting history in the district, fundraising and the generic ballot. These are the “fundamentals.” The more polling in a district, the more heavily Classic relies on the polls as opposed to the “fundamentals.” Although Lite isn’t quite as simple as it sounds, the Classic model is definitely toward the complex side of the spectrum. With that said, it should theoretically increase accuracy. In the training data,⁴ Classic miscalled 3.3 percent of races, compared with 3.8 percent for Lite.⁵ You should think of Classic as the preferred or default version of FiveThirtyEight’s forecast unless we otherwise specify.
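To make the "more polling, more weight on polls" idea concrete, here is a minimal sketch. The weighting rule `n / (n + k)`, the constant `k` and the function name `blend_classic` are assumptions for illustration only; the actual formula isn't spelled out here.

```python
def blend_classic(poll_margin, fundamentals_margin, effective_polls, k=2.0):
    """Blend a polling-based margin with a fundamentals-based margin.

    The weight on polls, n / (n + k), is a hypothetical choice: it rises
    toward 1 as the effective number of polls n grows. k is an assumed
    tuning constant, not a value from the model.
    Margins are Democratic minus Republican, in percentage points.
    """
    if effective_polls < 0:
        raise ValueError("effective_polls must be non-negative")
    poll_weight = effective_polls / (effective_polls + k)
    return poll_weight * poll_margin + (1.0 - poll_weight) * fundamentals_margin


# No polls at all: the estimate is just the fundamentals margin.
no_polls = blend_classic(poll_margin=0.0, fundamentals_margin=-6.0, effective_polls=0)

# Heavily polled district: the estimate sits close to the polls.
many_polls = blend_classic(poll_margin=4.0, fundamentals_margin=-6.0, effective_polls=18)
```

Under these assumed settings, a district with no polling falls back entirely on the fundamentals, while a district with 18 effective polls gets 90 percent of its estimate from the polls.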

Finally, there’s the Deluxe flavor of the model, which takes everything in Classic and sprinkles in one more key ingredient: expert ratings. Specifically, Deluxe uses the race ratings from the Cook Political Report, Nathan Gonzales’s Inside Elections and Sabato’s Crystal Ball, all of which have published forecasts for many years and have an impressive track record of accuracy.⁶

Within-sample accuracy of forecasting methods

Share of races called correctly based on House elections from 1998 to 2016

Forecast	100 Days Before Election	Election Day
Lite model (poll-driven)	94.2%	96.2%
Fundamentals alone	95.4	95.7
Classic model (Lite model + fundamentals)	95.4	96.7
Expert ratings alone*	94.8	96.6
Deluxe model (Classic model + expert ratings)	95.7	96.9

So if we expect the Deluxe forecast to be (slightly) more accurate, why do we consider Classic to be our preferred version, as I described above? Basically, because we think it’s kind of cheating to borrow other people’s forecasts and make them part of our own. Some of the fun of doing this is in seeing how our rigid but rigorous algorithm stacks up against more open-ended but subjective ways of forecasting the races. If our lives depended on calling the maximum number of races correctly, however, we’d go with Deluxe.

Collecting, weighting and adjusting polls

Our House forecasts use almost all the polls we can find, including partisan polls put out by campaigns or other interested parties. (We have not traditionally used partisan polls in our Senate or presidential forecasts, but they are a necessary evil for the House.) However, as polling has gotten more complex, including attempts to create fake polls, there are an increasing number of exceptions:

We don’t use polls if we have significant concerns about their veracity or if the pollster is known to have faked polls before.
We don’t use DIY polls commissioned by nonprofessional hobbyists on online platforms such as Google Surveys. (This is a change in policy since 2016. Professional or campaign polls using these platforms are still fine.)
We don’t treat subsamples of multistate polls as individual “polls” unless certain conditions are met.⁷
We don’t use “polls” that blend or smooth their data using methods such as MRP. These can be perfectly fine techniques — but if you implement them, you’re really running a model rather than a poll. We want to do the blending and smoothing ourselves rather than inputting other people’s models into ours.

These cases are rare — so if you don’t see a poll on our “latest polls” page, there’s a good chance that we’ve simply missed it. (House polls can be a lot harder to track down than presidential ones.) Please drop us a line if there’s a poll you think we’ve missed.

Polls are weighted based on their sample size, their recency and their pollster rating (which in turn is based on the past accuracy of the pollster, as well as its methodology). These weights are determined by algorithm; we aren’t sticking our fingers in the wind and rating polls on a case-by-case basis. In a slight change this year, the algorithm emphasizes the diversity of polls more than it has in the past; in any particular race, it will insist on constructing an average of polls from at least two or three distinct polling firms even if some of the polls are less recent.
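A weighting scheme along these lines might look like the sketch below. The square-root scaling on sample size, the 30-day half-life and the idea of the pollster rating as a simple multiplier are all assumptions made for illustration; the rule about insisting on two or three distinct polling firms is not implemented here.

```python
import math


def poll_weight(sample_size, days_old, pollster_rating, half_life_days=30.0):
    """Hypothetical weight for a single poll.

    - sample size: square-root scaling relative to a 600-person poll
    - recency: exponential decay with an assumed half-life
    - pollster_rating: assumed multiplier in (0, 1], higher is better
    """
    size_factor = math.sqrt(sample_size / 600.0)
    recency_factor = 0.5 ** (days_old / half_life_days)
    return size_factor * recency_factor * pollster_rating


def weighted_average(polls):
    """polls: list of (margin, sample_size, days_old, pollster_rating) tuples."""
    weights = [poll_weight(n, age, rating) for _, n, age, rating in polls]
    total = sum(weights)
    if total == 0:
        raise ValueError("no usable polls")
    return sum(w * poll[0] for w, poll in zip(weights, polls)) / total


# Two made-up polls from equally rated firms: a fresh one at D+5
# and a month-old one at D+1. The older poll gets half the weight.
example_average = weighted_average([(5.0, 600, 0, 1.0), (1.0, 600, 30, 1.0)])
```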

There are also three types of adjustments to each poll:

First, a likely voter adjustment takes the results of polls of registered voters or all adults and attempts to translate them to a likely-voter basis. Traditionally, Republican candidates gain ground in likely voter polls relative to registered voter ones, but the gains are smaller in midterms with a Republican president. Furthermore, some polls this year actually show Democrats gaining in likely voter models. The likely voter adjustment is dynamic; it starts with a prior that likely voter polls slightly help Republicans, but this prior is updated as pollsters publish polls that directly compare likely and registered voter results. (If you’re a pollster, please follow Monmouth University’s lead and do this!) In mid-August, for example, the model treats likely-voter and registered-voter polls as roughly equivalent to each other, but this could change as we collect more data.
Second, a timeline adjustment adjusts for the timing of the poll, based on changes in the generic congressional ballot. For instance, if Democrats have gained a net of 5 percentage points on the generic ballot since a certain district was polled, the model will adjust the poll upward toward the Democratic candidate (but not by the full 5 points; instead, by roughly half that amount — 2.5 points — depending on the elasticity score⁸ of the district). As compared with the timeline adjustment in our presidential model, which can be notoriously aggressive, the one in the House model is pretty conservative.⁹
Third, a house effects adjustment corrects for persistent statistical bias from a pollster. For instance, if a polling firm consistently shows results that are 2 points more favorable for Democrats than other polls of the same district, the adjustment will shift the poll part of the way back toward Republicans.¹⁰
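The arithmetic behind the timeline and house effects adjustments can be sketched as follows. The `pass_through` of 0.5 mirrors the "roughly half" in the example above; multiplying by the elasticity score and the `shrink` value of 0.5 are assumed functional forms, not the model's actual ones.

```python
def timeline_adjust(poll_margin, generic_ballot_shift, elasticity=1.0, pass_through=0.5):
    """Shift a district poll by part of the generic-ballot movement since it was taken.

    Margins are Democratic minus Republican, in percentage points.
    Scaling by the district's elasticity score is an assumption.
    """
    return poll_margin + pass_through * elasticity * generic_ballot_shift


def house_effect_adjust(poll_margin, house_effect, shrink=0.5):
    """Remove part of a pollster's estimated lean (positive = pro-Democratic).

    shrink is hypothetical; it would move toward 1 as more surveys
    from the same firm become available.
    """
    return poll_margin - shrink * house_effect


# A poll showing D+2, taken before Democrats gained 5 points on the generic ballot,
# from a firm that runs 2 points more Democratic than its peers.
after_timeline = timeline_adjust(2.0, generic_ballot_shift=5.0)        # 2 + 2.5
after_house_effect = house_effect_adjust(after_timeline, house_effect=2.0)  # 4.5 - 1
```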

The House model does use partisan and campaign polls, which typically make up something like half of the overall sample of congressional district polling. Partisanship is determined by who sponsors the poll, rather than who conducts it. Polls are considered partisan if they’re conducted on behalf of a candidate, party, campaign committee, or PAC, super PAC, 501(c)(4), 501(c)(5) or 501(c)(6) organization that conducts a large majority of its political activity on behalf of one political party.

Partisan polls are subject to somewhat different treatment than nonpartisan polls in the model. They receive a lower weight, as partisan-sponsored polls are historically less accurate. And the house effects adjustment starts out with a prior that assumes these polls are biased by about 4 percentage points toward their preferred candidate or party. If a pollster publishing ostensibly partisan polls consistently has results that are similar to nonpartisan polls of the same districts, the prior will eventually be overridden.
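One simple way to picture a prior that "will eventually be overridden" is a weighted average of the prior and the observed evidence. In this sketch the 4-point prior comes from the text, while `prior_strength` (how many polls the prior counts for) and the function name are assumptions.

```python
def partisan_house_effect(observed_gaps, prior_bias=4.0, prior_strength=5.0):
    """Estimate a partisan pollster's bias toward its sponsor's side.

    observed_gaps: differences (in points) between this pollster's results and
    nonpartisan polls of the same districts, signed toward the sponsor.
    prior_strength is an assumed weight for the prior, in "pseudo-polls".
    """
    n = len(observed_gaps)
    return (prior_strength * prior_bias + sum(observed_gaps)) / (prior_strength + n)


# With no track record, the estimate is the 4-point prior.
new_pollster = partisan_house_effect([])

# After 20 polls that match nonpartisan results, the prior has mostly washed out.
proven_pollster = partisan_house_effect([0.0] * 20)
```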

CANTOR: Analysis of polls in similar districts

CANTOR is essentially PECOTA or CARMELO (the baseball and basketball player forecasting systems we designed) for congressional districts. It uses a k-nearest neighbors algorithm to identify similar congressional districts based on a variety of demographic,¹¹ geographic¹² and political¹³ factors. For instance, the district where I was born, Michigan 8, is most comparable to other upper-middle-income Midwestern districts such as Ohio 12, Indiana 5 and Minnesota 2 that similarly contain a sprawling mix of suburbs, exurbs and small towns.

The goal of CANTOR is to impute what polling would say in unpolled or lightly polled districts, given what it says in similar districts. It attempts to accomplish this goal in two stages. First, it comes up with an initial guesstimate of what the polls would say based solely on FiveThirtyEight’s partisan lean metric (FiveThirtyEight’s version of a partisan voting index, which is compiled based on voting for president and state legislature) and incumbency. For instance, if Republican incumbents are polling poorly in the districts where we have polling, it will assume that Republican incumbents in unpolled districts are vulnerable as well. Then, it adjusts the initial estimate based on the district-by-district similarity scores. For instance, that Republican incumbent Carlos Curbelo is polling surprisingly well in Florida’s 26th Congressional District will also help Republicans in similar congressional districts.

All of this sounds pretty cool, but there’s one big drawback. Namely, there’s a lot of selection bias in which districts are polled. A House district usually gets surveyed only if one of the campaigns or a media organization has reason to think the race is close — so unpolled districts are less competitive than you’d infer from demographically similar districts that do have polls. CANTOR projections are adjusted to account for this.
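The two-stage approach described above can be sketched roughly as follows, using a plain nearest-neighbors search. The Euclidean distance, the inverse-distance weights, the district names and every number below are made up for illustration, and the correction for selection bias in which districts get polled is left out.

```python
import math


def nearest_districts(target, features, k=3):
    """Return the k districts closest to `target` as (distance, name) pairs.

    features: dict mapping district name -> list of standardized feature values.
    """
    distances = [
        (math.dist(features[target], vec), name)
        for name, vec in features.items()
        if name != target
    ]
    return sorted(distances)[:k]


def impute_margin(target, features, polled_margins, baseline, k=3):
    """Two-stage estimate for an unpolled district.

    Stage 1: start from baseline[target] (e.g. partisan lean plus incumbency).
    Stage 2: add the similarity-weighted average of how polled neighbors
    deviate from their own baselines. Inverse-distance weights are an assumption.
    """
    candidates = {d: features[d] for d in polled_margins}
    candidates[target] = features[target]
    neighbors = nearest_districts(target, candidates, k)
    if not neighbors:
        raise ValueError("no polled districts to compare against")
    weights = [1.0 / (dist + 1e-9) for dist, _ in neighbors]
    residuals = [polled_margins[name] - baseline[name] for _, name in neighbors]
    adjustment = sum(w * r for w, r in zip(weights, residuals)) / sum(weights)
    return baseline[target] + adjustment


# Hypothetical districts with two standardized features each.
features = {"A": [0.0, 0.0], "B": [1.0, 0.0], "C": [0.0, 2.0], "D": [5.0, 5.0]}
baseline = {"A": -3.0, "B": -2.0, "C": -4.0, "D": 10.0}
polled_margins = {"B": 0.0, "C": -2.0}  # both poll 2 points better for Democrats than baseline

imputed_a = impute_margin("A", features, polled_margins, baseline, k=2)
```

Because both polled neighbors beat their baselines by 2 points, the unpolled district's estimate moves 2 points in the same direction.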

Overall, CANTOR is an interesting method that heavily leans into district polling and gets as close as possible to a “polls-only” view of the race. However, in terms of accuracy, it is generally inferior to using …

The fundamentals

The data-rich environment in House elections — 435 individual races every other year, compared with just one race every four years for the presidency — is most beneficial when it comes to identifying reliable non-polling factors for forecasting races. There’s enough data, in fact, that rather than using all districts to determine which factors were most predictive, I instead focused the analysis on competitive races (using a fairly broad definition of “competitive”). In competitive districts with incumbents, the following factors have historically best predicted election results, in roughly declining order of importance:

The incumbent’s margin of victory in his or her previous election, adjusted for the national political environment and whom the candidate was running against in the prior election.
The generic congressional ballot.
Fundraising, based on the share of individual contributions for the incumbent and the challenger as of the most recent filing period.¹⁴
FiveThirtyEight partisan lean, which is based on how a district voted in the past two presidential elections and (in a new twist this year) state legislative elections. In our partisan lean formula, 50 percent of the weight is given to the 2016 presidential election, 25 percent to the 2012 presidential election and 25 percent to state legislative elections.
Congressional approval ratings, which are a measure of the overall attitude toward incumbents.¹⁵
Whether either the incumbent or the challenger was involved in a scandal.
The incumbent’s roll call voting record — specifically, how often the incumbent voted with his or her party in the past three Congresses. “Maverick-y” incumbents who break party ranks more often outperform those who don’t.
Finally, the political experience level of the challenger, based on whether the challenger has held elected office before.

In addition, in Pennsylvania, which underwent redistricting in 2018, the model accounts for the degree of population overlap between the incumbent’s old and new district. And in California and Washington state, it accounts for the results of those states’ top-two primaries.

In open-seat races, the model uses the factors from the list above that aren’t dependent on incumbency, namely the generic ballot, fundraising, FiveThirtyEight partisan lean, scandals, experience and (where applicable) top-two primary results. It also uses the results of the previous congressional election in the district, but this is a much less reliable indicator than when an incumbent is running for re-election.
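The partisan lean weights listed above translate directly into a weighted average. In this sketch the 50/25/25 weights are from the text; the assumption is that each input is the district's margin relative to the country as a whole (Democratic minus Republican, in points), and the example numbers are invented.

```python
def partisan_lean(pres_2016, pres_2012, state_leg):
    """Weighted partisan lean using the 50/25/25 weights stated in the text.

    Inputs are assumed to be margins relative to the nation, in points.
    """
    return 0.50 * pres_2016 + 0.25 * pres_2012 + 0.25 * state_leg


# Hypothetical district: R+4 in 2016, D+2 in 2012, R+2 in state legislative voting.
example_lean = partisan_lean(pres_2016=-4.0, pres_2012=2.0, state_leg=-2.0)
```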

But wait, there’s more! In addition to combining polls and fundamentals, the Classic model compares its current estimate of the national political climate to a prior based on the results of congressional elections since 1946, accounting for historic swings in midterm years and presidential approval ratings. This prior has little effect on the projections this year, however, as it implies that Democrats should be ahead by about 8 points in the popular vote, similar to what the generic ballot and other indicators show.¹⁶ To put it another way, the results we’re seeing in the data so far are consistent with what’s usually happened in midterms under unpopular presidents.

Incorporating expert ratings

Compared with the other steps, incorporating expert ratings and creating the Deluxe version of the model is fairly straightforward. We have a comprehensive database of ratings from Cook and other groups since 1998, so we can look up how a given rating corresponded, on average, with a margin of victory. For instance, candidates who were deemed to be “likely” winners in their races won by an average of about 12 points:

What do ratings like “lean Republican” really mean?

Expert Rating	Average margin of victory
Toss-up	0 points
“Tilts” toward candidate	4 points
“Leans” toward candidate	7 points
“Likely” for candidate	12 points
“Solid” or “safe” for candidate	34 points

But, of course, there are complications. One is that there’s an awful lot of territory covered by the “solid” and “safe” categories: everything from races that could almost be considered competitive to others where the incumbent wins every year by a 70-point margin. Therefore, the Deluxe forecast doesn’t adjust its projections much when it encounters “solid” or “safe” ratings from the experts, except in cases where the rating comes as a surprise because other factors indicate that the race should be competitive.

Also, although the expert raters are really quite outstanding at identifying “micro” conditions on the ground, including factors that might otherwise be hard to measure, they tend to be lagging indicators of the macro political environment. Several of the expert raters shifted their projections sharply toward the Democrats throughout 2018, for instance, even though the generic ballot has been fairly steady over that period.¹⁷ Thus, the Deluxe forecast tries to blend the relative order of races implied by the expert ratings with the Classic model’s data-driven estimate of national political conditions. Deluxe and Classic will usually produce relatively similar forecasts of the overall number of seats gained or lost by a party, therefore, even though they may have sharp disagreements on individual races.
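The lookup from rating to expected margin in the table above can be sketched as a simple mapping. The margins are the table's averages; the sign convention, the equal weighting of raters and the function names are assumptions. The special handling of "solid" ratings and the blending with Classic's national estimate are not shown.

```python
# Average margins of victory by rating, taken from the table above.
RATING_MARGIN = {"toss-up": 0.0, "tilt": 4.0, "lean": 7.0, "likely": 12.0, "solid": 34.0}


def rating_to_margin(rating, favored_party=None):
    """Convert e.g. ("lean", "R") to a signed margin (Democratic minus Republican)."""
    magnitude = RATING_MARGIN[rating]
    if magnitude == 0.0:
        return 0.0
    if favored_party not in ("D", "R"):
        raise ValueError("favored_party must be 'D' or 'R' for non-toss-up ratings")
    return magnitude if favored_party == "D" else -magnitude


def expert_consensus(ratings):
    """Simple mean of the raters' implied margins (equal weights are an assumption)."""
    margins = [rating_to_margin(rating, party) for rating, party in ratings]
    return sum(margins) / len(margins)


# Three hypothetical ratings of the same race.
consensus = expert_consensus([("lean", "R"), ("toss-up", None), ("tilt", "R")])
```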

Simulating the election and accounting for uncertainty

Sometimes what seem like incredibly pedantic questions can turn out to be important. For years, we’ve tried to design models that account for the complicated, correlated structure of error and uncertainty in election forecasting. Specifically, if a candidate or a party overperforms the polls in one swing state, they’re also likely to do so in other states, especially demographically similar ones. Understanding this principle was key to understanding why Clinton’s lead wasn’t nearly as safe as it seemed in 2016.

Fortunately, this is less of a problem in constructing a House forecast; there are different candidates on the ballot in every district, instead of just one presidential race, and the model relies on a variety of inputs, instead of depending so heavily on polls. Nonetheless, the model accounts for four potential types of error in an attempt to self-diagnose the various ways in which it could go off the rails:

First, there’s local error — that is, error pertaining to individual districts. Forecasts are more error-prone in districts where there’s less polling or in districts where the various indicators disagree with one another. (In West Virginia 3, for example, the fundamentals regression thinks Democrat Richard Ojeda should be a huge underdog — but the only poll of the race has him ahead!) Some districts are also swingier (or more elastic) than others; conditions tend to change fairly quickly in New Hampshire, for instance, but more slowly in the South, where electorates are often bifurcated between very liberal and very conservative voters.
Second, there’s error based on regional or demographic characteristics. For instance, it’s possible that Democrats will systematically underperform expectations in districts with large numbers of Hispanic voters or overperform them in the rural Midwest. The model uses CANTOR similarity scores to simulate these possibilities.
Third, there can be error driven by incumbency status. In some past elections, polls have systematically underestimated Republican incumbents, for example, even if they were fairly accurate in open-seat races. The model accounts for this possibility as well.
Fourth and finally, the model accounts for the possibility of a uniform national swing — i.e., when the polls are systematically off in one party’s direction in almost every race.

Error becomes smaller as Election Day approaches. In particular, there’s less possibility of a sharp national swing as you get nearer to the election because there’s less time for news events to intervene.
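A stripped-down simulation with the four error types might look like the sketch below. It is only an illustration: every standard deviation is a placeholder, a simple region label stands in for the CANTOR similarity scores, and the shrinking of error as Election Day approaches is not modeled.

```python
import random


def simulate_house(districts, n_sims=2000, seed=0,
                   national_sd=3.0, regional_sd=2.0, incumbency_sd=1.5):
    """Monte Carlo sketch with national, regional, incumbency and local error.

    districts: list of dicts with keys "margin" (Dem minus Rep, in points),
    "region", "incumbent_party" ("D", "R" or None) and "local_sd".
    Returns the share of simulations in which Democrats win a majority
    of the districts supplied.
    """
    rng = random.Random(seed)
    regions = sorted({d["region"] for d in districts})
    majority = len(districts) // 2 + 1
    dem_majorities = 0
    for _ in range(n_sims):
        national = rng.gauss(0.0, national_sd)  # uniform national swing
        regional = {r: rng.gauss(0.0, regional_sd) for r in regions}
        incumbency = {
            "D": rng.gauss(0.0, incumbency_sd),
            "R": rng.gauss(0.0, incumbency_sd),
            None: 0.0,  # open seats get no incumbency error
        }
        dem_seats = 0
        for d in districts:
            simulated = (
                d["margin"]
                + national
                + regional[d["region"]]
                + incumbency[d["incumbent_party"]]
                + rng.gauss(0.0, d["local_sd"])  # district-level error
            )
            dem_seats += simulated > 0
        dem_majorities += dem_seats >= majority
    return dem_majorities / n_sims


# Three hypothetical districts: one leaning each way and one toss-up open seat.
toy_districts = [
    {"margin": 6.0, "region": "Northeast", "incumbent_party": "D", "local_sd": 6.0},
    {"margin": -5.0, "region": "South", "incumbent_party": "R", "local_sd": 4.0},
    {"margin": 0.5, "region": "Midwest", "incumbent_party": None, "local_sd": 7.0},
]
dem_majority_chance = simulate_house(toy_districts, n_sims=1000, seed=42)
```

Because the national term is shared by every district in a given simulation while the local term is drawn separately for each, errors end up partly correlated across races, which is the property the four-part structure is meant to capture.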

Nonetheless, you shouldn’t expect pinpoint precision in a House forecast, and models that purport to provide it are either fooling you or fooling themselves. Even if you knew exactly what national conditions were, there would still be a lot of uncertainty based on how individual races play out.

Odds and ends

OK, that’s almost everything. Just a few final notes:

We’ve made educated guesses about the identity of the nominees in states that haven’t yet held their primaries. We’ll definitely be wrong about a few of these, and we’ll change them once the primaries are held. For the time being, we’re also assuming that incumbent Republican Chris Collins will successfully be able to withdraw from the race in New York’s 27th District, but we’ll reinsert him if it looks like he won’t be able to.
Once we publish the race-by-race pages, you’ll notice that we also project turnout in each district, based on factors such as the citizen voting-age population and turnout in past midterms and presidential races. This is important in understanding the relationship between the national popular vote and the number of seats that each party might gain or lose. As of forecast launch, our model implies that Democrats would need to win the House popular vote by 5 to 6 percentage points to have a break-even chance of winning a majority of seats.
Finally, I should emphasize that we do not make ad-hoc adjustments to the forecasts in individual races. They’re all done strictly by algorithm. Nor do we implement major changes in the program once the model has been released. With that said, we will correct bugs, especially in the first week or two after the model is out.¹⁸ So if you see something that looks awry, please just let us know.


Footnotes

1. Never mind the dastardly “now-cast,” which wasn’t a forecast and just confused people.
2. In the tradition of PECOTA and CARMELO, we created a stupid backronym for it: Congressional Algorithm using Neighboring Typologies to Optimize Regression.
3. And CANTOR, although CANTOR gets little weight in the Classic model.
4. That is, in House elections since 1998, excluding races where one of the major parties failed to nominate a candidate or where there were multiple candidates from the same major party.
5. The low rate of missed races isn’t quite as impressive as it sounds given that most congressional races aren’t competitive at all because of how districts are drawn these days.
6. Inside Elections was formerly the Rothenberg Political Report.
7. Specifically, (i) that there is some method to verify the geographic location of the respondent and (ii) each state or district in the poll is weighted individually. (This is an evolution in our policy since 2016.) For instance, in a national poll with 2,000 respondents, we wouldn’t use a 150-person subsample of Texas responses as a Texas poll unless the above conditions were met. We do treat congressional district breakouts of single-state polls as individual polls of those congressional districts, provided that the pollster intends them to be used in this way and changes the names of candidates in the poll to reflect the ones the voter will see on the ballot in her district.
8. That is, how responsive a district is to changes in political conditions.
9. In building the House model, I discovered that the version of the generic ballot average we’ve been publishing on our generic ballot interactive is too aggressive and overcommits to short-term polling fluctuations; we’ll be fixing that soon. In the meantime, you should know that the House model’s version of the generic ballot average is tuned to a slightly more conservative, slower-moving setting.
10. Or almost all the way if we have a large enough sample of surveys from that polling firm.
11. Specifically: race, gender, income, education and immigration status.
12. Latitude, longitude, population density, urban/rural and geographic region.
13. Voting for president and for state legislature.
14. Challenger fundraising generally lags incumbent fundraising, so the model adjusts the challenger’s fundraising upward until both candidates have filed their final, pre-general election fundraising reports.
15. Congressional approval ratings have been low for the past decade or so, which corresponds to a significantly smaller incumbency advantage than a generation ago.
16. Moreover, the prior is designed in such a way that it phases out completely by Election Day.
17. Although with lots of short-term fluctuations up and down.
18. There’s also some unfinished business; the model is not yet accounting for the possibility of runoffs in Louisiana and Georgia.