Sep. 17, 2014, at 7:30 AM

How The FiveThirtyEight Senate Forecast Model Works

UPDATE (Sept. 21, 2016; 9 a.m.): The article below describes the methodology for our 2014 Senate forecasts. In all the important ways, our model for predicting the 2016 Senate elections works the same way and abides by the same principles.

The FiveThirtyEight Senate forecast model launched earlier this month. Right now, it shows Republicans with about a 53 percent chance of picking up the Senate next year. We owe you a lot more detail about how that forecast is calculated and how it might change between now and Nov. 4 — and how our model differs from some of the others out there.

This article, which outlines the model’s methodology, is going to be on the detailed side. I’ve tried to keep the descriptions in plain language as often as possible (the footnotes get somewhat more technical). But it’s meant to be a reasonably comprehensive reference guide rather than breezy bedtime reading.

First, however, I want to describe the principles behind the model. Some of these are more philosophical and abstract — they describe what I think of as best practices for applied statistical modeling. I can get passionate about this stuff — but somewhat contrary to the media portrayal of election forecasters as wizards who conjure up spells from their spreadsheets, our goal is not to divine some magic formula that miraculously predicts every election. Instead, it’s to make sense of publicly available information in a rigorous and disciplined way.

Principle 1: A good model should be probabilistic, not deterministic.¹

The FiveThirtyEight model produces probabilistic forecasts as opposed to hard-and-fast predictions. In the same sense a weather forecaster might tell you there’s a 20 percent chance of rain tomorrow, the FiveThirtyEight model might estimate the Democrat has a 20 percent chance of winning the Senate race in Kentucky.

My view is that this is often the most important part of modeling — and often the hardest part. Predictions of the most likely outcome (“The Democrat will win the race by 3 percentage points”) are sometimes relatively immune to changes in methodology. But probabilistic forecasts can be very sensitive to them. Does that 3-point lead translate into a 60 percent chance of winning? Or a 95 percent chance? Or what?

This can be tricky. Quick-and-dirty assumptions, like that the margin of error expressed in a poll is a complete reflection of its accuracy, sometimes don’t hold up well in the real world. In our database of polls conducted in the final three weeks of campaigns since 1998, the actual results fell outside of the poll’s reported margin of error² almost 25 percent of the time.³ So it’s important to model the error empirically — based on how well the polls have done in past races — instead of taking shortcuts.

In fact, while FiveThirtyEight’s forecasts are sometimes seen as extremely bold when compared with news media coverage of campaigns,⁴ they are fairly conservative compared with some other forecasting models. For a variety of reasons,⁵ statistical models are prone toward overconfidence unless they’re designed carefully.

The best test of a probabilistic forecast is whether it’s well calibrated. By that I mean: Out of all FiveThirtyEight forecasts that give candidates about a 75 percent shot of winning, do the candidates in fact win about 75 percent of the time over the long run? It’s a problem if these candidates win only 55 percent of the time. But from a statistical standpoint, it’s just as much of a problem if they win 95 percent of the time.

Fortunately, FiveThirtyEight’s Senate forecasts have historically been well calibrated. We’ve posted the data on GitHub so that you can check them out for yourself. For example, out of the 12 instances⁶ where we gave a candidate between an 85 percent and a 95 percent chance of winning on Election Day, the favored candidate won in 11 cases, or 92 percent of the time.⁷ We also have a track record of being well calibrated in other types of election forecasts and sports forecasts.
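For readers who want to run this kind of calibration check themselves, here is a minimal sketch in Python. The function name and the record layout (a probability paired with a won/lost flag) are assumptions made for illustration; they are not the format of the data posted on GitHub.

```python
def observed_win_rate(forecasts, lo, hi):
    """Return (count, wins, win_rate) for forecasts with lo <= probability < hi.

    `forecasts` is an iterable of (probability, won) pairs, where `won`
    is True if the favored candidate actually won.
    """
    outcomes = [won for prob, won in forecasts if lo <= prob < hi]
    if not outcomes:
        return 0, 0, None
    wins = sum(1 for won in outcomes if won)
    return len(outcomes), wins, wins / len(outcomes)


# Hypothetical records shaped like the example in the text:
# 12 forecasts in the 85-95 percent range, 11 of which came true.
sample = [(0.90, True)] * 11 + [(0.90, False)]
count, wins, rate = observed_win_rate(sample, 0.85, 0.95)
print(count, wins, f"{rate:.0%}")  # 12 11 92%
```

A well-calibrated set of forecasts should show an observed win rate close to the stated probability in every bucket, not just this one.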

Principle 2: A good model ought to be empirical.

Simplicity can be a virtue in model-building, but every model must be reasonably consistent with the evidence. It’s one thing to have stress-tested your model, determined that only a few things really matter, and removed all the superfluous bits. But if you don’t account for a certain variable or some statistical property in your model, you may be making an implicit assumption that it doesn’t matter much — when sometimes it does.

For instance, it’s probably wrong to treat registered voter polls the same as likely voter polls when there’s reasonably clear historical evidence that registered voter polls tend to overrate the standing of Democrats. As I mentioned, it’s almost certainly wrong to assume the error in a poll is fully captured by its reported margin of error. It’s also wrong to assume that the error in one poll is independent from the next — about as often as not, the polls all miss in the same direction, which can lead to late-breaking “waves” toward one or the other party.

It’s also important to check whether modeling choices have a sound basis in theory. One variable our model uses is an ideology score for each candidate, which seeks to estimate how liberal or conservative he or she is relative to voters in his or her state. This variable is highly statistically significant in predicting election outcomes — but just as important is that it has a strong basis in the political science literature (in this case, see the median voter theorem). By contrast, if we’d found that past election outcomes had been well predicted by the number of consonants a candidate had in her middle name, we’d strongly suspect this was a statistical fluke and we wouldn’t include it in our model.

Principle 3: A good model ought to respond sensibly to changes in inputs.

Models that gyrate around wildly with the slightest provocation should be viewed skeptically. So should those unmoved by even important-seeming information. The FiveThirtyEight Senate model tends to produce fairly stable forecasts. That can make things a bit dull from day to day; most of the time, the new polls and data we collect have little effect on the bottom line. But now and then almost all the polls on a particular day favor one or the other party, and the overall forecast for Senate control moves by a few percentage points in that party’s direction.

There will sometimes be more volatility in the forecast toward the end of the campaign because late changes in the polls can’t be reversed before Election Day. A sports analogy is helpful here: An NFL team that kicks a field goal with two minutes to play in the first quarter becomes only a 59 percent favorite to win,⁸ but one that does so with two minutes to play in the fourth quarter becomes an 83 percent favorite. Likewise, a 2-point shift in the polls can produce a larger change in win probabilities late in an election.

Principle 4: A good model ought to avoid changing its rules in midstream.

We’re sometimes guilty of talking about the FiveThirtyEight model as though it has a mind of its own. It doesn’t. It’s just a computer program — and we wrote the program.

However, we don’t “tweak” the forecast in a given state just because we don’t like the outcome. And we avoid changing the program once we’ve launched the forecasts.⁹ Nor do we change it that much from year to year: the Senate model is perhaps 80 percent to 90 percent the same as when we launched it in 2008, and is also largely similar to our presidential forecast model. All versions of these models have used polling along with non-polling data, have been probabilistic rather than deterministic, and so forth.

Speaking of that model — what does it do, exactly? There are seven major steps.

Step 1: Weighted polling average

We start by collecting polls — lots of polls. There are only a few types of polls we discard. First are those from pollsters that we know or suspect to have faked their results or to have engaged in other gross ethical misconduct — for instance, Strategic Vision and Research 2000.¹⁰ Next, we exclude internal polls conducted directly on behalf of candidates or party organizations like the Democratic Senatorial Campaign Committee and the Republican National Committee, which tend to be inaccurate and biased.¹¹

We err strongly on the side of inclusiveness; the threshold for excluding a poll is high.¹² The model has a lot of other defense mechanisms, particularly in minimizing the effect of polls that show signs of partisan bias through our “house effects” adjustment (which I’ll describe in Step 2). For more about our philosophy on this, see this discussion.

A poll is weighted based on three factors:

How recently it was conducted. Older polls are penalized through an exponential decay formula. The penalty becomes stiffer — that is, more emphasis is placed on recency — the closer we get to the election.¹³ Our research suggests that the news media place too much emphasis on recency and would be better off looking at a broader range of polls.

The poll’s sample size. Polls that sample more voters receive a larger weight, although there are diminishing returns. In particular, we’ve found that the improvement in accuracy from sampling more voters is not as large in practice as it’s supposed to be in theory.¹⁴ The reasons for this are interesting — we’ll discuss them more when we release our pollster ratings. But the implication is that one should be careful about weighting an otherwise dubious poll heavily just because it took a large sample. This reflects a slight refinement from previous years, when the model placed more emphasis on sample size.¹⁵

The pollster rating. We’ve released an entirely new set of pollster ratings for 2014 and explain the process for calculating them in much more depth in a separate article. The method is similar to one we used in 2010, however, in that pollsters are rated on the basis of both their past accuracy¹⁶ and two easily measurable proxies for methodological quality. The first is whether the polling firm is a member of industry groups and initiatives like the AAPOR Transparency Initiative.¹⁷ The second is whether the firm regularly calls cellphones in addition to landlines.¹⁸ These factors don’t tell you everything you need to know about a poll, but they tend to be correlated with other strong methodological practices. More importantly, these methodological variables are strong predictors of more accurate polling results going forward.
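To make the three weighting factors concrete, here is a rough sketch of how they might be combined. Everything about the functional form is an assumption for illustration: the multiplicative combination, the 30-day half-life, the square-root treatment of sample size and the `rating_multiplier` input are all hypothetical. The only anchor taken from this article is that a 600-voter poll from an average-rated firm gets a weight of 1.00 on the day of its release.

```python
import math


def poll_weight(age_days, sample_size, rating_multiplier,
                half_life_days=30.0, reference_sample=600):
    """Illustrative poll weight: recency x sample size x pollster rating.

    half_life_days and the square-root size term are assumptions,
    not the model's published parameters.
    """
    # Exponential decay: the weight halves every `half_life_days`.
    recency = 0.5 ** (age_days / half_life_days)
    # Diminishing returns: quadrupling the sample only doubles this factor.
    size = math.sqrt(sample_size / reference_sample)
    return recency * size * rating_multiplier


print(poll_weight(0, 600, 1.0))    # 1.0: fresh, 600 voters, average rating
print(poll_weight(30, 600, 1.0))   # 0.5: same poll one half-life later
print(poll_weight(0, 2400, 1.0))   # 2.0: four times the sample
```

Shortening `half_life_days` as the election approaches would be one way to mimic the stiffer recency penalty described above.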

Truth be told, the poll weights don’t always make a huge impact — although this year could be an exception given how many close races there are. The main way to go wrong is probably in placing too much emphasis on the most recent polls, which can lead to unwarranted volatility.

A few bits of housekeeping on other polling situations that come up from time to time:

Pollsters routinely poll the same races multiple times. In these cases, we don’t “throw out” the old poll, but it gets a lower weight; see here for how that works.
If a pollster lists both likely voter and registered voter results, we use the likely voter version.¹⁹
In other cases where the pollster releases multiple versions of the same poll — for instance, results drawn from two different turnout models, or results with and without a minor candidate included²⁰ — we simply average all applicable versions together.
Tracking polls, which contain overlapping dates in their samples, are weighted based on the number of new respondents in each edition of the poll.²¹

Step 2: Adjustments to the polling average

The FiveThirtyEight model performs three sets of adjustments to the polls: a likely voter adjustment, a house effects adjustment and (usually the least important of the three) a trend line adjustment.

The rationale for the likely voter adjustment is explained at some length here. Polls of likely voters are almost always more favorable to Republicans than polls of broader samples, like registered voters. But polls of likely voters also tend to be more accurate and less biased, especially in midterm years.

So as a default, the FiveThirtyEight model shifts registered voter polls (and polls of all adults) toward Republicans to make them more comparable to likely voter surveys. In particular, the model defaults to shifting polls of registered voters toward Republicans by 2.7 percentage points, which is the historical average difference between likely voter and registered voter polls in midterm years. However, the magnitude of the shift is updated based on polls like this one that list both registered voter and likely voter results in the same survey.²² So far this year, the average²³ gap has been just above 3 points.
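The default shift is simple enough to sketch in code. The function names and the "lv"/"rv" labels below are hypothetical, margins are expressed as Republican minus Democrat in percentage points, and the unweighted average in `updated_shift` is an assumption. The text does not say how large the shift is for polls of all adults, so this sketch deliberately leaves that case undefined.

```python
DEFAULT_RV_TO_LV_SHIFT = 2.7  # points toward Republicans, the midterm default quoted in the text


def likely_voter_equivalent(rep_margin, population, shift=DEFAULT_RV_TO_LV_SHIFT):
    """Convert a poll's Republican-minus-Democrat margin to a likely voter basis.

    population: "lv" for likely voters, "rv" for registered voters.
    """
    if population == "lv":
        return rep_margin            # already on the target basis
    if population == "rv":
        return rep_margin + shift    # move toward Republicans
    raise ValueError(f"no shift defined in this sketch for population {population!r}")


def updated_shift(paired_polls):
    """Average LV-minus-RV gap from polls that report both versions.

    paired_polls: list of (rv_rep_margin, lv_rep_margin) tuples.
    An unweighted mean is an assumption of this sketch.
    """
    gaps = [lv - rv for rv, lv in paired_polls]
    return sum(gaps) / len(gaps)


# A registered voter poll showing the Democrat up 1 becomes roughly R+1.7.
print(round(likely_voter_equivalent(-1.0, "rv"), 1))
```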

The house effects adjustment accounts for the tendency of some polling firms to consistently show more favorable results for one or the other party.²⁴ It works by means of a regression analysis on all Senate and generic congressional ballot polls.²⁵ This is one of a number of reasonable ways of comparing a poll against others of the same state.²⁶

However, one or two (or even a few) polls may not tell you that much about a pollster’s house effect; variation from other pollsters may just be statistical noise instead. Furthermore, a model without some tolerance for differences of opinion among pollsters may deprive itself of the benefits of aggregating polls together in the first place. The FiveThirtyEight model handles this by calculating a “buffer zone”²⁷ based on the number of polls a firm has released. For instance, a firm with relatively few polls might have a buffer zone of 2 percentage points. Any house effect beyond that buffer zone is subtracted from the poll, so a polling firm with a 5-point Republican house effect and a 2-point buffer zone will have its results adjusted toward Democrats by 3 points.²⁸
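The buffer-zone arithmetic can be sketched as follows. The function name and sign convention (positive means a Republican lean) are assumptions for illustration, and the sketch takes the buffer zone as a given input instead of deriving it from the number of polls a firm has released.

```python
import math


def house_effect_correction(house_effect, buffer_zone):
    """Return the part of a house effect that lies beyond the buffer zone.

    house_effect: estimated lean in points; positive means pro-Republican.
    The result keeps the sign of the house effect and is subtracted
    from the poll's Republican-minus-Democrat margin.
    """
    excess = max(abs(house_effect) - buffer_zone, 0.0)
    return math.copysign(excess, house_effect)


# The example from the text: a 5-point Republican house effect with a
# 2-point buffer moves the poll 3 points toward Democrats.
raw_margin = 4.0  # hypothetical poll result, R+4
adjusted = raw_margin - house_effect_correction(5.0, 2.0)
print(adjusted)  # 1.0
```

A firm whose house effect sits inside the buffer zone gets no adjustment at all, which preserves some honest disagreement among pollsters.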

A new feature in the model this year is that house effects from past years²⁹ are used to help calibrate the house effects adjustment.³⁰ However, their influence is relatively minor.³¹ House effects are generally fairly consistent from election to election, but there are exceptions; for instance, the firm Rasmussen Reports, which had a strong Republican house effect in the past, has little house effect so far this year. The main case where this is helpful is when a firm with a long history of partisan polling drops in to poll a few races after having been dormant for most of the cycle.

Another question is how to calculate the baseline that other polls are compared against. We use a weighted average, where the weight is based on the number of polls a firm has released and its pollster rating. This means that the baseline is determined mostly by what the stronger polling firms are saying.³² In 2012, this worked to Democrats’ benefit — the higher-rated polling firms tended to show stronger results for them — but we’ve observed no such consistent pattern this year.

The trend line adjustment is an important part of the FiveThirtyEight presidential model, but not so important in the Senate model. In forecasting the presidential race, you can make accurate inferences about how the polls are changing in one state based on how they’re changing in other states. For instance, if Barack Obama had gained several points relative to Mitt Romney in both Michigan and Minnesota, you could be almost certain that he’d also gained ground in Wisconsin even if Wisconsin hadn’t been polled recently.

In Senate elections, however, there are different candidates on the ballot in each state — so the inferences are much weaker. Instead, the FiveThirtyEight trend line adjustment is calculated solely based on generic congressional ballot polls. It works by looking for changes in the generic congressional ballot as tracked by the same polling firms over the same sample populations — for instance, Quinnipiac polls of registered voters — and then backing out a time trend by means of a lowess smoothing regression. See here for more detail.

The trend line adjustment currently detects some Republican movement on the generic ballot. However, the adjustment is applied conservatively in the Senate model.³³ It currently shifts the polling average in each state toward Republicans by an average of only 0.2 percentage points.

Let’s interrupt here to draw some probability distributions. One narrative holds that there are big differences between those Senate models that look only at polls and those that look at polls along with other factors. But it’s more complicated than that. In recent days, running our model based on the adjusted polling average alone (after Step 2) would reduce the GOP’s chances of controlling the Senate by about 5 percent — it doesn’t make a huge difference.

What’s a more complete story? Small differences matter this year because both individual states and the overall Senate race are so close. Most election models (including ours) work in something like the following way: First, they calculate the most likely outcome in a particular state (“The Republican wins by 1 point”) and then they determine the degree of uncertainty around that estimate. Most models do this by means of a normal distribution or something similar to it. In this type of statistical distribution, the outcomes within the margin of error are not all equally likely; instead, those closer to the mean of the distribution are more probable.

The graphic below, for example, illustrates a normal distribution with a mean of +1 (as in, a candidate is ahead by 1 point in the polls) and a standard deviation of 5. In this example, we’ll take positive values to mean the Republican wins the race and negative values to mean the Democrat does. According to the normal distribution, the Republican will win 58 percent of the time.

But if we shift the center of the distribution by just 1 point toward the Republican — say, our model averages the polls together a little differently than someone else’s, and it projects her to win by 2 points instead of 1 — it has a noticeable effect on the probabilities. Not huge, but noticeable: She’s gone from being a 58 percent favorite to a 66 percent favorite.

By contrast, if we’d given the Republican an additional point when she was already well ahead, it wouldn’t make much difference. If she were up by 10 points in the polls, for instance, she’d already be a 97.7 percent favorite according to the normal distribution; putting her up by 11 points instead would only increase that chance to 98.6 percent.

However, there’s another way we can affect the candidate’s win probability: by changing the standard deviation. In the example below, I’ve kept the Republican’s lead at 2 points. But I’ve reduced the standard deviation to 2 points instead of 5. Now, with that mere 2-point lead she’s suddenly an 84 percent favorite to win.
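These win probabilities follow directly from the normal distribution, and you can reproduce them with a few lines of Python. The helper name `win_probability` is ours, and the sketch assumes an exactly normal distribution of the final margin, which is the simplification used in these examples and not necessarily the distribution the full model uses.

```python
from statistics import NormalDist


def win_probability(lead, sd):
    """Probability that the final margin is above zero when the margin
    is normally distributed with mean `lead` and standard deviation `sd`."""
    return 1.0 - NormalDist(mu=lead, sigma=sd).cdf(0.0)


# (lead, standard deviation) pairs used in the examples above.
for lead, sd in [(1, 5), (2, 5), (10, 5), (11, 5), (2, 2)]:
    print(f"lead={lead:>2}, sd={sd}: {win_probability(lead, sd):.1%}")
```

After rounding, the five cases come out to the 58, 66, 97.7, 98.6 and 84 percent figures quoted in the text. Note how the last case changes only the standard deviation yet moves the probability more than the extra point of lead did.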

In my view, far too little attention is paid to those questions. What is the uncertainty in the forecast, as opposed to the most likely result?

I don’t like to call out other forecasters by name unless I have something positive to say about them — and we think most of the other models out there are pretty great. But one is in so much perceived disagreement with FiveThirtyEight’s that it requires some attention. That’s the model put together by Sam Wang, an associate professor of molecular biology at Princeton.

That model is wrong — not necessarily because it shows Democrats ahead (ours barely shows any Republican advantage), but because it substantially underestimates the uncertainty associated with polling averages and thereby overestimates the win probabilities for candidates with small leads in the polls. This is because instead of estimating the uncertainty empirically — that is, by looking at how accurate polls or polling averages have been in the past — Wang makes several assumptions about how polls behave that don’t check out against the data.³⁴

There’s a rich record of those assumptions failing and resulting in highly overconfident forecasts. In 2010, for example, Wang’s model made Sharron Angle the favorite in Nevada against Harry Reid; it estimated she was 2 points ahead in the polls, but with a standard error of just 0.5 points. If we drew a graphic based on Wang’s forecast like the ones we drew above,³⁵ it would have Angle winning the race 99.997 percent of the time, meaning that Reid’s victory was about a 30,000-to-1 long shot. To be clear, the FiveThirtyEight model had Angle favored also, but it provided for much more uncertainty. Reid’s win came as a 5-to-1 underdog in our model instead of a 30,000-to-1 underdog in Wang’s; those are very different forecasts.
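The 30,000-to-1 figure can be checked with the same normal-distribution arithmetic. This sketch assumes, as the comparison above does, that a 2-point lead with a 0.5-point standard error is treated as a normal distribution; the variable names are ours.

```python
from statistics import NormalDist

# A 2-point lead with a 0.5-point standard error is a 4-sigma lead.
p_angle = 1.0 - NormalDist(mu=2.0, sigma=0.5).cdf(0.0)
odds_against_reid = p_angle / (1.0 - p_angle)

print(f"Angle win probability: {p_angle:.3%}")
print(f"Odds against Reid: about {odds_against_reid:,.0f} to 1")

# For comparison, a 5-to-1 underdog has a 1-in-6 chance of winning.
p_reid_five_to_one = 1.0 / (5.0 + 1.0)
print(f"5-to-1 underdog: {p_reid_five_to_one:.1%}")
```

The first line prints a probability that rounds to 99.997 percent, and the odds come out a little above 30,000 to 1.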

There are a number of other examples like this. Wang projected a Republican gain of 51 seats in the House in 2010, but with a margin of error of just plus or minus two seats. His forecast implied that the odds against Republicans picking up at least 63 seats (as they actually did) were trillions-and-trillions-to-1.³⁶ If you want a “polls only” model that estimates the uncertainty more rigorously, I’d recommend The Huffington Post’s or Drew Linzer’s.

I wanted to get that out of the way before proceeding to the state fundamentals calculation, which is one of the more complicated and “controversial” parts of the FiveThirtyEight model — but also one that ultimately doesn’t have that much influence on the forecast.

Step 3: Calculate state fundamentals

In presidential elections, as I mentioned earlier, you can take advantage of the fact that the same two candidates are on the ballot in each state. This makes it much easier to make comparisons from one state to the next.³⁷ That isn’t true for Senate races, where the state fundamentals are a rough guide; they miss the final margin in the race by an average of something like 9 percentage points.

Why bother at all? One reason is that you sometimes have no alternative; the occasional Senate race gets literally no polling. Or it gets very limited polling.³⁸ In states like Alaska and Kansas this year, we have little idea of what’s going on from the polls alone. It helps to have some backstop, like knowing that both states are extremely Republican-leaning.

Another reason to look beyond polls is to prevent abrupt shifts in the forecast. For instance, the recent strong polling for Republicans in Kentucky or for Democrats in Michigan put those races more in line with how our fundamentals calculation has them.

In any event, the state fundamentals estimate is based on a series of non-polling indicators that have historically shown some predictive power in Senate races; their relative importance is determined by regression analysis. The indicators are as follows:

The generic congressional ballot. This provides an indication of the overall partisan mood in the country. As of this writing, the FiveThirtyEight model has the generic ballot favoring the Republicans by about 3 percentage points.³⁹

Congressional approval ratings.⁴⁰ This is the other national indicator. It doesn’t work toward the benefit of either party — instead, it informs the model about the overall amount of antipathy toward incumbents regardless of their party. Right now, congressional approval ratings remain near their historic lows, which mitigates some of the incumbency advantage.⁴¹

Fundraising totals. Fundraising data is a useful indicator for a number of reasons: It can reflect the grassroots support for a candidate, or a candidate’s overall level of organization — and money can be exchanged for goods and services like advertisements and a better turnout operation. Our model specifies this variable as the proportion of funds raised by each major-party candidate. For instance, if the Democrat has raised $3 million and the Republican has raised $1 million, the Democrat has raised 75 percent of the money. This definition accounts for the diminishing returns associated with additional fundraising.⁴² The FiveThirtyEight model looks only at the sum of individual public contributions — as opposed to funds raised through PACs or “Super PACs,” funds donated by the parties, or funds contributed by the candidates themselves. So far this year, this is one of the reasons for Democrats to be optimistic — they’ve outraised Republicans by our definition in almost all of the most important Senate races.

Highest elected office held. This is among the less important variables⁴³ but it has some influence. We rate candidates on a 4-point scale based on the highest office they’ve been elected to:

3 points for current or former governors or senators — by definition including all elected incumbent senators⁴⁴;
2 points for members of the House of Representatives, candidates holding statewide elected office (like state attorneys general and lieutenant governors) and mayors of large cities⁴⁵;
1 point for other nontrivial elected offices, such as state senator or state representative;
0 points for candidates who have never been elected to any substantive position.⁴⁶

Margin of victory in most recent Senate election. This variable applies to elected incumbents only.⁴⁷ Past victory margin is not a terribly reliable indicator — the political mood can shift a lot in six years — but it does tell you something. Victory margins are adjusted relative to the national climate⁴⁸ in the re-election year. That hurts this year’s crop of Democratic incumbents, since most of them were last elected in 2008, a high-water mark for the party. For instance, Sen. Kay Hagan of North Carolina won her race by an impressive 8.5 percentage points against Elizabeth Dole in 2008 — but that came in an environment when Democrats won the national popular vote for the U.S. House by 10.6 percentage points. That implies Hagan might not have won her election in a neutral political environment. Candidates who did not face major-party opposition in their last re-election bid, such as Mark Pryor of Arkansas, are treated as having won re-election by about 40 percentage points.

Candidate ideology and state partisanship. You can think of these as two variables or as one — the FiveThirtyEight model links them together. It estimates the conservative-liberal ideology of a candidate and then compares it against the estimated ideology of voters in the state. The larger the difference between them, the worse the candidate is expected to perform.

The candidate ideology scores are based on an unweighted average of three systems,⁴⁹ which are described at more length here:

DW-Nominate scores, which reflect a candidate’s voting record in Congress;
CFscores — created by Adam Bonica of Stanford University — which estimate left-right ideology based on the identity of a candidate’s donors;
OnTheIssues.org scores, which reflect public statements made by the candidate on a series of policy issues ranging from gay marriage to tax policy.⁵⁰

The score from each system is normalized such that each has the same average and standard deviation — this allows for a direct comparison among them.

We in turn estimate the ideology of voters in each state based on two variables:

Presidential results relative to the national average in 2012 and 2008;
The winners of recent past congressional races in the state — as measured by the average DW-Nominate score of the state’s congressional delegation over the past four Congresses.⁵¹ This helps to account for states — for example, Arkansas — that vote very Republican for president but sometimes still elect Democrats to Congress.

This variable can make some difference. In a purple state that votes exactly in line with the national average, a “mainstream” Republican⁵² would be expected to perform a net of 4 percentage points better than a more conservative, so-called tea party Republican.⁵³ This variable, for instance, helped to predict Sen. Claire McCaskill’s victory over the conservative Republican Todd Akin in Missouri in 2012. However, the Republican nominees this year are more moderate.
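Two of the indicators described above, the fundraising share and the climate-adjusted victory margin, reduce to simple arithmetic. The sketch below uses hypothetical function names, and treating the climate adjustment as a plain subtraction of the national House margin is an assumption; the text says only that margins are adjusted relative to the national climate.

```python
def fundraising_share(dem_raised, rep_raised):
    """Democrat's proportion of the money raised by the two major-party candidates."""
    return dem_raised / (dem_raised + rep_raised)


def climate_adjusted_margin(victory_margin, national_margin):
    """Victory margin net of the national environment (simple subtraction,
    an assumption of this sketch). Both inputs are in percentage points."""
    return victory_margin - national_margin


# $3 million versus $1 million, as in the fundraising example.
print(fundraising_share(3_000_000, 1_000_000))         # 0.75

# Kay Hagan in 2008: an 8.5-point win in a D+10.6 national environment.
print(round(climate_adjusted_margin(8.5, 10.6), 1))    # -2.1
```

Because the fundraising variable is a share, going from $1 million to $2 million against a $1 million opponent moves it far more than going from $10 million to $11 million, which is the diminishing-returns property mentioned above.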

Among the more important Senate races, the state fundamentals estimate slightly hurts Democrats in Alaska, Kentucky, Louisiana, Minnesota and North Carolina, and slightly helps them in Arkansas, Georgia and Iowa. Its most important effect is in Kansas, where a center-left independent candidate, Greg Orman, is polling slightly ahead of the Republican incumbent, Pat Roberts, but where the fundamentals calculation has Roberts as a heavy favorite. However, the polling average and the fundamentals calculation have some tendency to converge toward one another, as has already happened in some states, such as Michigan.⁵⁴

Step 4: Now-cast/snapshot

This part is pretty simple. The adjusted polling average (Step 2) and the state fundamentals estimate (Step 3) are combined into a single number that projects what would happen in an election held today. We’ve sometimes referred to this as the “now-cast” or “snapshot.”

This works by treating the state fundamentals estimate as equivalent to a “poll” with a weight of 0.35. What does that mean? Our poll weights are designed such that a 600-voter poll from a firm with an average pollster rating gets a weight of 1.00 (on the day of its release⁵⁵; this weight will decline as the poll ages). Only the lowest-rated pollsters will have a weight as low as 0.35. So the state fundamentals estimate is treated as tantamount to a single bad (though recent) poll. This differs from the presidential model, where the state fundamentals estimate is more reliable and gets a considerably heavier weight.

In states with abundant recent polling, the state fundamentals calculation makes almost no difference. As of this writing, for instance, it gets only 6 percent of the overall weight in Kentucky and 7 percent in Iowa. However, it has more influence on states with less polling; the state fundamentals currently get 15 percent of the weight in Alaska, for example, and 23 percent in Delaware. As the election approaches, the state fundamentals tend to get less and less weight because the volume of polling increases.⁵⁶
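Treating the fundamentals as one more poll amounts to a weighted average, which can be sketched as follows. The 0.35 weight comes from the text; the function names and the assumption that the polls enter as a list of already-adjusted margins with their own weights are ours.

```python
FUNDAMENTALS_WEIGHT = 0.35  # from the text: equivalent to a single low-rated poll


def now_cast(poll_margins, poll_weights, fundamentals_margin):
    """Weighted average of adjusted poll margins and the fundamentals estimate."""
    total_weight = sum(poll_weights) + FUNDAMENTALS_WEIGHT
    weighted_sum = sum(m * w for m, w in zip(poll_margins, poll_weights))
    weighted_sum += fundamentals_margin * FUNDAMENTALS_WEIGHT
    return weighted_sum / total_weight


def fundamentals_share(poll_weights):
    """Fraction of the total weight that goes to the fundamentals estimate."""
    return FUNDAMENTALS_WEIGHT / (sum(poll_weights) + FUNDAMENTALS_WEIGHT)


# Hypothetical state with three polls of varying weight.
weights = [1.0, 0.8, 0.6]
print(round(now_cast([2.0, 3.0, 1.0], weights, 6.0), 2))
print(f"{fundamentals_share(weights):.0%}")
```

Under this sketch, a 6 percent share for the fundamentals corresponds to combined poll weights of roughly 5.5, and a 23 percent share to combined poll weights of roughly 1.2, which is why sparsely polled states lean more on the fundamentals.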

Step 5: Election Day forecast

The FiveThirtyEight model is explicitly meant to be a forecast of how the election will turn out on Nov. 4, 2014⁵⁷ — rather than an estimate of what would happen in an election held today.

Looking toward the future means there’s more uncertainty in the forecast — we’ll discuss that in Step 6. In addition, the model might anticipate an overall shift toward one party or another, although this has only a minor influence on the forecast at this point in the race.

In our presidential model, we calculate a projection of the national popular vote based on an economic index. This estimate also reflects the benefit of incumbency.⁵⁸ You might think of this as an estimate of the “national fundamentals” (as opposed to the state fundamentals). In 2012, it suggested that Obama would win the national popular vote by about 2 percentage points.⁵⁹ Whenever Obama established a polling lead of more than 2 percentage points, such as after the Democratic convention, this calculation subtracted something from his lead in the polls to forecast the Election Day result. And whenever Obama’s lead fell below 2 points, such as after the first presidential debate in Denver, it added something to it. Put more technically, the model assumed that Obama’s result would revert toward the mean of how past incumbents had performed under similar economic conditions.

However, the model was designed such that the weight placed on the national fundamentals declined over the course of the campaign — until it was zero by Election Day. If the polls had shown Obama with an 11-point lead by Election Day, or Romney with a 9-point lead, that’s what the model would have shown — even though such a result would have been inconsistent with how economic conditions had affected presidential races in the past.

We’ve introduced a similar step into our Senate model this year in order to make it more consistent with our presidential model. However, the Senate version is quite a bit simpler and has less overall influence on the forecast. Specifically, the model assumes the generic congressional ballot will revert toward a mean of favoring the opposition party — in this case, Republicans — by about 5 percentage points as of Election Day, which is the ballot’s historic average performance in midterm elections since 1990.⁶⁰ It does not account for economic conditions, presidential approval ratings, the favorability of the parties or any other factor.

As in the presidential model, however, the weight placed on this historic average decreases as the election year goes on — it’s already quite small, and it will be zero by Election Day. Furthermore, the Democrats’ position on the generic ballot has already declined to show a deficit with Republicans of about 3 percentage points, not much different from the long-term average.⁶¹ For these reasons, Step 5 barely changes the results — currently, it shows Republicans doing only about 0.2 percentage points better in each state than they would otherwise.⁶² It would make more difference earlier in the election year, or if the generic ballot were way out of line with historical trends.
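The arithmetic in this step can be sketched in a few lines of Python. The function names, the sign convention (positive numbers favor Republicans) and the 0.2 weight are assumptions made for illustration. The weight is simply a value that reproduces the 0.2-point figure above once the half-point pass-through described in the footnotes is applied; it is not a published model parameter.

```python
def reverted_generic_ballot(current_margin, historical_mean, weight_on_history):
    """Blend the current generic-ballot margin with its historical mean.

    Margins are in percentage points; positive values favor Republicans.
    weight_on_history is assumed to shrink toward 0 as Election Day nears.
    """
    if not 0.0 <= weight_on_history <= 1.0:
        raise ValueError("weight_on_history must be between 0 and 1")
    return (weight_on_history * historical_mean
            + (1.0 - weight_on_history) * current_margin)


def state_level_shift(current_margin, historical_mean, weight_on_history,
                      pass_through=0.5):
    """Shift applied to a state's forecast margin by this step.

    pass_through=0.5 reflects the article's finding that a 1-point move in
    the generic ballot produces about a half-point move in Senate polls.
    """
    blended = reverted_generic_ballot(current_margin, historical_mean,
                                      weight_on_history)
    return pass_through * (blended - current_margin)


# Republicans lead the generic ballot by about 3 points; the midterm
# average is about 5. The 0.2 weight is hypothetical.
shift = state_level_shift(3.0, 5.0, 0.2)
print(f"Republican gain per state: {shift:.1f} points")
```

With a weight of zero, as on Election Day, the shift disappears and the forecast rests on the polls and state fundamentals alone.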

Step 6: Estimate margin of error

If you’ve gotten this far, you’ll know that I think this step is important. As I’ve said, the goal of the FiveThirtyEight model is not to “call” races but instead to estimate the probability that each candidate will win. And some races are associated with more uncertainty than others.

This is nicely illustrated by our interactive Senate forecast — the picture below reflects how it looked as of Tuesday evening. The dots in the chart represent the most likely outcomes. The gray bars indicate the uncertainty (more precisely, the 90 percent prediction interval). In Colorado, for instance, while the most likely outcome is a win for the Democratic candidate by about 3 percentage points, the prediction interval runs all the way from a 12-point Democratic win to a 6-point win for the Republican. That’s already a fairly wide range — and remember, it captures only 90 percent of the cases. There’s also a 5 percent chance that the Democrat will win by more than 12 points and a 5 percent chance that the Republican will win by more than 6.
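To connect an interval like Colorado's to a standard deviation, here is a minimal sketch that assumes the errors are normally distributed. That is a simplification, since the model described later uses a slightly fatter-tailed distribution, and the function names are hypothetical.

```python
from statistics import NormalDist


def implied_standard_deviation(lower, upper, coverage=0.90):
    """Standard deviation implied by a symmetric prediction interval.

    Assumes a normal distribution, which is only an approximation here.
    """
    z = NormalDist().inv_cdf(0.5 + coverage / 2.0)
    return (upper - lower) / (2.0 * z)


def prediction_interval(mean, sd, coverage=0.90):
    """Symmetric interval that covers `coverage` of a normal distribution."""
    z = NormalDist().inv_cdf(0.5 + coverage / 2.0)
    return mean - z * sd, mean + z * sd


# Colorado, with Democratic leads as positive numbers:
# the interval runs from -6 (Republican by 6) to +12 (Democrat by 12).
colorado_sd = implied_standard_deviation(-6.0, 12.0)
print(f"Implied standard deviation: {colorado_sd:.1f} points")
```

Under that assumption, the 18-point-wide interval corresponds to a standard deviation of roughly 5.5 points around the 3-point Democratic lead.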

However, the prediction interval is even wider in some states, especially Kansas, Alaska and Louisiana. How are these intervals determined?

We’ve done it by looking at which factors are historically correlated with larger errors in the forecast⁶³ — and we’ve identified six important ones. Most of these ought to be pretty intuitive.

Uncertainty is larger when there are more days to go until the election.⁶⁴ This means the model will become more confident as Election Day approaches.
Uncertainty is larger when there are fewer polls. This year’s election has featured fewer polls than elections in the recent past (and also lower-quality polls⁶⁵), which makes the uncertainty higher than it was in 2008, 2010 or 2012. Note that the uncertainty is estimated on a state-by-state basis, so relatively well-polled states like Georgia are associated with less uncertainty than thinly polled ones like Alaska.
Uncertainty is larger when the polls disagree more with one another. Take one state where two polls have the Democrat ahead by 5 points and another where the Democrat is tied with the Republican in one poll but 10 points ahead in another. The Democrat has a 5-point lead in the polling average in both states. But there is considerably more uncertainty, we’ve found, in the state where the polls disagree with one another.
Uncertainty is larger when the polling average disagrees more with the state fundamentals. Another reason for calculating the “state fundamentals” estimate — however much weight you place on it — is to get a sense for whether it tells a consistent story with the polls. We’ve found that in states where there is more divergence between polls and fundamentals — as in Kansas this year — the uncertainty is much higher.
Uncertainty is larger when there are more undecideds or third-party voters in the polls. This is another intuitive assumption that checks out in the data. A lot of Senate races this year feature high numbers of undecided voters, something that contributes to high uncertainty about their outcomes. Races with viable third-party candidates are also associated with very high volatility.
Uncertainty is larger when the race is more lopsided. This is the one counterintuitive-seeming finding; isn’t the outcome more in doubt when the polls show a close race? Of course it is — if you’re concerned about who will win. But as measured by the difference between the polled and the actual margin in the race, the error tends to be larger when there is a bigger gap separating the candidates. It’s fairly common, for instance, for a candidate up by 40 points in the polls to win her race by 30 points or 50 points instead.
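The poll-disagreement factor is easy to see with the two hypothetical states from the list above. This sketch uses the population standard deviation as a stand-in for "disagreement"; the article does not say which measure the model actually uses, so treat that choice as an assumption.

```python
from statistics import mean, pstdev


def summarize_polls(democratic_leads):
    """Return (average lead, spread) for a list of poll margins in points."""
    return mean(democratic_leads), pstdev(democratic_leads)


consistent_state = [5.0, 5.0]   # both polls show the Democrat up 5
split_state = [0.0, 10.0]       # one poll tied, one with the Democrat up 10

for name, polls in [("consistent", consistent_state), ("split", split_state)]:
    average, spread = summarize_polls(polls)
    print(f"{name}: average lead {average:.1f}, spread {spread:.1f}")
```

Both states have the same 5-point average, but only the second has any spread, which is the signal the model treats as a reason for a wider interval.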

Step 7: Simulate outcomes and estimate the probability of each party controlling the Senate

Once we’ve completed Step 6, we’ve calculated what amounts to a mean (“The Republican is ahead by 2 points”) and a standard deviation (“plus or minus 5 points”) for the forecast in each state. As I mentioned, you can use a normal distribution to calculate a candidate’s win probability from these two factors alone. So why not just stop there?
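The normal-distribution shortcut mentioned here can be sketched as follows, using the example of a Republican ahead by 2 points with a standard deviation of 5. The function name is hypothetical, and the calculation deliberately ignores the fat-tail adjustment and the simulations described next.

```python
from statistics import NormalDist


def win_probability(mean_margin, sd):
    """Chance that the final margin is above zero under a normal distribution."""
    return 1.0 - NormalDist(mu=mean_margin, sigma=sd).cdf(0.0)


# Republican ahead by 2 points, standard deviation of 5 points.
p = win_probability(2.0, 5.0)
print(f"Republican win probability: {p:.0%}")
```

A 2-point lead with that much uncertainty works out to roughly a two-in-three chance of winning, which shows how much the probability depends on the standard deviation rather than on the lead alone.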

One minor reason is that the FiveThirtyEight model does not quite use a normal distribution; instead it uses a transformation of the normal distribution with slightly fatter tails.⁶⁶ The transformation gives extreme long-shot candidates slightly shorter odds; it might mean, for example, that we would have a candidate with a 0.5 percent chance to win his race instead of a 0.05 percent chance. But this process is not complicated and makes little difference. Instead, we run simulations to deal with a couple of more important problems.

One is that the error in the polls is not independent from state to state. In a number of recent elections, one party has either gained considerable ground in the closing stages of the race (as Democrats did in 2006), or the polls have had a strong overall bias toward one party or another on Election Day itself (as in 1994, 1998 and 2012). This property is not as pronounced as in presidential races, where the same two candidates are on the ballot in each state. But it happens often enough to worry about.

As I mentioned in Step 6, the model estimates the overall amount of error in each state based on the number of days until the election, the volume of polling there, the number of undecided voters and other factors. Before running the simulations, the model breaks the error down into two subcomponents: national error and state-specific error.⁶⁷ National error⁶⁸ affects every state in the same way; state-specific error, as its name implies, affects one state at a time.

In each simulation, the program draws a series of random numbers.⁶⁹ The first number it draws represents national error; in one simulation, for instance, the draw might be “Republicans +2.” This means that Republicans will outperform their forecasts by 2 percentage points in every state in that simulation. Then it draws another number in each state. Perhaps in Arkansas, for instance, it comes up with “Democrats +5.”

These numbers are then added together to produce a simulated result in each state. In Arkansas in this example, it would mean the Democrat, Mark Pryor, outperformed his projection by 3 percentage points (despite Democrats having a poor night nationally). If Pryor had trailed his opponent by fewer than 3 points (or led by any margin), that would be enough to win him the race in that simulation. If he’d been behind more than 3 points, he’d still come up short.⁷⁰

In this way, we can estimate not only each candidate’s chance of winning but also the overall number of seats each party will control, specifically accounting for the possibility that it could be a year like 1994 or 2012 when almost all of the races broke in the same direction.
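The two-part error draw can be sketched in Python as below. This is an illustration under stated assumptions, not the model's code: it draws from plain normal distributions rather than the fatter-tailed one the model uses, it reads the "sum of squares" decomposition from the footnotes as total² = national² + state², and every function name and input value is hypothetical.

```python
import math
import random


def state_specific_sd(total_sd, national_sd):
    """Split total error by sum of squares: total^2 = national^2 + state^2."""
    if national_sd > total_sd:
        raise ValueError("national_sd cannot exceed total_sd")
    return math.sqrt(total_sd ** 2 - national_sd ** 2)


def simulated_margin(forecast_margin, national_draw, state_draw):
    """Add both error draws to the forecast (positive = Democratic lead)."""
    return forecast_margin + national_draw + state_draw


def simulate_win_probabilities(forecasts, total_sds, national_sd,
                               n_sims=20000, seed=538):
    """Estimate each Democrat's win probability across many simulations.

    forecasts and total_sds map a state name to its forecast margin and
    its total standard deviation, both in percentage points.
    """
    rng = random.Random(seed)
    wins = {state: 0 for state in forecasts}
    for _ in range(n_sims):
        national_draw = rng.gauss(0.0, national_sd)  # shared by every state
        for state, forecast in forecasts.items():
            state_sd = state_specific_sd(total_sds[state], national_sd)
            state_draw = rng.gauss(0.0, state_sd)    # unique to this state
            if simulated_margin(forecast, national_draw, state_draw) > 0:
                wins[state] += 1
    return {state: count / n_sims for state, count in wins.items()}
```

In the Arkansas example, `simulated_margin(-2.0, -2.0, 5.0)` returns 1.0: a Democrat forecast to trail by 2 points wins that simulation, because a 5-point state-level break in his favor outweighs a 2-point national break toward Republicans. Because the national draw is shared, states tend to miss in the same direction within a simulation, which is what widens the range of seat totals.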

The simulation is also helpful for handling viable third-party candidates such as Larry Pressler of South Dakota,⁷¹ who present a couple of additional challenges.

One is that the range of outcomes for third-party candidates is not symmetric. A third-party candidate polling at 15 percent with some time to go in the race has some chance (not a lot) of gaining 20 points and finishing at 35 percent, in which case he could win. (As I’ve mentioned, vote shares for third-party candidates can be volatile.⁷²) But he has no chance of losing 20 points and finishing at -5 percent.⁷³ So we model the vote shares for third-party candidates based on a log-normal distribution, which accounts for this type of asymmetry.
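A short sketch shows why a log-normal distribution fits this problem. It assumes the candidate's polled share is the median of the distribution and uses a made-up spread parameter (`sigma=0.4`); neither choice comes from the model itself, and the function name is hypothetical.

```python
import math
import random
from statistics import mean, median


def simulate_third_party_share(polled_share, sigma, n_sims=10000, seed=0):
    """Draw simulated vote shares (in percent) from a log-normal distribution.

    The median of the draws is set to the polled share; sigma controls the
    spread on the log scale and is a hypothetical value here.
    """
    rng = random.Random(seed)
    mu = math.log(polled_share)
    return [rng.lognormvariate(mu, sigma) for _ in range(n_sims)]


shares = simulate_third_party_share(15.0, sigma=0.4)
print(f"lowest: {min(shares):.1f}  median: {median(shares):.1f}  "
      f"mean: {mean(shares):.1f}  highest: {max(shares):.1f}")
```

Every draw is above zero, and the long right tail pulls the average above the median: most simulations leave the candidate near or a bit below 15 percent, while a few show a large surge. A bare log-normal can in principle exceed 100 percent, so a fuller version would need to cap or renormalize the shares.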

Also, third-party candidates are often closer to one of the major-party candidates ideologically and therefore are more likely to “trade” votes with that candidate. In the Maine gubernatorial election in 2010, for example, the independent, Eliot Cutler, was left of center and much closer to the Democrat, Libby Mitchell, than to the conservative Republican candidate, Paul LePage. When Cutler suddenly began to gain ground late in the race, almost all of his votes were taken from Mitchell rather than from LePage. The model accounts for this by using the ideology scores we calculated in Step 4. In South Dakota, for example, Pressler’s vote share (Pressler is a former Republican) is more correlated with the Republican, Mike Rounds, than the Democrat, Rick Weiland. So in those simulations where Pressler does well, he takes more of his votes from Rounds. Likewise, when Pressler does poorly, he gives back most of his votes to Rounds.⁷⁴

The very last step is simply counting up the number of seats won in each simulation, and adding them to the baseline of 34 Democratic seats and 30 Republican seats that are not up for grabs this year.⁷⁵ The model assigns any third-party winners to one of the major parties; see here for more on how we do that. Greg Orman of Kansas has said he’ll caucus with whichever party is “clearly in the majority.” (This produces a kink in our probability distribution.) But based on Orman’s ideology score, he’s assigned a 75 percent probability of caucusing with Democrats in the event his vote would determine majority control.

Finally, we can tally the results across thousands⁷⁶ of simulations to estimate the likelihood of a party finishing with a given number of seats. That produces a probability distribution that looks like this:

Simulations with 50-50 ties are resolved as producing Democratic control because of the tiebreaking vote of Vice President Joe Biden. So we count up the percentage of simulations in which Democrats finished with at least 50 seats; that represents their chances of retaining the Senate. The remaining cases go to the Republicans.
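The final tally can be sketched as below. The baseline seat counts and the 50-seat threshold come from the text above; the function names are hypothetical, and the sketch assumes any third-party winners have already been assigned to a party before the count.

```python
BASELINE_DEM_SEATS = 34     # Democratic seats not up for election this year
BASELINE_REP_SEATS = 30     # Republican seats not up for election this year
SEATS_FOR_DEM_CONTROL = 50  # a 50-50 tie goes to Democrats via the vice president


def democrats_control(dem_wins_contested):
    """True if Democrats hold the Senate given their wins among contested seats."""
    return BASELINE_DEM_SEATS + dem_wins_contested >= SEATS_FOR_DEM_CONTROL


def control_probabilities(dem_wins_by_simulation):
    """Share of simulations in which each party controls the Senate."""
    n_sims = len(dem_wins_by_simulation)
    if n_sims == 0:
        raise ValueError("need at least one simulation")
    dem_share = sum(
        1 for wins in dem_wins_by_simulation if democrats_control(wins)
    ) / n_sims
    return {"Democrats": dem_share, "Republicans": 1.0 - dem_share}
```

With 64 seats fixed and 36 contested, Democrats need to win at least 16 of the contested races in a simulation to reach 50 seats; 15 wins leaves them at 49 and hands control to Republicans.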

Got any other questions? Just drop me a line. We’ll have more in the coming days, including new pollster ratings and more detail on our forecast page.

CORRECTION (Sept. 17, 9:23 a.m.): An earlier version of this story incorrectly referred to the Congress serving in the years 2007-08 as the 109th Congress. It was the 110th.

CORRECTION (Sept. 17, 11:34 a.m.): An earlier version of a footnote to this story gave the wrong state for where the Republican Bill Cassidy is running for Senate. He is running in Louisiana, not Kentucky.

CORRECTION (Sept. 17, 3:50 p.m.): An earlier version of this article misstated the percentage of cases in which a candidate we favored between 85 percent and 95 percent actually won. The article correctly stated that it was 11 in 12 cases, but that percentage is 92 percent, not 89 percent.

Footnotes

At least when it comes to domains with noisy data like elections forecasting.
I calculate the margin of error based on the difference in means formula as described here.
This is far higher than the 5 percent of the time this is supposed to happen on account of sampling error alone.
This coverage often portrays races as “tossups” when one candidate has a clear advantage.
See my book for more on this.
It’s important to note that these reflect actual, published results; not backtested figures.
The exception was Democrat Heidi Heitkamp of North Dakota, who won her race in 2012 when our model gave her an 8 percent chance of doing so.
According to the win probability calculator at the website Advanced Football Analytics. The calculation assumes the opposing team has the ball after the ensuing kickoff with first-and-10 at its own 20-yard line.
The exception would be if we caught a bug — and that could happen. We think it’s fair game to fix the program if it wasn’t doing what it was supposed to do in the first place. In four election cycles and counting, we’ve never caught a major problem — but if we do, we’ll fix it and eat the appropriate amount of crow.
The other two banned “pollsters” are Pharos Research Group and TCJ Research. Extensive due diligence on these firms has failed to resolve our concerns about them. We no longer ban polls from firms that are merely bad rather than ethically untoward.
Another requirement is that the poll needs to be published somewhere in the public domain before we include it in the model. Now and then, a pollster will send us results before they’re published for the general public. We appreciate this and we respect embargoes, but the polls won’t be included in the model until they’ve been published elsewhere.
For instance, a poll conducted on behalf of a liberal or conservative PAC or interest group would be included. With that said — I think I’ve included this language every year but never actually applied it — we reserve the right to exclude pollsters if it appears they’re “flooding the zone” with dubious polls to manipulate the polling averages. And we’ll permanently ban any pollster who is later revealed to have released polls at the behest of a candidate or party organization without disclosing those ties.
The formula for a poll’s half-life, as measured in days, is roughly 14 + 0.2 * daysuntil, where daysuntil is the number of days until the election. For instance, with 50 days to go until the election, the half-life for polls is 24 days, so a poll conducted 24 days ago will be given half the weight of one conducted today.
It’s only about one-third as large, in fact.
However, the adjustment is asymmetric. Whereas polls with especially large sample sizes aren’t given that much extra credit by the model, those with especially small sample sizes are punished.
Accuracy ratings are regressed to the mean based on the number of polls a firm has conducted. Pollsters new to the database generally receive a below-average weight, although it depends on whether they pass one or both methodological tests.
A polling firm also qualifies if it’s a member of the National Council of Public Polls or if it releases its raw data to the Roper Center archive.
Some pollsters skirt this definition by communicating with cellphone voters by means other than a phone call, such as through a website that can be accessed on a mobile phone, or by including an “opt-in” sample of cellphone voters rather than calling them at random. We don’t consider such polls to have passed the cellphone standard.
Similarly, registered voter results take priority over polls of all adults. Some pollsters like Public Policy Polling use the ambiguous term “voters” for polls early in the election cycle; we treat these polls as being midway between likely voter and registered voter polls.
Not all independent or third-party candidates are treated as “minor” candidates. If an independent candidate is usually included in polls of the race, and polls in the double digits in several surveys, the model will designate the candidate as viable and his vote share will be projected along with those of the Democrat and the Republican. So far this year, the third-party candidates we’ve designated as viable are Greg Orman of Kansas, Larry Pressler of South Dakota, and Sean Haugh of North Carolina.
For example, a three-day tracking poll might survey 300 new voters each on Monday, Tuesday and Wednesday and report the results as a 900-voter poll. Then on Thursday, the pollster will survey 300 new voters and report results from Tuesday through Thursday, removing Monday interviews from the sample. The FiveThirtyEight model will treat this new edition of the poll as having a sample size of 300, not 900. The exception is the most recent edition of the tracking poll, which gets its full weight. (Technically, the model starts with the most recent edition of the tracking poll — giving it full weight — and then moves backward giving the other editions partial weight.)
In theory, the adjustment could favor Democrats in a year where they were the party doing better in likely voter surveys.
This is calculated as a weighted average; the weights are determined by the number of polls a firm has released with both likely voter and registered voter results and its pollster rating.
House effects are not quite the same thing as bias; house effects refer to how a poll compares against other polls of the same race, while bias refers to how it compares against the actual results. In 2012, for example, polls from Marist College had a modest Democratic house effect — they were more favorable to Democrats than other surveys of the same states. But they actually wound up having a modest Republican bias, since almost all polling firms in 2012 understated the performance of Democrats, Marist included.
The regression includes a dummy variable for each state (with generic ballot polls treated as their own “state”) and each polling firm. The coefficient associated with the dummy variable for each polling firm is a reflection of its house effect.
The adjustment is calculated after the likely voter and trend line adjustments are applied, so in theory a pollster won’t be credited with a house effect just because it surveys a different sample population or because it surveys the race at a different time.
The buffer zone, in more technical language, is the 90 percent confidence interval associated with the model’s estimate of each firm’s house effect.
The model also applies a house effects adjustment to account for the fact that some polling firms consistently show more undecided voters than others.
In particular, house effects from polls in our pollster ratings database, which covers polls released in the final three weeks of campaigns since 1998.
In calculating the house effects adjustment, we also include internal polls.
Polls from past years are treated as being one-tenth as important as polls from the current year.
In calculating the house effects baseline, the model also excludes firms that do not call cellphones.
It uses a lowess smoothing parameter of .85, which means that the curves it generates are relatively flat rather than trying to capture every bounce in the polls. Furthermore, we’ve found that there isn’t a one-to-one correspondence between movement in the generic ballot and movement of Senate polls of individual states; instead, each 1-point shift in the generic ballot produces about a half-point shift in the average head-to-head Senate survey.
Wang estimates the uncertainty in his state-by-state forecasts by calculating something called the standard error of the median. However, this is a measure of how much a set of polls differ from one another — not how much polls differ from actual election results. In the real world, as I’ve described, there are many elections in which all or almost all polls are biased in the same direction, such as in Nevada in 2010.

Furthermore, Wang’s model assumes that the error in polls is independent from state to state. This is also false: in many elections, the polls have missed in the same direction in most or almost all competitive states. Put more colloquially, Wang’s model assumes that errors in polls “cancel out” when they often do not. He therefore underestimates the chance for the underdog to win because of systematic errors in the polls, better-than-expected turnout, and so forth.
Wang says he calculates win probabilities by converting his median and standard error to a normal distribution, much as in the examples above.
I determine this by calculating a z-score — seeing how many standard deviations the actual result was from Wang’s prediction. In this case, Wang’s forecast was off by 11.76 standard deviations. My statistical software won’t calculate the possibility of an 11.76 standard deviation error, but it should be on the order of one chance in a nonillion (a 1 followed by 30 zeroes).
In fact, in 2012, the state fundamentals calculation in our presidential model did better than our polling average.
Our model is “trained” based on elections going back to 1990, including many for which we have limited or incomplete polling data. That may make the model better equipped to handle years like this one when polling data is fairly sparse.
Generic ballot polls are subject to the same adjustments, such as the likely voter adjustment, as the state polls are.
More precisely, this variable is specified as an interaction between incumbency status and congressional approval ratings. It has no effect in races where there is no incumbent running.
To be clear, the notion of the “anti-incumbent wave” is something of a myth: Partisan forces are normally much stronger than pro- or anti-incumbent forces. But incumbents from both parties have a modestly tougher job getting re-elected at times when the public thinks Congress has performed poorly.
In the example above, say that the Republican raises an additional $2 million while the Democrat brings in nothing more. Then the fundraising proportion is 50:50 — a big difference from the 75:25 advantage to the Democrat. But if the Democrat raises $2 million while the Republican raises nothing, the Democrat’s advantage increases only to 83:17 — not much different from 75:25.
Its impact is only about 2 percentage points per additional “tier” of experience achieved — a maximum of 6 or 7 percentage points for the largest differences.
Appointed incumbents perform much worse than elected incumbents and do not receive credit for those offices. In fact, the FiveThirtyEight model treats races involving appointed incumbents as open-seat races.
We define a large city as one with a population at least half the size of a congressional district. Currently, that means a city population of at least 360,000.
They might have impressive resumes in the private or political sector, but historically such candidates don’t perform well relative to their polling or fundraising totals.
The model also includes a dummy variable for incumbency itself, which has a negative coefficient. Does that mean it’s worse to be an incumbent? Not really. Incumbents usually raise more money than their opponents. They were usually elected or re-elected by a substantial margin. They rate at the top of our 4-point scale for past elected office held. So this negative coefficient serves to compensate for the other variables the model uses, several of which tend to help incumbents. In some circumstances, a party might be worse off nominating an incumbent than a new candidate according to the model, but these cases are rare.
As measured by the aggregate popular vote for the U.S. House.
Not all systems are available for all candidates; we average together whichever are available.
OnTheIssues.org scores treat candidates who have not expressed a clear opinion about an issue as being “moderate” on that issue. We instead remove such issues from the denominator and recalculate the scores.
This year, for example, we’re using the 110th (2007-08) through 113th (2013-14) Congresses. Figures for a state’s delegation are taken relative to the average DW-Nominate score of all members of that Congress.
One with a DW-Nominate score of +.400.
One with a DW-Nominate score of +.600.
Over the final 60 days of the campaign, the polling average moves toward the fundamentals estimate about 60 percent of the time.
More precisely, based on the median field date of the poll. If some time has passed between the poll’s median field date and when we run the model — it almost always has — its weight will already be diminished some.
In 2012, the model used a slightly different method wherein the weight assigned to the state fundamentals calculation varied based on the number of days until the election. However, this method was putting too much weight on the state fundamentals toward the end of the race. We’ve reverted to making it a constant number, as it was in 2008 and 2010. To be clear, although the absolute weight assigned to the state fundamentals is constant throughout the race, its relative weight tends to decline because there are more polls toward the end of the race.
Or Dec. 6, 2014, in the case of the probable runoff in Louisiana.
The calculation will show the incumbent as a favorite in a year with an average economy. This applies to elected incumbents only, like Barack Obama in 2012, and not to the incumbent’s party, so the model will not assume that Democrats have any advantage from incumbency in 2016.
The average elected incumbent president wins re-election by more than 2 points. So the fundamentals estimate assumed that the economy would hurt Obama — but not by enough to make him an underdog in the race.
It’s important to note that the tendency of the president’s party to perform poorly at the midterms is not just a recent phenomenon — it has a very long history. So we’re comfortable that this is a “fundamental” characteristic of midterm elections and not just a temporary trend — in the same way that there is a long-term tendency for incumbent presidents to be re-elected.
In addition, as I’ve mentioned previously, we’ve found that each 1-point shift in the generic ballot produces only about a half-point shift in the average head-to-head Senate survey. This is another reason why this step has a limited influence on the forecast. The model does assume that some states are “swingier” than others and respond by greater or lesser amounts to changes in the national environment; see here for more on how we account for this.
For example, the Republican Bill Cassidy is favored to win his race by 3.5 percentage points in Louisiana; he would be ahead by 3.3 percentage points without this step.
This is accomplished by means of a regression analysis on all Senate races since 1990, where the dependent variable is the absolute value of the difference between the model’s forecast outcome and the actual result.
This accounts for some of why Louisiana has an especially wide confidence interval. There, we’re projecting the results of the probable Dec. 6 runoff election rather than the Nov. 4 primary, which means an extra month of campaigning.
Technically, the model calculates this factor based on the cube root of the sum of poll weights in a particular state, and poll weights are affected by our pollster ratings along with other factors. So a state with five highly rated polls will be associated with somewhat less uncertainty than one with five poorly rated ones. The reason we use the cube root is to account for diminishing returns; since errors in polls are correlated, the forecast does not become infinitely more accurate as you add more and more polls to it.
Historically, the error in polls has been slightly kurtotic, loosely meaning that there are more really, really bad polling errors (like this one) than you’d expect if the errors were normally distributed. But this is probably not worth worrying about. It’s more of a concern in cases like multi-candidate primaries, where error distributions are subject to high degrees of skewness and kurtosis.
This decomposition works by means of a sum of squares formula.
National error is calculated based on the overall bias of the FiveThirtyEight forecasts in past Senate elections since 1990. For instance, if in some past year it expected Republicans to win Senate races by an average of 4 points and they won by just 1 point instead, that would represent a 3-point bias.
It draws them from the probability distribution I described above, which resembles a normal distribution, instead of assuming the error is uniformly distributed.
What about an exact tie? That almost never happens, since the random number is drawn with lots of decimal places. But by default, exact ties are resolved for the incumbent party — in this case the Democrats, since they control the Senate.
There are certain cases, like Kansas this year or Connecticut in 2006, where the race is a de facto two-candidate race between a major-party candidate and an independent candidate. In these cases, the model handles the independent candidate by its usual rules, instead of the special logic it applies in true three-way races.
We’ve modeled them by looking at the performance of third-party candidates in Senate and gubernatorial races since 1998.
Of course, this is true for all candidates — election results are bounded between 0 percent and 100 percent. However, our research has found that the error distributions for third-party candidates are asymmetric. Most of the time, these candidates will either lose a little ground in the polls or hold steady. But now and then they’ll gain a lot of ground and become threats to win the race.
As a third-party candidate, you’d rather have your vote be more correlated with the major-party candidate running higher in the polls. That way, every vote you gain may come at his expense, and it’s easier to close the deficit in a hurry.
The model assumes these 64 seats are static; it’s beyond our scope to consider whether some current senators might switch parties after the election.
We typically run 70,000 simulations each day — sometimes slightly more or fewer. These are associated with a small amount of statistical noise — a margin of error of about plus or minus 0.4 percent on the estimate of the chance that each party will control the Senate. So you shouldn’t sweat changes in the forecast to the decimal place.
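Two of the footnotes above state small formulas that can be checked directly: the poll half-life and the noise in the simulations. The sketch below assumes exponential decay for the recency weight, which is how a half-life is usually read, and a 95 percent confidence level for the margin of error, which the footnote does not specify. The function names are hypothetical.

```python
import math


def poll_half_life(days_until_election):
    """Approximate half-life of a poll's weight, in days (from the footnote)."""
    return 14 + 0.2 * days_until_election


def recency_weight(poll_age_days, days_until_election):
    """Relative weight of a poll, halving every half-life (assumed decay form)."""
    return 0.5 ** (poll_age_days / poll_half_life(days_until_election))


def simulation_margin_of_error(probability, n_sims, z=1.96):
    """Sampling noise in a probability estimated from n_sims simulations.

    z=1.96 corresponds to an assumed 95 percent confidence level.
    """
    return z * math.sqrt(probability * (1.0 - probability) / n_sims)


print(f"Half-life with 50 days to go: {poll_half_life(50):.0f} days")
print(f"Weight of a 24-day-old poll: {recency_weight(24, 50):.2f}")
print(f"Noise at 70,000 simulations: {simulation_margin_of_error(0.5, 70000):.2%}")
```

For a forecast near 50 percent, 70,000 simulations give a margin of a bit under 0.4 percentage points, which is consistent with the figure in the last footnote.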

Nate Silver founded and was the editor in chief of FiveThirtyEight. @natesilver538
